Research

     My past projects focused on studying protein-protein interactions, gene expression and related algorithms. However, I am interested in the analysis of genome sequences and protein structures as well. Dr. Przytycka and I currently study how codon usage bias affects various kinds of translation errors in major model organisms. Following available biological models for translation errors, we are able to compute the probability of events necessary for the errors to occur. The relationship between codon usage bias and the probability of errors should give us some useful hint.

   Protein-Protein Interactions (PPI)

      Protein-protein interaction (PPI) data record how proteins interact with each other at a certain time point, which describes a static state of the system. It is usually hard to extract interesting dynamic information out of PPI data. However, Farach-Colton,Woolford and I successfully mined temporal relations in molecular pathways by studying PPI data. In particular, we modelled protein complex assembly pathways by interval graphs, and determined the temporal order of joining the pathway of proteins by ordering the vertices in the PPI graph. We developed a tool XRONOS and then applied it to a ribosome assembly pathway.  Our work on mining temporal relations from PPI data and modelling pathways by interval graphs is the first of its kind [1].

      As more and more evidence shows that protein complexes and biological processes can be conserved across widely divergent organisms, it becomes increasingly important to identify the conserved regions in biological networks, just as we identify conserved DNA or protein sequences across species. Farach-Colton and I studied such evolutionary conservation by a comparative analysis of PPI graphs. We introduced a graph theoretic model for the conserved PPI graphs based on an evolutionary process. We developed an algorithm for identifying graph similarity, which combines sequence similarity scores and confidence scores for PPI's to search for the conserved PPI sub graphs. We contributed the first approach for detecting the conserved complex or biological process in one organism when given the PPI graph of the corresponding complex or process in another organism [2].

      The number of interacting protein pairs is always very small compared to the number of all possible protein pairs. It will be quite helpful to provide a small set of candidates for interacting protein pairs and then test them in a wet laboratory setting. My collaborators and I developed a classifer to predict PPI's. Given a pair of proteins, we selected pairs of structural domains and function categories of two proteins, one domain or category from each protein, as their features. Then we computed a ratio score for each feature according to its distribution in the whole space. In addition, we developed a novel way of choosing non-interacting protein pairs as negative training examples. At that time, the resulting SVM was one of the most accurate classifers for PPI prediction [3].

  Lattice Construction Algorithm 

       A Galois lattice, often graphically represented by a Hasse diagram, consists of a set of concepts that are partially ordered. Since the size of a Galois lattice can be exponential in the input size, a faster algorithm for constructing Galois lattices is essential for applications using these lattices. People often measure lattice construction
algorithms by time delay complexity, which is the time between outputting two consecutive concepts. Currently the best time delay complexity for lattice construction algorithms is O(|G||M|^2), where G is the input object set and M is the input attribute set. The algorithm that Farach-Colton and I are developing improves it to O(|G||M|) [4].

   Gene-Expression Analysis

      One of the limitations of clustering of gene-expression profiles, is that it is not easy to integrate biological information with clustering. To overcome this, Choi, Laubenbacher et al. and I proposed a new method, which is based on organizing a set of genes by a Galois lattice. Unlike a gene regulatory network, each vertex in a Hasse diagram, representing a concept in the Galois lattice, corresponds to a subset of genes. They are grouped together according to their expression values and biological information related to their functions. The set of concepts and the lattice structure can reflect biological relationships in the gene expression data set. We applied our method to the microarray data derived from influenza infected mouse lung tissues and healthy controls and were able to successfully distinguish different biological stimuli. We were among the first to observe that similarities and differences between gene expression profiles can be investigated by comparing their corresponding Galois lattices according to various graph measures [5].

    It is of great interest in clinical study to identify different disease subclasses and drug responses of patients, each of which can be thought to correspond to a unique cell class. Given a set of gene-expression matrices obtained from patients, people want to identify those cell classes. Farach-Colton and I developed a tool called LABSTER to attack the problem by clustering temporal gene-expression matrices, where each row corresponds to a gene and each column corresponds to a time point. We created a Galois lattice for each matrix, which yielded a natural distance function between gene-expression matrices. Finally, we clustered the matrices based on these distances. A key advantage of our method was that it effectively handles missing values. The problem of clustering gene-expression matrices has not been addressed in the extant literature [6].

กก

Reference

[1] Martin Farach-Colton, Yang Huang, and John Woolford. Discovering temporal relations in molecular pathways using protein-protein interactions. In Proceedings of 10th International Conference on Research in Computational Molecular Biology, p150-156, 2004.
[2] Yang Huang and Martin Farach-Colton. Identifying conserved structures in protein-protein interaction graphs by comparative analysis. under preparation.
[3] Yang Huang, Dimitrij Frishman, and Ilya Muchnik. Predicting protein-protein interactions by a supervised learning classifier. Computational Biology and Chemistry, vol. 28, p291-301, 2004.
[4] Martin Farach-Colton and Yang Huang. A linear delay algorithm for constructing galois latticeis. under preparation, part of work presented at SIAM Conference on Discrete Mathematics 2006.
[5] Vicky Choi, Yang Huang, Vy Lam, Dustin Potter, Reinhard Laubenbacher, and Karen Duca. Using formal concept analysis for microarray data comparison. In 5th Asia Pacific Bioinformatics Conference, 2006.
[6] Yang Huang and Martin Farach-Colton. Lattice based clustering of temporal gene-expression matrices. In 7th SIAM International Conference on Data Mining, 2006.


Back to Home

Note that three pictures appearing on this webpage are from Internet.

PPI: http://proteome.wayne.edu/images/IGMap-tiny.jpg
Galois lattice: http://www.cmu.edu/joss/content/articles/volume1/images/fig22.gif
Gene expression: http://corbinlab.georgetown.edu/Images/Patterning_large.jpg