Integration of Bio-Chemical data

 

Predicting Genetic Regulatory Response Using Classification

In this joint work with Christina Leslie, Chris Wiggins, and their students, we used a boosting-based classification method called Alternating Decision Trees to combine gene expression data obtained from microarrays with information about the binding-site motifs associated with each gene. The resulting model is a classification rule that predicts the state of each gene from the states of the regulatory genes and the motifs that appear in the regulatory sequence upstream of the gene.
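
As a rough illustration of how such a rule could be evaluated, the sketch below scores a tiny hand-built alternating decision tree. It is not the published model: the motif names, regulator names, and node values are hypothetical, and the choice of a splitter that tests "motif present AND regulator in a given state" is an assumption made for the example. The prediction is the sign of the summed values along every path the gene satisfies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Splitter:
    """Tests one condition and routes to a 'yes' or 'no' prediction node."""
    motif: str        # hypothetical motif name
    regulator: str    # hypothetical regulator gene name
    state: int        # +1 = up-regulated, -1 = down-regulated
    yes: "Prediction"
    no: "Prediction"


@dataclass
class Prediction:
    """Holds a real-valued contribution and any splitters hanging below it."""
    value: float
    splitters: List[Splitter] = field(default_factory=list)


def adt_score(node: Prediction, motifs: Set[str],
              regulator_states: Dict[str, int]) -> float:
    """Sum the prediction values on all paths consistent with the inputs."""
    total = node.value
    for s in node.splitters:
        fires = (s.motif in motifs
                 and regulator_states.get(s.regulator, 0) == s.state)
        child = s.yes if fires else s.no
        total += adt_score(child, motifs, regulator_states)
    return total


# A toy tree with made-up values (assumption, for illustration only).
nested = Splitter("MOTIF_B", "REG_2", -1,
                  yes=Prediction(-2.0), no=Prediction(0.5))
root = Prediction(0.25, [
    Splitter("MOTIF_A", "REG_1", +1,
             yes=Prediction(1.0, [nested]),
             no=Prediction(-0.5)),
])

# Gene with MOTIF_A only, while REG_1 is up and REG_2 is down.
s = adt_score(root, {"MOTIF_A"}, {"REG_1": 1, "REG_2": -1})
predicted_state = "up" if s > 0 else "down"
```

A positive score is read here as "up-regulated" and a negative one as "down-regulated"; the magnitude can be taken as a rough confidence.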


The Medusa software is available from Columbia.

Identifying metabolic enzymes with multiple types of association evidence

In this joint work with Dennis Vitkup, George Church and his students, we used boosting to identify “orphan” genes in metabolic networks. The goal of the method is to rank all genes in the organism by how likely each one is to correspond to a particular unidentified metabolic enzyme. The method combines a diverse set of databases to generate the ranking. Because the evidence given by any single database is weak, combining the databases using boosting is a natural approach, and it proves to be effective.
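
The idea of combining weak evidence sources into a ranking can be sketched with a small AdaBoost over threshold stumps. This is a generic illustration, not the method's actual implementation: the three evidence columns (for instance co-expression, phylogenetic-profile similarity, and chromosomal proximity), the gene names, and all numbers are hypothetical. Each stump votes +1 when one evidence score reaches a threshold, and candidates are ranked by the weighted sum of votes.

```python
import math


def stump(row, j, t):
    """Vote +1 if evidence source j is at least t, else -1."""
    return 1 if row[j] >= t else -1


def train_adaboost(X, y, rounds=3):
    """X: rows of evidence scores; y: labels in {+1, -1}."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # list of (feature index, threshold, alpha)
    for _ in range(rounds):
        best = None
        for j in range(d):
            for t in sorted({row[j] for row in X}):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump(row, j, t) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t)
        err, j, t = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, t, alpha))
        # Up-weight the examples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(row, j, t))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_score(model, row):
    """Weighted vote of all stumps; larger means stronger combined evidence."""
    return sum(alpha * stump(row, j, t) for j, t, alpha in model)


def rank(model, candidates):
    """Sort candidate names from strongest to weakest combined evidence."""
    return sorted(candidates,
                  key=lambda name: boosted_score(model, candidates[name]),
                  reverse=True)


# Toy training set (made up): no single column separates the classes.
X = [[0.9, 0.8, 0.2], [0.8, 0.3, 0.9], [0.2, 0.7, 0.8],   # known associations
     [0.1, 0.2, 0.3], [0.3, 0.1, 0.1], [0.4, 0.4, 0.4]]   # non-associations
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y, rounds=3)

candidates = {
    "geneA": [0.85, 0.75, 0.9],
    "geneB": [0.15, 0.15, 0.2],
    "geneC": [0.85, 0.2, 0.1],
}
ranking = rank(model, candidates)
```

In this toy data every single evidence column misclassifies at least one training example, yet the boosted combination separates all six, which is the sense in which weak sources become useful together.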