Project Details
Description
This project is divided into two phases, corresponding to the two tasks of gene prediction and functional annotation. In the first phase, a gene-finding system is developed and applied. This system is designed to be scalable and extensible with respect to the gene features it models, the machine learning algorithms it employs, and the range of experimental data from which it learns. The project first validates the system by applying it to the complete C. elegans genome, and then retrains the system for the more difficult task of recognizing genes in human DNA. The second phase consists of two parts. First, the software framework used for the gene-finding system from phase one is generalized to model families of related proteins. Second, in order to learn from non-sequential data, the project develops functional classification techniques using a discriminative learning method called support vector machines (SVMs). The statistics calculated by the sequence-based modeling system function as one set of features used by the SVM system. Additional features will come from DNA microarray experiments, the upstream promoter region of each gene, phylogenetic profiles, and similarity scores to known protein families.
Status | Finished |
---|---|
Effective start/end date | 8/15/00 → 10/31/02 |
Funding
- National Science Foundation: US$412,195.00
ASJC Scopus Subject Areas
- Artificial Intelligence
- Biochemistry, Genetics and Molecular Biology (all)