Performance Analysis of a Gene Family Classification Method for Plant Kinome Identification

Student: Joseph Emerson
Faculty Mentor: George Popescu (Mississippi State University Institute for Genomics, Biocomputing, and Biotechnology)

Different genes often share similar functions, and it is useful to categorize genes so that these similarities become apparent. One way to do so is to analyze protein sequences to determine the structural and evolutionary relationships between genes. We examine the performance of a widely used computational tool for gene classification and implement this tool in an automated workflow that classifies the genes of seven plant species into functional groups.


Gene families are defined by structural and evolutionary relationships, and members of a family often share a common, evolutionarily conserved function. These qualities make gene family classification useful in many areas of functional and evolutionary genomics. Currently, plant gene family classification relies either on hidden Markov models built from model organisms such as Arabidopsis thaliana or on curated datasets to search for gene homologs. The lack of standardized methods for defining gene families often leads to discrepancies in gene classification. We propose a new method that integrates homology identification, motif conservation, phylogenomic analysis, and expression analysis to define gene families. The process requires minimal manual curation of datasets. Using this process, we analyzed the MAP3K gene family in seven plant species, five that had been examined previously and two that had not. Results showed that our method outperformed other recent efforts to identify gene families in these species. Furthermore, the analysis provided new insights into the evolutionary development and function of the MAP3K gene family in the species examined.