BACKGROUND: An international community of researchers has generated a significant number of Expressed Sequence Tags (ESTs) for the Rosaceae, an economically important plant family that includes most temperate fruits, such as apple, cherry, peach, and strawberry, as well as other commercially valuable members. ESTs are fragments of expressed genes that can be used for gene discovery, for developing markers for mapping, and for cultivar improvement via marker-assisted selection. Efficient dissemination and integration of these data are best facilitated through a centralized and curated database with associated sequence analysis tools.

DESCRIPTION: The Genome Database for Rosaceae (GDR) was initiated to provide a curated and integrated web-based relational database for this family. I developed a key component of GDR to assemble and annotate the publicly available ESTs from the four main genera of the family (Prunus, Malus, Fragaria, Rosa). I created both genus-level and family-level unigenes by extensive filtering and trimming followed by assembly with the software CAP3. Further analysis includes marker mining for single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) with putative primer identification, and oligo identification for potential microarray development. Functional genomics efforts are supported with sequence similarity searching against major protein and nucleotide databases, gene product ontology assignment, and protein motif identification. I deployed the entire project on the GDR with all data available for browsing, searching, and downloading.
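
As an illustration of the SSR mining step mentioned above, the following is a minimal Python sketch of how perfect simple sequence repeats might be located in a unigene sequence. It is not the tool used in the GDR pipeline; the function name `find_ssrs`, the motif lengths (2 to 6 bp), and the minimum of five repeats are assumptions chosen only for the example.

```python
import re

def find_ssrs(seq, min_repeats=5, motif_lengths=(2, 3, 4, 5, 6)):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence.

    Hypothetical helper for illustration: thresholds are assumed defaults,
    not values taken from the GDR pipeline.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # A k-base motif followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since the shorter motif length already reports that region.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

if __name__ == "__main__":
    example = "GGC" + "AT" * 6 + "CCGT" + "AAG" * 5 + "TC"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Each reported position is 0-based. In practice, the flanking sequence around each hit would then be passed to a primer design program to obtain the putative primers.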

CONCLUSIONS: The GDR and its associated EST unigene project are meeting a major need for timely annotation and curation of sequence data for the Rosaceae community. The results of my analysis highlight major genes and pathways of interest, including those involved in ripening and disease resistance, as well as transcription factors. The easily accessible pool of annotated coding sequences should further both functional and structural genomics characterization in Rosaceae. The unigene set elucidates the levels of sequence similarity shared across different plant species and the implications for resource sharing across the family. GDR can be accessed at



