20th International CODATA Conference

Session: Primary Biological Databases

UniProt: the Universal Protein Resource

Claire O’Donovan (odonovan@ebi.ac.uk), European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Cambridge, UK

The ability to store and interconnect all available information on proteins is crucial to modern biological research. Accordingly, the Universal Protein Resource (UniProt) plays an ever more important role by providing a stable, comprehensive, high-quality, freely accessible central resource on protein sequences and functional annotation.

UniProt is produced by the UniProt consortium, formed by the European Bioinformatics Institute (EBI), the Georgetown University Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB). The core activities of UniProt include manual curation of protein sequences assisted by automated annotation, sequence archiving, development of a user-friendly UniProt web site, and provision of additional value-added information on proteins through cross-references to other databases. UniProt comprises three database components, each of which addresses a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB) provides protein sequences with extensive annotation and cross-references. The UniProt Archive (UniParc) is the main sequence storehouse. The UniProt Reference Clusters (UniRef) condense sequence information and annotation to facilitate both sequence similarity searches and analyses of the results.

Keywords: Protein, amino acid sequence, database, annotation
