20th International CODATA Conference

Session: Primary Biological Databases

EMBL Nucleotide Sequence Database

Guy Cochrane (cochrane@ebi.ac.uk), EMBL Outstation Hinxton, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.

The EMBL Nucleotide Sequence Database sets out to provide an archive of primary nucleotide sequence and annotation. Representing some 74 million sequence entries from across the living world collected over more than a quarter of a century, the database continues to grow at an exponential rate. Along with collaborators DDBJ and GenBank, the EMBL Nucleotide Sequence Database aims for comprehensive coverage of all publicly available sequence. On a nightly basis, the three collaborating databases exchange data, such that archived sequence and annotation are available through search and retrieval tools at all three sites.

Data are recruited from submitters through a variety of routes, tailored to the needs of the submitters and their data. In-house curators work with submitters to achieve consistent use of annotation structures across the whole body of data.

Presentation of EMBL Nucleotide Sequence Database data at the EBI includes the provision of entry retrieval tools, whole database releases for download, sequence homology search tools and the Sequence Retrieval System, SRS, for building complex searches by specific field. Furthermore, nucleotide sequence data are integrated, through cross-referencing, with a host of other bioinformatics resources at the EBI and beyond.

In the talk, I will introduce the database, highlight a number of recent developments, and discuss approaches to dealing with the ever-increasing volume and diversity of data.

Keywords: nucleotide sequence, database, annotation, bioinformatics tool.
