Session: Primary Biological Databases
Quality of services of the primary nucleotide sequence databases
Hideaki Sugawara (tree_of_life@leaf.ocn.ne.jp), Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
Biological databases have expanded in step with the progress of biology, driven especially by the development of precise and large-scale experimental technologies. "The Molecular Biology Database Collection: 2006 update" (Galperin 2006) lists 858 databases in categories ranging from nucleotide sequences to immunology, 139 more than in the 2005 update. Among them, the International Nucleotide Sequence Databases (INSD, http://www.insdc.org/) have been among the most successful primary databases for 20 years. This success is due to the understanding and support of academia, industry and governments. However, the INSD, maintained jointly by DDBJ, EMBL and GenBank, must be prepared for the growing quantity and variety of data. The data have grown from 10 Kbp in 1987 to 100,000,000 Kbp (100 Gbp) in 2006 and continue to increase day by day. In 1987 the data were mainly sequences of cloned genes, whereas more than half of the INSD data now come from whole-genome shotgun projects. Therefore, to maintain the quality of its services, the INSD has to enlarge its computer resources, improve its application programs and train more expert staff. As part of our quality-assurance effort, we at DDBJ have carried out a project named Gene Trek in Procaryote Space (GTPS). GTPS applies a common annotation protocol to all bacterial genome sequences in the public domain, using GRID computing, a large-scale PC cluster and expert annotators. The GTPS database is available at http://gtps.ddbj.nig.ac.jp/.
Reference: Galperin, Michael Y. (2006) Nucleic Acids Research, Vol. 34, Database issue, D3-D5.
Keywords: nucleotide sequence, database, prokaryote, genome, annotation