CODATA 2002: Frontiers of
Scientific and Technical Data

Montréal, Canada — 29 September - 3 October
 

Biological Science Data Abstracts



Track I-C-2:
Integrated Science for Environmental Decision-making: The Challenge for Biodiversity and Ecosystems Informatics

Chairs: Gladys Cotter, U.S. Geological Survey, USA and
Bonnie Carroll, Information International Associates, USA

Introductory Context: From Local to Global: We will lay out the intent and overview of the session, which is to explore the issues of turning data into a viable resource for decision-making through the development of biodiversity information infrastructures and systems. Particular emphasis will be placed on issues of obtaining, managing, accessing and using data that cross differing spatial and temporal scales. Challenges of integrating current electronic monitoring data with legacy data, such as museum specimens for historical context, will be addressed.

 


1. Building the US National Biological Information Infrastructure: Synergy between Regional and National Initiatives
John (Jack) Hill, Houston Advanced Research Center, USA

Information concerning biodiversity and ecosystems is critical to a wide range of scientific, educational, and government uses. However, the majority of this information is not easily accessible. In 1993, the National Research Council (NRC) published a report entitled "A Biological Survey for the Nation." The report recommended that the U.S. Department of the Interior oversee the development of a National Biotic Resource Information System. The resulting system should: 1) be a distributed federation of databases designed to make existing information more accessible, 2) develop new ways to collect and distribute data and information, as well as lead in promoting data standards, 3) support continuing state efforts to develop regional and statewide environmental databases, particularly with museums, universities and similar organizations, and 4) participate in interagency initiatives to coordinate the collection and management of biodiversity data by the federal government.

In 1994, the U.S. President signed Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: the National Spatial Data Infrastructure (NSDI)." The NSDI deals with the acquisition, processing, storage, and distribution of geospatial data, and is implemented by the Federal Geographic Data Committee (FGDC). At the same time, the national biotic resource information system became the NBII (web page - http://www.nbii.gov). The NBII is implemented under the auspices of the U.S. Geological Survey (USGS). The NBII works with the FGDC to increase access to and dissemination of biological geospatial data through the NBII and the NSDI. The NBII biological metadata standard is an approved "profile" or extension of the FGDC's geospatial metadata standard.

In 1998, the Biodiversity and Ecosystems Panel of the President's Committee of Advisors on Science and Technology (PCAST) released the report titled "Teaming With Life: Investing in Science to Understand and Use America's Living Capital". The PCAST report recommended that the federal government develop the "next generation NBII" or NBII-2. This would be accomplished through a system of nodes (interconnected entry points to the NBII). In 2001, the U.S. Congress allocated the funds for the development and promotion of the node-based NBII-2.

Development and implementation of the NBII nodes is underway and is being conducted in collaboration with every sector of society. There are three types of nodes. "Regional" nodes have a geographic area of responsibility and represent a regional approach to local data, environmental issues, and data collectors. Twelve (12) regional nodes are required to cover the entire U.S. "Thematic" nodes focus on a particular biological issue (e.g., bird conservation, fisheries and aquatic resources, invasive species, urban biodiversity, or wildlife disease/human health). Such issues cross regional, national, and even international boundaries. "Infrastructure" nodes are focused on issues such as the creation, adoption, and implementation of standards through the development of common tool suites, hardware and software protocols, and geospatial technologies to achieve interoperability and transparent retrieval across the entire NBII network.

This presentation will highlight NBII development, implementation, lessons learned, and successful user applications of two regional nodes, the Southern Appalachian Information Node (SAIN) and the Central Southwest/Gulf Coast Node (CSGCN). Specific NBII applications will include multiple country-, regional-, county-, and local- (site specific) level biological, environmental, and natural resource management issues.

 

2. Building a Biodiversity Information Network in India — Biodiversity Informatics and Developing World: Status and Potentials
Vishwas Chavan and S. Rajan, National Chemical Laboratory, India

The most striking feature of Earth is the existence of life, and the most striking feature of life is its diversity. Biodiversity, and the ecosystems that support it, contribute trillions of dollars to national and global economies. The basis of all efforts to effectively conserve biodiversity and natural ecosystems lies in efficient access to a knowledge base on biodiversity and ecosystem resources and processes. Most of the developed countries are well ahead in the race to take advantage of new electronic information opportunities to manage and build their biodiversity knowledge bases, the recognized cornerstone of their future economic, social and environmental well-being.

For developing nations, which harbor rich and diversified natural resources, much of the biodiversity information is neither available nor accessible. Hence there is a need for an organized, well-resourced, national approach to building and managing biodiversity information through collaborative efforts by this group of Third World nations.

This paper reviews the state of information technology applications in the field of biodiversity informatics in these nations, with India as a model nation. India is one of the 12 mega-biodiversity countries, endowed with rich floral and faunal diversity. With the deteriorating status of its natural resources and its developmental activities, India is one of the best model nations for such a review. Attempts made by the authors' group to develop and implement cost-efficient, easy-to-use tools for biological data management are described in brief. The feasibility of employing available tools, techniques and standards for biological data acquisition, organization, analysis, modeling and forecasting is discussed, keeping in view the level of informatics awareness amongst biologists and ecologists as well as planners. With specific reference to Indian biodiversity, the authors suggest a framework for building a national information infrastructure to correlate, analyze and communicate biological information, to help these nations generate sustainable wealth from nature.

 

3. Developing and Integrating Data Resources from a North American Perspective
Jorge Soberon, CONABIO, Mexico

Biodiversity information denotes a very heterogeneous set of data formats, updating regimes, quality, and users. The data in the labels of biological specimens provide a natural organizing framework because the georeference and the taxonomic name can be used to link to geographically organized data (remote sensing, cartography) and to a variety of points of view (ecological or genetic data, legislation, traffic, etc.). Label data, however, are widely distributed over hundreds of institutions. In this talk, we describe the technical and organizational problems that were solved to create REMIB (the World Network of Biodiversity Information), which links nearly 5 million specimens from 61 collections of 16 institutions in three countries. We also give one example of the use that such a system may have.



4. Ecological Informatics: a Long-Term Ecological Research Perspective
William Michener, Long Term Ecological Research Program, USA

Scientists within the Long-Term Ecological Research (LTER) Network have provided leadership in ecological informatics since the inception of LTER in 1980. The success of LTER, where research projects span wide temporal and spatial scales, depends on the quality and longevity of the data collected. Scientists have devised data collection, data entry, data access, QA/QC and archiving strategies for ensuring that high quality data are appropriately managed to meet the needs of a broad user base for decades to come. The LTER cross-site Network Information System (NIS) is being developed to foster data sharing and collaboration among sites. Recent and important milestones for LTER include adoption of Ecological Metadata Language as a standard as well as supporting metadata software. Current and future foci include developing data standardization protocols and semantic mediation engines, both of which will facilitate LTER modeling efforts.


5. The Global Biodiversity Information Facility (GBIF) — Challenges and Opportunities from a Global Perspective
Guy Baillargeon, Agriculture and Agri-Food Canada

The Global Biodiversity Information Facility (GBIF) is a new international scientific cooperative project based on an agreement between countries, economies, and international organizations. The primary goal of GBIF is to establish an interoperable, distributed network of databases containing scientific biodiversity information in order to make the world's scientific biodiversity data freely available to all. GBIF will play a crucial role in promoting the standardization, digitization and global dissemination of the world's scientific biodiversity data within an appropriate framework for property rights and due attribution. Initially, GBIF will focus on species and specimen level data in 4 priority areas: data access and data interoperability; digitization of natural history collection data; an electronic catalogue of names of known organisms; and outreach and capacity building. With an expected staff of only 14, GBIF will work mostly with others in order to catalyse synergistic activities between participants, generate new investments and eliminate barriers to cooperation. In its first year of activity, GBIF has been concentrating on organisational logistics, staffing, and consultations with Scientific and Technical Advisory Groups (STAGs). Initial work plans are being drafted by the Science Committee and its 4 subcommittees. Once functional, GBIF will unlock vast amounts of biodiversity occurrence data for use in research and environmental decision-making. Life itself, in all its diversity (from molecules, to species, to ecosystems), will provide numerous additional data layers for integrated environmental analysis, modelling and forecasting.

 


Track III-C-2:
Proteome Database

Chair: Akira Tsugita, Proteomics Research Laboratory, Tsukuba, Japan

Proteomics research is growing broadly and exponentially. Such research includes: extraction of protein mixtures from cells and tissues, separation and isolation of the proteins (by 2-DE, HPLC, etc.), and identification of the proteins (by terminal sequencing, in-gel digestion-MALDI-TOF-MS, capillary-LC/ESI-MS-MS, etc.). This research has goals such as: 1) Establishment of a protein catalogue, a complete list of all distinct proteins, including post-translational modifications, multiple splice variants and cleavage products. This information corresponds to genome information; 2) Correlation to protein/protein interaction; 3) Correlation to protein/nucleic acid interaction; 4) Establishment of structure/active motif information; 5) Tissue-specific protein expression; 6) Age-specific protein expression; and 7) Intra-cellular protein expression.
Proteomics is now applied in pharmacology and medicine.

Recently, the international Human Proteome Organisation (HUPO) was established, and extremely active research has been carried out. While the genome sequence is uni-dimensional and finite, proteome information is multi-dimensional with quasi-infinite dimensions. The proteome is dynamic and constantly changing in response to various environmental factors and signals. This session is devoted to the evaluation, compilation, and dissemination of such proteome data.

1. A Proteomic Approach to the Study of Cancer
Julio E Celis, Institute of Cancer Biology, Danish Cancer Society and Danish Centre for Human Genome Research, Denmark

During the past 20 years, high-resolution two-dimensional polyacrylamide gel electrophoresis (2D PAGE) has been the technique of choice for analysing the protein composition of cell types, tissues and fluids, as well as for studying changes in protein expression profiles elicited by various effectors. The technique, which was originally described by O'Farrell and Klose, separates proteins both in terms of their isoelectric point (pI) and molecular weight. Usually, one chooses a condition of interest and lets the cell reveal the global protein behavioral response, as all detected proteins can be analyzed both qualitatively (post-translational modifications) and quantitatively (relative abundance, co-regulated proteins) in relation to each other [http://biobase.dk/cgi-bin/celis]. Presently, high-resolution 2D PAGE provides the highest resolution for protein analysis and is a key technique in proteomics, an emerging area of research of the post-genomic era that deals with the global analysis of gene expression using a plethora of technologies to resolve (2D PAGE), identify (mass spectrometry, Western immunoblotting, etc.), quantitate and characterize proteins, identify interacting partners, as well as to store (comprehensive 2D PAGE databases), communicate and interlink protein and DNA mapping and sequence information from ongoing genome projects. Proteomics, together with genomics, cDNA arrays, phage antibody libraries and transgenic models, belongs to the armamentarium of technologies comprising functional genomics. Here I will report on our efforts to apply proteomic technologies to the study of bladder cancer.

 

2. A Proposition of XML Format for Proteomics Database
Kenichi Kamijo, T. Yamazaki and A. Tsugita, Proteomics Research Center, NEC Corporation, Japan

The rationale and advantages of using XML: The detailed exchange of proteome analysis data, including sample preparation and experimental conditions, is very useful and important for enhancing studies in proteomics. A standardized format for data exchange will accelerate collaboration among proteomics researchers.

XML (Extensible Markup Language) has been developed to be one of the most suitable and powerful languages for describing complex data on the web. XML has the following features:

  • Exchangeable; XML is becoming a worldwide standard, so a variety of XML tools are available, for example, powerful XML parsers/processors, database tools and so on. XML documents are easy to convert to other formats. The XML specification is supervised by the W3C (World Wide Web Consortium).
  • Extensible; Tree structures permit flexible modification. Adding, modifying or removing a branch does not affect data in the other branches.
  • Human readable; XML is text based, so tag names can be human-readable, for example, <sample preparation>, <gel image> and so on, which makes it easier to develop XML applications.
  • Machine readable; It is easy to find target information by using the 'Tag' element structure. It is also possible to find more specific information step by step with 'Attribute' keywords.
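The extensible and machine-readable properties listed above can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The record below is hypothetical: its element and attribute names are invented for illustration and are not taken from the HUP-ML DTD, and the values are toy data. Underscores are used in the tag names because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element and attribute names are illustrative only
# and are NOT taken from the HUP-ML DTD. All values are toy data.
DOC = """
<proteome_analysis>
  <sample_preparation tissue="kidney glomerulus">
    <lysis_buffer>2-DE lysis buffer</lysis_buffer>
  </sample_preparation>
  <gel_image id="gel1">
    <spot id="s1" pI="5.4" mw_kda="42.0">example protein A</spot>
    <spot id="s2" pI="6.8" mw_kda="66.5">example protein B</spot>
  </gel_image>
</proteome_analysis>
""".strip()


def spots_in_pi_range(root, low, high):
    """Machine readable: walk the tag structure, then filter on an attribute."""
    spots = root.findall("./gel_image/spot")
    return [s.get("id") for s in spots if low <= float(s.get("pI")) <= high]


def add_lc_branch(root):
    """Extensible: add a new branch without touching the existing ones."""
    lc_run = ET.SubElement(root, "lc_run", {"id": "lc1"})
    ET.SubElement(lc_run, "column").text = "reverse-phase"
    return lc_run


root = ET.fromstring(DOC)
before = spots_in_pi_range(root, 5.0, 6.0)
add_lc_branch(root)
after = spots_in_pi_range(root, 5.0, 6.0)

if __name__ == "__main__":
    print("spots with pI in [5.0, 6.0]:", before)
    print("unchanged after adding a branch:", before == after)
```

Adding the hypothetical `lc_run` branch leaves the gel data untouched, which is the property the "Extensible" item describes.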

HUP-ML and HUP-ML Editor: We have proposed an XML format, HUP-ML (Human Proteome Markup Language), for proteomics databases and have developed a HUP-ML Editor with which researchers can easily create HUP-ML documents. It would accelerate collaboration among proteomics researchers if a platform for exchanging these data were developed on the internet.
Our concept of HUP-ML is proteome-analysis-oriented. The structure of HUP-ML models a proteome analysis protocol as well as protein identification information and the sequence information. The current HUP-ML incorporates a DTD (document type definition) for 2-dimensional electrophoresis experiments. HUP-ML and the HUP-ML Editor are distributed for free at the JHUPO web site (http://www.jhupo.org/) and will be updated with extensions to other experimental techniques (for example, LC) and with the incorporation of XML Schema, so as to be complementary with other XML formats for proteomics.

Using our XML-based model for proteomics, we have also developed a web-based prototype system that consists of an XML database, an agent, security and a graphical user interface (GUI).
Download of HUP-ML editor (version 0.43 beta): http://www.jhupo.org/

 

3. Proteomics: An Important Post-genomic Tool for Understanding Gene Function
Richard J. Simpson, L. M. Connolly, D. F. Frecklington, H. Ji, G. E. Reid, M. J. Layton, and R. L. Moritz, Joint ProteomicS Laboratory (JPSL), Ludwig Institute for Cancer Research and Walter & Eliza Hall Institute for Medical Research, Melbourne, Australia

If DNA is the blueprint to build the complex machine that is a human, then proteins are the parts of the machine that make it work. With the completion of the first draft of the DNA sequence that makes up the human genome, the challenge facing medical research now is to understand gene function. Proteomics provides a biological tool, or assay, for elucidating gene function.

While the term proteomics is often synonymous with high-throughput protein profiling of normal versus diseased tissue by 2-D gel analysis, this definition is very limiting. Increasingly, the power of proteomics is being recognized for its ability to unravel intricate protein-protein interactions associated with intracellular protein trafficking and signaling pathways (i.e., cell-mapping proteomics). The technology issues associated with expression proteomics (the study of global changes in protein expression) and cell-mapping proteomics (the systematic study of protein-protein interactions through the isolation of protein complexes) are almost identical and differ only in front-end scale-up processes. The application of proteomics to various biological problems will be presented with representative examples of (a) differential protein expression for identifying surrogate markers for colon cancer progression, (b) a non-2D gel approach for dissecting complex mixtures of membrane proteins, (c) proteins that inhibit cytokine signal transduction, and (d) proteins that are involved in the intricate pathway that leads to programmed cell death (apoptosis).


4. Human Kidney Glomerulus Proteome and proposition of a method for native protein profiling
Akira Tsugita, K. Miyazaki, Y. Yoshida and T. Yamamoto, NEC Proteomics Research Center and Niigata Univ. Medical Faculty, Japan

To elucidate the molecular mechanism of chronic nephritis, the following proteome research on kidney glomeruli has been initiated. Pieces of kidney cortex with normal appearance were obtained from patients who underwent surgical nephrectomy due to renal tumor. Glomeruli were prepared from the cortex by a standard sieving process using four sieves. The glomeruli on the 150 µm sieve were collected and further purified by picking them up under a phase-contrast microscope. The glomeruli were spun down, homogenized in 2-DE lysis buffer and incubated.

2-DE was carried out on the glomeruli preparation by the standard method (25×20 cm), and about 1500 protein spots were separated. Identification of proteins has been carried out by N- and C-terminal sequencing and peptide mass fingerprinting with MALDI-TOF-MS. 200 spots have been identified.

In addition, a new method has been developed to obtain native protein profiling. The first dimension is in liquid phase on an isoelectric chromato-focusing column, and the second dimension is by non-polar chromatography and molecular sieving chromatography or a specially designed reverse-phase chromatography.


Track III-D-2:
Genetic Data Issues

Chair: H. Sugawara

1. Genetic diversity in food legumes of Pakistan as revealed through characterization, evaluation and biochemical markers
Abdul Ghafoor and Asif Javaid, Plant Genetic Resources Institute, National Agricultural Research Center, Islamabad, Pakistan

Pakistan enjoys four distinct seasons a year, which enables it to produce winter as well as summer legumes. Winter legumes consist of chickpea (Cicer arietinum L.), lentils (Lens culinaris), peas (Pisum sativum), grass pea (Lathyrus sativus) and faba bean (Vicia faba), whereas summer legumes are mungbean (Vigna radiata), black gram (Vigna mungo), cowpea (Vigna unguiculata) and moth bean (Vigna aconitifolia). Common bean (Phaseolus vulgaris) is confined to the high mountainous region of the northern areas, at altitudes ranging from 1000 to 2400 masl. These legumes have been collected and preserved in the gene bank for short duration (5-10 years) at 4 °C, medium term (15-20 years) at 0 °C and long term (more than 50 years) at -20 °C. The number preserved in the gene bank is 2065 (chickpea), 805 (lentil), 104 (peas), 100 (lathyrus), 101 (faba bean), 626 (mungbean), 646 (black gram), 199 (cowpea), 85 (moth bean) and 101 (common bean). About 80% of this germplasm has been characterized and evaluated for quantitative traits. Forty accessions of wild chickpea and one wild Vigna spp. have also been preserved.

The germplasm of black gram (250 accessions), mungbean (60 accessions), lentil (350 accessions), chickpea (350 accessions), wild chickpea (40 accessions), peas (104 accessions), cowpea (173 accessions) and wild Vigna spp. (one accession) has been evaluated by SDS-PAGE and, except for peas and wild chickpea, a low level of genetic diversity was observed for all the material evaluated. This situation led to the use of DNA markers; therefore 40 accessions of black gram and ten accessions of lentil were used for RAPD analysis, which gave a higher level of genetic diversity than SDS-PAGE. It was concluded that legume genetic resources should be characterised and evaluated along with biochemical analyses, including protein and DNA markers, for better gene bank management. These comprehensive data will lead to the establishment of core collections. Each of the legumes mentioned above is a mandate crop of one or another international centre, except black gram and moth bean, although the latter is less important. Black gram has been identified as a potential crop for most Asian countries, including India, Nepal, Bangladesh, Sri Lanka, Pakistan, Philippines, Thailand, Korea, Japan, Taiwan, China, etc. It is also recognized as an important crop in part of the African continent. Low genetic diversity coupled with low stability is a characteristic of this crop that could be minimized by developing a sound linkage between black gram growing countries, and PGRI could serve as a regional gene bank for black gram preservation, evaluation and distribution of germplasm.
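The accession counts quoted above can be tallied to see how much of each cultivated collection was covered by SDS-PAGE. The short sketch below uses only the numbers given in the abstract (wild chickpea and wild Vigna are left out because they are reported separately); the variable and function names are illustrative.

```python
# Accessions preserved in the gene bank, as quoted in the abstract.
preserved = {
    "chickpea": 2065, "lentil": 805, "peas": 104, "lathyrus": 100,
    "faba bean": 101, "mungbean": 626, "black gram": 646,
    "cowpea": 199, "moth bean": 85, "common bean": 101,
}

# Cultivated accessions evaluated by SDS-PAGE, as quoted in the abstract.
sds_page = {
    "black gram": 250, "mungbean": 60, "lentil": 350,
    "chickpea": 350, "peas": 104, "cowpea": 173,
}


def coverage(preserved_counts, evaluated_counts):
    """Return {crop: fraction of preserved accessions that were evaluated}."""
    return {
        crop: evaluated_counts.get(crop, 0) / n
        for crop, n in preserved_counts.items()
    }


total_preserved = sum(preserved.values())
total_evaluated = sum(sds_page.values())
fractions = coverage(preserved, sds_page)

if __name__ == "__main__":
    print(f"preserved: {total_preserved}, SDS-PAGE evaluated: {total_evaluated}")
    for crop, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
        print(f"{crop:12s} {frac:6.1%}")
```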


2. Visualization and Correction of Prokaryotic Taxonomy Using Techniques from Exploratory Data Analysis
T. G. Lilburn, American Type Culture Collection, USA
G. M. Garrity, Bergey’s Manual Trust and Department of Microbiology and Molecular Genetics, Michigan State University, USA

There are, at present, over 5,700 named prokaryotic species. There has long been a need to organize these species within a comprehensive taxonomy that relates each species to all the others. For some years, researchers have been sequencing the small subunit ribosomal RNA genes of many prokaryotes, initially to try and establish the evolutionary relationships among all prokaryotes and subsequently in order to aid in the identification of prokaryotes both known and unknown. These sequences have become an almost universal feature in the description of new species. Thus, for the purposes of classification, the sequences are probably the most useful, universally described characteristic of the prokaryotes. Small subunit rRNA gene sequences were used by the staff of the Bergey’s Manual Trust to establish prokaryotic taxonomy above the Family level only recently. This effort was facilitated by the application of techniques drawn from the field of exploratory data analysis to visualize the evolutionary relationships among large numbers of sequences and, hence, among the organisms they represent. We describe the techniques used to develop the first maps of sequence space and the techniques we are currently using to ease the placement of new organisms in the taxonomy and to uncover errors in the taxonomy or in sequence annotation. A key advantage of these techniques is that they allow us to see and use the complete data set of over 9,200 sequences. We also present plans for the development of a tool that will allow all interested researchers to participate in the maintenance and modification of the taxonomy.
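The abstract does not name the specific exploratory techniques used, so the following is only a simplified stand-in and not the authors' method: a nearest-neighbour check on pairwise sequence distances, which illustrates how a new sequence might be placed and how a possibly mislabelled entry might be flagged. The sequences, identifiers and group labels are toy data invented for the example, and the sequences are assumed to be pre-aligned and of equal length.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest(query_seq, references, exclude=None):
    """Return the id of the reference sequence closest to query_seq."""
    candidates = [rid for rid in references if rid != exclude]
    return min(candidates, key=lambda rid: p_distance(query_seq, references[rid][1]))


def place(query_seq, references):
    """Suggest a group for a new sequence from its nearest neighbour."""
    rid = nearest(query_seq, references)
    return rid, references[rid][0]


def flag_suspects(references):
    """List entries whose nearest neighbour carries a different group label."""
    suspects = []
    for rid, (group, seq) in references.items():
        neighbour = nearest(seq, references, exclude=rid)
        if references[neighbour][0] != group:
            suspects.append(rid)
    return suspects


# Toy reference set: id -> (assigned group, aligned sequence). Hypothetical.
REFERENCES = {
    "seqA": ("GroupX", "ACGTACGTACGT"),
    "seqB": ("GroupX", "ACGTACGTACGA"),
    "seqC": ("GroupY", "TTGTCCGAACGT"),
    "seqD": ("GroupY", "TTGTCCGAACGA"),
    "seqE": ("GroupY", "ACGTACGTTCAT"),  # labelled GroupY but resembles GroupX
}

if __name__ == "__main__":
    print(place("ACGTACGAACGT", REFERENCES))
    print(flag_suspects(REFERENCES))
```

A flagged entry is only a candidate for review; whether the taxonomy or the sequence annotation is at fault still has to be decided by a curator.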



3. Towards T-cell Epitope Design
Pandjassarame Kangueane, Meena K Sakharkar, Liew K. Meow, Nanyang Centre for Supercomputing and Visualisation, MPE, Nanyang Technological University, Singapore

Quantitative information on the types of inter-atomic interactions at the MHC-peptide interface will provide insights into backbone/sidechain atom preference during binding. Protein crystallographers have documented qualitative descriptions of such interactions in each complex. However, no comprehensive report is available to account for the common types of inter-atomic interactions in a set of MHC-peptide complexes characterized by MHC allele variation and peptide sequence diversity. The available x-ray crystallography data for MHC-peptide complexes in the Protein Data Bank (PDB) provide an opportunity to identify the prevalent types of inter-atomic interactions at the binding interface.

Two datasets, one consisting of 28 non-redundant class-I MHC-peptide complexes and another of 10 non-redundant class-II MHC-peptide complexes in the PDB, were examined for inter-atomic interactions. Four types of such interactions, namely BB (backbone MHC - backbone peptide), SS (sidechain MHC - sidechain peptide), BS (backbone MHC - sidechain peptide) and SB (sidechain MHC - backbone peptide), characterize the MHC-peptide interface based on backbone and sidechain atom preference. We measured the percentage distribution of these interactions in a set of MHC-peptide complexes and identified the most common type among them.

We calculated the percentage distributions of four types of interactions at varying inter-atomic distances. The mean percentage distribution for these interactions and their standard deviation about the mean distribution is presented for each type. The prevalence of SS and SB interactions at the MHC-peptide interface is shown in this study. SB is clearly dominant at an inter-atomic distance of 3Å.
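The percentage calculation described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that an "interaction" means any MHC-peptide atom pair within a distance cutoff, classifies atoms as backbone by the standard names N, CA, C and O, and uses a handful of made-up coordinates rather than real PDB data.

```python
import math

# Standard protein backbone atom names; every other atom counts as sidechain.
BACKBONE = {"N", "CA", "C", "O"}


def atom_class(atom_name):
    """Return 'B' for a backbone atom and 'S' for a sidechain atom."""
    return "B" if atom_name in BACKBONE else "S"


def interaction_percentages(mhc_atoms, peptide_atoms, cutoff):
    """Percentage of BB, SS, BS and SB atom pairs within `cutoff` angstroms.

    Each atom is a (name, (x, y, z)) tuple. The first letter of each key
    refers to the MHC atom and the second to the peptide atom.
    """
    counts = {"BB": 0, "SS": 0, "BS": 0, "SB": 0}
    for m_name, m_xyz in mhc_atoms:
        for p_name, p_xyz in peptide_atoms:
            if math.dist(m_xyz, p_xyz) <= cutoff:
                counts[atom_class(m_name) + atom_class(p_name)] += 1
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: 100.0 * n / total for key, n in counts.items()}


# Toy coordinates (angstroms), invented for illustration only.
MHC = [("OH", (0.0, 0.0, 0.0)), ("NE2", (5.0, 0.0, 0.0)), ("O", (10.0, 0.0, 0.0))]
PEPTIDE = [("N", (2.5, 0.0, 0.0)), ("CB", (5.0, 2.0, 0.0)), ("CA", (10.0, 2.9, 0.0))]

percentages = interaction_percentages(MHC, PEPTIDE, cutoff=3.0)

if __name__ == "__main__":
    print(percentages)
```

Repeating the call with different `cutoff` values gives the distribution at varying inter-atomic distances; averaging over several complexes would give the mean and standard deviation mentioned in the text.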

The dominant SB interaction at the interface suggests the importance of peptide backbone conformation during MHC-peptide binding. Currently available algorithms are well developed for protein side chain prediction upon fixed backbone templates. This study shows the preference for backbone atoms in MHC-peptide binding and hence emphasizes the need for accurate peptide backbone prediction in quantitative MHC-peptide binding calculations.



4. Intronless Genes in Eukaryotes
Meena Kishore Sakharkar and Pandjassarame Kangueane, Nanyang Technological University, Singapore

Eukaryotes have both intron-containing and intronless genes, and their proportion varies from species to species. Most eukaryotic genes are "multi-exonic", with their gene structure interrupted by introns. Introns account for a major proportion of many eukaryotic genomes. For example, the human genome is proposed to contain 24% introns and only 1.1% exons (Venter et al. 2001). Although most genes in eukaryotes contain introns, there are a substantial number of reports on intronless genes. We recently created a database (SEGE) for intronless genes in eukaryotes using GenBank release 128 sequence data (http://intron.bic.nus.edu.sg/seg/). The eukaryotic subdivision files from GenBank were used to create a dataset containing entries that are reservedly considered "single exonic" genes according to the "CDS" FEATURE convention. Single exon genes with prokaryotic architectures are of particular interest in gene evolution. Our analysis of this set of genes shows that structures are known for nearly 14% of their gene products. The characteristics and structural features of such proteins are discussed in this presentation.
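In GenBank feature tables, a CDS that spans several exons is written with a `join(...)` location, while a CDS with a single continuous range has no `join`. A minimal sketch of a filter built on that convention is shown below. It is not the SEGE pipeline, whose exact selection rules are not given in the abstract, and the record identifiers and locations are hypothetical.

```python
import re

# Matches one coordinate range such as 12..78, <1..200 or 340..>565.
RANGE = re.compile(r"<?\d+\.\.>?\d+")


def exon_count(location):
    """Count the coordinate ranges in a GenBank CDS location string."""
    return len(RANGE.findall(location))


def is_single_exon(location):
    """True when the CDS location is one continuous range (no join/order)."""
    if "join" in location or "order" in location:
        return False
    return exon_count(location) == 1


# Hypothetical (record id, CDS location) pairs for illustration.
RECORDS = [
    ("rec1", "join(12..78,134..202,310..455)"),
    ("rec2", "340..565"),
    ("rec3", "complement(1200..2100)"),
    ("rec4", "complement(join(50..120,300..410))"),
]

single_exon_ids = [rid for rid, loc in RECORDS if is_single_exon(loc)]

if __name__ == "__main__":
    print(single_exon_ids)
```

A location-based filter like this only reflects the annotation; entries with incomplete or incorrect CDS features would still need manual checking.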

Reference
Venter, J.C. et al. (2001) The sequence of the human genome. Science, 291, 1304-1351.


Track IV-A-2:
Biodiversity II


Chair: Ji Liqiang, Institute of Zoology, Chinese Academy of Sciences, China

1. Shell Biodiversity Using Animation Technology
Sung-Soo Hong, Hoseo University, Korea
Bu-Young Ahn, Kye-Jun Lee and Ji-Young Kim, Bio-Resources Informatics Department, Korea

The world’s natural history museums constitute an important storehouse of information about biodiversity. Although this information is regularly used for studies in systematics and natural history, its application to problems of importance to human well-being has been less frequent. Biodiversity is a new science that builds upon and combines the achievements of taxonomy, biology, biogeography, and ecology. It also draws on applied sciences such as conservation and natural resources management. A wide array of data types has been suggested as being relevant for biodiversity studies, ranging from molecular data to land-use data. Nearly all of these data types can be structured around a core of 4 data elements: species, date, locality, and source. These data need to be digitized and cleaned up. This paper is accompanied by a multimedia presentation of text, graphics, animation, virtual reality, and sound. This combination of data and its common visualization will provide new insight into the interrelations among data. We developed a shell biodiversity system using animation technology (http://ruby.kisti.re.kr/~museumfs).

The cyber shell content consists of five compartments: rare shells, marvelous shells, shells of the world, shells of Korea, and stories of shells. The database contains pictures of the shells and related information. It includes not only animated displays but also text information. The database files were classified by the species, genus, family, order, class and division of the shell. Pictures of shells are displayed, and the user may reach the image and virtual view information by clicking on the displayed object. The system provides various functions to manipulate, visualize and interact with images on the web. Transformations such as translation, 360-degree rotation, and scaling can be applied to the picture interactively for convenient and effective viewing. An information retrieval system using the corner transformation technique and a multi-level grid file is planned for query search in future work.
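The three interactive transformations mentioned above (translation, rotation and scaling) can be sketched for 2D points as follows. This is a generic illustration of the geometry, not the code of the shell viewer, and the function names are illustrative.

```python
import math


def translate(points, dx, dy):
    """Shift every (x, y) point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def scale(points, factor):
    """Scale every point about the origin by `factor`."""
    return [(x * factor, y * factor) for x, y in points]


def rotate(points, degrees):
    """Rotate every point counter-clockwise about the origin."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


# Corners of a unit square standing in for an image outline.
outline = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

if __name__ == "__main__":
    moved = translate(scale(rotate(outline, 90), 2.0), 3.0, 0.0)
    for x, y in moved:
        print(f"({x:.2f}, {y:.2f})")
```

A full 360-degree rotation returns each point to its starting position up to floating-point rounding, so an interactive viewer can let the user spin the picture continuously.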



2. Building the Frog Contents System Using an Animation Technology
Sung-Soo Hong, Hoseo University, Korea
Bu-Young Ahn, Korea Institute of Science & Technology Information, Korea

In recent years, interest in surveying the biological resources of the country has increased greatly, with the goal of creating a national strategy to preserve biodiversity. Inventories and analyses of geographic, ecological, taxonomic and genetic diversity are key issues towards this goal. Frog dissection is a mandatory part of biology or science courses offered in K-12 education and is emphasized due to the importance of the subject. Because of this, hundreds of thousands of frogs are dissected every year for the observation of their internal organs. This may not only result in environmental disruption but also risks adversely affecting young students' emotions as a side effect.

In the frog dissection system (http://ruby.kisti.re.kr/~museumfs), virtual dissection is provided in order to eliminate these undesired effects, and the realism of the organs is toned down using Photoshop to minimize students' dislike of and aversion to the dissection process. In addition, the system was designed in such a way that, once a student replaces the dissected organs after the observation is done, the frog is reanimated and jumps around, so that the student does not treat the subject carelessly but instead treats it with respect for its life.



3. Biodiversity of Autotrophic Cryptogams in Antarctica
Asif Javaid, Abdul Ghafoor and Rashid Anwar, Plant Genetic Resources Institute, National Agricultural Research Center, Islamabad, Pakistan

Antarctica, the southernmost continent, is a landmass of around 13.6 million square kilometers, 98 percent of which is covered by ice up to 4.7 kilometers thick. The continent remained neglected for decades after its discovery; scientific research was initiated in the early 1940s. Two species of phanerogams have been reported, whereas most studies have been carried out on cryptogams such as algae, lichens and bryophytes. There are 700 species of terrestrial and aquatic algae in Antarctica, 250 lichens and 130 species of bryophytes, including 100 species of mosses and 25-30 species of liverworts. The species composition and abundance are controlled by many environmental variables, such as nutrients, availability of water and increased ultraviolet radiation resulting from the depletion of the ozone layer. These cryptogams can be found in almost all areas capable of supporting plant life in Antarctica and exhibit a number of adaptations to the Antarctic environment. There is a need to apply molecular and cellular techniques to study the biodiversity and genetic characteristics of the flora of this region. Biochemical techniques including DNA sequencing and microsatellite markers are being used to obtain information about the genetic structure of plant populations. These analyses are designed to assess levels of biodiversity and to provide information on origin, evolutionary relationships and dispersal patterns. The flora of Antarctica needs to be genetically evaluated for characters related to survival in that unique environment that can be incorporated into economically important plants using transformation.


4. Automatic Mapping and Monitoring of Invasive Alien Plant Species, the South African Experience
J. M. K. Kandeh, J. L. Campos dos Santos and L. Kumar, International Institute for Geo-Information Science and Earth Observation, The Netherlands

Invasive alien plants are a major problem in South Africa, affecting about 8.28% of the country (10.1 million hectares of land). When converted to dense stands, this amounts to about 1.7 million hectares, and the problem is spreading rapidly. There is growing concern over the increasing rate at which alien plants are replacing indigenous plants.

In response to the call of the Convention on Biological Diversity (UNEP, 1994), South Africa has over the years made efforts to compile data on invasive alien plant species. Much has been done in collating information on the distribution, abundance and habitat types of invasive alien plants, on the role of biological agents in the control of invasive alien plants, and on modeling the water use and spread of invasive alien plants.

Data on invasive alien plants in some parts of the country are still weak and hence do not produce a comprehensive picture of alien plant invasion in the country. In the Greater St. Lucia Wetland Park of KwaZulu-Natal, the South African Government is implementing a mapping and control program for invasive alien plant species.

Control of invasive alien species in the Wetland Park is also undertaken by a number of other organisations, including private landowners, sugar cane farmers and forest plantation owners. There is a lack of a standardised methodology for data capture amongst the organisations. Data formats and map projections differ, little or no data exchange takes place, and most of the data held on invasive alien plants are not in computerised form. Consequently, there is very little information on the extent and distribution of invasive alien plants in the Greater St. Lucia Wetland Park.

This paper presents the development of a prototype geographic information system, which integrates data from various organisations in the Wetland Park. Integrating data from various organisations requires standardisation in data acquisition methodology, data representation and data management amongst the organisations.

In standardising the data acquisition methodology, the methodology of Le Maitre and Versfeld, developed for mapping invasive alien plants at a 1:50,000 scale for a fynbos catchment management system, was used, with the density classes grouped into four classes instead of seven without interfering with the class boundaries.
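Grouping seven density classes into four "without interfering with the class boundaries" means that each coarse class is a union of adjacent fine classes, so every coarse boundary is also one of the original boundaries. The sketch below illustrates the idea only; the percent-cover thresholds and the particular grouping are hypothetical placeholders, since the abstract does not list the actual class limits.

```python
from bisect import bisect_left

# HYPOTHETICAL upper bounds (percent canopy cover) of seven density classes.
# These are illustrative values, not the published class limits.
SEVEN_CLASS_UPPER = [0.1, 1, 5, 25, 50, 75, 100]

# HYPOTHETICAL grouping of fine classes (1-7) into coarse classes (1-4).
# Only adjacent fine classes are merged, so no boundary is moved.
GROUPING = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4}


def fine_class(cover_percent):
    """Return the fine density class (1-7) for a percent cover value."""
    if not 0 <= cover_percent <= 100:
        raise ValueError("cover must be between 0 and 100 percent")
    # Upper bounds are inclusive: a value equal to a bound stays in that class.
    return bisect_left(SEVEN_CLASS_UPPER, cover_percent) + 1


def coarse_class(cover_percent):
    """Return the coarse density class (1-4) for a percent cover value."""
    return GROUPING[fine_class(cover_percent)]


def coarse_upper_bounds():
    """Upper bound of each coarse class, taken from its highest fine class."""
    bounds = {}
    for fine, coarse in GROUPING.items():
        upper = SEVEN_CLASS_UPPER[fine - 1]
        bounds[coarse] = max(bounds.get(coarse, 0), upper)
    return [bounds[c] for c in sorted(bounds)]
```

Because the coarse bounds are a subset of the fine bounds, data mapped with either scheme can be compared without re-surveying.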

Using the Structured Systems Analysis Development Methodology, a prototype information system (APMIS) has been designed, tested and implemented. APMIS integrates data from various organisations in the Greater St. Lucia Wetland Park. APMIS can provide geographic information on the extent and distribution of invasive alien plants, assess the eradication status of mapped areas, and provide operational maps of areas to be cleared. The APMIS strategy can be applied wherever invasive alien plants are a problem and a coordinated approach to both mapping and control is required amongst all key players.

Keywords: Invasive Alien Plants, Geographic Information Systems, Biological Diversity, Systems Development Methodology



5. An Introduction of Chinese Biodiversity Information System
Ji Liqiang, Institute of Zoology, Chinese Academy of Sciences, China

The Chinese Biodiversity Information System (CBIS) is a nationwide distributed information system that collects, arranges, stores and disseminates data and information relating to biodiversity in China. It consists of a center system, 5 disciplinary divisions and dozens of data sources. The Center System of CBIS is located in the Institute of Botany, Chinese Academy of Sciences (CAS), Beijing. The 5 divisions are the Zoological Division (in the Institute of Zoology, CAS, Beijing), the Botanical Division (in the Institute of Botany), the Microbiological Division (in the Institute of Microbiology, CAS, Beijing), the Inland Wetland Biological Division (in the Institute of Hydrobiology, CAS, Wuhan) and the Marine Biological Division (in the South China Sea Institute of Oceanology, CAS, Guangzhou). The data sources cover 15 institutes in CAS and include botanical gardens, field research stations, museums, cell banks, seed banks, culture collections and research groups. The Center System is responsible for building and maintaining an integrated, national-scale biodiversity database, an environmental factor and vegetation database, a model base and expert system at the ecosystem level, and a platform and tools for modeling and expert systems. The Disciplinary Divisions are responsible for building and maintaining databases, model bases and expert systems in their fields, focused on data and information at the species level. The Data Sources are responsible for building and maintaining databases based on their local situation and disciplinary character, combined with GIS technology to present biodiversity information and data in both tabular and graphical form.

Eighty-two databases have been set up in CBIS and are gradually being improved. More than 590,000 records have been collected and entered into the CBIS database system, and most of them are accessible from the Internet. They include species inventory databases, endangered and protected species databases, ecosystem databases, specimen databases, botanical garden databases, culture collection databases, cell bank databases, economic species databases, etc.

In the species inventory databases of animals, plants and microorganisms, there are data on systematics, names, distribution, habitat and references. In the database of endangered and protected species, there are data on the grade of protection, reasons for endangerment, protection measures, pictures, etc. In the specimen database, there are data on the collection, identification, storage and cataloguing of species. CBIS recognized the importance of metadata to data sharing and exchange in its initial period and set up a series of metadata standards in the CBIS participating institutes. They include standards for dataset metadata, the data dictionary, and metadata of institutions and staff in CBIS. The dataset metadata consists of 6 parts: dataset identity information, data collection, data management, data description, data access and metadata management. All CBIS databases must be accompanied by a metadata file or table when they are put on the Internet or exchanged with other institutions.
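The six-part dataset metadata and the rule that every database must carry metadata before it is published or exchanged can be sketched as a simple record with a completeness check. The class name, field names and the use of dictionaries for each part are assumptions for illustration; the abstract does not specify the actual CBIS metadata schema.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetMetadata:
    """Hypothetical container for the six metadata parts named in the text."""
    identity: dict             # dataset identity information
    collection: dict           # data collection
    management: dict           # data management
    description: dict          # data description
    access: dict               # data access
    metadata_management: dict  # metadata management


def missing_parts(metadata):
    """Return the names of metadata parts that are empty."""
    return [f.name for f in fields(metadata) if not getattr(metadata, f.name)]


def ready_to_publish(metadata):
    """A dataset may be published or exchanged only with complete metadata."""
    return metadata is not None and not missing_parts(metadata)
```

A publishing script could call `ready_to_publish` before uploading a database and report the result of `missing_parts` when the check fails.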



6. Biodiversity Issues in Taiwan
Shang-Shyng Yang and Jong-Ching Su, National Committee for CODATA/Taiwan and Department of Agricultural Chemistry, National Taiwan University, Taiwan

In order to conserve and protect the very rich biological resources that have evolved in a unique natural environment, the government in Taiwan has set up a special committee and assigned a government agency, both at the cabinet level, to be in charge of planning and implementing relevant programs, respectively. Convening the “Prospects of Biodiversity”, “Biodiversity-1999” and “Biodiversity in the 21st Century” symposia has been the main means of building a national consensus and identifying issues to be studied, which has motivated scientists to take up the challenging task with the support of research funding from related agencies. There are 6 national parks, 18 nature reserves, 13 wildlife and 24 nature protection areas, together covering 12.2% of the land area. The Policy Formulating Committee for Climate Changes has recommended the reinforcement of education on biodiversity (including all levels of school and general public education), and formulated working plans for national biodiversity preservation and a bioresources survey. The research programs in progress, supported by national funding, include surveys of species, habitats, ecosystems and genetic diversity, long-term monitoring of diversity, sustainable bioresource utilization and compilation of the flora of Taiwan. The increase in the number of scientific publications and the increased emphasis placed by the news media show the growing concern of both the academic and public domains about biodiversity issues. In addition, material and information databases related to biological resources of various categories have been established and are revised regularly.
The following bioscience databases have been established in Taiwan: National plant genetic resources information system, Multimedia databank of Taiwan wildlife, Taiwan Agricultural Institute plant information system, Distribution and resources of fishes in Taiwan, Herbaria at many sites, Cell bank, Asian vegetable genetic resources and seeds, Database of pig production, Registry of pure-bred swine, Mating, farrowing, performance and transfer of ownership of pure-bred swine, Food marketing information system database, Food composition table in Taiwan, Database on heavy metals in Taiwan soils, Greenhouse gases emission from agriculture, Global change database generated in Taiwan.

Keywords: Biodiversity, national park, public education, bioscience, conservation policy, database

 


Track IV-B-2:
Bioinformatics


Chair: Takashi Kunisawa, Science University of Tokyo, Japan

Biologists are facing the challenge of organizing and integrating a vast amount of data and information, which are mainly produced by genome projects. This session focuses primarily on quality controls in sequence databases. Phylogenetic analyses of sequence data are also included in the scope.

1. Unweaving regulatory networks: automated extraction from literature and statistical analysis

Andrey Rzhetsky, Columbia Genome Center, Columbia University, USA

In the first part of the talk I will describe our on-going effort to build a natural language processing system extracting information on interactions between genes and proteins from research articles. In the second part of the talk I will introduce an algorithm for predicting molecular networks from sequence data and stochastic models of birth of scale-free networks.



2. Genome rearrangements in the clinic and in evolution
David Sankoff, Centre de recherches mathématiques, Université de Montréal, Canada

We analyze data on rearrangement breakpoints resulting from individual real-time cytogenetic events in order to help understand the distribution of multiple breakpoints in comparative maps. We compare breakpoint positions from four different databases: on reciprocal translocations, inversions and deletions in neoplasms; on reciprocal translocations and inversions in families carrying rearrangements; and on the human-mouse comparative map. For each set of positions we construct breakpoint distributions for as many as possible of the 44 autosomal arms. We identify and interpret four main types of distribution:

  1. The uniform distribution associated both with families carrying translocations or inversions, and with the comparative map,
  2. Telomerically skewed distributions of translocations or inversions detected consequent to births with malformations,
  3. Medially clustered distributions of translocation and deletion breakpoints in tumor karyotypes,
  4. Bimodal translocation breakpoint distributions for chromosome arms containing telomeric proto-oncogenes.
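A breakpoint distribution along a chromosome arm can be summarised as a histogram of relative positions, from which shapes such as uniform, telomerically skewed, medially clustered or bimodal become visible. The sketch below is only a minimal illustration of that binning step; the function name, the normalisation of positions to the interval from 0 (centromere) to 1 (telomere) and the default bin count are assumptions, not the authors' method.

```python
def breakpoint_histogram(positions, n_bins=10):
    """Count breakpoints per bin along one chromosome arm.

    `positions` are fractions of the arm length: 0.0 at the centromere
    and 1.0 at the telomere (an assumed normalisation).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    counts = [0] * n_bins
    for p in positions:
        if not 0.0 <= p <= 1.0:
            raise ValueError("positions must lie between 0 and 1")
        # A position of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts
```

Roughly equal counts across bins would suggest a uniform distribution, counts concentrated in the last bins a telomeric skew, and counts concentrated in the middle bins a medial cluster; a formal comparison would need a statistical test that this sketch does not attempt.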

 

3. PIR Integrated Databases And Data-Mining Tools For Genomic And Proteomic Research
Zhang-Zhi Hu, Winona C. Barker and Cathy H. Wu, Protein Information Resource, National Biomedical Research Foundation, Georgetown University Medical Center, Washington, DC, USA

The human genome project has revolutionized the practice of biology and the future potential of medicine. With the accelerated accumulation of high-throughput genomic and proteomic data, computational approaches are increasingly important for deriving scientific knowledge and hypotheses.

As an integrated public resource of protein informatics, the Protein Information Resource (PIR) provides many databases and analytical tools to support genomic and proteomic research and scientific discovery. The Protein Sequence Database (PSD) is the major annotated protein database in the public domain, containing about 280,000 sequences covering the entire taxonomic range. To provide high quality annotation and promote database interoperability, the PIR uses rule-based and classification-driven procedures based on controlled vocabulary and accepted ontologies, and includes evidence attribution to distinguish experimentally determined from predicted protein features. PIR-NREF, a non-redundant database containing almost 1,000,000 proteins from PIR-PSD, Swiss-Prot, TrEMBL, GenPept, RefSeq, and PDB, provides a timely and comprehensive sequence collection with source attribution for protein identification, ontology development of protein names, and detection of annotation errors. The composite protein names in NREF, including synonyms and alternate names, and the bibliographic information from all underlying databases provide an invaluable knowledgebase for application of natural language processing or computational linguistics techniques to develop a protein name ontology. The iProClass database addresses the database interoperability issues arising from voluminous, heterogeneous, and distributed data. It provides comprehensive family relationships and functional and structural features for about 800,000 proteins in PIR-PSD, Swiss-Prot, and TrEMBL, with rich links to over 50 databases of protein families, functions, pathways, protein-protein interactions, post-translational modifications, structures, genomes, ontologies, literature, and taxonomy. The PIR databases are implemented in an object-relational database system and accessible online (http://pir.georgetown.edu) for exploration of proteins and their comparative analysis. 
The resource helps users answer complex biological questions that typically involve querying multiple sources, and detect interesting relationships among protein sequences and groups.
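The idea of a non-redundant collection with source attribution can be sketched as follows. This is a minimal illustration that assumes entries are merged when they share the same organism and an identical sequence; the abstract does not state the merging criterion PIR-NREF actually uses, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def build_nonredundant(entries):
    """Merge entries with the same organism and identical sequence.

    `entries` is an iterable of (source_db, accession, organism, sequence)
    tuples. Returns a dict mapping (organism, sequence) to the list of
    (source_db, accession) pairs that reported it, in input order.
    """
    merged = defaultdict(list)
    for source_db, accession, organism, sequence in entries:
        # Sequences are compared case-insensitively (an assumption).
        key = (organism, sequence.upper())
        merged[key].append((source_db, accession))
    return dict(merged)
```

Keeping the list of contributing databases for each merged entry is what makes it possible to compare the protein names and annotations given by different sources for the same sequence.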

The PIR is supported by the NIH grant P41 LM05798, iProClass is supported by the NSF grants DBI-9974855 and DBI-0138188, and the Protein Name Ontology project is supported by the NSF grant ITR-0205470.

 

4. Extraction of Phylogenetic Information from Gene Order Data
Takashi Kunisawa, Science University of Tokyo, Japan

Molecular phylogeny is frequently inferred from comparisons of nucleic or amino acid sequences of a single gene or protein family from different organisms. It is now known that there are a number of difficulties with this approach, for instance, correct alignment of sequence data, biased base (or amino acid) compositions among species, rate variation among sites and/or species, mutational saturation, and long-branch attraction artifacts. Thus, the development of new methods that can produce a reliable phylogenetic tree is an important issue. Here we present a simple method of reconstructing branching orders among genomes based on gene transpositions. We demonstrate that the occurrence or absence of a gene transposition event can provide empirical evidence for branching orders, in contrast to the phenetic approaches of overall similarity or minimum distance. This approach is applied to evolutionary relationships among the completely sequenced Gram-positive bacteria. The complete genomic sequence data allow one to search for the target gene transpositions at a comprehensive level.
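Treating a gene transposition as a present-or-absent character can be sketched by checking, for each genome, whether a gene sits next to a particular neighbour and then splitting the genomes accordingly. This is a simplified illustration under stated assumptions: the adjacency test stands in for detecting a transposition event, the gene labels and function names are hypothetical, and the author's actual procedure is not described in the abstract.

```python
def has_adjacency(gene_order, gene_a, gene_b, circular=True):
    """Return True if gene_a and gene_b are neighbours in the gene order."""
    n = len(gene_order)
    # On a circular genome the last gene is also next to the first.
    last = n if circular else n - 1
    for i in range(last):
        pair = {gene_order[i], gene_order[(i + 1) % n]}
        if pair == {gene_a, gene_b}:
            return True
    return False


def split_by_event(genomes, gene_a, gene_b):
    """Split genome names by presence or absence of the adjacency.

    `genomes` maps a genome name to its ordered list of gene labels.
    Returns (names_with_event, names_without_event), each sorted.
    """
    with_event = sorted(
        name for name, order in genomes.items()
        if has_adjacency(order, gene_a, gene_b)
    )
    without_event = sorted(name for name in genomes if name not in with_event)
    return with_event, without_event
```

Genomes that share the rearranged adjacency are candidates for a common branch; deciding which state is ancestral requires additional evidence, such as an outgroup, which this sketch does not model.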


Last site update: 15 March 2003