Track I-C-5:
Data Archiving
Chair:
Seamus Ross
1.
Report of Activities of the CODATA Working Group on Archiving
Scientific Data
William Anderson, Praxis101, Rye, NY, USA
Steve Rossouw, South African National Committee for CODATA
Co-organizers: CODATA Working Group on Archiving Scientific Data
A Working Group on Scientific Data Archiving was formed following
the 2000 International CODATA Conference in Baveno, Italy. The
Working Group has (1) built a list of annotated primary references
to published reports and existing scientific data archives,
(2) constructed a classification scheme to help organize and
expose the many issues and requirements of archiving, preserving,
and maintaining access to scientific and technical data, (3)
helped sponsor a workshop in South Africa on archiving scientific
and technical data, and (4) proposed collaborating with the
International Council for Scientific and Technical Information
(ICSTI) to build and maintain an internet portal focused on
scientific data and information archiving, preservation and
access. The objective of these efforts is to provide scientists
and scientific data managers a framework of information and
references that can assist in securing the resources and commitments
needed to preserve and archive scientific data. This presentation
outlines the results of these efforts with the goal of stimulating
discussion of the organizing framework as well as the definitions
and relationships among identified issues.
2.
The NIST Data Gateway: Providing Easy Access to NIST Data Resources
Dorothy M. Blakeslee, Angela Y. Lee, and Alec J. Belsky, National
Institute of Standards and Technology, USA
The
National Institute of Standards and Technology (NIST) maintains
a wide range of scientific and technical data resources, including
free online data systems and PC databases available for purchase.
However, many people are not familiar with these various NIST
data collections and the types of data they contain. To help
scientists, engineers, and the general public find out quickly
and easily whether data they need are available at NIST, NIST
has built a web portal to NIST data resources. The first version
of this portal, the NIST Data Gateway (http://srdata.nist.gov/gateway),
provides easy access to 26 online NIST data systems and information
on 48 NIST PC databases. NIST Data Gateway users can specify
a keyword, property, or substance name to find the NIST data
resources that contain standard reference data meeting their
search criteria. When users find a data resource they want to
use, links are provided so they can access or order that resource.
In this paper, we describe how version 1.0 of the NIST Data
Gateway was built and discuss some of the issues that arose
during the design and implementation stages. We include experience
we gained that we hope will be useful to others building data
portals. We also discuss future plans for the NIST Data Gateway,
including efforts to provide access to additional NIST data
resources.
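As a rough illustration of the kind of keyword, property, or substance lookup the Gateway offers, a catalogue search might look like the sketch below; the resource records and field names are invented for this illustration and are not NIST's actual schema.

    # Hypothetical sketch of a Gateway-style catalogue lookup; records and
    # field names are invented for illustration only.
    from dataclasses import dataclass, field

    @dataclass
    class DataResource:
        name: str                        # data system or PC database title
        online: bool                     # free online system vs. PC database for purchase
        url: str                         # link used to access or order the resource
        keywords: set = field(default_factory=set)
        properties: set = field(default_factory=set)
        substances: set = field(default_factory=set)

    def search(catalogue, keyword=None, prop=None, substance=None):
        """Return the resources whose metadata match any supplied criterion."""
        def matches(r):
            return ((keyword and keyword.lower() in {k.lower() for k in r.keywords})
                    or (prop and prop.lower() in {p.lower() for p in r.properties})
                    or (substance and substance.lower() in {s.lower() for s in r.substances}))
        return [r for r in catalogue if matches(r)]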
3. Long Term Data Storage: Are We Getting
Closer to a Solution?
A. Stander and N. Van der Merwe, Department of Information
Systems, University of Cape Town, South Africa
Steve F. Rossouw, South Africa National Committee for CODATA,
South Africa
Many scientific and socioeconomic reasons exist for the long
term retention of scientific and lately also business data.
To do so successfully, the solution must be affordable and also
technologically flexible enough to survive the many technology
changes during its useful life. This paper looks at the current status of available technology for long-term data storage, more specifically the standards that exist for data interchange, the creation and storage of metadata, data conversion problems, and the reliability and suitability of digital storage media. Even if data are stored in the ideal format, application and database management software is still needed to store and retrieve them.
Typically the life expectancy of such software is much shorter
than that of the storage media and as this has already been
the cause of major data loss, possible solutions are investigated.
Most research into long-term data storage focuses on large to
very large databases. It is often forgotten that small, but
very important pockets of scientific data exist on the computers
of individual researchers or smaller institutions. As such data are most often stored in application-specific formats with a short lifespan, strategies for the preservation of smaller amounts of data are also examined.
4.
Prototype of TRC Integrated Information System for Physicochemical
Properties of Organic Compounds: Evaluated Data, Models, and
Knowledge
Xinjian Yan, Thermodynamics Research Center (TRC), National
Institute of Standards and Technology, USA
Qian Dong, Xiangrong Hong, Robert D. Chirico and Michael Frenkel
Physicochemical
property data are crucial for industrial process development
and scientific research. However, such data that have been experimentally
determined are not only very limited, but also deficient in
critical evaluations. Moreover, the models developed for the prediction of physicochemical properties have rarely been presented
with sufficient examination. This situation makes it very difficult
to understand the data that are obtained from reference books,
databases or models after a time-consuming effort. Therefore,
we aim at developing a comprehensive system, TRC Integrated
Information System (TIIS), which consists of evaluated data,
models, knowledge and functions to infer, and then to recommend,
the best data and models. Additionally, it provides valuable
information for users to have a better understanding of physicochemical
property data, models, and theory.
Evaluated physicochemical property data in TIIS are mainly selected
from the TRC Source data system, which is an extensive repository
system of experimental physicochemical properties and relevant
measurement information. Data uncertainty and reliability are
analyzed based on scientific data principles, statistics, and
highly evaluated property models. Information about experimental conditions, data processing, etc., is recorded in detail.
Reliability of the data predicted by a model cannot be determined
without a full description of the model's ability.
Each model in TIIS is carefully examined by using evaluated
data, with emphasis on the predictive ability for calculating
the compounds not used in processing the model's parameters,
and applicable compound classes, for which the model can produce
reasonably good property data. For a given compound, the best predictive value is recommended according to the models' performance in reproducing the evaluated data set. TIIS also provides regression
analyses and optimization functions so that users are able to
process model parameters by using the current best experimental
data set for a particular compound.
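As a rough sketch of the recommendation step described above, one might rank candidate models by their error against the evaluated data and return the best prediction. The toy correlations, their parameters and the RMSE metric below are assumptions for illustration only, not TRC models.

    # Rank candidate property models against evaluated data and recommend one.
    import math

    def rmse(model, evaluated):
        """Root-mean-square error over (condition, evaluated_value) pairs."""
        return math.sqrt(sum((model(x) - y) ** 2 for x, y in evaluated) / len(evaluated))

    def recommend(models, evaluated, condition):
        """Pick the model that best reproduces the evaluated data, then predict."""
        best = min(models, key=lambda m: rmse(m, evaluated))
        return best.__name__, best(condition)

    # Two toy correlations standing in for real property models:
    def correlation_a(t): return 0.004 * t - 1.17
    def correlation_b(t): return 0.0005 * t ** 1.2

    evaluated_data = [(300.0, 0.035), (320.0, 0.11), (340.0, 0.19)]
    print(recommend([correlation_a, correlation_b], evaluated_data, 330.0))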
A property value, a model or a chemical system cannot be fully
understood without sufficient supporting information. Therefore,
the knowledge that describes characteristics of property data,
models, molecular structures, and the results from theoretical
analysis and calculation, is provided by TIIS.
5. An Introduction to the CODATA-China
Physical and Chemical Database Information System
Xiao Yun, Secretary General, Chinese National Committee
for CODATA, China
Yan Baoping, Director, Computer Network Information Center,
CAS, China
Zhang Hui, Secretary, Chinese National Committee for CODATA,
China
Jin Huanian, Engineer, Computer Network Information Center,
CAS, China
In 2001 the Chinese Ministry of Science and Technology decided to bring the data center coordinated by CODATA-China into the basic work of the National Key Research Development Program, providing long-term support for the accumulation, development and utilization of basic scientific and technological data by launching a special basic-technology project.
A database information service system is expected to be set up within 3 to 5 years, with the CODATA-China Physical and Chemical Database Information System as its main body, covering subjects such as agriculture, forestry, mechanics, materials and biology, so as to form a core group around the CODATA-China Physical and Chemical Database Information System which, targeting the fields of mathematics, physics and chemistry, can provide basic and applied data for scientific research and production.
At present the data contained in CODATA-China Physical and Chemical
Database Information System mainly includes: the Chinese nuclear
data, the Chinese atom and molecule data, the Chinese chemistry
and chemical industry data, the geothermodynamics data, the
chemdynamics data, the Chinese aviation material data, and the
Chinese feedstuff technology data. The Computer Network Information Center, CAS, will act as the general center, providing the project with a service platform and technological support based on centralized management assisted by distributed management. Relying on high-performance Unix servers and the Oracle database management system, a highly usable and efficient data application service platform will be developed in the high-performance, portable Java language. TRS, an advanced Chinese full-text retrieval system, will be used to provide highly efficient and reliable full-text data retrieval, and the data service will be delivered on the Web over the Internet.
Track
I-C-6:
Ingénierie de la veille technologique et
de l'intelligence économique
(Data for Competitive Technical and Economic Intelligence)
Chair: Clément Paoli, Université
MLV, France
The production of value-added information from the mathematical and linguistic analysis of electronic information sources containing factual scientific data and textual technological data constitutes the raw material of strategic decisions. The development of methods and software tools that allow a systematic screening of information sources from online databanks and the Internet makes it possible to obtain information corpora of high added value.
Knowledge management rests on obtaining high-quality data and processed information quickly. Techniques for improving these data are often associated with the processing software chosen for competitive intelligence problems.
The problems of standardization and interoperability of local systems, both among themselves and with external information, form the basis for the discussions and exchanges of views sought in this session.
The main themes proposed here are set out to open the debate; other proposals will be evaluated.
- Access to information sources: query engines
- Knowledge representation: semantic processing, cartography, imagery
- Methods and tools for statistical analysis: static and online analysis
- Mathematical data analysis: correspondence analysis (AFC), hierarchical classifications
- Linguistic methods and tools: terminology extraction (semantic analysis)
- Data mining, text mining: interoperability and heterogeneous databases, clustering
- Knowledge management: information as a resource
- Informational vulnerabilities: data and information quality, protection
1.
Ingénierie de la veille pédagogique et gestion
des connaissances en enseignement supérieur (Data
for competitive pedagogy and knowledge management in higher
education)
Jean-Paul Pinte, Université Marne La Vallée, France
University teaching is condemned to renew itself and to redefine its paradigms; otherwise it will ossify.
Such paradigm shifts have appeared over the past few years in many domains, such as wireless telephony, preventive medicine, ecology and globalization.
As far as teaching is concerned, we have entered the paradigm of learning.
The pressures come mainly from the world of work: among other things, the creation of new working environments, the appearance of new student profiles, an explosion of knowledge and resources, the lightning development of Information and Communication Technologies (ICT) and, above all, the arrival of new students of all ages and backgrounds, with extremely diverse motivations and skills.
Beyond teaching subjects at the highest level of knowledge, and beyond research and the production of knowledge, the university today also plays a third, economic and social role whose objective is the production of added value and which leads to applied research.
The knowledge economy is supplanting the material economy, and universities are becoming increasingly "entrepreneurial".
A profound rethinking of the university is already under way. It aims to open the university to this new role through the introduction of "active" pedagogies and of open and distance learning (e-learning, virtual and digital campuses, etc.).
With ICT we can no longer teach as before.
From Information and Communication Technologies we must now move on to "Technologies for Intelligence and Knowledge".
Pedagogical intelligence (veille pédagogique) is one of the main keys to success in supporting this change.
2.
L'analyse des mots associés pour l'information non scientifique
(Co-Word analysis for non scientific information)
Bertrand Delecroix
and Renaud Eppstein, ISIS/CESD, Université de Marne La
Vallée, France
Co-word analysis is based on a sociological theory developed by the CSI and the SERPIA (Callon, Courtial, Turner, Michelet) in the middle of the eighties. It measures the association strength between terms in documents to reveal and visualise the evolution of science through the construction of clusters and a strategic diagram. Since then, this method has been successfully applied to investigate the structure of many scientific fields. Nowadays it appears in many software systems used by companies to improve their business and define their strategy, but its relevance for this kind of application has not yet been proved.
Through the example of economic and marketing information on
DSL technologies from Reuters Business Briefing, this presentation
gives an interpretation of co-word analysis for this kind of
information. After an overview of the software we used (Sampler and LexiMine) and a survey of the experimental protocol, we investigate and explain each step of the co-word analysis process: terminological extraction, computation of clusters and the strategic diagram. In particular, we explain the meaning of every parameter of the method: the choice of variables and similarity measures is discussed. Finally we try to give a global interpretation of the method in an economic context. Further
studies will be added to this work in order to allow a generalisation
of these results.
Keywords: clustering, co-word analysis, competitive intelligence
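A minimal sketch of the association measure underlying this kind of analysis follows. The equivalence index E_ij = c_ij^2 / (c_i * c_j) is the measure commonly used in Callon-style co-word studies; the three-document corpus below is invented purely for illustration.

    # Co-occurrence counting and the equivalence index on a toy corpus.
    from collections import Counter
    from itertools import combinations

    docs = [
        {"dsl", "broadband", "tariff"},
        {"dsl", "broadband", "operator"},
        {"dsl", "tariff"},
    ]

    term_freq = Counter(t for d in docs for t in d)
    pair_freq = Counter(frozenset(p) for d in docs for p in combinations(sorted(d), 2))

    def equivalence(t1, t2):
        c_ij = pair_freq[frozenset((t1, t2))]
        return (c_ij ** 2) / (term_freq[t1] * term_freq[t2]) if c_ij else 0.0

    print(equivalence("dsl", "broadband"))     # 2**2 / (3 * 2) ~ 0.67
    print(equivalence("broadband", "tariff"))  # 1 / (2 * 2) = 0.25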
3.
Stratégies du partenariat scientifique entre les pays de l'UE et les pays en développement : indicateurs bibliométriques (Scientific partnership strategies between EU countries and developing countries: bibliometric indicators)
P.L. Rossi , IRD, Centre d'Ile de France, Bondy, France
Exploiting the Science Citation Index (SCI) bibliographic database of ISI (Institute for Scientific Information, Philadelphia) makes it possible to design bibliometric indicators that characterize the scientific partnership strategies existing between the countries of the European Union and developing countries.
The data available in the SCI database that we exploited allow multiple indicators to be established concerning the scientific output of countries, their regions and their institutions, national science policies, and the proximities and affinities among different actors. They are grouped into indicators of scientific output, indicators of specialization and relational indicators.
This study was carried out on bibliographic data for the period 1987-2001 and covers the countries of the European Union as well as the countries of Africa, Latin America and Asia.
With regard to the scientific partnership strategies between the 15 countries of the European Union and the countries of the African continent, three broad categories of European partners can be defined:
- European Union countries whose partnership profile is "close" to that of the main African scientific producers: Austria, Germany, Spain, Finland and Italy;
- European Union countries whose partnership profile shows a strong "commitment" to sub-Saharan Africa: Denmark, the Netherlands and Sweden;
- European Union countries whose partnership profile is oriented towards countries with which ties of colonial history and language exist: Belgium, France and Great Britain.
For the last two of these countries, the difference in their impact on the national output of their main partners should be noted: very strong for France, more moderate for Great Britain.
4.
La fusion analytique Data/Texte : nouvel enjeu de l'analyse
avancée de l'information (The merging of structured
and unstructured data : the new challenge of advanced information
analytics)
J.F. Marcotorchino, Kalima Group
Track III-C-4:
Attaining Data Interoperability
Chair: Richard Chinman, University Corporation for Atmospheric
Research, Boulder, CO, USA
Interoperability
can be characterized as the ability of two or more
autonomous, heterogeneous, distributed digital entities
(e.g., systems, applications, procedures, directories,
inventories, data sets, ...) to communicate and cooperate
among themselves despite differences in language,
context, or content. These entities should be able
to interact with one another in meaningful ways without
special effort by the user - the data producer or
consumer - be it human or machine.
By becoming
interoperable the scientific and technical data communities
gain the ability to better utilize their own data
internally and become more visible, accessible, usable,
and responsive to their increasingly sophisticated
user community. When "the network is the computer",
interoperability is critical to fully take advantage
of data collections and repositories.
Two sets
of issues affect the extent to which digital entities
efficiently and conveniently interoperate: syntactic
and semantic interoperability.
Syntactic
interoperability involves the use of communication,
transport, storage and representation standards. For
digital entities to interoperate syntactically, information
(metadata) about the data types and structures at
the computer level, the syntax of the data, is exchanged.
However, if the entities are to do something meaningful
with the data, syntactic interoperability is not enough,
semantic interoperability is also required.
Semantic
interoperability requires that a consistent interpretation of term usage and meaning occurs. For digital entities
to interoperate semantically, consistent information
(metadata) about the content of the data - what the
basic variable names are and mean, what their units
are, what their ranges are - is exchanged. This information
can be referred to as the semantic search metadata,
since it can be used to search for (and locate) data
of interest to the user. However, this is not the
metadata that is required for semantic interoperability
at the data level, although it does contain some of
the same elements. The semantic information required
for machine-to-machine interoperability at the data
level is the information required to make use of the
data. For example, the variable T is sea surface temperature,
the data values correspond to °C divided by 0.125,
missing values are represented by -999, ... This information
can be referred to as the semantic use metadata. Without
this information, the digital entity, be it machine
or application, cannot properly label the axes of
plots of the data or merge them with data from other
sources without intervention from a knowledgeable
human.
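A minimal sketch of why this use metadata matters follows; the dictionary keys are illustrative, but the variable name, units, scale factor and missing-value convention are the ones quoted above.

    # Decoding raw stored values with the semantic use metadata quoted above.
    use_metadata = {
        "variable": "T",
        "long_name": "sea surface temperature",
        "units": "degree_Celsius",
        "scale_factor": 0.125,      # stored value * 0.125 gives degrees Celsius
        "missing_value": -999,
    }

    def decode(raw_values, meta):
        """Apply scale factor and missing-value convention before plotting or merging."""
        return [None if v == meta["missing_value"] else v * meta["scale_factor"]
                for v in raw_values]

    print(decode([120, -999, 168], use_metadata))   # [15.0, None, 21.0]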
There
are other sets of issues that affect interoperability:
- Political/Human Interoperability
- Inter-disciplinary Interoperability
- International Interoperability
This session
is about all sets of interoperability issues, but
especially focused on attaining semantic interoperability
at the data level.
1.
Interoperability in a Distributed, Heterogeneous Data Environment:
The OPeNDAP Example
Peter Cornillon, Graduate School of Oceanography, University
of Rhode Island, USA
Data system interoperability
in a distributed, heterogeneous environment requires a consistent
description of both the syntax and semantics of the accessible
datasets. The syntax describes elements of the dataset related
to its structure or organization, the contained data types and
operations that are permitted on the data by data system elements.
The semantics give meaning to the data values in the dataset.
The syntactic and semantic descriptions of a dataset form a subset of the metadata used to describe it; other metadata often associated
with a dataset are fields that describe how the data were collected,
calibrated, who collected them, etc. Although important, indeed
often essential, to meaningfully interpret the data, these additional
fields are not required for machine-to-machine interoperability
(Level 3 Interoperability) in a data system. We refer to semantic
metadata required to locate a data source of interest as semantic
search metadata and semantic metadata required to use the data,
for example to label the axes of a plot of the data or to exclude
missing values from subsequent analysis, as semantic use metadata.
In this presentation,
we summarize the basic metadata objects required to achieve
Level 3 Interoperability in the context of an infrastructure
that has been developed by the Open source Project for a Network
Data Access Protocol (OPeNDAP) and how this infrastructure is
being used by the oceanographic community in the community-based
National Virtual Ocean Data System (NVODS). At present, in excess
of 400 data sets are being served from approximately 40 sites
in the US, Great Britain, France, Korea and Australia. These
data are stored in a variety of formats ranging from user developed
flat files to SQL RDBMS to sophisticated formats with well defined
APIs such as netCDF and HDF. A number of application packages
(Matlab, IDL, VisAD, ODV, Ferret, ncBrowse and GrADS) have also
been OPeNDAP-enabled allowing users of these packages to access
subsets of data sets of interest directly over the network.
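For readers unfamiliar with OPeNDAP access, a minimal client-side sketch using the pydap library is given below; the server URL and variable name are placeholders rather than an actual NVODS endpoint.

    # Subsetting a remote dataset over the network with an OPeNDAP client.
    from pydap.client import open_url

    dataset = open_url("http://example.org/opendap/sst.nc")   # placeholder URL
    sst = dataset["sst"]                                      # placeholder variable name
    subset = sst[0, 100:110, 200:210]   # only the requested slice crosses the network
    print(subset)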
2.
Interoperable data delivery in solar-terrestrial applications:
adopting and evolving OPeNDAP
Peter Fox, Jose Garcia, Patrick West, National Center for
Atmospheric Research, USA
The High Altitude
Observatory (HAO) division of NCAR investigates the sun and
the earth's space environment, focusing on the physical processes
that govern the sun, the interplanetary environment, and the
earth's upper atmosphere.
We present details on how interoperability within a set of data systems supported by HAO and collaborators has driven the implementation of services around the Data Access Protocol (DAP) originating in the Distributed Oceanographic Data System (DODS) project. The outgrowth of this is OPeNDAP, an open source project to provide reference implementations of the DAP and its core services.
We will present the recent design and development details of
the services built around the DAP, including interfaces to common
application programs, like the Interactive Data Language, the
web, and server side data format translation and related services.
We also present examples of this interoperability in a number
of science discipline and technology areas: the Coupling, Energetics
and Dynamics of Atmospheric Regions (CEDAR) program, the Radiative
Inputs from Sun to Earth (RISE) program, the Earth System Grid
II project, and the Space Physics and Aeronomy Collaboratory.
3.
The Earth System Grid: Turning Climate Datasets Into Community
Resources
Ethan Alpert, NCAR, Boulder, CO, USA
David Bernholdt, Oak Ridge National Laboratory, Oak Ridge, TN,
USA
David Brown, NCAR, Boulder, CO, USA
Kasidit Chancio, Oak Ridge National Laboratory, Oak Ridge, TN,
USA
Ann Chervenak, USC/ISI, Marina del Rey, CA, USA
Luca Cinquini, NCAR, Boulder, CO, USA
Bob Drach, Lawrence Livermore National Laboratory, Livermore,
CA, USA
Ian Foster, Argonne National Laboratory, Argonne, IL, USA
Peter Fox, NCAR, Boulder, CO, USA
Jose Garcia, NCAR, Boulder, CO, USA
Carl Kesselman, USC/ISI, Marina del Rey, CA, USA
Veronika Nefedova, Argonne National Laboratory, Argonne, IL,
USA
Don Middleton, NCAR, Boulder, CO, USA
Line Pouchard, Oak Ridge National Laboratory, Oak Ridge, TN,
USA
Arie Shoshani, Lawrence Berkeley National Laboratory, Berkeley,
CA, USA
Alex Sim, Lawrence Berkeley National Laboratory, Berkeley, CA,
USA
Gary Strand, NCAR, Boulder, CO, USA
Dean Williams, Lawrence Livermore National Laboratory, Livermore,
CA, USA
Global coupled Earth System models are vital tools for understanding
potential future changes in our climate. As we move towards
mid-decade, we will see new model realizations with higher grid
resolution and the integration of many additional complex processes.
The U.S. Department of Energy (DOE) is supporting an advanced
climate simulation program that is aimed at accelerating the
execution of climate models one hundred-fold by 2005 relative
to the execution rate of today. This program, and other similar
modeling and observational programs, are producing terabytes
of data today and will produce petabytes in the future. This
tremendous volume of data has the potential to revolutionize
our understanding of our global Earth System. In order for this
potential to be realized, geographically distributed teams of
researchers must be able to manage and effectively and rapidly
develop new knowledge from these massive, distributed data holdings
and share the results with a broad community of other researchers,
assessment groups, policy makers, and educators.
The Earth System Grid II (ESG-II), sponsored by the U.S. Dept.
of Energy's Scientific Discovery Through Advanced Computing
(SciDAC) program, is aimed at addressing this important challenge.
The broad goal is to develop next generation tools that harness
the combined potential of massive distributed data resources,
remote computation, and high-bandwidth wide-area networks as
an integrated resource for the research scientist. This integrative
project spans a variety of technologies including Grid and DataGrid
technology, the Globus Toolkit™, security infrastructure,
OPeNDAP, metadata services, and climate analysis environments.
In this presentation we will discuss goals, technical challenges,
and emerging relationships with other related projects worldwide.
4.
Geoscientific Data Amalgamation: A Computer Science Approach
N. L. Mohan, Osmania University, India
The rapidly changing environment of data, information and knowledge, and of their communication and exchange, has created multidimensional opportunities for researchers: it can (i) avoid duplication of work and increase (ii) the competitive spirit, (iii) the quality of research and (iv) interdisciplinary research. In this context the amalgamation of geo-scientific data assumes greater importance. The amalgamation and availability of geo-scientific data in numerical form are possible on the fundamental premise that the data are in the public domain and that their quality is assured.
Data pertaining to the earth-related sciences in general, and geophysical data in particular, can be classified into three categories: (a) raw data, (b) processed or filtered forms of raw data and (c) theoretically computed forms. Further, data are of two types, static and dynamic. Data from exploration geophysical methods such as gravity, magnetic, electrical, seismic and well-logging surveys are of the static type: once acquired, they may remain static. On the contrary, data from earth tides, geomagnetic field records, earthquake seismograms, satellite observations, etc. are of the dynamic type. One more dimension of data availability is model construction based on expert system shells, an artificial intelligence approach. The final form of data availability is through two formats: graphical and image forms.
The question is not simply one of making numerical data available; how to organize, manage and update them are the challenging aspects in the earth-related sciences. Unlike other fields of science and engineering, geo-scientific data management needs an altogether special approach. The author believes that the geo-scientific community may need to understand certain important areas of computer science so that it can guide computer specialists appropriately to manage data efficiently and communicate with the earth science community effectively, in 2-D and 3-D numerical form, including graphical and imagery forms.
Data Base System is a computerized record-keeping system. Several
operations like Adding data to new and empty files, Inserting
data into existing files, Retrieving data from existing files,
Changing data in existing files, Deleting data from existing
files and Removing existing files are involved in record keeping
system. Further, database architecture comprises three levels: (i) the internal level, a centralized storage system that would help to store all types of geo-scientific data in one place; (ii) the conceptual level, a specific type of data storage system that would help specialists in a given area, such as earthquake data, to track the data; and (iii) the external level, user node connectivity, through which many users can simultaneously track different types of geo-scientific data according to their own interests. Also, from another angle, an abstract view of the data can be broadly segmented into three levels: (a) the physical level gives an idea of how the data are actually stored; (b) the logical level indicates what data are stored in the database and what relationships exist among those data; and (c) the view level represents only a part of the entire database.
One of the most important database approaches, particularly from the point of view of geo-scientific data organization, is the object-oriented data system, which would play a predominant role in organizing several different sets of data that can refer to each other for better understanding, modeling, refining of models and making meaningful and correct inferences. Object-oriented data arrangement is based on concepts such as (a) inheritance, (b) polymorphism and (c) multiple inheritance. These concepts, respectively, allow an entity to inherit certain properties from a parent entity apart from its own; a function with the same name to perform different tasks on different sets of data; and a class to inherit different properties from several classes.
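The following generic sketch (not taken from the paper) illustrates the inheritance and polymorphism concepts just listed, using invented geo-data classes:

    # Inheritance and polymorphism with toy geo-scientific data classes.
    class GeoDataset:
        def __init__(self, station, values):
            self.station = station          # inherited by every specialised data set
            self.values = values

        def summary(self):
            return f"{self.station}: {len(self.values)} samples"

    class GravitySurvey(GeoDataset):        # static data, acquired once
        def summary(self):                  # polymorphism: same name, different behaviour
            return f"gravity at {self.station}, mean {sum(self.values)/len(self.values):.2f} mGal"

    class Seismogram(GeoDataset):           # dynamic data, continuously recorded
        def summary(self):
            return f"seismogram at {self.station}, peak {max(self.values):.2f}"

    for d in (GravitySurvey("A1", [9.1, 9.3, 9.2]), Seismogram("A1", [0.1, 2.7, 0.4])):
        print(d.summary())                  # each class answers in its own way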
Parallel and distributed data bases do play important but limited
roles in certain contexts of geo-scientific data amalgamation
systems. That is, these database approaches may help within a geo-scientific organization where a degree of confidentiality is required and the data are not to be released into the public domain for some time.
Artificial Intelligence is the most promising area for the geo-scientific community to look into for data organization, modeling, searching, multi-dimensional construction and viewing of graphical and image models, and data management. In view of data communication through high-speed, wide-band and wireless Internet systems, knowledge bases and expert system shells dominate the scene. Artificial Intelligence encompasses several areas, such as
data bases, object oriented representation, search and control
strategies, matching techniques, knowledge organization and
management, pattern recognition, visual image understanding,
expert system architecture, machine learning, several types
of learning techniques that include neural networks, problem
solving methods, robotics, semantic nets, frames, cognitive
modeling, data compression techniques etc.
Another important area is algorithmic design and analysis which
is vital for geo-scientific data organization and management.
It is very important that algorithms be designed around how the geo-scientific data need to be arranged, so that search, retrieval, modification and insertion can be made user friendly.
Finally, geo-scientific data amalgamation will be successful only if proper indexing, security, integrity and standardization or benchmarking are taken care of.
5.
The US National Virtual Observatory: Developing Information
Infrastructure in Astronomy
D. De Young, National Optical Astronomy Observatory, USA
A. Szalay, Johns Hopkins Univ, USA
R. Hanisch, Space Telescope Science Inst, USA
G. Helou, Cal Tech/IPAC, USA
R. Moore, San Diego Supercomputer Center, USA
E. Schreier, Space Telescope Science Inst, USA
R. Williams, Cal Tech/CACR, USA
The Virtual Observatory (VO) concept is rapidly becoming mature
as a result of intensive activity in several countries. The
VO will provide interoperability among very large and widely
dispersed datasets, using current developments in computational
and grid technology. As a result, the VO will open new frontiers
of scientific investigation and public education through world-wide
access to astronomical data sets and analysis tools. This paper
will provide an overview of present VO activities in the US,
together with a brief description of the future implementations
of USNVO capabilities.
Track III-C-6:
Data Centers
Chair: David Clark, NOAA National Geophysical Data Center,
USA
1.
CODATA in Africa "The Nigerian Data Program"
Kingsley Oise Momodu, Chairman CODATA Nigeria, Faculty of Dentistry,
University of Benin, Nigeria
Approval was granted
for Nigeria's membership in CODATA International in May 1998,
in response to an application by the Federal Ministry of Science
and Technology. The Nigerian CODATA committee has as its mandate the task of providing liaison between the scientific community in Nigeria and the international scientific community. The Nigerian CODATA committee participated in the fourth International Ministerial meeting of the United Nations Economic Commission for Africa (ECA) on development information on Thursday 23rd November 2000 at the National Planning Commission conference room in Abuja,
Nigeria. At the meeting, CODATA made the following observations:
-
That new research projects tend to get much more attention
than already completed ones.
-
The continued processing of data from old projects through
secondary analysis is often neglected.
-
A lack of directories that describe what data sets exist, where they are located and how users can access them, which leads to unnecessary duplication of effort.
-
Lack of a viable network among scientists.
-
That the existence of data is unknown outside the original scientific group or agencies that generated them, and even if known, information is not provided for a potential user to assess their relevance.
-
That scientists in Africa are fundamentally poorly paid.
The Nigerian CODATA
is making spirited efforts to respond adequately to the necessities
for preserving data by establishing a data program which represents
a strategy for the compilation and dissemination of scientific
data. This program will help the local Scientific community
in Nigeria take advantage of the opportunities and expertise
offered by CODATA International which has made a commitment
to assist Scientific research institutes and the local Scientific
community in the area of database development.
This initiative is also designed to complement the activities
of the task group on reliable scientific data sources in Africa.
The scientific activity for year 2002 is a survey and cataloguing
of potential data sources, which will be web-based.
2. The 'Centre de Données
de la Physique des Plasmas' (CDPP, Plasma Physics Data Centre),
a new generation of Data Centre
M. Nonon-Latapie, C.C. Harvey, Centre National d'Etudes Spatiales,
France
The CDPP results
from a joint initiative of the CNRS (Centre National de la Recherche
Scientifique) and the CNES (Centre National d'Etudes Spatiales).
Its principal objectives are to ensure the long term preservation
of data relevant to the physics of naturally occurring plasmas,
to render this data easily accessible, and to encourage its
analysis. The data is produced by instruments, in space or on
the ground, which study the ionised regions of space near the
Earth and elsewhere in the solar system.
The principal users of this data centre are space scientists,
wherever they are located. This data centre is located in Toulouse
(France), and it uses a computer system which is accessible
via the Internet (http://cdpp.cesr.fr/english/index.html). This
system offers several services: firstly the possibility to search for and retrieve scientific data, but also access to the "metadata" archived in association with the data, such as the relevant documentation and quicklook data (graphical representations). Several tools are available to help the user
to search for data. The CDPP has been accessible since October
1999. Since then its data holding and the services offered have
been steadily augmented and developed.
After a brief presentation of the objectives, the organisation,
and the services currently offered by the CDPP, this paper will
concentrate on :
- the system architecture (based on the ISO "Reference Model for an Open Archival Information System")
- the standards used to format and describe the archived data
- the crucial operation of ingesting new data; this function is based on the description of all delivered data entities via a dictionary.
Operational experience
and new technical developments now being studied will also be
presented.
3. A Space Physics Archive Search Engine
(SPASE) for Data Finding, Comparison, and Retrieval
James R. Thieman, National Space Science Data Center, NASA/GSFC,
USA
Stephen Hughes and Daniel Crichton, NASA Jet Propulsion Laboratory,
USA
The diversity and volume of space physics data available electronically has
become so great that it is presently impossible to keep track
of what information exists from a particular time or region
of space. With current technology (especially the World Wide
Web - WWW) it is possible to provide an easy way to determine
the existence and location of data of interest via queries to
network services with a relatively simple user interface. An
international group of space physics data centers is developing
such an interface system, called the Space Physics Archive Search
Engine (SPASE). Space physicists have a wealth of network-based
research tools available to them, including mission- and facility-based
data archive and catalogue services (with great depth of information
for some projects). Many comprehensive lists of URLs have
been put together to provide a minimal search capability for
data. One recent effort to gather a list of data sources resulted
in an assembly of nearly 100 URLs and many important archives
had still been missed. These lists are difficult to maintain
and change constantly. However, even with these lists it is
not possible to ask a simple question such as "where can I find observations in the polar cusp in 1993?" without doing extensive, manual searches on separate data services.
The only hope for a comprehensive, automated search service
is to have data centers/archives make their own information
available to other data centers and to users in a manner that
will facilitate multiarchive searching. Nearly all space physics
data providers have WWW services that allow at least a basic
search capability, and many also provide more specialized interfaces
that support complex queries and/or complex data structures,
but each of these services is different. The SPASE effort is
creating a simple, XML-based common search capability and a
common data dictionary that would allow users to search all
participating archives with topics and time frames such as "polar cusp" and the year 1993. The result would be
a list of archives with relevant data. More advanced services
at later stages of the project would allow intercomparison of
search results to find, for example, overlapping data intervals.
Retrieval of the relevant data sets or parts of the data sets
would also be supported. The first stages of the project are
based on the application of Object Oriented Data Technology
(OODT - see http://oodt.jpl.nasa.gov/about.html) to the cross
archive search capability. The initial effort also includes
the derivation of a common data dictionary for facilitating
the searches. The current state of these efforts and plans for
the future will be reviewed.
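As a purely illustrative sketch of what such an XML-based cross-archive query could look like, consider the snippet below; the element names are invented, and the actual SPASE query schema and common data dictionary are defined by the project itself.

    # Building a toy cross-archive query document with invented element names.
    import xml.etree.ElementTree as ET

    query = ET.Element("ArchiveQuery")
    ET.SubElement(query, "Region").text = "polar cusp"     # term from a common data dictionary
    window = ET.SubElement(query, "TimeWindow")
    ET.SubElement(window, "Start").text = "1993-01-01"
    ET.SubElement(window, "End").text = "1993-12-31"

    print(ET.tostring(query, encoding="unicode"))
    # Each participating archive would parse such a query and return a list of
    # matching data holdings for later intercomparison.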
Track III-D-5:
Information Management Systems
Chair: Glen Newton, CISTI, National Research Council of
Canada, Ontario, Canada
The efficient
and effective collection and management of data, information
and knowledge is becoming more difficult, due to the
volume and complexity of this information. Greater
demands on system architectures, system design, networks
and protocols are the catalyst for innovative solutions
in management of information.
Applications and systems being researched and developed
capture dimensions of the various issues presented
to the community, and represent aspects of new paradigms
for future solutions.
Some of the areas to be examined include:
- Systems for Coupling and Integrating Heterogeneous Data Sources
- Intelligent Agents, Multi-Agent Systems, Agent-Oriented Programming
- Interactive and Multimedia Web Applications
- Internet and Collaborative Computing
- Multimedia Database Applications
1.
XML-Based Factual Databases: A Case Study of Insect and Terrestrial
Arthropod Animals
Taehee Kim Ph.D., School of Multimedia Engineering, Youngsan
University, South Korea
Kang-Hyuk Lee, Ph.D., Department of Multimedia Engineering,
Tongmyung University of Information Technology, South Korea
XML (eXtensible Markup Language) serves as the de facto standard
for document exchange in many data applications and information
technologies. Its application areas span from e-commerce to mobile
communication. An XML document describes not only the data structure,
but also the document semantics. Thus, a domain specific, and
yet self-contained document could successfully be built by using
XML. Building and servicing factual databases in the XML format
could provide such benefits as easy data exchange, economic
data abstraction, and a thin interface to other XML applications such as e-commerce.
This paper reports an implementation of factual databases and their service in terms of XML technologies. The database of
insect and terrestrial arthropod animals has been constructed
as an example. The insect database contains the intrinsic information
on characteristics of Korean insects while the terrestrial arthropod
animal database contains the related bibliographical information.
Data types and document structures were implemented. Document type definitions (DTDs) were then implemented for data validation. Microsoft SQL Server together with Active Server Pages was used as our implementation framework. A web database service has also been built.
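A minimal sketch of the validation step described above follows; the record, element names and DTD are invented for illustration and do not reflect the actual database schema.

    # Validating a toy insect record against a DTD with lxml.
    from io import StringIO
    from lxml import etree

    dtd = etree.DTD(StringIO("""
    <!ELEMENT insect (name, habitat)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT habitat (#PCDATA)>
    """))

    record = etree.fromstring("<insect><name>Example beetle</name>"
                              "<habitat>wetlands</habitat></insect>")

    print(dtd.validate(record))             # True if the record conforms to the DTD
    print(dtd.error_log.filter_from_errors())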
Based on the implementation, this paper then discusses issues
related to XML-based factual databases. We emphasize that document
design ought to be carried out in order to achieve maximal compatibility
and scalability. We then point out that an information service
system could better be built by exploiting the self-describing
characteristics of XML documents.
2.
A comprehensive and efficient "OAIS compliant" data
center based on standardized XML technologies
Thierry Levoir and Marco Freschi, Centre National d'Etudes
Spatiales, France
The OAIS (Open Archival Information System) Reference Model
provides a framework to create an archive (consisting of an
organization of people and systems, that has accepted the responsibility
to preserve information and make it available for a Designated
Community). It also offers real help in designing ingest, data management, administration and data access systems. XML stands
for eXtensible Markup Language. XML started as a way to mark up content, but it soon became clear that XML also provided a way to describe structured and semi-structured data, making it usable as a data storage and interchange format. Many related languages, formats and technologies (SOAP, XML Query, XML-RPC, WSDL, Schema, ...) keep appearing to provide solutions to almost all problems!
With such technologies, we can define many different architectures.
Due to the vastness of the problem, it is quite difficult to
describe all the possible solutions, so the article is intended to describe a possible architecture of a system where the organization of data and their usage is defined in accordance with the OAIS reference model. The article takes its cue from the need to update an existing data center, providing features such as platform independence, a human-readable data format and easy extensibility for new types of data. All these advantages seem to be supplied
by XML and Java. XML and Java together can certainly be used
to create some very interesting applications from application
servers to better searchable web sites. They also offer an easy and efficient way to be interoperable.
However, it is sometimes very difficult to understand where everything really fits. The article attempts to clarify the role of each single object inside a data center, providing as a result a complete description of a system, including its
architecture. A section of the article is also dedicated to the problem of making data persistent in the database; the choice of this storage support often implies the choice of a query language to retrieve data from the database and a strategy for storing them.
3. Informatics Based Design Of Materials
Krishna Rajan, Rensselaer Polytechnic Institute, USA
In this presentation
we demonstrate the use of a variety of data mining tools for
both classification and prediction of materials properties.
Specific applications of a variety of multivariate analysis
techniques are discussed. The use of such tools has to be coupled
to a fundamental understanding of the physics and chemistry
of the materials science issues. In this talk we demonstrate
the use of informatics strategies with examples including the
design of new semiconductor alloys and how we can extend the
concept of bandgap engineering to the development of "virtual"
materials. The combination of these approaches, when integrated with the correct types of descriptors, makes informatics a powerful computational methodology for materials design.
4. XML-based Metadata Management for
INPA's Biological Data
J. L. Campos dos Santos, International Institute for
Geo-Information Science and Earth Observation - ITC, The Netherlands
and The National Institute for Amazon Research - INPA, Brazil
R. A. de By, International Institute for Geo-Information Science
and Earth Observation - ITC, The Netherlands
For more than a century, Amazonian biological data have been
collected, primarily by individual researchers or small groups working in small areas over relatively short periods of time. Questions such as "how do ecological patterns and processes vary in time and space, and what are the causes and consequences of this variability" remain open. For such questions
to be properly answered, far more documented data are required
than could feasibly be collected, managed, and analysed in a
single organisation. Since biological data sets are neither perfect nor intuitive, they tend to be shared only within a close circle around the data producers, who know the subject and need little additional information to use and interpret the data sets. Research
teams outside of the specific subject area need highly detailed
documentation to accurately interpret and analyse historic or
long-term data sets, as well as data from complex experiments.
Usually, researchers refer to their data as raw data, which
are structured in rows and columns of numeric or encoded sampling
observations. The usefulness of such data can only be assessed
when they are associated to either a theoretical or conceptual
model. This requires understanding of the type of variable,
the units adopted, potential biases in the measurement, sampling
methodology and a series of facts that are not represented in
the raw data, but rather in the metadata. Data and metadata combined within a conceptual framework produce the needed information. Additionally, information can be lost through degradation
of the raw data or lack of metadata. The loss of metadata can
occur throughout the period of data collection and the rate
of loss can increase after the results of the research have
been published or the experiment ends. Specific details are
most likely to be lost due to the abandonment of data forms
and field notes. Metadata will ensure to data users the ability
to locate and understand data through time.
This paper presents an XML-based solution for the management
of metadata biological profiles via the Web. We have adopted
the FGDC Metadata Standard, which incorporates the Biological
Data Profile, and is represented as an XML schema. The schema
is mapped to a well-formed biological metadata template. The
XML metadata template can be deployed to users together with
an XML Editor. The editor uploads the XML file and allows them
to insert all the metadata information. After this process,
the biological metadata can be submitted for certification and
stored in an XML repository. The repository accepts a large
number of well-formed XML metadata and maintains a single data
representation of all the files it receives. The metadata can be retrieved, updated, or removed from the repository; once indexed, they can be searched and queried. This solution
is being tested at the National Institute for Amazon Research (INPA)
within the Biological Collection Program.
5. Incorporation of Meta-Data in
Content Management & e-Learning
Horst Bögel, Robert Spiske and Thurid Moenke, Department
of Chemistry of the Martin-Luther-University Halle-Wittenberg,
Germany
In nearly all scientific
disciplines experiments, observations or computer simulations
produce more and more data. Those data have to be stored, archived
and retrieved for inspection or later re-use. In this context
the inclusion of 'meta-data' becomes important for gaining access to particular data and pieces of information.
Three recent developments have to be taken into account:
- Content Management Systems (CMS) are used for teamwork and 'timeline' co-operation
- the eXtensible Markup Language (XML) is used to separate content from layout for presentation (DTD, XSL, XPath, XSLT)
- online learning using Web technology is developing into a powerful multimedia-based education system
All these key processes are based on huge amounts of data and the relations between them (which can be called information). If we want to keep pace with these necessities, we have to develop and use these new techniques to make progress.
We report on some tools (written in Java) for handling data and on the development of a unique five-year WWW-based learning project (funded by the BMBF, the German Federal Ministry for Education and Research)
in chemistry, to incorporate data (3D-structures, spectra, properties)
and their visualizations in order to favour a research-oriented
way of learning. The students have access to computational methods
on the network, to carry out open-ended calculations using different
methods (e.g. semi-empirical and ab initio MO calculations to
generate the electronic structure and the orbitals of molecules).
Track IV-A-3:
Spatial Data Issues
Chair: Harlan Onsrud, University of Maine, USA
1.
Spatio-Temporal Database Support for Long-Range Scientific Data
Martin Breunig, Institute of Environmental Sciences, University
of Vechta, Germany
Serge Shumilov, Institute of Computer Science III, University
of Bonn, Germany
Database support for spatio-temporal applications is not yet part of standard DBMSs. However, applications like
telematics, navigation systems, medicine, geology, and others
require database queries referring to the location of moving
objects. In real time navigation systems, the position of large
sets of moving cars has to be determined within seconds. In
patient-based medical computer systems, to take another example,
the relevant progress of diseases has to be examined over days, weeks or even years. Finally, geological processes like the backward restoration of basins involve time intervals of several thousand years between every documented snapshot of the database.
We restrict ourselves to the requirements of long-range applications
like the simulation of geological processes. An example of a spatio-temporal database service in geology is given. The
two components of this service provide version management and
the temporal integrity checking of geo-objects, respectively.
In the given example, the location change of a 3D moving object
O(t) between two time steps ti and ti+1 may have three reasons:
the location change of the partial or complete geometry of O(t)
at time ti caused by geometric mappings like translation, rotation
etc., the change of the scale of O(t) at time ti caused by
zooming (change of the size of the object) or the change of
the shape of O(t) at time ti caused by mappings of one or more
single points of its geometry.
These three reasons can also occur in combination with each
other. Furthermore, 3D moving objects may be decomposed into several components that later merge again into a single object. The relevant database operations needed to map the objects into the database (compose and merge) provide a mapping between the IDs of all objects in two directly consecutive time steps.
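The following sketch (plain NumPy, not GeoToolKit code) illustrates the three kinds of change listed above for the geometry of a 3D moving object between two time steps:

    # Translation, scaling and point-wise deformation of a 3D geometry.
    import numpy as np

    def translate(points, offset):
        return points + offset                       # rigid-body location change

    def scale(points, factor):
        centre = points.mean(axis=0)
        return centre + factor * (points - centre)   # change of object size

    def deform(points, index, new_position):
        moved = points.copy()
        moved[index] = new_position                  # change of shape at single points
        return moved

    geometry_t0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    geometry_t1 = deform(scale(translate(geometry_t0, [5.0, 0.0, 0.0]), 2.0), 0, [4.0, -1.0, 0.0])
    print(geometry_t1)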
We show that the simulation of long-range processes can be effectively
supported by set-oriented spatio-temporal database operations.
Among other aspects, this leads to a better understanding of
the history of geological rock formations. The specifications
of the operations are given in C++ program code. We describe
how the proposed spatio-temporal operations are integrated into
GeoToolKit, an object-oriented database kernel system developed
for the support of 3D/4D applications. In our future work we
intend to evaluate the presented spatio-temporal database operations
in benchmarks with large data sets within the open GeoToolKit
system architecture as part of a public database service for
spatio-temporal applications.
2.
Web Visualization for Spatio-Temporal Referenced Multimedia
Data
Paule-Annick Davoine and Hervé Martin, Laboratoire LSR-IMAG,
Equipe SIGMA, France
Web and multimedia technologies enhance the possibilities for developing software for managing, displaying and distributing spatially-referenced information. A great deal of work deals with Internet-based cartographic
integrate both a temporal and a multimedia dimension. This kind
of information appears more complex than usual spatial information
linked with statistical data such as economical, demographic
or ecological data. The main problem for Geographical Information
Systems (GIS) is to merge qualitative and multimedia information
with information related to time and space. Moreover, spatial
and temporal references may be heterogeneous and discontinuous.
In fact, current GIS are not suited to using and visualizing this kind of information. In this paper, we show how multimedia
web information systems may be used to model and to navigate
across spatio-temporal referenced multimedia data. This work
has been realized in the framework of a European project, named SPHERE, on historical natural hazards.
We first explain how the Unified Modelling Language (UML) makes it possible to take various user requirements into account.
We focus on two important features of GIS: how to specify system
functionalities and how to capture spatio-temporal referenced
multimedia data. To illustrate the former point, we present
the main functionalities of the web visualisation interface
that we have implemented during the SPHERE project. We also
explain the technological choices used to develop this tool.
We have developed Java tool based on a client-server architecture
including an Apache Web server and a client browser for running
Java applets. This software allows to navigate across information
space according to spatial and temporal features and to visualize
simultaneously and interactively cartographic, temporal and
documentary aspects of information.
Keywords: Web and Database, Geographical Information System
(GIS), information system, multimedia, spatio-temporal referenced
information, UML modelling, visualisation interface.
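To make the notion of spatio-temporal referenced multimedia data concrete, the following small C++ sketch (an illustration under our own assumptions, not the SPHERE data model or implementation, which is Java-based) shows documents carrying both a spatial and a temporal reference, together with the kind of combined spatial/temporal selection the visualisation interface supports; the sample documents are invented:

#include <iostream>
#include <string>
#include <vector>

struct SpatialRef  { double lat, lon; };        // point reference, possibly imprecise
struct TemporalRef { int yearFrom, yearTo; };   // historical events span intervals

struct MediaDocument {
    std::string uri;   // image, text, audio or video resource
    SpatialRef  where;
    TemporalRef when;
};

// Select documents whose reference falls inside a bounding box and whose
// time interval overlaps the requested interval.
std::vector<MediaDocument> select(const std::vector<MediaDocument>& docs,
                                  double minLat, double maxLat,
                                  double minLon, double maxLon,
                                  int yearFrom, int yearTo) {
    std::vector<MediaDocument> hits;
    for (const auto& d : docs) {
        bool inBox = d.where.lat >= minLat && d.where.lat <= maxLat &&
                     d.where.lon >= minLon && d.where.lon <= maxLon;
        bool overlaps = d.when.yearFrom <= yearTo && d.when.yearTo >= yearFrom;
        if (inBox && overlaps) hits.push_back(d);
    }
    return hits;
}

int main() {
    std::vector<MediaDocument> docs{
        {"flood_1856.jpg",        {45.2, 5.7}, {1856, 1856}},
        {"avalanche_report.txt",  {45.9, 6.9}, {1901, 1902}},
    };
    for (const auto& d : select(docs, 45.0, 46.0, 5.0, 7.0, 1850, 1900))
        std::cout << d.uri << '\n';
}

In the tool described above, a selection of this kind is driven interactively from the cartographic and temporal views rather than from hard-coded parameters.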
Track IV-A-5:
Information Infrastructure for Science and Technology
Horst Kremers, Eng., Comp. Sci., Berlin, Germany
The manageability of complex information systems for multidisciplinary cooperation, and their use in decision support, depends on basic methods and techniques that cover application layers such as:
- the role of basic sets of information in global and national information infrastructures
- access, compatibility, and interoperability
- documentation of information models
- validation procedures and quality control
- financial and legal aspects (including copyright)
- enabling cooperation on information
This session offers opportunities to present best practices in information infrastructure, as well as to discuss the methodological background and potential ways to support the creation of national and global interdisciplinary information infrastructures. The session, of course, has cross-links to other sessions of the CODATA conference. The topics are to be discussed in terms of their strategic importance for enabling freedom of information, as well as reliable communication and cooperation in the information society.
|
1.
Exchange Of Heterogeneous Information Concepts And Systems
Hélène Bestougeff, CODATA - France
Jacques-Emile Dubois, ITODYS, Université de Paris VII
- France and Past-President, CODATA
Today, especially with the development of networking and the internet, the exchange of heterogeneous information leading towards better interdisciplinary co-operation is a vital issue. In this framework, several technical and organizational problems must be solved. Integration deals with developing architectures and frameworks as well as techniques for integrating schemas and data. Different approaches to interrelating the source systems and users' queries are possible, depending on the degree of coupling between the original data sources and the resulting system. However, integration is just a first, essential step towards more sophisticated architectures developed to support management decisions. These architectures, grouped under the term Data Warehouses, are subject-oriented and involve the integration of current and historical data.
The third aspect of heterogeneous information exchange is knowledge management, information mining systems, and web information management. The web is now part of almost all organizations. Therefore, the databases and warehouses of specific networks, endowed with adequate metadata, have to operate on the web. Moreover, the management of unstructured and multimedia data such as text, images, audio and video presents new and original challenges.
This paper will present, in a systematic way, concepts and systems dealing with these problems, drawing on particular results and examples from the recently published Kluwer book "Heterogeneous Information Exchange and Organizational Hubs" (H. Bestougeff, J.E. Dubois, B. Thuraisingham, editors), which contains 15 original chapters covering:
I Heterogeneous Database Integration: Concepts and Strategies
II Data Warehousing: Models and Architectures
III Sharing Information and Knowledge.
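As a simple, hypothetical illustration of the loosely coupled integration approach mentioned above (our own sketch, not an example from the book), the following C++ fragment hides two heterogeneous sources behind a common wrapper interface and lets a mediator answer one user query across both; all class and record names are invented:

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Unified record exposed to the user, regardless of the source schema.
struct Record { std::string key; std::string value; };

// Common interface every source wrapper must implement.
class SourceWrapper {
public:
    virtual ~SourceWrapper() = default;
    virtual std::vector<Record> query(const std::string& key) const = 0;
};

// Wrapper around a (fictional) relational source.
class RelationalSource : public SourceWrapper {
public:
    std::vector<Record> query(const std::string& key) const override {
        if (key == "H2O") return {{key, "relational: density = 1.0 g/cm3"}};
        return {};
    }
};

// Wrapper around a (fictional) document/web source.
class DocumentSource : public SourceWrapper {
public:
    std::vector<Record> query(const std::string& key) const override {
        if (key == "H2O") return {{key, "document: 'water, boiling point 100 C'"}};
        return {};
    }
};

// Loosely coupled mediation: forward the user's query to every wrapper and
// merge the results, leaving the sources autonomous.
std::vector<Record> mediate(const std::vector<std::unique_ptr<SourceWrapper>>& sources,
                            const std::string& key) {
    std::vector<Record> merged;
    for (const auto& s : sources) {
        auto part = s->query(key);
        merged.insert(merged.end(), part.begin(), part.end());
    }
    return merged;
}

int main() {
    std::vector<std::unique_ptr<SourceWrapper>> sources;
    sources.push_back(std::make_unique<RelationalSource>());
    sources.push_back(std::make_unique<DocumentSource>());
    for (const auto& r : mediate(sources, "H2O"))
        std::cout << r.key << ": " << r.value << '\n';
}

A tightly coupled alternative would instead materialize both sources into one integrated schema, which is the route that leads towards the warehouse architectures discussed above.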
2.
Information Infrastructure: The Dynamic Mosaic
Horst Kremers, Eng., Comp. Sci., Berlin, Germany
The various aspects of Information Infrastructure presented and discussed in this session, and in contributions throughout this conference, give an overview of the specific role of CODATA in shaping and developing this field within its specific competence and interest. The mosaic that this conference shows can be completed, and it can be clearly distinguished from other activities in Information Infrastructure at the national and international levels. In addition, the various contributions have shown that this field is under dynamic development. This allows discussion of a potential CODATA strategic position and of actions that would, in an appropriate way, promote this development in view of its growing relevance.
3. Cooperative Canadian/US Project:
An Experiment in Sharing Geospatial Data Cross-Border
Milo Robinson, US Federal Geographic Data Committee, USA
Marc LeMaire, Mapping Services Branch, USA
The US Federal Geographic Data Committee (FGDC) and its Canadian counterpart, GeoConnections, have undertaken several cooperative projects to develop a common spatial data infrastructure. To better understand the challenges and complexities of transboundary spatial data issues, GeoConnections and the FGDC jointly funded two collaborative demonstration projects, each covering a geographic area that crosses the border and addressing a common issue: sharing data with our neighbors. These collaborative projects cover the Red River Basin (Roseau River and Pembina River Basin) and the Yukon to Yellowstone (Crown of the Continent Study Area). The results of these international spatial data demonstration projects, as well as new joint activities, will be discussed.
4.
Current Trends in the Global Spatial Data Infrastructure: Evolution
from National to Global Focus
Alan R. Stevens, Global Spatial Data Infrastructure (GSDI) Secretariat,
USA
In the late 1980's many organizations from state, local and
tribal governments, the academic community, and the private
sector within the United States came together to encourage common
practices and uniform standards in digital geographic (map)
data collection, processing, archiving, and sharing. The National
Spatial Data Infrastructure (NSDI) encompasses policies, standards,
and procedures for organizations to cooperatively produce and
share georeferenced information and data. The major emphasis has now turned toward Geospatial One-Stop initiatives, Implementation Teams (I-Teams), and Homeland Security for better governance, but all are still aimed at facilitating the building of the NSDI.
In the mid-1990s other nations began to recognize that tremendous efficiencies and cost savings could be realized by reducing duplicative data collection, processing, archiving and distribution, not only within their own borders but across international boundaries as well. A small group, at first, spawned what is now known as the Global Spatial Data Infrastructure (GSDI). This group has now grown to over 40 nations and consists of government agencies, NGOs, academic institutions, other global initiatives, and a significant contingent of the private sector in the geospatial industry. Industry is keen to be involved because it realizes that common standards will increase demand for data from domestic customers and will expand awareness within emerging nations, further increasing the client base. The GSDI has incorporated as a non-profit organization so that it can partner with others in securing funds to encourage and accelerate the development of National and Regional Spatial Data Infrastructures in fledgling organizations and countries.
Track IV-B-3:
Data Portals
|
1.
Information Society Technologies Promotion for New Independent
States
A.D. Gvishiani, Director of the Center of Geophysical Data Studies
and Telematics Applications IPE RAS, Russia
J. Babot, Head of the E Work Sector in the European Commission, Belgium
J. Bonnin, Institut de Physique du Globe de Strasbourg, France
The recently proposed cluster project CLUSTER-PRO will unite and coordinate the activities of the five Information Society Technologies (IST) program projects now being run by the European Commission in the Baltic (TELEBALT), Eastern European (E3WORK and TEAMwork) and CIS (WISTCIS and TELESOL) countries, in order to promote new information technologies for scientific and technological data handling in these countries. The French Committee on Data for Science and Technology (CODATA FRANCE) serves as the coordinator of the proposed CLUSTER-PRO project.
The presentation will describe all five projects under clustering. The goals of these projects are to promote modern teleworking tools for scientific, technological and business data handling and exchange between EU member states and EU pre-accession and third countries. The main goal of the cluster is to create a common structure of concentration between the existing projects, with a common portal, cross-exchange and adaptation of results, and a common action plan for dissemination. One of the objectives is the elaboration of cross-project Web sites that will focus on new opportunities for teleworking in scientific and technological data acquisition and exchange. Education, research, business, telemedicine, the promotion of new employment opportunities and environmental protection in all participating countries are among the cluster project's objectives. Another objective is cross-project training actions using the courses and e-learning systems developed by the five clustered projects, which will be adapted for the whole range of countries. Cross-project training actions will be implemented in face-to-face mode at CLUSTER-PRO and clustered project gatherings, and virtually through the Web sites. An information portal, "New opportunities for EU-CLUSTER-PRO countries teleworking", will be developed.
2.
Data, Information, and Knowledge Management of a Solar UV Data
Network for Coating Materials
Lawrence J. Kaetzel, K-Systems, Prescott, Arizona, USA and
Jonathan W. Martin, National Institute of Standards and Technology,
Gaithersburg, MD, USA
A major factor in understanding the performance of coating materials and products is the use of systematic methods for acquiring, recording, and interpreting laboratory and field test results and environmental conditions. The verified data from these sources can then be used with computer-based models to predict materials performance more accurately. A solar ultraviolet data network has been created by the National Institute of Standards and Technology, Gaithersburg, Maryland. The network measures, collects and archives weather measurements for use in predicting the performance of automotive and architectural coatings. Operation of the network is a collaborative effort among several U.S. Government agencies and private industry organizations. The network currently consists of 8 field locations operating in the United States that are equipped with solar spectroradiometers and weather stations. Data from the network are evaluated and stored electronically, then used in scientific analyses applied to materials performance and biological studies.
This paper presents
the efforts to ensure data integrity; the methodologies used
to represent the data, information, and knowledge in a consistent
manner; and the computer-based methods (e.g., computer-based
models, decision-support or smart modules) developed to assist
the knowledge consumer in determining relevance and to assist
in the interpretation of the measurements. The paper will first discuss the application of the data to coating performance and their use with computer-based modules, followed by a discussion of knowledge management methods.
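As a hedged illustration of the data-integrity step described above (a sketch under our own assumptions, not the actual record format or checks used by the NIST network), the following C++ fragment shows one simplified spectroradiometer reading and a basic plausibility check of the kind applied before measurements are archived and analysed:

#include <iostream>
#include <string>
#include <vector>

struct UvReading {
    std::string station;      // one of the field locations
    std::string timestampUtc; // time of measurement
    double wavelengthNm;      // spectral band of the measurement
    double irradianceWm2Nm;   // spectral irradiance, W m^-2 nm^-1
};

// Reject physically implausible or out-of-band values before storage.
// The 280-400 nm window covers the solar UV-B and UV-A bands.
bool passesIntegrityCheck(const UvReading& r) {
    bool inUvBand   = r.wavelengthNm >= 280.0 && r.wavelengthNm <= 400.0;
    bool nonNegative = r.irradianceWm2Nm >= 0.0;
    return inUvBand && nonNegative;
}

int main() {
    std::vector<UvReading> batch{
        {"StationA", "2002-07-01T12:00:00Z", 305.0,  0.42},
        {"StationA", "2002-07-01T12:00:00Z", 305.0, -0.10},  // implausible value
    };
    for (const auto& r : batch)
        std::cout << r.station << ' ' << r.wavelengthNm << " nm: "
                  << (passesIntegrityCheck(r) ? "archive" : "flag for review") << '\n';
}

In practice such checks would be only one layer of the verification applied before the data feed the performance-prediction models discussed above.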
|