Track I-C-5:
Data Archiving
Chair: Seamus Ross
1.
Report of Activities of the CODATA Working Group on Archiving
Scientific Data
William Anderson, Praxis101, Rye, NY, USA
Steve Rossouw, South African National Committee for CODATA
Co-organizers: CODATA Working Group on Archiving Scientific Data
A Working Group on Scientific Data Archiving was formed following
the 2000 International CODATA Conference in Baveno, Italy. The
Working Group has (1) built a list of annotated primary references
to published reports and existing scientific data archives,
(2) constructed a classification scheme to help organize and
expose the many issues and requirements of archiving, preserving,
and maintaining access to scientific and technical data, (3)
helped sponsor a workshop in South Africa on archiving scientific
and technical data, and (4) proposed collaborating with the
International Council for Scientific and Technical Information
(ICSTI) to build and maintain an internet portal focused on
scientific data and information archiving, preservation and
access. The objective of these efforts is to provide scientists
and scientific data managers a framework of information and
references that can assist in securing the resources and commitments
needed to preserve and archive scientific data. This presentation
outlines the results of these efforts with the goal of stimulating
discussion of the organizing framework as well as the definitions
and relationships among identified issues.
2.
The NIST Data Gateway: Providing Easy Access to NIST Data Resources
Dorothy M. Blakeslee, Angela Y. Lee, and Alec J. Belsky, National
Institute of Standards and Technology, USA
The
National Institute of Standards and Technology (NIST) maintains
a wide range of scientific and technical data resources, including
free online data systems and PC databases available for purchase.
However, many people are not familiar with these various NIST
data collections and the types of data they contain. To help
scientists, engineers, and the general public find out quickly
and easily whether data they need are available at NIST, NIST
has built a web portal to NIST data resources. The first version
of this portal, the NIST Data Gateway (http://srdata.nist.gov/gateway),
provides easy access to 26 online NIST data systems and information
on 48 NIST PC databases. NIST Data Gateway users can specify
a keyword, property, or substance name to find the NIST data
resources that contain standard reference data meeting their
search criteria. When users find a data resource they want to
use, links are provided so they can access or order that resource.
In this paper, we describe how version 1.0 of the NIST Data
Gateway was built and discuss some of the issues that arose
during the design and implementation stages. We include experience
we gained that we hope will be useful to others building data
portals. We also discuss future plans for the NIST Data Gateway,
including efforts to provide access to additional NIST data
resources.
3. Long Term Data Storage: Are We Getting
Closer to a Solution?
A. Stander and N. Van der Merwe, Department of Information
Systems, University of Cape Town, South Africa
Steve F. Rossouw, South African National Committee for CODATA,
South Africa
Many scientific and socioeconomic reasons exist for the long-term
retention of scientific data and, lately, business data as well.
To succeed, a solution must be affordable and technologically
flexible enough to survive the many technology changes during its
useful life. This paper looks at the current status of available
technology for long-term data storage, specifically the standards
that exist for data interchange, the creation and storage of
metadata, data conversion problems, and the reliability and
suitability of digital storage media. Even when data are in an
ideal format, application and database management software is
needed to store and retrieve them.
Typically the life expectancy of such software is much shorter
than that of the storage media. Because this has already caused
major data loss, possible solutions are investigated.
Most research into long-term data storage focuses on large to
very large databases. It is often forgotten that small but very
important pockets of scientific data exist on the computers of
individual researchers or smaller institutions. Because these data
are usually stored in application-specific formats with a short
lifespan, strategies for preserving smaller amounts of data are
also examined.
4.
Prototype of TRC Integrated Information System for Physicochemical
Properties of Organic Compounds: Evaluated Data, Models, and
Knowledge
Xinjian Yan, Thermodynamics Research Center (TRC), National
Institute of Standards and Technology, USA
Qian Dong, Xiangrong Hong, Robert D. Chirico and Michael Frenkel
Physicochemical property data are crucial for industrial process
development and scientific research. However, experimentally
determined data are not only very limited but also often lack
critical evaluation. Moreover, models developed for predicting
physicochemical properties have rarely been presented with
sufficient examination. As a result, data obtained from reference
books, databases, or models remain difficult to assess even after
a time-consuming effort. We therefore aim to develop a
comprehensive system, the TRC Integrated Information System
(TIIS), which consists of evaluated data, models, knowledge, and
functions to infer and then recommend the best data and models.
It also provides information that helps users better understand
physicochemical property data, models, and theory.
Evaluated physicochemical property data in TIIS are mainly selected
from the TRC Source data system, which is an extensive repository
system of experimental physicochemical properties and relevant
measurement information. Data uncertainty and reliability are
analyzed based on scientific data principles, statistics, and
highly evaluated property models. Information about experimental
conditions, data processing, etc., is recorded in detail.
The reliability of data predicted by a model cannot be determined
without a full description of the model's ability.
Each model in TIIS is carefully examined against evaluated data,
with emphasis on its predictive ability for compounds not used in
fitting the model's parameters and on the compound classes for
which it produces reasonably good property data. For a given
compound, the best predicted value is recommended according to the
models' performance in calculating the evaluated data set. TIIS
also provides regression analysis and optimization functions, so
that users can fit model parameters using the current best
experimental data set for a particular compound.
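The recommendation step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how TIIS scores models, so the mean absolute error, the two toy models, and the data values are all hypothetical.

```python
from statistics import mean


def mean_abs_error(predict, evaluated):
    """Mean absolute error of one model over (compound, value) pairs."""
    return mean(abs(predict(compound) - value) for compound, value in evaluated)


def recommend_model(models, held_out):
    """Name of the model with the lowest error on evaluated data
    that was NOT used to fit the model parameters."""
    return min(models, key=lambda name: mean_abs_error(models[name], held_out))


# Hypothetical evaluated data: (carbon number, property value)
held_out = [(5, 309.2), (6, 341.9), (7, 371.6)]

# Hypothetical property models, each mapping a compound to a predicted value
models = {
    "linear": lambda n: 160.0 + 30.0 * n,
    "constant": lambda n: 340.0,
}

best = recommend_model(models, held_out)
print(best)
```

Scoring on held-out compounds, rather than on the compounds used for fitting, mirrors the abstract's emphasis on predictive ability.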
A property value, a model or a chemical system cannot be fully
understood without sufficient supporting information. Therefore,
the knowledge that describes characteristics of property data,
models, molecular structures, and the results from theoretical
analysis and calculation, is provided by TIIS.
5. An Introduction to the CODATA-China
Physical and Chemical Database Information System
Xiao Yun, Secretary General, Chinese National Committee
for CODATA, China
Yan Baoping, Director, Computer Network Information Center,
CAS, China
Zhang Hui, Secretary, Chinese National Committee for CODATA,
China
Jin Huanian, Engineer, Computer Network Information Center,
CAS, China
In 2001 the Chinese Ministry of Science and Technology decided
to bring the data center coordinated by CODATA-China into the
basic work of the National Key Research Development Program. By
starting a special project on basic technological work, the
Ministry provides long-term support for the accumulation,
development, and use of basic technological data.
A database information service system is expected to be set up
within 3 to 5 years, with the CODATA-China Physical and Chemical
Database Information System as its main body and covering subjects
such as agriculture, forestry, mechanics, materials, and biology.
The result will be a core group of databases centered on the
CODATA-China Physical and Chemical Database Information System,
which targets mathematics, physics, and chemistry and can provide
basic and applied data for scientific research and production.
At present the data contained in the CODATA-China Physical and
Chemical Database Information System mainly include the Chinese
nuclear data, the Chinese atom and molecule data, the Chinese
chemistry and chemical industry data, the geothermodynamics data,
the chemdynamics data, the Chinese aviation material data, and the
Chinese feedstuff technology data. The Computer Network Information
Center, CAS, will serve as the general center, providing the
project with a service platform and technical support based on
centralized management assisted by distributed management.
A highly usable and efficient data application service platform
will be developed in Java, a high-performance and portable
language, running on high-performance Unix servers with the Oracle
database management system. The TRS Full Text Retrieval System, an
advanced full-text retrieval system in China, will provide
efficient and reliable full-text data retrieval, and the data
service will be delivered over the Internet through the Web.
Track I-C-6:
Ingénierie de la veille technologique et
de l'intelligence économique
(Data for Competitive Technical and Economic Intelligence)
Chair: Clément Paoli, Université
MLV, France
Value-added information produced through mathematical and
linguistic analysis of electronic information sources that contain
factual scientific data and textual technological data is the raw
material of strategic decisions.
The development of methods and software tools for systematically
screening information sources from online databases and the
Internet makes it possible to build information corpora with high
added value.
Knowledge management depends on obtaining high-quality data and
processed information quickly. Techniques for improving these data
are often tied to the processing software chosen for economic
intelligence problems.
Problems of standardization and of interoperability, both among
local systems and between those systems and external information,
are the basis for the discussions and exchanges of views sought in
this session.
The main themes proposed here are set out to open the debate;
other proposals will be considered.
- Access to information sources: search engines
- Knowledge representation: semantic processing, mapping, imaging
- Statistical analysis methods and tools: static and online analysis
- Mathematical data analysis: correspondence analysis (AFC),
  hierarchical classification
- Linguistic methods and tools: terminology extraction (semantic
  analysis)
- Data mining, text mining: interoperability and clustering of
  heterogeneous databases
- Knowledge management: information as a resource
- Informational vulnerabilities: quality of data and information,
  protection
1.
Ingénierie de la veille pédagogique et gestion
des connaissances en enseignement supérieur (Data
for competitive pedagogy and knowledge management in higher
education)
Jean-Paul Pinte, Université Marne La Vallée, France
University teaching must renew itself and redefine its paradigms,
or it will stagnate.
Paradigm shifts have appeared in recent years in many fields, such
as wireless telephony, preventive medicine, ecology, and
globalization.
In education, we have entered the paradigm of learning.
The pressures come mainly from the world of work: the creation of
new working environments, the emergence of new kinds of clientele,
an explosion of knowledge and resources, the rapid development of
information and communication technologies (ICT), and, above all,
the arrival of new students of all ages and backgrounds, with
extremely diverse motivations and skills.
Beyond teaching subjects at the highest level of knowledge, and
beyond research and the production of knowledge, the university
today takes on a third, economic and social role, whose aim is to
produce added value and which leads to applied research.
The knowledge economy is supplanting the material economy, and
universities are becoming more and more "entrepreneurial".
A profound rethinking of the university is already under way. It
aims to open the university to this new role by introducing
"active" teaching methods and open and distance learning
(e-learning, virtual and digital campuses, and so on).
With ICT, one can no longer teach as before.
We must now move from ICT in the sense of information and
communication technologies to "TIC" in the sense of "Technologies
pour l'Intelligence et la Connaissance" (technologies for
intelligence and knowledge).
Pedagogical monitoring (veille pédagogique) is one of the main
keys to success in supporting this change.
2.
L'analyse des mots associés pour l'information non scientifique
(Co-Word analysis for non scientific information)
Bertrand Delecroix
and Renaud Eppstein, ISIS/CESD, Université de Marne La
Vallée, France
Co-word analysis is based on a sociological theory developed
by the CSI and the SERPIA (Callon, Courtial, Turner, Michelet)
in the mid-1980s. It measures the association strength
between terms in documents to reveal and visualise the evolution
of science through the construction of clusters and strategic
diagrams. Since then, the method has been successfully applied to
investigate the structure of many scientific fields. It now
appears in many software systems that companies use
to improve their business and define their strategy, but its
relevance to this kind of application has not yet been proved.
Using the example of economic and marketing information on
DSL technologies from Reuters Business Briefing, this presentation
gives an interpretation of co-word analysis for this kind of
information. After an overview of the software we used (Sampler
and LexiMine) and a survey of the experimental protocol,
we investigate and explain each step of the co-word analysis
process: terminological extraction, computation of clusters,
and the strategic diagram. In particular, we explain the meaning
of every parameter of the method and discuss the choice of
variables and similarity measures. Finally, we try to give a global
interpretation of the method in an economic context. Further
studies will be added to this work to allow these results to be
generalised.
Keywords: clustering, co-word analysis, competitive intelligence
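To make "association strength between terms" concrete, here is a minimal sketch using the equivalence index, a similarity measure commonly used in co-word analysis: E_ij = c_ij^2 / (c_i * c_j), where c_i is the number of documents containing term i and c_ij the number containing both terms. The abstract does not say which measure the authors chose, and the function name and toy documents below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def equivalence_index(documents):
    """Map each co-occurring term pair (a, b) to c_ab**2 / (c_a * c_b)."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in documents:
        unique = sorted(set(terms))          # count each term once per document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        pair: count ** 2 / (term_counts[pair[0]] * term_counts[pair[1]])
        for pair, count in pair_counts.items()
    }


# Hypothetical indexed documents (sets of extracted terms)
docs = [
    {"dsl", "broadband", "pricing"},
    {"dsl", "broadband"},
    {"dsl", "regulation"},
    {"pricing", "regulation"},
]

strengths = equivalence_index(docs)
for pair, value in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

Pairs with high index values are the candidates that a clustering step would group together; the strategic diagram is then built from cluster-level properties.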
3.
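Stratégies du partenariat scientifique entre les pays
de l'UE et les pays en développement : indicateurs bibliométriques
(Scientific partnership strategies between EU countries and
developing countries: bibliometric indicators)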
P.L. Rossi , IRD, Centre d'Ile de France, Bondy, France
Mining the Science Citation Index (SCI) bibliographic database of
the ISI (Institute for Scientific Information, Philadelphia) makes
it possible to design bibliometric indicators that characterize
the scientific partnership strategies between the countries of the
European Union and developing countries.
The SCI data we used support many indicators covering the
scientific output of countries, their regions, and their
institutions; national science policies; and the proximities and
affinities among different actors. These are grouped into
indicators of scientific output, indicators of specialization, and
relational indicators.
This study was carried out on bibliographic data for the period
1987-2001 and covers the countries of the European Union together
with the countries of Africa, Latin America, and Asia.
For the scientific partnership strategies between the 15 countries
of the European Union and the countries of the African continent,
three broad categories of European partners can be defined:
- European Union countries whose partnership profile is "close" to
  the main African scientific producers: Austria, Germany, Spain,
  Finland, and Italy,
- European Union countries whose partnership profile shows a strong
  "commitment" in sub-Saharan Africa: Denmark, the Netherlands,
  and Sweden,
- European Union countries whose partnership profile involves
  countries with which they have ties of colonial history and
  language: Belgium, France, and Great Britain.
For the latter two countries, the impact on the national output of
their main partners differs: very large for France, more moderate
for Great Britain.
4.
La fusion analytique Data/Texte : nouvel enjeu de l'analyse
avancée de l'information (The merging of structured
and unstructured data : the new challenge of advanced information
analytics)
J.F. Marcotorchino, Kalima Group
Track III-C-4:
Attaining Data Interoperability
Chair: Richard Chinman, University Corporation for Atmospheric
Research, Boulder, CO, USA
Interoperability
can be characterized as the ability of two or more
autonomous, heterogeneous, distributed digital entities
(e.g., systems, applications, procedures, directories,
inventories, data sets, ...) to communicate and cooperate
among themselves despite differences in language,
context, or content. These entities should be able
to interact with one another in meaningful ways without
special effort by the user - the data producer or
consumer - be it human or machine.
By becoming
interoperable the scientific and technical data communities
gain the ability to better utilize their own data
internally and become more visible, accessible, usable,
and responsive to their increasingly sophisticated
user community. When "the network is the computer",
interoperability is critical to fully take advantage
of data collections and repositories.
Two sets
of issues affect the extent to which digital entities
efficiently and conveniently interoperate: syntactic
and semantic interoperability.
Syntactic
interoperability involves the use of communication,
transport, storage and representation standards. For
digital entities to interoperate syntactically, information
(metadata) about the data types and structures at
the computer level, the syntax of the data, is exchanged.
However, if the entities are to do something meaningful
with the data, syntactic interoperability is not enough;
semantic interoperability is also required.
Semantic
interoperability requires a consistent interpretation
of term usage and meaning. For digital entities
to interoperate semantically, consistent information
(metadata) about the content of the data - what the
basic variable names are and mean, what their units
are, what their ranges are - is exchanged. This information
can be referred to as the semantic search metadata,
since it can be used to search for (and locate) data
of interest to the user. However, this is not the
metadata that is required for semantic interoperability
at the data level, although it does contain some of
the same elements. The semantic information required
for machine-to-machine interoperability at the data
level is the information required to make use of the
data. For example, the variable T is sea surface temperature,
the data values correspond to °C divided by 0.125,
missing values are represented by -999, ... This information
can be referred to as the semantic use metadata. Without
this information, the digital entity, be it machine
or application, cannot properly label the axes of
plots of the data or merge them with data from other
sources without intervention from a knowledgeable
human.
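The sea surface temperature example above can be sketched in code. The scale factor 0.125, the missing value -999, and the variable name T come from the text; the dictionary keys and function names are hypothetical and are not taken from any particular metadata standard.

```python
# Semantic use metadata for one variable (key names are illustrative only)
USE_METADATA = {
    "T": {
        "long_name": "sea surface temperature",
        "units": "degrees C",
        "scale_factor": 0.125,   # stored value = degrees C / 0.125
        "missing_value": -999,
    }
}


def decode(name, raw_values, metadata=USE_METADATA):
    """Convert stored values to physical units; missing values become None."""
    meta = metadata[name]
    return [
        None if value == meta["missing_value"] else value * meta["scale_factor"]
        for value in raw_values
    ]


def axis_label(name, metadata=USE_METADATA):
    """Build a plot axis label without help from a knowledgeable human."""
    meta = metadata[name]
    return f"{meta['long_name']} ({meta['units']})"


decoded = decode("T", [160, -999, 100])
print(axis_label("T"), decoded)
```

Without the metadata record, a client would see only the integers 160, -999, and 100 and could neither scale them, label them, nor exclude the missing value.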
There
are other sets of issues that affect interoperability:
- Political/Human Interoperability
- Inter-disciplinary Interoperability
- International Interoperability
This session
addresses all of these sets of interoperability issues, with a
particular focus on attaining semantic interoperability
at the data level.
1.
Interoperability in a Distributed, Heterogeneous Data Environment:
The OPeNDAP Example
Peter Cornillon, Graduate School of Oceanography, University
of Rhode Island, USA
Data system interoperability
in a distributed, heterogeneous environment requires a consistent
description of both the syntax and semantics of the accessible
datasets. The syntax describes elements of the dataset related
to its structure or organization, the contained data types and
operations that are permitted on the data by data system elements.
The semantics give meaning to the data values in the dataset.
The syntactic and semantic descriptions of a dataset form a subset
of the metadata used to describe it; other metadata often associated
with a dataset are fields that describe how the data were collected,
calibrated, who collected them, etc. Although important, indeed
often essential, to meaningfully interpret the data, these additional
fields are not required for machine-to-machine interoperability
(Level 3 Interoperability) in a data system. We refer to semantic
metadata required to locate a data source of interest as semantic
search metadata and semantic metadata required to use the data,
for example to label the axes of a plot of the data or to exclude
missing values from subsequent analysis, as semantic use metadata.
In this presentation,
we summarize the basic metadata objects required to achieve
Level 3 Interoperability in the context of an infrastructure
that has been developed by the Open source Project for a Network
Data Access Protocol (OPeNDAP) and how this infrastructure is
being used by the oceanographic community in the community-based
National Virtual Ocean Data System (NVODS). At present, in excess
of 400 data sets are being served from approximately 40 sites
in the US, Great Britain, France, Korea and Australia. These
data are stored in a variety of formats ranging from user developed
flat files to SQL RDBMS to sophisticated formats with well defined
APIs such as netCDF and HDF. A number of application packages
(Matlab, IDL, VisAD, ODV, Ferret, ncBrowse and GrADS) have also
been OPeNDAP-enabled allowing users of these packages to access
subsets of data sets of interest directly over the network.
2.
Interoperable data delivery in solar-terrestrial applications:
adopting and evolving OPeNDAP
Peter Fox, Jose Garcia, Patrick West, National Center for
Atmospheric Research, USA
The High Altitude
Observatory (HAO) division of NCAR investigates the sun and
the earth's space environment, focusing on the physical processes
that govern the sun, the interplanetary environment, and the
earth's upper atmosphere.
We present details on how interoperability within a set of data
systems supported by HAO and collaborators has driven the implementation
of services around the Data Access Protocol (DAP) originating
in the Distributed Oceanographic Data System (DODS) project.
The outgrowth of this is OPeNDAP, an open source project
to provide reference implementations of the DAP and its core
services.
We will present the recent design and development details of
the services built around the DAP, including interfaces to common
application programs, like the Interactive Data Language, the
web, and server side data format translation and related services.
We also present examples of this interoperability in a number
of scientific disciplines and technology areas: the Coupling, Energetics
and Dynamics of Atmospheric Regions (CEDAR) program, the Radiative
Inputs from Sun to Earth (RISE) program, the Earth System Grid
II project, and the Space Physics and Aeronomy Collaboratory.
3.
The Earth System Grid: Turning Climate Datasets Into Community
Resources
Ethan Alpert, NCAR, Boulder, CO, USA
David Bernholdt, Oak Ridge National Laboratory, Oak Ridge, TN,
USA
David Brown, NCAR, Boulder, CO, USA
Kasidit Chancio, Oak Ridge National Laboratory, Oak Ridge, TN,
USA
Ann Chervenak, USC/ISI, Marina del Rey, CA, USA
Luca Cinquini, NCAR, Boulder, CO, USA
Bob Drach, Lawrence Livermore National Laboratory, Livermore,
CA, USA
Ian Foster, Argonne National Laboratory, Argonne, IL, USA
Peter Fox, NCAR, Boulder, CO, USA
Jose Garcia, NCAR, Boulder, CO, USA
Carl Kesselman, USC/ISI, Marina del Rey, CA, USA
Veronika Nefedova, Argonne National Laboratory, Argonne, IL,
USA
Don Middleton, NCAR, Boulder, CO, USA
Line Pouchard, Oak Ridge National Laboratory, Oak Ridge, TN,
USA
Arie Shoshani, Lawrence Berkeley National Laboratory, Berkeley,
CA, USA
Alex Sim, Lawrence Berkeley National Laboratory, Berkeley, CA,
USA
Gary Strand, NCAR, Boulder, CO, USA
Dean Williams, Lawrence Livermore National Laboratory, Livermore,
CA, USA
Global coupled Earth System models are vital tools for understanding
potential future changes in our climate. As we move towards
mid-decade, we will see new model realizations with higher grid
resolution and the integration of many additional complex processes.
The U.S. Department of Energy (DOE) is supporting an advanced
climate simulation program that is aimed at accelerating the
execution of climate models one hundred-fold by 2005 relative
to the execution rate of today. This program, and other similar
modeling and observational programs, are producing terabytes
of data today and will produce petabytes in the future. This
tremendous volume of data has the potential to revolutionize
our understanding of our global Earth System. For this
potential to be realized, geographically distributed teams of
researchers must be able to manage these massive, distributed
data holdings, develop new knowledge from them effectively and
rapidly, and share the results with a broad community of other
researchers, assessment groups, policy makers, and educators.
The Earth System Grid II (ESG-II), sponsored by the U.S. Dept.
of Energy's Scientific Discovery Through Advanced Computing
(SciDAC) program, is aimed at addressing this important challenge.
The broad goal is to develop next generation tools that harness
the combined potential of massive distributed data resources,
remote computation, and high-bandwidth wide-area networks as
an integrated resource for the research scientist. This integrative
project spans a variety of technologies including Grid and DataGrid
technology, the Globus Toolkit™, security infrastructure,
OPeNDAP, metadata services, and climate analysis environments.
In this presentation we will discuss goals, technical challenges,
and emerging relationships with other related projects worldwide.
4.
Geoscientific Data Amalgamation: A Computer Science Approach
N. L. Mohan, Osmania University, India
The rapidly changing environment of data, information, and
knowledge, and of their communication and exchange, has created
multidimensional opportunities for researchers: it can (i) avoid
duplication of work and increase (ii) the competitive spirit,
(iii) the quality of research, and (iv) interdisciplinary
research. In this context the amalgamation of geo-scientific data
assumes greater importance. The amalgamation and availability of
geo-scientific data in numerical form rest on the fundamental
premise that the data are open to the public domain and that their
quality is assured.
Available data in the earth-related sciences in general, and
geophysical data in particular, fall into three categories: (a)
raw data, (b) processed or filtered raw data, and (c)
theoretically computed data. Further, data are of two types,
static and dynamic. Data from exploration geophysical methods such
as gravity, magnetic, electrical, seismic, and well logging
surveys are static: once acquired, they may remain unchanged. By
contrast, data such as earth tides, geomagnetic field records,
earthquake seismograms, and satellite data are dynamic. One more
dimension of data availability is model construction based on
expert system shells, an artificial intelligence approach.
Finally, data are made available in two formats, graphical and
image.
The challenge in the earth-related sciences is not making
numerical data available but organizing, managing, and updating
them. Unlike other fields of science and engineering,
geo-scientific data management needs an altogether special
approach. The author believes that the geo-scientific community
may need to understand certain important areas of computer
science, so that its members can guide computer specialists to
manage data efficiently and to communicate them effectively to the
earth science community in 2-D and 3-D numerical form, including
graphical and image forms.
A database system is a computerized record-keeping system. Record
keeping involves several operations: adding data to new, empty
files; inserting data into existing files; retrieving data from
existing files; changing data in existing files; deleting data
from existing files; and removing existing files. Further,
database architecture comprises three levels: (i) the internal
level, a centralized storage system that keeps all types of
geo-scientific data in one place; (ii) the conceptual level, a
storage system for a specific type of data, such as earthquake
data, that helps specialists in that area track the data; and
(iii) the external level, the user-node connections through which
many users can simultaneously track different types of
geo-scientific data according to their own interests. From another
angle, an abstract view of the data can be divided into three
levels: (a) the physical level describes how the data are actually
stored; (b) the logical level describes what data are stored in
the database and what relationships exist among them; and (c) the
view level represents only a part of the entire database.
One of the most important kinds of database for organizing
geo-scientific data is the object-oriented data system, which
would play a predominant role in organizing several different data
sets that can refer to each other for better understanding, for
modeling and refining models, and for drawing meaningful and
correct inferences. The object-oriented arrangement of data is
based on concepts such as (a) inheritance, (b) polymorphism, and
(c) multiple inheritance. Respectively, these concepts let an
entity inherit certain properties from a parent entity in addition
to its own; let a function with the same name perform different
tasks on different sets of data; and let a class or set of classes
inherit different properties from several classes.
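The three object-oriented concepts named above can be illustrated with a short sketch. The class names, the station identifier, and the descriptions are hypothetical; they borrow the abstract's static/dynamic distinction only as an example.

```python
class Dataset:
    """Parent entity: properties shared by every geo-scientific data set."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name}: generic data set"


class StaticData(Dataset):
    """(a) Inheritance: reuses name handling from Dataset."""

    def describe(self):  # (b) Polymorphism: same method name, different behavior
        return f"{self.name}: static, fixed once acquired"


class DynamicData(Dataset):
    def describe(self):
        return f"{self.name}: dynamic, updated over time"


class Georeferenced:
    """A second, independent parent that adds a location property."""

    location = "unspecified"

    def where(self):
        return f"located at {self.location}"


class Seismogram(DynamicData, Georeferenced):
    """(c) Multiple inheritance: properties from both parents."""

    location = "station ABC"  # hypothetical station identifier


datasets = [StaticData("gravity survey"), Seismogram("earthquake record")]
descriptions = [d.describe() for d in datasets]
print(descriptions, datasets[1].where())
```

The loop calls `describe()` without knowing each object's class, which is what lets different data sets be handled through one interface.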
Parallel and distributed databases play important but limited
roles in certain contexts of geo-scientific data amalgamation.
These approaches may help within a geo-scientific organization
that requires some confidentiality and does not want to release
its data into the public domain for an initial period.
Artificial intelligence is the most promising area for the
geo-scientific community to explore for data organization,
modeling, searching, the multidimensional construction and viewing
of graphical and image models, and data management. With data
communicated through high-speed, wide-band, and wireless Internet
systems, knowledge bases and expert system shells are dominating
the scene. Artificial intelligence encompasses several areas,
including databases, object-oriented representation, search and
control strategies, matching techniques, knowledge organization
and management, pattern recognition, visual image understanding,
expert system architecture, machine learning, several types of
learning techniques including neural networks, problem-solving
methods, robotics, semantic nets, frames, cognitive modeling, and
data compression techniques.
Another important area is algorithm design and analysis, which
is vital for geo-scientific data organization and management.
Algorithms must be designed according to
how the geo-scientific data need to be arranged, so that
search, retrieval, modification, and insertion can be made user
friendly.
Finally, geo-scientific data amalgamation will be successful
only if proper indexing, security, integrity, and standardization
or benchmarking are taken care of.
5.
The US National Virtual Observatory: Developing Information
Infrastructure in Astronomy
D. De Young, National Optical Astronomy Observatory, USA
A. Szalay, Johns Hopkins Univ, USA
R. Hanisch, Space Telescope Science Inst, USA
G. Helou, Cal Tech/IPAC, USA
R. Moore, San Diego Supercomputer Center, USA
E. Schreier, Space Telescope Science Inst, USA
R. Williams, Cal Tech/CACR, USA
The Virtual Observatory (VO) concept is rapidly becoming mature
as a result of intensive activity in several countries. The
VO will provide interoperability among very large and widely
dispersed datasets, using current developments in computational
and grid technology. As a result, the VO will open new frontiers
of scientific investigation and public education through world-wide
access to astronomical data sets and analysis tools. This paper
will provide an overview of present VO activities in the US,
together with a brief description of the future implementations
of USNVO capabilities.
Track III-C-6:
Data Centers
Chair: David Clark, NOAA National Geophysical Data Center,
USA
1.
CODATA in Africa "The Nigerian Data Program"
Kingsley Oise Momodu, Chairman CODATA Nigeria, Faculty of Dentistry,
University of Benin, Nigeria
Approval was granted
for Nigeria's membership in CODATA International in May 1998,
in response to an application by the Federal Ministry of Science
and Technology. The Nigerian CODATA committee has as its mandate
the task of providing liaison between the scientific community
in Nigeria and the international scientific community. The Nigerian
CODATA committee participated in the fourth International Ministerial
meeting of the United Nations Economic Commission for Africa
(ECA) on development information on Thursday 23rd November 200
at the National Planning Commission conference room in Abuja,
Nigeria. At the meeting, CODATA made the following observations:
-
That new research projects tend to get much more attention
than already completed ones.
-
The continued processing of data from old projects through
secondary analysis is often neglected.
-
A lack of directories that describe what data sets exist,
where they are located and how users can access them, which
leads to unnecessary duplication of effort.
-
Lack of a viable network among scientists.
-
That the existence of data is unknown outside the original
scientific group or agencies that generated them and, even
if known, information is not provided for a potential user
to assess their relevance.
-
That scientists in Africa are fundamentally poorly paid.
The Nigerian CODATA
is making spirited efforts to respond adequately to the need
to preserve data by establishing a data program which represents
a strategy for the compilation and dissemination of scientific
data. This program will help the local scientific community
in Nigeria take advantage of the opportunities and expertise
offered by CODATA International, which has made a commitment
to assist scientific research institutes and the local scientific
community in the area of database development.
This initiative is also designed to complement the activities
of the task group on reliable scientific data sources in Africa.
The scientific activity for the year 2002 is a survey and cataloguing
of potential data sources, which will be web-based.
2. The 'Centre de Données
de la Physique des Plasmas' (CDPP, Plasma Physics Data Centre),
a new generation of Data Centre
M. Nonon-Latapie, C.C. Harvey, Centre National d'Etudes Spatiales,
France
The CDPP results
from a joint initiative of the CNRS (Centre National de la Recherche
Scientifique) and the CNES (Centre National d'Etudes Spatiales).
Its principal objectives are to ensure the long term preservation
of data relevant to the physics of naturally occurring plasmas,
to render this data easily accessible, and to encourage its
analysis. The data is produced by instruments, in space or on
the ground, which study the ionised regions of space near the
Earth and elsewhere in the solar system.
The principal users of this data centre are space scientists,
wherever they are located. The data centre is located in Toulouse
(France), and it uses a computer system which is accessible
via the Internet (http://cdpp.cesr.fr/english/index.html). This
system offers several services: first, the possibility to
search for and retrieve scientific data, and also access to
the "metadata" archived in association with this data,
such as the relevant documentation and quicklook data (graphical
representations). Several tools are available to help the user
to search for data. The CDPP has been accessible since October
1999. Since then its data holdings and the services offered have
been steadily augmented and developed.
After a brief presentation of the objectives, the organisation,
and the services currently offered by the CDPP, this paper will
concentrate on:
-
the system architecture (based on the ISO "Reference Model
for an Open Archival Information System")
-
the standards used to format and describe the archived data
-
the crucial operation of ingesting new data; this function
is based on the description of all delivered data entities
via a dictionary.
Operational experience
and new technical developments now being studied will also be
presented.
3. A Space Physics Archive Search Engine
(SPASE) for Data Finding, Comparison, and Retrieval
James R. Thieman, National Space Science Data Center, NASA/GSFC,
USA
Stephen Hughes and Daniel Crichton, NASA Jet Propulsion Laboratory,
USA
The diversity and volume of space physics data available electronically has
become so great that it is presently impossible to keep track
of what information exists from a particular time or region
of space. With current technology (especially the World Wide
Web - WWW) it is possible to provide an easy way to determine
the existence and location of data of interest via queries to
network services with a relatively simple user interface. An
international group of space physics data centers is developing
such an interface system, called the Space Physics Archive Search
Engine (SPASE). Space physicists have a wealth of network-based
research tools available to them, including mission- and facility-based
data archive and catalogue services (with great depth of information
for some projects). Many comprehensive lists of URLs have
been put together to provide a minimal search capability for
data. One recent effort to gather a list of data sources resulted
in an assembly of nearly 100 URLs and many important archives
had still been missed. These lists are difficult to maintain
and change constantly. However, even with these lists it is
not possible to ask a simple question such as "where can
I find observations in the polar cusp in 1993?" without
doing extensive, manual searches on separate data services.
The only hope for a comprehensive, automated search service
is to have data centers/archives make their own information
available to other data centers and to users in a manner that
will facilitate multiarchive searching. Nearly all space physics
data providers have WWW services that allow at least a basic
search capability, and many also provide more specialized interfaces
that support complex queries and/or complex data structures,
but each of these services is different. The SPASE effort is
creating a simple, XML-based common search capability and a
common data dictionary that would allow users to search all
participating archives with topics and time frames such as "polar
cusp" and "the year 1993". The result would be
a list of archives with relevant data. More advanced services
at later stages of the project would allow intercomparison of
search results to find, for example, overlapping data intervals.
Retrieval of the relevant data sets or parts of the data sets
would also be supported. The first stages of the project are
based on the application of Object Oriented Data Technology
(OODT - see http://oodt.jpl.nasa.gov/about.html) to the cross
archive search capability. The initial effort also includes
the derivation of a common data dictionary for facilitating
the searches. The current state of these efforts and plans for
the future will be reviewed.
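The idea of a common, XML-based search over several archives can be sketched as follows. This is a minimal illustration only: the element and attribute names (archive, dataset, region, start, end) and the sample catalogues are assumptions made for the example and are not taken from the SPASE data dictionary or from OODT.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue descriptions, one per participating archive.
# Element and attribute names are illustrative, not the SPASE dictionary.
CATALOGUES = [
    """<archive name="Archive A">
         <dataset id="a1" region="polar cusp" start="1992" end="1994"/>
         <dataset id="a2" region="magnetotail" start="1993" end="1993"/>
       </archive>""",
    """<archive name="Archive B">
         <dataset id="b1" region="polar cusp" start="1995" end="1996"/>
         <dataset id="b2" region="polar cusp" start="1990" end="1993"/>
       </archive>""",
]


def search(catalogues, region, year):
    """Return (archive name, dataset id) pairs matching region and year."""
    hits = []
    for text in catalogues:
        root = ET.fromstring(text)
        for ds in root.iter("dataset"):
            covers_year = int(ds.get("start")) <= year <= int(ds.get("end"))
            if ds.get("region") == region and covers_year:
                hits.append((root.get("name"), ds.get("id")))
    return hits
```

A call such as search(CATALOGUES, "polar cusp", 1993) answers the question quoted in the abstract with a list of archives and data sets. The shared vocabulary for the region attribute stands in for the role a common data dictionary would play.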
Track III-D-5:
Information Management Systems
Chair: Glen Newton, CISTI, National Research Council of
Canada, Ontario, Canada
The efficient
and effective collection and management of data, information
and knowledge are becoming more difficult, due to the
volume and complexity of this information. Greater
demands on system architectures, system design, networks
and protocols are the catalyst for innovative solutions
in management of information.
Applications and systems being researched and developed
capture dimensions of the various issues presented
to the community, and represent aspects of new paradigms
for future solutions.
Some of the areas to be examined include:
-
Systems for Coupling and Integrating Heterogeneous Data
Sources
-
Intelligent Agents, Multi-Agent Systems, Agent-Oriented
Programming
-
Interactive and Multimedia Web Applications
-
Internet and Collaborative Computing
-
Multimedia Database Applications
1.
XML-Based Factual Databases: A Case Study of Insect and Terrestrial
Arthropod Animals
Taehee Kim, Ph.D., School of Multimedia Engineering, Youngsan
University, South Korea
Kang-Hyuk Lee, Ph.D., Department of Multimedia Engineering,
Tongmyung University of Information Technology, South Korea
XML (eXtensible Markup Language) serves as the de facto standard
for document exchange in many data applications and information
technologies. Its application areas range from e-commerce to mobile
communication. An XML document describes not only the data structure,
but also the document semantics. Thus, a domain-specific
yet self-contained document can be built by using
XML. Building and servicing factual databases in the XML format
could provide such benefits as easy data exchange, economical
data abstraction, and a thin interface to other XML applications
such as e-commerce.
This paper reports an implementation of factual databases and
their service in terms of XML technologies. The database of
insect and terrestrial arthropod animals has been constructed
as an example. The insect database contains the intrinsic information
on characteristics of Korean insects while the terrestrial arthropod
animal database contains the related bibliographical information.
Data types and document structures were implemented. Document type
definitions (DTDs) were then implemented for data validation.
Microsoft SQL Server combined with Active Server Pages
was used as our implementation framework. A web database service
was also built.
Based on the implementation, this paper then discusses issues
related to XML-based factual databases. We emphasize that document
design ought to be carried out in order to achieve maximal compatibility
and scalability. We then point out that an information service
system could better be built by exploiting the self-describing
characteristics of XML documents.
2.
A comprehensive and efficient "OAIS compliant" data
center based on standardized XML technologies
Thierry Levoir and Marco Freschi, Centre National d'Etudes
Spatiales, France
The OAIS (Open Archival Information System) Reference Model
provides a framework for creating an archive (consisting of an
organization of people and systems that has accepted the responsibility
to preserve information and make it available for a Designated
Community). It also offers real help in designing ingest, data
management, administration and data access systems. XML stands
for eXtensible Markup Language. XML started as a way to mark
up content, but it soon became clear that XML also provided
a way to describe structured and semi-structured data, making
it usable as a data storage and interchange format. Many related
languages, formats, and technologies such as SOAP, XML Query, XML-RPC,
WSDL, Schema, ... continue to appear, promising solutions to almost
all problems!
With such technologies, we can define many different architectures.
Due to the vastness of the problem, it is quite difficult to
describe all the possible solutions, so the article
describes one possible architecture of a system in which the organization
of data and their usage are defined in accordance with the OAIS
reference model. The article takes its cue from the need to update
an existing data center, providing features such as platform
independence, a human-readable data format, and easy extensibility
to new types of data. All these advantages seem to be supplied
by XML and Java. XML and Java together can certainly be used
to create some very interesting applications, from application
servers to more searchable web sites. They also offer an easy
and efficient way to be interoperable.
However, it is sometimes very difficult to understand where
everything really fits. The article attempts to clarify the
role of each object inside a data center, providing
as a result the complete description of a system including its
architecture. A section of the article is also dedicated to
the problem of making data persistent in the database; the choice
of this support often involves an automatic choice of a query
language to retrieve data from the database and a strategy to
store them.
3. Informatics Based Design Of Materials
Krishna Rajan, Rensselaer Polytechnic Institute, USA
In this presentation
we demonstrate the use of a variety of data mining tools for
both classification and prediction of materials properties.
Specific applications of a variety of multivariate analysis
techniques are discussed. The use of such tools has to be coupled
to a fundamental understanding of the physics and chemistry
of the materials science issues. In this talk we demonstrate
the use of informatics strategies with examples including the
design of new semiconductor alloys and how we can extend the
concept of bandgap engineering to the development of "virtual"
materials. The combination of these approaches, when
integrated with the correct types of descriptors, makes informatics
a powerful computational methodology for
materials design.
4. XML-based Metadata Management for
INPA's Biological Data
J. L. Campos dos Santos, International Institute for
Geo-Information Science and Earth Observation - ITC, The Netherlands
and The National Institute for Amazon Research - INPA, Brazil
R. A. de By, International Institute for Geo-Information Science
and Earth Observation - ITC, The Netherlands
For more than a century, Amazonian biological data have been
collected, primarily by single researchers or small groups of researchers
in small areas over relatively short periods of time. Questions
on "how ecological patterns and processes vary in time
and space, and what are the causes and consequences of this
variability" remain open. For such questions
to be properly answered, far more documented data are required
than could feasibly be collected, managed, and analysed in a
single organisation. Since biological data sets are neither
perfect nor intuitive, they are shared mostly among those close to the
data producers, who know the subject. Such users need little additional
information to use and interpret the data sets. Research
teams outside of the specific subject area need highly detailed
documentation to accurately interpret and analyse historic or
long-term data sets, as well as data from complex experiments.
Usually, researchers refer to their data as raw data, which
are structured in rows and columns of numeric or encoded sampling
observations. The usefulness of such data can only be assessed
when they are associated with either a theoretical or conceptual
model. This requires understanding of the type of variable,
the units adopted, potential biases in the measurement, sampling
methodology and a series of facts that are not represented in
the raw data, but rather in the metadata. Data and metadata
combined within a conceptual framework produce the much-needed
information. Additionally, information can be lost through degradation
of the raw data or lack of metadata. The loss of metadata can
occur throughout the period of data collection and the rate
of loss can increase after the results of the research have
been published or the experiment ends. Specific details are
most likely to be lost due to the abandonment of data forms
and field notes. Metadata ensure that data users are able
to locate and understand data through time.
This paper presents an XML-based solution for the management
of biological metadata profiles via the Web. We have adopted
the FGDC Metadata Standard, which incorporates the Biological
Data Profile, and is represented as an XML schema. The schema
is mapped to a well-formed biological metadata template. The
XML metadata template can be deployed to users together with
an XML editor. The editor loads the XML file and allows users
to insert all the metadata information. After this process,
the biological metadata can be submitted for certification and
stored in an XML repository. The repository accepts a large
number of well-formed XML metadata files and maintains a single data
representation of all the files it receives. The metadata can
be retrieved, updated, or removed from the repository; once
it is indexed, search and query are available. This solution
is under test at the National Institute for Amazon Research (INPA)
within the Biological Collection Program.
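The repository workflow described above (accept only well-formed XML, then store, index, search, update and remove) might be sketched as below. This is a minimal in-memory illustration under stated assumptions: the class MetadataRepository, its method names, and the sample element names are hypothetical and do not describe the INPA system or the FGDC schema.

```python
import xml.etree.ElementTree as ET


class MetadataRepository:
    """Toy in-memory store for well-formed XML metadata records."""

    def __init__(self):
        self._records = {}   # record id -> parsed XML root element
        self._index = {}     # lower-cased word -> set of record ids

    def submit(self, record_id, xml_text):
        """Store or update a record; reject text that is not well formed."""
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError:
            return False
        self.remove(record_id)  # an update replaces the old index entries
        self._records[record_id] = root
        for elem in root.iter():
            for word in (elem.text or "").lower().split():
                self._index.setdefault(word, set()).add(record_id)
        return True

    def remove(self, record_id):
        """Delete a record and drop it from the word index."""
        if self._records.pop(record_id, None) is not None:
            for ids in self._index.values():
                ids.discard(record_id)

    def search(self, word):
        """Return the ids of records whose text contains the word."""
        return sorted(self._index.get(word.lower(), set()))
```

Well-formedness is checked here simply by parsing. Validation against the schema itself would need an additional library and is left out of this sketch.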
5. Incorporation of Meta-Data in
Content Management & e-Learning
Horst Bögel, Robert Spiske and Thurid Moenke, Department
of Chemistry of the Martin-Luther-University Halle-Wittenberg,
Germany
In nearly all scientific
disciplines experiments, observations or computer simulations
produce more and more data. Those data have to be stored, archived
and retrieved for inspection or later re-use. In this context
the inclusion of 'Meta-Data' becomes important to have access
to certain data and pieces of information.
Three recent
developments have to be taken into account:
-
Content Management Systems (CMS) are used for teamwork and 'timeline'
co-operation
-
use of eXtensible Markup Language (XML) to separate the
content from the layout for presentation (DTD, XSL, XPATH,
XSLT)
-
online learning using Web technology is developing into a powerful
multimedia-based education system
All those key processes
are based on huge amounts of data and the relations between them
(this can be called information).
If we want to keep pace with these needs, we have to develop
and use these new techniques for making progress.
We report on some tools (written in Java) for handling
data and on the development of a unique five-year WWW-based learning
project (BMBF - German Federal Ministry for Education and Research)
in chemistry, which incorporates data (3D-structures, spectra, properties)
and their visualizations in order to favour a research-oriented
way of learning. The students have access to computational methods
in the network, to carry out open-ended calculations using different
methods (e.g. semi-empirical and ab initio MO calculations to
generate the electronic structure and the orbitals of molecules).
Track IV-A-3:
Spatial Data Issues
Chair: Harlan Onsrud, University of Maine, USA
1.
Spatio-Temporal Database Support for Long-Range Scientific Data
Martin Breunig, Institute of Environmental Sciences, University
of Vechta, Germany
Serge Shumilov, Institute of Computer Science III, University
of Bonn, Germany
To date, database support for spatio-temporal applications
is not part of standard DBMSs. However, applications like
telematics, navigation systems, medicine, geology, and others
require database queries referring to the location of moving
objects. In real-time navigation systems, the position of large
sets of moving cars has to be determined within seconds. In
patient-based medical computer systems, to take another example,
the relevant progress of diseases has to be examined over
days, weeks or even years.
Finally, geological processes like the backward restoration
of basins involve time intervals of several thousand years
between every documented snapshot of the database.
We restrict ourselves to the requirements of long-range applications
like the simulation of geological processes. An example of
a spatio-temporal database service in geology is given. The
two components of this service provide version management and
the temporal integrity checking of geo-objects, respectively.
In the given example, the location change of a 3D moving object
O(t) between two time steps t_i and t_(i+1) may have three reasons:
the location change of the partial or complete geometry of O(t)
at time t_i caused by geometric mappings like translation, rotation
etc., the change of the scale of O(t) at time t_i caused by
zooming (change of the size of the object), or the change of
the shape of O(t) at time t_i caused by mappings of one or more
single points of its geometry.
These three reasons can also occur in combination with each
other. Furthermore, 3D moving objects may be decomposed into
several components and later merged into a single object
again. The relevant database operations needed to map the objects
into the database (compose and merge) give a mapping between
the IDs of all objects at two consecutive time steps.
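The three kinds of change and the ID mapping between consecutive time steps can be illustrated with a short sketch. It is written in Python for brevity, whereas the paper gives its specifications in C++. All function names here (translate, zoom, move_point, decompose, merge) are hypothetical, and the code does not reproduce the GeoToolKit interface.

```python
def translate(points, dx, dy, dz):
    """Location change: shift every point of the geometry."""
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def zoom(points, factor):
    """Scale change: resize the geometry about its centroid."""
    n = len(points)
    cx, cy, cz = (sum(p[i] for p in points) / n for i in range(3))
    return [(cx + factor * (x - cx),
             cy + factor * (y - cy),
             cz + factor * (z - cz)) for x, y, z in points]


def move_point(points, index, new_point):
    """Shape change: remap a single point of the geometry."""
    changed = list(points)
    changed[index] = new_point
    return changed


def decompose(id_map, obj_id, part_ids):
    """Record that obj_id at one time step becomes several parts at the next."""
    id_map[obj_id] = list(part_ids)


def merge(id_map, part_ids, merged_id):
    """Record that several objects at one time step become merged_id at the next."""
    for pid in part_ids:
        id_map[pid] = [merged_id]
```

The dictionary id_map plays the role of the mapping between object IDs at two consecutive time steps: each key is an ID at the earlier step and each value lists the IDs it corresponds to at the later step.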
We show that the simulation of long-range processes can be effectively
supported by set-oriented spatio-temporal database operations.
Among other aspects, this leads to a better understanding of
the history of geological rock formations. The specifications
of the operations are given in C++ program code. We describe
how the proposed spatio-temporal operations are integrated into
GeoToolKit, an object-oriented database kernel system developed
for the support of 3D/4D applications. In our future work we
intend to evaluate the presented spatio-temporal database operations
in benchmarks with large data sets within the open GeoToolKit
system architecture as part of a public database service for
spatio-temporal applications.
2.
Web Visualization for Spatio-Temporal Referenced Multimedia
Data
Paule-Annick Davoine and Hervé Martin, Laboratoire LSR-IMAG,
Equipe SIGMA, France
Web and multimedia technologies enhance possibilities to develop
software for managing, displaying and distributing spatially-referenced
information. Many works deal with Internet-based cartographic
visualization. More and more geographic applications have to
integrate both a temporal and a multimedia dimension. This kind
of information is more complex than the usual spatial information
linked with statistical data such as economic, demographic
or ecological data. The main problem for Geographical Information
Systems (GIS) is to merge qualitative and multimedia information
with information related to time and space. Moreover, spatial
and temporal references may be heterogeneous and discontinuous.
At present, GIS are not suited to the use and visualization
of this kind of information. In this paper, we show how multimedia
web information systems may be used to model and to navigate
across spatio-temporal referenced multimedia data. This work
has been carried out in the framework of a European project, named
SPHERE, on historical natural hazards.
First, we explain how the Unified Modelling Language (UML)
makes it possible to take into account various user requirements.
We focus on two important features of GIS: how to specify system
functionalities and how to capture spatio-temporal referenced
multimedia data. To illustrate the former point, we present
the main functionalities of the web visualisation interface
that we have implemented during the SPHERE project. We also
explain the technological choices used to develop this tool.
We have developed a Java tool based on a client-server architecture
including an Apache Web server and a client browser for running
Java applets. This software allows users to navigate across the information
space according to spatial and temporal features and to visualize
simultaneously and interactively the cartographic, temporal and
documentary aspects of information.
Keywords: Web and Database, Geographical Information System
(GIS), information system, multimedia, spatio-temporal referenced
information, UML modelling, visualisation interface.
Track IV-A-5:
Information Infrastructure for Science and Technology
Horst Kremers, Eng., Comp. Sci., Berlin, Germany
The manageability
of complex information systems for multidisciplinary
cooperation and its use in decision support depends
on basic methods and techniques that cover application
layers, such as:
-
the role of basic sets of information in global and
national information infrastructures
-
access, compatibility, and interoperability
-
documentation of information models
-
validation procedures and quality control
-
financial and legal aspects (including copyright)
-
enabling cooperation on information
This session
offers opportunities to present best practices in
information infrastructure, as well as discussing
the methodological backgrounds and potential ways
to support the creation of national and global interdisciplinary
information infrastructures. The session, of course,
has cross-links to other sessions in the CODATA conference.
Topics here are to be discussed in their strategic
importance with respect to enabling freedom of information,
as well as enabling reliable communication and cooperation
in the information society.
1.
Exchange Of Heterogeneous Information Concepts And Systems
Hélène Bestougeff, CODATA - France
Jacques-Emile Dubois, ITODYS, Université de Paris VII
- France and Past-President, CODATA
Today, especially with the development of networking and the
internet, the exchange of heterogeneous information leading
towards better interdisciplinary co-operation is a vital issue.
In this framework, several technical and organizational problems
must be solved. Integration deals with developing architectures
and frameworks as well as techniques for integrating schemas
and data. Different approaches to interrelate the source systems
and user queries are possible, depending on the degree of coupling
between the original data sources and the resulting system.
However, integration is just a first essential step towards
more sophisticated architectures developed to support
management decisions. These architectures, grouped under
the term Data Warehouses, are subject oriented and involve
the integration of current and historical data.
The third aspect of heterogeneous information exchange is knowledge
management, information mining systems, and web information
management. The web is now part of almost all organizations.
Therefore, the databases and the warehouses of specific networks
endowed with adequate metadata have to operate on the web. Moreover,
the management of unstructured and multimedia data such as text,
images, audio and video presents new original challenges.
This paper will present, in a systematic way, concepts and systems
dealing with these problems, drawing on particular results
and examples from the book just published by Kluwer, "Heterogeneous
Information Exchange and Organizational Hubs" (H. Bestougeff,
J.E. Dubois, B. Thuraisingham, Editors), containing 15 original
chapters covering:
I Heterogeneous Database Integration: Concepts and Strategies
II Data warehousing: Models and Architectures
III Sharing Information and Knowledge.
2.
Information Infrastructure: The Dynamic Mosaic
Horst Kremers, Eng., Comp. Sci., Berlin, Germany
The various aspects of Information Infrastructure presented
and discussed in this session and in contributions throughout
this conference give an overview of the specific role of CODATA
in shaping and developing this field within its specific competence
and interest. The mosaic that this conference shows can be completed
as well as clearly distinguished from other activities
in Information Infrastructure at national and at international
level. In addition, the various contributions have shown that
this field is under dynamic development. This allows the discussion
of a potential CODATA strategic position and of actions that
would promote this development in an appropriate way, given
its growing relevance.
3. Cooperative Canadian/US Project:
An Experiment in Sharing Geospatial Data Cross-Border
Milo Robinson, US Federal Geographic Data Committee, USA
Marc LeMaire, Mapping Services Branch, USA
The US Federal Geographic Data Committee (FGDC) and its Canadian
counterpart, GeoConnections, have developed several cooperative
projects to develop a common spatial data infrastructure. To
better understand the challenges and complexities of transboundary
spatial data issues, GeoConnections and the Federal Geographic
Data Committee jointly funded two collaborative demonstration
projects covering a common geographic project that crosses the
border and addressing a common issue, that of sharing data with
our neighbors. These collaborative projects cover the Red River
Basin (Roseau River and Pembina River Basin) and the Yukon to
Yellowstone (Crown of the Continent Study Area). The results
of these international spatial data demonstration projects,
as well as new joint activities, will be discussed.
4.
Current Trends in the Global Spatial Data Infrastructure: Evolution
from National to Global Focus
Alan R. Stevens, Global Spatial Data Infrastructure (GSDI) Secretariat,
USA
In the late 1980s many organizations from state, local and
tribal governments, the academic community, and the private
sector within the United States came together to encourage common
practices and uniform standards in digital geographic (map)
data collection, processing, archiving, and sharing. The National
Spatial Data Infrastructure (NSDI) encompasses policies, standards,
and procedures for organizations to cooperatively produce and
share georeferenced information and data. The major emphasis
now has turned toward Geospatial One-stop initiatives, Implementation
Teams (I-Teams), and Homeland Security for better governance,
but all still aimed at facilitating the building of the NSDI.
In the mid-1990s other nations began to recognize that tremendous
efficiencies and cost savings could be realized by reducing
duplicative data collection, processing, archiving and distribution
not only within their own borders but across international boundaries
as well. A small group, at first, spawned what is now known
as the Global Spatial Data Infrastructure (GSDI). This group
now has grown to over 40 nations and consists of government
agencies, NGOs, academic institutions, other global initiatives,
and a significant contingent of the private sector in the geospatial
industry. Industry is excited to be involved because it realizes
that common standards will increase demand for data from domestic
customers and will expand awareness within emerging nations,
further increasing the client base. The GSDI has incorporated
as a non-profit organization so it can partner with others in
securing funds to encourage and accelerate the development of
National and Regional Spatial Data Infrastructures in fledgling
organizations and countries.
Track IV-B-3:
Data Portals
|
1.
Information Society Technologies Promotion for New Independent
States
A.D. Gvishiani, Director of the Center of Geophysical Data Studies
and Telematics Applications IPE RAS, Russia
J. Babot, Head of the E Work Sector in the European Commission,
Belgium
J. Bonnin, Institut de Physique du Globe de Strasbourg, France
The recently proposed cluster project CLUSTER-PRO will unite and
coordinate the activities of the five Information Society Technology
(IST) program projects now being run by the European Commission
in Baltic (TELEBALT), Eastern European (E3WORK and TEAMwork),
and CIS (WISTCIS and TELESOL) countries in order to promote
new information technologies for scientific and technological
data handling in these countries. The French Committee on Data for
Science and Technology (CODATA FRANCE) serves as the coordinator
of the proposed CLUSTER-PRO project.
In the presentation, all five projects being clustered
will be described. The goals of these projects are to promote
modern teleworking tools for scientific, technological and business
data handling and exchange between EU member states and EU pre-accession
and third countries. The main goal of the cluster is to create a
common structure of concentration among the existing projects,
with a common portal, cross-exchange and adaptation of results,
and a common action plan for dissemination. One of the objectives
is the development of cross-project Web sites that will
focus on new opportunities for teleworking in scientific and
technological data acquisition and exchange. Education, research,
business, telemedicine, the promotion of new employment opportunities,
and environmental protection in all participating countries
are among the cluster project objectives. Another objective
is cross-project training, using the courses and
e-learning systems developed by the five clustered projects,
which will be adopted for the whole range of countries. Cross-project
training actions will be implemented in face-to-face mode at
CLUSTER-PRO and clustered-project gatherings, and virtually
through the Web sites. An information portal, "New opportunities
for EU-CLUSTER-PRO countries teleworking", will be developed.
2.
Data, Information, and Knowledge Management of a Solar UV Data
Network for Coating Materials
Lawrence J. Kaetzel, K-Systems, Prescott, Arizona, USA and
Jonathan W. Martin, National Institute of Standards and Technology,
Gaithersburg, MD, USA
A major factor in understanding the performance of coating
materials and products is the use of systematic methods for acquiring,
recording, and interpreting laboratory and field test results and
environmental conditions. The verified data from these sources can
then be used with computer-based models to predict materials
performance more accurately. A solar ultraviolet data network has
been created by the National Institute of Standards and Technology,
Gaithersburg, Maryland. The network measures, collects and archives
weather measurements for use in predicting the performance of
automotive and architectural coatings. Operation of the network
is performed as a collaborative effort among several U.S. Government
agencies and private industry organizations. The network currently
consists of 8 field locations operating in the United States
that are equipped with solar spectroradiometers and weather
stations. Data from the network is evaluated and stored electronically,
then used in scientific analysis as applied to materials performance
and biological studies.
This paper presents the efforts to ensure data integrity; the
methodologies used to represent the data, information, and knowledge
in a consistent manner; and the computer-based methods (e.g.,
computer-based models, decision-support or smart modules) developed
to assist the knowledge consumer in determining relevance and in
interpreting the measurements. The paper will first discuss the
application of the data to coating performance and its use with
computer-based modules, followed by a discussion of knowledge
management methods.
|