CODATA 2002: Frontiers of
Scientific and Technical Data

Montréal, Canada — 29 September - 3 October
 

Data Policy Abstracts



Track I-D-4:
The Public Domain in Scientific and Technical Data: A Review of Recent Initiatives and Emerging Issues

Chair: Paul F. Uhlir, The National Academies, USA

The body of scientific and technical data and other information in the public domain is massive and has contributed broadly to the scientific, economic, social, cultural, and intellectual vibrancy of the entire world. The "public domain" may be defined in legal terms as sources and types of data and information whose uses are not restricted by statutory intellectual property regimes and that are accordingly available to the public without authorization. In recent years, however, there have been growing legal, economic, and technological pressures on public-domain information, scientific and otherwise, forcing a reevaluation of the role and value of the public domain. Despite these pressures, some well-established mechanisms for preserving the public domain in scientific data exist in the government, university, and not-for-profit sectors. In addition, innovative models for promoting various public-domain digital information resources are now being developed by different groups in the scientific, library, and legal communities. This session will review some of the recent initiatives for preserving and promoting the public domain in scientific data within CODATA and ICSU, the US National Academies, OECD, UNESCO, and other organizations, and will highlight some of the most important emerging issues in this context.

 

1. International Access to Data and Information
Ferris Webster, University of Delaware, USA

Access to data and information for research and education is the principal concern of the ICSU/CODATA ad hoc Group on Data and Information. The Group tracks developments by intergovernmental organizations with influence over data property rights. Where possible, the Group works to ensure that the policies of these organizations recognize the public good derived from access to data and information for research and education.

A number of international organizations have merited attention recently. New proprietary data rights threaten to close off access to data and information that could be vital for progress in research. The European Community has been carrying out a review of its Database Directive. The World Meteorological Organization's resolution on international exchange of meteorological data has been the subject of continuing debate. The Intergovernmental Oceanographic Commission is drafting a new data policy that may have constraints that are parallel to those of the WMO. The World Intellectual Property Organization has had a potential treaty on databases simmering for several years.

The latest developments in these organizations will be reviewed, along with the activities of the ICSU/CODATA Group.

 

2. The OECD Follow up Group on Issues of Access to Publicly Funded Research Data: A Summary of the Interim Report
Peter Arzberger, University of California at San Diego, USA

This talk will present a summary of the interim report of the OECD Follow up Group on Issues of Access to Publicly Funded Research Data, with particular attention to the global sharing of research data. The Group's efforts have their origins in the 3rd Global Research Village conference in Amsterdam, December 2000. The Group has conducted case studies of practices across different communities and has examined the sociological, economic, technological, and legal factors that either enhance or inhibit data sharing. The presentation will also address issues such as data ownership and rights of disposal, multiple uses of data, the use of ICT for widening the scale and scope of data sharing, effects of data sharing on the research process, and coordination in data management. The ultimate goal of the Group is to articulate principles, based on best practices, that can be translated into the science policy arena. Some initial principles will be discussed. Questions such as the following will be addressed:

  • What principles should govern science policy in this area?
  • What is the perspective of social informatics in this field?
  • What role does the scientific community play in this?

It is intended that this presentation will generate discussion and feedback on key points of the Group's interim report.

 

3. An Overview of Draft UNESCO Policy Guidelines for the Development and Promotion of Public-Domain Information
John B. Rose, UNESCO, Paris, FRANCE
Paul F. Uhlir, The National Academies, Washington, DC, USA

A significantly underappreciated, but essential, element of the information revolution and emerging knowledge society is the vast amount of information in the public domain. Whereas the focus of most policy analyses and law making is almost exclusively on the enhanced protection of private, proprietary information, the role of public-domain information, especially of information produced by the public sector, is seldom addressed and generally poorly understood.

The purpose of UNESCO's Policy Guidelines for the Development and Promotion of Public-Domain Information, therefore, is to help develop and promote information in the public domain at the national level, with particular attention to information in digital form. These Policy Guidelines are intended to better define public-domain information and to describe its role and importance, specifically in the context of developing countries; to suggest principles that can help guide the development of policy, infrastructure and services for provision of government information to the public; to assist in fostering the production, archiving and dissemination of an electronic public domain of information for development, with emphasis on ensuring multicultural, multilingual content; and to help promote access of all citizens, especially including disadvantaged communities, to information required for individual and social development. This presentation will review the main elements of the draft Policy Guidelines, with particular focus on scientific data and information in the public domain.

Complementary to, but distinct from, the public domain is the wider range of information and data that could be made available by rights holders under specific "open access" conditions, as in the case of open source software, together with the free availability of protected information for certain specific purposes, such as education and science, under limitations and exceptions to copyright (e.g., "fair use" in U.S. law). UNESCO is working to promote international consensus on the role of these facilities in the digital age, notably through a recommendation under development on the "Promotion and Use of Multilingualism and Universal Access to Cyberspace," which is intended to be presented to the World Summit on the Information Society to be organized in Geneva (2003) and Tunis (2005), as well as a number of other relevant programme actions which will also be presented at the Summit.

 

4. Emerging Models for Maintaining the Public Commons in Scientific Data
Harlan Onsrud, University of Maine, USA

Scientists need full and open disclosure and the ability to critique in detail the methods, data, and results of their peers. Yet scientific publications and data sets are increasingly burdened by access restrictions, imposed by legislative acts and case law, that are detrimental to the advancement of science. As a result, scientists and legal scholars are exploring combined technological and legal workarounds that will allow scientists to continue to adhere to the mores of science without being declared lawbreakers. This presentation reviews three separate models that might be used for preserving and expanding the public domain in scientific data. It explores the technological and legal underpinnings of Research Index, the Creative Commons Project, and the Public Commons for Geographic Data Project. The first project relies heavily on protections granted to web crawlers under the U.S. Digital Millennium Copyright Act, while the latter two rely on legal approaches utilizing open access licenses.


5. Progress, Challenges, and Opportunities for Public-Domain S&T Data Policy Reform in China
Liu Chuang, Chinese Academy of Sciences, Beijing, China

China has passed through four different stages of public-domain S&T data management and policy during the last quarter century. Before 1980, most government-funded S&T data were freely accessible, and the services earned a good reputation in the scientific community. Most of these data were recorded on paper media, however, and took time to access.

With the computer developments of the early 1980s, digital data and databases increased rapidly. Data producers and holders began to realize that digital data could be an important resource for scientific activities. The policy of charging fees for data access gained prominence between the early 1980s and approximately 1993. During this period, China experienced new problems in S&T data management. For example, there was an increase in parallel work on database development and in data controlled by individual persons, with a high risk of losing the data, and the price of access to data became very high in most cases.

In the 1994-2000 period, members of the scientific community asked for data policy reform, and for lower costs of access to government funded databases for non-profit applications. The Ministry of Science and Technology (MOST) set up a group to investigate China's S&T data sharing policies and practices.

A new program for S&T data sharing was initiated by MOST in 2001. This was a major milestone for enhancing access to and the application of public-domain S&T data. This new program, along with the current development of a new data access policy and support system, is expected to be greatly expanded during the next decade.


Track IV-A-4:
Confidentiality Preservation Techniques in the Behavioral, Medical and Social Sciences

D. Johnson, Building Engineering and Science Talent, San Diego, CA, USA
John L. Horn, Department of Psychology, University of Southern California, USA
Julie Kaneshiro, National Institutes of Health, USA
Kurt Pawlik, Psychologisches Institut I, Universität Hamburg, Germany
Michel Sabourin, Université de Montréal, Canada

In the behavioral and social sciences and in medicine, the movement to place data in electronic databases is hampered by considerations of confidentiality. The data collected on individuals by scientists in these areas of research are often highly personal. In fact, it is often necessary to guarantee potential research participants that the data collected on them will be held in strictest confidence and that their privacy will be protected. There has even been debate in these sciences about whether data collected under a formal confidentiality agreement can be placed in a database, because such use might constitute a use of the data to which the research participants did not consent.

The members of this panel will discuss a broad range of techniques that are being used across the behavioral and social sciences and medicine to protect the confidentiality of individuals whose data are entered into an electronically accessible database. Among the highly controversial data to which these techniques are being applied are data on accident avoidance by pilots of commercial aircraft and data on medical errors. The stakes in finding ways to use these data without violating confidentiality are high, since the payoff from learning how to reduce airplane accidents and medical mistakes is saved lives.

Standard techniques for separating identifier information from data, as well as less common techniques such as the introduction of systematic error in data, will be discussed. Despite the methods that are in place and those that are being experimented with, there is evidence that even sophisticated protection techniques may not be enough. The group will conclude its session with a discussion of this challenge.
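The two families of techniques named above can be illustrated with a minimal Python sketch. It is not drawn from any panelist's work: the record fields, the `study_key` link field, and both function names are hypothetical, and the perturbation shown is plain zero-mean random noise, which may differ from the "systematic error" the panel has in mind.

```python
import random
import secrets

def split_identifiers(records, id_fields):
    """Split records into an identifier table and a data table.

    The two tables are linked only by a random study key (hypothetical
    field name), so the data table can be shared without names or IDs.
    """
    id_table, data_table = {}, []
    for rec in records:
        key = secrets.token_hex(8)  # random key, carries no personal information
        id_table[key] = {f: rec[f] for f in id_fields}
        row = {f: v for f, v in rec.items() if f not in id_fields}
        row["study_key"] = key
        data_table.append(row)
    return id_table, data_table

def add_noise(rows, field, sd, rng):
    """Return copies of rows with zero-mean Gaussian noise added to one field."""
    return [{**r, field: r[field] + rng.gauss(0.0, sd)} for r in rows]

# Invented example records, for illustration only.
records = [
    {"name": "A. Example", "ssn": "000-00-0001", "age": 34, "score": 71.0},
    {"name": "B. Example", "ssn": "000-00-0002", "age": 58, "score": 64.5},
]
id_table, data_table = split_identifiers(records, id_fields=("name", "ssn"))
released = add_noise(data_table, "score", sd=2.0, rng=random.Random(0))
```

In this arrangement the identifier table would be held under separate, tighter control, and only `released` would go into the shared database. As the paragraph above cautions, removing direct identifiers and perturbing values may still not prevent re-identification from the remaining fields.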

 

1. Issues in Accessing and Sharing Confidential Survey and Social Science Data
Virginia A. de Wolf, USA

Researchers collect data under pledges of confidentiality. The US federal statistical system has established practices and procedures that enable others to access the data it collects. The two main methods the federal statistical agencies use are to restrict the content of the data (termed "restricted data") and to restrict the conditions under which the data can be accessed, i.e., at what locations and for what purposes (termed "restricted access").
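As a rough illustration of what restricting the content of the data can involve, the Python sketch below applies two widely used content restrictions, top-coding a numeric value and coarsening an exact age into a band, and drops the direct identifier. The field names, the income cap, and the band width are assumptions chosen for the example; they are not taken from any agency's actual release rules.

```python
def top_code(value, cap):
    """Replace any value above `cap` with the cap itself."""
    return min(value, cap)

def age_band(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def restrict_record(rec, income_cap):
    """Build a public-use version of one record (hypothetical fields).

    Only the listed fields are copied, so direct identifiers such as
    `name` never reach the released file.
    """
    return {
        "age_band": age_band(rec["age"]),
        "income": top_code(rec["income"], income_cap),
        "region": rec["region"],
    }

# Invented example microdata, for illustration only.
microdata = [
    {"name": "A. Example", "age": 34, "income": 52_000, "region": "West"},
    {"name": "B. Example", "age": 61, "income": 910_000, "region": "West"},
]
public_use = [restrict_record(r, income_cap=150_000) for r in microdata]
```

Restricted access, by contrast, leaves the content intact and controls who may use it, where, and for what purpose, so it is a matter of procedure rather than data transformation.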

Data sharing practices vary across the social science disciplines. For example, the codes of ethics of some social science disciplines encourage sharing (e.g., the American Association for Public Opinion Research) while others do not. In the US, both of the institutions that fund the bulk of social science research, the National Institutes of Health and the National Science Foundation, have statements on data sharing.

This presentation will review the practices, procedures, issues, etc., of US federal statistical agencies in allowing access to the data they collect. It will highlight the activities of US federal interagency committees and will conclude with a discussion of the applicability of the experience of the US federal statistical system to the academic social science community.



2. Contemporary Statistical Techniques for Closing the "Confidentiality Gap" in Behavioral Science Research
John L. Horn, Department of Psychology, University of Southern California, USA

Over the past three decades, behavioral scientists have become acutely aware of the need for both the privacy of research participants and the confidentiality of research data. During this same time period, knowledgeable researchers have created a variety of methods and procedures to ensure confidentiality. But many of the best techniques were not designed to permit the sharing of research data with researchers outside the initial data collection group. Since a great deal of behavioral science data collected at the individual level require such protections, they cannot easily be shared with others in a confidential way. These practical problems have created a great deal of confusion and a kind of "confidentiality gap" among researchers and participants alike. This presentation will review some available "statistical" approaches to these problems, with examples drawn from research projects on human cognitive abilities. These statistical techniques range from the classical use of replacement or shuffled records to more contemporary techniques based on multiple imputation. In addition, new indices will be used to relate the potential loss of data accuracy to the loss of confidentiality. These indices will help researchers define the confidentiality gap in their own and any other research project.
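The trade-off described above can be made concrete with a small Python sketch of the simplest technique mentioned, shuffled records, rather than multiple imputation. The two measures computed here are stand-ins invented for illustration and are not the indices presented in the talk: accuracy loss is taken as the change in a correlation, and exposure as the share of records whose sensitive value is still attached to its original record. The simulated data and all names are assumptions.

```python
import random

def swap_values(rows, field, fraction, rng):
    """Permute `field` among a randomly chosen fraction of rows."""
    out = [dict(r) for r in rows]
    n_swap = int(round(fraction * len(out)))
    chosen = rng.sample(range(len(out)), n_swap)
    values = [out[i][field] for i in chosen]
    rng.shuffle(values)
    for i, v in zip(chosen, values):
        out[i][field] = v
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def unchanged_share(original, masked, field):
    """Share of records whose value for `field` was left in place."""
    same = sum(o[field] == m[field] for o, m in zip(original, masked))
    return same / len(original)

# Simulated data: a test score that declines with age, plus noise.
rng = random.Random(42)
original = []
for _ in range(200):
    age = rng.uniform(20, 80)
    original.append({"age": age, "score": 100 - 0.5 * age + rng.gauss(0, 5)})

masked = swap_values(original, "score", fraction=0.5, rng=rng)

r_original = pearson([r["age"] for r in original], [r["score"] for r in original])
r_masked = pearson([r["age"] for r in masked], [r["score"] for r in masked])
accuracy_loss = abs(r_original - r_masked)            # illustrative accuracy index
exposure = unchanged_share(original, masked, "score")  # illustrative exposure index
print(f"accuracy loss: {accuracy_loss:.2f}, exposure: {exposure:.2f}")
```

Swapping leaves the overall distribution of scores untouched but weakens their relationship with age. Raising `fraction` lowers exposure at the cost of a larger accuracy loss, which is the kind of trade-off such indices are meant to quantify.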

References

  1. Fienberg, S.E. & Willenborg, L. C.R.J. (1998). Special issue on "Disclosure limitation methods for protecting confidentiality of statistical data." Journal of Official Statistics, 14 (4), 337-566.
  2. Willenborg, L. C.R.J. & de Waal, T. (2001). Elements of statistical disclosure control. Lecture Notes in Statistics, 155. New York: Springer-Verlag.
  3. Clubb, J.M., Austin, E.W., Geda, C.L. & Traugott, M.W. (1992). Sharing research data in the social sciences. In G. H. Elder, Jr., E. K. Pavalko & E. C. Clipp. Working with Archival Data: Studying Lives (pp. 39-75). SAGE Publications.
  4. Willenborg, L. C.R.J. & de Waal, T. (1996). Statistical disclosure control in practice. Lecture Notes in Statistics, 111. New York: Springer-Verlag.



NASA Aviation Safety Reporting System (ASRS)
Linda J. Connell, NASA Ames Research Center, USA

In 1974, the United States experienced a tragic aviation accident involving a B-727 on approach to Dulles Airport in Virginia. All passengers and crew were killed. The accident was classified as a Controlled Flight Into Terrain event. During the NTSB accident investigation, it was discovered from ATC and cockpit voice recorder tapes that the crew had become confused over the approach instructions, both the information provided in approach charts and the ATC instruction "cleared for the approach". It was also discovered that a crew from another airline had experienced a similar chain of events but had detected the error and increased altitude, which allowed them to clear the oncoming mountain. That second event would be classified as an incident. The benefit of the information spread rapidly within that airline but had not reached other airlines. As a result of the NTSB findings, the FAA and NASA created the Aviation Safety Reporting System in 1976. The presentation will describe the background and principles that guide the operation of the ASRS. It will also describe the uses of, and products from, approximately 490,000 incident reports.

 

 

Last site update: 15 March 2003