CODATA 2002 Program

Behavioral and Social Science Data Abstracts

Track I-C-4:
Government as a Driver in Database Development in the Behavioral Sciences

Chair: David Johnson, Building Engineering and Science Talent, USA

The behavioral sciences have not had a tradition of data sharing, and so have been somewhat behind other sciences in the development of databases. Officials in several science agencies of the US federal government have been concerned about this lack of data sharing and have taken measures to stimulate development. The purpose of this panel is to explore the ways that government agencies can arrange funding opportunities to stimulate innovation in areas that scientists within given fields have been reluctant to address. The work of three US agencies will be highlighted: the National Science Foundation, the National Institutes of Health, and the Federal Aviation Administration.

Government and science often exercise reciprocal influences on each other. The three examples that will be explored in this panel session represent three discrete models by which governments may stimulate a science to produce knowledge in a way that it would not have in the absence of the government's effort.

1. Sharing data collection and sharing collected data: The NICHD Study of Early Child Care and Youth Development
Sarah L. Friedman, The NICHD Study of Early Child Care and Youth Development, USA

The NICHD Study of Early Child Care and Youth Development came to life as a result of a 1988 NICHD solicitation (RFA) and is scheduled to terminate at the end of 2009. The aim of the solicitation was to bring together investigators from different universities or research institutions to collaborate with NICHD staff on the planning and execution of one longitudinal study with data to be collected across sites. The idea for such a collaborative study was unprecedented in the scientific field of developmental psychology.

Ten data collection sites were selected on a competitive basis, and the affiliated investigators, in collaboration with NICHD staff, have designed and implemented the different phases of the solicited longitudinal study. While the data collected at each site belong to that site, NICHD required that each of the 10 sites send its data to a central location, the Data Acquisition and Analysis Center, for data editing, data reduction and data analyses. The study investigators, in collaboration with the data center staff, guide the data acquisition and analyses. Upon completion of an agreed-upon quota of network-authored scientific papers for a given phase of the study, individual study investigators get access to the data sets of the entire sample. A few months after the data sets and supporting documentation become available to individual study investigators for their exclusive use, the same data sets are made available to interested and qualified others in the scientific community.

While the archiving of the data is done by an NICHD grantee, the Murray Center at Radcliffe College has expressed interest in archiving the data and supporting their use by interested and qualified investigators. If the grantee institutions accept the Murray Center's request, the data collected by the grantees will be available to the scientific community beyond the life of the grant.

2. Data Sharing at NIH and NIA
Miriam F. Kelty, National Institute on Aging, Office of Extramural Activities, USA

NIH published its policy mandating sharing of unique biological resources in 1986. Sixteen years later, NIH published a draft policy on sharing research data. The draft states that NIH expects the timely release and sharing of final research data for use by other researchers. Further, NIH will require extramural and intramural investigators to include a data sharing plan in their research proposals or to explain why sharing data is not possible. The policy is available for comment until June 1. The presentation will provide background information and summarize public comments.

NIA staff have been leading advocates for data sharing and have encouraged it among grantees, particularly when research involves large data sets that are valuable research resources and impractical to replicate. NIA will provide funds to make data that are well documented and user-friendly available to other researchers. Some examples of NIA supported activities in support of data sharing are described below:

The National Archive of Computerized Data on Aging (NACDA), located within the Inter-university Consortium for Political and Social Research (ICPSR), is funded by the National Institute on Aging. NACDA's mission is to advance research on aging by helping researchers to profit from the under-exploited potential of a broad range of datasets. NACDA acquires and preserves data relevant to gerontological research, processes them as needed to promote effective research use, disseminates them to researchers, and facilitates their use. By preserving and making available the largest library of electronic data on aging in the United States, NACDA offers opportunities for secondary analysis on major issues of scientific and policy relevance.

NACDA supports a data analysis system that allows the user to access subsets of variables or cases. The system can be used with a variety of data sets, including the Longitudinal Survey on Aging, National Survey of Self-Care and Aging, National Health and Nutrition Survey, National Hospital Discharge Survey, and the National Health Interview Survey.

NIA supports a range of studies that have agreed to make data available to researchers. An example is the Health and Retirement Study, a nationally representative study that collects data on aging and retirement. The study is based at the University of Michigan, and the Michigan Center on Demography of Aging makes data available to a range of researchers. Some data sets are available to anyone for analysis, while others are restricted and require contractual agreements before they are made available for use.

The presentation will address NIA's experience with the use of available data sets and raise some issues surrounding data sharing.

3. Data Archiving for Animal Cognition Research: The NIMH Experience
Howard S. Kurtzman, Cognitive Science Program, National Institute of Mental Health, USA

In July 2001, the National Institute of Mental Health (a component of the U.S. National Institutes of Health) sponsored a workshop on "Data Archiving for Animal Cognition Research." Participants included leading scientists as well as experts in archiving, publishing, policy, and law. Due to the focus on non-human research, participants were able to devote primary attention to important issues aside from protection of confidentiality, which has dominated most previous discussions of behavioral science archiving. The further limitation of the workshop's scope to animal cognition research allowed archiving to be examined realistically in the context of one particular scientific community's goals, methods, organization, and traditions.

The workshop produced a set of conclusions, detailed in a formal report, concerning: (1) the likely impacts of archiving on research and education, (2) guidelines for incorporating archiving into research practice, (3) contents of archives, (4) technical standards, and (5) organizational and policy issues. The presentation will review these conclusions and describe activities following up on the workshop. Also discussed will be the applicability of the workshop's conclusions to other areas of behavioral science and how this workshop's approach to stimulating archive development might serve as a model for other fields.

4. Data Sharing and the Social and Behavioral Sciences at the National Science Foundation
Philip Rubin, Division of Behavioral and Cognitive Sciences, National Science Foundation, USA

At the heart of the National Science Foundation's (NSF) strategic plan are people, ideas, and tools. In the last of these areas, tools, our goal is to provide broadly accessible, state-of-the-art information-bases and shared research and education tools. We actively encourage data sharing across all of our fields of study. This presentation will provide examples from the social and behavioral sciences. As data sharing is encouraged and increased, however, there are growing concerns and issues related to privacy and confidentiality. These issues will also be discussed, as will future directions in information sharing.

At the NSF, the Directorate for Social, Behavioral, and Economic Sciences (SBE) participates in special initiatives and competitions on a number of topics, including infrastructure to improve data resources, data archives, collaboratories, and centers.

The breadth of fields is wide in our Directorate, ranging from Anthropology through Political Science and Economics. However, common to many of the disciplinary areas that we support is a rapid change in how the science is being done. What is emerging is a large-scale social science, driven by computational progress, the need for scientific expertise across a number of domains, growing bodies of data and other information, and theoretical and practical issues whose understanding requires a broader view than has been taken in the past.

This change will be illustrated by some examples of recent or continuing projects that we are supporting. For example, physical anthropologists utilize tools from a wide range of overlapping disciplines ranging from molecular biology (population genetics) to field ecology to remote sensing (paleoanthropology). In all of these areas large amounts of data are generated that are conducive to the establishment of digital libraries, databases, web-based archives and the like. A recent SBE Infrastructure award will be described that supports a number of interrelated activities that will advance research in physical anthropology, evolutionary biology, neuroscience and any others that may require information and/or biomaterials from nonhuman primates.

An example in geography is the National Historical Geographic Information System (NHGIS) at the University of Minnesota, Twin Cities. This project upgrades and enhances U.S. Census databases from 1790 to the present, including the digitization of all census geography so that place-specific information can be readily used in geographic information systems. We expect that the NHGIS will become a resource that can be used widely for social science training, by the media, for policy research at the state and local levels, by the private sector, and in secondary education.

Last year the National Science Board approved renewal of NSF support for the Panel Study of Income Dynamics (PSID). The PSID is a longitudinal survey, initiated in 1968, of a nationally representative sample of U.S. individuals and the family units in which they reside. The major objective of the panel is to provide shared-use databases, research platforms and educational tools on cyclical, intergenerational and life-course measures of economic and social behavior. With thirty-plus years of data on the same families, the PSID can justly be considered a cornerstone of the infrastructure support for empirically based social science research.

Additional examples abound, and will be discussed. These include CSISS, the Center for Spatially Integrated Social Science, at the University of California, Santa Barbara; the fMRI Data Center at Dartmouth College, a national cognitive neuroscience resource; data-rich linguistics projects that support both the preservation of knowledge of disappearing languages and statistically guided approaches to increasing our understanding of ongoing language use; systems for storage and dissemination of multimodal (audio, visual, haptic, etc.) data; and systems and techniques for the meta-analysis of large-scale data sets.

Data sharing is at the heart of NSF's mission and of our vision of the social and behavioral sciences. This presentation is intended to provide an overview of that vision.

Track I-D-6:
Database Innovation in the Behavioral Sciences and the Debate Over What Should Be Stored
Session organizer: US National Committee for the International Union of Psychological Sciences, National Academy of Sciences, Washington, D.C., USA

Chair: Merry Bullock, American Psychological Association

Data sharing is not the norm in behavioral science, although there are pockets of change and innovation. At the same time, a debate is underway regarding what data from experiments are worth placing in databases to be available for others. As it becomes possible to store huge quantities of data, it is becoming more necessary to assure that databases grow into useful tools rather than clogged informational arteries. This panel has two objectives: to inform attendees of innovations and to discuss the possible criteria for determining what should be included in databases.

Panelists will discuss several innovative databases that are proving transformational for the fields they touch. For example, a database of functional magnetic resonance images of the brain created at Dartmouth College is making it possible to test hypotheses about brain-behavior relations on data pooled across many individual studies; a database of geographic information based at the University of California, Santa Barbara is allowing those in a variety of disciplines to look at the influence of location on such things as health behaviors, social development, and wealth accumulation. A database of aptitude test scores at the University of Virginia is a test bed for statistical innovations that are making it possible to legitimately compare data and not just outcomes from disparate studies.

The panel will describe several of these innovations in behavioral and other sciences, and will address important emerging issues. For example, the fMRI database (originally envisioned as capturing all the images from most of the major neuroscience journals) is constrained by file size: images from a single journal consume terabytes of storage space and raise important questions of accessibility. As the behavioral sciences evolve toward more common acceptance of data sharing, those in the behavioral sciences must evolve toward a more common understanding of what should be contained in a database and what sorts of data are appropriate for archiving. Examples and issues from other disciplines will help inform the discussion.

1. Acquisition Criteria at the Murray Research Center: A Center for the Study of Lives
Jacquelyn B. James, Murray Research Center

The Murray Research Center is a repository for social and behavioral sciences data on the in-depth study of lives over time, and issues of special concern to American women. The center acquires data sets that are amenable to secondary analysis, replication, or longitudinal follow-up. In determining whether or not to acquire a new data set for the archive, several kinds of criteria are used. The criteria can be roughly grouped into five general categories: content of the study, methodology, previous analysis and publication, historical value, and cost of acquiring and processing the data. Each of these will be described with an indication of the relative importance of each criterion, where possible.

2. What Functional Neuroimaging Data is 'Worth' Sharing and the Scope of Large-Scale Study Data Archiving
John Darrell Van Horn, The fMRI Data Center, Dartmouth College, USA

Functional neuroimaging studies routinely produce large sets of raw data that comprise both functional image time series and high-resolution anatomical brain volumes. These data are often passed through several steps of processing, after which only a limited set of the statistical output is presented in papers published in the peer-reviewed literature. Arguments for archiving only these summary results have suggested that they are of greater value than the raw data themselves. However, since with each step of processing the information content of a data set remains constant or is reduced, it is difficult to see the source of any increased scientific value. The fMRI Data Center (fMRIDC) strives to archive complete raw functional neuroimaging data sets accompanied by enough information that anyone else would be able to reconstruct the steps in processing of the data and arrive at the same statistical brain map as the original authors. To achieve this, the fMRIDC requests that authors of published studies provide considerably more study 'meta' and raw data than is typically presented in their published article. As a result, several studies currently in the fMRIDC archive rival the size of the entire human genome database (~20GB compressed). By storing complete study data sets, the fMRIDC effort will serve not only to advance thinking about fundamental concepts of brain function, by permitting researchers to examine the published neuroimaging data of others, but also to document more thoroughly the scientific record of work in the fields of functional brain imaging and cognitive neuroscience.
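
The claim that processing cannot add information can be made precise with the data processing inequality from information theory. The formalization below is one standard reading, not necessarily the one the author has in mind; the symbols are assumptions introduced here for illustration: \theta is any quantity of scientific interest, R the raw data, P = f(R) the processed data, and S = g(P) the published summary statistics.

```latex
% Assumed notation (not from the abstract):
%   \theta : quantity of scientific interest
%   R      : raw data;  P = f(R) : processed data;  S = g(P) : published summary
% Because P depends on \theta only through R, and S only through P,
% the variables form a Markov chain:
\[
  \theta \;\longrightarrow\; R \;\longrightarrow\; P \;\longrightarrow\; S .
\]
% The data processing inequality then bounds the mutual information:
\[
  I(\theta; R) \;\ge\; I(\theta; P) \;\ge\; I(\theta; S),
\]
% with equality at a given step only if that step discards nothing
% relevant to \theta (for example, when f is invertible).
```

Under this reading, a summary archive can match the raw archive in information about \theta only when every processing step is lossless with respect to \theta.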

3. Accession and Sharing of Geographic Information
Michael F. Goodchild, University of California, Santa Barbara, USA

Geographic information is a well-defined type, with complex uses and production systems. The Alexandria Digital Library began as an effort to provide remote access to a large collection of geographic information (maps and images), but has evolved into a functional geolibrary (a digital library that can be searched using geographic location as the primary key). I use ADL to illustrate many of the issues and principles inherent in sharing geographic information, and in policies regarding its acquisition by archives, including granularity, metadata schema, support for search across distributed archives, portals and clearinghouses, and interoperability.
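
The idea of searching a geolibrary with geographic location as the primary key can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how ADL is implemented: the `BoundingBox`, `CatalogEntry`, and `search_by_location` names, the catalog entries, and their coordinates are all hypothetical, and footprints are simplified to longitude/latitude rectangles that do not cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical footprint: a lon/lat rectangle in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two rectangles overlap when they overlap on both axes.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record: what the item is and where it applies."""
    title: str
    kind: str
    footprint: BoundingBox


def search_by_location(catalog, query):
    """Return entries whose footprint overlaps the query region."""
    return [entry for entry in catalog if entry.footprint.intersects(query)]


# Illustrative catalog; titles and coordinates are invented for the example.
CATALOG = [
    CatalogEntry("Santa Barbara County aerial photo", "image",
                 BoundingBox(-120.7, 34.3, -119.4, 35.1)),
    CatalogEntry("California road map", "map",
                 BoundingBox(-124.5, 32.5, -114.1, 42.0)),
    CatalogEntry("Montreal street map", "map",
                 BoundingBox(-74.0, 45.4, -73.4, 45.7)),
]

# Query region roughly around the city of Santa Barbara.
query = BoundingBox(-120.0, 34.3, -119.6, 34.5)
titles = [entry.title for entry in search_by_location(CATALOG, query)]
print(titles)
```

A linear scan is enough to show the principle; a collection of realistic size would normally use a spatial index, and the granularity and metadata schema questions raised in the abstract determine what counts as one entry and what its footprint is.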

Last site update: 15 March 2003

CODATA 2002: Frontiers of Scientific and Technical Data, Montréal, Canada, 29 September - 3 October