20th International CODATA Conference
Session: Computational informatics: integrating data science with materials modeling

 

Materials Research Groups, Digital Libraries, & Education:

Metadata from Nanoscale Simulation Code

Laura Bartolo

Kent State University, USA

 

Images of nanostructures can provide important visual resources for scientific research and education. However, associating appropriate metadata with these visual resources is essential for their interpretation, especially when they are shared with collaborators, students, or the broader scientific community. Within the NSDL Materials Digital Library Pathway (MatDL), information scientists at Kent State University (KSU) and materials scientists at the University of Michigan (U-M) are collaborating to capture, in Dublin Core (DC) XML format, an optimal description of nanoscale computer simulation output as research codes are executed. Metadata capture routines have been incorporated into the simulation codes of a U-M research group, which is piloting and testing this effort as part of its normal workflow. Two U-M classes (an upper-level undergraduate class in molecular engineering and a graduate class in thermodynamics) are also participating in the pilot.
Input parameters and associated values that determine the output of nanoscale simulations are recorded in order to capture the metadata that materials scientists consider important. This strategy has the advantage of creating more granular, consistent, and accurate metadata from input file parameters and values that can be associated immediately with simulation output. This approach also eliminates the duplicate effort and possible recording errors that could occur if the metadata were produced by a separate mechanism at a later time. Ultimately, streamlining the process of submitting these resources to a digital library and eliminating the need to generate metadata by hand should increase the likelihood of submissions to outside repositories, such as digital libraries that are sustained by user contributions. Capturing and associating metadata with research data as it is generated will enable the data, including non-print materials, to be more easily located and interpreted within a lab, across a group of labs collaborating through a distributed network, and in a digital library.
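As a minimal sketch of this capture-at-execution idea, the Python below parses a hypothetical `key = value` input file and pairs the resulting parameters with the path of the output they determine. The input file format, the function names `parse_input_parameters` and `capture_run_record`, and all parameter names and values are assumptions made for illustration; the actual input format of the U-M codes is not described here.

```python
from datetime import datetime, timezone


def parse_input_parameters(text):
    """Parse 'key = value' lines (a hypothetical format); '#' starts a comment."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def capture_run_record(input_text, output_path):
    """Bundle the input parameters with the output file they determine."""
    return {
        "output": output_path,
        "captured": datetime.now(timezone.utc).isoformat(),
        "parameters": parse_input_parameters(input_text),
    }


# Hypothetical input file contents; names and values are illustrative only.
example_input = """
# hypothetical simulation input
simulation_type = Brownian dynamics
num_particles = 1000
temperature = 1.2   # reduced units
"""

record = capture_run_record(example_input, "run_0001.xyz")
```

Because the record is built from the same text the simulation reads, the recorded values cannot drift from the values actually used, which is the consistency benefit described above.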
Currently, input file parameters and values for the U-M research group's master simulation code and all of its modules are captured as DC metadata in XML format upon execution of the code. The DC metadata transparently describes all associated parameters and values necessary to identify, describe, or recreate the simulation. The metadata includes DC elements such as title, creator, and date, as well as subject terms drawn from parameters such as simulation type, code module, and integrator scheme. The DC description element holds additional parameter names and associated values, such as number of particles, system number density, system temperature as a function of time, time step for integration of the equations of motion, and number of dimensions. Subject terms also lay the foundation for the development of a community-built, refereed, web-accessible dictionary, glossary, and thesaurus that will serve as a reference resource on the assembly of nanosystems for upper-level undergraduates and beginning graduate research lab assistants.
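The mapping from parameters to DC elements might look roughly like the following Python sketch, which uses the standard library's `xml.etree.ElementTree` and the Dublin Core element namespace. The wrapper element name `metadata`, the function `build_dc_record`, and every title, name, date, subject term, and parameter value shown are placeholders assumed for illustration, not the pilot group's actual schema or data.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(title, creator, date, subjects, parameters):
    """Build a simple DC record; the 'metadata' wrapper element is an assumption."""
    root = ET.Element("metadata")
    for name, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    # One dc:subject element per subject term drawn from the parameters.
    for term in subjects:
        ET.SubElement(root, f"{{{DC_NS}}}subject").text = term
    # Remaining parameter names and values go into a single dc:description.
    description = "; ".join(f"{k} = {v}" for k, v in parameters.items())
    ET.SubElement(root, f"{{{DC_NS}}}description").text = description
    return root


# All values below are illustrative placeholders.
dc_record = build_dc_record(
    title="Example nanoscale simulation run",
    creator="Example Researcher",
    date="2006-01-01",
    subjects=["molecular dynamics", "example code module", "velocity Verlet"],
    parameters={
        "number_of_particles": 1000,
        "number_density": 0.8,
        "time_step": 0.005,
        "dimensions": 3,
    },
)
xml_text = ET.tostring(dc_record, encoding="unicode")
```

Keeping subject terms as separate `dc:subject` elements, rather than folding them into the description, leaves them available as individual entries for a glossary or thesaurus of the kind described above.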
This presentation will describe (1) metadata capture for nanoscale computer simulation output during research code execution; (2) pilot results of resource submissions; and (3) development of a reference resource on nanosystem assembly useful in research and educational settings. Next steps include implementing metadata capture with national and international collaborators of the pilot group and expanding the reference resource on nanosystem assembly.