Mike Wilde
Argonne National Laboratory
The Virtual Data Grid: A New Model and Architecture for Data-Intensive
Collaboration
The NSF-sponsored Grid Physics Network project (GriPhyN) couples four
large-scale experiments in physics and astrophysics (ATLAS, CMS, LIGO, and SDSS)
with a team of computer scientists to create approaches to harness the Grid for
data-intensive scientific collaborations. GriPhyN's goal is to create a "Virtual
Data Grid" which enables scientists to leverage large-scale Grid resources with
greater ease, creating a powerful environment for data analysis and management
that facilitates discovery, collaboration, composition, and validation.
Virtual data describes in an abstract, machine-independent manner the graph of
data transformations that are applied to an input set of data objects to create
some specific output set of objects. Such descriptions serve both as a log of
what has been done, and a specification of future work to perform. Virtualizing
data helps us better understand what the data represents, both its meaning and
its validity. Applying a virtual data approach to scientific data analysis enables
us to create an environment in which the derived data objects in a scientific
data collection are automatically annotated with a precise description of how
each object was produced, recording input datasets, transformations, and output
datasets for each result. We express these relationships in a virtual data
language, VDL, whose definitions are stored in virtual data catalogs. These
catalogs can be queried to find data-production "recipes" based on datasets,
argument values, or other metadata. Using these recipes as patterns, new data
analysis avenues and approaches can be explored, documented, validated,
cataloged, and shared.
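To make the idea of a queryable catalog of data-production recipes more
concrete, the following is a minimal Python sketch. It is not VDL and not the
GriPhyN toolkit; the names Derivation, VirtualDataCatalog, recipes_for, and
lineage, along with the dataset and transformation names, are hypothetical and
chosen only for illustration. It assumes each recorded step names one
transformation together with its input datasets, output datasets, and argument
values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One recorded application of a transformation (hypothetical structure)."""
    transformation: str   # name of the program or procedure that was applied
    inputs: tuple         # names of the input datasets
    outputs: tuple        # names of the output datasets
    args: tuple = ()      # (name, value) pairs for argument values


class VirtualDataCatalog:
    """Illustrative in-memory catalog of derivations."""

    def __init__(self):
        self._derivations = []

    def record(self, derivation):
        self._derivations.append(derivation)

    def recipes_for(self, dataset):
        """Return every derivation that lists `dataset` among its outputs."""
        return [d for d in self._derivations if dataset in d.outputs]

    def lineage(self, dataset):
        """Walk the derivation graph backwards from `dataset` and return the
        derivations needed to produce it, with dependencies listed first."""
        ordered, seen = [], set()

        def visit(name):
            for d in self.recipes_for(name):
                if d in seen:
                    continue
                seen.add(d)
                for parent in d.inputs:
                    visit(parent)
                ordered.append(d)

        visit(dataset)
        return ordered


# Example with made-up dataset and transformation names.
catalog = VirtualDataCatalog()
catalog.record(Derivation("calibrate", ("raw.dat",), ("calibrated.dat",),
                          (("gain", 1.5),)))
catalog.record(Derivation("histogram", ("calibrated.dat",), ("hist.dat",),
                          (("bins", 64),)))

steps = [d.transformation for d in catalog.lineage("hist.dat")]
print(steps)  # ['calibrate', 'histogram']
```

In this sketch the same records serve both purposes named above: read forwards
they are a log of what was done, and read backwards from a requested dataset
they are a specification of the work needed to produce it again.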
This talk will describe the underlying concepts of virtual data, our current
implementations of a virtual data toolkit, and our experiences in applying these
tools and methods to several Grid-based challenge problems in the GriPhyN
experiments and related data-intensive efforts.