XStack-2010: Abstracts for X-Stack Software Research 2010 FSIO projects
- The Damsel Project: A Data Model Storage Library for Exascale Science
Alok Choudhary, Northwestern University
Rob Latham, Argonne National Lab
Nagiza Samatova, North Carolina State University
Quincey Koziol, The HDF Group
- HECFSIO Topic: Next Generation I/O Architectures
Keywords: I/O Middleware
Computational science applications on modern petascale computers are steadily increasing the complexity of their grids, of the solution methods on those grids, and of the data that link the two together. Several common motifs can be described, each with distinct grid types and computation/communication patterns; examples include structured AMR, unstructured finite element/volume, and spectral elements. From a storage and I/O perspective, these applications exhibit distinct data organization and access patterns during simulation, analysis, and visualization. However, these codes continue to interact with I/O data libraries much as they have since the 1990s, in terms of relatively low-level data structures such as vertex positions, connectivity arrays, and multi-dimensional solution data arrays. Although these high-level I/O libraries have had a beneficial impact on parallel applications in terms of read/write performance, the impedance mismatch with application data models is growing as applications move to more complex data models and interactions. This mismatch makes it more difficult to achieve I/O performance close to system peak capabilities, and the libraries still do not support the full range of capabilities in today’s computational science data models.
A different approach is needed, one that leverages the performance improvements and best practices learned from previous approaches while raising the level of interaction with application data models and access patterns. As the International Exascale Software Project (IESP) report observes, “[...] The purpose of I/O by an application can be a very important source of information that can help scalable I/O performance when hundreds of thousands (to millions) of cores simultaneously access the I/O system [...]” Today, however, this high-level view of the data model is overlooked rather than exploited. Equally relevant are the data layout used in these codes and the way that layout interacts with the I/O software that writes the data to, or reads it from, storage systems. Raising the level of abstraction at which applications interact with I/O libraries will reduce the impedance mismatch between the two, while also streamlining the I/O process by reducing unnecessary data copying between applications and I/O libraries. This process will be simplified further by developing customized interfaces and formats, or “verticals”, for the data model motifs used in today’s petascale applications. This is the underlying approach of our project.
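To make the contrast concrete, the following is a minimal sketch in Python, not Damsel's interface. It compares handing an I/O layer three unrelated flat arrays with handing it a description at the level of the data model. The class `UnstructuredMesh` and its methods `attach_field` and `describe` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field

# Low-level view: three unrelated flat arrays. Nothing tells the I/O layer
# that they describe one triangulated mesh with a vertex-centered field.
flat_coords = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]  # x, y per vertex
flat_connectivity = [0, 1, 2, 0, 2, 3]                  # two triangles
flat_temperature = [300.0, 310.0, 320.0, 305.0]         # one value per vertex


@dataclass
class UnstructuredMesh:
    """Hypothetical data-model-level description of the same data."""
    dim: int             # spatial dimension of the coordinates
    verts_per_cell: int  # 3 for triangles
    coords: list
    connectivity: list
    fields: dict = field(default_factory=dict)

    @property
    def num_vertices(self):
        return len(self.coords) // self.dim

    @property
    def num_cells(self):
        return len(self.connectivity) // self.verts_per_cell

    def attach_field(self, name, values, location):
        """Bind solution data to vertices or cells, checking its size."""
        if location not in ("vertex", "cell"):
            raise ValueError(f"unknown location: {location}")
        expected = self.num_vertices if location == "vertex" else self.num_cells
        if len(values) != expected:
            raise ValueError(f"{name}: expected {expected} values, got {len(values)}")
        self.fields[name] = (location, values)

    def describe(self):
        """Summary that an I/O layer could use to choose a storage layout."""
        return {
            "motif": "unstructured",
            "num_vertices": self.num_vertices,
            "num_cells": self.num_cells,
            "fields": {name: loc for name, (loc, _) in self.fields.items()},
        }


mesh = UnstructuredMesh(dim=2, verts_per_cell=3,
                        coords=flat_coords, connectivity=flat_connectivity)
mesh.attach_field("temperature", flat_temperature, location="vertex")
summary = mesh.describe()
print(summary)
```

In this sketch the mesh object keeps references to the application's own arrays instead of copying them, and the relationship between grid and solution data is stated explicitly, so a storage layer would not have to infer it.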
The goal of Damsel is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. We are pursuing four major activities: (1) constructing a unified, high-level data model that maps naturally onto a set of data model motifs used in a representative set of high-performing computational science applications; (2) developing a data model storage library, called Damsel, that supports the unified data model, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant of failures; (3) assessing the performance of this approach by constructing new I/O benchmarks, or using existing ones, for each of the data model motifs; and (4) productizing Damsel and working with computational scientists to encourage adoption of the library by the scientific community.