CPA-2007 I/O Projects


BUD: A Buffer-Disk Architecture for Energy Conservation in Parallel Disk Systems

Xiao Qin, xqin@cs.nmt.edu, Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, NM 87801

HEC Topics: Management and RAS

Keywords: Power Management, Energy Aware

Parallel disk systems, which consist of multiple disks connected by a high-speed switched interconnect, are well suited to data-intensive applications running on high-performance computing systems. Improving the energy efficiency of parallel disks is an intrinsic requirement of next-generation high-performance computing systems, because a storage subsystem can represent 27% of the energy consumed in a data center. However, it is a major challenge to conserve energy in parallel disks while efficiently coordinating the I/O of hundreds or thousands of concurrent disk devices to meet both high-performance and energy-saving requirements. This research investigates novel energy-conservation techniques that provide significant energy savings while achieving low cost and high performance for parallel disks. In this research project, the investigators take an organized approach to implementing energy-saving techniques for parallel disks, simulating energy-efficient parallel disk systems, and conducting a physical demonstration. This research involves four tasks: (1) design and develop a buffer-disk (BUD) architecture to reduce energy dissipation in parallel disk systems; (2) develop innovative energy-saving techniques, including an energy-related reliability model, energy-aware data partitioning, disk request processing, data movement, data placement, prefetching strategies, and power management for buffer disks; (3) implement a simulation toolkit (BUDSIM) used to develop a variety of energy-saving techniques and integrate them into the BUD architecture; and (4) validate the BUD architecture, along with the energy-conservation techniques, using real data-intensive applications running on high-performance clusters. This research can benefit society by developing economically attractive and environmentally friendly parallel disk systems that lower electricity bills and reduce emissions of air pollutants. Furthermore, the BUD architecture and the energy-conservation techniques can be transferred to embedded disk systems, where power constraints are more severe than in conventional disk systems.
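The intuition behind a buffer-disk design can be illustrated with a toy energy model. This is only a sketch under stated assumptions, not the BUD or BUDSIM implementation: the function name `disk_energy`, the power figures, the spin-up cost, and the idle threshold are all hypothetical, and requests are treated as instantaneous. The model shows that a data disk whose requests are absorbed by an always-on buffer disk sees longer idle gaps and can spend more time in standby.

```python
def disk_energy(request_times, horizon, idle_threshold=10.0,
                p_active=10.0, p_standby=1.0, e_spinup=100.0):
    """Energy in joules used by one disk over [0, horizon] seconds.

    Hypothetical model: the disk starts spun up, drops to standby after
    `idle_threshold` seconds without a request, and pays `e_spinup`
    joules each time a request wakes it up again.
    """
    energy = 0.0
    last = 0.0
    for t in sorted(request_times) + [horizon]:
        gap = t - last
        if gap > idle_threshold:
            # Active until the threshold expires, then standby.
            energy += idle_threshold * p_active
            energy += (gap - idle_threshold) * p_standby
            if t < horizon:
                energy += e_spinup  # a request forces a spin-up
        else:
            energy += gap * p_active
        last = t
    return energy


# Without a buffer disk: each of 4 data disks sees a request every 5 s.
busy = list(range(5, 100, 5))
unbuffered = 4 * disk_energy(busy, 100)

# With a buffer disk (assumed always busy): each data disk is touched
# once, at t = 50 s, when buffered data is moved to it.
buffered = disk_energy(busy, 100) + 4 * disk_energy([50], 100)
```

In this toy setting the buffered configuration uses less total energy even though it adds a disk, because the four data disks spend most of the interval in standby. Real savings depend on workload, spin-up cost, and reliability effects that this sketch ignores.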

Algorithms Design and Systems Implementation to Improve Buffer Management for Fast I/O Data Accesses

Zhang, Xiaodong, Ohio State University Research Foundation; Jiang, Song, Wayne State University

HEC topics: Measurement and Understanding

Keywords: using disk layout to improve buffer cache

Although processor cycles, memory size, and disk capacity have all become increasingly abundant, system support for data-intensive applications still has a serious deficiency: the long latency of hard disk accesses, measured as the time to get the first byte of requested data. Improvement in this latency has lagged significantly behind that of other system components, including disk peak bandwidth. To address this critical issue, the investigators will develop new and efficient buffer cache management systems that adapt to dramatic technology changes and to the high demands of data-intensive applications with complicated access patterns. To make the memory buffer a truly effective agent between the requests from applications and the services provided by disks, the investigators will leverage the caching and prefetching mechanisms in the memory buffer to improve the effective I/O performance perceived by applications by minimizing the cost (in both energy and time) of expensive disk accesses. A unique aspect of this research is that it brings disk layout information directly into buffer management and effectively integrates temporal and spatial locality. The investigators will design and implement a system infrastructure that analyzes and exploits data layout information on disks. With this critical system support, the investigators will further design and implement dual-side-aware memory buffer management algorithms that adapt to characteristics exhibited on both the program side and the disk side.
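As one illustration of combining temporal locality with disk layout, the sketch below extends plain LRU replacement with a layout hint. It is an assumption-laden toy, not the investigators' algorithm: the class name `LayoutAwareCache`, the `window` parameter, and the rule "a block is sequential if it follows the previously accessed logical block number" are all hypothetical. The idea shown is that blocks fetched as part of a sequential run are assumed cheaper to re-read than randomly placed blocks, so among the least recently used blocks a sequential one is evicted first.

```python
from collections import OrderedDict


class LayoutAwareCache:
    """Toy buffer cache: LRU order plus a disk-layout hint (hypothetical)."""

    def __init__(self, capacity, window=4):
        self.capacity = capacity
        self.window = window          # how many LRU blocks to consider
        self.blocks = OrderedDict()   # lbn -> is_sequential, oldest first
        self.prev_lbn = None

    def access(self, lbn):
        """Access logical block `lbn`; return True on a cache hit."""
        hit = lbn in self.blocks
        if hit:
            self.blocks.move_to_end(lbn)  # refresh recency
        else:
            if len(self.blocks) >= self.capacity:
                self._evict()
            # Mark the block sequential if it directly follows the
            # previously accessed block on disk.
            self.blocks[lbn] = (self.prev_lbn is not None
                                and lbn == self.prev_lbn + 1)
        self.prev_lbn = lbn
        return hit

    def _evict(self):
        # Look at the `window` least recently used blocks and prefer a
        # sequential one; fall back to the plain LRU victim otherwise.
        candidates = list(self.blocks.items())[:self.window]
        victim = next((lbn for lbn, seq in candidates if seq),
                      candidates[0][0])
        del self.blocks[victim]


cache = LayoutAwareCache(capacity=3, window=3)
for lbn in (100, 7, 8, 500):
    cache.access(lbn)
# Plain LRU would have evicted block 100; here block 8 (sequential after 7)
# is evicted instead, so the randomly placed block 100 stays cached.
```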

High Throughput I/O for Large Scale Data Repositories

Ali Saman Tosun, Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, 78249

  • HEC topics: Metadata

    Keywords: Declustering, high dimensional data

    ABSTRACT

    Declustering has attracted a lot of interest over the last few years and has applications in many areas, including high-dimensional data management, geographical information systems, and scientific visualization. Most declustering research has focused on spatial range queries and on finding schemes with low worst-case additive error. This research investigates various aspects of declustering, including novel declustering schemes, replicated declustering, heterogeneous declustering, adaptive declustering, and declustering using multiple databases. The investigators approach every issue both theoretically and practically: they study what is theoretically possible and what can be achieved in practice, and try to close the gap between the two. The investigators study novel declustering schemes with solid theoretical foundations, including number-theoretic declustering and design-theoretic declustering. Replication strategies for various types of queries, including spatial range queries and arbitrary queries, are studied. The retrieval algorithm for design-theoretic replication has linear complexity and guarantees worst-case retrieval cost. The investigators study tradeoffs in retrieval between complexity and retrieval cost and develop a suite of protocols for retrieval. This research involves adaptive declustering schemes that adapt to disk failures, disk additions and changing query types by moving buckets between disks during idle
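The notion of additive error for spatial range queries can be made concrete with a small example. The sketch below uses the classic disk modulo scheme, which assigns bucket (i, j) of a 2-D grid to disk (i + j) mod M. It is chosen only for illustration and is not one of the number-theoretic or design-theoretic schemes studied in this project. The function names are hypothetical. The additive error of a query is the number of buckets read from the busiest disk minus the optimal ceil(q / M), where q is the number of buckets in the query.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, num_disks):
    """Classic disk modulo declustering of a 2-D grid of buckets."""
    return (i + j) % num_disks


def retrieval_cost(rows, cols, num_disks, scheme=disk_modulo):
    """Buckets read from the busiest disk for the query rows x cols."""
    counts = Counter(scheme(i, j, num_disks) for i in rows for j in cols)
    return max(counts.values())


def additive_error(rows, cols, num_disks, scheme=disk_modulo):
    """Retrieval cost minus the optimal cost ceil(q / M)."""
    optimal = ceil(len(rows) * len(cols) / num_disks)
    return retrieval_cost(rows, cols, num_disks, scheme) - optimal


# A 1 x 4 query on 4 disks touches every disk once: additive error 0.
row_query_error = additive_error(range(1), range(4), 4)

# A 2 x 2 query on 4 disks hits disk 1 twice: additive error 1.
square_query_error = additive_error(range(2), range(2), 4)
```

The worst-case additive error of a scheme is the maximum of this quantity over all range queries, which could be estimated for small grids by enumerating queries with the same helper.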

    Object Based Caching for MPI-IO

    Dr. Phillip M. Dickens, Department of Computer Science, University of Maine

    HEC topics: Small unaligned I/O, Next generation middleware

    As the size of large-scale computing clusters increases from thousands to tens of thousands of nodes, the challenge of providing high-performance parallel I/O to MPI applications executing in such environments becomes increasingly important and difficult. There are many factors that make this problem so challenging. The most often cited difficulties include the I/O access patterns exhibited by scientific applications (e.g., non-contiguous I/O), poor file system support for parallel I/O optimizations, strict file consistency semantics, and the latency of accessing I/O devices across a network. However, we believe that a more fundamental problem, whose solution would help alleviate all of these challenges, is the legacy view of a file as a linear sequence of bytes. The problem is that application processes rarely access data in a way that matches this file model, and thus a large component of the scalability problem is the cost of dynamically translating between the process data model and the file data model. In fact, the data model used by applications is more accurately defined as an object model, where each process maintains a collection of (perhaps) unrelated objects. We believe that aligning these different data models will significantly enhance the performance of parallel I/O for large-scale, data-intensive applications. This research is developing the infrastructure to merge the power and flexibility of the MPI-IO parallel I/O interface with a more powerful object-based file model. Toward this end, we are developing an object-based caching system that serves as an interface between MPI applications and object-based files. The object-based cache is based on MPI file views, or, more precisely, the intersections of such views. 
These intersections, which we term objects, identify all of the file regions within which conflicting accesses are possible and (by extension) those regions for which there can be no conflicts (termed shared-objects and private-objects respectively). This information will be leveraged by the runtime system to maximize the parallelism of file accesses and minimize the cost of enforcing strict file consistency semantics and global cache coherence. In this way, the performance and scalability characteristics of large-scale, data-intensive MPI applications will be significantly enhanced.
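The idea of deriving objects from intersections of file views can be sketched in plain Python, without MPI. This is a simplified illustration under stated assumptions, not the project's caching system: each process's file view is assumed to be a list of half-open byte ranges, the function name `classify_regions` is hypothetical, and any region visible to more than one process is treated as shared regardless of whether the accesses are reads or writes.

```python
def classify_regions(views):
    """Split a file into shared and private regions.

    `views` maps a process rank to a list of (start, end) half-open
    byte ranges that the process can access through its file view.
    Returns (shared, private), where shared holds (start, end, ranks)
    and private holds (start, end, rank).
    """
    # Every range boundary is a potential object boundary.
    points = sorted({p for ranges in views.values()
                     for r in ranges for p in r})
    shared, private = [], []
    for lo, hi in zip(points, points[1:]):
        owners = [rank for rank, ranges in views.items()
                  if any(s <= lo and hi <= e for s, e in ranges)]
        if len(owners) > 1:
            shared.append((lo, hi, owners))      # conflicts possible
        elif len(owners) == 1:
            private.append((lo, hi, owners[0]))  # no conflicts possible
    return shared, private


# Two processes whose views overlap on bytes 50..99.
shared, private = classify_regions({0: [(0, 100)], 1: [(50, 150)]})
```

Under these assumptions, only the shared regions would need consistency or coherence traffic, while private regions could be cached by their single owner without coordination.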

  • A High Throughput Massive I/O Storage Hierarchy for PETA-scale High-end Architectures
    Gao, Guang R., University of Delaware

    HEC topics: Small unaligned I/O, Next generation middleware

    Mihailo Kaplarevic, Brian Lucas, Ziang Hu, Guang R. Gao

    There has been significant progress in the research and development of modern high-end computer (HEC) architectures composed of tens of thousands of processors or more. This has widened the gap between computing power and the storage and I/O performance needed to support and sustain such calculations. This gap presents a great challenge for the scalability of future parallel I/O architecture models and I/O middleware support. To address these challenges, we propose a new I/O architecture model in which each node has a dedicated high-bandwidth connection to its own local solid-state storage (flash memory). We will propose and develop an I/O middleware model and software support that exploit the features of the proposed I/O architecture model. We will also develop new management and RAS (reliability, availability, and serviceability) capabilities that can scale to the new peta-scale architecture. Two flash memories will be visible to each node: the local flash memory and a neighboring node's flash memory, which will keep a backup copy. Dedicated service agents make this dual-connection configuration transparent to nodes by managing the traffic according to priority, current usage, and availability. We will implement the proposed solutions by extending an experimental HEC system software testbed to simulate the proposed I/O architecture and middleware models as well as the RAS support. We also plan to demonstrate the effectiveness of our proposal on a set of the most common third-party I/O benchmarks.
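The dual-flash arrangement described above can be sketched as a small in-memory model. Everything here is hypothetical and only illustrates the idea: the class names `FlashNode` and `ServiceAgent`, the ring topology in which node r backs up to node (r + 1) mod N, and the `alive` flag are assumptions, and the sketch ignores the priority and usage-based traffic management mentioned in the abstract.

```python
class FlashNode:
    """One compute node with local flash and space for a neighbor's backup."""

    def __init__(self, rank):
        self.rank = rank
        self.local = {}    # key -> data written by this node
        self.backup = {}   # (owner_rank, key) -> copy of a neighbor's data
        self.alive = True


class ServiceAgent:
    """Hides the local/backup flash pair behind simple write and read calls."""

    def __init__(self, num_nodes):
        self.nodes = [FlashNode(r) for r in range(num_nodes)]

    def neighbor(self, rank):
        # Assumed ring topology: node r backs up to node (r + 1) mod N.
        return self.nodes[(rank + 1) % len(self.nodes)]

    def write(self, rank, key, data):
        self.nodes[rank].local[key] = data
        self.neighbor(rank).backup[(rank, key)] = data

    def read(self, rank, key):
        node = self.nodes[rank]
        if node.alive:
            return node.local[key]
        # Local flash unavailable: serve the neighbor's backup copy.
        return self.neighbor(rank).backup[(rank, key)]


agent = ServiceAgent(num_nodes=4)
agent.write(3, "checkpoint", b"state")
agent.nodes[3].alive = False            # simulate loss of node 3's flash
recovered = agent.read(3, "checkpoint")
```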

    HEC FSIO Website designed and hosted by Los Alamos National Laboratory.