Los Alamos National Laboratory
 
 


National Security Education Center

[Chart: papers published each year by institutelet]

    2013 Papers

  • Shingled Magnetic Recording: Areal Density Increase Requires New Data Management


    Shingled Magnetic Recording (SMR) is the next technology being deployed to increase areal density in hard disk drives (HDDs). The technology will provide the capacity growth spurt for the teens of the 21st century. SMR drives achieve that increased density by writing overlapping sectors, which means sectors cannot be written randomly without destroying the data in adjacent sectors. SMR drives can either maintain the current HDD model by performing data management behind the scenes, or expose the underlying sector layout so that file system developers can build SMR-aware file systems.
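The shingled-write constraint described in the abstract can be sketched as a toy model. This is hypothetical illustrative code, not any vendor's drive interface: writing anywhere other than the next sequential position clobbers the data shingled on top of the written sector.

```python
# Toy model (illustrative, not a real drive interface) of a shingled zone:
# writes are only safe at the sequential append point, because each write
# physically overlaps the sectors shingled downstream of it.

class SMRZone:
    """A band of shingled sectors; only sequential writes preserve data."""
    def __init__(self, num_sectors):
        self.sectors = [None] * num_sectors
        self.write_pointer = 0  # next safe (sequential) write position

    def write(self, lba, data):
        if lba != self.write_pointer:
            # A non-sequential write destroys every previously written
            # sector shingled on top of the target, so mark them lost.
            for i in range(lba + 1, self.write_pointer):
                self.sectors[i] = None
        self.sectors[lba] = data
        self.write_pointer = lba + 1

zone = SMRZone(8)
for i in range(4):
    zone.write(i, f"block{i}")   # sequential writes: safe
zone.write(1, "update")          # random write: destroys sectors 2-3
print(zone.sectors[2], zone.sectors[3])  # → None None
```

An SMR-aware file system avoids this penalty by treating each zone as an append-only log, which is exactly why the abstract argues the technology requires new data management.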


    2012 Papers

  • IEEE Cluster 2012: The Power and Challenges of Transformative I/O
    Garth Gibson, Adam Manzanares, Meghan McClelland, John Bent

    Extracting high data bandwidth and metadata rates from parallel file systems is notoriously difficult. User workloads almost never achieve the performance of synthetic benchmarks. The reason for this is that real-world applications are not as well-aligned, well-tuned, or consistent as are synthetic benchmarks. There are at least three possible ways to address this challenge: modification of the real-world workloads, modification of the underlying parallel file systems, or reorganization of the real-world workloads using transformative middleware. In this paper, we demonstrate that transformative middleware is applicable across a large set of high performance computing workloads and is portable across the three major parallel file systems in use today. We also demonstrate that our transformative middleware layer is capable of improving the write, read, and metadata performance of I/O workloads by up to 150x, 10x, and 17x respectively, on workloads with processor counts of up to 65,536.
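The reorganization that transformative middleware performs can be sketched roughly as follows. The class and file layout here are illustrative assumptions, not the paper's actual implementation: each process appends to its own well-aligned log, and an index maps logical file offsets back to the per-process logs, so a shared-file (N-1) workload becomes N fast sequential streams.

```python
# Sketch (hypothetical, simplified) of middleware that transforms an N-1
# shared-file write pattern into N per-process append-only logs plus an
# index that preserves the logical file for later reads.
import os
import tempfile

class TransformativeLayer:
    """Redirect writes to per-process logs; an index maps logical offsets."""
    def __init__(self, root):
        self.root = root
        self.index = []  # (logical_offset, length, log_path, physical_offset)

    def write(self, rank, logical_offset, data):
        path = os.path.join(self.root, f"log.{rank}")
        with open(path, "ab") as f:
            physical = f.tell()
            f.write(data)            # always a fast, append-only write
        self.index.append((logical_offset, len(data), path, physical))

    def read(self, logical_offset, length):
        # Simplified: only serves reads that fall inside one recorded extent.
        for lo, ln, path, po in self.index:
            if lo <= logical_offset < lo + ln:
                with open(path, "rb") as f:
                    f.seek(po + (logical_offset - lo))
                    return f.read(length)
        return b""

with tempfile.TemporaryDirectory() as d:
    fs = TransformativeLayer(d)
    fs.write(rank=0, logical_offset=0, data=b"AAAA")  # strided N-1 pattern
    fs.write(rank=1, logical_offset=4, data=b"BBBB")
    print(fs.read(4, 4))  # → b'BBBB'
```

The design choice is that the expensive work (reassembling the logical order) is deferred to read time or to an index, while every write lands as a well-aligned append the underlying parallel file system handles efficiently.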


  • Statically Checking API Protocol Conformance with Mined Multi-Object Specifications
    Jonathan Aldrich, Ciera Jaspan

    Programmers using an API often must follow protocols that specify when it is legal to call particular methods. Several techniques have been proposed to find violations of such protocols based on mined specifications. However, existing techniques either focus on single-object protocols or on particular kinds of bugs, such as missing method calls. There is no practical technique to find multi-object protocol bugs without a priori known specifications. In this paper, we combine a dynamic analysis that infers multi-object protocols and a static checker of API usage constraints into a fully automatic protocol conformance checker. The combined system statically detects illegal uses of an API without human-written specifications. Our approach finds 41 bugs and code smells in mature, real-world Java programs with a true positive rate of 51%. Furthermore, we show that the analysis reveals bugs not found by state-of-the-art approaches.


    2011 Papers

  • DiskReduce: Replication as a Prelude to Erasure Coding in Data-Intensive Scalable Computing.
    Garth Gibson

    The first generation of Data-Intensive Scalable Computing file systems, such as the Google File System and the Hadoop Distributed File System, employed n-way replication for high data reliability, delivering users only about 1/n of the total storage capacity of the raw disks. This paper presents DiskReduce, a framework integrating RAID into these replicated storage systems to significantly reduce the storage capacity overhead, for example, from 200% to 25% when triplicated data is dynamically replaced with RAID sets (e.g., 8 + 2 RAID 6 encoding). Based on traces collected from Yahoo!, Facebook, and the OpenCloud cluster, we analyze (1) the capacity effectiveness of simple and not-so-simple strategies for grouping data blocks into RAID sets; (2) the implications of reducing the number of data copies on read performance and how to overcome the degradation; and (3) different heuristics to mitigate "small write penalties." Finally, we introduce an implementation of our framework that has been built and submitted to the Apache Hadoop project.
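The overhead figures quoted in the abstract follow from simple arithmetic, sketched here for concreteness (the helper function is ours, not DiskReduce code):

```python
# Back-of-the-envelope check of the storage-overhead reduction described
# in the abstract: 3-way replication vs. an 8 data + 2 parity RAID 6 set.

def overhead(raw_blocks_stored, user_blocks):
    """Extra raw capacity consumed, as a fraction of user data."""
    return (raw_blocks_stored - user_blocks) / user_blocks

replication = overhead(raw_blocks_stored=3, user_blocks=1)   # 3 copies of each block
raid6 = overhead(raw_blocks_stored=10, user_blocks=8)        # 8 data + 2 parity
print(f"triplication: {replication:.0%}, 8+2 RAID 6: {raid6:.0%}")
# → triplication: 200%, 8+2 RAID 6: 25%
```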


  • FAST'11—Scale and Concurrency of GIGA+: File System Directories with Millions of Files
    Garth Gibson, Swapnil Patil
  • HPDC '11—Six Degrees of Scientific Data: Reading Patterns for Extreme Scale Science IO
    Garth Gibson, Milo Polte

    @inproceedings{lofstead:hpdc11,
    Address = {San Jose, CA},
    Author = {Jay Lofstead and Milo Polte and Garth A. Gibson and Scott Klasky and Karsten Schwan and Ron A. Oldfield and Matthew Wolf and Qing Liu},
    Booktitle = {HPDC '11},
    Month = {June 8-11},
    Title = {Six Degrees of Scientific Data: Reading Patterns for Extreme Scale Science IO},
    Year = {2011}}


  • PLATEAU11: Are Object Protocols Burdensome?
    Jonathan Aldrich, Ciera Jaspan

    Object protocols are a commonly studied research problem, but little is known about their usability in practice. In particular, there is little research to show that object protocols cause difficulty for developers. In this work, we use community forums to find empirical evidence that object protocols are burdensome for developers. We analyzed 427 threads from the Spring and ASP.NET forums and discovered that 69 concerned protocol violations. We found that in 45% of our threads, violations of protocols resulted in unusual runtime behavior rather than exceptions, that questions took an average of 62 hours to resolve, and that even though 54% of questions were repeated violations of similar protocols, the manifestation of the violation at runtime was different enough that developers could not search for similar questions.


  • Principles of Operation for Shingled Disk Devices
    Garth Gibson, Greg Ganger
  • SC11: On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS
    Garth Gibson, Swapnil Patil

    Data-intensive applications fall into two computing styles: Internet services (cloud computing) or high-performance computing (HPC). In both categories, the underlying file system is a key component in scalable application performance. In this paper, we explore the similarities and differences between PVFS, a parallel file system used in HPC at large scale, and HDFS, the primary storage system used in cloud computing with Hadoop. We integrate PVFS into Hadoop and compare its performance to HDFS using a set of data-intensive computing benchmarks. We study how HDFS-specific optimizations can be matched using PVFS and how consistency, durability, and persistence tradeoffs made by these file systems affect application performance. We show how to embed multiple replicas into a PVFS file, including a mapping with a complete copy local to the writing client, to emulate HDFS's file layout policies. We also highlight implementation issues with HDFS's dependence on disk bandwidth and benefits from pipelined replication.


  • SOCC'11—YCSB++ : Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores
    Garth Gibson, Julio Lopez, Swapnil Patil, Milo Polte

    Inspired by Googleʼs BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping filters from client to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading, and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and fine-grained security. 
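One of the measurements listed above, the eventual-consistency window, can be illustrated with a toy model. The store class and its fixed visibility delay are invented for illustration; YCSB++'s real multi-tester protocol and table-store APIs differ. The idea is simply that one tester records when it wrote a key while another polls until the key becomes visible.

```python
# Illustrative sketch (assumed store interface, not the YCSB++ API) of
# measuring read-after-write lag in an eventually consistent table store.
import time

class EventuallyConsistentStore:
    """Toy store whose writes become visible after a fixed delay."""
    def __init__(self, delay):
        self.delay = delay
        self.pending = {}  # key -> (value, time at which it becomes visible)

    def put(self, key, value):
        self.pending[key] = (value, time.monotonic() + self.delay)

    def get(self, key):
        value, visible_at = self.pending.get(key, (None, float("inf")))
        return value if time.monotonic() >= visible_at else None

def measure_lag(store, key, value, poll_interval=0.005):
    """Write a key, then poll until it is readable; return the window."""
    start = time.monotonic()
    store.put(key, value)
    while store.get(key) is None:      # the "reader" tester polls
        time.sleep(poll_interval)
    return time.monotonic() - start    # observed consistency window

store = EventuallyConsistentStore(delay=0.05)
print(f"visible after ~{measure_lag(store, 'k', 'v'):.2f}s")
```

In the real benchmark the writer and reader are separate coordinated testers, which is why YCSB++ adds multi-tester coordination rather than measuring within one client.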


    2010 Papers

  • IASDS10/Cluster10: pWalrus: Towards Better Integration of Parallel File Systems into Cloud Storage
    Garth Gibson

    Amazon S3-style storage is an attractive option for clouds that provides data access over HTTP/HTTPS. At the same time, parallel file systems are an essential component in privately owned clusters that enable highly scalable data-intensive computing. In this work, we take advantage of both of those storage options and propose pWalrus, a storage service layer that integrates parallel file systems effectively into cloud storage. Essentially, it exposes the mapping between S3 objects and backing files stored in an underlying parallel file system, and allows users to selectively use the S3 interface and direct access to the files. We describe the architecture of pWalrus, and present preliminary results showing its potential to exploit the performance and scalability of parallel file systems.
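A minimal sketch of the object-to-file mapping idea, with an assumed mount point and helper name (not the actual pWalrus code): because the mapping is exposed, the same data is reachable through either interface.

```python
# Hypothetical sketch of the mapping a pWalrus-style layer exposes between
# S3 objects and backing files in a parallel file system. The mount point
# and function name are assumptions for illustration.
from pathlib import PurePosixPath

PFS_ROOT = PurePosixPath("/pfs/s3")  # assumed parallel-FS mount point

def object_to_path(bucket: str, key: str) -> PurePosixPath:
    """Map an S3 (bucket, key) pair to its backing file in the parallel FS."""
    return PFS_ROOT / bucket / key

# A client can then choose either interface for the same data:
#   GET http://host/bucket1/results/run42.dat        (S3-style access)
#   open("/pfs/s3/bucket1/results/run42.dat")        (direct PFS access)
print(object_to_path("bucket1", "results/run42.dat"))
# → /pfs/s3/bucket1/results/run42.dat
```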


  • ICER10: Analyzing the Strength of Undergraduate Misconceptions about Software Engineering
    Ciera Jaspan

    While many computer science students plan to pursue careers as software engineers, research shows that most traditional undergraduate CS programs fail to prepare students for the realities of programming in industry. Many misconceptions that are interfering with the transition to industry are belief-oriented, not skill-oriented, in nature, so traditional misconception assessments will not yield a deep understanding of them. In this paper we present a novel methodology that shows interactions among the misconceptions based on a forced choice paradigm and reveals the relative strength of the misconceptions. By analyzing studentsʼ repeated responses and response times, we construct a model of participantsʼ misconceptions. We used this methodology to assess CS undergraduates at Carnegie Mellon University and compared their results to those from industry practitioners at several highly regarded companies. The results show that the students have misconceptions about process and teamwork. Surprisingly, we found that several misconceptions are correlated with elective courses that we expected to weaken misconceptions about software engineering but instead appeared to strengthen them. 


  • OOPSLA/SPLASH2010—Verifying Configuration Files
    Jonathan Aldrich, Ciera Jaspan, Steve Painter, Dave Montoya

    Won first place in the OOPSLA/SPLASH ACM student research competition.

    SPLASH, Reno 2010, October 17-21


  • Scale and Concurrency of GIGA+: File System Directories with Millions of Files
    Garth Gibson, Swapnil Patil

    2009 Papers

  • ...And eat it too: High read performance in write-optimized HPC I/O middleware file formats
    Garth Gibson, Scott A. Klasky, Qing Liu, Manish Parashar, Karsten Schwan, Matthew Wolf, Milo Polte, Gerald "Jay" Lofstead, John Bent

    Paper accepted at the 4th Petascale Data Storage Workshop at Supercomputing '09, held November 15, 2009 in Portland, Oregon.


  • Checking Framework Interactions with Relationships
    Jonathan Aldrich, Ciera Jaspan
  • IEEE Cluster 2009: Overlapped Checkpointing with Hardware Assist
    Jun Wang, James Nunez, Christopher Mitchell
  • In Search of an API for Scalable File Systems: Under the Table or Above It?
    Garth Gibson, Greg Ganger, Julio Lopez, Milo Polte
  • Microsoft Research eScience WS2009: Understanding and Maturing the Data-Intensive Scalable Computing Storage Substrate
    Garth Gibson, Milo Polte, Swapnil Patil, Lin Xiao, Wittawat Tantisiriroj

    Modern science has available to it, and is more productively pursued with, massive amounts of data, typically either gathered from sensors or output from some simulation or processing. The table below shows a sampling of data sets that a few scientists at Carnegie Mellon University have available to them or intend to construct soon. Data Intensive Scalable Computing (DISC) couples computational resources with the data storage and access capabilities to handle massive data science quickly and efficiently. Our topic in this extended abstract is the effectiveness of the data intensive file systems embedded in a DISC system. We are interested in understanding the differences between data intensive file system implementations and high performance computing (HPC) parallel file system implementations. Both are used at comparable scale and speed. Beyond feature inclusions, which we expect to evolve as data intensive file systems see wider use, we find that performance does not need to be vastly different. A big source of difference is seen in their approaches to data failure tolerance: replication in DISC file systems versus RAID in HPC parallel file systems. We address the inclusion of RAID in a DISC file system to dramatically increase the effective capacity available to users. This work is part of a larger effort to mature and optimize DISC infrastructure services.


  • PDSW09: DiskReduce: RAID for Data-Intensive Scalable Computing
    Garth Gibson

    Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. Alternatively, high-performance computing, which operates at comparable scale, and smaller-scale enterprise storage systems achieve similar tolerance for multiple failures from lower-overhead erasure-encoding, or RAID, organizations. DiskReduce is a modification of the Hadoop distributed file system (HDFS) enabling asynchronous compression of initially triplicated data down to RAID-class redundancy overheads. In addition to increasing a cluster's storage capacity as seen by its users by up to a factor of three, DiskReduce can delay encoding long enough to deliver the performance benefits of multiple data copies.


  • Retrieving Relationships from Declarative Files
    Jonathan Aldrich, Ciera Jaspan
  • SC09: PLFS: A Checkpoint Filesystem for Parallel Applications
    Garth Gibson, Milo Polte, John Bent, Gary Grider, James Nunez
  • Software Mythbusters Explore Formal Methods
    Ciera Jaspan

    Anthony Hall's "Seven Myths of Formal Methods" (IEEE Software Sept./Oct. 1990, pp. 11-19) is the subject of the final installment of the updates on the seven distinguished articles selected from the Software editorial boards' 25th-anniversary top picks list. We decided to try a different format for this update. When Mary Shaw told me about her plan to cover Hall's piece in her graduate course, I asked Shaw and her students and colleagues at Carnegie Mellon University's Institute for Software Research to expand and refocus their discussion with a slant to produce this update. Hall graciously provided them with irreplaceable guidance direct from the source. Here's the result of that stimulating intellectual effort. —Hakan Erdogmus, Editor in Chief


  • USENIX HotCloud09—In Search of an API for Scalable File Systems: Under the Table or Above It?
    Garth Gibson, Milo Polte
  • WISH at ASPLOS XIV—Enabling Enterprise Solid State Disks Performance
    Garth Gibson, Milo Polte

    In this paper, we examine two modern enterprise Flash-based solid state devices and how varying usage patterns influence the performance one observes from the device. We observe that in order to achieve peak sequential and random performance of an SSD, a workload needs to meet certain criteria, such as a high degree of concurrency. We measure the performance effects of intermediate operating system software layers between the application and device, varying the file system, I/O scheduler, and whether or not the device is accessed in direct mode. Finally, we measure and discuss how device performance may degrade under sustained random write access across an SSD's full address space.




Theses and Dissertations

Proper Plugin Protocols
Ciera Jaspan, Ph.D. Dissertation, December 2011
University Advisor: Jonathan Aldrich
LANL Mentors: Dave Montoya, Steve Painter

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
© Copyright 2008-09 Los Alamos National Security, LLC. All rights reserved.