Petascale Data Storage Institute

Recent Success Stories

File Systems Research Tools

LANL has recently built a prototype Parallel Log Structured File System (PLFS) to address small, strided parallel writes from N processes to 1 file. This software may significantly speed up HPC checkpoints that use small, strided N-to-1 I/O patterns.
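To make the access pattern concrete, here is a minimal sketch of where each process writes in a strided N-to-1 checkpoint. It illustrates only the pattern described above, not the file system itself; the function name `strided_offsets` and its parameters are hypothetical and chosen for illustration.

```python
def strided_offsets(rank: int, nprocs: int, block_size: int, nblocks: int) -> list[int]:
    """Byte offsets in the single shared file written by one process.

    Assumes every process writes `nblocks` blocks of `block_size` bytes,
    interleaved round-robin by rank (an illustrative assumption).
    """
    stride = nprocs * block_size  # distance between a rank's consecutive blocks
    return [rank * block_size + i * stride for i in range(nblocks)]


if __name__ == "__main__":
    # Four processes, 8-byte blocks, three blocks each.
    for rank in range(4):
        print(rank, strided_offsets(rank, nprocs=4, block_size=8, nblocks=3))
```

Each process touches many small, non-contiguous regions of the same file, which is the pattern the prototype is meant to speed up.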

File Systems Statistics Data Release

LANL has recently released file system statistics for its supercomputer scratch file systems. The data describe the number of files and the shape of each file system. LANL has also released similar data for thousands of workstations across the lab.

LANL Parallel I/O Trace Data Release

LANL has recently released parallel I/O traces to assist researchers. The traces come from parallel I/O benchmarks that exercise I/O patterns representative of some supercomputer applications.

LANL Computer Failure Data Release

LANL started a recent trend of releasing computer failure data. LANL released nearly 10 years of failure, availability, and usage data for over 20 supercomputers. In some cases, the released data cover the complete life of a machine. Before the LANL release, failure data releases of any size dated from the Digital VAX era and covered small numbers of VAX machines for a few months. The LANL release dwarfs all past releases, with over 23,000 interrupts over 9 years on thousands of machines. The inclusion of usage data alongside the failure data is a further advantage.

The LANL failure data has already been used by Garth Gibson and Bianca Schroeder, researchers at Carnegie Mellon University, for two papers: "A Large Scale Study of Failures in High-performance Computing Systems" and "Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?". The paper analyzing the LANL disk failures won the best paper award at the recent File and Storage Technologies conference (FAST) in San Jose, CA. It has already made a splash with disk vendors and the non-HPC community, with two articles based on the research appearing in the electronic version of Computerworld: "Disk Drive Failures 15 Times What Vendors Say, Study Says" and "Hard Data". Some interesting findings from the CMU paper: the traditionally accepted belief that failure rates follow a bathtub curve is false, and annual replacement rates instead rise steadily with age; manufacturers' MTBF numbers are overstated by a factor of 2-10 even for new drives, and the MTBF for 5-8 year old drives is some 30 times lower than advertised; and failure rates across disk technologies (SATA/SCSI/FC) are very similar, contrary to vendor assertions and popular belief.

Other papers based on the data have also been written at UCSC and the Colorado School of Mines. This huge release and the subsequent papers have started an outpouring of data releases from other sites.
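As a rough illustration of what the MTBF findings above mean in practice, the sketch below converts a datasheet MTTF into a nominal annual failure rate and scales it by the "factor of 2-10" quoted above. It assumes a constant failure rate and 8,760 powered-on hours per year; the function name `nominal_afr` is hypothetical, and the calculation is a back-of-the-envelope approximation, not the method used in the paper.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes the drive is always powered on


def nominal_afr(mttf_hours: float) -> float:
    """Annual failure rate implied by a datasheet MTTF.

    Assumes a constant failure rate, so AFR is approximately
    hours-per-year divided by MTTF (valid when MTTF >> one year).
    """
    return HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    afr = nominal_afr(1_000_000)
    print(f"nominal AFR for 1,000,000-hour MTTF: {afr:.2%}")
    # Scale by the 2x-10x overstatement range quoted in the text.
    for factor in (2, 10):
        print(f"{factor}x the nominal rate: {afr * factor:.2%}")
```

Under these assumptions, a 1,000,000-hour MTTF corresponds to a nominal rate of a little under 1% of drives per year, so a 2-10x overstatement implies replacement rates of a few percent per year.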

So many sites are now releasing data that USENIX has started an index site to showcase these data.

The USENIX site

LANL PDSI Website designed and hosted by
Institutes Office at Los Alamos National Laboratory.

Email Contacts: Gary Grider, James Nunez, John Bent