Los Alamos National Laboratory

Fsstats Data to Support and Enable Computer Science Research: FAQ

To enable computer science research, Los Alamos National Laboratory is releasing static file-tree statistics (fsstats) for several of its parallel filesystems. These data include aggregate information on capacity, file and directory sizes, filename lengths, link counts, and more. The fsstats cover 9 anonymous parallel filesystems, which range from 16 TB to 439 TB of total capacity used and from 2,024,729 to 43,605,555 files. Los Alamos National Laboratory provides the following data sets under universal release for any computer science researcher to use in their work.
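
The kinds of aggregates listed above can be illustrated with a short tree walk. The sketch below is not the fsstats tool itself and does not reproduce its exact bucketing or output format (the README describes those). It is a minimal Python example, assuming power-of-two histogram buckets, that tallies regular-file sizes, filename lengths, link counts, and entries per directory. The names `scan` and `log2_bucket` are hypothetical.

```python
import os
import stat
from collections import Counter


def log2_bucket(n: int) -> int:
    """Smallest k such that n <= 2**k (0 and 1 both map to bucket 0)."""
    return max(n - 1, 0).bit_length()


def scan(root: str) -> dict:
    """Walk `root` without following symlinks and build simple histograms.

    Hypothetical fsstats-style summary; not the real tool's output format.
    """
    size_hist = Counter()        # log2 bucket of file size in bytes -> count
    name_len_hist = Counter()    # filename length in characters -> count
    link_count_hist = Counter()  # st_nlink -> count
    dir_entry_hist = Counter()   # log2 bucket of entries per directory -> count
    n_files = 0
    total_bytes = 0

    for dirpath, dirnames, filenames in os.walk(root):
        # One sample per directory: the number of entries it contains.
        dir_entry_hist[log2_bucket(len(dirnames) + len(filenames))] += 1
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # count regular files only (skip symlinks, FIFOs, ...)
            n_files += 1
            total_bytes += st.st_size
            size_hist[log2_bucket(st.st_size)] += 1
            name_len_hist[len(name)] += 1
            link_count_hist[st.st_nlink] += 1

    return {
        "files": n_files,
        "bytes": total_bytes,
        "size_hist": size_hist,
        "name_len_hist": name_len_hist,
        "link_count_hist": link_count_hist,
        "dir_entry_hist": dir_entry_hist,
    }
```

Usage: `stats = scan("/path/to/tree")`, then inspect for example `stats["size_hist"]`. Power-of-two buckets keep the histograms compact even when file sizes span many orders of magnitude.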

If you use these data in your research, please acknowledge Los Alamos National Laboratory for providing them.

The dataset may be found at the Los Alamos National Laboratory Fsstats data release page.

README: For details on the results and what each field means, see the README. For more information on the fsstats project in general, see the PDSI fsstats page.

Table of Contents

  1. Can you tell me a little bit about how these filesystems are typically used?
  2. Can we compare these filesystems to the 2008 fsstats released by LANL?
  3. Why is there always 1 entry per directory for the entire filesystem?
  4. Why is the negative overhead for anonymous filesystem 6 so large?
  5. For More Information

  1. Can you tell me a little bit about how these filesystems are typically used?

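    Group 1: these users tend to run N-N codes (N processes, each writing its own file), so they write very large numbers of small files.
    Group 2: N-1 codes (N processes writing to one shared file) are much more common, so these filesystems typically hold fewer files than Group 1 filesystems.
    Group 3: test systems, used for non-production runs meant for testing or debugging.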

    Any group can run on any system, but the general trends are:
    Filesystems 6 and 7 see mostly Group 1 usage, while filesystems 1-5 and 8-9 see mostly Group 2 usage.
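
    The N-N versus N-1 distinction explains why Group 1 filesystems accumulate far more files. The Python sketch below is a single-process simulation, assuming N ranks that each contribute one equally sized block: the N-N variant creates one file per rank, while the N-1 variant has each rank write its block at offset `rank * len(block)` in one shared file. Real codes run the ranks in parallel (for example through MPI-IO); the function and file names here are hypothetical, and `os.pwrite` requires a POSIX system.

```python
import os


def checkpoint_n_n(outdir: str, n_ranks: int, block: bytes) -> list:
    """N-N pattern: every rank writes its own file. Returns the paths created."""
    paths = []
    for rank in range(n_ranks):
        path = os.path.join(outdir, f"ckpt.rank{rank}")
        with open(path, "wb") as f:
            f.write(block)
        paths.append(path)
    return paths


def checkpoint_n_1(outdir: str, n_ranks: int, block: bytes) -> list:
    """N-1 pattern: every rank writes a disjoint region of one shared file."""
    path = os.path.join(outdir, "ckpt.shared")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for rank in range(n_ranks):
            # Rank r owns bytes [r * len(block), (r + 1) * len(block)).
            offset = rank * len(block)
            view = memoryview(block)
            while view:  # loop in case pwrite performs a short write
                written = os.pwrite(fd, view, offset)
                view = view[written:]
                offset += written
    finally:
        os.close(fd)
    return [path]
```

    For the same amount of data, one checkpoint with 1,000 ranks leaves 1,000 files under the N-N pattern but only one file under N-1, which matches the file-count difference described for Groups 1 and 2.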


  2. Can we compare these filesystems to the 2008 fsstats released by LANL?

    Of the 2008 fsstats, two filesystems are from Group 2 and one (panscratch1.csv) is from Group 3. Comparisons are most meaningful between filesystems of the same group; comparing data across groups may not be useful.

  3. Why is there always 1 entry per directory for the entire filesystem?

    This was a mistake and has been fixed. Thanks to Yifan Wang and the CMU PDL team for finding and reporting this issue!

  4. Why is the negative overhead for anonymous filesystem 6 so large?

    Almost all of the negative overhead for anonymous filesystem 6 comes from just two(!) files belonging to the same user:
    File1: 1,029,403,328,512 bytes ≈ 958.7 GiB
    File2: 58,650,690,272 bytes ≈ 54.6 GiB
    Together these two files account for about 1,013 GiB (just under 1 TiB) from a single user.
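
    The byte-to-GiB conversion above can be checked with a few lines of Python. The block also includes a hypothetical overhead helper, assuming overhead is measured as space allocated on disk minus apparent size (the README gives the exact fsstats definition). Under that assumption a file shows negative overhead when fewer blocks are allocated than its `st_size` implies, which the Python `os.stat` documentation notes can happen for files with holes; the FAQ does not say why these two files behave this way. The names `to_gib` and `allocated_minus_apparent` are illustrative only.

```python
import os

GIB = 1024 ** 3  # the FAQ's GB figures match binary gibibytes (2**30 bytes)

FILE1_BYTES = 1_029_403_328_512
FILE2_BYTES = 58_650_690_272


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def allocated_minus_apparent(path: str) -> int:
    """Hypothetical overhead measure: bytes allocated on disk minus st_size.

    Negative when less space is allocated than the file's apparent size.
    st_blocks counts 512-byte units and is only available on POSIX systems.
    """
    st = os.stat(path)
    return st.st_blocks * 512 - st.st_size


for label, size in (("File1", FILE1_BYTES), ("File2", FILE2_BYTES),
                    ("Total", FILE1_BYTES + FILE2_BYTES)):
    print(f"{label}: {size:,} bytes = {to_gib(size):.1f} GiB")
```

    Running it prints 958.7 GiB for File1, 54.6 GiB for File2, and 1013.3 GiB for the total of 1,088,054,018,784 bytes.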


  5. For More Information
    • If you still have unanswered questions, please contact