Los Alamos National Laboratory
 
 


National Security Education Center


IS&T Data Exploratory - Data Intensive Supercomputing at LANL

DISC Ready for Users

DISC is operational and accepting users who have jobs involving large datasets and who would like to explore the possibilities offered by a cluster designed for data-intensive applications. If you are interested in running a job on DISC, contact John Bent.

About DISC

The IS&T Data Exploratory is a unique data-intensive supercomputing facility designed to explore the challenges of working with large datasets. Sensors, Internet packet filters, telescopes, and satellites all generate massive amounts of data. To solve problems in areas like energy security, bio-security, and cosmology, we need effective ways to analyze this data. The Data Exploratory will provide a trial facility in an open environment where researchers can experiment with data-intensive methods.

The Exploratory will have two 10GigE connections to bring in data: one to the yellow network and one to the turquoise network. The data will be stored on a distributed file system built from hard drives local to the compute nodes. The focus of the facility will be to provide an infrastructure that effectively supports data-intensive applications.

One of the challenges of large datasets is representing the data in a manageable format that scientists can learn from. The IS&T Data Exploratory will include a visualization environment to provide an alternative method for looking at results from data-intensive applications. The goal is to find new ways of using visualization for information applications, to provide an abstraction of the data rather than a simulation of it.

DISC for Cosmology

Cosmology simulations can generate petabytes, even exabytes, of data. Once the data is generated, it must be processed to yield useful knowledge, which is a heavily I/O-bound operation. One such cosmology application uses a clustering technique to highlight interesting regions of the simulation space. As shown in the figure:

  • Map Phase: The simulation area is divided into smaller datasets, and regions of interest are identified.
  • Reduce Phase: The simulation space is reassembled into one file with areas of interest highlighted for the user.
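
The two phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the application's actual code: the simulation space is modeled as a small 2D density grid, and a plain density threshold stands in for the clustering technique, which the page does not describe. The names `split_rows`, `map_chunk`, `reduce_chunks`, and `highlight` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Chunk = Tuple[int, Grid]  # (index of the first row, rows in this chunk)


def split_rows(grid: Grid, n_chunks: int) -> List[Chunk]:
    """Divide the grid into contiguous row blocks, each tagged with its start row."""
    size = max(1, -(-len(grid) // n_chunks))  # ceiling division
    return [(start, grid[start:start + size]) for start in range(0, len(grid), size)]


def map_chunk(chunk: Chunk, threshold: float) -> Tuple[int, List[List[bool]]]:
    """Map phase: flag cells of interest in one chunk.

    Assumption: a density threshold replaces the real clustering step.
    """
    start, rows = chunk
    return start, [[value >= threshold for value in row] for row in rows]


def reduce_chunks(mapped: List[Tuple[int, List[List[bool]]]]) -> List[List[bool]]:
    """Reduce phase: reassemble the flagged chunks into one grid, in row order."""
    result: List[List[bool]] = []
    for _, flags in sorted(mapped, key=lambda item: item[0]):
        result.extend(flags)
    return result


def highlight(grid: Grid, n_chunks: int, threshold: float) -> List[List[bool]]:
    """Run the map phase over every chunk, then the reduce phase."""
    mapped = [map_chunk(chunk, threshold) for chunk in split_rows(grid, n_chunks)]
    return reduce_chunks(mapped)
```

On a real cluster each `map_chunk` call would run on the node that stores that chunk, so only the much smaller flag grids need to move across the network.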

DISC for Cyber Security

Cybersecurity is a present and mounting concern at the lab. One way to ensure lab security is to analyze all lab network traffic. With so much I/O traffic generated on a daily basis, the analysis becomes very data-intensive.

  • Map Phase: Network traffic is streamed into the application and split among multiple TaskTrackers. Threats are identified and recorded in each map task.
  • Reduce Phase: The malicious activity identified by the map tasks is collected into one file and delivered to cyber-security personnel in real time.
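
A minimal Python sketch of this flow is shown here. It makes several assumptions that are not in the original: traffic is a finite batch of packet records rather than a live stream, a "threat" is any packet sent to a port on a hypothetical watch list, and plain function calls stand in for the TaskTrackers. The names `Packet`, `SUSPICIOUS_PORTS`, `partition`, `map_task`, and `reduce_task` are illustrative only.

```python
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    source: str
    dest_port: int


# Hypothetical detection rule: these destination ports are treated as suspicious.
SUSPICIOUS_PORTS = frozenset({23, 4444})


def partition(packets: Iterable[Packet], n_tasks: int) -> List[List[Packet]]:
    """Split the traffic round-robin among n_tasks map tasks."""
    buckets: List[List[Packet]] = [[] for _ in range(n_tasks)]
    for index, packet in enumerate(packets):
        buckets[index % n_tasks].append(packet)
    return buckets


def map_task(packets: List[Packet]) -> List[Packet]:
    """Map phase: record the packets in this split that match the rule."""
    return [p for p in packets if p.dest_port in SUSPICIOUS_PORTS]


def reduce_task(per_task_hits: Iterable[List[Packet]]) -> List[Packet]:
    """Reduce phase: collect every task's hits into one sorted report."""
    merged = [packet for hits in per_task_hits for packet in hits]
    return sorted(merged)
```

For example, `reduce_task(map_task(split) for split in partition(traffic, 2))` yields a single list of flagged packets. Sorting in the reduce step only keeps the report deterministic; it is not required by the technique.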

DISC for Image Processing

This example of image processing takes a photo and identifies potential threats. While this example is simplistic, the idea can easily be expanded to something more complex like threat analysis on GIS data or even real-time video, which becomes much more data-intensive.

  • Map Phase: As in the cosmology example, the image is chunked into smaller regions. These regions are then processed to identify potential threats.
  • Reduce Phase: The image is reassembled with the identified threat areas clearly marked for the user.
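
The chunk-and-reassemble pattern for images can be sketched as follows. This is a toy illustration, not the detector used on DISC: the image is a small grayscale pixel grid, and a tile counts as a "potential threat" when any pixel reaches a hypothetical brightness threshold. The names `chunk_image`, `map_tile`, and `reduce_tiles` are assumptions made for this sketch.

```python
from typing import List, Tuple

Image = List[List[int]]
Tile = Tuple[int, int, Image]  # (top row, left column, pixels)


def chunk_image(image: Image, tile_size: int) -> List[Tile]:
    """Cut the image into square tiles, remembering where each one came from."""
    tiles: List[Tile] = []
    for top in range(0, len(image), tile_size):
        for left in range(0, len(image[0]), tile_size):
            pixels = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append((top, left, pixels))
    return tiles


def map_tile(tile: Tile, threshold: int) -> Tuple[int, int, Image, bool]:
    """Map phase: decide whether this tile contains a potential threat.

    Assumption: any pixel at or above the threshold marks the tile.
    """
    top, left, pixels = tile
    flagged = any(value >= threshold for row in pixels for value in row)
    return top, left, pixels, flagged


def reduce_tiles(
    mapped: List[Tuple[int, int, Image, bool]], height: int, width: int
) -> Tuple[Image, List[Tuple[int, int]]]:
    """Reduce phase: paste tiles back into one image and list the flagged tiles."""
    canvas = [[0] * width for _ in range(height)]
    flagged_tiles: List[Tuple[int, int]] = []
    for top, left, pixels, flagged in mapped:
        for offset, row in enumerate(pixels):
            canvas[top + offset][left:left + len(row)] = row
        if flagged:
            flagged_tiles.append((top, left))
    return canvas, sorted(flagged_tiles)
```

Returning the tile coordinates alongside the reassembled image lets a viewer draw a box around each flagged region instead of altering the pixels themselves.
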
Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
© Copyright 2008-09 Los Alamos National Security, LLC. All rights reserved.