DISC is operational and accepting users whose jobs involve large datasets and who would like to explore the possibilities offered by a cluster designed for data-intensive applications. If you are interested in running a job on DISC, contact John Bent.
The IS&T Data Exploratory is a unique data-intensive supercomputing facility designed to explore the challenges of working with large datasets. Sensors, Internet packet filters, telescopes and satellites all generate massive amounts of data. To solve problems in areas like energy security, bio-security, and cosmology, we need effective ways to analyze this data. The Data Exploratory will provide a trial facility in an open environment where researchers can experiment with data-intensive methods.
The Data Exploratory will have two 10GigE connections to bring in data: one to the yellow network and one to the turquoise network. The data will be stored on a distributed file system built on hard drives local to the compute nodes. The focus of the facility will be to provide an infrastructure that effectively supports data-intensive applications.
One of the challenges of large datasets is representing the data in a manageable format that scientists can learn from. The IS&T Data Exploratory will include a visualization environment to provide an alternative method for looking at results from data intensive applications. The goal is to find new ways of using visualization for information applications, to provide an abstraction of the data rather than a simulation of it.
Cosmology simulations can generate petabytes, even exabytes, of data. Once generated, the data must be processed to yield useful knowledge, which is a heavily I/O-bound operation. One such cosmology application uses a clustering technique to highlight interesting regions of the simulation space, as shown in the figure:
Cybersecurity is a present and mounting concern at the lab. One way to help ensure lab security is to analyze all lab network traffic. With so much traffic generated on a daily basis, that analysis becomes very data-intensive.
This example of image processing takes a photo and identifies potential threats. While this example is simplistic, the idea can easily be expanded to something more complex like threat analysis on GIS data or even real-time video, which becomes much more data-intensive.