Los Alamos National Laboratory
 
 


Information Science & Technology Institute (ISTI) Metrics

Collaborative projects

Current

  • Title: Data Intensive Super Computing for Science
    Institute: ISTI IRHPIT
    Description:

    This project explores how well data-intensive computing programming and runtime paradigms, such as MapReduce and other dataflow graphs, apply to scientific applications.
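
    As a minimal sketch of the paradigm (the task and data are illustrative, not from the project), a map/reduce pair binning simulation samples into a histogram might look like:

```python
from collections import defaultdict

def map_phase(values, bin_width):
    """Map: emit a (bin, 1) pair for each simulation sample."""
    for v in values:
        yield (int(v // bin_width), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per bin key."""
    hist = defaultdict(int)
    for key, count in pairs:
        hist[key] += count
    return dict(hist)

samples = [0.2, 0.7, 1.3, 1.9, 2.5]
histogram = reduce_phase(map_phase(samples, bin_width=1.0))
# bins 0, 1, 2 receive 2, 2, 1 samples respectively
```

    In a real MapReduce runtime the map and reduce phases would run on many nodes, with the framework shuffling pairs by key between them.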

  • Title: File System and File Structure
    Institute: ISTI IRHPIT
    Description:

    The concept is to move HPC file system storage toward supporting file formats within the file system itself.  It is quite possible that many file types, such as N processes writing to 1 file with small strided writes, might be served well by special handling at the file system level.  Decades ago, file types and access methods were supported within a single file system: the IBM MVS storage systems allowed for many different file types, including partitioned data sets, indexed sequential, virtual sequential, and sequential.  Storage for modern HPC systems may benefit from a new parallel/scalable version of file types.  Much research remains to be done in this area to determine the usefulness of this concept and how it would work with modern supercomputers and future HPC languages and operating environments.

  • Title: Large-Scale Inverse Problems
    Institute: ISTI ISAMI
    Description:

    Our research focus is on Monte-Carlo methodology for large systems of equations. These may arise in inverse problems, DP, or other ... but we are focusing on inverse problems and (to a lesser extent) DP for the moment.

  • Title: Wireless Sensor Networks
    Institute: ISTI IRWIN
    Description:

    We are creating a clone of the OSU testbed capability in a long-running outdoor surveillance testbed at LANL.  The project will address energy efficiency concerns, in particular, maximizing efficiencies across routing and MAC.

  • Title: Video Image Segmentation and Tracking
    Institute: ISTI IRWIN
    Description:

    We are using active PTZ (pan-tilt-zoom) camera data collection to study patterns and trends in outdoor scenes over long durations. Data is collected in a persistent manner using multiple cameras and locations at different times of day and days of the week. The video images are used to track the trajectories of people and to extract common pathways and routes by clustering and segmenting the trajectories.

  • Title: Statistical Physics for Computing and Communications
    Institute: ISTI ISAMI
    Description:

    This project exploits methods from statistical physics to provide fundamental advances in computing and communication systems. The intersection of computer science, information theory and statistical physics has seen a recent explosion of activity, resulting in new algorithms and new methods of analysis. Discrete computational challenges including constraint satisfaction, error correction and communication network performance have benefited from techniques and insights offered by statistical physics. Physics, at the same time, has been significantly enriched by approaches from discrete computation, such as message-passing algorithms.

  • Title: Energy Infrastructure: Smart Grids
    Institute: ISTI ISAMI
    Description:

    -Grid Design - develop efficient and robust integration of geographically distributed renewable generation by solving cutting-edge optimization problems that incorporate difficult constraints imposed by the transmission of electrical power.
    -Grid Control - develop distributed control techniques and algorithms based on queuing theory to enable high penetration of small-scale distributed generation and plug-in hybrid vehicles.
    -Grid Stability - develop a toolbox capable of detecting instabilities and failures for preventing costly outages.

  • Title: Filtering Internet Information for Use in Biothreat Scenarios
    Institute: ISTI ISSDM
    Description:

    We propose to apply information technology developed for personalized recommendation systems and collaborative filtering (e.g., amazon.com's "Customers who bought this also bought" or Netflix's "You might also like") to ill-defined scientific issues where choosing the most relevant information from a large amount of slightly-relevant information becomes literally of vital importance. The scenario is an outbreak of an infectious pathogen, where the nature of the pathogen is known through previous analysis. The question then becomes: what to do next? Quarantine, triage, treat, limit further exposure, locate drug and vaccine supplies, determine their likely efficacy, the most efficient way to use them, and the time they will take to arrive, find subject experts, identify possible origins and countermeasures, and so on. Much information that can be highly relevant in this scenario exists on the internet, but locating it and assessing its reliability and relevance is extremely difficult. We believe that personalized recommendation systems can play a role in focusing the searches and finding the most important and urgently needed information.
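
    A minimal sketch of the item-to-item co-occurrence idea behind "customers who bought this also bought" (the document names and baskets below are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical "baskets": each analyst's set of consulted sources.
baskets = [
    {"quarantine_protocol", "vaccine_stockpile", "triage_guide"},
    {"quarantine_protocol", "vaccine_stockpile"},
    {"vaccine_stockpile", "drug_efficacy"},
]

# Count co-occurrences: the core of "those who used A also used B".
cooccur = defaultdict(int)
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def recommend(item, k=2):
    """Rank the items most often co-consulted with `item`."""
    scores = {b: n for (a, b), n in cooccur.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

    Production recommenders add weighting, normalization, and matrix-factorization models on top of this basic co-occurrence signal.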

  • Title: ...And eat it too: High read performance in write-optimized HPC I/O middleware file formats
    Institute: ISTI IRHPIT
    Description:

    As HPC applications run on increasingly high process counts on larger and larger machines, both the frequency of checkpoints needed for fault tolerance and the resolution and size of Data Analysis Dumps are expected to increase proportionally. In order to maintain an acceptable ratio of time spent performing useful computation to time spent performing I/O, write bandwidth to the underlying storage system must increase proportionally to this increase in checkpoint and computation size. Unfortunately, popular scientific self-describing file formats such as netCDF and HDF5 are designed with a focus on portability and flexibility; extra care and careful crafting of the output structure and API calls are required to optimize for write performance using these APIs. To provide sufficient write bandwidth to continue to support the demands of scientific applications, the HPC community has developed a number of I/O middleware layers that structure output into write-optimized file formats. However, the obvious concern with any write-optimized file format is a corresponding penalty on reads. In the log-structured file system, for example, a file generated by random writes can be written efficiently, but reading the file back sequentially later results in very poor performance. Simulation results require efficient read-back for visualization and analytics, and though most checkpoint files are never used, the efficiency of a restart is very important in the face of inevitable failures. The utility of write-speed-improving middleware would be greatly diminished if it sacrificed acceptable read performance. In this paper we examine the read performance of two write-optimized middleware layers on large parallel machines and compare it to reading data natively in popular file formats.

  • Title: Creating Compact Multi-Resolution Representations of Massive Flow Fields for Efficient and Effective Visualization and Analysis
    Institute: ISTI IRWIN
    Description:

    The goal of this project is to research multi-threaded and multi-process parallel rendering on multi-core computers as part of LANL's investigation into using supercomputers for parallel visualization.  We are evaluating the practical feasibility of visualization on large-memory, multi-CPU, high-bandwidth-connected nodes.

  • Title: Managing High-Bandwidth Real-Time Data Storage (Storage Systems for High-Bandwidth, Low-Lifetime Data)
    Institute: ISTI ISSDM
    Description:

    This project was first conceived as a storage system for the Long Wavelength Array (LWA) radio telescope, and subsequently extended to incorporate several other problems with similar themes. The problem is characterized by ultra-high bandwidth data streams in which most data is "useless," but occasionally is "useful" but not recognized as such until a later period. These traits combine to form an odd set of requirements: there is far too much data to retain in the long-term, but which needs to be stored in the short term in order to determine whether or not a portion of it is useful.

    In the case of the LWA, a practical example of this data pattern is that a significant astronomical event is detected at time T. In order to understand as much about the event as possible, scientists might declare that they need all of the data in the ten minutes prior to time T, as well as the data for some amount of time after time T. Similarly, in a cybersecurity example, an intrusion may be detected at time T coming from source A. In order to track the intrusion as closely as possible, network security personnel might declare that they need all IP packets coming from source A prior to time T, as far back as possible. These two examples involve very similar data patterns, but the data itself is organized quite differently. In one case, a system must save and track large data elements without needing to search on anything more complicated than time. In the other, a system must save and track very small, highly structured data elements, and needs to search on many aspects of each element, for example: origin, destination, and size. The disparity between these two types of data makes a general-purpose solution much trickier.

    The primary focus of this research is providing and maintaining quality of service guarantees for real-time data on standard disk drives. This problem is complicated by other requirements which must be integrated into the quality of service guarantees: other processes must be able to access the drive, reliability mechanisms must be implemented to ensure data resiliency, the data must be indexed and quickly searchable for when preservation commands are given, and the system must be able to scale upwards to meet increasing bandwidth requirements. None of these extra requirements can be allowed to disrupt the central need for quality of service guarantees on the real-time data stream. This project also has research potential for a number of related secondary goals. Due to the nature of the data collection, an opportunity exists to test how well hard drives perform under unrelenting high-bandwidth writing over months and years. Solid state storage devices are unsuitable for managing this type of high-bandwidth data due to limited write cycles, but may be extremely useful when applied to specific indexing needs only. Non-standard error-correction codes not normally used for standard storage systems may find a new place in a system where every single write is a "large write" and reading, even for repair work, is extremely rare.
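
    The store-briefly-then-preserve-on-event pattern described above can be sketched as a fixed-capacity ring with a preserve operation (capacities and timestamps below are illustrative, not the system's actual parameters):

```python
from collections import deque

class ShortLifetimeStore:
    """Fixed-capacity ring of (timestamp, record); old data is
    silently overwritten unless explicitly preserved after an event."""
    def __init__(self, capacity):
        self.ring = deque(maxlen=capacity)   # oldest entries fall off
        self.preserved = []                  # stand-in for long-term storage

    def append(self, ts, record):
        self.ring.append((ts, record))

    def preserve(self, t_start, t_end):
        """Copy the [t_start, t_end] window to long-term storage
        before the ring overwrites it; returns how much was saved."""
        kept = [(ts, r) for ts, r in self.ring if t_start <= ts <= t_end]
        self.preserved.extend(kept)
        return len(kept)

store = ShortLifetimeStore(capacity=5)
for t in range(10):
    store.append(t, f"sample-{t}")
# Event detected at t=9: keep the window just before it.
n = store.preserve(7, 9)   # only t=5..9 are still live in the ring
```

    The real system must additionally give the incoming stream hard quality-of-service guarantees on disk, which a simple in-memory ring does not capture.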

  • Title: Convex Optimization Approaches to Nonlinear System Identification
    Institute: ISTI ISAMI
    Description:
  • Title: Incorporating solid state drives into distributed storage systems (Using Solid State Drives in Distributed Storage Systems)
    Institute: ISTI ISSDM
    Description:

    This project examines a promising architecture that uses a limited number of SSDs (solid state drives) to decrease the power consumption and either increase the performance or reliability of RAID storage subsystems.  We are researching the use of SSDs for parity storage in a disk-SSD hybrid RAID system.
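
    The parity idea behind a disk-SSD hybrid RAID can be sketched with simple XOR parity (a toy RAID-4-style example, not the project's implementation):

```python
def parity(blocks):
    """XOR parity across equal-sized data blocks; in the hybrid
    design the parity column would live on the SSD."""
    p = bytes(len(blocks[0]))
    for b in blocks:
        p = bytes(x ^ y for x, y in zip(p, b))
    return p

def rebuild(surviving, p):
    """Recover a lost block by XORing the parity with the survivors."""
    return parity(surviving + [p])

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]  # blocks on spinning disks
p = parity(data)                                # parity block on the SSD
lost = data[1]                                  # simulate a disk failure
assert rebuild([data[0], data[2]], p) == lost
```

    Placing the parity column on the SSD concentrates the small-write parity-update traffic on the device with no seek penalty, which is one motivation for the hybrid layout.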

  • Title: Recursive Parameter Estimation Algorithm
    Institute: ISTI ISSDM
    Description:
  • Title: JPEG 2000 Compression on Scientific Data (Climate Data)
    Institute: ISTI ISSDM
    Description:
  • Title: Human Assisted Computer Vision
    Institute: ISTI ISSDM
    Description:
  • Title: Blackbox
    Institute: ISTI ISSDM
    Description:

    Blackbox is a project to streamline the manipulation of large data sets between computational clusters and long term archival storage. The current UX is difficult to work with, and frustrating for researchers. Since supercomputer time is expensive, and data sets are extremely large in size, this becomes a critical issue.

    The proposed goal is to allow a user to stay on the compute cluster but offload filesystem operations to a special cluster, while keeping in mind the users' perceptions of and responses to this model. This lets the compute nodes compute, without burdening them with file transfers at the expense of computation.

  • Title: Exascale Data Management in File Systems
    Institute: ISTI ISSDM
    Description:

    Today's file systems use interfaces largely conforming to POSIX IO, a standard whose design dates back decades, to when high-end file systems stored less than 100MB. Today's high-end file systems in supercomputing environments and at search engine and social network companies are up to 7-9 orders of magnitude larger, resulting in numbers of data items for which POSIX abstractions are quite inadequate. With the advent of exascale systems this inadequacy will only grow worse. We propose to coalesce the functionality of data analysis, management, and file systems by extending file systems with data management services, including declarative querying, distributed query planning and optimization, automatic indexing, a common data model for most scientific file formats, and provenance tracking that spans multiple layers of abstraction. These services are optimized for predictability, throughput, and low latency while minimizing data movement and power consumption. We will leverage important insights gained by both the file system and the database communities. However, the sheer amount of data managed by file systems and their associated access patterns require a design very different from current database management systems.

    SUBPROJECTS:
    "SciHadoop" (Joe Buck)
    "FLAMBES: Scalable Simulation of Parallel File Systems" (Adam Crume)
    "HDF5-FS" (Latchesar Ionkov)
    "Redundant Indexing" (Jeff LeFevre)
    "Parallel Querying of Scientific Data" (Noah Watkins)
    "QMDS:Queriable File System Metadata Services" (Sasha Ames, Funded by LLNL)

  • Title: Synthetic Cognition through Petascale Models of the Primate Visual Cortex
    Institute: ISTI ISAMI
    Description:

    We seek to understand and implement the computational principles that enable high-level sensory processing and other forms of cognition in the human brain. To achieve these goals, we are creating synthetic cognition systems that emulate the functional architecture of the primate visual cortex. By using petascale computational resources, combined with our growing knowledge of the structure and function of biological neural systems, we can match, for the first time, the size and functional complexity necessary to reproduce the information processing capabilities of cortical circuits. The arrival of next-generation supercomputers may allow us to close the performance gap between state-of-the-art computer vision approaches and biological vision by bringing these systems to the scale of the human brain.

    There are approximately 10 billion neurons in the human visual cortex, with each neuron receiving approximately 10,000 synaptic inputs and each synapse requiring approximately 10 floating point operations per second (FLOPS), based on an average firing rate of 1 Hz. Thus, synthetic visual cognition will require on the order of 10G (neurons) x 10K (synapses) x 10 FLOPS/synapse = 1 petaflop, commensurate with the performance of next-generation supercomputers such as Roadrunner at LANL.
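
    The estimate above works out as follows:

```python
# Back-of-envelope estimate from the figures in the text.
neurons = 10e9        # ~10 billion neurons in the visual cortex
synapses = 10e3       # ~10,000 synaptic inputs per neuron
flops_per_syn = 10    # ~10 FLOPS per synapse at ~1 Hz firing

total = neurons * synapses * flops_per_syn
assert total == 1e15  # 1 petaflop, matching next-generation machines
```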

  • Title: Compute and Data-Intensive Computing: Bridging the Gap
    Institute: ISTI IRHPIT
    Description:

    Direct storage of simulation output into data-intensive subsystems.

Affiliate

  • Title: Searching Petascale File Systems
    Institute: ISTI ISSDM
    Description:

    This project is an LLNL/UCSC collaboration: the goal is to design a scalable metadata-rich file system with database-like data management services. With such a file system, scientists will be able to perform time-critical analysis over continually evolving, very large data sets.

    In the first phase we designed and implemented QUASAR, a path-based query language using the POSIX IO data model extended by relational links. We conducted a couple of data mining case studies in which we compared the baseline architecture, consisting of a database and a file system, with our MRFS prototype. The QUASAR interface, via its query language, provides much easier access to large data sets than POSIX IO. MRFS' querying performance is significantly better than the baseline system due to QUASAR's hierarchical scoping.

    Challenges remain and we are in the process of addressing them: we are working on a scalable physical data model of QUASAR's logical data model, and we are designing a rich-metadata client cache to address small-update overheads and metadata coherence.

    The work so far was presented at SOSP'07 (demo & poster), PDSW'07 (poster),
    PDSW'08 (poster), and reported in two 2008 tech reports available at the
    UCSC/PDSI site.

  • Title: Distributed Performance Management
    Institute: ISTI ISSDM
    Description:

    Performance monitoring and management of distributed systems.

  • Title: Virtualizing Real-time Processing
    Institute: ISTI ISSDM
    Description:

    I am interested in CPU virtualization. I am working on enabling more flexibility and
    control of CPU scheduling by separating user space and kernel space scheduling.

  • Title: Performance Mgmt of Data-intensive Computing
    Institute: ISTI ISSDM
    Description:

    Much of today's IT infrastructure, including high-performance systems, suffers from poorer and less predictable performance than necessary due to ineffective resource management. While processor performance is increasing at a rapid rate, increases in storage and memory performance are rather marginal, turning them into serious bottlenecks, particularly for data-intensive applications. At the same time, memory and most storage subsystems operate in best-effort mode without even minimal performance guarantees.

    We show that better and more predictable performance can be achieved by considering system resource characteristics. Our work on disk scheduling shows how this unpredictable resource can be effectively managed, and guaranteed, by changing the metric by which it is managed. An analysis of memory performance reveals behavior somewhat similar to disk I/O, with orders of magnitude performance difference between best and worst case. Thus, we propose to manage memory performance the same way, scheduling data-intensive applications based on their memory access patterns.

    While analyzing system resource characteristics, we found that memory performance scales with parallel accesses. This can be exploited to predictably increase memory performance. We chose graphics processors with hundreds of cores as an example of massively-parallel architectures, and databases as an example of data-intensive applications, to put the concept to the test. Our p-ary search algorithm successfully demonstrates how using parallel memory accesses can yield predictable performance increases of up to 200%.
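
    A serial sketch of p-ary search (on a GPU the pivot probes in each round would be issued concurrently by p threads; this is an illustration of the idea, not the paper's implementation):

```python
def p_ary_search(arr, key, p=4):
    """p-ary search over a sorted array: each round probes about p-1
    evenly spaced pivots, shrinking the range to roughly 1/p of its size.
    Probed serially here; on a GPU all pivots load in parallel."""
    lo, hi = 0, len(arr)
    while hi - lo > 1:
        step = max(1, (hi - lo) // p)
        nxt_lo, nxt_hi = lo, hi
        for i in range(lo + step, hi, step):
            if arr[i] <= key:
                nxt_lo = i          # key lies at or to the right of this pivot
            else:
                nxt_hi = i          # key lies to the left of this pivot
                break
        lo, hi = nxt_lo, nxt_hi
    return lo if arr and arr[lo] == key else -1
```

    Binary search is the p=2 case; larger p trades more memory accesses per round for fewer rounds, which pays off when the accesses are free because they run in parallel.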

  • Title: Multiprocessor Real-Time Scheduling
    Institute: ISTI ISSDM
    Description:
  • Title: Real-time Memory Management
    Institute: ISTI ISSDM
    Description:

    My research focuses on buffer-cache management for predictable I/O performance in local and distributed storage systems. The objective is to enable applications with predictable performance requirements (such as hard, firm, and soft real-time) to have direct access to a storage system, and/or share it with applications with other performance requirements. Our approach consists of virtualizing the buffer-cache, providing the illusion of a dedicated buffer-cache to each application.

  • Title: Research on Storage QoS
    Institute: ISTI ISSDM
    Description:

    I am working on resource management algorithms for providing guaranteed
    performance to mixed workloads executing concurrently in a shared storage
    system.

    Guaranteed I/O performance is needed for a variety of applications ranging from real-time data collection to desktop multimedia to large-scale scientific simulations. Reservations on throughput, the standard measure of disk performance, fail to effectively manage disk performance due to the orders of magnitude difference between best-, average-, and worst-case response times, allowing reservation of less than 0.01% of the achievable bandwidth. Moreover, hard I/O performance guarantees for a mix of workloads are generally considered impractical due to the stateful nature of disk I/O and the interference between workloads.

    Our research provides I/O performance guarantees for a mix of workloads with different performance and timeliness requirements without sacrificing performance. This is achieved via a novel disk I/O scheduler, Fahrrad, that makes hard performance guarantees in terms of disk time utilization. Building upon this base scheduler, our work addresses several key questions in real-time disk I/O: how to provide isolation between concurrently executing request streams, the tradeoff between the tightness of the guarantees and the overall performance of the system, and how to use Fahrrad to make throughput and timeliness guarantees.

  • Title: Scalable Data Management
    Institute: ISTI ISSDM
    Description:
  • Title: Measuring Wikipedia
    Institute: ISTI ISSDM
    Description:

    Google-funded project on information trust.

  • Title: Dynamics of the Hog1 Signaling Pathway through Stochastic Modeling and Single Molecule mRNA Counting
    Institute: ISTI ISAMI
    Description:
  • Title: PLFS (Parallel Log-structured File System)
    Institute: ISTI ISSDM
    Description:

    This project develops and evaluates algorithms by which patterns in the PLFS (Parallel Log-structured File System) metadata can be discovered and then used to replace the current metadata, in order to reduce metadata size.
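
    A toy illustration of the idea (the record format below is hypothetical, not PLFS's actual index format): a run of constant-stride write offsets can be collapsed into a single pattern record instead of one entry per write.

```python
def compress_index(offsets):
    """If the offsets form a constant-stride run, replace them with a
    single (tag, start, stride, count) record; otherwise keep them raw."""
    if len(offsets) < 3:
        return offsets
    stride = offsets[1] - offsets[0]
    if all(offsets[i + 1] - offsets[i] == stride
           for i in range(len(offsets) - 1)):
        return ("pattern", offsets[0], stride, len(offsets))
    return offsets

# N-to-1 strided checkpoint: one rank writes at 0, 4096, 8192, ...
entries = [i * 4096 for i in range(1000)]
compact = compress_index(entries)   # 1000 index records become one tuple
```

    Strided checkpoint workloads make such runs common, which is why pattern discovery can shrink the index dramatically.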

  • Title: Parallel I/O interfaces for OpenMP
    Institute: ISTI ISSDM
    Description:
  • Title: Features and Optimizations for LANL's PLFS
    Institute: ISTI ISSDM
    Description:
  • Title: PRObE – The NSF Parallel Reconfigurable Observational Environment for Data Intensive Super-Computing and High Performance Computing (NSF unsolicited)
    Institute: ISTI IRHPIT
    Description:
  • Title: The Cost of Security: Auditing Focus
    Institute: ISTI CSCNSI
    Description:
  • Title: The Cost of Security: Firewall Focus
    Institute: ISTI CSCNSI
    Description:
  • Title: Cluster Virtualization
    Institute: ISTI CSCNSI
    Description:
  • Title: Tape Storage Performance
    Institute: ISTI CSCNSI
    Description:
  • Title: Performance Studies of Out-of-Core Computing on Various Solid State Storage Devices – Fusion-IO’s IOSan SSD, SuperTalent’s RaidDrive SSD, OCZ Vertex 3, Kingston V+
    Institute: ISTI CSCNSI
    Description:
  • Title: Performance Studies of Scientific Applications on High Performance Computing Cluster with Big- Memory Footprint
    Institute: ISTI CSCNSI
    Description:
  • Title: Building a Parallel Cloud Storage System Using OpenStack’s Swift Object Store
    Institute: ISTI CSCNSI
    Description:
  • Title: GlusterFS: One Storage Server to Rule Them All
    Institute: ISTI CSCNSI
    Description:
  • Title: Scalable Node Monitoring
    Institute: ISTI CSCNSI
    Description:
  • Title: iSSH vs. Auditd: Intrusion Detection in High Performance Computing
    Institute: ISTI CSCNSI
    Description:
Past

  • Title: Alternative Reliability Models in Ceph
    Institute: ISTI ISSDM
    Description:

    RAID systems have traditionally offered increased performance and data security in small storage systems. An opportunity now exists to extend traditional RAID principles into the area of large-scale object-based storage devices in order to offer greater data security and space efficiency. In a system where component failures can be expected on a daily basis, the importance of redundancy mechanisms is obvious, and RAID principles offer an appropriate model. Ceph is an excellent platform to test these RAID systems and to learn how they function in a new environment.

  • Title: COLT: Continuous On-Line Tuning of Databases (Self-Tuning Database Systems)
    Institute: ISTI ISSDM
    Description:

    Tuning is one of the most challenging and important tasks in setting up a database system. A typical assumption is that a representative workload can provide the means to perform some initial tuning automatically, which can then be refined manually by the database administrator. However, several emerging application domains challenge this assumption. A characteristic example is scientific data management, where the use of ad-hoc data analytics eliminates the ability to gather a representative workload, and the lack of database expertise among scientists lowers the possibility of efficient manual tuning. This motivates us to look into autonomic tuning techniques that operate continuously and require minimal intervention from an administrator. In particular, we examine online methods to tune the physical database design, i.e., the set of materialized data structures, such as indices and materialized views, that are crucial for efficient query processing.

  • Title: Dynamic Load-Balancing in Petabyte-scale File Systems that use Pseudo-Random Data
    Institute: ISTI ISSDM
    Description:

    Pseudorandom data placement in distributed storage systems offers scalability benefits, but it makes load balancing harder, so new techniques are required. We explore different load balancing techniques using Ceph, an object-based storage system developed at UCSC.
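
    A minimal sketch of hash-based pseudorandom placement and the statistical load it produces (illustrative only; Ceph's actual placement function, CRUSH, is considerably more sophisticated):

```python
import hashlib

def place(obj_id, num_devices):
    """Hash-based pseudorandom placement: an object's location is
    computed from its name, not looked up in a table, so any client
    can locate it without consulting a central directory."""
    h = int(hashlib.sha256(obj_id.encode()).hexdigest(), 16)
    return h % num_devices

# Placement is deterministic yet statistically near-uniform.
counts = [0] * 8
for i in range(8000):
    counts[place(f"obj-{i}", 8)] += 1
# Each of the 8 devices holds roughly 1000 objects; residual skew
# around that mean is what a load balancer must detect and correct.
```

    The load-balancing difficulty is that hot or oversized objects land wherever the hash sends them, so rebalancing must work around, rather than through, the placement function.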

  • Title: Cosmic Calibration - Statistical Modeling of Dark Energy
    Institute: ISTI ISSDM
    Description:

    We propose to work on the problem of calibrating the parameters of computer code used for simulation of physical phenomena. We will explore statistical methods based on a Bayesian approach implemented with Sampling Importance Resampling (SIR).
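
    A toy sketch of SIR on a one-parameter calibration problem (the prior, likelihood, and sample sizes below are illustrative, not the project's model):

```python
import random

def sir(prior_draws, likelihood, n_resample):
    """Sampling Importance Resampling: draw from the prior, weight
    each draw by its likelihood, then resample in proportion to the
    weights to approximate the posterior."""
    weights = [likelihood(x) for x in prior_draws]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(prior_draws, weights=probs, k=n_resample)

random.seed(0)
# Toy calibration: flat prior U(0, 10); the data favor values near 3.
prior = [random.uniform(0, 10) for _ in range(5000)]
posterior = sir(prior, lambda x: 2.718 ** (-(x - 3.0) ** 2), 1000)
# The resampled draws concentrate around the high-likelihood region.
```

    In the real setting each likelihood evaluation involves running (or emulating) the expensive simulation code, which is what makes the statistical machinery worthwhile.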

  • Title: Erasure Codes for Reliability and Recoverability
    Institute: ISTI ISSDM
    Description:

    Large data centers are typically composed of many hard drives arranged in a striped layout with redundancy, to provide adequate performance and reliability. Solid state disk drives (SSDs) are a newer technology that has better performance and lower power requirements compared to hard drives. This research examines the use of solid state disk drives in disk-SSD hybrid storage systems.

  • Title: Boosting for Image Recognition (Noise Robust Boosting and Combining Lists)
    Institute: ISTI ISSDM
    Description:

    The ISIS team in ISR-2 primarily uses supervised learning techniques to solve classification problems in imagery and therefore has a strong interest in finding linear classification algorithms that are both robust and efficient. Boosting algorithms take a principled approach to finding linear classifiers, and they have been shown to be so effective in practice that they are widely used in a variety of domains. In this proposal we present evidence that smoothing is not necessarily the optimal way

  • Title: Exploring Uncertainty Visualization in Large Data Sets (Finding Multi-streaming Events in Cosmological Simulations)
    Institute: ISTI ISSDM
    Description:

    Definition: Multivalue data are data in which repeated measurements yield multiple values for the same variable.

    Challenge: Lack of current comprehensive visualization tools for the probabilistic or uncertain nature of the multivalue data.

    Research: Investigate probabilistic streamlines using different ways of combining vector distributions.

    The size of the data sets and the uncertainty in them come from the fact that we are dealing with ensemble data sets. These usually come from Monte Carlo simulations where each output (out of many runs) represents a possible solution. The degree of agreement (or disagreement) provides some indication of certainty (or uncertainty) about the results. Because Monte Carlo simulations can involve a large number of repetitions, the total data size can grow very large very quickly. This project will explore uncertainty and how visualization can be used as a tool to help deal with it.
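
    The per-point agreement/disagreement idea can be sketched as ensemble statistics over the runs (toy numbers, not project data):

```python
# Toy ensemble: 4 Monte Carlo runs, each giving a value at 3 grid points.
runs = [
    [1.0, 2.0, 3.0],
    [1.1, 2.4, 2.9],
    [0.9, 1.6, 3.1],
    [1.0, 2.0, 3.0],
]

n = len(runs)
# Mean and variance at each grid point, computed across the runs.
means = [sum(col) / n for col in zip(*runs)]
variances = [sum((x - m) ** 2 for x in col) / n
             for col, m in zip(zip(*runs), means)]
# A high-variance grid point flags uncertainty worth visualizing.
```

    Uncertainty visualizations such as probabilistic streamlines are then built on top of these per-point (or per-vector) distributions.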

  • Title: Sensing Human Shape and Motion
    Institute: ISTI ISSDM
    Description:

    The research objective of this proposal is to measure human body shape and motion without augmenting the subject. The hypothesis is that replacing traditional cameras with high accuracy 3D shape measurement devices and utilizing a carefully constructed prior model of human surface shape are the critical factors that have been missing from prior attempts to meet this goal. The long term accuracy targets are shape to 1mm and motion to 1deg.

  • Title: Information Trust (Bringing Trust to Collaborative Content)
    Institute: ISTI ISSDM
    Description:

    To help users make sense of collaboratively-generated information, we are developing algorithmic notions of information trust.

  • Title: Interactive Search and Browsing (Searching Semantic Web)
    Institute: ISTI ISSDM
    Description:

    We focus on developing intelligent interactive search and browsing techniques to help users find the information they are looking for among billions of non-relevant files.

  • Title: Search-Oriented Interfaces for Distributed Metadata Indices
    Institute: ISTI ISSDM
    Description:

    Data users need effective means to locate their files in huge and increasingly scaled-out file systems containing millions or even billions of files. Simple directories are not sufficient for managing this number of files. Users already have search tools for file systems, but those tools are based on directory-structure-oriented searches. We propose to explore how indexing of metadata about files could give users more utility when searching for files in extremely large parallel file systems. We will concentrate on the interfaces users would use to search indexed metadata about files.

  • Title: Synthetic Parallel Applications (Storage Systems)
    Institute: ISTI ISSDM
    Description:

    Storage system design and optimization require usage data in order to determine where in the system to focus resources. Typically this usage data is created by observing the response of the system while running synthetic benchmarks designed to place high stress on particular aspects of the system. The problem with many benchmarks is that they encourage improvements to areas of the system that do not improve actual user performance on typical workloads. Ideally, benchmarks should be similar to actual applications so that improvements to storage systems can be made much more easily. User applications are often proprietary or secure, so using them directly as benchmarks is often not practical. This project will explore how synthetic benchmark applications can be generated from traces of real parallel user applications.

  • Title: Providing Quality of Service Support in Object-Based File Systems
    Institute: ISTI ISSDM
    Description:

    In large scale storage and file systems, multiple parallel applications can be asking for service simultaneously. In parallel workloads it is vital that deterministic service be given as the application only proceeds as fast as the slowest process. With mixed workloads this is very difficult to do in a parallel setting. We will explore how QoS support could be added to portions of the I/O stack in object based parallel file systems to enable determinism for multiple parallel applications in a mixed parallel workload environment.

  • Title: Scalable Security for Petascale, High Performance Storage (Organizing, Indexing, and Searching Large-Scale File Systems)
    Institute: ISTI ISSDM
    Description:

    Storage systems are getting bigger, with more high-performance applications needing to access data. Data is becoming more and more confidential over time, and current security mechanisms in scalable storage systems do not themselves scale. Systems with tens of thousands of nodes will be accessing highly distributed data using demanding I/O patterns and will need to do so securely. We intend to explore scalable security through capabilities, delegations, and replication in the file system to fundamentally address the scaling of security in file systems.
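    One common building block for capability-based storage security is a signed token: a metadata service issues it once, and each storage device verifies it locally, keeping the central server off the data path. A minimal sketch under an assumed single shared HMAC key (a real deployment would use per-device keys, expiry times, and revocation; the function and field names are hypothetical):

```python
import hmac, hashlib

SECRET = b"shared-mds-osd-key"  # assumption: one shared key, for illustration

def issue_capability(path, rights, secret=SECRET):
    """Metadata server signs (path, rights); any holder can present the
    token, but cannot forge or extend it without the key."""
    msg = f"{path}|{rights}".encode()
    tag = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return {"path": path, "rights": rights, "tag": tag}

def verify_capability(cap, path, op, secret=SECRET):
    """Storage device recomputes the tag locally -- no callback needed."""
    msg = f"{cap['path']}|{cap['rights']}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(cap["tag"], expected)
            and cap["path"] == path
            and op in cap["rights"])

cap = issue_capability("/obj/123", "r")
print(verify_capability(cap, "/obj/123", "r"))   # True
print(verify_capability(cap, "/obj/123", "w"))   # False: no write right
```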

  • Title: Concurrent Programming Models for the Cell Broadband Engine
    Institute: ISTI ISSDM
    Description:

    The Cell is powerful but challenging to program, and concurrent programming models may help. Everything old is new again: Cilk can provide a smooth transition of older C codes to the Cell.
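    Cilk's core idiom is spawn/sync divide and conquer. As an illustration only (Cilk itself is a C extension, and Python threads gain no speedup on CPU-bound code because of the GIL), the same structure can be mimicked with a thread pool: spawn one subproblem, keep working on the other, then sync.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n, pool=None, depth=2):
    """Cilk-style pattern: 'spawn' a subtask, continue with the sibling,
    then 'sync'. The depth cutoff bounds task creation, as Cilk codes do."""
    if n < 2:
        return n
    if pool is None or depth == 0:                       # sequential cutoff
        return fib(n - 1) + fib(n - 2)
    future = pool.submit(fib, n - 1, pool, depth - 1)    # spawn
    b = fib(n - 2, pool, depth - 1)                      # work in parallel
    return future.result() + b                           # sync

with ThreadPoolExecutor(max_workers=4) as pool:
    result = fib(10, pool)
print(result)  # 55
```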

  • Title: Large-Scale P2P Document Management
    Institute: ISTI ISSDM
    Description:

    One of the major challenges facing a globally distributed enterprise is storing and retrieving documents. This task requires features from distributed storage systems, databases, and the Internet. However, because of the unique nature of the documents to be stored and their access patterns, none of these alone suffices to provide truly reliable and searchable distributed document management. We present Ringer, an overlay network which combines elements of peer-to-peer storage, distributed query processing, and indexing to create an Internet-scale content distribution and retrieval network.
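    A common substrate for such ring-structured overlays is consistent hashing: each document key is owned by the first node clockwise from its hash, so a node joining or leaving moves only neighboring keys. This sketch shows the standard technique, not Ringer's actual internals, which the abstract does not specify.

```python
import bisect, hashlib

def h(key):
    """Hash a string onto the ring's identifier space."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring: owner(key) is the first node at or after h(key),
    wrapping around at the top of the identifier space."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        ids = [hid for hid, _ in self.ring]
        i = bisect.bisect_right(ids, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
doc_owner = ring.owner("report-2008.pdf")
print(doc_owner)
```

    The payoff is minimal disruption: dropping one node reassigns only the keys that node owned, while every other key keeps its owner.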

  • Title: Multi-Dimensional Metadata and Massive Directories in Parallel File Systems
    Institute: ISTI IRHPIT
    Description:

    Large-scale storage systems often hold massive, highly dimensioned data sets and massive numbers of files in directory structures. This project explores solutions for indexing and managing these data sets.

  • Title: Large Scale Parallel Tools: OpenSpeedShop Frameworks Plugins
    Institute: ISTI IRHPIT
    Description:

    Computer scientists at LANL have become intimately involved in studying the software development challenges posed by the new Roadrunner Cell hybrid architecture, as well as possible approaches to addressing them. One of their findings is that the core structure of existing applications has to be revisited and understood before they can be effectively transitioned to a new architecture. Understanding this structure requires analysis tools to assess time, I/O, memory usage, and other operational characteristics of an application so that its data, functions, and memory patterns can be mapped to appropriate segments of the new processor architecture.


    The purpose of this project is to study the OpenSpeedShop tool framework and identify a set of constraints it imposes on tool plugins, drawing on experience from OpenSpeedShop developers at LANL.  The goal is to understand the rationale behind these constraints in terms of framework quality attributes like performance and extensibility, and to explore potential design alternatives and their tradeoffs with respect to those quality attributes.

  • Title: Network Quality of Service
    Institute: ISTI ISSDM
    Description:

    In large, tightly coupled parallel systems, computation proceeds only as fast as the slowest part. For this reason it is necessary to pursue deterministic behavior in all parts of the system. Quality of Service is one way to help provide deterministic behavior. This project will explore providing Quality of Service on networks of interest to high-performance computing.
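    A classic network-QoS primitive is the token bucket, which bounds a flow to a sustained rate plus a burst allowance so its bandwidth, and hence its interference with other flows, stays predictable. A minimal sketch with assumed units (bytes and seconds):

```python
def token_bucket(arrivals, rate, burst):
    """Token-bucket shaper: packets conforming to (rate, burst) pass;
    excess packets are marked for delay or drop."""
    tokens, last_t = burst, 0.0
    decisions = []
    for t, size in arrivals:                      # (arrival time, bytes)
        tokens = min(burst, tokens + (t - last_t) * rate)
        last_t = t
        if size <= tokens:
            tokens -= size
            decisions.append("pass")
        else:
            decisions.append("mark")
    return decisions

# 1000 B/s rate, 1500 B burst: the third, back-to-back packet exceeds
# the bucket and is marked, bounding the flow's short-term bandwidth.
decisions = token_bucket([(0.0, 1000), (1.0, 1000), (1.0, 1000)],
                         rate=1000, burst=1500)
print(decisions)  # ['pass', 'pass', 'mark']
```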

  • Title: Web Based Pathogen Information System
    Institute: ISTI ISSDM
    Description:

    We are planning to develop a web-based system to help decision makers quickly identify and process relevant web-based information in case of a disease outbreak. We will work on identifying the pathogen based on sequence information. We will also develop adaptive information filtering to find, filter, and condense the information available on the web.
  • Title: Model Reduction and Robustness Analysis for Computationally Efficient, High Fidelity Inference in Complex Physical Processes
    Institute: ISTI ISAMI
    Description:

    Modern tools of semidefinite programming (in particular, the public-domain SeDuMi package) can be used successfully in black-box identification of significantly nonlinear circuits, providing performance superior to existing methods. A Polynomial Optimization Toolbox was designed and implemented to enable efficient realization of nonlinear system optimization algorithms. An analytical study of highly structured dynamic programming tasks associated with the design of reduced-complexity digital signal processing systems was also performed.

  • Title: Optimal Experimental Design for Penetrating Imaging
    Institute: ISTI ISAMI
    Description:
  • Title: Advances and Defenses: Ensembles of Weak Object Detectors
    Institute: ISTI ISSDM
    Description:

    Object detection is a pervasive area of research in Computer Vision. Its applications are vast and include medical pathology (detection of malignant cells in histology slides), brain imaging (identification of structural signs of brain disease), detection of objects in remotely sensed imagery (traffic analysis and target detection), face detection for digital camera technology, and pedestrian tracking for surveillance. A broad spectrum of object detection approaches has been proposed in the literature, often using the popular boosting algorithm, AdaBoost. I propose a new object detection system that employs a new learning framework for combining ensembles of weak detectors into a single detector.
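    For reference, the boosting baseline named above works by reweighting: each round, training examples the current ensemble misclassifies gain weight, and each weak detector is weighted by its accuracy. A compact AdaBoost sketch on toy 1-D data with threshold-stump "detectors" (this illustrates standard AdaBoost, not the new framework being proposed):

```python
import math

def adaboost(examples, labels, weak_learners, rounds=3):
    """Minimal AdaBoost: labels are +1/-1, weak learners map x -> +1/-1."""
    n = len(examples)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        errs = [sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
                for h in weak_learners]
        best = min(range(len(weak_learners)), key=lambda i: errs[i])
        err = max(errs[best], 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)   # accuracy-based vote weight
        h = weak_learners[best]
        # upweight mistakes, downweight correct examples, renormalize
        w = [wi * math.exp(-alpha * y * h(x))
             for wi, x, y in zip(w, examples, labels)]
        s = sum(w)
        w = [wi / s for wi in w]
        ensemble.append((alpha, h))
    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, -1, -1, -1]
stumps = [lambda x, t=t: 1 if x <= t else -1 for t in range(1, 6)]
predict = adaboost(xs, ys, stumps)
preds = [predict(x) for x in xs]
print(preds)  # [1, 1, 1, -1, -1, -1]
```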

    Prior work:

    • Grammar-guided Feature Extraction from Signals and Images
      Image and signal sensors collect data at increasingly greater rates, making it a great challenge to search and organize data in archives. Work is underway to engage the sensor, data management, and machine learning expertise of LANL and UCSC to tackle adaptive, content-based search in large remote sensing archives. We demonstrate the utility of a new method for extracting features from imagery and signals to aid the archival search problem.
    • CASCC: A Fast Algorithm for Time Series Classification
      The continual, real-time collection of time series signals makes manual analysis impractical, demanding faster and more accurate analysis algorithms. We are developing algorithms for rapid clustering and classification of time series data sets. CASCC is a new algorithm for classifying time series. It is highly competitive in terms of speed and accuracy compared to many other algorithms. It is inspired by another leading algorithm, DTW-1NN, but does not suffer the same computational limitations when applying the model to new time series.
  • Title: TCP/IP Networking Incast
    Institute: ISTI IRHPIT
    Description:
  • Title: Wireless Physical Layer Security, From Theory to Practice II
    Institute: ISTI IRWIN
    Description:

    - Development of methods to exploit the wireless medium to enhance the security of wireless networks.

    - Development of experimentally validated security protocols to enhance the secrecy capacity of wireless networks by utilizing novel noise forwarding strategies to inject jamming signals to impair eavesdroppers.

  • Title: Understanding Multi-scale Multi-streaming Events in Cosmological Simulations (How do multi-streaming regions form and evolve?)
    Institute: ISTI ISSDM
    Description:

    The science goal of this project is to seek a better understanding of the multi-streaming phenomenon in the universe. The specific aim is to develop tools that will analyze, extract and visualize multi-streaming events from cosmological simulations. Both technical and scientific challenges must be addressed for a successful outcome of the proposed work, including the ability to handle very large data sets and to perform comparative studies on different formulations of the rather imprecise definition of what constitutes multi-streaming.

    Multi-streaming is said to happen when, at some (Eulerian) location, particles arrive from different (Lagrangian) locations and at different velocities. Places where multi-streaming happens are singularities in coordinate space, and are also referred to as caustics. This behavior is analogous to how light rays are reflected and/or refracted in a swimming pool to produce a brighter intensity where the light rays converge. In cosmological structure formation, cold flows in dark matter merge and interact in multi-streaming regions. Hence, the identification and characterization of these regions is of great importance in extending our understanding of cosmological evolution.
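    Since the text notes the definition is imprecise, here is one crude operationalization as a sketch: bin particles into grid cells and flag cells where co-located particles carry widely different velocities. The thresholds and the 1-D setup are assumptions for illustration; real analyses work in 3-D and track Lagrangian sheet crossings.

```python
from collections import defaultdict

def multistream_cells(particles, cell_size=1.0, v_spread=1.0):
    """Flag grid cells whose particles span a velocity range wider than
    `v_spread` -- a simple proxy for multiple streams crossing there."""
    cells = defaultdict(list)
    for x, v in particles:                  # 1-D positions and velocities
        cells[int(x // cell_size)].append(v)
    flagged = []
    for c, vs in cells.items():
        if len(vs) > 1 and max(vs) - min(vs) > v_spread:
            flagged.append(c)
    return sorted(flagged)

# Two cold streams crossing near x = 5: same location, opposite velocities.
particles = [(0.2, 1.0), (1.3, 1.0), (5.1, 3.0), (5.4, -3.0), (8.7, -1.0)]
flagged = multistream_cells(particles)
print(flagged)  # [5]
```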

  • Title: Many-core Accelerated User-programmable Rendering
    Institute: ISTI IRWIN
    Description:

    This project addresses the challenge of analyzing and visualizing extremely large data sets.  The approach taken is to accelerate data analysis and visualization computations using "on-the-fly" compiled code for today's high-performance processors.  We will concentrate on execution of the code on large-scale, distributed-memory, parallel computer systems.  In addition to the complexities of supporting these larger systems, this work requires the compositing of imagery generated by the distributed rendering tasks.

  • Title: Wireless Physical Layer Security, From Theory to Practice I
    Institute: ISTI IRWIN
    Description:

    - Exploit a physical primitive for security with and without shared secrets.

    - Guarantee confidentiality by allowing the receiver to perform synchronized jamming at unilaterally, privately chosen times.

  • Title: Machine Learning For Image and Time Series Problems
    Institute: ISTI ISSDM
    Description:

    This project involves the development of new machine learning techniques for object detection and time series analysis problems. The main focus is on the rapid detection and identification of objects in large streams of images. Another focus is the classification of time series.

    Supervised machine learning techniques for object detection take a training set of images in which the objects of interest have been identified and create a prediction method for finding similar objects in new images. These techniques are often pixel-based, requiring pixel-level annotations of the training set. Window-based methods create an object detector that focuses on a small window, which is then slid over every location in the image. In contrast, the methods being developed work directly on the image at the object level and require simpler annotation. They involve new feature generation techniques as well as modifications to standard boosting-based learning techniques.

    Another branch of this project involves clustering time series. Dynamic Time Warping (DTW) is a technique for evaluating the distance between two time series that is widely used in speech recognition because it elegantly handles differences in tempo. Although nearest-neighbor classification with DTW distances is effective in other time series domains, the computational cost of a DTW distance computation can make determining the nearest neighbor to a new time series prohibitively expensive. Our approach approximates the nearest neighbor while requiring many fewer DTW computations.
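    The DTW distance mentioned above is the classic O(nm) dynamic program: it aligns two series while allowing local tempo stretching, unlike Euclidean distance, which compares points strictly index by index. A self-contained sketch:

```python
def dtw(a, b):
    """DTW distance between two numeric sequences. D[i][j] is the cheapest
    alignment cost of a[:i] with b[:j]; each step matches, stretches a,
    or stretches b."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# The second series is the first played at half tempo; DTW aligns them
# perfectly, which is exactly what makes it attractive for speech.
s1 = [0, 1, 2, 1, 0]
s2 = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw(s1, s2))  # 0.0
```

    The quadratic table per comparison is also why exact 1-NN with DTW gets expensive on large archives, which motivates the approximation described above.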

  • Title: Scalable Simulation of Parallel File Systems
    Institute: ISTI ISSDM
    Description:
  • Title: Self-Tuning Database Systems
    Institute: ISTI ISSDM
    Description:
  • Title: Worst-Case Optimal Analog to Digital Converters
    Institute: ISTI ISAMI
    Description:

    The goal of this project is to optimize quantification decision making in hybrid systems with application to Analog to Digital Converters (ADCs).

  • Title: Human Computation
    Institute: ISTI ISSDM
    Description:

    Humans can be used as a processor for certain tasks, in the same way that CPUs and GPUs are now used. Rather than thinking of humans as the primary director of computation, with computers as their subordinate tools, we explicitly advocate treating these computational platforms equally and characterizing the performance of Human Processing Units (HPUs). We expect to find that on some tasks HPUs outperform CPUs, and on some tasks the reverse is true, as cartooned in figure 1. The most efficient systems are likely to be joint systems that make use of each type of processor on the subroutines for which it is most appropriate. Importantly, we believe that HPUs are likely to have a long term impact on the way real‐world applications that require “computer vision” are developed.

    The spread of networked computing has given rise to HPUs as a viable computational platform. Tools like Amazon Mechanical Turk allow small jobs to be micro‐outsourced to humans for completion. The intermediation that the network provides allows the requester to abstract the complexities of who is accomplishing the task, and from where. Nearly all current use of micro‐outsourcing is similar to the traditional way we might use an employed assistant in our office, to outsource from human to human. This proposal explicitly suggests we should quantify performance, treat this as a new computational platform, and build real systems which make use of HPU co‐processors for certain tasks which are too computationally expensive, or insufficiently robust when computed on CPUs.

  • Title: Foundations of Access Control for Secure Storage
    Institute: ISTI ISSDM
    Description:
  • Title: Stealing a Botnet
    Institute: ISTI ISSDM
    Description:
  • Title: ISSDM Cluster Management
    Institute: ISTI ISSDM
    Description:
  • Title: Information-Driven Cooperative Sampling Strategies for Spatial Estimation by Robotic Sensor Networks
    Institute: ISTI ISSDM
    Description:
  • Title: Reliability and Power-Efficiency in Erasure-Coded Storage Systems
    Institute: ISTI ISSDM
    Description:
  • Title: Predictable High Performance Data Management - Leveraging System Resource Characteristics to Efficiently Improve Performance and Predictability
    Institute: ISTI ISSDM
    Description:
  • Title: Faceted search
    Institute: ISTI ISSDM
    Description:
  • Title: MS Project: A Characterization of LANL HPC Systems
    Institute: ISTI ISSDM
    Description:
  • Title: Managing the Soft Real-Time Processes in RBED
    Institute: ISTI ISSDM
    Description:
  • Title: End to End Performance Management in Storage
    Institute: ISTI ISSDM
    Description:
  • Title: Secure, Energy-Efficient, Evolvable, Long-Term Archival Storage
    Institute: ISTI ISSDM
    Description:
  • Title:
    Institute: ISTI ISSDM
    Description:
  • Title: Program Inconsistency Detection: Universal Reachability Analysis and Conditional Slicing
    Institute: ISTI ISSDM
    Description:
  • Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA
    © Copyright 2008-09 Los Alamos National Security, LLC. All rights reserved.