
Nicole Targowski
UCSC Student

Herbie Lee
UCSC Instructor

Bruno Sanso
UCSC Instructor

David Higdon
Mentor

Object detection is a pervasive area of research in Computer Vision. Its applications are vast and include medical pathology (detection of malignant cells in histology slides), brain imaging (identification of structural signs of brain disease), detection of objects in remotely sensed imagery (traffic analysis and target detection), face detection for digital camera technology, and pedestrian tracking for surveillance. A broad spectrum of object detection approaches has been proposed in the literature, often using the popular boosting algorithm AdaBoost. I propose a new object detection system that employs a new learning framework for combining ensembles of weak detectors into a single detector.
Prior work:

Damian Eads
UCSC Student

Ethan Miller
UCSC Instructor

Hai Tao
UCSC Instructor

David Helmbold
UCSC Instructor

James Theiler
Mentor

Simon Perkins
Mentor

Edward Rosten
Mentor

Andrew Fraser
Mentor

RAID systems have traditionally offered increased performance and data security in small storage systems. An opportunity now exists to extend traditional RAID principles into the area of large-scale object-based storage devices in order to offer greater data security and space efficiency. In a system where component failures can be expected on a daily basis, the importance of redundancy mechanisms is obvious, and RAID principles offer an appropriate model. Ceph is an excellent platform to test these RAID systems and to learn how they function in a new environment.
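The redundancy principle mentioned above can be illustrated with single XOR parity, the scheme used by RAID levels 4 and 5. The abstract does not say which RAID level or encoding the project uses, so this is only a minimal sketch under that assumption; the function names `xor_parity` and `rebuild_missing` are hypothetical, and all blocks are assumed to have equal length.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost block: XOR of the survivors and the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three data blocks (for example, stored as three objects) plus one parity block.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)

# Simulate losing the second block and rebuilding it from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Single parity tolerates the loss of any one block per stripe; tolerating more simultaneous failures would need a stronger code, which this sketch does not cover.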

David Bigelow
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

HB Chen
Mentor

The ISIS team in ISR-2 primarily uses supervised learning techniques to solve classification problems in imagery and therefore has a strong interest in finding linear classification algorithms that are both robust and efficient. Boosting algorithms take a principled approach to finding linear classifiers, and they have been shown to be so effective in practice that they are widely used in a variety of domains. In this proposal we present evidence that smoothing is not necessarily the optimal way

Karen Glocer
UCSC Student

Manfred Warmuth
UCSC Instructor

James Theiler
Mentor

Simon Perkins
Mentor

Tuning is one of the most challenging and important tasks in setting up a database system. A typical assumption is that a representative workload can provide the means to perform some initial tuning automatically, which can then be refined manually by the database administrator. However, several emerging application domains challenge this assumption. A characteristic example is scientific data management, where the use of ad-hoc data analytics makes it impossible to gather a representative workload, and the lack of database expertise among scientists makes efficient manual tuning unlikely. This motivates us to look into autonomic tuning techniques that operate continuously and require minimal intervention from an administrator. In particular, we examine online methods to tune the physical database design, i.e., the set of materialized data structures, such as indices and materialized views, that are crucial for efficient query processing.

Karl Schnaitter
UCSC Student

Neoklis Polyzotis
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor
The Cell is powerful, but challenging to program. Concurrent programming models for the Cell Broadband Engine may help. Everything old is new again: Cilk can provide a smooth transition of older C codes to the Cell.
Andrew Shewmaker
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

We propose to work on the problem of calibrating the parameters of computer code used for simulation of physical phenomena. We will explore statistical methods based on a Bayesian approach implemented with Sampling Importance Resampling (SIR).
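As a rough illustration of Sampling Importance Resampling for calibration, the sketch below draws candidate parameter values from a prior, weights each by how well the simulator output matches the observations, and resamples in proportion to those weights. The toy simulator, the uniform prior, the Gaussian error model, and the names `sir_calibrate`, `toy_simulator`, and `uniform_prior` are all assumptions made for the example, not details taken from the project.

```python
import math
import random


def sir_calibrate(simulator, observed, prior_sampler, noise_sd,
                  n_draws=5000, n_keep=1000, seed=0):
    """Approximate posterior draws of one calibration parameter via SIR."""
    rng = random.Random(seed)
    # Step 1: sample candidates from the prior, which doubles as the proposal.
    thetas = [prior_sampler(rng) for _ in range(n_draws)]
    # Step 2: importance weight = likelihood (assumed Gaussian errors), in log form.
    log_w = []
    for theta in thetas:
        resid = [(y - simulator(x, theta)) / noise_sd for x, y in observed]
        log_w.append(-0.5 * sum(r * r for r in resid))
    top = max(log_w)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(lw - top) for lw in log_w]
    # Step 3: resample with replacement in proportion to the weights.
    return rng.choices(thetas, weights=weights, k=n_keep)


def toy_simulator(x, theta):
    """Hypothetical stand-in for an expensive simulation code."""
    return theta * x


def uniform_prior(rng):
    """Hypothetical prior: theta uniform on [0, 5]."""
    return rng.uniform(0.0, 5.0)


# Synthetic observations generated with theta = 2.
observed = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
posterior = sir_calibrate(toy_simulator, observed, uniform_prior, noise_sd=0.1)
posterior_mean = sum(posterior) / len(posterior)
```

Because the prior is used as the proposal, the importance weights reduce to the likelihood. With a real simulation code each weight costs one simulator run, so the number of prior draws is usually the limiting factor.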

Tracy Holsclaw
UCSC Student

Herbie Lee
UCSC Instructor

Bruno Sanso
UCSC Instructor

David Higdon
Mentor

Katrin Heitmann
Mentor

Salman Habib
Mentor

Ujjaini Alam
Mentor

Pseudorandom placement in distributed storage systems offers scalability benefits. Pseudorandom placement makes load balancing harder; new techniques are required. We explore different load balancing techniques using Ceph, an object-based storage system developed at UCSC.
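To make the load balancing concern concrete, the sketch below places objects on devices by hashing their names and then reports how far the busiest device is from the average. This is a generic hash-based placement written for illustration; it is not Ceph's actual placement algorithm, and the names `place` and `load_imbalance` are hypothetical.

```python
import hashlib
from collections import Counter


def place(object_name, num_devices):
    """Map an object name to a device index using a stable hash."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_devices


def load_imbalance(object_names, num_devices):
    """Return max device load divided by mean load (1.0 is perfectly even)."""
    counts = Counter(place(name, num_devices) for name in object_names)
    loads = [counts.get(device, 0) for device in range(num_devices)]
    mean = sum(loads) / num_devices
    return max(loads) / mean


names = ["obj-%d" % i for i in range(1000)]
imbalance = load_imbalance(names, num_devices=8)
```

Any client can compute the location of an object without consulting a central table, which is the scalability benefit. The trade-off is that placement ignores how heavily each object is used, so per-device load can drift from the mean.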
Esteban Molina-Estolano
UCSC Student

Scott Brandt
UCSC Instructor

Carlos Maltzahn
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

Alfred Torrez
Mentor

Meghan McClelland
Mentor

Anna Povzner
UCSC Student

Roberto Pineiro
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

Large data centers are typically composed of many hard drives arranged in a striped layout with redundancy, to provide adequate performance and reliability. Solid state disk drives (SSDs) are a newer technology that has better performance and lower power requirements compared to hard drives. This research examines the use of solid state disk drives in disk-SSD hybrid storage systems.

Rosie Wacha
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

Definition: Multivalue data arise when multiple values are recorded for the same variable in repeated measurements.
Challenge: Lack of current comprehensive visualization tools for the probabilistic or uncertain nature of the multivalue data.
Research: Investigate probabilistic streamlines using different ways of combining vector distributions.
Both the size of the data sets and the uncertainty in them come from the fact that we are dealing with ensemble data sets. These usually come from Monte Carlo simulations, where each output (out of many runs) represents a possible solution. The degree of agreement (or disagreement) among runs provides some indication of certainty (or uncertainty) about the results. Because Monte Carlo simulations can involve a large number of repetitions, the total data size can grow very large very quickly. This project will explore uncertainty and how visualization can be used as a tool to help deal with it.

Eddy Chandra
UCSC Student

Alex Pang
UCSC Instructor

Katrin Heitmann
Mentor

James Ahrens
Mentor

Jonathan Koren
UCSC Student

Yi Zhang
UCSC Instructor

Shelly Spearing
Mentor

Jorge Roman
Mentor

Avik Chaudhuri
UCSC Student

Martin Abadi
UCSC Instructor

James Nunez
Mentor
Humans can be used as a processor for certain tasks, in the same way that CPUs and GPUs are now used. Rather than thinking of humans as the primary director of computation, with computers as their subordinate tools, we explicitly advocate treating these computational platforms equally and characterizing the performance of Human Processing Units (HPUs). We expect to find that on some tasks HPUs outperform CPUs, and on some tasks the reverse is true, as cartooned in figure 1. The most efficient systems are likely to be joint systems that make use of each type of processor on the subroutines for which it is most appropriate. Importantly, we believe that HPUs are likely to have a long term impact on the way real‐world applications that require “computer vision” are developed.
The spread of networked computing has given rise to HPUs as a viable computational platform. Tools like Amazon Mechanical Turk allow small jobs to be micro‐outsourced to humans for completion. The intermediation that the network provides allows the requester to abstract the complexities of who is accomplishing the task, and from where. Nearly all current use of micro‐outsourcing is similar to the traditional way we might use an employed assistant in our office, to outsource from human to human. This proposal explicitly suggests we should quantify performance, treat this as a new computational platform, and build real systems which make use of HPU co‐processors for certain tasks which are too computationally expensive, or insufficiently robust when computed on CPUs.

Mario Rodriguez
UCSC Student

James Davis
UCSC Instructor

Reid Porter
Mentor

To help users make sense of collaboratively-generated information, we are developing algorithmic notions of information trust.

Ian Pye
UCSC Student

Bo Adler
UCSC Student

Luca de Alfaro
UCSC Instructor

Shelly Spearing
Mentor

Jorge Roman
Mentor

Rishi Graham
UCSC Student
Jorge Cortes
UCSC Instructor

David Higdon
Mentor

Katrin Heitmann
Mentor

Salman Habib
Mentor

Ujjaini Alam
Mentor
We focus on developing intelligent interactive search and browsing techniques to help users find the information they are looking for among billions of non-relevant files.

Jessica Gronski
UCSC Student

Yi Zhang
UCSC Instructor

Herbert Van de Sompel
Mentor

Michael Elliott
UCSC Student

Scott Brandt
UCSC Instructor

Carlos Maltzahn
UCSC Instructor
Gary Grider
Mentor
One of the major challenges facing a globally distributed enterprise is storing and retrieving documents. This task requires features from distributed storage systems, databases, and the Internet. However, because of the unique nature of the documents to be stored and their access patterns, none of these alone suffices to provide truly reliable and searchable distributed document management. We present Ringer, an overlay network that combines elements of peer-to-peer storage, distributed query processing, and indexing to create an Internet-scale content distribution and retrieval network.

Ian Pye
UCSC Student

Carlos Maltzahn
UCSC Instructor

Linn Collins
Mentor
This project involves the development of new machine learning techniques for object detection and time series analysis problems. The main focus is on the rapid detection and identification of objects in large streams of images. Another focus involves the classification of time series.
Supervised machine learning techniques for object detection take a training set of images, where the objects of interest have been identified, and create a prediction method for finding similar objects in new images. These techniques are often pixel-based, requiring pixel-level annotations of the training set. Window-based methods create an object detector that focuses on a small window, and then this detector is slid over every location in the image. In contrast, the methods being developed work directly on the image at the object level and require simpler annotation. They involve new feature generation techniques as well as modifications to the standard boosting-based learning techniques. Another branch of this project involves clustering time series. Dynamic Time Warping (DTW) is a technique for evaluating the distance between two time series that is widely used in speech recognition because it elegantly handles differences in tempo. Although nearest neighbor classification techniques with DTW distances are effective in other time series domains, the computational cost of a DTW distance computation can make determining the nearest neighbor to a new time series prohibitively expensive. Our approach creates an approximation to nearest neighbor while requiring many fewer DTW computations.
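For reference, the textbook DTW distance and the exact nearest neighbor rule built on it can be sketched as follows. This is the baseline whose cost motivates the project, not the project's approximation; the absolute-difference local cost and the names `dtw_distance` and `nearest_neighbor_label` are choices made for the example.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Uses the classic O(len(a) * len(b)) dynamic program with an
    absolute-difference local cost, keeping only two rows in memory.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def nearest_neighbor_label(query, labeled_series):
    """Exact 1-nearest-neighbor: one full DTW computation per training series."""
    best = min(labeled_series, key=lambda item: dtw_distance(query, item[0]))
    return best[1]


training = [([0, 0, 1, 1, 0], "bump"), ([0, 1, 2, 3, 4], "ramp")]
label = nearest_neighbor_label([0, 1, 1, 2, 3, 3, 4], training)
```

The query is a slowed-down copy of the ramp, so its DTW distance to the ramp is zero even though the two sequences differ in length. Each classification costs one quadratic-time DTW computation per training series, which is the expense the abstract refers to.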

Damian Eads
UCSC Student

David Helmbold
UCSC Instructor

James Theiler
Mentor

Caixue Lin
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

Eric Lalonde
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

In large tightly coupled parallel systems, computation goes as fast as the slowest part. For this reason it is necessary to pursue deterministic behavior of all parts of the system. Quality of Service is one way to assist in providing deterministic behavior. This project will explore providing Quality of Service on networks of interest to high performance computing.
Andrew Shewmaker
UCSC Student

Scott Brandt
UCSC Instructor

Josip Loncaric
Mentor

Nathan Debardeleben
Mentor

Tim Kaldewey
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

Aaron Tomb
UCSC Student

Cormac Flanagan
UCSC Instructor

Steve Painter
Mentor
In large scale storage and file systems, multiple parallel applications can be asking for service simultaneously. In parallel workloads it is vital that deterministic service be given as the application only proceeds as fast as the slowest process. With mixed workloads this is very difficult to do in a parallel setting. We will explore how QoS support could be added to portions of the I/O stack in object based parallel file systems to enable determinism for multiple parallel applications in a mixed parallel workload environment.

Joel Wu
UCSC Student

Scott Brandt
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

Kevin Greenan
UCSC Student

Ethan Miller
UCSC Instructor

James Nunez
Mentor

John Bent
Mentor
Storage systems are getting bigger, with more high-performance applications needing to access data. Data is becoming more and more confidential over time, and currently security in scalable storage systems is itself not scalable. Systems with tens of thousands of nodes will be accessing highly distributed data using demanding I/O patterns and will need to do so securely. We intend to explore scalable security through capabilities, delegation, and replication in the file system to fundamentally address the scaling of security in file systems.

Andrew Leung
UCSC Student

Ethan Miller
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

Esteban Molina-Estolano
UCSC Student

Carlos Maltzahn
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

HB Chen
Mentor

Meghan McClelland
Mentor
Data users need effective means to locate their files in huge and increasingly scaled-out file systems containing millions or even billions of files. Simple directories are not sufficient for managing this number of files. Users already have search tools for file systems, but these are based on directory-structure-oriented searches. We propose to explore how indexing metadata about files could give users more utility when searching for files in extremely large parallel file systems. We will concentrate on the interfaces users would use to search indexed metadata about files.

Sasha Ames
UCSC Student

Ethan Miller
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

Mark Storer
UCSC Student

Ethan Miller
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

Cody J. Scott
Mentor

Neoklis Polyzotis
UCSC Instructor

The research objective of this proposal is to measure human body shape and motion without augmenting the subject. The hypothesis is that replacing traditional cameras with high accuracy 3D shape measurement devices and utilizing a carefully constructed prior model of human surface shape are the critical factors that have been missing from prior attempts to meet this goal. The long term accuracy targets are shape to 1 mm and motion to 1 degree.

Steve Scher
UCSC Student

James Davis
UCSC Instructor

Sriram Swaminarayan
Mentor

Tatiana Djidjeva
UCSC Student

Scott Brandt
UCSC Instructor

Stephan Eidenbenz
Mentor
Storage system design and optimization require usage data in order to determine where in the system to focus resources. Typically this usage data is created by observing the response of the system while running synthetic benchmarks designed to place high stress on particular aspects of the system. The problem with many benchmarks is that they encourage improvements to areas of the system that do not improve actual user performance on typical workloads. Ideally, the benchmarks should be similar to actual applications so that improvements can much more easily be made to storage systems. User applications are often proprietary or secure, so using them as benchmarks is often not practical. This project will explore how benchmark applications can be synthetically generated using traces of real parallel user applications.

Rosie Wacha
UCSC Student

Darrell Long
UCSC Instructor
Gary Grider
Mentor

James Nunez
Mentor

John Bent
Mentor

The science goal of this project is to seek a better understanding of the multi-streaming phenomenon in the universe. The specific aim is to develop tools that will analyze, extract and visualize multi-streaming events from cosmological simulations. Both technical and scientific challenges must be addressed for a successful outcome of the proposed work, including the ability to handle very large data sets and to perform comparative studies on different formulations of the rather imprecise definition of what constitutes multi-streaming.
Multi-streaming is said to happen when, at some (Eulerian) location, particles arrive from different (Lagrangian) locations and at different velocities. Places where multi-streaming happens are singularities in coordinate space, and are also referred to as caustics. This behavior is analogous to how light rays are reflected and/or refracted in a swimming pool to produce a brighter intensity where the light rays converge. In cosmological structure formation, cold flows in dark matter merge and interact in multi-streaming regions. Hence, the identification and characterization of these regions are of great importance in extending our understanding of cosmological evolution.

Uliana Popov
UCSC Student

Alex Pang
UCSC Instructor

Katrin Heitmann
Mentor

James Ahrens
Mentor

Salman Habib
Mentor

Jonathan Woodring
Mentor
We are planning to develop a web-based system to help 'decision makers' quickly identify and process relevant web-based information in case of a disease outbreak. We will work on identifying the pathogen based on sequence information. We will also develop an adaptive information filtering system to find, filter, and condense the information available on the web.

Yi Zhang
UCSC Instructor

Carla Kuiken
Mentor