- Grammar-guided Feature Extraction from Signals and Images
Image and signal sensors collect data at ever-increasing rates, making it a great challenge to search and organize data in archives. Work is underway to engage the sensor, data management, and machine learning expertise of LANL and UCSC to tackle adaptive, content-based search in large remote sensing archives. We demonstrate the utility of a new method for extracting features from imagery and signals to aid the archival search problem.
Documents
Faculty:
- Hai Tao UCSC
- David Helmbold UCSC
Students:
Mentors:
- James Theiler ISR-2
- Simon Perkins ISR-2
- Edward Rosten ISR-2
- CASCC: A Fast Algorithm for Time Series Classification
CASCC is a new algorithm for classifying time series. It is highly competitive with many other algorithms in both speed and accuracy. It is inspired by another leading algorithm, DTW-1NN, but does not suffer from the same computational limitations when applying the model to new time series.
Documents
Faculty:
Students:
Mentors:
- James Theiler ISR-2
- Simon Perkins ISR-2
- Andrew Fraser ISR-2
- Alternative Reliability Models in Ceph
RAID systems have traditionally offered increased performance and data security in small storage systems. An opportunity now exists to extend traditional RAID principles into the area of large-scale object-based storage devices in order to offer greater data security and space efficiency. In a system where component failures can be expected on a daily basis, the importance of redundancy mechanisms is obvious, and RAID principles offer an appropriate model. Ceph is an excellent platform with whic
Documents
Faculty:
Students:
Mentors:
- Gary Grider HPC-DO
- James Nunez HPC-5
- John Bent HPC-5
- HB Chen HPC-5
- COLT: Continuous On-Line Tuning of Databases
We propose to do fundamental research in the development of self-organizing databases that automatically configure their physical schema in an on-line fashion.
Documents
Faculty:
Students:
Mentors:
- Gary Grider HPC-DO
- James Nunez HPC-5
- John Bent HPC-5
- Dynamic Load-Balancing in Petabyte-scale File Systems that use Pseudo-Random Data
Pseudorandom placement in distributed storage systems offers scalability benefits. Pseudorandom placement makes load balancing harder; new techniques are required. We explore different load balancing techniques using Ceph, an object-based storage system developed at UCSC.
Documents
Faculty:
Students:
- Esteban Molina-Estolano UCSC
Mentors:
- Gary Grider HPC-DO
- James Nunez HPC-5
- John Bent HPC-5
- Cosmic Calibration
We propose to work on the problem of calibrating the parameters of computer code used for simulation of physical phenomena. We will explore statistical methods based on a Bayesian approach implemented with Sampling Importance Resampling (SIR).
Documents
Faculty:
- Herbie Lee UCSC
- Bruno Sanso UCSC
Students:
- Rishi Graham UCSC
- Tracy Holsclaw UCSC
Mentors:
- David Higdon CCS-6
- Katrin Heitmann ISR-1
- Erasure Codes for Reliability and Recoverability
We propose to study erasure codes and the relationship between reliability and performance in storage systems. When a disk fails, the reliability of the data that was stored on it is reduced until the disk is replaced.
Documents
Faculty:
Students:
Mentors:
- Gary Grider HPC-DO
- James Nunez HPC-5
- John Bent HPC-5
- Boosting for Image Recognition
The ISIS team in ISR-2 primarily uses supervised learning techniques to solve classification problems in imagery and therefore has a strong interest in finding linear classification algorithms that are both robust and efficient. Boosting algorithms take a principled approach to finding linear classifiers, and they have been shown to be so effective in practice that they are widely used in a variety of domains. In this proposal we present evidence that smoothing is not necessarily the optimal way
Documents
Faculty:
Students:
Mentors:
- James Theiler ISR-2
- Simon Perkins ISR-2
- Exploring Uncertainty Visualization in Large Data Sets
The size of the data sets and the uncertainty in the data sets come from the fact that we are dealing with ensemble data sets. These are usually from Monte Carlo simulations where each output (out of many runs) represents a possible solution. The degree of agreement (or disagreement) provides some indication of certainty (or uncertainty) about the results. Because Monte Carlo simulations can potentially involve a large number of repetitions, the total data size can very quickly get very large eve
Documents
Faculty:
Students:
Mentors:
- Katrin Heitmann ISR-1
- James Ahrens CCS-1
- Sensing Human Shape and Motion
The research objective of this proposal is to measure human body shape and motion without augmenting the subject. The hypothesis is that replacing traditional cameras with high-accuracy 3D shape measurement devices and utilizing a carefully constructed prior model of human surface shape are the critical factors that have been missing from prior attempts to meet this goal. The long-term accuracy targets are shape to 1 mm and motion to 1 degree.
Documents
Faculty:
Students:
Mentors:
- Sriram Swaminarayan CCS-2
- Information Trust
To help users make sense of collaboratively generated information, we are developing algorithmic notions of information trust.
Documents
Faculty:
Students:
Mentors:
- Shelly Spearing HPC-1
- Jorge Roman HPC-1
- Linn Collins STBPO-RL
- Interactive Search and Browsing
We focus on developing intelligent interactive search and browsing techniques to help users find the information they are looking for among billions of non-relevant files.
Documents
Faculty:
Students:
Mentors:
- Herbert Van de Sompel STBPO-RL
- Network Quality of Service
Network Quality of Service
Documents
Faculty:
Students:
Mentors:
- Web Based Pathogen Information System
We are planning to develop a web-based system to help 'decision makers'
quickly identify and process relevant web-based information in case of a
disease outbreak. We will work on identifying the pathogen based on sequence
information. We will also develop an adaptive information filter to find,
filter and condense the information available on the web.
Faculty:
Mentors: