- University of California, Santa Cruz, Computer Science, Faculty Member
Massively parallel file systems must provide high bandwidth file access to programs running on their machines. Most accomplish this goal by striping files across arrays of disks attached to a few specialized I/O nodes in the massively parallel processor (MPP). This arrangement requires programmers to give the file system many hints on how their data is to be laid out on disk if they want to achieve good performance. Additionally, the custom interface makes massively parallel file systems hard for programmers to use and difficult to seamlessly integrate into an environment with workstations and tertiary storage. The RAMA file system addresses these problems by providing a massively parallel file system that does not need user hints to provide good performance. RAMA takes advantage of the recent decrease in physical disk size by assuming that each processor in an MPP has one or more disks attached to it. Hashing is then used to pseudo-randomly distribute data to all of these disks, ensuring high bandwidth regardless of access pattern. Since MPP programs often have many nodes accessing a single file in parallel, the file system must allow access to different parts of the file without relying on a particular node. In RAMA, a file request involves only two nodes — the node making the request and the node on whose disk the data is stored. Thus, RAMA scales well to hundreds of processors. Since RAMA needs no layout hints from applications, it fits well into systems where users cannot (or will not) provide such hints. Fortunately, this flexibility does not cause a large loss of performance. RAMA's simulated performance is within 10-15% of the optimum performance of a similarly-sized striped file system, and is a factor of 4 or more better than a striped file system with poorly laid out data.
The supercomputer center at the National Center for Atmospheric Research (NCAR) migrates large numbers of files to and from its mass storage system (MSS) because there is insufficient space to store them on the Cray supercomputer's local disks. This paper presents an analysis of file migration data collected over two years. The analysis shows that requests to the MSS are periodic, with one-day and one-week periods. Read requests to the MSS account for the majority of the periodicity, as write requests are relatively constant over the course of a week. Additionally, reads show a far greater fluctuation than writes over a day and week, since reads are driven by human users while writes are machine-driven.
Users are storing ever-increasing amounts of information digitally, driven by many factors including government regulations and the public's desire to digitally record their personal histories. Unfortunately, many of the security mechanisms that modern systems rely upon, such as encryption, are poorly suited for storing data for indefinitely long periods of time—it is very difficult to manage keys and update cryptosystems to provide secrecy through encryption over periods of decades. Worse, an adversary who can compromise an archive need only wait for cryptanalysis techniques to catch up to the encryption algorithm used at the time of the compromise in order to obtain "secure" data. To address these concerns, we have developed POTSHARDS, an archival storage system that provides long-term security for data with very long lifetimes without using encryption. Secrecy is achieved by using provably secure secret splitting and spreading the resulting shares across separately-managed archives. Providing availability and data recovery in such a system can be difficult; thus, we use a new technique, approximate pointers, in conjunction with secure distributed RAID techniques to provide availability and reliability across independent archives. To validate our design, we developed a prototype POTSHARDS implementation, which has demonstrated "normal" storage and retrieval of user data using indexes, the recovery of user data using only the pieces a user has stored across the archives, and the reconstruction of an entire failed archive.
Galois Field arithmetic forms the basis of Reed-Solomon and other erasure coding techniques to protect storage systems from failures. Most implementations of Galois Field arithmetic rely on multiplication tables or discrete logarithms to perform this operation. However, the advent of 128-bit instructions, such as Intel's Streaming SIMD Extensions, allows us to perform Galois Field arithmetic much faster. This short paper details how to leverage these instructions for various field sizes, and demonstrates the significant performance improvements on commodity microprocessors. The techniques that we describe are available as open source software.
As the world moves to digital storage for archival purposes, there is an increasing demand for reliable, low-power, cost-effective, easy-to-maintain storage that can still provide adequate performance for information retrieval and auditing purposes. Unfortunately, no current archival system adequately fulfills all of these requirements. Tape-based archival systems suffer from poor random access performance, which prevents the use of inter-media redundancy techniques and auditing, and requires the preservation of legacy hardware. Many disk-based systems are ill-suited for long-term storage because their high energy demands and management requirements make them cost-ineffective for archival purposes. Our solution, Pergamum, is a distributed network of intelligent, disk-based, storage appliances that stores data reliably and energy-efficiently. While existing MAID systems keep disks idle to save energy, Pergamum adds NVRAM at each node to store data signatures, metadata, and other small items, allowing deferred writes, metadata requests and inter-disk data verification to be performed while the disk is powered off. Pergamum uses both intra-disk and inter-disk redundancy to guard against data loss, relying on hash tree-like structures of algebraic signatures to efficiently verify the correctness of stored data. If failures occur, Pergamum uses staggered rebuild to reduce peak energy usage while rebuilding large redundancy stripes. We show that our approach is comparable in both startup and ongoing costs to other archival technologies and provides very high reliability. An evaluation of our implementation of Pergamum shows that it provides adequate performance.
We have developed a scheme to secure network-attached storage systems against many types of attacks. Our system uses strong cryptography to hide data from unauthorized users; someone gaining complete access to a disk cannot obtain any useful data from the system, and backups can be done without allowing the super-user access to cleartext. While insider denial-of-service attacks cannot be prevented (an insider can physically destroy the storage devices), our system detects attempts to forge data. The system was developed using a raw disk, and can be integrated into common file systems. All of this security can be achieved with little penalty to performance. Our experiments show that, using a relatively inexpensive commodity CPU attached to a disk, our system can store and retrieve data with virtually no penalty for random disk requests and only a 15–20% performance loss over raw transfer rates for sequential disk requests. With such a minor performance penalty, there is no longer any reason not to include strong encryption and authentication in network file systems.
Research Interests: File System and MSS
Workshop topics included: … and modeling; multiple self-adaptive system challenges (composition and openness); and goals, objectives, and trust: the human side of autonomics. A working group was convened to study each problem. Each working group met in the afternoon and presented a report; these are briefly summarized next. Single self-adaptive systems: single self-adaptive systems can now be built, but systematic methods should be developed for building these systems. Systematic methods require good models for prediction, control, error detection/fault diagnosis, and optimization. Models must describe behavior at different time and detail scales, for different tasks (e.g., energy, error detection) and for different degrees of accuracy. Models can be self-learned or provided by expert human engineers. Models should describe both the system and its environment. Objectives need to be clearly defined for accountability, performance, and reliability of self-adaptive systems. Multiple self-adaptive systems: Multi...
In the past few years, the explosive growth of the Internet has allowed the construction of "virtual" systems containing hundreds or thousands of individual, relatively inexpensive computers. The agent paradigm is well-suited for this environment because it is based on distributed autonomous computation. Although the definition of a software agent varies widely, some common features are present in most definitions of agents. Agents should be autonomous, operating independently of their creators. Agents should have the ability to ...
Research Interests: Scalability and PRNG
Managing storage in the face of relentless growth in the number and variety of files on storage systems creates demand for rich file system metadata, as is made evident by the recent emergence of rich metadata support in many applications as well as file systems. Yet, little support exists for sharing metadata across file systems.
Research Interests: Distributed Computing, Design process, Parallel & Distributed Computing, Reading and writing, High Speed, Storage system, Parallel and distributed Databases, Heterogeneous Computing, Perforation, Design Evaluation, University of Southern California, Input Output, Network Attached Storage, Raid, Crossbar, and Data Format
Network file system usage has grown significantly in recent years due to the desire to lower costs, improve storage utilization and ease administration by consolidating storage. In order to understand how future network file systems should be designed, it is critical to have a detailed understanding of how they are currently used in practice. We conducted an