From: Andrew Klosterman
Date: Fri Aug 29, 2003 5:18:59 PM US/Eastern
To: sospss@wheaton.edu, jeffay@cs.unc.edu
Cc: Greg Ganger
Subject: SOSP '03 Stipend Application and Poster Abstract

This is my SOSP '03 Stipend Application and Poster Abstract. It includes a completed "SOSP Travel Scholarship Application," a list of anticipated travel expenses, a research position paper, a list of publications, and a description of the benefits I expect from attending SOSP. There is also a poster abstract at the end of this email, describing joint work with John Strunk and Prof. Greg Ganger.

=========================================================================
SOSP Travel Scholarship Application

Name: Andrew J. Klosterman
Address: 1233 Bellerock St., Pittsburgh, PA 15217
Email: andrew5@ece.cmu.edu
ACM member number: (or date of application, if you have not received your number yet)
  August 27, 2003 (first attempt, online, no confirmation email)
  August 28, 2003 (second attempt, online, no confirmation email)

School: Carnegie Mellon University
Department: Electrical and Computer Engineering
Degree pursued: PhD
When you expect to finish: May 2005

Faculty advisor
  Name: Greg Ganger
  E-mail: ganger@ece.cmu.edu

Have you attended SOSP before? Yes. (1999)
Have you previously received an SOSP scholarship? No.

SOSP program participation
  Are you an author of an accepted paper? No.
    If so, are you the principal author? Will you be presenting it at the conference?
  Have you submitted a poster? Yes. (Draft of abstract included.)
  Is your paper/poster entered in the Student Research Competition? Yes. (I suppose so, I couldn't find any form to "enter" the competition. As far as I can tell, I just send the poster abstract to jeffay@cs.unc.edu.)

Anticipated travel expenses

TOTAL: $879

AIRFARE: $389
  1 adult at US$ 389.00
  1:10pm Depart Pittsburgh, PA (PIT) -- 2:22pm Arrive Albany, NY (ALB), Oct. 18, US Airways flight 1190
  3:15pm Depart Albany, NY (ALB) -- 4:33pm Arrive Pittsburgh, PA (PIT), Oct. 22, US Airways flight 363

MEALS: $152
  Per Diem of $38/day (Warren County, NY)

GROUND TRANSPORTATION: $0
  Shuttle provided by hotel (with reservation).

HOTEL: $338
  4 nights, 2-person shared "Lodge Room"
  4 / 2 * (157 + 6 + 6) = 338
  (Tax exempt organization, must fill out form ST 119.1.)

Will you require a hotel room for Saturday night? Yes. (Much less expensive airfare!)

========================================================================
RESEARCH

Self-* Storage Systems

As computer complexity has grown and system costs have shrunk, system administration has become a dominant factor in both ownership cost and user dissatisfaction. The research community is quite aware of this problem, and there have been well-publicized calls to action. Storage systems are key parts of the equation for several reasons. First, storage is where the persistent data is kept, making it a critical system component. Second, storage represents 40-60% of hardware costs in modern data centers, and 60-80% of the total cost of ownership. Third, storage administration (including capacity planning, backup, load balancing, etc.) is where much of the administrative effort lies; Gartner and others have estimated the task at one administrator per 1-10 terabytes, which is a scary ratio with multi-petabyte data centers on the horizon. Many industry efforts are working to simplify storage management by developing tools to reduce the effort of managing traditional storage system designs.
We believe that this approach is fundamentally limited, much like adding security to a finished system rather than integrating it into the design from the start. What is needed is a step back and a clean-slate redesign of storage for the data center. Without re-architecting storage systems, administration complexity will continue to dominate administrator workloads and ownership cost.

Our research plan includes developing a storage architecture that integrates automated management functions to simplify storage administration tasks. We refer to such systems as self-* storage systems in an attempt to capture many recent buzzwords in a single meta-buzzword; self-* (pronounced "self-star") storage systems should be self-configuring, self-organizing, self-tuning, self-healing, self-managing, etc. Ideally, human administrators should have to do nothing more than provide muscle for component addition, guidance on acceptable risk levels (e.g., reliability goals), and indications of their current levels of satisfaction.

We think in terms of systems composed of networked intelligent storage bricks, each consisting of CPU(s), RAM, and a number of disks. Although special-purpose hardware may speed up network communication and data encode/decode operations, it is the software functionality and distribution of work that could make such systems easier to administer and competitive in performance and reliability.

Our self-* storage design meshes insights from AI, corporate theory, and storage systems. It borrows the management hierarchy concept from corporate structure: supervisory oversight without micro-management. Each storage brick (a worker) tunes and adapts its operation to its observed workload and assigned goals. Data redundancy across and within storage bricks provides fault tolerance and creates opportunities for automated reconfiguration to handle many problems. Out-of-band supervisory processes assign datasets and goals to workers, track their performance and reliability status, and exchange information with human administrators. Dataset assignments and redundancy schemes are dynamically adjusted based on observed and projected performance and reliability. (A minimal sketch of this supervisor/worker division of labor appears after the task list below.)

Storage administration

The human costs of storage administration are estimated to be 4-8 times the cost of the storage hardware and software themselves, even with today's expensive, special-purpose storage subsystems. With the industry's push towards using commodity-priced, consumer-quality components, administrative overheads will only worsen unless there are changes in approach. The work of system administrators can be broken down into several categories of tasks that the self-* system must be able to handle:

Data protection: Perhaps the most important role of a storage infrastructure, and thus task source for storage administrators, is ensuring the continued existence and accessibility of stored data in the face of accidents and component failures.

Performance tuning: The performance of storage systems always seems to be a concern for administrators. Load balancing and parameter tuning occupy considerable administrator time.

Planning and deployment: Capacity planning, or determining how many and which types of components to purchase, is a key task for storage administrators.

Monitoring and record-keeping: System administration involves a significant amount of effort just to stay aware of the environment.

Diagnosis and repair: Despite prophylactic measures, problem situations arise that administrators must handle.
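Before turning to how current tools handle these tasks, here is a minimal sketch (in Python) of the supervisor/worker division of labor described above: supervisors hand dataset goals to workers out-of-band and watch for goal misses, while workers tune themselves against their assigned goals. This is an illustration only, not the project's actual interfaces; the class and field names (DatasetGoal, WorkerStatus, Supervisor, Worker, the latency and replica fields) are hypothetical, and a real worker would report measured rather than hard-coded statistics.

--- illustrative sketch (Python) ---------------------------------------
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatasetGoal:
    """High-level goals a supervisor assigns along with a dataset (hypothetical fields)."""
    dataset: str
    availability: float        # e.g., 0.999
    latency_ms: float          # target average request latency
    replicas: int = 2          # redundancy scheme chosen by the supervisor


@dataclass
class WorkerStatus:
    """Periodic status a worker reports back up the hierarchy."""
    worker_id: str
    observed_latency_ms: Dict[str, float]   # per-dataset latency actually observed
    healthy: bool = True


class Worker:
    """A storage brick: accepts goal assignments and tunes itself internally."""
    def __init__(self, worker_id: str):
        self.worker_id = worker_id
        self.assignments: Dict[str, DatasetGoal] = {}

    def assign(self, goal: DatasetGoal) -> None:
        self.assignments[goal.dataset] = goal
        # Internal tuning (cache policy, on-disk layout) would happen here.

    def report(self) -> WorkerStatus:
        # Placeholder: a real worker would report measured statistics.
        observed = {name: 8.0 for name in self.assignments}
        return WorkerStatus(self.worker_id, observed)


class Supervisor:
    """Out-of-band oversight: assigns datasets and goals, monitors, reassigns."""
    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def place(self, goal: DatasetGoal) -> None:
        # Naive placement: replicate across the first `replicas` workers.
        for worker in self.workers[:goal.replicas]:
            worker.assign(goal)

    def check(self) -> List[str]:
        """Return dataset/worker pairs whose observed latency misses the goal."""
        misses = []
        for worker in self.workers:
            status = worker.report()
            for dataset, seen in status.observed_latency_ms.items():
                if seen > worker.assignments[dataset].latency_ms:
                    misses.append(f"{dataset}@{worker.worker_id}")
        return misses


if __name__ == "__main__":
    sup = Supervisor([Worker("brick-0"), Worker("brick-1"), Worker("brick-2")])
    sup.place(DatasetGoal(dataset="web-logs", availability=0.999, latency_ms=10.0))
    print("goal misses:", sup.check())   # [] -> goals currently met
-------------------------------------------------------------------------

The point of the sketch is only the division of labor: dataset placement and goal checking happen out-of-band in the supervisor, while each worker adapts internally to its observed workload and assigned goals.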
The current state-of-the-art for addressing these tasks does not scale well to petabyte-sized data centers. System administrators are burdened with a hodge-podge of vendor-specific tools and home-brew scripts for managing storage. There is some research and development work being performed to aid in the management of current systems (e.g., at HP Labs, IBM, and SNIA). However, these efforts will face difficulties handling the frequently failing commodity components of storage bricks. A new architecture is necessary to cope with the tasks of storage administration.

Self-* storage architecture

Dramatic simplification of storage administration requires that associated functionalities be designed-in from the start and integrated throughout the storage system design. System components must continually collect information about tasks and how well they are being accomplished. Regular re-evaluation of configuration and workload partitioning must occur, all within the context of high-level administrator guidance. The self-* storage project is designing an architecture from a clean slate to explore such integration. The high-level system architecture borrows organizational concepts from corporate structure. Briefly, workers are storage bricks that adaptively tune themselves, routers are logical entities that deliver requests to the right workers, and supervisors plan system-wide and orchestrate from out-of-band.

============================= Figure 1 ==============================
[System organization: an administrative console atop a supervisor hierarchy, which oversees the workers; routers connect clients to workers.]
The administrative console distributes goals to the hierarchy of supervisors and interacts with the system administrator. Supervisors distribute data to workers and adjust the placement of data. Workers store data and internally tune to meet goals. Routers transfer data between clients and workers.
=====================================================================

Administration and organization: Several components work together to allow a self-* system to organize its components and partition its tasks. We do not believe that complex global goals can be achieved with strictly local decisions; some degree of coordinated decision making is needed. The supervisors, processes playing an organizational role in the infrastructure, form a management hierarchy. They dynamically tune dataset-to-worker assignments, redundancy schemes for given datasets, and router policies. Supervisors also monitor the reliability and performance status of their subordinates and predict future levels of both. The top of the hierarchy interacts with the system administrator, receiving high-level goals for datasets and providing status and procurement requests. Additional services (e.g., event logging and directory services), which we refer to as administrative assistants, are also needed.

Data access and storage: Workers store data, and routers ensure that I/O requests are delivered to the appropriate workers for service. Thus, self-* clients interact with self-* routers to access data.
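To illustrate the routing role just described, the sketch below shows how requests might be delivered to the appropriate workers, with READs spread across replicas. The class name, routing-table layout, and round-robin replica choice are all hypothetical assumptions for illustration; a real router would also need the consistency protocols for accessing redundant data, which are omitted here.

--- illustrative sketch (Python) ---------------------------------------
import itertools
from typing import Dict, List


class Router:
    def __init__(self, assignments: Dict[str, List[str]]):
        # Routing metadata: dataset name -> workers holding replicas.
        self.assignments = assignments
        self._rr = {d: itertools.cycle(ws) for d, ws in assignments.items()}

    def route_read(self, dataset: str) -> str:
        # Any replica can serve a READ; rotate for simple load spreading.
        return next(self._rr[dataset])

    def route_write(self, dataset: str) -> List[str]:
        # A WRITE must reach all replicas (consistency protocol omitted).
        return list(self.assignments[dataset])


if __name__ == "__main__":
    router = Router({"web-logs": ["brick-0", "brick-2"], "mail": ["brick-1"]})
    print(router.route_read("web-logs"))    # brick-0
    print(router.route_read("web-logs"))    # brick-2
    print(router.route_write("web-logs"))   # ['brick-0', 'brick-2']
-------------------------------------------------------------------------

Supervisors would update such a routing table as they adjust dataset-to-worker assignments and router policies.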
Simpler storage administration

Data protection: Two concerns, user mistakes and component failures, are addressed directly by internal versioning and cross-worker redundancy; a self-* storage system should allow users to access historical versions directly (to recover from mistakes) and should automatically reconstruct the contents of any failed worker. System administrators should only have to deal with replacing failed physical components, at times of their convenience. The other two concerns, disaster tolerance and archiving, involve a self-* administrative assistant and some mechanism outside of a single self-* installation. For example, a self-* backup service could periodically copy snapshots to a remote replica or archival system. Alternately, a self-* mirroring service could use the event logging/notification service to identify new data and asynchronously mirror it to a remote site.

Performance tuning: A self-* installation should do all performance tuning itself, including knob setting, load balancing, data migration, etc. An administrator's only required involvement should be providing guidance when performance targets are not being met. The administrative interface should provide the administrator with as much detail about system internals as is desired (which builds trust), but should not require manual tuning of any kind.

Planning and deployment: A self-* administrative interface should also help system administrators decide when to acquire new components. To do so, it must provide information about the trade-offs faced, such as, "The system could meet your specified goals with X more workers or Y more network bandwidth." The initialization and configuration aspects of adding components should be completely automated. Plugging in and turning on a worker should be sufficient for it to be automatically integrated into a self-* installation.

Monitoring and record-keeping: A self-* storage system will track and record everything it knows about, including the datasets, the components, their status, and their connections. A reasonable system would retain a history of this information for future reference.

Diagnosis and repair: Ideally, a self-* storage system will isolate and reconfigure around most problems, perhaps leaving broken components in place to be attended to later. A system administrator will need to physically remove broken components, and perhaps replace them. The self-* system can help in two ways: by giving simple and clear instructions about how to effect a repair, and by completing internal reconfiguration before asking for action (whenever possible). The latter can reduce occurrences of mistake-induced multiple faults, such as causing data or access loss by removing the wrong disk from an array that already has one failed disk. Such problems can also be reduced by making worker storage self-labelling, so that reinstalled components can be used immediately with their current contents.

Overall, a self-* storage system should make it possible for data centers to scale to multiple petabytes without ridiculous increases in administrator head-counts. System administrators will still play an invaluable role in providing goals for their self-* installation, determining when and what to purchase, physically installing and removing equipment, and maintaining the physical environment. But many complex tasks will be offloaded and automated.

RECENT PUBLICATIONS

Self-* Storage: Brick-based Storage with Automated Administration. Gregory R. Ganger, John D. Strunk, and Andrew J. Klosterman. Carnegie Mellon University Technical Report CMU-CS-03-178, August 2003.

BENEFITS OF ATTENDING SOSP

Conferences are a wonderful opportunity to meet and interact with other researchers.
Over the past year I have had the opportunity, through contacts made at previously attended conferences, to share in the research being performed at an industry lab and a university. Each of these groups has requested information on my research, and I have also benefited from using data acquired by the university group.

The diverse research to be presented at SOSP has many connections to my current project. Large-scale system development (the self-* storage system project at Carnegie Mellon University) requires a broad range of systems knowledge of the sort presented at SOSP. Furthermore, exposure to academic and industrial partners in the late stage of my PhD education provides crucial opportunities to develop in-person contacts. Such contacts have already led to fruitful discussions with industry leaders conducting similar research. Continuing to develop these contacts, and forming new ones, is a personal goal.

The conference program lists several presentations and papers that are closely related to my current projects. Perhaps the most relevant is Google's paper, "The Google File System," which describes a large-scale distributed file system with many features relevant to my work on self-* storage systems. Mindful of security concerns in self-* storage, I expect that the paper titled "Decentralized User Authentication in a Global File System" (not yet posted) may have useful ideas that can be modified or incorporated into the security model for self-* storage systems. The paper from Berkeley, "Capriccio: Scalable Threads for Internet Services," may present techniques invaluable to scaling self-* storage systems.

Attending conferences, and the personal interactions that accompany such events, provides invaluable opportunities for in-person technical discussions that simply cannot reasonably be held over long distances (via e-mail or telephone). I am very excited about attending SOSP to take advantage of this gathering of individuals in my field.

========================================================================
POSTER EXTENDED ABSTRACT (DRAFT)

We are exploring the design and implementation of self-* storage systems: self-organizing, self-configuring, self-tuning, self-healing, self-managing systems of storage bricks. Borrowing organizational ideas from corporate structure and automation technologies from AI and control systems, we hope to dramatically reduce the administrative burden currently faced by data center administrators.

As computer complexity has grown and system costs have shrunk, system administration has become a dominant factor in both ownership cost and user dissatisfaction. Storage systems are key parts of the equation. Storage represents 40-60% of hardware costs in modern data centers, and 60-80% of the total cost of ownership. Storage administration (including capacity planning, backup, load balancing, etc.) is where much of the administrative effort lies; Gartner and others have estimated the task at one administrator per 1-10 terabytes, which is a scary ratio with multi-petabyte data centers on the horizon.

Dramatic simplification of storage administration requires that associated functionalities be designed-in from the start and integrated throughout the storage system design. Regular re-evaluation of configuration and workload partitioning must occur, all within the context of high-level administrator guidance.
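As a rough illustration of what such high-level administrator guidance might look like, the snippet below sketches per-dataset goals and risk tolerances. The dataset names, field names, and values are hypothetical, chosen only to show the intended level of abstraction (goals rather than device-level knob settings).

--- illustrative sketch (Python) ---------------------------------------
# Hypothetical per-dataset guidance: goals and risk tolerances only,
# never device-level knobs. All names and values are illustrative.
dataset_goals = {
    "customer-db": {
        "tolerate_failures": 2,        # survive two simultaneous worker failures
        "target_read_latency_ms": 5,   # performance target, not a tuning knob
        "keep_versions_days": 30,      # window for recovering from user mistakes
    },
    "scratch-space": {
        "tolerate_failures": 0,        # acceptable risk: data is regenerable
        "target_read_latency_ms": 20,
        "keep_versions_days": 1,
    },
}

# A supervisor hierarchy would translate these goals into dataset-to-worker
# assignments and redundancy schemes, re-evaluating them as workloads change.
for name, goals in dataset_goals.items():
    print(name, goals)
-------------------------------------------------------------------------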
The self-* storage project is designing an architecture from a clean slate to explore such integration. The high-level system architecture borrows organizational concepts from corporate structure. Briefly, workers are storage bricks that adaptively tune themselves, routers are logical entities that deliver requests to the right workers, and supervisors plan system-wide and orchestrate from out-of-band.

The administrative interface must provide information and assistance to the system administrator when problems arise or trade-offs are faced. For example, there is usually a trade-off between performance and reliability. The administrator needs to be made aware of such trade-offs when complaints about performance go beyond the current system's ability to tune itself. In addition to identifying problems, the system needs to help the administrator find solutions.

The supervisor nodes, arranged into a hierarchy, control how data is partitioned among workers and how requests are distributed. A supervisor's objective is to partition data and goals among its subordinates (workers or lower-level supervisors) such that, if its children meet their assigned goals, the goals for the entire subtree will be met. Creating this partitioning is not easy. Prior to partitioning the workload, the supervisor needs to gain some understanding of the capabilities of each of its workers. As in human organizations, this information will be imperfect, resulting in some trial-and-error and observation-based categorization.

Routers deliver client requests to the appropriate workers. Doing so requires metadata for tracking current storage assignments, consistency protocols for accessing redundant data, and choices of where to route particular requests (notably, READs to replicated data). We do not necessarily envision the routing functionality residing in hardware routers. It could be software running on each client, software running on each worker, or functionality embedded in interposed nodes.

Workers service requests for, and store, assigned data. We expect them to have the computation and memory resources needed to internally adapt to their observed workloads by, for example, reorganizing on-disk placements and specializing cache policies. Workers also handle storage allocation internally, both to decouple external naming from internal placements and to allow support for internal versioning. Workers keep historical versions of all data to assist with recovery from dataset corruption.

--
Andrew J. Klosterman
andrew5@ece.cmu.edu
http://www.ece.cmu.edu/~andrew5