From: Andrew Klosterman
Date: Fri Aug 29, 2003 5:18:59 PM US/Eastern
To: sospss@wheaton.edu, jeffay@cs.unc.edu
Cc: Greg Ganger
Subject: SOSP '03 Stipend Application and Poster Abstract

This is my SOSP '03 Stipend Application and Poster Abstract. It includes a completed "SOSP Travel Scholarship Application," a list of anticipated travel expenses, a research position paper, a list of publications, and a description of the benefits I expect from attending SOSP. There is also a poster abstract at the end of this email, describing joint work with John Strunk and Prof. Greg Ganger.

=========================================================================
SOSP Travel Scholarship Application

Name: Andrew J. Klosterman
Address: 1233 Bellerock St., Pittsburgh, PA 15217
Email: andrew5@ece.cmu.edu
ACM member number: (or date of application, if you have not received your number yet)
  August 27, 2003 (first attempt, online, no confirmation email)
  August 28, 2003 (second attempt, online, no confirmation email)

School: Carnegie Mellon University
Department: Electrical and Computer Engineering
Degree pursued: PhD
When you expect to finish: May 2005

Faculty advisor
  Name: Greg Ganger
  E-mail: ganger@ece.cmu.edu

Have you attended SOSP before? Yes. (1999)
Have you previously received an SOSP scholarship? No.

SOSP program participation
  Are you an author of an accepted paper? No.
    If so, are you the principal author? Will you be presenting it at the conference?
  Have you submitted a poster? Yes. (Draft of abstract included.)
  Is your paper/poster entered in the Student Research Competition? Yes. (I suppose so, I couldn't find any form to "enter" the competition. As far as I can tell, I just send the poster abstract to jeffay@cs.unc.edu.)

Anticipated travel expenses

TOTAL: $879

AIRFARE: $389
  1 adult at US$ 389.00
  1:10pm Depart Pittsburgh, PA (PIT) -- 2:22pm Arrive Albany, NY (ALB), Oct. 18, US Airways flight 1190
  3:15pm Depart Albany, NY (ALB) -- 4:33pm Arrive Pittsburgh, PA (PIT), Oct. 22, US Airways flight 363

MEALS: $152
  Per Diem of $38/day (Warren County, NY)

GROUND TRANSPORTATION: $0
  Shuttle provided by hotel (with reservation).

HOTEL: $338
  4 nights, 2-person shared "Lodge Room"
  4 / 2 * (157 + 6 + 6) = 338
  (Tax exempt organization, must fill out form ST 119.1.)

Will you require a hotel room for Saturday night? Yes. (Much less expensive airfare!)

========================================================================
RESEARCH

Self-* Storage Systems

As computer complexity has grown and system costs have shrunk, system administration has become a dominant factor in both ownership cost and user dissatisfaction. The research community is quite aware of this problem, and there have been well-publicized calls to action. Storage systems are key parts of the equation for several reasons. First, storage is where the persistent data is kept, making it a critical system component. Second, storage represents 40-60% of hardware costs in modern data centers, and 60-80% of the total cost of ownership. Third, storage administration (including capacity planning, backup, load balancing, etc.) is where much of the administrative effort lies; Gartner and others have estimated the task at one administrator per 1-10 terabytes, which is a scary ratio with multi-petabyte data centers on the horizon. Many industry efforts are working to simplify storage management by developing tools to reduce the effort of managing traditional storage system designs.
We believe that this approach is fundamentally limited, much like adding security to a finished system rather than integrating it into the design from the start. What is needed is a step back and a clean-slate redesign of storage for the data center. Without re-architecting storage systems, administration complexity will continue to dominate administrator workloads and ownership cost.

Our research plan includes developing a storage architecture that integrates automated management functions to simplify storage administration tasks. We refer to such systems as self-* storage systems in an attempt to capture many recent buzzwords in a single meta-buzzword; self-* (pronounced "self-star") storage systems should be self-configuring, self-organizing, self-tuning, self-healing, self-managing, etc. Ideally, human administrators should have to do nothing more than provide muscle for component addition, guidance on acceptable risk levels (e.g., reliability goals), and indications of their current levels of satisfaction.

We think in terms of systems composed of networked intelligent storage bricks, each consisting of CPU(s), RAM, and a number of disks. Although special-purpose hardware may speed up network communication and data encode/decode operations, it is the software functionality and distribution of work that could make such systems easier to administer and competitive in performance and reliability.

Our self-* storage design meshes insights from AI, corporate theory, and storage systems. It borrows the management hierarchy concept from corporate structure: supervisory oversight without micro-management. Each storage brick (a worker) tunes and adapts its operation to its observed workload and assigned goals. Data redundancy across and within storage bricks provides fault tolerance and creates opportunities for automated reconfiguration to handle many problems. Out-of-band supervisory processes assign datasets and goals to workers, track their performance and reliability status, and exchange information with human administrators. Dataset assignments and redundancy schemes are dynamically adjusted based on observed and projected performance and reliability. (A minimal sketch of this supervisor/worker division of labor appears after the task list below.)

Storage administration

The human costs of storage administration are estimated to be 4-8 times the cost of the storage hardware and software themselves, even with today's expensive, special-purpose storage subsystems. With the industry's push towards using commodity-priced, consumer-quality components, administrative overheads will only worsen unless there are changes in approach. The work of system administrators can be broken down into several categories of tasks that the self-* system must be able to handle:

Data protection: Perhaps the most important role of a storage infrastructure, and thus task source for storage administrators, is ensuring the continued existence and accessibility of stored data in the face of accidents and component failures.

Performance tuning: The performance of storage systems always seems to be a concern for administrators. Load balancing and parameter tuning occupy considerable administrator time.

Planning and deployment: Capacity planning, or determining how many and which types of components to purchase, is a key task for storage administrators.

Monitoring and record-keeping: System administration involves a significant amount of effort just to stay aware of the environment.

Diagnosis and repair: Despite prophylactic measures, problem situations arise that administrators must handle.
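Before turning to how current tools handle these tasks, here is a minimal sketch (in Python) of the supervisor/worker division of labor described above: supervisors hand dataset goals to workers out-of-band and watch for goal misses, while workers tune themselves against their assigned goals. This is an illustration only, not the project's actual interfaces; the class and field names (DatasetGoal, WorkerStatus, Supervisor, Worker, the latency and replica fields) are hypothetical, and a real worker would report measured rather than hard-coded statistics.

--- illustrative sketch (Python) ---------------------------------------
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DatasetGoal:
    """High-level goals a supervisor assigns along with a dataset (hypothetical fields)."""
    dataset: str
    availability: float        # e.g., 0.999
    latency_ms: float          # target average request latency
    replicas: int = 2          # redundancy scheme chosen by the supervisor


@dataclass
class WorkerStatus:
    """Periodic status a worker reports back up the hierarchy."""
    worker_id: str
    observed_latency_ms: Dict[str, float]   # per-dataset latency actually observed
    healthy: bool = True


class Worker:
    """A storage brick: accepts goal assignments and tunes itself internally."""
    def __init__(self, worker_id: str):
        self.worker_id = worker_id
        self.assignments: Dict[str, DatasetGoal] = {}

    def assign(self, goal: DatasetGoal) -> None:
        self.assignments[goal.dataset] = goal
        # Internal tuning (cache policy, on-disk layout) would happen here.

    def report(self) -> WorkerStatus:
        # Placeholder: a real worker would report measured statistics.
        observed = {name: 8.0 for name in self.assignments}
        return WorkerStatus(self.worker_id, observed)


class Supervisor:
    """Out-of-band oversight: assigns datasets and goals, monitors, reassigns."""
    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def place(self, goal: DatasetGoal) -> None:
        # Naive placement: replicate across the first `replicas` workers.
        for worker in self.workers[:goal.replicas]:
            worker.assign(goal)

    def check(self) -> List[str]:
        """Return dataset/worker pairs whose observed latency misses the goal."""
        misses = []
        for worker in self.workers:
            status = worker.report()
            for dataset, seen in status.observed_latency_ms.items():
                if seen > worker.assignments[dataset].latency_ms:
                    misses.append(f"{dataset}@{worker.worker_id}")
        return misses


if __name__ == "__main__":
    sup = Supervisor([Worker("brick-0"), Worker("brick-1"), Worker("brick-2")])
    sup.place(DatasetGoal(dataset="web-logs", availability=0.999, latency_ms=10.0))
    print("goal misses:", sup.check())   # [] -> goals currently met
-------------------------------------------------------------------------

The point of the sketch is only the division of labor: dataset placement and goal checking happen out-of-band in the supervisor, while each worker adapts internally to its observed workload and assigned goals.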
The current state-of-the-art for addressing these tasks does not scale well to petabyte-sized data centers. System administrators are burdened with a hodge-podge of vendor-specific tools and home-brew scripts for managing storage. There is some research and development work being performed to aid in the management of current systems (e.g., at HP Labs, IBM, and SNIA). However, these efforts will face difficulties handling the frequently failing commodity components of storage bricks. A new architecture is necessary to cope with the tasks of storage administration.

Self-* storage architecture

Dramatic simplification of storage administration requires that associated functionalities be designed-in from the start and integrated throughout the storage system design. System components must continually collect information about tasks and how well they are being accomplished. Regular re-evaluation of configuration and workload partitioning must occur, all within the context of high-level administrator guidance. The self-* storage project is designing an architecture from a clean slate to explore such integration. The high-level system architecture borrows organizational concepts from corporate structure. Briefly, workers are storage bricks that adaptively tune themselves, routers are logical entities that deliver requests to the right workers, and supervisors plan system-wide and orchestrate from out-of-band.

============================= Figure 1 ==============================
[System organization: an administrative console atop a supervisor hierarchy, which oversees the workers; routers connect clients to workers.]
The administrative console distributes goals to the hierarchy of supervisors and interacts with the system administrator. Supervisors distribute data to workers and adjust the placement of data. Workers store data and internally tune to meet goals. Routers transfer data between clients and workers.
=====================================================================

Administration and organization: Several components work together to allow a self-* system to organize its components and partition its tasks. We do not believe that complex global goals can be achieved with strictly local decisions; some degree of coordinated decision making is needed. The supervisors, processes playing an organizational role in the infrastructure, form a management hierarchy. They dynamically tune dataset-to-worker assignments, redundancy schemes for given datasets, and router policies. Supervisors also monitor the reliability and performance status of their subordinates and predict future levels of both. The top of the hierarchy interacts with the system administrator, receiving high-level goals for datasets and providing status and procurement requests. Additional services (e.g., event logging and directory services), which we refer to as administrative assistants, are also needed.

Data access and storage: Workers store data, and routers ensure that I/O requests are delivered to the appropriate workers for service. Thus, self-* clients interact with self-* routers to access data.
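To illustrate the routing role just described, the sketch below shows how requests might be delivered to the appropriate workers, with READs spread across replicas. The class name, routing-table layout, and round-robin replica choice are all hypothetical assumptions for illustration; a real router would also need the consistency protocols for accessing redundant data, which are omitted here.

--- illustrative sketch (Python) ---------------------------------------
import itertools
from typing import Dict, List


class Router:
    def __init__(self, assignments: Dict[str, List[str]]):
        # Routing metadata: dataset name -> workers holding replicas.
        self.assignments = assignments
        self._rr = {d: itertools.cycle(ws) for d, ws in assignments.items()}

    def route_read(self, dataset: str) -> str:
        # Any replica can serve a READ; rotate for simple load spreading.
        return next(self._rr[dataset])

    def route_write(self, dataset: str) -> List[str]:
        # A WRITE must reach all replicas (consistency protocol omitted).
        return list(self.assignments[dataset])


if __name__ == "__main__":
    router = Router({"web-logs": ["brick-0", "brick-2"], "mail": ["brick-1"]})
    print(router.route_read("web-logs"))    # brick-0
    print(router.route_read("web-logs"))    # brick-2
    print(router.route_write("web-logs"))   # ['brick-0', 'brick-2']
-------------------------------------------------------------------------

Supervisors would update such a routing table as they adjust dataset-to-worker assignments and router policies.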
Simpler storage administration

Data protection: Two concerns, user mistakes and component failures, are addressed directly by internal versioning and cross-worker redundancy; a self-* storage system should allow users to access historical versions directly (to recover from mistakes) and should automatically reconstruct the contents of any failed worker. System administrators should only have to deal with replacing failed physical components, at times of their convenience. The other two concerns, disaster tolerance and archiving, involve a self-* administrative assistant and some mechanism outside of a single self-* installation. For example, a self-* backup service could periodically copy snapshots to a remote replica or archival system. Alternately, a self-* mirroring service could use the event logging/notification service to identify new data and asynchronously mirror it to a remote site.

Performance tuning: A self-* installation should do all performance tuning itself, including knob setting, load balancing, data migration, etc. An administrator's only required involvement should be providing guidance when performance targets are not being met. The administrative interface should provide the administrator with as much detail about system internals as is desired (which builds trust), but should not require manual tuning of any kind.

Planning and deployment: A self-* administrative interface should also help system administrators decide when to acquire new components. To do so, it must provide information about the trade-offs faced, such as, "The system could meet your specified goals with X more workers or Y more network bandwidth." The initialization and configuration aspects of adding components should be completely automated. Plugging in and turning on a worker should be sufficient for it to be automatically integrated into a self-* installation.

Monitoring and record-keeping: A self-* storage system will track and record everything it knows about, including the datasets, the components, their status, and their connections. A reasonable system would retain a history of this information for future reference.

Diagnosis and repair: Ideally, a self-* storage system will isolate and reconfigure around most problems, perhaps leaving broken components in place to be attended to later. A system administrator will need to physically remove broken components, and perhaps replace them. The self-* system can help in two ways: by giving simple and clear instructions about how to effect a repair, and by completing internal reconfiguration before asking for action (whenever possible). The latter can reduce occurrences of mistake-induced multiple faults, such as causing data or access loss by removing the wrong disk from an array that already has one failed disk. Such problems can also be reduced by making worker storage self-labelling, so that reinstalled components can be used immediately with their current contents.

Overall, a self-* storage system should make it possible for data centers to scale to multiple petabytes without ridiculous increases in administrator head-counts. System administrators will still play an invaluable role in providing goals for their self-* installation, determining when and what to purchase, physically installing and removing equipment, and maintaining the physical environment. But many complex tasks will be offloaded and automated.

RECENT PUBLICATIONS

Self-* Storage: Brick-based Storage with Automated Administration. Gregory R. Ganger, John D. Strunk, and Andrew J. Klosterman. Carnegie Mellon University Technical Report CMU-CS-03-178, August 2003.

BENEFITS OF ATTENDING SOSP

Conferences are a wonderful opportunity to meet and interact with other researchers.
Over the past year I have had the opportunity, through contacts made at previously attended conferences, to share in the research being performed at an industry lab and a university. Each of these groups has requested information on my research, and I have also benefited from using data acquired by the university group.

The diverse research to be presented at SOSP has many connections to my current project. Large-scale system development (the self-* storage system project at Carnegie Mellon University) requires a broad range of systems knowledge of the sort presented at SOSP. Furthermore, exposure to academic and industrial partners in the late stage of my PhD education provides crucial opportunities to develop in-person contacts. Such contacts have already led to fruitful discussions with industry leaders conducting similar research. Continuing to develop these contacts, and forming new ones, is a personal goal.

The conference program lists several presentations and papers that are closely related to my current projects. Perhaps the most relevant is Google's paper, "The Google File System," which describes a large-scale distributed file system with many features relevant to my work on self-* storage systems. Mindful of security concerns in self-* storage, I expect that the paper titled "Decentralized User Authentication in a Global File System" (not yet posted) may have useful ideas that can be modified or incorporated into the security model for self-* storage systems. The paper from Berkeley, "Capriccio: Scalable Threads for Internet Services," may present techniques invaluable to scaling self-* storage systems.

Attending conferences, and the personal interactions that accompany such events, provides invaluable opportunities for in-person technical discussions that simply cannot reasonably be held over long distances (via e-mail or telephone). I am very excited about attending SOSP to take advantage of this gathering of individuals in my field.

========================================================================
POSTER EXTENDED ABSTRACT (DRAFT)

We are exploring the design and implementation of self-* storage systems: self-organizing, self-configuring, self-tuning, self-healing, self-managing systems of storage bricks. Borrowing organizational ideas from corporate structure and automation technologies from AI and control systems, we hope to dramatically reduce the administrative burden currently faced by data center administrators.

As computer complexity has grown and system costs have shrunk, system administration has become a dominant factor in both ownership cost and user dissatisfaction. Storage systems are key parts of the equation. Storage represents 40-60% of hardware costs in modern data centers, and 60-80% of the total cost of ownership. Storage administration (including capacity planning, backup, load balancing, etc.) is where much of the administrative effort lies; Gartner and others have estimated the task at one administrator per 1-10 terabytes, which is a scary ratio with multi-petabyte data centers on the horizon.

Dramatic simplification of storage administration requires that associated functionalities be designed-in from the start and integrated throughout the storage system design. Regular re-evaluation of configuration and workload partitioning must occur, all within the context of high-level administrator guidance.
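As a rough illustration of what such high-level administrator guidance might look like, the snippet below sketches per-dataset goals and risk tolerances. The dataset names, field names, and values are hypothetical, chosen only to show the intended level of abstraction (goals rather than device-level knob settings).

--- illustrative sketch (Python) ---------------------------------------
# Hypothetical per-dataset guidance: goals and risk tolerances only,
# never device-level knobs. All names and values are illustrative.
dataset_goals = {
    "customer-db": {
        "tolerate_failures": 2,        # survive two simultaneous worker failures
        "target_read_latency_ms": 5,   # performance target, not a tuning knob
        "keep_versions_days": 30,      # window for recovering from user mistakes
    },
    "scratch-space": {
        "tolerate_failures": 0,        # acceptable risk: data is regenerable
        "target_read_latency_ms": 20,
        "keep_versions_days": 1,
    },
}

# A supervisor hierarchy would translate these goals into dataset-to-worker
# assignments and redundancy schemes, re-evaluating them as workloads change.
for name, goals in dataset_goals.items():
    print(name, goals)
-------------------------------------------------------------------------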
The self-* storage project is designing an architecture from a clean slate to explore such integration. The high-level system architecture borrows organizational concepts from corporate structure. Briefly, workers are storage bricks that adaptively tune themselves, routers are logical entities that deliver requests to the right workers, and supervisors plan system-wide and orchestrate from out-of-band.

The administrative interface must provide information and assistance to the system administrator when problems arise or trade-offs are faced. For example, there is usually a trade-off between performance and reliability. The administrator needs to be made aware of such trade-offs when complaints about performance go beyond the current system's ability to tune itself. In addition to identifying problems, the system needs to help the administrator find solutions.

The supervisor nodes, arranged into a hierarchy, control how data is partitioned among workers and how requests are distributed. A supervisor's objective is to partition data and goals among its subordinates (workers or lower-level supervisors) such that, if its children meet their assigned goals, the goals for the entire subtree will be met. Creating this partitioning is not easy. Prior to partitioning the workload, the supervisor needs to gain some understanding of the capabilities of each of its workers. As in human organizations, this information will be imperfect, resulting in some trial-and-error and observation-based categorization.

Routers deliver client requests to the appropriate workers. Doing so requires metadata for tracking current storage assignments, consistency protocols for accessing redundant data, and choices of where to route particular requests (notably, READs to replicated data). We do not necessarily envision the routing functionality residing in hardware routers. It could be software running on each client, software running on each worker, or functionality embedded in interposed nodes.

Workers service requests for, and store, assigned data. We expect them to have the computation and memory resources needed to internally adapt to their observed workloads by, for example, reorganizing on-disk placements and specializing cache policies. Workers also handle storage allocation internally, both to decouple external naming from internal placements and to allow support for internal versioning. Workers keep historical versions of all data to assist with recovery from dataset corruption.

--
Andrew J. Klosterman
andrew5@ece.cmu.edu
http://www.ece.cmu.edu/~andrew5