JavaObjectWeb:

Regions of Dynamic Objects

Embedded in the World Wide Web

John B. Smith & F. Donelson Smith
Department of Computer Science
University of North Carolina
Chapel Hill, NC 27599-3175
jbs@cs.unc.edu ---- smithfd@cs.unc.edu
919-962-1792 ---- 919-962-1884
919-962-1799 fax

Overview

Our research project is located at the intersection of three technologies:

Hypertext systems
The World Wide Web
Object-Oriented Programming

We are developing a general object-oriented architecture that incorporates many of the best features found in the technologies listed above in order to overcome some of their more important limitations. The result is a design that is powerful and coherent, yet one that complements and is compatible with all three technologies.

We named the project JavaObjectWeb to suggest several key ideas about the research.

It is concerned with multiple, separate regions of the WWW name space where additional functions and semantics exist.
It is based on object-oriented principles, in general, and Java, in particular.
It provides authorship, reliable links, and a framework in which arbitrary types of objects may be displayed and edited.

Specific goals for the project are to:

Complete the architecture, described below
Build a proof-of-concept implementation, in Java, that is a reliable, full-function system
Use that implementation to explore issues of compatibility, scale, and performance
Demonstrate its effectiveness under actual use conditions.

Specific research questions that will be addressed include:

What are the advantages and disadvantages of using (Java) objects vs. typed files and markup languages for authoring and maintaining large documents and sites?
Can an implementation of the proposed architecture scale and perform adequately for regions of arbitrary size and for arbitrary numbers of such regions?
Can an implementation support dynamic reorganization of objects, maintain reliable links among them, and still meet criteria for performance and scale?

This research is expected to produce the following results:

A general object storage, access, and retrieval system, based on hypertext graph semantics, that can be used for a broad range of applications.
Java class libraries and packages that support additional applications
Regions within the World Wide Web where users have access to functions not provided in the rest of Web-space.
Just as the paradigm of editing can be seen as the basis for a number of applications that aren’t primarily thought of as "editors," this architecture for editing arbitrary types of objects, through dynamic loading of classes, will provide a framework and paradigm for developing a number of network-centric applications that don’t appear to be object editors.
A better understanding of how design trade-offs affect performance and scale and of how to measure or estimate those effects.

Background

Hypertext/hypermedia systems began with an idea described by Vannevar Bush in 1945, were first implemented by Doug Engelbart in the 1960s, were kept alive by researchers at Brown University during the 1970s, and flourished in the 1980s. Key concepts included in many of these systems were (a) the representation of data as a set of nodes and relations among them as links, (b) visualization and direct manipulation of the resulting graph structure, (c) inherent support for authorship and link reliability, and (d) access through browsing and navigation rather than search or specification of file names. While many hypermedia systems offered interesting alternatives to file systems and databases, most were constrained in capacity and most were single-platform systems.

Hypermedia during the 1990s has been dominated by the World Wide Web (henceforth referred to as WWW or the Web). Originally designed as a document delivery system for a few hundred scientists, the Web has become the user interface to a wide range of services and data types for hundreds of millions of people worldwide. Key concepts included in the Web are (a) a distributed client/server design intended for remote users connected through the Internet, (b) access from multiple platforms through independent implementations of a common protocol, (c) exploitation of the point-and-click metaphor within a GUI for access and navigation, and (d) support for multiple MIME types.

Features found in earlier hypertext systems that the Web omitted may be as important in explaining its success as those it included. Chief among those omissions were reliability of links and authorship. Whereas earlier hypermedia systems had assumed that it was essential to ensure the integrity of links when nodes are moved or rearranged, the Web simply ignored this issue. If a file was moved from one directory to another or renamed, links represented as HTML anchors with URLs would break, but so be it. Similarly, the Web largely ignored authorship and left it to users to find other means to create and edit content files and insert them into a Web server’s file space. These restrictions are starting to be addressed through HTML extensions (e.g., XML and DHTML), HTML editors, and server support for the HTTP POST method, but authoring new content is still not well integrated into the Web and few tools exist for visualizing and directly manipulating the structure of large, complex documents and sites.

The third technology on which our research is based is object-oriented (O-O) programming. Key concepts in the O-O perspective include encapsulating function and data within an object, reuse of components, and the incremental building of more specialized components from more general ones through inheritance. In the particular O-O environment in which we will work (Java), platform independence (the notion that the same program can run on many different hardware and software systems) is also important. However, object-based systems have not provided many tools for browsing collections of objects or accessing persistent object storage through the Web.

Each of these technologies offers a number of useful features. Yet, each has also left out features found in the others that would make it even more useful if they could be added. How much more powerful would conventional hypertext systems have been if they had been integrated into the Internet and had achieved the scale shown to be possible by the Web? How much more useful would the Web be if all links remained valid so long as the target exists regardless of its location, if users could build new pages as easily as they can browse them, or if users could see and manipulate the logical structure of their sites through GUI direct manipulation tools? (Tools such as FrontPage allow some visualization of physical structure.) How much more useful would object-based systems be if persistent storage systems could be organized and accessed like hypermedia systems or the Web?

We believe that for the foreseeable future the designs for systems that would try to solve these problems must begin with the addressing architecture of the WWW (URLs) and support HTTP as one mode of access.

We have based our architecture on principles included in the O-O paradigm; we are implementing it in Java. Key concepts in the architecture developed in our research include the following:

Objects subsume files and markup languages as expressions of content.
Objects are typed, similar to MIME types.
Objects have their own editing and display methods that can be loaded dynamically from remote locations; consequently, all objects can be displayed and edited (and new ones created) as a basic feature of the system.
Objects have identifiers that are globally unique, meaning that once created, they can be located regardless of where they are stored. Thus, conventional Web anchors/links based on URLs that include these identifiers never break.
Graphs constitute an important new class of object supported by the system. All content objects are embedded in a collection of graph structures that may be displayed and restructured through direct GUI manipulation.

The object storage system is Web compatible and can coexist with it.

The current status of our work is this: We have developed an architecture based on the ideas presented (although some portions are more complete than others). We have implemented a substantial portion of our initial design in the form of a demonstration prototype, but several key features have not yet been implemented.

In the research that we propose for the next three years, we will complete all aspects of the architecture. We will also complete the demonstration prototype. We will then implement a second generation of this prototype as a robust, full-function proof-of-concept version capable of supporting actual users. Finally, using the latter, full-function version, we will test and evaluate key claims, including the following:

The system can provide performance comparable to or better than that provided by the Web.
It can handle the data for a substantial site, multiple sites can be logically integrated, and there are no inherent restrictions for continued extrapolation and inclusion of additional sites (scalability).
It can provide a visual representation and direct manipulation of objects organized as graph structures that is powerful and easy to use.
Links in the form of WWW anchors with URLs and system-defined hyperlinks will remain valid even when objects are moved from one location to another (logically or physically).
The system can support authorship and editing for an open-ended set of different object types.
A set of real users can use the system and will find it helpful for their own work.

In carrying out this agenda, we will address a number of issues that are important for basic system design in other contexts. For example, we will add new classes and packages for general use in other Java applications, and we will add new architectural abstractions that apply to other O-O languages. Further, this project has grown out of prior research in collaboration systems. While not the focus of the research described here, the storage system will support asynchronous collaboration through its strong access control and concurrency control components; in future work we expect to add support for collaboration and cooperative work through the browsing, direct manipulation, and authoring component. However, the most important benefit of the research described here will be to demonstrate that an alternative architecture can coexist with the Web, provide important new capabilities, and still provide comparable performance at similar scale.

Project description

In this section we describe the systems research we plan to carry out over the next three years. The project will build on our earlier research in collaboration systems, but will take it in important new directions. It will include substantial architectural and implementation components. Evaluation of this work will be done through system measurements to assess performance and scalability and through use by individuals doing actual work to evaluate usefulness.

In the remainder of this section, we first describe the general architecture we are designing, then the data model which is the foundation for that architecture, and, finally, the design and implementation of our current demonstration system, which is the starting point for a complete implementation.

Architecture

The architecture proposed here is actually composed from three distinct functional groupings, each of which provides a useful subset of the total function. By implementing an application programming interface (API) expressed in Java for each functional grouping, we provide opportunities for applying this architecture in novel applications not explicitly considered in this proposal. The conceptual layering of these three functional groupings is shown in Figure 1.

Figure 1. JavaObjectWeb’s layered architecture.

The basic building block, shown as the bottom layer, for this architecture is the storage system for linked Java objects. Its data model and related functions are described in detail in the following section. The API provided for the storage system can be used in any application that requires its functions. We have chosen to use it to create an application that supports authoring and reliable links within regions of the overall World Wide Web name space. We also use this API for the test tools described in Section 4.

An important idea that will be tested is that these regions of enhanced function can be composed of objects, rather than conventional files of MIME-typed data or markup languages. The middle layer of our system provides a general framework, based on Java object reflection, that allows arbitrary object types to be created along with viewers and editors for those types. A unique aspect of this design is that it uses the distributed storage system to locate and distribute, when and where needed, any classes required to display and edit an object at the time that object is accessed. This approach stands in contrast to the current practice of using preloaded plugins that are not part of the WWW architecture to enable a browser to display a given MIME type of data. Details on this part of the architecture are given in Section C, below.

At the topmost layer of this architecture are functions that support creation and maintenance of structural and semantic relationships among objects. This layer is based on storing, along with objects for conventional data, a set of graph objects in which the edges of the graph represent relationships among the nodes of the graph, each of which represents an object in the storage system. Using the framework of the middle architectural layer, facilities are provided that give users (programs as well as people) the ability to create, browse, traverse (search), edit, and maintain the graph objects that define the overall structure of the collection.

Data Storage Model

Two fundamental principles underlie the data storage model: (1) the smallest addressable unit in the store is a content object of arbitrary (but known) type, and (2) every content object in the storage system has a globally unique address which is also a valid URL and is thus completely compatible with the addressing scheme used in the World Wide Web.

The specific form of URLs used in the storage model includes conventional host[:port] components. However, instead of a path component, our system uses a 64-bit object identifier (OID) that is unique within a given region, defined by the host[:port] components of the URL. In the example below, the OID is represented by 16 hexadecimal digits. (Note: human beings are not expected to read or create OIDs; they are maintained by the system software.) Thus, for example, the URL http://wwwng.cs.unc.edu:8888/00D00A7001FE00C6 represents an object that is uniquely addressed by OID 00D00A7001FE00C6 within the region associated with a process that implements the HTTP protocol and is running at port 8888 on host wwwng.cs.unc.edu. Once a content object is created, its OID is never changed and that value is never reused even if the object is deleted. Although an OID may have internal structure that is used by the storage system for efficient access, it is treated as an "opaque" (uninterpreted) value by applications.
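The URL form described above can be illustrated with a minimal sketch. The class name OidUrl and its methods are hypothetical and are not part of the proposed API; the sketch assumes only what the text states, namely that the path component is a 64-bit OID written as 16 hexadecimal digits and that the region is identified by host[:port].

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper (not part of the proposed API) for URLs whose path is a 64-bit OID. */
final class OidUrl {
    private OidUrl() {}

    /** Builds a URL such as http://host:port/00D00A7001FE00C6 from a region and an OID. */
    static String format(String host, int port, long oid) {
        // %016X zero-pads to 16 upper-case hex digits and prints the long as unsigned.
        return String.format(Locale.ROOT, "http://%s:%d/%016X", host, port, oid);
    }

    /** Extracts the OID from a URL; rejects paths that are not exactly 16 hex digits. */
    static long parseOid(String url) {
        String path = URI.create(url).getPath();
        if (path == null || !path.matches("/[0-9A-Fa-f]{16}")) {
            throw new IllegalArgumentException("not an OID URL: " + url);
        }
        return Long.parseUnsignedLong(path.substring(1), 16);
    }

    /** Returns the host[:port] part of the URL, which identifies the region. */
    static String region(String url) {
        return URI.create(url).getAuthority();
    }
}
```

Applications would still treat the parsed value as opaque; the sketch shows only that such URLs remain ordinary, well-formed Web addresses.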

Content objects are composed of two parts: (1) a set of properties that uniquely define the characteristics of a specific type of content object, and (2) data that represent the true "content" of the object. For example, an object’s data content could be HTML text, GIF or JPEG images, digitized audio or video (MPEG), a Java object, or any other data. Among the important properties stored for a content object are its type (based on the MIME typing architecture [Borenstein & Freed, 1993] ), specifications required for proper editing or viewing of the data content, and system-defined properties (such as creation time or ownership) that are maintained automatically by the storage system.

A particularly important type of content object is a graph. To provide a well-defined model for access and traversal of the object store, all content objects are explicitly organized into graphs (stored as content objects of type:graph) that represent structural and semantic relationships. This graph-based storage model explicitly encourages users to organize information according to principles of modularity and decomposition by making it easy to represent relationships among elemental content objects. This organization improves human comprehension and increases potential for concurrent access to individual components. For example, the structural relationships among content objects that comprise a document might show the order and links among text- or image-type content objects that define sub-sections, sections, and chapters. Semantic relationships might link a text object introducing a software concept to a figure object showing its design, to a class object giving its implementation, and to an image object depicting its user interface.

Abstractions for grouping and ordering related nodes and for composing them hierarchically are essential in a large-scale storage system. Many relationships among content objects are structural, especially those that indicate access order (e.g., if a group of content objects represent parts of a document, it is necessary to explicitly define the structural relationships of sub-sections to sections, sections to chapters, and chapters to the document). A natural expression of structural relationships is an explicit graph where the nodes and links (edges) of the graph are abstractions showing the logical appearance of some content object (node) within a specific structural relationship represented by the links. Each node of the graph may contain the URL of an associated content object. A node exists in one and only one graph but a given content object may be referenced by an arbitrary number of nodes. Since any node may contain the URL of a content object, each node may logically "contain" a graph-type content object. This recursion provides another simple but powerful model for representing structural relationships by allowing hierarchical composition of a complex information structure from content objects. Each node is assigned a URL address but nodes are not first-class objects and exist only as private data within a graph-type content object. The URL for a node may be used the same way as the URL for a content object (e.g., in an anchor reference in an HTML file) but it is mapped (internally to the storage system) into the URL for the graph-type content object in which it appears. A link is represented by the URLs for the two nodes involved. These relationships are illustrated in Figure 2.

Figure 2. JavaObjectWeb supports two types of links. Structural links denote relationships between nodes in the same graph and are constrained by graph type semantics (e.g., tree). Hypertextual links denote relationships between nodes in different graphs and from anchors within HTML data; they are represented as special forms of URLs.
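The hierarchical composition described above, in which a node may refer to a graph-type content object, can be sketched as a recursive expansion. The class CompositionSketch is hypothetical: it assumes each graph is list-like, represents it simply as the ordered URLs its nodes refer to, and treats any URL without a registered graph as a leaf content object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: graph-type content objects whose nodes refer to other content objects. */
final class CompositionSketch {
    // URL of a graph-type content object -> URLs referenced by its nodes, in order.
    private final Map<String, List<String>> graphs = new HashMap<>();

    void putGraph(String graphUrl, List<String> nodeTargets) {
        graphs.put(graphUrl, new ArrayList<>(nodeTargets));
    }

    /** Expands nested graphs depth-first and returns the leaf content objects in order. */
    List<String> flatten(String url) {
        List<String> leaves = new ArrayList<>();
        collect(url, leaves, new HashSet<>());
        return leaves;
    }

    private void collect(String url, List<String> leaves, Set<String> open) {
        List<String> targets = graphs.get(url);
        if (targets == null) {          // not a graph: a leaf content object
            leaves.add(url);
            return;
        }
        if (!open.add(url)) {           // guard against a graph that contains itself
            throw new IllegalStateException("graph contains itself: " + url);
        }
        for (String target : targets) {
            collect(target, leaves, open);
        }
        open.remove(url);
    }
}
```

Because a content object may be referenced by any number of nodes, the same leaf URL can legitimately appear more than once in the result.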

Because a link in a graph typically represents a structural relationship between two nodes in a collection of related nodes, we use the terms structural-link (abbreviated S-link) and structural graph (S-graph) to denote these concepts. A common case, however, is an S-graph containing nodes but no links; it represents a set of related nodes having non-structural relationships. Nodes may have arbitrary numbers of in-coming and out-going S-links. S-links have a direction, although traversal is supported in either direction. In addition to the basic S-graph with no links, the data model also provides a predefined set of strongly typed S-graphs. Currently five types are defined: general directed graphs, connected graphs, acyclic connected graphs, trees, and lists. The system will guarantee that typed S-graphs are always in a state consistent with their type. No operations are permitted that would violate the integrity of the type. For example, an application is not allowed to create a cycle in an S-graph of type tree. Typed S-graphs are useful for dealing with issues such as integrity, consistency, and completeness in supporting tools for authoring and maintaining complex information structures. Note that the data storage model subsumes the organization of data in a conventional file system (consider S-graphs as directories and nodes with data content as files) while adding new functions for representing structure among nodes within an S-graph (directory).
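As an illustration of how a typed S-graph might refuse operations that violate its type, the following sketch enforces two of the constraints of a tree: each node has at most one parent, and no S-link may close a cycle. The class TreeSGraph is hypothetical, node identifiers stand in for node URLs, and connectedness is not checked.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an S-graph of type tree; node ids stand in for node URLs. */
final class TreeSGraph {
    // Maps each child node to its single parent; root nodes have no entry.
    private final Map<String, String> parentOf = new HashMap<>();

    /** Adds a directed S-link parent -> child, or throws if the graph would stop being a tree. */
    void addLink(String parent, String child) {
        if (parentOf.containsKey(child)) {
            throw new IllegalStateException(child + " already has a parent");
        }
        // Walk up from the parent; reaching the child means the new link would close a cycle.
        for (String n = parent; n != null; n = parentOf.get(n)) {
            if (n.equals(child)) {
                throw new IllegalStateException("link would create a cycle");
            }
        }
        parentOf.put(child, parent);
    }

    /** S-links are directed but can be traversed backwards, here from child to parent. */
    String parent(String node) {
        return parentOf.get(node);
    }
}
```

Rejecting the operation before any state changes keeps the graph consistent with its type at all times, which is the guarantee the data model describes.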

While composition and structure are necessary for organizing complex information, they are not sufficient; many useful relationships cannot be modeled as structure. The most obvious example is the fundamental role of anchor references in HTML files that create the cross-structure relationships which form the World Wide Web. Because all content objects have valid URLs, these URLs may be used for anchor references in HTML files or in any other Web-based context. The practice of embedding URLs as anchor references in HTML files has, however, led to serious problems with the maintainability of information in the Web. (We note that there are other proposed solutions to this problem, e.g., XML, but we believe the data model proposed here represents a stronger foundation.) Unlike conventional Web servers, which often invalidate embedded URLs when files are moved or deleted, the storage system described here provides stronger semantics for references to its URLs. Specifically, it provides that (a) URL references to nodes that have been moved are always valid (accomplished with a forwarding mechanism), and (b) URL references to deleted nodes return useful information, including the option for recovery of the related content object.
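One way to picture the forwarding mechanism and the handling of deleted nodes is a small lookup table. The proposal does not specify how forwarding is implemented, so the class ForwardingTable and its methods are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of link forwarding; the actual mechanism is not specified in the text. */
final class ForwardingTable {
    private final Map<String, String> movedTo = new HashMap<>(); // old URL -> newer URL
    private final Set<String> deleted = new HashSet<>();         // markers for deleted nodes

    /** Records that the node at oldUrl now lives at newUrl. */
    void recordMove(String oldUrl, String newUrl) {
        if (oldUrl.equals(newUrl)) {
            return;
        }
        movedTo.remove(newUrl);      // the node now lives at newUrl, so drop any stale forward from it
        movedTo.put(oldUrl, newUrl);
    }

    /** Keeps a marker so that references to a deleted node can still return useful information. */
    void recordDelete(String url) {
        deleted.add(resolve(url));
    }

    /** Follows forwarding entries until the current location is reached. */
    String resolve(String url) {
        String current = url;
        while (movedTo.containsKey(current)) {
            current = movedTo.get(current);
        }
        return current;
    }

    boolean isDeleted(String url) {
        return deleted.contains(resolve(url));
    }
}
```

In this sketch an old URL embedded in an HTML anchor still resolves after any number of moves, and a reference to a deleted node is distinguishable from a reference that never existed.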

Figure 3. JavaObjectWeb’s data storage model, showing both structural and hypertextual links. Sets of hyperlinks form hypergraphs.

To express semantic relationships within the data model analogous to HTML anchors with URLs, we define a more flexible kind of link, called a hyperlink (H-link). Hyperlinks can represent any semantic relationship between two nodes. H-links are used for associations between nodes in different S-graphs or non-structural relationships between nodes within the same S-graph. This latter use permits links that would violate the type constraints of the particular graph type (e.g., tree), were they defined as S-links (see Figure 3). H-links and the nodes they link are grouped into hypergraphs (H-graphs). The properties of URLs, nodes, and links discussed above for structural graphs apply to hypergraphs as well. Links similar in function to H-links are usually the key elements of conventional hypertext systems.

Implementation

The goals for our implementation efforts are twofold. First, we will demonstrate that a Web-compatible system built in Java is possible. Second, we will use the system as a testbed to explore key issues, including scale, performance, extensible data types, and reliability across multiple regions.

Figure 4. JavaObjectWeb’s component architecture. It shows a client applet, loaded by a conventional Web browser, communicating with a JavaObjectWeb server, running within the context of a JavaServer and comprising a region. Also shown is communication with multiple JavaObjectWeb regions.

As a basis from which to build the system proposed here, we have implemented an initial prototype. We describe the design and implementation of that prototype here and indicate how it will be extended in order to explore key systems research issues.

The main components of the system are shown in Figure 4.

As the figure indicates, the system can be divided into a client side and a server side. Both client and server include conventional WWW components. However, those components are currently used for convenience and to ensure Web compatibility. Both could be replaced with Java classes in the future, if this should prove desirable.

The client side includes a conventional Web browser and a client applet that provides the user interface to our system.

The Web browser provides two functions. First, it allows a user to bootstrap the system. By selecting a well-known URL or a URL for any object within a JavaObjectWeb region, one of our servers will return an applet tag within conventional HTML that launches the client applet, providing a user interface. The second function is to render HTML data returned from the storage system; however, Java classes are now available for rendering HTML and we could eliminate this browser function to make the system more self-contained.

The client applet is responsible for launching new windows for specific data types, for supporting cut/copy/paste operations between windows, and for other point-of-control functions. During a typical session, a user will open and close a number of different windows to browse the structure of a region and to access particular content objects. This is done through direct connections with a JavaObjectWeb region, rather than through HTTP.

As described in the data model section, graphs play a particularly important role in our system. Consequently, visual renderings of graphs provide the primary visual metaphor for logical structure. The windows for graph objects allow users to create new nodes through simple point and click operations, and users can reorganize the links that denote logical and semantic relations among nodes by similar drag and drop operations. Data from non-graph content objects are displayed and edited in windows that implement the appropriate semantics for the data type. We currently support only HTML as a non-graph type. A major new initiative will be to support an open-ended set of data types by having the client query the content object for a URL that identifies display and editing classes for that object type, request a nearby region to fetch and cache them, and dynamically load them.
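The planned dynamic loading of display and editing classes can be sketched with standard Java reflection. The interface ContentViewer and the classes PlainTextViewer and ViewerLoader are hypothetical and are not the project's actual API; the sketch assumes that the content object's properties yield a class name and that the caller supplies a ClassLoader (for remotely fetched classes this could be, for example, a java.net.URLClassLoader).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical viewer contract; the proposal does not define this interface. */
interface ContentViewer {
    String render(byte[] data);
}

/** Example viewer for one object type, used here only to exercise the loader. */
class PlainTextViewer implements ContentViewer {
    public String render(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}

/** Loads viewer classes by name on first use and caches the resulting instances. */
final class ViewerLoader {
    private final ClassLoader loader;
    private final Map<String, ContentViewer> cache = new ConcurrentHashMap<>();

    ViewerLoader(ClassLoader loader) {
        this.loader = loader;
    }

    ContentViewer viewerFor(String className) {
        ContentViewer cached = cache.get(className);
        if (cached != null) {
            return cached;
        }
        try {
            // Load the class at the moment an object of that type is first accessed.
            Class<?> cls = Class.forName(className, true, loader);
            ContentViewer viewer =
                cls.asSubclass(ContentViewer.class).getDeclaredConstructor().newInstance();
            cache.put(className, viewer);
            return viewer;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load viewer " + className, e);
        }
    }
}
```

Caching by class name means that repeated access to objects of the same type does not repeat the class lookup, which mirrors the fetch-and-cache step described above.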

The server side of our prototype system includes a JavaServer and several servlets that implement our storage system. These components constitute what we refer to as a JavaObjectWeb region.

The JavaServer provides two main functions. First, like the Web browser on the client side, it ensures WWW compatibility and is used to bootstrap the client applet. The second major function is to support servlets, which provide the architectural abstraction used to implement the object storage system.

The object storage system provides several services. Its most important function is providing persistent storage of Java content objects. Those objects can be accessed through conventional HTTP requests, which return conventional HTTP messages with the content object’s data included in the body of the message. More often, content objects are accessed through direct connections with client applets. The system also provides strong access control and concurrency control, and it locates and caches the display and editing classes needed to support arbitrary object types in the client applet.

We have implemented the basic architecture for the storage, access, and concurrency components, but we have not yet implemented the dynamic class caching scheme. We also need to extend the architecture to address a number of issues pertaining to performance and scaling, including possible replication of regions, forwarding of links across regions, and the problem of maintaining consistency among replicated objects. Consequently, we will redesign and rebuild our current demonstration prototype to provide a full-function testbed system that we will use to explore these and other systems issues, to evaluate the system under actual use conditions, and to demonstrate that it can scale and perform as well as the WWW.

Evaluation

To demonstrate that our architecture can address the issues discussed throughout this proposal and summarized above, we will carry out several different kinds of evaluations. The first set will be system performance measurements to show that the prototype implementation can perform at a level similar to the Web and that it can scale to serve a similarly sized user community. A second series of studies will address the issue of usefulness. Both sets of studies are described in more detail below.

System Evaluation

We will conduct a number of empirical investigations to evaluate our hypothesis that an implementation of the architecture can provide good performance and, most importantly, scalability. By scalability we mean that the number of users that can be supported by fixed resources (e.g., a single server machine) has a wide range, and that additional resources (servers) can be added incrementally to support very large numbers of users. One critical aspect of scalability that we will evaluate concerns the efficiency and performance of the implementation for links that remain valid when objects are moved or reorganized in the storage system. The key aspects of the experiments are described in the following paragraphs.

Generating large webs of objects: In order to perform these experiments we must first have a large collection of stored objects along with the graphs that reflect structural and semantic relationships among objects. It is unreasonable to expect that the user installations that we use to evaluate the more qualitative aspects of the system will naturally generate sufficiently large collections of objects during the project. Instead we must rely on generating such large collections using generating programs of two types: (1) a Web-site import program that can process the file system used by a conventional Web server and generate as output the corresponding organization expressed as objects and graphs, and (2) a program that uses the object-store API to generate artificial objects and object webs based on random sampling from a number of distributions that characterize important aspects of such collections (e.g. frequency of inbound and outbound links among nodes, nesting depths of hierarchical organizations, distributions of object types, distributions of object sizes, etc.).

We already have a working prototype of an import program that converts a conventional file-system based Web site into a JavaObjectWeb (currently it deals only with directory-to-graph structural conversions and simple types of objects such as HTML files, including conversion of embedded anchor references). This import program will be extended to cover more object types and structures. We also have a prototype of the second type of program (a generator of object stores based on statistical distributions of properties) that was developed as part of earlier NSF support under IRI-9015443. This program will be extended to generate object webs based on new characterizations developed by ourselves and others from data obtained by various Web-searching ("spider") programs.

Generating access requests: Just as it is unlikely that our real users will generate sufficiently large object webs, it is also unlikely that their use of the prototype system will provide any sort of stress test. To evaluate scalability we must be able to generate the request loads on a server (or set of servers) that would result from large numbers of users of the system. To do this we use benchmark programs running on multiple client machines, each instance generating requests by sampling at random from distribution functions that characterize the behavior of a user population (note that this technique has been widely used to evaluate the performance and scalability of distributed file systems). Again, we already have a prototype of such a benchmarking program that was developed as part of the earlier NSF support. We will use data from a number of existing sources of such distributions derived from monitoring user behaviors as they use conventional Web browsers (we assume that Web user behaviors for such characteristics as think times, link following, etc. will be reasonable approximations for our system).
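The sampling approach described above can be sketched for a single simulated user. The class SyntheticUser is hypothetical, and the choice of an exponential distribution for think times and a uniform choice of target object are assumptions made for illustration; the actual distributions would come from the measured Web-user data mentioned in the text.

```java
import java.util.Random;

/** Hypothetical benchmark client: draws think times and request targets at random. */
final class SyntheticUser {
    private final Random rng;
    private final double meanThinkSeconds;

    SyntheticUser(long seed, double meanThinkSeconds) {
        this.rng = new Random(seed);          // seeded so that a benchmark run is repeatable
        this.meanThinkSeconds = meanThinkSeconds;
    }

    /** Inverse-transform sample from an exponential distribution (an assumed model). */
    double nextThinkTime() {
        // nextDouble() is in [0, 1), so the argument of log is in (0, 1] and the result is >= 0.
        return -meanThinkSeconds * Math.log(1.0 - rng.nextDouble());
    }

    /** Picks the index of the next object to request, uniformly over the store (an assumed model). */
    int nextTarget(int objectCount) {
        return rng.nextInt(objectCount);
    }
}
```

Running many such instances on multiple client machines, each with its own seed, would produce an aggregate request load whose statistical properties are controlled by the chosen distributions.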

Experiments: With these tools for generating large object stores and for benchmarks of user requests, we propose to conduct a number of experiments investigating scale and performance questions such as:

How many "typical" users can a single server support?
How does the performance of the system change as a function of the number, sizes, and types of stored objects and of the complexity of the structural and semantic links?
How does performance scale as more server instances are added to support a logical region?
How does the frequency of object movement or reorganization (e.g., the ratio of forwarded links) affect performance?

Usefulness Evaluation

To demonstrate that the extensions we have included in our architecture and implementation are useful, we will carry out several actual-use studies. Evaluation will be done through data obtained from on-line feedback forms built into the system, questionnaires, and interviews with selected participants.

Several groups who are using the Web for extensive, on-going work will use our system. One such group is instructors using the WWW for semester-long, Web-intensive courses within UNC and other universities to be selected. In particular, we anticipate working with four or five instructors who will be teaching pilot sections of a WWW Programming course. The course includes an extensive set of lecture materials, and students present all of their course work during the semester through Web pages. Instructors will be able to use standard lecture material, customize individual lessons, or add new ones of their own. Consequently, this application will test our efforts to provide reliable links and support for copy/paste operations. Both instructors and students will exercise our authorship and hypertextual link provisions.

At least two other groups will test our system. We will develop an alternative version of the Department of Computer Science’s regular Web site. We will do the same thing for the Carolina Health and Environment Community Center site in the UNC School of Public Health. This site serves as a focal point for materials concerned with health and the environment as well as cooperative work tools. For both groups, we will build alternative sites using our system and test them under actual use conditions. We will continue to look for additional groups or sites that have specialized needs or characteristics.