GraphDomain:

Design Notes


Goals

Basic goals for this project include the following:

Architecture

An approximation of our current view of the GraphDomain architecture is shown below:

GraphDomain Architecture

Components

The overall architecture is defined in terms of seven major components.

Interaction

First level component interactions include the following:

Specific Issues

Following are brief discussions of specific design issues. Readers should have a detailed knowledge of the DGS architecture, since most are concerned with possible changes being considered for GraphDomain or with problems in DGS.

OIDs

OIDs will be needed. They will play a very important role. They must be unique within a domain and they must support fast access. For administrative reasons, they should also support underlying partitioning so that resources may be balanced and maintained. To avoid frequent contact with a GraphDomain, GraphApplets must be able to create their own OIDs, from a seed supplied by the Domain.

OIDs will be 64 bits long. They will be treated as 16 hex characters, divided into four blocks of varying length.

The lowest order group ( 16 bits) will designate a particular graph object, such as the nodes within a graph and content objects. Normally, a graph will receive a seed which will serve as its own OID. It will then generate subsequent OIDs for its nodes from that seed. One implementation for this is that the graph OID has all of its low order bits set to zero and the OIDs derived from it are simply integers set to last_value+1.

The second group of values (20 bits) will designate a clustering object: the graph itself or a set of file content objects that are related, such as the content files for the nodes in a particular graph. This clustering concept will be invisible to the user and, hence, not part of the data model.

The third block (16 bits) will designate a partition of files, such as a set associated with a particular user or group.

The highest block (12 bits) will designate a high order resource at the IP level. It could represent different IP synonyms for the same DSN value, different machines, etc.

OIDs will be stored internally as 64bit integers, but they will be presented to the user as 16 hex characters divided into four block. Blocks will be colon delimited. Thus, [012:3456:789AB:CDEF] will be interpreted as:

Thus, a given GraphDomain may include up to 4k machines, each of which supports up to 64k partitions or users, each of which may have 2m graphs ro groups of files, each of which can include 64k nodes or individual files.

Graph Access

Graph objects will be stored within GraphDomains. The primary access to them will be by OID. Thus, GraphDomain will require a fast index and retrieval scheme keyed by OID.

In memory tables and/or hashing are anticipated for the two high order blocks. The file system will access the third block. The mapping from object to location can thus be determined by masking low-order bits, (at worst)accessing two in-memory tables, and fetching a single file.

Move Node

The capability to both generate new OIDs in the client and access graphs quickly (e.g., without maintaining global OID tables) does not come without costs. One of those costs is incurred when a node is moved from one graph to another. We would like very much to maintain the integrity of hyperlinks (both WWW and GD hyperlinks) when a node is moved from one graph to another. However, if we are to retain the capability to identify the graph by masking low order bits for node OIDs, a node cannot be truly moved from one graph to another; rather, it must be deleted in one and a new node created in the other. The problem with this is that hyperlinks, consisting of URLs (DID+OID), would not be valid for the "new" node.

A workable implementation to solve this problem appears to be to not delete the node from the original graph but, rather, to mark it as invalid and store a pointer (URL) to the newly created node in a different graph. Additional function can be provided by also storing a pointer (URL) in the new node back to the original node and, hence, its original context. If the node is moved again, the linking would be extended, forming a doubly-linked list.

There are two costs for this strategy. First, the links must be maintained. Done is a simple way, this could be handled straight-forwardly on the client side when the node is "moved." If shortcuts are introduced, the implementation becomes more complicated. Second, when a hyperlink is followed, a process would have to verify that the node to which the hyperlink was connected is still valid; if not, it would intervene and fetch the pointed-to graph, etc., until a graph with a valid pointed to link is found.

GraphContent Objects

Several problems are solved if a new first class object is defined: GraphContent. They will be managed by GraphDomains and will be assigned OIDs, like other first class Graph objects.

All public access to GraphContent objects will be through a Node for which a particular GraphContent object is designated as that Node's contents. Designation of the Node's contents will be through the GraphContent object's OID.

All copies of Nodes will be copy by value, including the Node's content reference. This results in a copy by reference for the content object. Thus, multiple Nodes can have the same contents while allowing each node to be a unique object.

As part of its responsibility to maintain Graph objects, the GraphDomain will most likely have to maintain a count of GraphObject references. When a Node is deleted, the GraphDomain must be notified. It would probably delete the Content object when the last Node is deleted that refers to it, but other semantics are possible.

URLs

URLs that refer to Graph objects will include the following parts: Thus, they will conform to standard HTTP. IP addresses will be maintained as DNS strings. Thus, access will require a call to a DNS server to obtain the numeric IP address. However, this indirection allows for a site to be moved from one IP address to another and it resolves the nx1 problem in which multiple IP addresses may map to a single DNS name.

HyperLinking

Hyperlinks (and HyperGraphs) are the least understood components of the architecture. This discussion represents an attempt to record our current understanding of the issues and our very tentative design "decisions" in this area. Two types of hyperlinks will be supported. First, conventional anchors within HTML documents that include URLs pointing to GraphDomain object. The second type will be hyperlinks external to HTML content, similar to DGS's notion of hyperlink. These hyperlinks will be included within HyperGraphs. Both types of hyperlinks will be guaranteed under moves of the endpoint node from one graph to another and for insertion and deletion of graphs/directories in the node/graph-content hierarchy.

HyperLinking is simplified by making all Nodes unique and, thereby, eliminating the need to determine the particular context in which a Node appears (since it can, by definition, appear in only one context). Further, since all objects in a Domain are referenced by OID, embedded hyperlinks -- consisting of an IP address for a particular GraphDomain and an OID -- are guaranteed so long as the node on the end exists within the Domain. Thus, hyperlinks will "follow" Nodes under moves to whatever contexts (Graphs) in which they appear.

This will also be true with respect to a Region, since we take the position that Regions support the same semantics as Domains. But it may require the forwarding of requests from one Domain to another and, possibly, for OIDs to be unique within Regions, not just Domains.

Several options are possible for deletions. Domains could simply report an error; they could return a message stating when the Node was deleted; or they might retain the Node -- disembodied from an Graph context -- for some period of time to provide recovery and meaningful response to user (when deleted, by whom, option to recover, etc.).

An open set of issues exists with respect to notification. What users/processes should be notified of what changes that occur in a GraphRegion?

Operations affected by HyperLinking Requirements

Many of the implications of hyperlinking are uncovered by considering the semantics of several common operations. We discuss several such relevant operations here. HyperGraph. A HyperGraph consists of a set of HyperLinks and HyperNodes. HyperNodes are similar to StructuralNodes except that their content objects are restricted to only StructuralNodes (i.e., their URLs) or NULL. HyperNodes will often be created by coping a Structural Node from a StructuralGraph to a HyperGraph. HyperLinks join HyperNodes within HyperGraphs. One interpretation of a HyperGraph is a "view" of some portion of the StructuralGraph space.

Thus, HyperGraphs will be assigned OIDs similar to StructuralGraph OIDs. HyperNodes will have their own unique OIDs, also. Their content objects, however, will be OIDs for StructuralNodes. Hyperlinks will be pairs HyperNodes.

MoveNode . Moving a node from one StructuralGraph to another results in a new node object in the second context, with its own OID. The content object of the node is copied. The node in the first graph is not truly deleted, but marked as invalid. A forward pointer to the new node will be stored in the first node, and a back pointer will be stored in the new node to the first node.

DeleteNode. as in MoveNode, the node is not truly deleted, but marked invalid. Thus, an as-yet-to-be-defined restore operation could be supported.

CopyNode. Copying a node from one graph to another results in a new object in the second graph, with its own OID. The content object URL for the first node is copied to the second.

HyperGraph Access

A key requirement for HyperGraphs is access. Once one obtains a URL to a HyperGraph, access is conventional, but obtaining that URL is problematic.

The key issue is the "multiline subway station" problem. Think of StructuralGraphs as providing one "line" of access, in which a user may move from one context to another by opening node contents or by ascending to a parent node for a graph. Think of HyperGraphs as providing a second "line" of access, in which a user may follow links form one HyperNode to another (and, subsequently, opening a given HyperNode's corresponding StructuralNode/Graph). If users are traversing the first line, enter a given StructuralGraph, how can they know what HyperGraphs are present for a given StructuralNode in that context?

The solution taken in DGS was to store within a given StructuralNode all of the HyperLinks that included it as an endpoint; however, that approach will not scale. What is needed is a scaleable design that will provide fast access for most needs and slower access for infrequent requirements.

Consider a typing of HyperGraphs. The author's HyperGraph should receive highest priority. Often, these Hyperlinks will exist as embedded anchors/URLs in HTML content objects, but not always. For Graph content objects, external HyperGraphs will be needed. The specific user's own HyperGraphs should receive second priority. This will require a Personal Identifier (PID) analogous to an OID. Third priority is group HyperGraphs in which the specific user is a member. A fourth and (much) lower priority is the HyperGraphs of other users for which the specific user has read access.

One way to address these needs is to introduce a search capability into the GraphDomain in which a mapping between StructuralGraphs (or StructuralNodes) and HyperGraphs is maintained. This is the problem that was avoided in the hierarchical structure of OID bits, but we don't currently have an alternative approach.


email: jbs@cs.unc.edu
url: http://www.cs.unc.edu/~jbs/