File System: Coarse-grained Data

Given a file system shared by multiple processes, here is a general scheme for developing a collaborative application. For each user interacting with the application, we create one or more processes that interact with that user, and linking among the users is implemented via one or more files shared among these processes. A file may store either a binary or a textual representation of the shared data. It may be created by one of the user processes or by a special process executed before any user accesses it.

Example: A special initializer program is executed with the session name as an argument:

session -name ourHelloWorld

The program creates a greeting file called helloWorld and initializes its contents and access list.
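The initializer step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name create_session, the ".greeting" file suffix, the default greeting text, and the use of Unix permission bits as a stand-in for the access list are all hypothetical and are not taken from the actual session program.

```python
import os
import stat


def create_session(directory: str, session_name: str,
                   greeting: str = "hello world") -> str:
    """Create the shared greeting file for a session and return its path.

    Hypothetical sketch: the file naming and the permission scheme are
    assumptions made for illustration only.
    """
    path = os.path.join(directory, session_name + ".greeting")
    # O_EXCL makes creation fail with FileExistsError if the session
    # file already exists, so two initializers cannot both succeed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o660)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(greeting)
    # Stand-in for the access list: owner and group may read and write,
    # everyone else gets no access.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)
    return path
```

A call such as create_session("/shared/sessions", "ourHelloWorld") would then correspond to the command line above, with the directory being whatever location the collaborators share.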

To join the session, another user can execute the helloWorld program:

helloWorld -join ourHelloWorld

This program is an editor that allows its user to view and change the greeting stored in the file.

Several variations of the editor are possible. The simplest approach is for it to provide users with explicit commands to load the greeting from the file and store it back. Instead of requiring the user to explicitly load a new greeting, it can periodically poll the file. Instead of requiring the user to explicitly store a new greeting, it can automatically store the greeting on every character typed. Moreover, instead of writing directly to the shared file, it can create a separate version on each store and always read the latest version of the file. It can check out a version in locked mode to prevent conflicts. Otherwise, it can use merge facilities to combine two versions that were concurrently modified.
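The explicit load/store and polling variations can be sketched as follows. This is a rough Python illustration, not the helloWorld program itself: the function names load, store, and poll are hypothetical, change detection by comparing the file's modification time and size is one possible choice, and writing through a temporary file followed by a rename is a technique added here so that a polling reader does not observe a half-written greeting.

```python
import os
import tempfile


def load(path: str) -> str:
    """Explicit 'load' command: read the current greeting."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def store(path: str, greeting: str) -> None:
    """Explicit 'store' command: replace the greeting in one step.

    The new text goes to a temporary file in the same directory and is
    then renamed over the shared file. Note that this sketch does not
    carry over the permissions of the file it replaces.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(greeting)
        os.replace(tmp, path)
    except BaseException:
        # The rename did not happen, so the temporary file is still there.
        os.unlink(tmp)
        raise


def poll(path: str, last_signature):
    """One polling step.

    Returns (greeting, signature) if the file looks different from the
    last time it was seen, otherwise (None, last_signature).
    """
    st = os.stat(path)
    signature = (st.st_mtime_ns, st.st_size)
    if signature == last_signature:
        return None, last_signature
    return load(path), signature
```

An editor using this sketch would call poll from a timer, passing back the signature it received from the previous call, and would call store either on an explicit command or after every keystroke.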

There are several advantages of using a file system for implementing collaborative applications. A file system automates the implementation of persistence, access control, and concurrency control. Moreover, in artifact-based collaborative applications supporting implicit sessions, the naming scheme provided by a file system can be used as a basis for naming sessions, thereby automating details of hierarchical names, symbolic links, and other rich concepts provided by modern file systems. Furthermore, file-based version control systems automate the creation of multiple versions of shared data and the diffing and merging of these versions. Finally, file systems also come with programs that provide efficient searches of textual data.

However, in comparison to some of the other infrastructures discussed below, a file system has five main disadvantages. First, a process must poll the file to determine if it has been changed by some other process. Since polling at a high frequency (e.g., every 3 seconds) can severely degrade system performance, this approach is not suitable for real-time collaboration. This is a problem in all systems that facilitate sharing through passive repositories of data. Later we will look at several examples of active repositories, that is, repositories that allow user-defined triggers to be associated with updates to the data.

Second, a file resides primarily on disk, and thus communication between processes can involve potentially costly disk accesses. Caching reduces this problem but does not eliminate it, which makes file-based sharing even less suitable for real-time sharing of rapidly changing state such as scrollbar and pointer positions.

Third, a file system automates only a small subset of the functions of a collaborative application. For instance, while it enables (non-real-time) coupling, the programmer is responsible for implementing the details of depositing data in the file and fetching data from it. Similarly, it does not provide automatic support for undo. Later we will look at systems that make application programmers totally unaware of these functions.

Fourth, a file system provides coarse-grained units of data, and functions such as concurrency and access control that operate on these units cannot be used directly by all applications. For instance, applications that need different records of a bibliography to be associated with separate locks must implement their own concurrency control.
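To make the bibliography example concrete, here is a minimal Python sketch of the kind of record-level concurrency control an application would have to build for itself. It is only one possible approach, chosen for illustration: each record gets its own lock file, and the names record_lock, lock_dir, and record_id are hypothetical. Whether exclusive file creation is truly atomic depends on the shared file system in use.

```python
import os
from contextlib import contextmanager


@contextmanager
def record_lock(lock_dir: str, record_id: str):
    """Hold an exclusive lock on one record for the duration of a block.

    The lock is represented by a file named after the record. Raises
    FileExistsError if another process already holds the lock.
    """
    lock_path = os.path.join(lock_dir, record_id + ".lock")
    # O_CREAT | O_EXCL fails if the lock file already exists, so only
    # one process can create it.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the holder's process id to help with manual cleanup.
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

Two users can then edit different bibliography records at the same time, each inside its own "with record_lock(...)" block, even though both records live in the same file. A real application would also need to handle locks left behind by crashed processes, which this sketch does not attempt.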

Finally, this approach to developing collaborative applications does not work if the collaborators do not share a file system. Thus, it is not suitable for wide-area collaboration.

Prasun Dewan
Sun Mar 16 14:09:55 EST 1997