Argus: Distributed Transactions



Argus [], also developed at MIT, is an example of a system that addresses these problems.

Like conventional database systems, it supports transactions. Instead of manipulating relational data in some global database, however, these transactions manipulate user-defined data dispersed across multiple, autonomous "databases" called guardians. A guardian is like a process, module, or monitor in that it is associated with its own address space and exports an interface that can be invoked remotely from another address space. It extends the notion of a module in several ways.

It can declare certain variables as stable - changes made to these variables are saved on stable storage.

Moreover, its types are atomic - that is, concurrent accesses to objects of these types are automatically synchronized. Thus, programmers do not have to worry about synchronization. (This is also the case with monitors; how, then, is a guardian different?)

Argus also allows a process to group together a series of operations on one or more distributed guardians in a transaction. It allows different transactions to execute concurrently and automatically locks atomic objects to ensure that the transactions are serializable, that is, that their execution is equivalent to some serial schedule. A transaction either commits successfully or aborts. If it commits, all changes it made to stable variables are written to stable storage. If it aborts, all changed stable variables are restored to the versions saved in stable storage.
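The commit and abort behavior described above can be illustrated with a minimal Python sketch. This is not Argus code: the names Guardian, Transaction, state, and stable_storage are hypothetical, stable storage is simulated with an in-memory copy, and the automatic locking of atomic objects is left out.

```python
import copy


class Guardian:
    """Toy stand-in for a guardian: a working copy plus simulated stable storage."""

    def __init__(self, **stable_vars):
        self.stable_storage = copy.deepcopy(stable_vars)  # simulated stable storage
        self.state = copy.deepcopy(stable_vars)           # working copy of stable variables

    def commit(self):
        # Write the changed stable variables to (simulated) stable storage.
        self.stable_storage = copy.deepcopy(self.state)

    def abort(self):
        # Restore the stable variables to the saved version.
        self.state = copy.deepcopy(self.stable_storage)


class Transaction:
    """Groups operations on several guardians; an exception means abort."""

    def __init__(self, *guardians):
        self.guardians = guardians

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        for guardian in self.guardians:
            if exc_type is None:
                guardian.commit()
            else:
                guardian.abort()
        return False  # let the caller see why the transaction aborted


# Usage: a transaction spanning two guardians.
a = Guardian(balance=100)
b = Guardian(balance=0)
with Transaction(a, b):
    a.state["balance"] -= 30
    b.state["balance"] += 30
```

If the body of the with statement raises an exception, both guardians are rolled back to the values last written to the simulated stable storage.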

A transaction might be decomposed into one or more subtransactions. Like transactions, subtransactions are groups of operations that can execute concurrently and are serialized. However, changes made by them are not written to stable storage unless the top-level transaction commits. Moreover, aborting a subtransaction does not abort the enclosing transaction, which can try an alternative subtransaction.
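A small Python sketch of these nesting rules follows. It is an illustration under assumptions, not the Argus mechanism: AtomicObject and Action are hypothetical names, each nesting level simply works on its own copy of the value, and concurrency and locking are ignored.

```python
import copy


class AtomicObject:
    """Keeps one version per active nesting level, plus simulated stable storage."""

    def __init__(self, value):
        self.stable = copy.deepcopy(value)       # simulated stable storage
        self.versions = [copy.deepcopy(value)]   # versions[0] is the committed base

    @property
    def value(self):
        return self.versions[-1]

    def begin(self):
        # A new (sub)transaction starts from its parent's current version.
        self.versions.append(copy.deepcopy(self.versions[-1]))

    def commit(self):
        top = self.versions.pop()
        self.versions[-1] = top                  # changes become visible to the parent
        if len(self.versions) == 1:              # top-level commit: write to stable storage
            self.stable = copy.deepcopy(top)

    def abort(self):
        self.versions.pop()                      # discard this level's changes only


class Action:
    """A transaction or subtransaction over some atomic objects."""

    def __init__(self, *objects):
        self.objects = objects

    def __enter__(self):
        for obj in self.objects:
            obj.begin()
        return self

    def __exit__(self, exc_type, exc, tb):
        for obj in self.objects:
            if exc_type is None:
                obj.commit()
            else:
                obj.abort()
        return False


def run_example():
    doc = AtomicObject([])
    with Action(doc):                            # top-level transaction
        doc.value.append("a")
        try:
            with Action(doc):                    # first subtransaction
                doc.value.append("b")
                raise RuntimeError("simulated failure")
        except RuntimeError:
            with Action(doc):                    # alternative subtransaction
                doc.value.append("c")
        stable_before_commit = copy.deepcopy(doc.stable)
    return stable_before_commit, doc.stable
```

In run_example, the aborted subtransaction's change is discarded, the alternative's change survives, and nothing reaches the simulated stable storage until the top-level transaction commits.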

Finally, a guardian is associated with a background process that executes when no other operation is being executed in it.

Example - Centralized case: Like the standard IPC case except that IPC is encapsulated in transactions. Thus, an I/O manager sends input to the master in a transaction, and the master broadcasts the output in the same transaction. As a result, changes to the master are synchronized and a failure to send output to a single I/O module causes the whole transaction to fail.

Example - Replicated case: Like the standard IPC case, except that a replica creates a transaction for processing a series of changes from the user and broadcasting them to other replicas. As a result, if the transaction aborts for any reason, all the replicas are restored to their original state.

Argus was used to implement the CES editor discussed earlier, and the designers of CES, while appreciating its automatic support for nested transactions, noticed an important shortcoming for creating interactive programs: When a transaction is aborted, any I/O performed by it is not undone. To illustrate why this is a problem, consider the replicated implementation above, which creates a transaction to process a series of changes from its user and distribute them to other replicas. To give the user feedback, the transaction updates the local screen in response to each user command. Now if the transaction aborts (because it could not distribute the change to some failed site, for instance) all replicas are restored to their previous state, thereby ignoring all effects of the user input. But the local display is not restored.

CES used an interesting trick to address this problem. Here is my impression regarding how it works: It encapsulates each user display in a separate guardian, which provides operations to update the display buffer. The guardian also defines a stable lock, which is acquired at the start of each operation and released at the end of it, and two non-stable variables, action-count and commit-count, which are incremented at the start and end of each operation, respectively. If commit-count is less than action-count and the lock is not busy, then some transaction that updated the display has aborted. In that case, the display can refresh the screen from the guardian that keeps its state.
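The counter test can be sketched in Python as follows. This is only one reading of the scheme, with several assumptions: DisplayGuardian and its methods are hypothetical names, a threading.Lock stands in for the stable lock, an exception stands in for an abort, and the restoration of the buffer (which the transaction system would perform) is done by hand.

```python
import threading


class DisplayGuardian:
    """Toy display guardian with the two counters described above."""

    def __init__(self, text=""):
        self.buffer = text            # display state kept by the guardian
        self.screen = text            # what the user sees; not undone on abort
        self.lock = threading.Lock()  # stands in for the stable lock
        self.action_count = 0
        self.commit_count = 0

    def update(self, text, fail=False):
        with self.lock:               # acquired at the start, released at the end
            self.action_count += 1
            saved = self.buffer
            self.buffer = text
            self.screen = text        # immediate feedback to the user
            if fail:
                self.buffer = saved   # assumed: the transaction system restores state
                raise RuntimeError("simulated abort")
            self.commit_count += 1    # reached only if the operation completes

    def abort_detected(self):
        # Fewer completions than starts, and nobody is in the middle of an operation.
        return self.commit_count < self.action_count and not self.lock.locked()

    def refresh_if_needed(self):
        if self.abort_detected():
            self.screen = self.buffer             # redraw from the guardian's state
            self.commit_count = self.action_count # assumed reset after a refresh
            return True
        return False
```

After a failed update, the screen still shows the aborted text while the buffer holds the last good state; abort_detected then returns True and refresh_if_needed brings the screen back in line with the buffer.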

But how does the display discover this condition? Its background process can poll - but CES provides a more efficient solution. The display guardian also defines a trigger-queue, into which the background process enqueues itself. When a transaction executes an operation in the display, it dequeues the background process after acquiring the lock. The background process then blocks itself waiting for the lock. When it acquires the lock, it checks for the abort condition, refreshing the screen in case it holds, and releases the lock. It then goes back and enqueues itself.
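The wake-up protocol can be approximated in Python with a thread and an event. This is a sketch under assumptions rather than the CES code: TriggeredDisplay is a hypothetical name, a threading.Event stands in for the trigger-queue, an exception stands in for an abort, and the checked event exists only so that a caller can wait for the background check to finish.

```python
import threading


class TriggeredDisplay:
    """Display whose background thread sleeps until an operation wakes it."""

    def __init__(self, text=""):
        self.buffer = text
        self.screen = text
        self.lock = threading.Lock()
        self.action_count = 0
        self.commit_count = 0
        self.trigger = threading.Event()  # stands in for the trigger-queue
        self.checked = threading.Event()  # demo hook: set after each background check
        threading.Thread(target=self._background, daemon=True).start()

    def _background(self):
        while True:
            self.trigger.wait()           # "enqueued": sleep until dequeued
            self.trigger.clear()
            with self.lock:               # blocks until the operation releases the lock
                if self.commit_count < self.action_count:
                    self.screen = self.buffer             # abort detected: refresh
                    self.commit_count = self.action_count
            self.checked.set()            # then loop back and "enqueue" again

    def update(self, text, fail=False):
        with self.lock:
            self.trigger.set()            # dequeue the background process
            self.action_count += 1
            self.screen = text            # immediate feedback to the user
            if fail:
                raise RuntimeError("simulated abort")
            self.buffer = text
            self.commit_count += 1
```

Because the background thread is woken only when an operation runs, it never polls; it checks the counters once per operation, after the operation has released the lock.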

There is a timing problem here: Between the time the background process was dequeued and it tested the lock, another process might enter the display and manipulate an invalid display state. Therefore, every operation checks for the abort condition before manipulating the display.

A more elegant solution would have been possible if Argus called programmer-defined commit and abort handlers for an object when changes to the object are committed or aborted, respectively.
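What such handlers might look like is sketched below in Python. The sketch is purely hypothetical: Argus did not provide this facility, and the names HandledDisplay, HandlerTransaction, on_commit, and on_abort are invented for illustration.

```python
class HandledDisplay:
    """Hypothetical display object with programmer-defined handlers."""

    def __init__(self, text=""):
        self.buffer = text   # transactional display state
        self.screen = text   # what the user sees
        self._saved = text   # last committed version

    def update(self, text):
        self.buffer = text
        self.screen = text   # immediate feedback to the user

    def on_commit(self):
        self._saved = self.buffer

    def on_abort(self):
        self.buffer = self._saved
        self.screen = self.buffer   # undo the I/O that an abort otherwise leaves behind


class HandlerTransaction:
    """Hypothetical transaction that calls the handlers of the objects it touched."""

    def __init__(self, *objects):
        self.objects = objects

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        for obj in self.objects:
            if exc_type is None:
                obj.on_commit()
            else:
                obj.on_abort()
        return False
```

With handlers of this kind, the display would need neither the counters nor the trigger-queue: the abort handler redraws the screen at the moment the transaction aborts.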


Prasun Dewan
Sun Mar 16 14:09:55 EST 1997