ISIS: Process Groups and Causal Multicast

ISIS [] illustrates how the peer awareness problem can be addressed. The system provides the abstraction of a process group, which processes can join or leave, and allows a process to multicast a message to a whole process group. Managing the membership of process groups and multicasting to them thus becomes the task of the system, which leaves individual processes peer unaware.

Example-Replicated Case: As in the previous cases, we create a central session manager and a replica on the workstation of each user. Instead of explicitly sending a new string value to each of the other user processes, a replica sends the message to a process group that includes the session manager and all replicas. The session manager creates the process group and initializes the string; each replica contacts it for the process group id and the current value of the string.

Example-Centralized Case: The single master process now creates a process group including all I/O agents and multicasts its output to this group.
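The process group abstraction used in the two examples can be sketched as follows. This is a minimal in-memory illustration, not the ISIS interface: the names ProcessGroup, Replica, join, leave, and multicast are hypothetical, and the session manager is omitted for brevity. The point is that a replica names only the group, never its peers.

```python
class ProcessGroup:
    """Hypothetical in-memory stand-in for a system-managed process group."""

    def __init__(self, name):
        self.name = name
        self.members = {}  # member id -> callback invoked on delivery

    def join(self, member_id, on_message):
        self.members[member_id] = on_message

    def leave(self, member_id):
        self.members.pop(member_id, None)

    def multicast(self, sender_id, message):
        # The sender names only the group; the group fans the message out
        # to every current member other than the sender.
        for member_id, deliver in list(self.members.items()):
            if member_id != sender_id:
                deliver(sender_id, message)


class Replica:
    """A user process holding one copy of the shared string."""

    def __init__(self, replica_id, group, initial=""):
        self.replica_id = replica_id
        self.group = group
        self.value = initial
        group.join(replica_id, self.on_message)

    def set_value(self, new_value):
        self.value = new_value
        self.group.multicast(self.replica_id, new_value)

    def on_message(self, sender_id, new_value):
        self.value = new_value


group = ProcessGroup("string-session")
a = Replica("a", group)
b = Replica("b", group)
c = Replica("c", group)

a.set_value("hello world")    # reaches b and c
group.leave("c")              # membership change is handled by the group
b.set_value("goodbye world")  # reaches a only
```

Replica c stops seeing updates once it leaves, and neither a nor b had to be told about the change in membership.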

Consider a tricky issue in message delivery that we have so far ignored. Suppose a process receives two messages, m1 and m2, that are causally related, that is, one of these messages, m2, would not have been sent had the other message, m1, not been sent. In our example, that may happen if m1 initializes the string to an erroneous value (insertAt: 1 str: "helo world") and m2 contains an edit that corrects it (insertAt: 3 str: "l"). To preserve the semantics of the interaction, we would want the IPC mechanism to ensure that the cause (m1) is received at each process before the effect (m2).

Of course, a general-purpose IPC mechanism does not know what the messages are about, so it can never know the true causality relationship among the messages. But it can take a conservative approach to ordering messages by making two assumptions: (1) If a process sends two messages, m1 and m2, in succession, then the first message, m1, is a cause of the second, m2. (2) If a process sends a message, m2, after receiving a message, m1, from another process, then the message received, m1, is a cause of the message sent, m2. Stream-based protocols such as TCP/IP provide support for the first, intra-process, ordering. They are sufficient to support applications such as the centralized example above, where only one process is multicasting to the group. ISIS takes this idea a step further by also supporting the second, inter-process, ordering of messages, thereby supporting applications such as the replicated example above, where more than one process multicasts messages. For this reason, the multicast it supports is termed causal multicast.
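The two assumptions can be enforced at the receiver by stamping each message with a vector clock and holding back any message whose causes have not yet been delivered. The sketch below uses this standard technique; the class and method names are hypothetical, and it is not meant as a description of ISIS internals. Entry k of a stamp counts the messages from process k that the sender had multicast or delivered at the time of sending.

```python
class CausalReceiver:
    """Delivers multicast messages in causal order using vector clocks."""

    def __init__(self, n_processes):
        self.delivered = [0] * n_processes  # messages delivered per sender
        self.pending = []                   # messages waiting for a cause
        self.log = []                       # payloads in delivery order

    def _deliverable(self, sender, stamp):
        # Assumption (1): this must be the next message from its sender.
        if stamp[sender] != self.delivered[sender] + 1:
            return False
        # Assumption (2): everything the sender had delivered before
        # sending must already have been delivered here.
        return all(stamp[k] <= self.delivered[k]
                   for k in range(len(stamp)) if k != sender)

    def receive(self, sender, stamp, payload):
        self.pending.append((sender, stamp, payload))
        progress = True
        while progress:  # one delivery may unblock held-back messages
            progress = False
            for msg in list(self.pending):
                s, ts, body = msg
                if self._deliverable(s, ts):
                    self.pending.remove(msg)
                    self.delivered[s] += 1
                    self.log.append(body)
                    progress = True


# Process 0 multicasts m1; process 1 delivers m1 and then multicasts m2.
m1 = 'insertAt: 1 str: "helo world"'
m2 = 'insertAt: 3 str: "l"'

r = CausalReceiver(3)            # the receiver at process 2
r.receive(1, [1, 1, 0], m2)      # the effect arrives first and is held back
held_back = list(r.log)
r.receive(0, [1, 0, 0], m1)      # the cause arrives; both are now delivered
```

After the second call, r.log holds m1 followed by m2, even though the network presented them in the opposite order.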

Causal multicast ensures that each process in a multicast group receives a cause before an effect. However, it does not ensure that each process in the group receives the set of multicast messages in the same order. In particular, if two messages are multicast concurrently (that is, neither message is a cause of the other), then members of the multicast group may receive them in different orders. Continuing with the replicated example, if two users concurrently change the string to "hello world" and "goodbye world" respectively, then some users would see the first one as the final value and some the second one. Thus, causal multicast suffices as long as the application ensures (by providing an appropriate concurrency control protocol) that concurrent messages never conflict with each other. For applications that cannot provide this guarantee, ISIS also provides a stronger version of causal multicast, called atomic multicast, which ensures that all processes in a process group see multicast messages in the same order. In general, an atomic multicast may not be causal, though in the case of ISIS it is.
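The difference between the two guarantees can be illustrated with the concurrent updates above. The sketch first shows two replicas diverging when each applies the updates in arrival order, and then shows one simple way of obtaining a single order: routing every message through a sequencer that numbers it. The sequencer is an assumption made for illustration, and all names are hypothetical; nothing here is taken from the ISIS interface.

```python
def apply_in_arrival_order(messages):
    """Each message replaces the whole string; the last arrival wins."""
    value = ""
    for payload in messages:
        value = payload
    return value


class Sequencer:
    """Hypothetical central process that assigns one global order."""

    def __init__(self):
        self.next_seq = 0

    def order(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        return seq, payload


class OrderedReplica:
    """Applies messages strictly by sequence number, whatever the arrival order."""

    def __init__(self):
        self.value = ""
        self.expected = 0
        self.held = {}  # sequence number -> payload not yet applied

    def receive(self, seq, payload):
        self.held[seq] = payload
        while self.expected in self.held:
            self.value = self.held.pop(self.expected)
            self.expected += 1


# Causal multicast only: concurrent messages may arrive in either order.
u1 = apply_in_arrival_order(["hello world", "goodbye world"])
u2 = apply_in_arrival_order(["goodbye world", "hello world"])

# With a sequencer, both replicas apply the same order.
sequencer = Sequencer()
s1 = sequencer.order("hello world")
s2 = sequencer.order("goodbye world")
r1, r2 = OrderedReplica(), OrderedReplica()
r1.receive(*s1)
r1.receive(*s2)
r2.receive(*s2)  # arrives early and is held until s1 has been applied
r2.receive(*s1)
```

Here u1 and u2 differ, while r1 and r2 end with the same value because both apply the updates in sequence-number order.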

ISIS was designed mainly to support replication of applications for fault tolerance. However, as we have seen above, it can also support replication for good interactive performance in collaborative applications. For this reason, some collaborative applications such as the MASSIVE [] VR system have used it for multicasting messages. However, this multicast is not ideal for all collaborative applications, for two reasons.

First, causal multicast may be too conservative for messages that commute with each other. Consider a replicated implementation of a GROVE-like structured outline in which the replicas exchange fine-grained updates with each other, that is, describe an edit in terms of the smallest structure that changed rather than the whole buffer. In this situation, changes to different parts of the structure would commute with each other. For instance, the edit section: 1 insertAt: 1 str: "hello world" would commute with section: 2 insertAt: 1 str: "goodbye world". Even if the first change caused the second, there is no harm done in processing it after the second one. Thus there is no advantage in using ISIS's implementation of causal multicast. On the other hand, there is a disadvantage in using it, since it delays a message until all of its predecessors are received, thereby giving poorer response.
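Whether two fine-grained edits commute can be checked by applying them in both orders and comparing the results. The sketch below assumes a hypothetical representation of the outline as a mapping from section number to text, with 1-based insert positions as in the edits above.

```python
def apply_edit(outline, section, pos, text):
    """Return a new outline with text inserted at 1-based pos in a section."""
    body = outline.get(section, "")
    updated = dict(outline)
    updated[section] = body[:pos - 1] + text + body[pos - 1:]
    return updated


def commute(outline, e1, e2):
    """True if applying e1 then e2 gives the same outline as e2 then e1."""
    first = apply_edit(apply_edit(outline, *e1), *e2)
    second = apply_edit(apply_edit(outline, *e2), *e1)
    return first == second


outline = {1: "", 2: ""}

# Edits to different sections: the order of processing does not matter.
different_sections = commute(outline,
                             (1, 1, "hello world"),
                             (2, 1, "goodbye world"))

# Edits to the same position of the same section: the order matters.
same_section = commute(outline,
                       (1, 1, "hello world"),
                       (1, 1, "goodbye world"))
```

For edits such as the first pair, delaying one message until the other arrives buys nothing, which is the sense in which causal ordering is too conservative.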

More importantly, causal multicast is too liberal when concurrent messages conflict with each other, that is, do not commute with each other. Such conflicts would occur in an application such as GROVE that does not provide concurrency control, and even in applications that provide fine-grained concurrency control but exchange large-grained updates. Consider again the replicated implementation of a GROVE-like editor, but this time assume that changes are communicated in terms of the whole buffer. In this case, even if the editor ensures that two users do not concurrently edit the same section, edits that change the buffer length (e.g., insertAt: 1 str: "hello world" and insertAt: 100 str: "goodbye world") conflict with each other. In cases such as these, causal multicast is not sufficient, and what we want is atomic multicast. But implementations of atomic multicast must serialize concurrent messages through a central process, thereby reducing the performance benefits of replication. For this reason, some applications such as GROVE use application-specific optimistic schemes for ensuring consistency. Centralization is less of an issue in systems in which replication is introduced for fault tolerance rather than performance.
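The conflict between the two length-changing edits can be reproduced directly. The sketch assumes that a whole-buffer update carries a 1-based position relative to the start of the buffer, and it uses a 120-character placeholder buffer; both are assumptions made for illustration.

```python
def insert_at(buffer, pos, text):
    """Insert text at a 1-based position measured from the start of the buffer."""
    return buffer[:pos - 1] + text + buffer[pos - 1:]


buffer = "-" * 120                # placeholder contents
e1 = (1, "hello world")           # one user's edit, near the start
e2 = (100, "goodbye world")       # a concurrent edit, further down

# Two replicas receive the same concurrent edits in different orders.
replica_a = insert_at(insert_at(buffer, *e1), *e2)
replica_b = insert_at(insert_at(buffer, *e2), *e1)

# Where the second user's text ended up in each replica (0-based index).
offset_a = replica_a.index("goodbye world")
offset_b = replica_b.index("goodbye world")
```

The first edit shifts everything after it by eleven characters, so the two replicas place "goodbye world" at different offsets and end up with different buffers, even though the users were editing different regions.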

Prasun Dewan
Sun Mar 16 14:09:55 EST 1997