Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System08 Aug 2016
Bayou  is a replicated storage system providing weakly consistent guarantees designed for mobile computing environments. The system and paper were published in 1995. At that time PDA's (Personal Digital Assistant) were common and played a big influence in Bayou's design1.
Bayou only requires occasional, pair-wise communication between computers. This is because the collaborating computers were not all connected simultaneously, disconnections happened with certain frequency and connection time was expensive. Even though nowadays the price of connection for mobile devices has dropped, allowing "full connectivity", the characteristics mentioned by the authors can't simply be left out when developing systems today.
Based on this notion that devices will be offline for a good portion of time, Bayou doesn't have a "disconnected" mode. Instead, it considers that several "connectedness" levels exist. One well known problem in distributed systems are network partitions. Systems with stronger consistency guarantees might (or probably must) prohibit one side of the partition from making progress. That is not the case with Bayou, as the authors state:
Supporting disconnected workgroups is a central goal of the Bayou system.
The normal mode of operation in Bayou is pair-wise communication so when devices are partitioned there is no difference.
Since weak connectivity is taken into account, the system provides weakly consistent, replicated data. Clients can read and write to any replica without explicit coordination from other replicas. The changes are propagated through the replicas via pair-wise interactions meaning that eventually all servers will store the same data.
Applications must be aware that they might read inconsistent data and Bayou provides support for application-specific conflict detection and resolution. Clients can even read conflicting data while the conflict is not yet resolved, either because human intervention is required or because conflicting operations are still being spread through the system.
The authors cited several examples of applications that could be built using Bayou: shared calendars, mail and bibliographic databases, and others.
Bayou's Basic System Model
In the Bayou system, the data is replicated in full at a number of servers. The
applications acting as clients interact with the Bayou system via an API. The
API provide two operations:
Read allows queries over a
data collection and
Write can insert, update, or delete a number of data items
in a collection.
Only one server is required for clients to perform some work. Clients can read
and write data in the server and don't need to wait the
Write to propagate.
Clients don't need to interact with a single server, and Bayou even provides
session guarantees to reduce client-observed inconsistencies when interacting
with multiple servers2.
Write also carries information that lets the receiving server to detect if
there is a conflict and how to resolve it.
The storage system can be thought as an ordered log of the
and the data resulting from the execution of these operations. Each server
performs the operations locally, the conflict detection and resolution are also
performed as they are encountered during execution of the
Write operations. As
mentioned before, the data after all known
Write ops are immediately available to any client.
When the pair-wise connections are made, the servers exchange information about
Write ops that each one knows about. The information includes which operations
they are, and the order they should be executed. This contact between servers is
called anti-entropy session.
The following figure illustrates the Bayou System Model:
Server ┌────────────────┐ ┌───────────────────┐ │ ┌───────────┐ │ │ │ │ │ Storage │ │ │ Application │ │ │ System │ │ │ ┌──────────┐ │ │ └───────────┘ │ ├───┤Bayou API ├────┤ Read │ │ ┌────────────────┐ │ └──────────┘ │──────or─────▶│ Server State │ │ ┌───────────┐ │ │ ┌───────────┐ │ Write │ ┌─────────────┐│ │ │ Storage │ │ │ │Client Stub│ │ │ │█████████████││ │ │ System │ │ │ └───────────┘ │ │ │█████████████││ │ └───────────┘ │ └───────────────────┘ │ └─────────────┘│ │ │ Client └────────────────┘ ┌──────────────▶│ Server State │ ▲ │ │ ┌─────────────┐│ │ ▼ │ │█████████████││ │ ┌────────────────┐ │ │█████████████││ │ │ ┌───────────┐ │ │ └─────────────┘│ │ │ │ Storage │ │ └────────────────┘ │ │ │ System │ │ ▲ Anti-entropy │ └───────────┘ │ │ │ │ │ Servers │ └───────────▶│ Server State │ │ │ ┌─────────────┐│ ▼ │ │█████████████││ ┌────────────────┐ │ │█████████████││ │ ┌───────────┐ │ │ └─────────────┘│ │ │ Storage │ │ └────────────────┘ │ │ System │ │ ▲ │ └───────────┘ │ │ │ │ └──────────────▶│ Server State │ │ ┌─────────────┐│ │ │█████████████││ ┌───────────────────┐ │ │█████████████││ │ │ │ └─────────────┘│ │ Application │ └────────────────┘ │ ┌──────────┐ │ ▲ ├───┤Bayou API ├────┤ Read │ │ └──────────┘ │◀────────or─────┘ │ ┌───────────┐ │ Write │ │Client Stub│ │ │ └───────────┘ │ └───────────────────┘ Client
Conflict Detection and Resolution
One of the goals of the system is to not restrict how conflicts are detected and resolved in a single way for all applications. What works for one application might not work for a different one. The intention is to support arbitrary applications, with different forms of conflict detection and resolution.
In Bayou, the application specifies its notion of conflicts and also the policy for their resolution, and the system implements the mechanisms for reliably detecting conflicts, and resolving them as the application specified. The mechanisms included in Bayou are: dependency checks and merge procedures.
Application-specific conflict detection is done via the usage of dependency
Write operation includes a dependency check that consists of an
application-supplied query and its expected result. A conflict is detected if
when running the query the expected result is not returned at the server that
the query was executed. If a conflict is detected the
Write operation is not
executed and the Bayou system will then invoke the mechanism for resolving the
You probably have heard about the bank transaction example where a transaction
from A to B is being attempted. Let's say the amount to be transferred is $100.
Now let's say that the before the application issues the
Write operation, it
has read that account A has $150. In a system with a more traditional optimistic
concurrency control, it would check if account A still had $150 before applying
Write. If between issuing and applying the
Write, a different process
removed $50 from account A the $100
Write would be aborted. Note that account
A would still have sufficient funds to complete the operation successfully,
according to the application semantics. In Bayou because the application is
responsible for defining if there is a conflict or not, the application could
simply provide a dependency check that would require that account A had $100
before applying the write, even though that prior to issuing the
application has read that A had $150.
When a conflict is detected, Bayou executes another procedure provided by the
application to resolve the conflict in question. This procedure is called a
merge procedure, and it is given by the application along with the
operation and the dependency check. These merge procedures are written in a
high-level, interpreted language. They can have embedded data, and can perform
Read ops on the current replica's state. The merge procedure is expected to
resolve any conflicts detected by its dependency check and provide a revised
update to apply. The whole process of detecting a conflict, resolving it is done
atomically at each server as part of executing the
If a conflict cannot be resolved by the merge procedure, it is still expected to run to completion, and log that the detected conflict couldn't be resolved. This allows that a person could later resolve the conflict.
Bayou doesn't forbid other clients from issuing operations when a conflict is detected. The replicas can always remain accessible. A potential problem is that new operations might depend on data that is in conflict and this could lead to a cascade of conflicts.
As mentioned before, Bayou guarantees that all servers eventually receive all
Write ops via the pair-wise anti-entropy process and that two servers holding
the same set of
Write ops will have the same data contents. To achieve
eventual consistency Bayou has two features:
Write ops are performed in the
same, well-defined order at all servers, and the conflict detection and merge
procedures are deterministic so that all servers resolve the conflicts in the
Write operations can have two types: tentative and committed. When Bayou
Write from a client it characterizes the operation as tentative.
These operations are ordered according to the timestamps of the server that
first accepted them. Eventually, each
Write will become committed, and they
are ordered according to the time at which they commit and before any tentative
Servers are not required to maintain synchronized clocks, although it is desired
to keep them reasonably close so that the induced order is more or less the same
order the user sees when submitting operations. The timestamps at each
server must be monotonically increasing so that the pair
<timestamp, Id of server that assigned it>
produces a total order on the operations. To timestamp the
the Bayou system maintains a logical clock which it is generally synchronized
with its wall clock, but to preserve the causal ordering of
Write ops, the
server may need to advance its logical clock when receiving new
during the anti-entropy process.
One question you might be asking now is if each server applies the
operations as soon as they receive them, how the operations will then be
eventually applied in the same order at each server?
The answer is that the servers are able to undo the effects of a previously
Write and re-execute it in a different order, and a single
server is responsible for defining the total order for execution.
Write Stability and Commitment
Write ops may be executed multiple times, Bayou characterizes a
Write as stable when it has been executed at that server for the last time.
This happens when the set of operations before the
Write in the log is fixed.
Bayou uses a commit procedure to mark the operations as stable. That is, a
Write becomes stable when it is explicitly committed. As mentioned before,
Write ops are placed ahead of any tentative
Write operations in
each server's log.
The commit system used in Bayou is called primary commit. This is, a single
server is responsible for committing the operations. The information about which
Write operations are committed is then spread during anti-entropy. Each data
collection can have a different designated primary server.
Considering the context from 1995, presented in the beginning of the paper,
where most of the time the devices were disconnected from each other, you can
imagine that a commit protocol that requires the majority of servers to be
connected at the same time to decide on committed operations would be
unreasonable. The protocol used by Bayou allows maximizing the rate of committed
Write ops when the primary is chosen to be the main point originating updates.
Although Bayou uses a protocol that has a primary server in charge of committing
operations, its design doesn't impose restrictions when the primary is
unavailable. The other servers still accept updates, allowing the system to
continue functioning. The Bayou API provides means for inquiring about the
stability of operations. Clients can ask if a given
Write operation (using the
operation unique identifier) is stable at the server. Note that the answer may
vary depending on which server was contacted. The API also makes possible for
clients to work only with stable data.
Another thing to keep in mind is that the order in which the operations are committed depend on the order the servers make contact with the primary. So, if a server stays disconnected for a long time with the primary it could have operations committed after operations (from different servers) with later timestamps. Operations for a single server will commit in timestamp order since they are ordered in the server that accepted them and exchanged in that order during anti-entropy.
Storage System Implementation Issues
The following figure illustrates how the storage system in Bayou looks like:
┌────────────────────┐ │████████████████████│O ┌──│████████████████████│ │ └────────────────────┘ Timestamp Vectors │ Write Log ┌────────────────────┐ └───────────▶┌───────┐◀─┐ Tuple Store (checkpoint) │████████████████████│C ├───────┤ │ ┌─────────────────────────────────┐ │████████████████████│────────┐ ├───────┤ │ │ ┌───────┐ │ └────────────────────┘ │ ├───────┤Committed │ │Table 1│ ┌───────────┐ │ ┌────────────────────┐ │ ├───────┤ │ │ ├─┬─┬─┬─┤ │ Table 2 │ │ │████████████████████│F │ ├───────┤◀─┼──┐ │ ├─┼─┼─┼─┤ ├─┬─┬─┬─┬─┬─┤ │ │████████████████████│─┐ └────────▶├───────┤◀─┘ │ │ ├─┼─┼─┼─┤ ├─┼─┼─┼─┼─┼─┤ │ └────────────────────┘ │ Undo Log ├───────┤◀─┐ │ │ ├─┼─┼─┼─┤ ├─┼─┼─┼─┼─┼─┤ │ │ ┌───────┐◀─────├───────┤ │ │ │ ├─┼─┼─┼─┤ ├─┼─┼─┼─┼─┼─┤ │ │ ├───────┤ ├───────┤ │ │ │ ├─┼─┼─┼─┤ └─┴─┴─┴─┴─┴─┘ │ │ ├───────┤ ├───────┤ │ │ │ ├─┼─┼─┼─┤ │ │ ├───────┤ ├───────┤ │ └─────▶│ ├─┼─┼─┼─┤ │ │ ├───────┤ ├───────┤ │ │ └─┴─┴─┴─┘ │ │ ├───────┤ ├───────┤Tentative │ ┌───────────────────┐ │ │ ├───────┤ ├───────┤ │ │ │ Table 3 │ │ │ ├───────┤ ├───────┤ │ │ ├─┬─┬─┬─┬─┬─┬─┬─┬─┬─┤ │ │ ├───────┤ ├───────┤ │ │ ├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ │ │ ├───────┤ ├───────┤ │ │ ├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ │ │ ├───────┤ ├───────┤ │ │ ├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ │ │ └───────┘◀────┐├───────┤ │ │ └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘ │ └───────────────▶└───────┘◀─┘ └─────────────────────────────────┘ ▲ Tuple Store │ ┌─────────────────────────────────┐ │ │ ┌───────┐ │ │ │ │Table 1│ ┌───────────┐ │ │ │ ├─┬─┬─┬─┤ │ Table 2 │ │ │ │ ├─┼─┼─┼─┤ ├─┬─┬─┬─┬─┬─┤ │ │ │ ├─┼─┼─┼─┤ ├─┼─┼─┼─┼─┼─┤ │ │ │ ├─┼─┼─┼─┤ ├─┼─┼─┼─┼─┼─┤ │ │ │ ├─┼─┼─┼─┤ ├─┼─┼─┼─┼─┼─┤ │ │ │ ├─┼─┼─┼─┤ └─┴─┴─┴─┴─┴─┘ │ │ │ ├─┼─┼─┼─┤ │ │ │ ├─┼─┼─┼─┤ │◀─────────────┘ │ └─┴─┴─┴─┘ │ │ ┌───────────────────┐ │ │ │ Table 3 │ │ │ ├─┬─┬─┬─┬─┬─┬─┬─┬─┬─┤ │ │ ├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ │ │ ├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ │ │ ├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤ │ │ └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘ │ └─────────────────────────────────┘
There are three main components: the Write Log, the Tuple Store, and the Undo Log.
The Write Log stores
Write that have been received by the server. They are
ordered by their global committed order or tentative order. Once a
committed the server may discard that operation from the Log since it will not
be re-executed anymore, so the Write Log will actually contain a tail of
committed operations and all known tentative operations following them.
To keep track of which
Write operations the server has received and were
already discarded, each server maintains a timestamp vector, in the figure
represented as the "O vector" (O for "ommitted"). Each server stores the
timestamps of the latest
Write it has received (and discarded) from a given
server as a way to prevent re-accepting the same operation in a future
The Tuple Store is a database where the updates will act on and provide the
necessary data for
Read operations. It was implemented as an in-memory
relational database, something that was considered a practical limitation by the
authors3, but not something that is intrinsic to the overall Bayou design. As
pointed out earlier, clients may ask to work only with stable data so the Tuple
Store needs to maintain two views of its data: a full view and a committed view.
This is done by having a 2-bit characteristic vector in each tuple identifying
the set of views that contain the tuple in question.
The other two vectors shown in the figure above, vectors C and F, store the
timestamps for committed and tentative operations, respectively. They are used
to quickly identify the sets of
Write operations that need to be exchanged
The Undo Log, as the name suggests, is used to store tentative
Write that need
to have their effects rolled back. This is the case when a tentative
been executed but later the server discovers new committed operations that need
to be executed. The server needs to leave the Tuple Store state the same as when
the newly received operation was inserted.
For crash recovery purposes, the full Write Log and a checkpoint of the Tuple
Store are stored in stable storage. The checkpoint Tuple Store only reflects a
Write operations and it contains the effects of any
Write ops that
have already been discarded. In the figure you see two Tuple Stores and
two Write Logs because the current Tuple Store and the Write Log are stored in
memory for performance reasons. The system also records in stable storage the
unique id of the last
Write that is reflected in the checkpoint Tuple Store.
With this information the system can recover the state of the server prior to a
crash by recovering the recorded Tuple Store and replaying a suffix of
operations present in the Write Log.
This paper was an interesting read because many of the challenges that influenced Bayou's design still exist today, if not identical but similar. The idea of pushing the conflict detection and resolution out of the system and closer to the application can broaden the range of detected conflicts and also resolve them in a better way than simply rejecting them. The key is that the application can offer more context with its semantics and work together with the system. The servers always make progress locally which in turn gives high availability since all operations are accepted. At some point these servers need to agree on the total order of execution of these operations and the commit protocol in Bayou is responsible for providing the eventual consistency claimed by the system. Clients can even ask for different consistency levels via the session guarantees presented in .
Since many of the challenges still exist you might recognize the solutions being applied in newer systems. This is a good thing too keep in mind, learning from older systems can help and inspire new ideas and new systems for the future.
-  D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser, “Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System,” 1995, pp. 1–12.
-  D. B. Terry, A. J. Demers, K. Petersen, M. J. Spreitzer, M. M. Theimer, and B. B. Welch, “Session Guarantees for Weakly Consistent Replicated Data,” 1994, pp. 1–10.