Concurrent Document Mutations
You can use the CAS value to control how concurrent document modifications are handled. It helps avoid and control potential race conditions in which some mutations may be inadvertently lost or overridden by mutations made by other clients.
The CAS is a value representing the current state of an item. Each time the item is modified, its CAS changes.
The CAS value itself is returned as part of a document’s metadata whenever a document is accessed.
In the SDK, this is presented as the cas field in the result object of any operation that executes successfully.
CAS is an acronym for Compare And Swap, a form of optimistic locking. The CAS can be supplied as a parameter to the replace and remove operations. When an application provides the CAS, the server checks the application-provided CAS against the CAS of the document on the server:
- If the two CAS values match (they compare successfully), then the mutation operation succeeds.
- If the two CAS values differ, then the mutation operation fails.
On the server side, CAS might be implemented along these lines (pseudocode):
uint Replace(string docid, object newValue, uint oldCas=0) {
    object existing = this.kvStore.get(docid);
    if (!existing) {
        throw DocumentDoesNotExist();
    } else if (oldCas != 0 && oldCas != existing.cas) {
        // A CAS of 0 means "no check"; any other value must match.
        throw CasMismatch();
    }
    // Every mutation yields a new CAS; the increment is illustrative only.
    uint newCas = ++existing.cas;
    existing.value = newValue;
    return newCas;
}
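The pseudocode above can be turned into a small runnable model. This is only a sketch in Python 3: TinyStore, its methods and the exception classes are hypothetical names invented for this illustration, and the counter used to generate CAS values is a simplification (real CAS values are opaque).

```python
import itertools


class DocumentNotFound(Exception):
    pass


class CasMismatch(Exception):
    pass


class TinyStore:
    """Hypothetical in-memory stand-in for the server's key-value store."""

    def __init__(self):
        self._docs = {}                          # doc_id -> (value, cas)
        self._cas_source = itertools.count(1)    # simplification, not the real scheme

    def upsert(self, doc_id, value):
        cas = next(self._cas_source)
        self._docs[doc_id] = (value, cas)
        return cas

    def get(self, doc_id):
        if doc_id not in self._docs:
            raise DocumentNotFound(doc_id)
        return self._docs[doc_id]

    def replace(self, doc_id, new_value, cas=0):
        if doc_id not in self._docs:
            raise DocumentNotFound(doc_id)
        _, current_cas = self._docs[doc_id]
        # A CAS of 0 skips the check; any other value must match exactly.
        if cas != 0 and cas != current_cas:
            raise CasMismatch(doc_id)
        new_cas = next(self._cas_source)
        self._docs[doc_id] = (new_value, new_cas)
        return new_cas


store = TinyStore()
first_cas = store.upsert('docid', {'a_field': 'a_value'})
second_cas = store.replace('docid', {'a_field': 'changed'}, cas=first_cas)
```

After the second call, first_cas is stale: passing it to replace again raises CasMismatch, while omitting the CAS skips the check entirely.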
Demonstration
The following demonstrates how the server handles CAS. A use case for employing the CAS is when adding a new field to an existing document. At the application level, this requires the following steps:
- Read entire document.
- Perform modification locally.
- Store new document to server.
Assume the following two blocks of code are executing concurrently in different application instances:
Thread #1:
>>> result = cb1.get('docid')
>>> new_doc = result.value
>>> new_doc['field1'] = 'value1'
>>> cb1.replace('docid', new_doc)

Thread #2:
>>> result = cb2.get('docid')
>>> new_doc = result.value
>>> new_doc['field2'] = 'value2'
>>> cb2.replace('docid', new_doc)
Retrieving the document again yields:
>>> cb1.get('docid').value
{u'field2': u'value2', u'a_field': u'a_value'}
Note that field1 is not present, even though the application inserted it into the document.
The reason is that the replace on Thread #2 happened to run after the replace on Thread #1, while Thread #2’s get had run before Thread #1’s replace. The local copy of the document on Thread #2 therefore did not contain field1 (Thread #1’s update was not yet stored on the server), so Thread #2’s replace overwrote the document written by Thread #1. The order of events was:
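Step | Thread | Operation |
---|---|---|
1 | #2 | get the document |
2 | #1 | get the document |
3 | #1 | add field1 locally |
4 | #2 | add field2 locally |
5 | #1 | replace the document |
6 | #2 | replace the document (overwrites the result of step 5) |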
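The interleaving that loses field1 can be replayed deterministically without a server. The sketch below (Python 3) uses a plain dictionary as a stand-in for the server; the get and replace helpers are hypothetical and perform no CAS check, and it assumes each thread runs its get, local modification and replace in that order.

```python
import copy

# Stand-in for the server: one stored document, no CAS checking at all.
server = {'docid': {'a_field': 'a_value'}}


def get(doc_id):
    # Each client works on its own private copy of the document.
    return copy.deepcopy(server[doc_id])


def replace(doc_id, new_doc):
    # Unconditional overwrite, as when no CAS is supplied.
    server[doc_id] = copy.deepcopy(new_doc)


doc2 = get('docid')            # step 1 (#2)
doc1 = get('docid')            # step 2 (#1)
doc1['field1'] = 'value1'      # step 3 (#1)
doc2['field2'] = 'value2'      # step 4 (#2)
replace('docid', doc1)         # step 5 (#1)
replace('docid', doc2)         # step 6 (#2), built from a copy without field1

final_doc = get('docid')
print(final_doc)
```

The final document contains a_field and field2 only, matching the result shown earlier.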
Using CAS - Example
In the prior example, we saw that concurrent updates to the same document may result in some updates being lost. This is not because Couchbase itself has lost the updates, but because the application was unaware of newer changes made to the document and inadvertently overwrote them.
Passing the CAS obtained from get to replace lets the server detect the conflict:

Thread #1:
>>> result = cb1.get('docid')
>>> new_doc = result.value
>>> print new_doc
{u'a_field': u'a_value'}
>>> cur_cas = result.cas
>>> print cur_cas
272002471883283
>>> new_doc['field1'] = 'value1'
>>> new_result = cb1.replace('docid', new_doc, cas=cur_cas)
>>> print new_result.cas
195896137937427

Thread #2:
>>> result = cb2.get('docid')
>>> new_doc = result.value
>>> print new_doc
{u'a_field': u'a_value'}
>>> cur_cas = result.cas
>>> print cur_cas
272002471883283
>>> new_doc['field2'] = 'value2'
>>> new_result = cb2.replace('docid', new_doc, cas=cur_cas)

Thread #1’s replace succeeds and the document receives a new CAS. Thread #2 still holds the old CAS, so its replace is rejected instead of silently overwriting field1.
Handling CAS errors
If the item’s CAS has changed since the last operation performed by the current client (i.e. the document has been changed by another client), the CAS used by the application is considered stale. If a stale CAS is sent to the server (via one of the mutation commands, as above), the server will reply with an error, and the Couchbase SDK will accordingly return this error to the application (either via return code or exception, depending on the language).
How to handle this error depends on the application logic. If the application wishes to simply insert a new property within the document (which is not dependent on other properties within the document), then it may simply retry the read-update cycle by retrieving the item (and thus getting the new CAS), performing the local modification and then uploading the change to the server. For example, if a document represents a user, and the application is simply updating a user’s information (like an email field), the method to update this information may look like this:
{
    // Excerpt: this block runs inside a retry loop (not shown), hence the `continue` below.
    lcb_CMDGET *cmd = nullptr;
    check(lcb_cmdget_create(&cmd), "create GET command");
    check(lcb_cmdget_key(cmd, document_id.c_str(), document_id.size()),
          "assign ID for GET command");
    /**
     * Time in seconds. Note that the server might reset the time to its default if it is
     * larger than the maximum time (both durations are configurable). The following command
     * helps to discover the effective values for the feature.
     *
     * $ cbstats -u Administrator -p password localhost all | grep ep_getl
     * ep_getl_default_timeout: 15
     * ep_getl_max_timeout: 30
     */
    check(lcb_cmdget_locktime(cmd, 5), "lock for 5 seconds");
    check(lcb_get(local_instance, &result, cmd), "schedule GET command");
    check(lcb_cmdget_destroy(cmd), "destroy GET command");
    lcb_wait(local_instance, LCB_WAIT_DEFAULT);
    if (result.rc == LCB_ERR_DOCUMENT_LOCKED
        || result.rc == LCB_ERR_TEMPORARY_FAILURE) {
        // Another client holds the lock: wait briefly, then try again.
        std::stringstream msg;
        msg << "Document is locked for " << item_value
            << ". Retrying in 100 milliseconds...\n";
        std::cout << msg.str();
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        continue;
    } else {
        check(result.rc, "could not find list document");
    }
    // Keep the CAS for the subsequent mutation.
    cas = result.cas;
}
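The read-update-retry cycle described above for a simple field update (such as an email field) can be sketched in Python 3 as follows. FakeCollection, update_email, the before_replace hook and the attempt limit are all hypothetical names and choices made for this illustration; the hook exists only to simulate a competing client deterministically.

```python
class CasMismatch(Exception):
    pass


class FakeCollection:
    """Hypothetical in-memory stand-in holding a single document."""

    def __init__(self, doc):
        self._doc = dict(doc)
        self._cas = 1

    def get(self):
        return dict(self._doc), self._cas

    def replace(self, new_doc, cas):
        if cas != self._cas:
            raise CasMismatch()
        self._doc = dict(new_doc)
        self._cas += 1
        return self._cas


def update_email(collection, email, max_attempts=5, before_replace=None):
    """Retry the read-update cycle until the CAS-checked replace succeeds."""
    for attempt in range(1, max_attempts + 1):
        doc, cas = collection.get()      # fresh copy and fresh CAS on every attempt
        doc['email'] = email             # local modification
        if before_replace is not None:
            before_replace(attempt)      # test hook: lets a rival write sneak in
        try:
            collection.replace(doc, cas)
            return attempt
        except CasMismatch:
            continue                     # stale CAS: read again and retry
    raise RuntimeError('gave up after %d attempts' % max_attempts)


users = FakeCollection({'name': 'Alice', 'email': 'old@example.com'})


def rival_write(attempt):
    # Simulate another client changing the document during the first attempt only.
    if attempt == 1:
        doc, cas = users.get()
        doc['visits'] = 1
        users.replace(doc, cas)


attempts = update_email(users, 'new@example.com', before_replace=rival_write)
final_doc, _ = users.get()
```

Here the first attempt fails because the rival write changed the CAS; the second attempt re-reads the document, so the rival's visits field is preserved alongside the new email. Bounding the number of attempts keeps a heavily contended document from stalling the caller indefinitely.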
Sometimes more logic is needed when performing updates, for example when one property is mutually exclusive with another: only one of the two may exist, never both. In that case the application must re-apply its checks against the freshly read document on every retry.
Performance considerations
CAS operations incur no additional overhead. The server already returns a CAS value with every operation, and comparing CAS values at the server is a simple integer comparison.
CAS value format
The CAS value should be treated as an opaque object at the application level. No assumptions should be made with respect to how the value is changed (for example, it is wrong to assume that it is a simple counter value). In the SDK, the CAS is represented as a 64-bit integer for efficient copying but should otherwise be treated as an opaque 8-byte buffer.
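One way to honor this in application code is to compare CAS values only for equality and, when a CAS must be stored alongside other state, to serialize it as a fixed 8-byte buffer. The sketch below is in Python 3; the helper names and the big-endian byte order are assumptions made for this illustration, and the byte order only needs to be the same in both helpers.

```python
def cas_to_bytes(cas):
    """Serialize a CAS (unsigned 64-bit integer) into an opaque 8-byte buffer."""
    return cas.to_bytes(8, byteorder='big', signed=False)


def cas_from_bytes(buf):
    """Restore a CAS from the 8-byte buffer produced by cas_to_bytes."""
    if len(buf) != 8:
        raise ValueError('a CAS buffer must be exactly 8 bytes')
    return int.from_bytes(buf, byteorder='big', signed=False)


def same_version(cas_a, cas_b):
    # Equality is the only comparison that carries meaning; ordering or
    # arithmetic on CAS values says nothing about document history.
    return cas_a == cas_b


stored = cas_to_bytes(272002471883283)
restored = cas_from_bytes(stored)
```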
Pessimistic locking
While CAS is the recommended way to perform locking and concurrency control, Couchbase also offers explicit locking. When a document is locked, attempts to mutate it without supplying the correct CAS will fail.
Documents can be locked using the get-and-lock operation and unlocked either explicitly using the unlock operation or implicitly by mutating the document with a valid CAS. While a document is locked, it may be retrieved but not modified without using the correct CAS value. When a locked document is retrieved, the server will return an invalid CAS value, preventing mutations of that document.
The following table shows the various behaviors while an item is locked:
Operation | Result |
---|---|
get-and-lock | Locked error. |
get | Always succeeds, but with an invalid CAS value returned (so it cannot be used as an input to subsequent mutations). |
unlock with bad/missing CAS value | Locked error. |
unlock with correct CAS | Item is unlocked. It can now be locked again and/or accessed as usual. |
Mutate with bad/missing CAS value | CasMismatch error. |
Mutate with correct CAS value | Mutation is performed and item is unlocked. It can now be locked again and/or accessed as usual. |
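The rows of the table can be expressed as a small in-memory model. This is a sketch in Python 3 rather than the server's implementation: LockableDoc, INVALID_CAS and the exception classes are hypothetical, lock expiry is not modeled, and issuing a fresh CAS to the lock holder is an assumption of this model.

```python
class DocumentLocked(Exception):
    pass


class CasMismatch(Exception):
    pass


INVALID_CAS = -1   # hypothetical marker for the unusable CAS handed to plain readers


class LockableDoc:
    """Hypothetical model of one document that follows the table above."""

    def __init__(self, value):
        self.value = value
        self._cas = 100
        self._locked = False

    def get(self):
        # Always succeeds; while locked, the CAS returned is not usable.
        return self.value, (INVALID_CAS if self._locked else self._cas)

    def get_and_lock(self):
        if self._locked:
            raise DocumentLocked()
        self._locked = True
        self._cas += 1             # assumption: the lock holder gets a fresh CAS
        return self.value, self._cas

    def unlock(self, cas):
        if not self._locked:
            # Not covered by the table; the choice of error is an assumption.
            raise ValueError('document is not locked')
        if cas != self._cas:
            raise DocumentLocked()
        self._locked = False

    def replace(self, value, cas=0):
        if self._locked:
            if cas != self._cas:   # bad or missing CAS while locked
                raise CasMismatch()
        elif cas != 0 and cas != self._cas:
            raise CasMismatch()
        self.value = value
        self._locked = False       # a successful mutation releases the lock
        self._cas += 1
        return self._cas


doc = LockableDoc({'count': 1})
_, lock_cas = doc.get_and_lock()
new_cas = doc.replace({'count': 2}, cas=lock_cas)   # mutates and unlocks
```

Only the client that performed get-and-lock holds a usable CAS, which is why every other mutation or unlock attempt in the table fails until the lock is released.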
A document can be locked for a maximum of 30 seconds, after which the server will unlock it. This is to prevent misbehaving applications from blocking access to documents inadvertently. You can modify the time the lock is held for (though it can be no longer than 30 seconds).
Setting a lock time greater than 30 seconds will cause Couchbase Server to set the lock duration to the server’s default value, which is 15 seconds.
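The rule for over-long lock times can be written down as a short function. This Python 3 sketch models only the rule stated above, using the default of 15 seconds and the maximum of 30 seconds; both are configurable on the server, and the function and constant names are hypothetical.

```python
DEFAULT_LOCK_SECONDS = 15   # server default described above (configurable)
MAX_LOCK_SECONDS = 30       # server maximum described above (configurable)


def effective_lock_time(requested_seconds):
    """Return the lock duration the server would apply for a requested time."""
    if requested_seconds > MAX_LOCK_SECONDS:
        # Over the maximum: the server falls back to its default,
        # not to the maximum.
        return DEFAULT_LOCK_SECONDS
    return requested_seconds


durations = [effective_lock_time(s) for s in (5, 30, 31, 120)]
```

Note that asking for more than the maximum yields a shorter lock than asking for exactly the maximum, so an application that needs the longest possible lock should request 30 seconds rather than a larger value.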
Be sure to keep note of the cas value when locking a document.
You will need it when unlocking or mutating the document.
The following blocks show how to use the lock and unlock operations.
check(lcb_cmdstore_cas(cmd, cas), "assign CAS value for REPLACE command");
The handler unlocks the item implicitly by modifying it with the correct CAS.
If the item has already been locked, the server will respond with CasMismatch, which means that the operation could not be executed at that moment but may succeed later on.
APIs and Additional Information
API information for working with CAS can be found in our API docs.