Atomic instructions are used by the OS to provide higher-level concurrency constructs such as locks and semaphores. Probably the best known is the cmpxchg
instruction, which takes two operands: a source register and destination register or address. To be useful in concurrent code, the destination operand will be a memory address. It is described in the Intel Software Developer’s Manual Volume 2A at 3-188 (or page 260 according to my PDF reader) as follows:
Compares the value in the AL, AX, EAX, or RAX register with the […] destination operand. If the two values are equal, the […] source operand is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, EAX or RAX register. RAX register is available only in 64-bit mode.
…
In 64-bit mode, the instruction’s default operation size is 32 bits. […] Use of the REX.W prefix promotes operation to 64 bits.
Anyway, here’s an example. Assume we are trying to increment a value in memory whose address is in %rbx
. We first read the value stored at that address into %rax
, then use the lea
instruction to store and increment it into %rcx
. We then issue the cmpxchg
instruction (with a quad-word suffix), using %rcx
as the source and (%rbx)
as the destination operand.
If the operation was successful, the ZF
flag is set: we test this with the jnz
instruction. If it was unsuccessful (some other thread changed the value between the read and the CAS instructions), then immediately try again by looping to label .L1
. This is possible since in the failure case, cmpxchg
stores the value at the destination address in %rax
, so we don’t need to repeat the initial read.
movq (%rbx), %rax # Move value of counter into RAX
.L1:
leaq 1(%rax), %rcx # Copy and increment the counter value into RCX
lock cmpxchgq %rcx, (%rbx) # Compare RAX with the value at address $counter. If equal,
# replace value at $counter with value in RCX, otherwise
# put value of counter in RAX
jnz .L1 # If the CAS was unsuccessful, goto .L1: the value of
# counter is returned in RAX (Intel p621)
Obviously, we could have achieved the same effect with a lock add
or lock xadd
, but then this example code.