Some Assembly Required - Stack Alignment

The System V ABI “AMD64 Architecture Processor Supplement” stipulates that the stack must be aligned to a 16-byte boundary at the point where a function is called. In its own words:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

So… aligning the stack to a 16-byte boundary is easy, right? Well, mostly, so long as you remember the rules that decide when parameters have to be passed on the stack rather than in registers. With those in mind, either of the following instructions clears the lowest 4 bits of %rsp:

andq  $~0xF, %rsp    # Align the stack to a 16-byte boundary (mask written as NOT 0xF)

andq  $-0x10, %rsp   # Align the stack to a 16-byte boundary (same mask written as -16)

The source operands are identical in both cases, and both take advantage of the fact that the stack grows “downwards”, i.e. each value pushed onto the stack lands at a lower memory address than the value pushed before it. Clearing the least significant 4 bits rounds the address down, which has the effect of subtracting the “right” number of bytes (anywhere from 0 to 15) to align the destination operand to a 16-byte boundary. Sign-extended to 64 bits, the source operand to andq looks like this in binary:

1111111111111111111111111111111111111111111111111111111111110000
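As a quick sanity check on the claim that both spellings produce this one mask, here is a minimal C sketch. The macro names MASK_NOT and MASK_NEG and the helper to_binary are hypothetical, introduced only for illustration. The sketch assumes the immediates are widened to 64 bits by sign extension, as andq does.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates, widened to 64 bits the way andq sign-extends them.
   MASK_NOT and MASK_NEG are illustrative names, not part of any ABI. */
#define MASK_NOT ((uint64_t)~INT64_C(0xF))   /* $~0xF  */
#define MASK_NEG ((uint64_t)-INT64_C(0x10))  /* $-0x10 */

/* Checked at compile time: both spellings are the same bit pattern. */
static_assert(MASK_NOT == MASK_NEG, "~0xF and -0x10 must agree");
static_assert(MASK_NOT == UINT64_C(0xFFFFFFFFFFFFFFF0), "60 ones, then 4 zeros");

/* Hypothetical helper: write the 64 bits of v, most significant first,
   into out as '0'/'1' characters followed by a terminating NUL. */
static void to_binary(uint64_t v, char out[65])
{
    for (int i = 0; i < 64; i++)
        out[i] = ((v >> (63 - i)) & 1) ? '1' : '0';
    out[64] = '\0';
}
```

Calling to_binary(MASK_NOT, buf) fills buf with the same 64-character pattern shown above.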

By way of example, let’s take a properly-aligned stack value such as 0x7fffffffe1c0, which in binary is

0000000000000000011111111111111111111111111111111110000111000000

Performing an AND operation with ~0xF leaves this value unchanged, because its lowest 4 bits are already zero. However, given a value which is not properly aligned, such as 0x7fffffffe1b8, you get the following:

1111111111111111111111111111111111111111111111111111111111110000 &
0000000000000000011111111111111111111111111111111110000110111000 =
0000000000000000011111111111111111111111111111111110000110110000

Or put another way, 0x7fffffffe1b8 & ~0xF = 0x7fffffffe1b0.
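The same arithmetic can be sketched in C to double-check both worked values. The helpers align_down and bytes_dropped are hypothetical names, not anything defined by the ABI. The sketch assumes the alignment is a power of two, which covers both the 16-byte case and the 32-byte case the ABI mentions for __m256.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align,
   which must be a power of two (16 here, 32 for the __m256 case). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);   /* for align == 16 this is addr & ~0xF */
}

/* Hypothetical helper: how many bytes the AND effectively subtracts. */
static uint64_t bytes_dropped(uint64_t addr, uint64_t align)
{
    return addr - align_down(addr, align);
}
```

With these, align_down(0x7fffffffe1b8, 16) gives 0x7fffffffe1b0 and bytes_dropped reports 8, while the already aligned 0x7fffffffe1c0 comes back unchanged with 0 bytes dropped.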