Well, here it is, the ubiquitous “Hello, World!” example.
Despite the title, there are a couple of interesting things to note in the code below. The first is calculating the length of the string hellotxt, the second is the different usage of the $ operator in the GNU assembler.
The string length is calculated using the . (dot) symbol, which is interpreted as “the current address that as is assembling into” (this is described in the gas documentation).
I’ve entered three equivalent lines below, two of which I’ve commented out. Each would, if assembled, define the symbol msg_len and set its value to the length of the string hellotxt using the technique described above. This value is referenced later in the instruction movq $msg_len, %rdx, which loads it into %rdx.
The reason I wanted to draw your attention to this is the seemingly incongruent meaning of the $ symbol. In the line above it (movq $hellotxt, %rsi) it means “copy the address of hellotxt to %rsi”, whereas with msg_len it means “replace this placeholder with the value of the symbol msg_len”. Strictly speaking, $ does the same job in both cases: it marks an immediate operand whose value is the symbol’s value. hellotxt is a label, so its value is an address; msg_len is an absolute symbol, so its value is the number 15. So to get this right, you need to know your symbols from your memory references.
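To see the location counter at work in isolation, here is a minimal data-only sketch. The labels first and second and the three *_len symbols are made up for illustration and are not part of the Hello World program; the point is that . keeps advancing, so a length has to be taken immediately after the string it measures.

```asm
# dot.s (hypothetical file name): data only, so it assembles
# but is not meant to be linked or run.
.section .data
first: .ascii "one\n"       # 4 bytes
first_len = . - first       # 4: taken right after first, so . is the end of first
second: .ascii "three\n"    # 6 bytes
second_len = . - second     # 6
wrong_len = . - first       # 10: . has moved past second as well
```

Assembling this with as -o dot.o dot.s and running nm dot.o should list the three *_len symbols as absolute symbols with the values 4, 6 and 0xa.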
Anyway, without further ado, here’s the code and a Makefile which should build it on a 64-bit Linux distro. I’m currently using as from GNU Binutils 2.20.1-20100303. My kernel is 2.6.32-33 and I’m using Xubuntu.
hello.s
.section .data
hellotxt: .asciz "Hello, World!\n"
msg_len = . - hellotxt # define a *symbol* to represent the length of the hellotxt string
#.equ msg_len, . - hellotxt # defines the same symbol using an equate
#.set msg_len, . - hellotxt # defines the same symbol using the .set directive
.section .text
.globl _start
_start:
movq $1, %rax # sys_write
movq $1, %rdi # stdout
movq $hellotxt, %rsi # use '$' to get address-of 'hellotxt'
movq $msg_len, %rdx # use '$' to reference the symbol 'msg_len', defined above
syscall
movq $60, %rax # sys_exit
movq $0, %rdi # exit code
syscall
Makefile
hello: hello.o
	ld -o hello hello.o
hello.o: hello.s
	as -gstabs -o hello.o hello.s
clean:
	rm hello.o hello
A quick examination of the output of objdump -D hello shows the following:
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: 48 c7 c0 01 00 00 00 mov $0x1,%rax
4000b7: 48 c7 c7 01 00 00 00 mov $0x1,%rdi
4000be: 48 c7 c6 e0 00 60 00 mov $0x6000e0,%rsi
4000c5: 48 c7 c2 0f 00 00 00 mov $0xf,%rdx
You can see that the value 0xf has been substituted for $msg_len: the length of hellotxt plus its trailing null byte, which was added by the .asciz directive (13 characters, a newline and the NUL make 15).
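One consequence is that the program writes the NUL byte to stdout along with the text. If that is unwanted, a possible variation (a sketch, not what the listing above does) is to use .ascii, which appends nothing, so the same . - hellotxt expression comes to 14. The file name hello_ascii.s is made up for this example.

```asm
# hello_ascii.s (hypothetical file name)
# Build: as -o hello_ascii.o hello_ascii.s && ld -o hello_ascii hello_ascii.o
.section .data
hellotxt: .ascii "Hello, World!\n"  # 13 characters + newline, no NUL appended
msg_len = . - hellotxt              # 14 (0xe)

.section .text
.globl _start
_start:
movq $1, %rax                       # sys_write
movq $1, %rdi                       # stdout
movq $hellotxt, %rsi                # address of the string
movq $msg_len, %rdx                 # 14 bytes: the text and its newline only
syscall
movq $60, %rax                      # sys_exit
movq $0, %rdi                       # exit code
syscall
```

With this change the disassembly should show mov $0xe,%rdx where the original shows mov $0xf,%rdx.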
If you were to assemble it with msg_len as a .quad instead, however, the code would look like this:
hello.s
.section .data
hellotxt: .asciz "Hello, World!\n"
msg_len: .quad . - hellotxt
.section .text
.globl _start
_start:
movq $1, %rax # sys_write
movq $1, %rdi # stdout
movq $hellotxt, %rsi # use '$' to get address-of 'hellotxt'
movq msg_len, %rdx # value-at 'msg_len'
syscall
movq $60, %rax # sys_exit
movq $0, %rdi # exit code
syscall
The output of objdump -D hello then looks like this:
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: 48 c7 c0 01 00 00 00 mov $0x1,%rax
4000b7: 48 c7 c7 01 00 00 00 mov $0x1,%rdi
4000be: 48 c7 c6 e0 00 60 00 mov $0x6000e0,%rsi
4000c5: 48 8b 14 25 ef 00 60 mov 0x6000ef,%rdx
...
Disassembly of section .data:
00000000006000e0 <hellotxt>:
6000e0: 48 'H'
6000e1: 65 'e'
6000e2: 6c 'l'
6000e3: 6c 'l'
6000e4: 6f 'o'
...
00000000006000ef <msg_len>:
6000ef: 0f 00 00 The value '15'
6000f2: 00 00
6000f4: 00 00
...
In this, you can see that instead of loading the literal 0xf into %rdx, the instruction now loads the value at 0x6000ef into the register. Helpfully, objdump shows that the value at 0x6000ef is… 15.
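To put the two forms side by side, here is a sketch that keeps both an absolute symbol and a .quad in the same program and prints the message once with each. The names len_sym and len_mem and the file name both.s are invented for this example. The second write also uses %rip-relative operands, which is an alternative addressing style I have not used in the listings above.

```asm
# both.s (hypothetical file name)
# Build: as -o both.o both.s && ld -o both both.o
.section .data
hellotxt: .asciz "Hello, World!\n"
len_sym = . - hellotxt              # absolute symbol, value 15, occupies no memory
len_mem: .quad len_sym              # 8 bytes of memory holding the value 15

.section .text
.globl _start
_start:
movq $1, %rax                       # sys_write
movq $1, %rdi                       # stdout
movq $hellotxt, %rsi                # immediate: the address of hellotxt
movq $len_sym, %rdx                 # immediate: the number 15
syscall

movq $1, %rax                       # sys_write again (syscall overwrote %rax)
movq $1, %rdi                       # stdout
leaq hellotxt(%rip), %rsi           # the same address, computed relative to %rip
movq len_mem(%rip), %rdx            # memory load: the quad stored at len_mem
syscall

movq $60, %rax                      # sys_exit
movq $0, %rdi                       # exit code
syscall
```

Both writes send the same 15 bytes, so the line appears twice. Only the second one touches memory to find the length.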