If you’ve been inquisitive enough to read the “About” pages¹ you’ll see that my day job involves writing software in Java. To that end, I’ve put together some code which demonstrates calling, from Java, a function in a shared library written in assembler. Hopefully the following will illustrate the steps involved fairly clearly.
Let’s start, then, with that ubiquitous “Hello, World!” program again, except that this time we’ll call a static, native method to do the printing.
HelloWorld.java
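public class HelloWorld {
    static {
        // Loads libhello.so from java.library.path when the class is initialised
        System.loadLibrary("hello");
    }

    public static void main(String[] args) throws Exception {
        sayHello();
    }

    // Implemented in assembler; see HelloWorld.s
    static native void sayHello();
}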
You’ll note that it’s in the default package, but otherwise it’s a vanilla implementation. We can also start by creating a Makefile to build our code:
Makefile
all: HelloWorld.class

HelloWorld.class: HelloWorld.java
	javac -cp . HelloWorld.java

clean:
	rm *.class
With a bit of luck (and a JDK installed on your system) this should compile into a new file called HelloWorld.class. Typically, in order to avoid bugs and minimise the typing, you would use javah to create a header file, which we’ll use as a cheat-sheet to identify the name of the function we need to implement in assembler. (javah was removed in JDK 10; on newer JDKs, javac -h . HelloWorld.java generates the same header.) We only need to build the header file once, so there’s no point adding it to the Makefile. From the working directory with the HelloWorld.class file in it, execute:
javah -classpath . HelloWorld
Its contents look like this:
HelloWorld.h
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class HelloWorld */
#ifndef _Included_HelloWorld
#define _Included_HelloWorld
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: HelloWorld
* Method: sayHello
* Signature: ()V
*/
JNIEXPORT void JNICALL Java_HelloWorld_sayHello
  (JNIEnv *, jclass);
#ifdef __cplusplus
}
#endif
#endif
So, what’s there to say about this? Well, it references the JDK’s jni.h header and accommodates a C or C++ compiler. However, as I alluded to earlier, we’re only going to use this as a template from which to steal the symbol Java_HelloWorld_sayHello. The as assembler has no use for a C header, so there is nothing for us to include in the source file. Other assemblers such as nasm require external symbols to be declared and complain at assembly time if you refer to a symbol which is undefined; as simply assumes the symbol will be satisfied at link-time, something which folk more able than I suggest leads to extremely hard-to-find bugs later on.
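If you would rather not generate a header at all, the symbol name can be worked out by hand. The following is a rough sketch, in Java, of the JNI short-name mangling rules as I understand them from the JNI specification; the class JniSymbol and its method names are my own invention for illustration, and it ignores overloaded native methods (which get a signature suffix).

```java
// Sketch: derive the symbol the JVM looks up for a non-overloaded native method.
// JniSymbol, mangle and shortName are hypothetical names, not part of any JDK API.
final class JniSymbol {

    // Escape one component (fully-qualified class name or method name).
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                out.append('_');                 // package separator
            } else if (c == '_') {
                out.append("_1");                // literal underscore
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                out.append(c);                   // plain ASCII alphanumerics pass through
            } else {
                // Anything else becomes _0xxxx with four lower-case hex digits
                out.append(String.format("_0%04x", (int) c));
            }
        }
        return out.toString();
    }

    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("HelloWorld", "sayHello"));
        // prints "Java_HelloWorld_sayHello"
    }
}
```

For our class in the default package there is nothing to escape, which is why the name in the header is so tidy; a class in a package or with underscores in its name would look rather less friendly.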
So, to the assembly code, now we know what the function should be called:
HelloWorld.s
.section .data
hellotxt:
    .asciz "Hello, World!\n"
msg_len = . - hellotxt

.section .text
.type Java_HelloWorld_sayHello, @function
.globl Java_HelloWorld_sayHello
Java_HelloWorld_sayHello:
    pushq %rbp                  # store the parent stack frame's base-pointer
    movq %rsp, %rbp             # store the updated stack-pointer as our base-pointer

    movq $1, %rax               # sys_write
    movq $1, %rdi               # stdout
    leaq hellotxt(%rip), %rsi   # address-of 'hellotxt' using RIP-relative addressing
    movq $msg_len, %rdx         # value-of symbol 'msg_len', will insert literal 0xF/15

    syscall                     # make the sys_write call

    movq %rbp, %rsp             # restore the previous stack-pointer from %rbp
    popq %rbp                   # restore the previous base-pointer from the stack
    ret                         # return to the instruction after the 'call'
All fairly straightforward, really. There’s no _start label since we don’t intend this to become an application, and we’ve used the assembler’s .type directive and @function declaration to tell it how to treat the symbol Java_HelloWorld_sayHello. Omitting this directive didn’t affect the behaviour of the function, strangely enough. What it does is mark the symbol as a function (STT_FUNC) in the ELF symbol table, so I suspect its importance lies with the linker and other tools that read that table rather than with a simple call like this one.
You will also note that the way we load the address of hellotxt is different from our original “Hello, World!” example. That code was built as a static binary, so the toolchain had absolute control over the address at which it placed the bytes of the output string. When building a shared library it has no such knowledge, because the library’s load address is only chosen at runtime, so the code must locate its data relative to wherever it ends up. To this end, we benefit enormously from the fact that we’re writing 64-bit assembly, as we can use the %rip register to address the hellotxt string relative to the current instruction. 32-bit relative-addressing is horrendous by comparison, and relies on knowing your relative offset from the Global Offset Table (GOT). You then see such code as this:
call __i686.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
In this case the call instruction pushes the address of the next instruction onto the stack so that the ret instruction can operate; I’d hazard a guess that the function __i686.get_pc_thunk.bx inspects that stack-value and returns it (i.e. the instruction pointer’s value) in the EBX register, and indeed its body is nothing more than movl (%esp), %ebx followed by ret. To this is then added the offset from that point to the GOT, which leaves the address of the GOT in EBX.
Back in our 64-bit example, the leaq instruction writes into the %rsi register the value offsetOf(hellotxt) + valueIn(%rip), where %rip holds the address of the instruction following the leaq. To be a bit more precise, the pseudo-code value offsetOf(hellotxt) is a 32-bit displacement which is filled in by way of a relocation. Try this resource for much more detail than I want to go into here. All you need to know is that if you intend to use your code in a shared library, you need to use position-independent code.
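To make the arithmetic a little more concrete, here is a small sketch (in Java, since that is where we started) of how the displacement and the final address relate. All of the addresses are made up for illustration, and the class and method names are mine; the one hard fact I am relying on is that this form of leaq encodes as seven bytes (three bytes of prefix, opcode and ModRM, followed by the 32-bit displacement).

```java
// Sketch of RIP-relative addressing arithmetic. RipRelative and its methods are
// hypothetical names; every address below is invented for the example.
final class RipRelative {

    // leaq sym(%rip), %rsi is 48 8d 35 followed by a 4-byte displacement
    static final int LEAQ_LENGTH = 7;

    // The displacement stored in the instruction: the distance from the
    // *next* instruction to the target.
    static int displacement(long instructionAddress, long targetAddress) {
        long nextInstruction = instructionAddress + LEAQ_LENGTH;
        return Math.toIntExact(targetAddress - nextInstruction);
    }

    // What the CPU computes at runtime: %rip (next instruction) + displacement.
    static long effectiveAddress(long instructionAddress, int disp32) {
        return instructionAddress + LEAQ_LENGTH + disp32;
    }

    public static void main(String[] args) {
        long leaqOffset = 0x100eL;      // assumed offset of the leaq within the library
        long hellotxtOffset = 0x2000L;  // assumed offset of hellotxt within the library
        int disp = displacement(leaqOffset, hellotxtOffset);

        // Two invented load addresses: the displacement is the same for both.
        long[] loadBases = {0x7f0000000000L, 0x7f1234560000L};
        for (long base : loadBases) {
            long address = effectiveAddress(base + leaqOffset, disp);
            System.out.printf("base=%#x disp=%#x hellotxt=%#x%n", base, disp, address);
        }
    }
}
```

The point of the exercise is that the displacement depends only on the distance between the instruction and the data, which is fixed when the library is linked, so the same bytes work wherever the library happens to be loaded.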
Right, the next incarnation of the Makefile gets a bit more interesting:
Makefile
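all: HelloWorld.class libhello.so

HelloWorld.class: HelloWorld.java
	javac -cp . HelloWorld.java

libhello.so: HelloWorld.o
	ld -shared -o libhello.so HelloWorld.o

HelloWorld.o: HelloWorld.s
	as --64 -g -o HelloWorld.o HelloWorld.s

clean:
	rm *.o *.class *.so

This Makefile should build and link a shared library by virtue of the -shared argument to ld. You will often see -fPIC alongside it, but that is a compiler option asking for position independent code (PIC) to be generated; GNU ld does not treat it as a PIC request, and since we are writing the assembly by hand it is the RIP-relative addressing above which makes our code position independent.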
You should now be in a position to execute the following command line:
java -Djava.library.path=$(pwd) -cp . HelloWorld
Which, of course, should result in the expected output. ;)
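If the JVM complains with an UnsatisfiedLinkError instead, it is usually worth checking which file name it is actually looking for and where. The following is a small diagnostic sketch using the standard System.mapLibraryName method and the java.library.path property; the class name LibraryLookup is just something I made up for the purpose.

```java
// Sketch: show how System.loadLibrary("hello") is resolved on this platform.
// LibraryLookup and fileNameFor are hypothetical names for illustration only.
final class LibraryLookup {

    // Maps a library name to the platform-specific file name,
    // e.g. a "lib" prefix and ".so" suffix on Linux.
    static String fileNameFor(String libraryName) {
        return System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        System.out.println("loadLibrary(\"hello\") looks for: " + fileNameFor("hello"));
        System.out.println("in the directories listed in: "
                + System.getProperty("java.library.path"));
    }
}
```

Run it with the same -Djava.library.path argument as above and you should see the directory containing libhello.so in the second line.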
Footnotes
¹ I’ve changed jobs a few times since this was written ;)