Calling ASM functions from Java


January 16, 2012

If you’ve been inquisitive enough to read the “About” pages[1] you’ll see that my day job involves writing software in Java. With that in mind, I’ve put together some code which demonstrates calling a function in a shared library (written in assembler). Hopefully the following will illustrate the steps involved fairly clearly.

Let’s start, then, with that ubiquitous “Hello, World!” program again, except that this time we’ll call a static, native method to do the printing.
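public class HelloWorld {

    static {
        // Library name assumed: on Linux this looks for libHelloWorld.so on java.library.path
        System.loadLibrary("HelloWorld");
    }

    public static void main(String[] args) throws Exception {
        sayHello();
    }

    // Implemented in assembler as the symbol Java_HelloWorld_sayHello
    static native void sayHello();
}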

You’ll note that it’s in the default package, but otherwise it’s a vanilla implementation. We can also start by creating a Makefile to build our code:

all: HelloWorld.class

HelloWorld.class: HelloWorld.java
	javac -cp . HelloWorld.java

clean:
	rm -f *.class

With a bit of luck (and a JDK installed on your system) this should compile into a new file called HelloWorld.class. Typically, to avoid bugs and minimise typing, you would use javah to create a header file, which we’ll use as a cheat-sheet to identify the function name we need to implement in assembler. We only need to generate the header file once, so there’s no point adding it to the Makefile. From the working directory containing HelloWorld.class, execute:

javah -classpath . HelloWorld

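Its contents look like this (on JDK 10 and later, where javah no longer exists, javac -h . HelloWorld.java generates the same file):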

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class HelloWorld */

#ifndef _Included_HelloWorld
#define _Included_HelloWorld
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     HelloWorld
 * Method:    sayHello
 * Signature: ()V
 */
JNIEXPORT void JNICALL Java_HelloWorld_sayHello
  (JNIEnv *, jclass);

#ifdef __cplusplus
}
#endif
#endif

So, what’s there to say about this? Well, it includes the JDK’s jni.h header and accommodates either a C or a C++ compiler. However, as I alluded to earlier, we’re only going to use it as a template from which to steal the symbol Java_HelloWorld_sayHello. The GNU assembler, as, has no way of consuming a C header like this one. Other assemblers such as nasm require external symbols to be declared explicitly and report an error at assembly time if you refer to one that is undefined; as simply treats any undefined symbol as external and assumes it will be satisfied at link time, something which folk more able than I suggest leads to extremely hard-to-find bugs later on.
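For the curious, the symbol can also be derived by hand. The Java sketch below follows the JNI specification’s “short name” rules for a non-overloaded native method: Java_, then the fully qualified class name with dots turned into underscores, another underscore, then the method name, with an underscore in either name escaped as _1 and any other non-alphanumeric character escaped as _0 followed by four hex digits. JniNames and its methods are hypothetical helpers, not part of the JDK, and overloaded methods (which get __ plus a mangled argument signature appended) aren’t handled.

```java
public class JniNames {

    // Escapes one name component using the JNI short-name rules (sketch):
    // ASCII letters and digits pass through, '.' and '/' become '_',
    // '_' becomes "_1", anything else becomes "_0" plus four lowercase hex digits.
    static String mangle(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);
            } else if (c == '.' || c == '/') {
                out.append('_');
            } else if (c == '_') {
                out.append("_1");
            } else {
                out.append(String.format("_0%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Symbol name for a non-overloaded native method (no signature suffix).
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("HelloWorld", "sayHello"));
    }
}
```

Running it prints Java_HelloWorld_sayHello, matching the declaration in the generated header.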

So, on to the assembly code, now that we know what the function should be called:

.section .data
        hellotxt:  .ascii "Hello, World!\n"    # .ascii rather than .asciz: no trailing NUL to write
        msg_len =  . - hellotxt
.section .text
.type   Java_HelloWorld_sayHello, @function
.globl  Java_HelloWorld_sayHello
Java_HelloWorld_sayHello:
        pushq       %rbp                   # store the parent stack frame's base-pointer
        movq        %rsp, %rbp             # store the updated stack-pointer as our base-pointer
        movq        $1, %rax               # sys_write
        movq        $1, %rdi               # stdout
        leaq        hellotxt(%rip), %rsi   # address-of 'hellotxt' using RIP-relative addressing
        movq        $msg_len, %rdx         # value-of symbol 'msg_len', will insert literal 0xE/14
        syscall                            # make the sys_write call
        movq        %rbp, %rsp             # restore the previous stack-pointer from %rbp
        popq        %rbp                   # restore the previous base-pointer from the stack
        ret                                # return to the instruction after the 'call'

All fairly straightforward, really. There’s no _start label since we don’t intend this to become an application, and we’ve used the assembler’s .type directive with @function to tell it how to treat the symbol Java_HelloWorld_sayHello. Strangely enough, omitting this directive didn’t affect the behaviour of the function; I suspect the declaration’s importance lies elsewhere, since what it does is mark the symbol as a function (STT_FUNC) in the ELF symbol table, information which tools such as debuggers make use of.

You will also note that the way we load the address of hellotxt differs from our original “Hello, World!” example. That code was built as a static binary, so the address at which the string would live was fixed when the program was linked. Code for a shared library has no such guarantee: the library may be loaded at any address, so the code has to locate hellotxt relative to its own position at runtime. Here we benefit enormously from writing 64-bit assembly, as we can address hellotxt relative to the %rip register. 32-bit position-independent code is horrendous by comparison: there is no instruction-pointer-relative data addressing, so it relies on first working out the address of the Global Offset Table (GOT). You then see code such as this:

 call __i686.get_pc_thunk.bx
 addl $_GLOBAL_OFFSET_TABLE_, %ebx

In this case the call instruction pushes the address of the next instruction (the addl) onto the stack so that the ret instruction can operate; the function __i686.get_pc_thunk.bx simply copies that stack value (i.e. the instruction pointer’s value after the call) into the EBX register and returns. The addl then adds the distance from that instruction to the GOT, leaving the GOT’s absolute address in EBX.
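A small Java simulation can make the arithmetic concrete. The offsets below are invented for illustration and PcThunkDemo is a hypothetical name; the sketch models the thunk as reading the return address from the top of the stack, and assumes the linker fills the addl immediate with the distance from the addl instruction to the GOT. Because both live in the same library, that distance is constant, so EBX ends up pointing at the GOT wherever the library is loaded.

```java
import java.util.ArrayDeque;

public class PcThunkDemo {

    // Hypothetical layout of a 32-bit shared library, as offsets from its load address.
    static final int CALL_OFFSET = 0x1000;          // call __i686.get_pc_thunk.bx (5 bytes)
    static final int ADDL_OFFSET = CALL_OFFSET + 5; // addl $_GLOBAL_OFFSET_TABLE_, %ebx
    static final int GOT_OFFSET = 0x3000;

    // Link-time constant placed in the addl immediate: distance from the addl to the GOT.
    static final int GOTPC = GOT_OFFSET - ADDL_OFFSET;

    // Simulates the two-instruction sequence for a library loaded at 'base'
    // and returns the value left in %ebx (int arithmetic wraps like a 32-bit register).
    static int ebxAfterSequence(int base) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(base + ADDL_OFFSET); // call: push the return address (the addl)
        int ebx = stack.peek();         // thunk: movl (%esp), %ebx
        stack.pop();                    // thunk: ret pops the return address
        ebx += GOTPC;                   // addl $_GLOBAL_OFFSET_TABLE_, %ebx
        return ebx;
    }

    public static void main(String[] args) {
        for (int base : new int[] {0x08000000, 0x40000000, 0xb7f00000}) {
            System.out.printf("base=%08x ebx=%08x got=%08x%n",
                    base, ebxAfterSequence(base), base + GOT_OFFSET);
        }
    }
}
```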

Back in our 64-bit example, the leaq instruction writes into the %rsi register the value offsetOf(hellotxt) + valueIn(%rip), where %rip holds the address of the instruction following the leaq. To be a bit more precise, the pseudo-code value offsetOf(hellotxt) is a 32-bit displacement encoded in the instruction and filled in from a PC-relative relocation when the code is linked. Try this resource for much more detail than I want to go into here. All you need to know is that if you intend to use your code in a shared library, you need to use position-independent code.
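The arithmetic can be sketched in Java. The offsets are invented for illustration, RipRelativeDemo is a hypothetical name, and the 7-byte instruction length assumes the usual encoding of this leaq (REX.W prefix, opcode, ModRM byte, 32-bit displacement). The point is that the displacement depends only on the distance between the instruction and the data, so the same instruction bytes find hellotxt at any load address.

```java
public class RipRelativeDemo {

    // Hypothetical offsets within the shared library (relative to its load address).
    static final long LEAQ_OFFSET = 0x1105;     // where "leaq hellotxt(%rip), %rsi" starts
    static final int LEAQ_LENGTH = 7;           // REX.W + opcode + ModRM + disp32
    static final long HELLOTXT_OFFSET = 0x4010; // where hellotxt ends up in .data

    // Link time: the displacement is target minus the address of the next instruction.
    static int displacement() {
        return (int) (HELLOTXT_OFFSET - (LEAQ_OFFSET + LEAQ_LENGTH));
    }

    // Run time: %rip holds the address after the leaq; adding the displacement
    // yields hellotxt's address wherever the library was loaded.
    static long effectiveAddress(long loadBase) {
        long rip = loadBase + LEAQ_OFFSET + LEAQ_LENGTH;
        return rip + displacement();
    }

    public static void main(String[] args) {
        for (long base : new long[] {0x7f0000000000L, 0x7f1234560000L}) {
            System.out.printf("base=%x hellotxt at %x (expected %x)%n",
                    base, effectiveAddress(base), base + HELLOTXT_OFFSET);
        }
    }
}
```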

Right, the next incarnation of the Makefile gets a bit more interesting:

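all: HelloWorld.class libHelloWorld.so
HelloWorld.class: HelloWorld.java
	javac -cp . HelloWorld.java
# Library name assumed: lib<name>.so is the file System.loadLibrary("HelloWorld") looks for on Linux
libHelloWorld.so: HelloWorld.o
	ld -shared -o libHelloWorld.so HelloWorld.o
HelloWorld.o: HelloWorld.s
	as --64 -g -o HelloWorld.o HelloWorld.s
clean:
	rm -f *.o *.class *.so

This Makefile builds and links a shared library by virtue of the -shared argument to ld. Note that -fPIC is a compiler (gcc) option rather than a linker one, so it doesn’t belong on the ld command line; our code is position independent because we wrote it that way, using RIP-relative addressing. PIC stands for position-independent code, of course.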

You should now be in a position to execute the following command line:

java -Djava.library.path=$(pwd) -cp . HelloWorld

Which, of course, should result in the expected output. ;)
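If the JVM throws an UnsatisfiedLinkError instead, the library file usually isn’t where System.loadLibrary expects it. The sketch below (LibraryPathCheck is a hypothetical name) lists the candidate files under java.library.path using System.mapLibraryName, which on Linux turns HelloWorld into libHelloWorld.so. It assumes the library is called HelloWorld, skips empty path entries, and ignores any other locations the JVM may search.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathCheck {

    // Files under java.library.path that could satisfy System.loadLibrary(name).
    // Empty path entries are skipped in this sketch.
    static List<File> candidates(String name) {
        String fileName = System.mapLibraryName(name); // "libHelloWorld.so" on Linux
        List<File> result = new ArrayList<>();
        String path = System.getProperty("java.library.path", "");
        for (String dir : path.split(File.pathSeparator)) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "HelloWorld";
        for (File f : candidates(name)) {
            System.out.println((f.isFile() ? "found:   " : "missing: ") + f);
        }
    }
}
```

Running it with the same -Djava.library.path=$(pwd) setting shows whether libHelloWorld.so is sitting where the JVM will look.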


[1] I’ve changed jobs a few times since this was written ;)