Calling external functions in C, and calling C functions from other languages, is a common issue in OS programming, especially where the other language is assembly. This page will concentrate primarily on the latter case, but some consideration is made for other languages as well.
Some of what is described here is imposed by the x86 architecture, some is specific to the GNU GCC toolchain. Some is configurable, and you could make your own GCC target to support a different calling convention. Currently, this page makes no effort to differentiate which is which.
Basics
As a general rule, a function which follows the C calling conventions, and is appropriately declared (see below) in the C headers, can be called as a normal C function. Most of the burden for following the calling rules falls upon the assembly program.
Cheat Sheets
Here is a quick overview of common calling conventions. Note that the calling conventions are usually more complex than represented here (for instance, how is a large struct returned? How about a struct that fits in two registers? How about va_list’s?). Look up the specifications if you want to be certain. It may be useful to write a test function and use gcc -S to see how the compiler generates code, which may give a hint of how the calling convention specification should be interpreted.
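As a sketch of the kind of test function meant here, the file below defines one struct that may fit in two registers and one that is clearly too large for that. The names pair, big, make_pair and make_big are made up for illustration. Compiling it with gcc -S and reading the generated assembly shows how your target returns each of them.

```c
/* Hypothetical probe types for inspecting the calling convention. */
struct pair { long lo; long hi; };   /* small: may be returned in two registers */
struct big  { long v[8]; };          /* large: usually returned through memory */

/* Returns a small struct by value. */
struct pair make_pair(long a, long b)
{
    struct pair p = { a, b };
    return p;
}

/* Returns a large struct by value. */
struct big make_big(long seed)
{
    struct big b;
    for (int i = 0; i < 8; i++)
        b.v[i] = seed + i;
    return b;
}
```

Comparing the output at different optimisation levels (for instance -O0 and -O2) can make it easier to tell convention-mandated moves from compiler housekeeping.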
Note 3: Stack is 16 byte aligned at time of call. The call pushes %rip, so the stack is 16-byte aligned again if the callee pushes %rbp.
Note 4: Stack is 8 byte aligned at all times outside of prologue/epilogue of function.
System V ABI
The System V ABI is one of the major ABIs in use today and is virtually universal among Unix systems. It is the calling convention used by toolchains such as i686-elf-gcc and x86_64-elf-gcc.
External References
In order to call a foreign function from C, it must have a correct C prototype. Thus, if the function fee() takes the arguments fie, foe, and fum, in C calling order, and returns an integer value, then the corresponding header file should have the following prototype:
int fee(int fie,char foe,double fum);
Similarly, any global variables defined in the assembly code must be declared extern in C.
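On the C side, such declarations might look like the following sketch. The variable name tick_count and the caller use_fee are hypothetical, and the definitions at the bottom are plain C stand-ins for what would normally live in the assembly file; they are included only so the example links on its own.

```c
/* Declarations that would normally live in a header. */
int fee(int fie, char foe, double fum);   /* function implemented elsewhere */
extern unsigned int tick_count;           /* hypothetical global defined elsewhere */

/* A C caller uses them like any other function or variable. */
int use_fee(void)
{
    tick_count++;
    return fee(1, 'a', 2.0);
}

/* Stand-ins for the assembly definitions (assumption: real code would
   define these two symbols in an assembly file and export them). */
unsigned int tick_count = 0;

int fee(int fie, char foe, double fum)
{
    return fie + foe + (int)fum;
}
```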
C functions in assembly or other languages must be declared as appropriate for the language. For example, in NASM, the C function
int foo(int bar, char baz, double quux);
would be declared as extern foo.
Also, in most assembly languages, a function or variable that is to be exported must be declared global (global fee in NASM, for instance).
Name Mangling
In some object formats (a.out), the name of a C function is automagically mangled by prepending it with an underscore ("_"). Thus, to call a C function foo() in assembly with such a format, you must declare it as extern _foo instead of extern foo. This requirement does not apply to ELF, although some COFF and PE targets (32-bit Windows, for example) still prepend the underscore.
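GCC and Clang predefine the macro __USER_LABEL_PREFIX__, which expands to whatever prefix the target adds to C symbols (an underscore or nothing). The following is a small sketch for checking your own toolchain; the helper names c_symbol_prefix and print_asm_name are invented for this example.

```c
#include <stdio.h>

/* Two-step stringification so the macro is expanded before it is quoted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: returns "_" on targets that prepend an underscore
   to C names, and "" on targets that do not. */
const char *c_symbol_prefix(void)
{
    return STR(__USER_LABEL_PREFIX__);
}

/* Prints the extern line an assembly file would need for the C symbol `name`. */
void print_asm_name(const char *name)
{
    printf("extern %s%s\n", c_symbol_prefix(), name);
}
```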
C++ name mangling is much more severe, as the C++ compiler encodes the type information from the parameter list into the symbol. (This is what enables function overloading in C++ in the first place.) The Binutils package contains the tool c++filt, which decodes mangled names and can therefore be used to check which mangled symbol belongs to which function.
Registers
The general registers EBX, ESI, EDI, and EBP, along with the segment registers DS, ES, and SS, must be preserved by the called function. If you use them, you must save them first and restore them afterwards. Conversely, EAX and EDX are used for return values, and thus should not be preserved. The other registers do not need to be saved by the called function, but if they are in use by the calling function, then the calling function should save them before the call is made and restore them afterwards.
Passing Function Arguments
GCC/x86 passes function arguments on the stack. These arguments are pushed in reverse order from their order in the argument list. Furthermore, since the x86 protected-mode stack operations operate on 32-bit values, each value is always pushed as a full 32-bit value, even if the actual value is narrower. Thus, for function foo(), the value of quux (a 64-bit FP value) is pushed first as two 32-bit values, high 32 bits first so that the low 32 bits end up at the lower address; the value of baz is pushed as the low byte of a 32-bit value; and then finally bar is pushed as a 32-bit value.
To pass arguments to a C function, the calling function must push the argument values as described above. Thus, to call foo() from a NASM assembly program, you would do something like this:
push edx        ; high 32 bits of quux
push eax        ; low 32 bits of quux
push ebx        ; baz (held in BL; pushed as a full 32-bit value)
push ecx        ; bar
call foo
add esp, 16     ; the caller removes the four pushed values afterwards
Accessing Function Arguments
In the GCC/x86 C calling convention, the first thing any function that accepts formal arguments should do is push the value of EBP (the frame base pointer of the calling function), then copy the value of ESP to EBP. This sets the function’s own frame pointer, which is used to track both the arguments and (in C, or in any properly reentrant assembly code) the local variables.
To access arguments passed by a C function, you need to use EBP plus an offset equal to 4 * (n + 2), where n is the number of the parameter in the argument list (not the number in the order it was pushed by), zero-indexed. This assumes that every earlier argument occupies a single 32-bit slot; a 64-bit argument takes two. The + 2 is an added offset for the calling function’s saved frame pointer and return pointer (the latter pushed automatically by CALL, and popped by RET).
Thus, in function fee, to move fie into ECX, foe into BL, and fum into EDX (low half) and EAX (high half), you would write (in NASM):
mov ecx, [ebp+8]    ; fie
mov bl, [ebp+12]    ; foe
mov edx, [ebp+16]   ; low 32 bits of fum
mov eax, [ebp+20]   ; high 32 bits of fum
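The frame layout can also be modelled in portable C. The sketch below does not touch the real machine stack; it simulates a 32-bit frame in an array and reads the arguments of fee() back at EBP + 4 * (n + 2). All names (sim_stack, push32, arg_offset, sim_call_fee, sim_arg, sim_read_fum) are invented for this example, and it assumes a 64-bit IEEE double.

```c
#include <stdint.h>
#include <string.h>

/* Simulated 32-bit stack: grows downwards, one array element per slot. */
static uint32_t sim_stack[16];
static uint32_t sim_esp = sizeof sim_stack;   /* byte offset of the stack top */

static void push32(uint32_t value)
{
    sim_esp -= 4;
    sim_stack[sim_esp / 4] = value;
}

/* Offset from EBP of 32-bit argument slot n (zero-indexed). */
static uint32_t arg_offset(uint32_t n)
{
    return 4 * (n + 2);
}

/* Reads argument slot n relative to the simulated frame pointer. */
uint32_t sim_arg(uint32_t ebp, uint32_t n)
{
    return sim_stack[(ebp + arg_offset(n)) / 4];
}

/* Simulates "call fee(fie, foe, fum)" plus fee's prologue; returns EBP. */
uint32_t sim_call_fee(int32_t fie, char foe, double fum)
{
    uint64_t bits;
    memcpy(&bits, &fum, sizeof bits);       /* raw bit pattern of the double */

    push32((uint32_t)(bits >> 32));         /* high half pushed first ...        */
    push32((uint32_t)bits);                 /* ... so the low half sits lower    */
    push32((uint32_t)(unsigned char)foe);   /* char widened to a 32-bit slot     */
    push32((uint32_t)fie);
    push32(0xdeadbeefu);                    /* return address from CALL (dummy)  */
    push32(0);                              /* push ebp (dummy caller frame)     */
    return sim_esp;                         /* mov ebp, esp                      */
}

/* Reassembles fum from slots 2 (low half) and 3 (high half). */
double sim_read_fum(uint32_t ebp)
{
    uint64_t bits = (uint64_t)sim_arg(ebp, 2) | ((uint64_t)sim_arg(ebp, 3) << 32);
    double fum;
    memcpy(&fum, &bits, sizeof fum);
    return fum;
}
```

Slot 0 lands at EBP + 8, slot 1 at EBP + 12, and the two halves of the double at EBP + 16 and EBP + 20, matching the offsets used in the NASM code.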
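As stated earlier, return values in GCC are passed using EAX and EDX: values of up to 32 bits are returned in EAX, and 64-bit values in EDX:EAX (high half in EDX). If a value exceeds 64 bits, it must be returned through a pointer.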
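As a rough illustration, the first two functions below model how a 64-bit result is split between the two return registers and put back together, and the last one shows the pointer form that a wider result takes. Ordinary struct fields stand in for the registers, and all names are hypothetical.

```c
#include <stdint.h>

/* Stand-ins for the two return registers (not real register access). */
struct reg_pair { uint32_t eax; uint32_t edx; };

/* Models the split of a 64-bit result: low half in EAX, high half in EDX. */
struct reg_pair split_return(uint64_t value)
{
    struct reg_pair r;
    r.eax = (uint32_t)value;
    r.edx = (uint32_t)(value >> 32);
    return r;
}

/* What the caller does to reassemble the value. */
uint64_t join_return(struct reg_pair r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* A result wider than 64 bits: the caller supplies the storage and the
   callee fills it in through the pointer. */
struct wide { uint32_t word[4]; };

void make_wide(struct wide *out, uint32_t seed)
{
    for (int i = 0; i < 4; i++)
        out->word[i] = seed + (uint32_t)i;
}
```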
A systems programmer usually writes programs that do not make the system call directly; instead, the program just specifies which system call to use. This involves a calling convention that depends on the hardware architecture of the system where the kernel sits. Hence different architectures have different calling conventions.
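On Linux with glibc, for example, the syscall() wrapper takes the system call number and leaves the register setup to the library. This is a minimal sketch, assuming a Linux target; the helper name pid_via_syscall is invented here.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Asks for the process ID by system call number instead of calling getpid().
   The library wrapper, not this code, places the number and arguments in
   the registers the kernel expects on the current architecture. */
long pid_via_syscall(void)
{
    return syscall(SYS_getpid);
}
```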
A calling convention is an implementation-level design for how subroutines receive parameters from their caller and how the results are returned. Differences in various implementations include where parameters, return values, return addresses and scope links are placed (registers, stack or memory etc.), and how the tasks of preparing for a function call and restoring the environment afterward are divided between the caller and the callee.
Calling convention variation
Below is a list of some of the ways in which calling conventions vary between architectures:
Which registers the called function must preserve for the caller.
How the task of setting up for and cleaning up after a function call is divided between the caller and the callee.
How the return value is delivered from the callee back to the caller: on the stack, in a register, or within the heap.
Where parameters, return values and return addresses are placed.
The order in which actual arguments for formal parameters are passed.
Comparing x86-32 and x86-64
A single CPU architecture always has more than one possible calling convention, but the industry has agreed on some general approaches across architectures from different producers. The 32-bit x86 architecture has 8 general-purpose 32-bit registers, while x86-64 extends them to 64 bits and adds 8 more (%r8 to %r15). Hence there is a difference in the implementation of calling conventions. Below is a comparison of the major calling conventions on these two architectures.
x86 32-bit
The registers used for system calls are %ebx, %ecx, %edx, %esi, %edi and %ebp.
Function parameters are passed on the stack using the PUSH mechanism: the last parameter is pushed first, and once all parameters have been pushed the call instruction is executed.
If a system call takes more than six arguments, %ebx must contain the memory location where the list of arguments is stored.
The return value is stored in the register %eax.
The 32-bit int ABI (application binary interface) is usable in 64-bit code.
x86 64-bit
The registers used for system calls are %rdi, %rsi, %rdx, %r10, %r8 and %r9.
Function parameters are first divided into classes. The class of each parameter determines the manner in which it is passed to the called function. If there are more than six INTEGER parameters, the seventh and later are passed on the stack.
System calls are limited to six arguments.
The return value is stored in the register %rax.
The 64-bit ABI calls cannot be used on a 32-bit system.
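To make the x86-64 register assignment concrete, here is a hedged sketch of a raw system call written with GCC inline assembly. It assumes Linux on x86-64, where the call number goes in %rax, and falls back to the libc syscall() wrapper on any other target. raw_syscall3 is a hypothetical helper, not a library function.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Issues a system call with up to three arguments. On the inline-assembly
   path a failure comes back as a negative error number rather than -1. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                   /* result returned in %rax   */
                      : "a"(nr),                    /* call number in %rax       */
                        "D"(a1), "S"(a2), "d"(a3)   /* %rdi, %rsi, %rdx          */
                      : "rcx", "r11", "memory");    /* overwritten by syscall    */
    return ret;
#else
    return syscall(nr, a1, a2, a3);                 /* let libc pick registers   */
#endif
}
```

A fourth argument would go in %r10 rather than %rcx, because the syscall instruction itself overwrites %rcx.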