justinian / linux-x64-nasm-cheatsheet.md
In addition, there are XMM0 .. XMM15: 128 bit wide registers that hold floating point numbers (a float uses 32 of those bits, a double 64).
Put function arguments (first to last) in the following registers (64 bit representations): RDI, RSI, RDX, RCX, R8, R9. Any further arguments are pushed onto the stack in reverse order and have to be cleaned up by the caller! Floating point arguments go in XMM0 to XMM7.
Return values are stored in RAX (int) or XMM0 (float).
RBP, RBX, R12, R13, R14, R15 will not be changed by the called function (it has to save and restore them if it uses them); all others may be.
Align the stack pointer (RSP) to 16 bytes before a call; the call instruction itself pushes 8 bytes (the return address)!
Keep in mind that strings (in C) are 0-terminated
Like in a normal C program, the label that is (de facto) called first is main, with the argument count argc in RDI and the char** argv in RSI (the command line arguments as in C’s main function).
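As a rough sketch of these rules in practice, the following program implements main, takes argc from EDI and hands it to C’s printf. The file name call_printf.asm and the label fmt are made up for this example. Zeroing AL before a variadic call and the wrt ..plt suffix (for position independent executables) are details of the System V ABI and of NASM that go beyond the notes above, so treat them as assumptions about your toolchain.

```nasm
; call_printf.asm (hypothetical file name)
; build: nasm -f elf64 call_printf.asm && gcc call_printf.o -o call_printf
        default rel                 ; address data RIP-relative (needed for PIE)
        global  main                ; main is called by the C runtime's _start
        extern  printf              ; provided by libc at link time

        section .data
fmt:    db      "argc = %d", 10, 0  ; C strings are 0-terminated

        section .text
main:
        sub     rsp, 8              ; the call to main pushed 8 bytes: realign RSP to 16
        mov     esi, edi            ; 2nd argument: argc (arrived in RDI)
        lea     rdi, [fmt]          ; 1st argument: address of the format string
        xor     eax, eax            ; variadic call: AL = number of XMM arguments (0)
        call    printf wrt ..plt
        add     rsp, 8              ; undo our own stack adjustment
        xor     eax, eax            ; return value 0 in RAX
        ret

; optional: marks the stack as non-executable, avoids a linker warning
        section .note.GNU-stack noalloc noexec nowrite progbits
```

Started as ./call_printf a b, this should print argc = 3, since the program name itself counts as an argument.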
Definition size | Definition instruction |
---|---|
8 bit | db |
16 bit | dw |
32 bit | dd |
64 bit | dq |
128 bit | ddq / do |
float | dd |
double | dq |
extended precision | dt |
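To make the sizes tangible, here is a small sketch that defines one value per directive and exits with the total number of bytes it reserved. All label names and the file name sizes.asm are invented for this example.

```nasm
; sizes.asm (hypothetical file name)
; build: nasm -f elf64 sizes.asm && ld sizes.o -o sizes
        global  _start

        section .data
data_start:
val_byte:   db  0x7f                    ;  1 byte
val_word:   dw  0x1234                  ;  2 bytes
val_dword:  dd  0x12345678              ;  4 bytes
val_qword:  dq  0x123456789abcdef0      ;  8 bytes
val_float:  dd  1.5                     ;  4 bytes, float
val_double: dq  1.5                     ;  8 bytes, double
val_ext:    dt  1.5                     ; 10 bytes, extended precision
val_text:   db  "hi", 0                 ;  3 bytes, 0-terminated string
data_end:

        section .text
_start:
        mov     edi, data_end - data_start  ; computed by the assembler: 40
        mov     eax, 60                     ; exit
        syscall
```

Running ./sizes; echo $? should show 40, the sum of the sizes in the comments.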
cmp op1, op2 -> mimics sub op1, op2 but discards the result and only changes the flags (zero, carry, sign, overflow, ...) for comparing.
- j~ x -> jump to x if ~
- cmov~ x, y -> conditional mov x, y if ~
- set~ x -> set x to 1 if ~, else to 0; x is 8 bit reg
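The three instruction families above can be sketched in one tiny program: it computes the larger of two numbers once with a conditional jump and once with a conditional move, then uses sete to check that both results agree. The values 7 and 42 and the file name maxdemo.asm are arbitrary choices for illustration.

```nasm
; maxdemo.asm (hypothetical file name)
; build: nasm -f elf64 maxdemo.asm && ld maxdemo.o -o maxdemo
        global  _start

        section .text
_start:
        mov     rax, 7
        mov     rbx, 42

        ; variant 1: conditional jump
        cmp     rax, rbx        ; sets flags like rax - rbx, operands stay untouched
        jge     .have_max       ; jump if rax >= rbx (signed)
        mov     rax, rbx        ; not taken the jump: rbx was larger
.have_max:                      ; rax = 42

        ; variant 2: conditional move, no branch needed
        mov     rcx, 7
        cmp     rcx, rbx
        cmovl   rcx, rbx        ; rcx = rbx only if rcx < rbx, so rcx = 42

        ; set on condition: turn a comparison into 0 or 1
        xor     edx, edx        ; clear rdx before cmp, because xor changes the flags
        cmp     rax, rcx
        sete    dl              ; dl = 1, as both variants agree

        lea     rdi, [rax + rdx] ; exit status: 42 + 1 = 43
        mov     eax, 60          ; exit
        syscall
```

Running ./maxdemo; echo $? should show 43.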
- global -> exposes entry point
- extern -> declares a function in another linked .o file (e.g. C function, other asm file)
- section -> sets section, usually:
  - .text -> program code
  - .data -> data
The program entry point of a standalone program is the label _start. When linked with gcc, the C runtime provides _start, which initializes and then calls main, which should then be implemented by the program.
- put syscall number in RAX (e.g. on Linux: 60 for exit, 1 for write)
- put arguments in RDI, RSI, RDX, R10, R8, R9 (like when calling a C function, see above, except that R10 replaces RCX)
- execute the syscall instruction
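A minimal sketch of these steps, with one addition: the kernel hands its result back in RAX (and overwrites RCX and R11 along the way). Here the return value of write, the number of bytes written, is reused as the exit status. The file name sysret.asm and the message are made up, and the expected status assumes the write succeeds completely.

```nasm
; sysret.asm (hypothetical file name)
; build: nasm -f elf64 sysret.asm && ld sysret.o -o sysret
        global  _start

        section .data
msg:    db      "hi", 10

        section .text
_start:
        mov     eax, 1          ; syscall number: write
        mov     edi, 1          ; argument 1: file descriptor 1 = stdout
        mov     rsi, msg        ; argument 2: address of the buffer
        mov     edx, 3          ; argument 3: number of bytes
        syscall                 ; result in RAX; RCX and R11 are overwritten

        mov     rdi, rax        ; bytes written becomes the exit status
        mov     eax, 60         ; syscall number: exit
        syscall
```

Running ./sysret; echo $? should print hi followed by 3.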
Introduction
I have tutored a course called ‘Computer Architecture’ here at the Free University of Berlin more than five times over the past three years, and most students had a common problem: getting into writing their very first assembly program. As we use 64 bit Linux, this tutorial will focus on 64 bit ELF.
Why Assembly?
I know of only one reason to write programs in assembly: it’s as close as you can get to the way your CPU works. That is its biggest disadvantage as well: you’re programming from the perspective of a CPU, not that of a human brain. This proves difficult for a wide range of students who are not familiar with CPU architecture. Thus this tutorial will not only try to teach you some assembly basics and NASM syntax, it will also try to shed some light on the principles of a computer.
Why NASM?
NASM (Netwide Assembler) is an open source assembler for the 80×86 and x86-64 architectures, and a pretty good one at that. Compared to MASM, TASM or GAS it is rather easy to use and provides a solid amount of syntactic sugar.
How?
When trying something new we usually want to get some positive feedback as soon as possible. The normal approach for an assembly tutorial would be to list all the requirements, work through them step by step, and only then introduce the audience to the actual coding. As I find this highly unsatisfying, I’ll get started right away by having you produce your first few lines of code, and provide you with the help necessary to meet any requirements along the way. For those of you preferring the more classic approach, here is the requirements section.
Some Coding
Hello World In NASM
So let’s get right to it! Here is the code for a simple Hello World program written in NASM. Go ahead and copy and paste the code into a text editor of your choice and save it as hw.asm.
```nasm
section .data
    msg db "Hello World!", 10   ; db: data byte, 10: ASCII newline
section .text
    global _start
_start:
    mov rax, 1                  ; write
    mov rdi, 1                  ; to stdout
    mov rsi, msg                ; starting at msg
    mov rdx, 13                 ; for len bytes
    syscall
    mov rax, 60                 ; exit
    mov rdi, 0                  ; with success
    syscall
```
Once you have the folder containing hw.asm with the above content opened in a shell of your choice, type

```shell
nasm -f elf64 hw.asm && ld hw.o -o hw && ./hw
```
and hit Return. Now one of two things can happen. It will either work (you’ll see Hello World! written on your console) or it won’t (everything else). I’ll assume it worked, otherwise take a look at the troubleshooting section.
HINT: Typing program1 && program2 in a shell is equivalent to typing program1 and hitting the Return button first and then typing program2, except that program2 is only run if program1 succeeded.
If you ever wrote a hello world program before, you’ll find 13 lines of code a lot for such a simple thing. That is because we’re used to using libraries. If we write assembly, we’ll just have to do everything manually (not true btw, more on that later 😉). Let’s start in line one.
- 1 section .data: this is where we define memory with an initial value.
- 2 msg db "Hello World!", 10: we get a piece of memory and name its starting address msg. It is as many data bytes long as the string Hello World! has characters (12, that is), plus one more for the ASCII newline, which is the 10. Everything preceded by a ; is a comment.
- 3 section .text: this is where our actual code starts.
- 4 global _start: the label _start shall be visible globally. That’s a good thing, as it is the standard entry point for your Linux programs.
- 5 _start: creates the label _start (don’t forget the colon). A label names a location in your code; this one is where execution begins when the program is started.
- 6 mov rax, 1: a line of NASM code starts with a mnemonic (an instruction), followed by a number of arguments (the number can be zero :p). Here mov rax, 1 means move the number 1 into the CPU register RAX.
- 7 mov rdi, 1: as above, just into CPU register RDI.
- 8 mov rsi, msg: I think you have a solid idea what this does; just keep in mind that msg is the address of the point in memory where our text starts.
- 9 mov rdx, 13: yeah, repetitive, ain’t it… We do indeed store a 13 in CPU register RDX.
- 10 syscall: hands control over to the operating system, which will now do a couple of things. First it looks up the kind of syscall in RAX. As we put a 1 there, it knows it’s a write syscall. The write syscall needs three additional parameters. The 1 in RDI tells it to write to stdout (your standard output, usually your console), msg in RSI tells it where the text to be written starts, and finally the 13 in RDX tells it to write the first 13 bytes. So basically we will write Hello World! with an additional ASCII newline character to stdout.
With that knowledge the next three lines are rather easy. All we need to know is that we need to manually terminate our program. A 60 in RAX tells the OS it shall perform the exit syscall, which only expects one parameter in RDI, a 0 in our case, being the return value of our program.
HINT: A list of syscalls can be found on Ryan A. Chapman’s blog.
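Counting the 13 bytes by hand gets old quickly. As a small variation on the program above, one can let the assembler do the counting: in NASM, $ stands for the current position, so $ - msg right after the string is its length. The name msg_len and the file name hw_len.asm are my own choices for this sketch.

```nasm
; hw_len.asm (hypothetical file name)
; build: nasm -f elf64 hw_len.asm && ld hw_len.o -o hw_len
section .data
    msg     db  "Hello World!", 10
    msg_len equ $ - msg         ; current position minus start of msg = 13

section .text
    global _start
_start:
    mov rax, 1                  ; write
    mov rdi, 1                  ; to stdout
    mov rsi, msg                ; starting at msg
    mov rdx, msg_len            ; for msg_len bytes, no manual counting
    syscall
    mov rax, 60                 ; exit
    mov rdi, 0                  ; with success
    syscall
```

The output is the same as before, but the length now follows along whenever the text changes.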
Addition Function In NASM Used In C-Program
So in order to get things done in pure NASM we need quite some knowledge: not only the mnemonics, but also the syscalls and a proper understanding of our CPU. To make things a little easier, one can utilize a higher-level programming language like C. The following example will do an addition in assembly and use that function from a C program to write the result to stdout.
Go ahead and save this as addit.asm,

```nasm
section .text           ; code
    global addit
addit:
    mov rax, rdi        ; param1 in rax
    add rax, rsi        ; sum in rax
    ret
```

save this as addit.c in the same folder,

```c
#include <stdio.h>

extern int addit(int, int);

int main(void)
{
    printf("%d\n", addit(100, 78));
}
```

then open that folder in a shell and run it with this command:
```shell
nasm -f elf64 addit.asm && gcc -std=c11 addit.c addit.o -o addit && ./addit
```
This is a really nice thing to have, as we don’t need to worry about those syscalls. We’ll handle all the syscalls from within C (which has libraries for it) and let our NASM function do the real work.
There are only two things our addit.asm needs to know:
- Where do my parameters come from? First parameter (the 100) in RDI and second parameter (the 78) in RSI.
- Where do I need to save the result? In RAX.
This also tells us that add rax, rsi in line 5 computes like: take the value from RSI, add it to the value in RAX and store the result in RAX. The ret in line 6 ends the function call.
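The same two questions answer themselves for more parameters: a third one arrives in RDX. The following sketch uses a hypothetical function addit3 and, to stay self-contained, calls it from _start instead of from C, returning the sum as the exit status. The numbers and the file name addit3.asm are arbitrary.

```nasm
; addit3.asm (hypothetical file name)
; build: nasm -f elf64 addit3.asm && ld addit3.o -o addit3
        global  _start

        section .text
; addit3(a, b, c): parameters in RDI, RSI, RDX, result in RAX
addit3:
        mov     rax, rdi        ; param1 in rax
        add     rax, rsi        ; plus param2
        add     rax, rdx        ; plus param3
        ret

_start:
        mov     rdi, 10         ; first parameter
        mov     rsi, 20         ; second parameter
        mov     rdx, 12         ; third parameter
        call    addit3          ; result comes back in RAX
        mov     rdi, rax        ; use the sum as exit status
        mov     eax, 60         ; exit
        syscall
```

Running ./addit3; echo $? should show 42.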
HINT: It is possible to write Mnemonics and Registers in upper- as well as lowercase in NASM.
If we take a look at the C program, the advantage really shows. Most of the functionality our 13 line Hello World NASM program had is found in line 7 of the C program …
Requirements
There are a couple of things you need in order to get started with writing code in NASM, being able to assemble and link it and then finally execute it.
- You will need a 64 bit Linux OS; either native, in a VM or through ssh.
- You will need some sort of text editor.
- You will need a Shell.
- You will need to have NASM installed.
- You will need to have ld installed.
- You will need to have GCC installed.
Troubleshooting
If you landed in this section, you’re probably not familiar with Linux. This usually happens to people who have mainly used Windows or macOS. That is the majority of people, even among computer science students, so you’re in good company ;). But don’t fret, as I have taught hundreds of students who started with little or no knowledge how to code in NASM. So let me walk you through those requirements.
Operating System
For an OS you’ll need a 64 bit Linux. I recommend Linux Mint 64 bit Cinnamon if you have no idea where to start. As you probably want to read less and do more, you might like this installation tutorial. Here you can get the full User Guide in your language. This will set you up with an OS that feels a lot like Windows, with a lot of graphical tools to use. I will write a few words about Linux distros soon, so keep an eye out.
HINT: A good place to get to know some Linux Distros is Distro Watch.
Text Editor
There are some really great text editors out there and I’m not going to judge them here. If you don’t know where to start, I recommend ATOM because it’s open source and intuitive to use if you come from Windows or macOS.
In order to install ATOM in Linux Mint type
```shell
sudo add-apt-repository ppa:webupd8team/atom
sudo apt-get update
sudo apt-get install atom
```
in a shell of your choice.
Shell
As with editors, there are a lot of great shells out there. I myself am using ZSH with some extras I will cover in a future post. For beginners I recommend fish, as it’s easy to set up and use.
Additional Software
Install the required software by typing these lines in a terminal:
```shell
sudo apt-get install gcc
sudo apt-get install nasm
```