Linux syscall and RISC-V assembly

Sample of RISC-V assembly code

In the Linux kernel, a syscall (system call) is the interface that gives access to the kernel's basic functions. System calls are described in section 2 of the man pages. The introduction is in man 2 syscall (indirect system call), and the list of functions is in man 2 syscalls. Update: the official Linux kernel documentation has lectures on System Calls, including « Linux system calls implementation », « VDSO and virtual syscalls » and « Accessing user space from system calls ».

This article follows the previous one about overall RISC-V progress and the available tools to play with. I will try to keep it short and focus on Linux syscall usage and the RISC-V assembly case.

Table of Contents

* Description section of the man page
* Getting the list of functions and how to access them
* Passing parameters
* Function number and registers of return values
* Return values and error code
* Compiling and executing on virtual environment
* Update: Bronzebeard assembler and its baremetal environment for real hardware

Description section of the man page

* syscall() is a small library function that invokes the system call whose assembly language interface has the specified number with the specified arguments. Employing syscall() is useful, for example, when invoking a system call that has no wrapper function in the C library.
* syscall() saves CPU registers before making the system call, restores the registers upon return from the system call, and stores any error returned by the system call in errno(3).
* Symbolic constants for system call numbers can be found in the header file <sys/syscall.h>.
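
As a minimal sketch of what the description above means in C (the helper name raw_getpid is hypothetical, chosen only for illustration), the following calls the kernel through syscall() with a symbolic constant instead of going through the getpid() wrapper:

```c
#define _GNU_SOURCE          /* makes glibc declare syscall() in <unistd.h> */
#include <sys/syscall.h>     /* SYS_getpid and the other SYS_* constants */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID without the getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both raw_getpid() and getpid() should report the same process ID, since the wrapper ends up issuing the same system call.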

Here you can find functions for file access (open/close/read/write/flush), sockets, ioctl, uid, gid, pid, messages, ptrace, restarting the system, etc.

Getting the list of functions and how to access them

As far as I know, only part of the syscall functions are easily accessible from assembly. They are declared in /usr/include/unistd.h, and the function numbers assigned by the ABI are defined in /usr/include/asm-generic/unistd.h.

The most practical approach I found is to use /usr/include/asm-generic/unistd.h to see which functions are available, and their respective man pages for the function prototypes. For example:
* asm-generic: #define __NR_read 63
* man 2 read: ssize_t read(int fd, void *buf, size_t count);
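
The same numbers can be checked from C rather than by reading the header. This is only a sketch with hypothetical helper names; it relies on <sys/syscall.h> exposing both the __NR_* numbers and their SYS_* aliases for the architecture being compiled for. On a riscv64 build the two helpers should return 63 and 64; other architectures use their own tables (x86-64, for instance, uses 0 and 1).

```c
#include <sys/syscall.h>   /* pulls in the __NR_* numbers and the SYS_* aliases */

/* Hypothetical helpers: report the numbers for the architecture we build for. */
long read_syscall_number(void)
{
    return __NR_read;      /* kernel-header spelling */
}

long write_syscall_number(void)
{
    return SYS_write;      /* libc spelling of __NR_write */
}
```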

The ABI for RISC-V, as defined in man 2 syscall in the section Architecture calling conventions, assigns registers according to the following two tables.

Passing parameters

The second table in this section of the man page shows the registers used to pass the system call arguments.

Arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
riscv         a0    a1    a2    a3    a4    a5    -

The arguments go in the order of the function definition. For example, for the read (63) function:

ssize_t      read  ( int fd, void *buf, size_t count );
a0(result) = a7(63)( a0(fd),  a1(*buf),    a2(count) )

As a reminder, the 3 standard I/O file descriptors are STDIN=0, STDOUT=1, STDERR=2; other descriptors are obtained by opening a file with open and released with close.

So we set the arguments like this. x0 is the register that always reads as 0:

        addi  a0, x0, 0       # set STDIN as source
        la    a1, buffer_addr # load address of the buffer
        addi  a2, x0, 3       # read 3 bytes
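
For comparison, here is a hedged C sketch of the same three-argument call made through syscall(). The helper name raw_read is hypothetical; the point is only that the arguments are given in the same order as the registers (fd for a0, buf for a1, count for a2).

```c
#define _GNU_SOURCE
#include <stddef.h>          /* size_t */
#include <sys/syscall.h>     /* SYS_read */
#include <sys/types.h>       /* ssize_t */
#include <unistd.h>          /* syscall(), pipe(), write() */

/* Hypothetical helper: read() without the libc wrapper.
   Argument order matches the registers: fd -> a0, buf -> a1, count -> a2. */
ssize_t raw_read(int fd, void *buf, size_t count)
{
    return (ssize_t)syscall(SYS_read, fd, buf, count);
}
```

Calling raw_read(0, buf, 3) mirrors the assembly above: it reads up to 3 bytes from STDIN into buf.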

Function number and registers of return values

The first table gives the instruction that enters the kernel, the register that holds the function number, and the registers that receive the return values:

Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                  call #  val  val2
───────────────────────────────────────────────────────────────────
riscv       ecall                 a7      a0   a1   -

So for the read function, we need to put 63 (as found in /usr/include/asm-generic/unistd.h) in register a7. After the ecall, a0 receives the system call result. There is no separate error register (the Error column is "-"): on failure, a0 holds the negated errno value. a1 is listed as a second return value, which read does not use.

ssize_t read(int fd, void *buf, size_t count);
a0    =  63 (    a0,       a1,         a2)

So we can set them like this:

        addi  a7, x0, 63     # set called function as read()
        ecall                # call the function
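
The register roles from the table can also be written down in C with GCC-style inline assembly. This is a sketch under assumptions, not the way any particular libc does it: raw_syscall3 is a hypothetical name, the RISC-V branch is only compiled when targeting RISC-V, and on other architectures the fallback goes through syscall() and rebuilds the raw kernel-style value from errno so that both branches return the same thing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a three-argument system call and return the raw
   kernel value (the result, or a negated errno on failure). */
long raw_syscall3(long number, long arg1, long arg2, long arg3)
{
#if defined(__riscv)
    register long a0 __asm__("a0") = arg1;   /* first argument, then result */
    register long a1 __asm__("a1") = arg2;   /* second argument */
    register long a2 __asm__("a2") = arg3;   /* third argument */
    register long a7 __asm__("a7") = number; /* system call number */
    __asm__ volatile ("ecall"
                      : "+r"(a0)
                      : "r"(a1), "r"(a2), "r"(a7)
                      : "memory");
    return a0;
#else
    /* Assumption: elsewhere, fall back to libc and convert -1/errno back
       into the negated errno the kernel originally returned. */
    long ret = syscall(number, arg1, arg2, arg3);
    return (ret == -1) ? -(long)errno : ret;
#endif
}
```

With this helper, raw_syscall3(SYS_read, 0, (long)buf, 3) corresponds to the addi/la/addi/ecall sequence shown so far.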

Return values and error code

According to the man page of read(2):
* On success, the number of bytes read is returned
* On error, -1 is returned, and errno is set to indicate the error.

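So we first test the return value in a0: if a0 < 0 we jump to the part that displays the error message, otherwise we simply display an OK message. Note that the -1 plus errno convention quoted above is what the C library wrapper exposes; the raw ecall returns the negated errno value directly in a0. For the branching part, the RISC-V base instruction set only has blt (<) and bge (>=) as ordering comparisons (besides beq, bne and the unsigned bltu and bgeu); bgt (>) and ble (<=) are pseudo-instructions that the assembler builds by swapping the two registers, which saves transistors.

    blt  a0, x0, error_seq # if a0 < 0 (negated errno) branch to error_seq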
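
The test on the raw return value can be sketched in C as follows. The helper names are hypothetical. The sketch relies on the Linux convention that a failing system call leaves a negated errno in the result register (a0 on RISC-V), with the values -4095 to -1 reserved for errors; for read, a plain "less than zero" test is equivalent because a successful read never returns a negative count.

```c
#include <errno.h>

/* Hypothetical helper: Linux reserves the raw return values -4095..-1 for errors. */
int raw_is_error(long a0)
{
    return a0 < 0 && a0 >= -4095;
}

/* Hypothetical helper: recover the positive errno from a raw return value,
   or 0 when the call succeeded. */
int raw_errno(long a0)
{
    return raw_is_error(a0) ? (int)-a0 : 0;
}
```

For example, a raw read on an invalid file descriptor returns -EBADF, so raw_errno() gives EBADF back, while a successful read of 3 bytes is not an error.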

We then use the write syscall function (64), defined as:
* asm-generic: #define __NR_write 64
* man 2 write: ssize_t write(int fd, const void *buf, size_t count);
* So, registers: a0=1 (STDOUT), a1=*buf, a2=count, a7=64 (function number)

And we will finish with exit() syscall, defined as:
* asm-generic: #define __NR_exit 93
* man 2 exit: noreturn void _exit(int status);
* So, registers: a0=return code, a7=93 (function number)

    addi  a0, x0, 1        # a0 still holds the result of read, so set fd to 1 (STDOUT)
    la    a1, ok           # load address of ok string (la is a pseudo-instruction)
    addi  a2, x0, 3        # set length of text to 3 (O + K + \n)
    addi  a7, x0, 64       # set ecall to write function
    ecall                  # call the function

    addi  a0, x0, 0        # set return code to 0 (OK) for exit (93) function
    j     end              # unconditional jump to end before quit

error_seq:
    sub   a2, x0, a0       # a0 holds the negated errno, so a2 = errno
    addi  a2, a2, 0x30     # add 0x30 (ASCII code of 0); only correct for errno 0 to 9
    la    a1, error        # load address of error string (la is a pseudo-instruction)
    sb    a2, 7(a1)        # put the (byte) value at position 7 of error string (before \n)
    addi  a0, x0, 1        # set fd to 1 (STDOUT)
    addi  a2, x0, 9        # set now length of our string (9 bytes)
    addi  a7, x0, 64       # set ecall to write function
    ecall                  # call the function

    addi  a0, x0, -1       # set return code to -1 (error) for exit (93) function
end:
    addi  a7, x0, 93       # set ecall to exit (93) function
    ecall                  # call Linux to terminate the program

.data                      # section directive, written without a trailing colon
buffer_addr: .space 3      # 3-byte buffer filled by read
ok:     .ascii "OK\n"
error:  .ascii "Error:  \n"
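
The single-digit trick used in error_seq (add 0x30 to the error code and store the byte at offset 7 of the template) can be illustrated with a small C sketch. The function name format_error is hypothetical, and the range check makes explicit a limit the assembly version does not enforce: only error codes 0 to 9 map to one ASCII digit.

```c
#include <string.h>

/* Hypothetical helper mirroring the assembly: patch one ASCII digit into the
   "Error:  \n" template at offset 7. Only valid for error codes 0..9. */
int format_error(int err, char out[10])
{
    if (err < 0 || err > 9)
        return -1;                 /* larger codes need a real number-to-text routine */
    memcpy(out, "Error:  \n", 10); /* 9 characters plus the terminating NUL */
    out[7] = (char)('0' + err);    /* same as adding 0x30 and storing with sb */
    return 9;                      /* number of bytes to hand to write */
}
```

With err set to 9 (EBADF on Linux), the buffer holds "Error: 9" followed by a newline. Codes with two or more digits are rejected here rather than printed as a wrong character.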

RISC-V Longan nano

Compiling and executing on virtual environment

If you don't have RISC-V hardware (it can be found for as low as 3€ now), you need a cross compiler and qemu to emulate the instructions, or a whole system installed.

Packages needed for compiling on ArchLinux x86 or ARM, for example:

sudo pacman -S riscv64-linux-gnu-gcc riscv64-linux-gnu-glibc riscv64-elf-binutils riscv64-elf-gcc riscv64-elf-gdb

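Note: if you get the following error, the data section was written as a label (.data:) instead of the directive .data; remove the trailing colon. Commenting the line out with a # also silences the error, but the strings then end up in the .text section, which is normally not writable, so the sb store into the error string will likely fault.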

test.s: Assembler messages:
test.s:22: Error: symbol `.data' is already defined

Newlib is a lightweight C library for bare metal, available for RV32 (32-bit RISC-V), that can be used instead of a whole GNU system on embedded devices with low memory capacity (such as the Longan Nano, less than 8€ with screen, see picture below, or the 3€ Sipeed RV): riscv32-elf-newlib.

I made a simple shell script so I don't have to remember the commands to assemble the code from an x86 platform (it also works on ARM or RISC-V). It takes the name of the .s file, without its extension, as argument:

name=$1
riscv64-linux-gnu-as -march=rv64imac -o ${name}.o ${name}.s
riscv64-linux-gnu-ld -o ${name} ${name}.o

You can add a strip step, but it is better to avoid it if you need to debug the program:

riscv64-elf-strip --strip-all ${name}

The result can be executed on non-RISC-V platforms with qemu-riscv64, provided it doesn't depend on libraries or you have them installed. This lets you test it without a full RISC-V system installed; qemu is fantastic. On ArchLinux it is available in the package qemu-arch-extra:

qemu-riscv64 ${name}

And it can be disassembled (the output will probably use different instructions than your assembly code, because of RISC-V assembly pseudo-instructions):

riscv64-linux-gnu-objdump -d ${name}

Bronzebeard assembler and its baremetal environment for real hardware

RISC-V Longan nano

Update: Bronzebeard is an assembler with a light bare-metal environment builder for RISC-V, targeting GD32VF103 boards such as the Longan Nano (about 8€ with a screen, as pictured in this article) and the Wio (a similar board with an added ESP8266 SoC). I made an AUR package of Bronzebeard, and someone made a Mandelbrot set demo in 918 bytes of pure RISC-V assembly. You can find other examples in the Bronzebeard sources. gd32vf103inator is a set of tools for the GD32V that can be driven from any simple text editor.