
Short and practical introduction to FPGA, Verilog and Verilator, and a few words about SystemVerilog


Warning: I have been learning Verilog, Verilator and FPGA for less than a month, as a hobby. If there are mistakes, I will correct them. You can contact me on the Fediverse.

Table of Contents


* Introduction
** Real world full process
** Things to know and understand
** How to code a FPGA
* Verilog
** Values
** Types
** Gates
** Modules
** Simple example, writing an “and” gate
** Initial and always blocks
* Simple example with Verilator
** Make the test with Verilator
** Tracing example and GTKWave
** About Verilator examples
** Basic practical example with Verilator
* Further reading

Text in bold is there to help skim reading.

Introduction

I continue my descent to the lower layers with the world of FPGA (Field-Programmable Gate Array). An FPGA is a reprogrammable electronic component used, among other things, to build and test processors. Once a processor is validated on an FPGA, you can start to build an ASIC (Application-Specific Integrated Circuit), the actual hardwired chips that we use every day in our computing devices. FPGAs are also used as is in several industrial applications (avionics, audio or video processing, etc.) for their parallelism, which makes them faster in these cases than software running on a general-purpose processor, and for the ability to update them easily in case of a problem. This post is a short introduction to FPGAs, to the popular IEEE-standard HDL (Hardware Description Language) Verilog, and to testing it with the free and open source software (FOSS) simulator Verilator. If you want to use VHDL, GHDL is a FOSS simulator for VHDL.

Real world full process

The steps to implement a circuit on an FPGA once it is designed are (open source software is given as a reference for each step):
* Implementation of the logic in an HDL (Hardware Description Language), so Verilog here. Any text or code editor can be used (I use both Vim and Neovim, and sometimes Geany, a light but powerful GUI IDE).
* Simulation: Verilator here. Writing a testbench is already a good step in verification too.
* Formal verification: Yosys is an open source verification and synthesis system, so it goes further into formal verification than Verilator. We will not go through this point here.
* Synthesis: turning the HDL description into logic blocks, then placing and routing them under physical and timing constraints (see basic principle and today's implementations), and outputting a bitstream in the FPGA's format. Yosys again.
* Flashing: openFPGALoader does the job on most FPGAs.
* Real-world testing on the FPGA.

We will only test on a simulator here, to understand the basics. Once you are comfortable with this, you can go further, choose an FPGA, and start working with it.

Things to know and understand

The main concepts to understand and be able to manipulate to build a basic integrated circuit are:
* 0, 1: electricity flowing through a wire or not, which can be interpreted as the 1 and 0 states, true and false, etc.
* Boolean operators (for logic gates), which are the main tool to manipulate them. You don’t need other notions of electronics. The most basic ones are OR, AND, NOT, and XOR (exclusive OR). You can build all these operators from NAND gates (AND+NOT), but this is not the most efficient way of making them. If you master the algebraic properties of Booleans and know how to simplify expressions, you will be able to produce simpler and more efficient circuits, but it is not required to make circuitry.
* ALU (arithmetic logic unit), allowing simple operations such as additions.
* flip-flop: special circuitry used to memorize information and change it depending on its input and a clock edge.
* latch: memory circuitry that, unlike the flip-flop, does not depend on a clock edge.
* clock: operations are time-dependent and need to be synchronized to have a coherent behaviour. A clock is like the drummer in a music band: it gives the rhythmic basis necessary to every time-related construction; the other instruments, singers and dancers follow its time signals, called clock signals here.
* Multiplexers (MUX), which can aggregate several lines into one, and demultiplexers, which do the opposite.
These functions are all integrated in each logic block of an FPGA.

Other important parts of an FPGA are:
* One to a few PLLs (phase-locked loops), which provide secondary clocks ticking at multiples of the main clock frequency, or synchronized to external clocks.
* A flash memory, which keeps the bitstream on the board after a shutdown; it is automatically loaded at start.
* Lots of peripheral I/O blocks to communicate with the outside world.

How to code a FPGA

There are three main methods for developing FPGA circuitry, and they can be mixed:
* Drawing it with a graphical tool representing these logic gates and the other basic elements mentioned (flip-flop, latch, MUX, etc.).
* Using an HDL (hardware description language), Verilog and VHDL being the most famous. Compilers build everything behind these languages, just as assemblers transform human-readable assembly code into machine code, or as compilers and interpreters convert higher-level languages (C/C++, Python, Lua, JavaScript, FORTRAN, BASIC, Pascal, Caml, LISP, etc.) into machine code. There are also higher-level description languages, like Chisel.
* Using OpenCL (Open Computing Language), designed mainly for intensive computing that can be spread simultaneously over GPGPUs and CPUs.

When using an HDL, tools automatically compute the routing circuitry for you. Tools like VTR (Verilog To Routing) and NextPNR optimize the placement for you.

SymbiYosys is a verification tool.

openFPGALoader is a FOSS tool used to program/flash/fuse the FPGA itself.

You can do very interesting things only a few hours after starting to study an HDL. It allows designing simple circuitry with code that is close to some higher-level languages, with variables, constants, registers, conditional tests, basic Boolean transformations, additions and bit manipulations. A register is itself already a circuit made of several basic logic gates. Registers are generally hardwired inside the FPGA, to speed up the circuit and reduce power usage.

Verilog

So we choose Verilog here (official specification: IEEE Std 1364-2001). It is widespread and has lots of tools. VHDL is also very well known, and it is always interesting to look at alternatives, which could have interesting aspects. I chose Verilog as a starting language because it is used in a few projects and examples I’m interested in, and because all the FOSS tools needed to write, simulate and verify are available. There are also several light RISC-V implementations for embedded use, such as SERV and PicoRV32 (GitHub), including an implementation for the Sipeed Lichee Tang (a 20€ FPGA board). I actually chose a 12€ Sipeed Tang Nano 4K as my first board; the Tang Nano (without 4K) costs 3€ but only has 1000 logic blocks. There are also the very convenient and efficient Verilator simulator and VTR (Verilog to Routing), a route optimizer.

Values

By default, numbers are integers. It is possible to force the number of bits (before the single quote ' character) and to choose signed (s character, meaning able to carry a minus sign) or unsigned (the default). Values can be given in binary (b), octal (o), hexadecimal (h), or explicitly in decimal (d, the default base). Data can also be given in string format.

32'hFFFF00FF  // 32 bits unsigned hexadecimal opaque yellow (full red+green+alpha) RGBA
8'b00101010   // 42 (8bit unsigned binary)
8'b0010_1010  // 42 (8bit unsigned binary) easier to read version
8'sb1010_1010 // -86 (8bit signed binary)
-8'sd42       // -42 (8bit signed decimal)
12'o0754      // -rwxr-xr-- rights (12 bits unsigned octal) on Unix compatible filesystems.
"Hello"       // a character string

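Constants can be defined as macros with the `define directive, written with a backtick (`), not a single quote. The macro is then used as `myvalue. No semicolon is put at the end of the definition, as it would become part of the replaced text:

`define myvalue -8'sd42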
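The notations above can be tried in a small testbench. This is only a minimal sketch, assuming a simulator that accepts plain Verilog-2001; the module name values_demo and the macro name MYVALUE are illustrative and do not come from any existing project:

```verilog
`define MYVALUE -8'sd42   // no trailing semicolon: the macro text is pasted as-is

module values_demo;
  reg signed [7:0] s;     // 8-bit signed variable
  reg        [7:0] u;     // 8-bit unsigned variable

  initial begin
    s = `MYVALUE;         // a macro is used with a leading backtick
    u = 8'b0010_1010;     // underscores are ignored, they only help reading
    $display("s = %0d", s);
    $display("u = %0d (hex %h)", u, u);
    $finish;
  end
endmodule
```

Declaring s as signed is what lets $display show it as a negative number; the literal alone does not decide how the variable is interpreted.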

Types

There are two different classes of types, nets and variables. We cover here:
* wire, a net type, simply implementing a wire between components.
* reg, a variable type (register), which can be set or unset to memorize values.

wire w; // a simple wire between 2 components called w
reg r0; // a 1 bit memory called r0 (Register 0)

The value of a wire is set dynamically by the components linked to it, but it is possible to give it a permanent value with a continuous assignment (assign):

assign a = 1;

There are also higher level types that will use several of previous items, as for example:

integer i,j; // Two signed 32bits integers called i and j
real num;    // one 64bits float called num
time t;      // one 64bits unsigned integer representing time (in simulation time units)
realtime rt; // one 64bits float representing time

About time, it is interesting to know that #n can be used to wait n time units (simulation ticks). No semicolon (;) is needed after the value; instead, the instruction that follows the delay is generally written on the same line. For example:

#20 out = a & b;// wait 20 ticks then compute out value from a AND b
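To see wire, reg, assign and a delay working together, here is a minimal sketch of a testbench. The module name delay_demo is hypothetical. It assumes an event-driven simulator; as far as I know, recent Verilator versions need the --timing option to accept such delays:

```verilog
module delay_demo;
  reg  a, b;          // variables, driven from the initial block
  wire out;           // net, driven by the continuous assignment below

  assign out = a & b; // out permanently follows a AND b

  initial begin
    a = 1'b0; b = 1'b1;
    #10 a = 1'b1;     // 10 time units later, raise a
    #10 $display("t=%0t a=%b b=%b out=%b", $time, a, b, out);
    $finish;
  end
endmodule
```

The wire is never assigned inside the initial block: only reg variables can be, while the wire is updated by the assign each time a or b changes.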

Single-bit elements, wire and reg, can be organized in vectors. Their bits are numbered from MSB (most significant bit) to LSB (least significant bit).

wire [7:0] Bus0; // a 8 bits bus called Bus0 connected to the system
reg [31:0] R0;   // a 32 bits register called R0

It is possible to use arithmetic expressions to define them. In these cases, using an array (see below) is often better:

reg[8*256:1] text;     // 256 bytes string, * to simplify usage, starts from 1, as it will not be accessed by bit
reg[16*16-1:0] sprite; // a 16×16 monochrome sprite, better to use -1 and start from 0

After these declarations, we can access single bits or groups of bits the following way:

Bus0[0];   // Access to LSB of Bus0
Bus0[7:5]; // Access to 3 MSB of Bus0 (defined as 8 bits from 0 to 7)

Verilog also allows concatenating several bit ranges into another one. For example, here, to convert from 16-bit little-endian to 16-bit big-endian data format:

BigEndian[15:0] = {LittleEndian[7:0], LittleEndian[15:8]};

If the destination bus is 16 bits wide, then [15:0] is not mandatory on the left-hand side, as we concatenate two 8-bit values.
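The same byte swap can be wrapped in a small module. This is a sketch only; the names swap16, little and big are hypothetical:

```verilog
module swap16 (
  input  wire [15:0] little,  // 16-bit value, low byte first
  output wire [15:0] big      // same value with both bytes exchanged
);
  // {x, y} builds a 16-bit vector from two 8-bit slices, x being the high byte
  assign big = {little[7:0], little[15:8]};
endmodule
```

As the swap is its own inverse, the same module also converts from big-endian back to little-endian.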

Both complex types (in the general meaning, not mathematical complex numbers) and vectors can be organized in arrays.

reg[7:0] cursor[0:7];            // 8x8 pixels monochrome cursor bitmap, Here, cursor is the name
reg[15:0] sprite[0:15];          // 16*16 pixels monochrome sprite bitmap
reg[31:0] palette[0:15];         // 16 RGBA8 colours palette
reg[31:0] FrameBuffer[0:307199]; // 640×480 RGBA8 screen frame-buffer

Arrays are declared from their lowest address to their highest one.

Gates

Here are the logic operators (gates) available in the language. They are bitwise: applied to bit vectors, they operate bit by bit on all the bits:

~a     // NOT
a & b  // AND
a | b  // OR
a ^ b  // XOR
a ~^ b // XNOR, can also be written ^~

Gates Look-up tables

There are also two bitwise shift operators available:

>> // Right shift
<< // Left shift
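A quick way to get familiar with these operators is to print their results on two 8-bit values. A minimal sketch, with a hypothetical module name ops_demo and arbitrary test values:

```verilog
module ops_demo;
  reg [7:0] a, b;

  initial begin
    a = 8'b1100_1010;
    b = 8'b0000_1111;
    $display("~a     = %b", ~a);      // every bit of a inverted
    $display("a & b  = %b", a & b);   // keeps the bits set in both
    $display("a | b  = %b", a | b);   // keeps the bits set in either
    $display("a ^ b  = %b", a ^ b);   // keeps the bits that differ
    $display("a ~^ b = %b", a ~^ b);  // keeps the bits that are equal
    $display("a >> 2 = %b", a >> 2);  // zeros enter from the left
    $display("a << 2 = %b", a << 2);  // zeros enter from the right
    $finish;
  end
endmodule
```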

Modules

Verilog is organized in modules (like classes in object-oriented programming, or integrated circuits on a PCB). A module has:
* A module name.
* An interface between its inside and the outside world (think of public variables or methods in object-oriented programming, or IC pins in electronics). It is a list of ports given between parentheses, like the arguments of a function: module module_name (input a, input b, output out); ... endmodule.
* By default, the statements in a module describe logic blocks that all operate in parallel.
* To execute code sequentially and conditionally, always sub-blocks (and initial ones in testbenches) must be used (see below for more details).

Simple example, writing an “and” gate

AND
Basic operators are already present in Verilog, but let us see how one could be written. In Verilog's built-in gate primitives, the first argument is the output. As and is a reserved word (it is the name of the built-in gate), the module is called and_gate here. So an AND gate could be reimplemented this way:

module and_gate(f,a,b); // The ; is not an error here
  output f;
  input a,b;

  assign f = a & b; // & is the bitwise AND operator
endmodule

This example should be put in an "and_gate.v" file.
* The default type is wire
* Since Verilog-2001, input and output can be put in the module header like this:

module and_gate(output f, input a,b);

This example is a permanent, unconditional gate. You can build full circuits this way, but Verilog is a high-level language that allows coding more complex things in a simple way. The always block is there for more complex gateware.
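Such a gate can be checked with a small testbench that walks through its truth table. The sketch below is self-contained and uses hypothetical names (my_and for the gate, my_and_tb for the testbench); it relies on a #1 delay, so it assumes an event-driven simulator (with Verilator, the --timing option should be needed):

```verilog
// Hypothetical names: my_and (the gate) and my_and_tb (its testbench)
module my_and (output f, input a, input b);
  assign f = a & b;
endmodule

module my_and_tb;
  reg  a, b;
  wire f;
  integer i;

  my_and dut (.f(f), .a(a), .b(b)); // named port connections

  initial begin
    for (i = 0; i < 4; i = i + 1) begin
      {a, b} = i[1:0];              // walk through the 4 input combinations
      #1 $display("a=%b b=%b -> f=%b", a, b, f);
    end
    $finish;
  end
endmodule
```

The delay before $display gives the continuous assignment time to propagate the new inputs to f before the value is printed.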

Initial and always blocks

There are two types of blocks to know to start in Verilog:
* The always block can be synthesized, so it is used in circuitry. It is executed each time its conditions are met, that is, when triggered by an event given between parentheses: always @(event).
* The initial block is generally not synthesized, so it is used mostly in testbenches. It is executed only once and unconditionally. It allows giving the initial values of the test. It is possible to have several initial blocks. This also allows quickly testing or prototyping a simple function with text output, before putting it in a circuitry always block.

They both allow you to:
* execute code sequentially
* use if else conditionals
* use while and for loops

Posedge and negedge on clock signal

A typical event to trigger an always block is, for example, a positive clock edge (posedge) or a negative clock edge (negedge):

always @(posedge clk) begin
  ...
end

but it can also be a change of value on one of the listed signals:

always @(a or b) begin
  ...
end
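Both kinds of trigger can live in the same module. Here is a minimal sketch of a 4-bit counter, with hypothetical names (counter4, rst, count, is_max). It uses the non-blocking assignment <=, not covered in this post, which is the usual choice inside clocked blocks:

```verilog
module counter4 (
  input  wire       clk,
  input  wire       rst,     // synchronous reset, active high
  output reg  [3:0] count,
  output reg        is_max
);
  // Sequential logic: evaluated on each rising edge of clk
  always @(posedge clk) begin
    if (rst)
      count <= 4'd0;
    else
      count <= count + 4'd1; // wraps around from 15 to 0
  end

  // Combinational logic: re-evaluated whenever count changes
  always @(count) begin
    is_max = (count == 4'hF);
  end
endmodule
```

The first block becomes flip-flops once synthesized, while the second one only describes gates, as it has no memory of its own.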

Simple example with Verilator

A simple example given with Verilator:

module our (clk);
  input clk;  // Clock is required to get initial activation
  always @(posedge clk) begin
    $display("Hello World");
    $finish;
  end
endmodule

Here, the always block is triggered at the first positive clock edge (posedge). It prints ($display) a “Hello World” string on the standard output, and finishes ($finish) the simulation.

It is possible to test and validate lots of things with Verilator before going further in the FPGA process (formal verification, conversion to bitstream, which can be very slow because of the routing process, and flashing the FPGA). A good start is to make a copy of the example given above, and to modify the top.v file.

Make the test with Verilator

The examples are installed in:
* Arch Linux or Manjaro: /usr/share/verilator/examples/
* Debian or Ubuntu: /usr/share/doc/verilator/examples/

So to test it, simply copy the directory to your home directory, where you will have write access, and then build and execute it with a simple make (Verilator transpiles it to C++ and then compiles it, which allows a very fast simulation):

On Arch Linux based distributions:

cp -a /usr/share/verilator/examples/make_hello_c ~/

On Debian based distributions:

cp -a /usr/share/doc/verilator/examples/make_hello_c ~/

Then on any distribution:

cd ~/make_hello_c
make

make rebuilds from the sources if needed, and then executes the testbench immediately.

The output of the bench itself:

-- RUN ---------------------
obj_dir/Vtop
Hello World!
- top.v:11: Verilog $finish
-- DONE --------------------

The rest of the output (not pasted here) is about the compilation and suggests moving on to the next tutorial.

Tracing example and GTKWave

GTKWave is a companion tool to Verilator. It lets you see the waveforms (timing diagram) resulting from the simulation, to understand the temporal behaviour of the circuit, and to see what could be the source of a possible problem. It uses the .vcd (Value Change Dump) tracing file as the source to draw the waveforms (see the make_tracing_c example; the output can be found in the logs/ subdirectory, in the file logs/vlt_dump.vcd, created by the make command).

To use the tracing example with GTKWave, you need to install the gtkwave package, then, as for the first example, copy the directory to a place you can write into:
* On Arch Linux based distributions:

cp -a /usr/share/verilator/examples/make_tracing_c ~/

* On Debian based distributions:

cp -a /usr/share/doc/verilator/examples/make_tracing_c ~/

Then on any distribution:

cd ~/make_tracing_c
make
gtkwave logs/vlt_dump.vcd

GTKwave

The text output during the make phase gives interesting information too:

-- RUN ---------------------
obj_dir/Vtop +trace
[1] Tracing to logs/vlt_dump.vcd...

[1] Model running...

[1] clk=1 rstl=1 iquad=1234 -> oquad=1235 owide=3_22222222_11111112
[2] clk=0 rstl=0 iquad=1246 -> oquad=0 owide=0_00000000_00000000
[3] clk=1 rstl=0 iquad=1246 -> oquad=0 owide=0_00000000_00000000
[4] clk=0 rstl=0 iquad=1258 -> oquad=0 owide=0_00000000_00000000
[5] clk=1 rstl=0 iquad=1258 -> oquad=0 owide=0_00000000_00000000
[6] clk=0 rstl=0 iquad=126a -> oquad=0 owide=0_00000000_00000000
[7] clk=1 rstl=0 iquad=126a -> oquad=0 owide=0_00000000_00000000
[8] clk=0 rstl=0 iquad=127c -> oquad=0 owide=0_00000000_00000000
[9] clk=1 rstl=0 iquad=127c -> oquad=0 owide=0_00000000_00000000
[10] clk=0 rstl=1 iquad=128e -> oquad=128f owide=3_22222222_11111112
[11] clk=1 rstl=1 iquad=128e -> oquad=128f owide=3_22222222_11111112
[12] clk=0 rstl=1 iquad=12a0 -> oquad=12a1 owide=3_22222222_11111112
[13] clk=1 rstl=1 iquad=12a0 -> oquad=12a1 owide=3_22222222_11111112
[14] clk=0 rstl=1 iquad=12b2 -> oquad=12b3 owide=3_22222222_11111112
[15] clk=1 rstl=1 iquad=12b2 -> oquad=12b3 owide=3_22222222_11111112
[16] clk=0 rstl=1 iquad=12c4 -> oquad=12c5 owide=3_22222222_11111112
*-* All Finished *-*
- sub.v:29: Verilog $finish
[17] clk=1 rstl=1 iquad=12c4 -> oquad=12c5 owide=3_22222222_11111112

You can see here each step of the execution, with the clock (clk) switching between the 0 and 1 states.

The time is in ticks. In a testbench, their duration can be defined with the `timescale directive, which allows a finer time resolution. We don’t need to use it for now.

The .vcd (Value Change Dump) file is defined in top.v:

$dumpfile("logs/vlt_dump.vcd");

And you will have all the steps in the logs/annotated/ directory and a coverage file in logs/coverage.dat:

-- COVERAGE ----------------
verilator_coverage --annotate logs/annotated logs/coverage.dat
Total coverage (2/31) 6.00%
See lines with '%00' in logs/annotated

-- DONE --------------------

The last one is defined in sim_main.cpp.

About Verilator examples

In each example, there is a small C++ interface. Modifying the C++ code isn’t needed for basic tests. Once the basic concepts are understood, the C++ part can be tuned to interface with system libraries, to simulate device communications, audio, graphics, etc.

Among these simple starting examples, the most interesting ones to begin with are all in Verilog:
* make_hello_c Verilog Hello World with Makefile
* make_hello_sc Verilog Hello World using SystemC with Makefile
* cmake_hello_c Verilog Hello World with cmake.
* cmake_hello_sc Verilog Hello World using SystemC with cmake.

SystemC is a set of C++ classes and macros which provide an event-driven simulation interface, to simulate concurrent processes in a real-time environment.

The other examples are (with their variants cmake|make and c|sc):
* make_tracing_c for a better understanding at the start and for debugging purposes.
* make_protect_lib for creating a DPI protected library. DPI (Direct Programming Interface) is an interface between SystemVerilog and functions in a language like C or C++.

System Verilog is an advanced evolution of Verilog, so it is also an HDL, and is sometimes viewed as an HVL (Hardware Verification Language). It adds several types, including block types, has some C aspects, and supports object-oriented programming.

Some common file extensions with Verilog, System Verilog and related simulators are:
* .v as Verilog.
* .vc (verilog ???) or .f as File, used for large projects, containing the arguments to give to Verilator (or another simulator), including flags, include directories, and linked libraries for the simulation.
* .vcd as Value Change Dump file, contains the tracing output of a simulation for analysis (see above).
* .vo as Verilog Output file.
* .sv as System Verilog.

Basic practical example with Verilator


So we will reuse the simple make example (you can choose the cmake example instead; in this case you will have to call cmake instead of make).

I copied the make_hello_c example to my work directory:

cp -a /usr/share/verilator/examples/make_hello_c verilator_test
cd verilator_test

and then just edited top.v (vim, emacs, gedit, or any text editor of your choice can be used) and replaced its content.

Here is the simple module resulting from my tests. I wanted to understand how to access simple (Verilog) registers and different kinds of datatypes, including a sprite (the one displayed beside the title of this section). You can download this version of top.v here:

module top;
 reg [8*11:1] str1;
 reg [8*25:1] str2;  // filled by spaces at left
 reg a,b,c;
 reg [7:0] sprite[0:7];  // use one byte by line
 reg [15:0] sprite2;  // sprite as bitfield
 integer i;

 initial begin
   str1 = "Hello World"; // string initialisation tests
   str2 = "Hello World";

   a = 1'b1; // bit initialisation tests
   b = a^1;
   c = a^0;

   sprite[0] = 8'b10011000; // sprites initialisation tests
   sprite[1] = 8'b00100100;
   sprite[2] = 8'b01000010;
   sprite[3] = 8'b10011001;
   sprite[4] = 8'b10011001;
   sprite[5] = 8'b01000010;
   sprite[6] = 8'b00100100;
   sprite[7] = 8'b00011000;

   sprite2[ 7:0] = 8'b10011000;
   sprite2[15:8] = 8'b00100100;

   $display ("str1 = %s", str1); // strings display
   $display ("str2 = %s", str2);

   $display ("a = %d", a);       // bits display
   $display ("b = a^1 = %d", b);
   $display ("c = a^0 = %d", c);

   for ( i=0; i<8; i=i+1) begin  // sprites display
    $display ("sprite[%2d] = %b",i,sprite[i]);
   end
   for ( i=0; i<2; i=i+1) begin  // try to read bitranges, including out of bounds
    $display ("sprite2[8*i(%1d) +: 8] = %8b",i,sprite2[8*i +:8]);
    $display ("sprite2[8*i(%1d) -: 8] = %8b",i,sprite2[8*i -:8]);
    $display ("sprite2[4*i(%1d) -: 8] = %8b",i,sprite2[4*i -:8]);
   end
 end 
endmodule

To test it, just run make. The comments to the right of the output were added for this blog post only:

make
-- VERILATE & BUILD -------- 
[...]  This part is the compilation log
-- RUN ---------------------   This part is interesting one
obj_dir/Vtop
str1 = Hello World
str2 =               Hello World    str2 with filled 25 chars
a = 1
b = a^1 = 0
c = a^0 = 1
sprite[ 0] = 10011000    Eight lines of the sprite
sprite[ 1] = 00100100
sprite[ 2] = 01000010
sprite[ 3] = 10011001
sprite[ 4] = 10011001
sprite[ 5] = 01000010
sprite[ 6] = 00100100
sprite[ 7] = 00011000
sprite2[8*i(0) +: 8] = 10011000      some bitrange access tests
sprite2[8*i(0) -: 8] = x0010010      We go out of registers here with x displayed
sprite2[4*i(0) -: 8] = x0010010
sprite2[8*i(1) +: 8] = 00100100
sprite2[8*i(1) -: 8] = 01001100
sprite2[4*i(1) -: 8] = xxxxx001
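The +: and -: notations used above are indexed part-selects: v[base +: 8] stands for v[base+7 : base], while v[base -: 8] stands for v[base : base-7]. This is why a -: starting from bit 0 leaves the register. The sketch below, with hypothetical names (partselect_demo, v), keeps both forms within bounds by starting the -: form from the top bit of each byte:

```verilog
module partselect_demo;
  reg [15:0] v;
  integer i;

  initial begin
    v = 16'b0010_0100_1001_1000;
    for (i = 0; i < 2; i = i + 1) begin
      // ascending form: 8 bits starting at bit 8*i and going up
      $display("v[%0d +: 8] = %b", 8*i,     v[8*i     +: 8]);
      // descending form: 8 bits starting at bit 8*i+7 and going down
      $display("v[%0d -: 8] = %b", 8*i + 7, v[8*i + 7 -: 8]);
    end
    $finish;
  end
endmodule
```

For each value of i, both lines select the same byte, which makes the relation between the two notations easy to check.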

Further reading

Relatively complete documentation online:

About Verilog:
* A complete Verilog manual at Chip Verify
* Verilog TUTORIAL for beginners on ReferenceDesigner.com
* Verilog overview in few presentation screens (PDF) at euler.ecs.umass.edu
* Verilog Quick Reference Card.pdf
* Verilog HDL Quick reference Guide
* French: Verilog syntax at hdl.telecom-paristech.fr

Testbenches, SystemC and System Verilog:
* How to write a basic verilog TestBench, using a C++ template. The site FPGA tutorial has 3 main subjects, VHDL, Verilog & System Verilog.
* Tutorials about Systemverilog and SystemC.
* Some tutorials about System Verilog.
* French: System Verilog in 13 minutes on sen.enst.fr
* Verilog Simulation with Verilator and SDL using SDL library to simulate VGA graphics output, based on Verilator and some part of SystemVerilog.

Some interesting FPGA projects:
* From Nand To Tetris: all the steps to build, in a few exercises, a computer based on a 16-bit CPU using only NAND gates (an interesting game), then to build a Tetris on this small system. They use their own HDL, but there are already some ports as examples: this one in Verilog using the Icarus Verilog simulator (an older free software simulator), one in Verilog for the open-sourced Lattice iCE40 FPGA (tested on an open hardware Olimex board) with open source tools (Yosys as synthesizer), and one in Verilog for the DE0-Nano (Altera Cyclone IV FPGA).
* Tang nano MIDI Sounder, a MIDI synthesizer with a nice 8-bit-like sound.
* (Japanese) SERV RISC-V processor for Tang Nano
* (Japanese) Tetris for Tang Nano in Verilog
* ZipCPU, a blog about this CPU with nice Verilator testbench examples, using C++ templates for a generic testbench.
* UART, Serial Port, RS-232 Interface FPGA implementation (both in Verilog and VHDL).
* Consolite, a light console in Verilog for FPGA, assembly programmable. An emulator is also available.
* FloPoCo: FPU for FPGA developed by INRIA.

For Free software loving hackers:
* Apicula project to open Gow1n FPGA array bitstreams

Linux syscall and RISC-V assembly

Sample of RISC-V assembly code

A syscall, in the Linux kernel, is an interface to access basic kernel functions. Syscalls are described in section 2 of the man pages. The introduction is in man 2 syscall (indirect system call), and the list of functions is given in man 2 syscalls. Update: System Calls in the lectures of the official Linux kernel documentation, including “Linux system calls implementation”, “VDSO and virtual syscalls” and “Accessing user space from system calls”.

This article follows the previous one about RISC-V overall progress and the available tools to play with. I will try to keep it short, covering Linux syscall usage and the RISC-V assembly case.

Table of Contents

* Description section of the man page
* Getting the list of function and how to access them
* Passing parameters
* Function number and registers of return values
* Return values and error code
* Compiling and executing on virtual environment
* Update: Bronzebeard assembler and its baremetal environment for real hardware

Description section of the man page

* syscall() is a small library function that invokes the system call whose assembly language interface has the specified number with the specified arguments. Employing syscall() is useful, for example, when invoking a system call that has no wrapper function in the C library.
* syscall() saves CPU registers before making the system call, restores the registers upon return from the system call, and stores any error returned by the system call in errno(3).
* Symbolic constants for system call numbers can be found in the header file <sys/syscall.h>.

Here you can find functions such as file access (open/close/read/write/flush), socket access, ioctl, uid, gid, pid, messages, ptrace, system restart, etc.

Getting the list of function and how to access them

As far as I know, only a part of the syscall functions are easily accessible in assembly. They are defined in /usr/include/unistd.h, and the function numbers assigned in the ABI are defined in /usr/include/asm-generic/unistd.h.

The most practical method I found is to use /usr/include/asm-generic/unistd.h to see which functions are available, and their respective man pages for the function prototypes. For example:
* asm-generic: #define __NR_read 63
* man 2 read: ssize_t read(int fd, void *buf, size_t count);

The ABI for RISC-V, as defined in man 2 syscall in the section Architecture calling conventions, uses the registers according to the following 2 tables.

Passing parameters

The second table in this section of the man page shows the registers used to pass the system call arguments.

Arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
riscv         a0    a1    a2    a3    a4    a5    -

Here are the arguments in the order of the function definition, for example, in read (63) function:

ssize_t      read  ( int fd, void *buf, size_t count );
a0(result) = a7(63)( a0(fd),  a1(*buf),    a2(count) )

As a reminder, the 3 standard I/O file descriptors are STDIN=0, STDOUT=1 and STDERR=2; the others are obtained when opening a file with open, and released with close.

So we set the arguments like this. x0 is the always-0 register:

        addi  a0, x0, 0       # Set STDIN as source
        la    a1, buffer_addr # load address of the buffer
        addi  a2, x0, 3       # read 3 bytes

Function number and registers of return values

And the first table gives the register in which to put the function number, and the registers that will receive the return values and the error value:

Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                  call #  val  val2
───────────────────────────────────────────────────────────────────
riscv       ecall                 a7      a0   a1   -

So for the read function, we need to put 63 (as found in /usr/include/asm-generic/unistd.h) in register a7. After the call, a0 receives the system call result and a1 a possible second return value. The Error column is empty for riscv: on failure, the kernel returns the negated error code (the errno value) in a0.

ssize_t read(int fd, void *buf, size_t count);
a0    =  63 (    a0,       a1,         a2)

So we can set them like this:

        addi  a7, x0, 63     # set called function as read()
        ecall                # call the function

Return values and error code

According to the man page of read(2):
* On success, the number of bytes read is returned
* On error, -1 is returned, and errno is set to indicate the error.

This is the behaviour of the C library wrapper. With a direct ecall, a negative value in a0 is the negated errno value itself. So we can first test the return value in a0, and if a0 < 0, jump to the part displaying the error message; otherwise we simply display an OK message. For the branching part, RISC-V, in its very reduced instruction set, only has the < (blt) and >= (bge) comparisons. You just need to swap the registers to compute > (bgt) and <= (ble), which saves a lot of transistors.

    addi a3,x0,0           # a3=0
    blt  a0,a3, error_seq  # if a0<0 branch to error_seq

So here we use the syscall write function (64), defined as:
* asm-generic: #define __NR_write 64
* man 2 write: ssize_t write(int fd, const void *buf, size_t count);
* So, registers: a0=1 (STDOUT), a1=*buf, a2=count, a7=64 (function number)

And we will finish with exit() syscall, defined as:
* asm-generic: #define __NR_exit 93
* man 2 exit: noreturn void _exit(int status);
* So, registers: a0=return code, a7=93 (function number)

    addi  a0, x0, 1        # write to STDOUT (a0 still held the result of read)
    la    a1, ok           # load address (pseudo-instruction) of ok string
    addi  a2, x0, 3        # set length of text to 3 (O + K + \n)
    addi  a7, x0, 64       # set ecall to write function
    ecall                  # Call the function

    addi  a0, x0, 0        # set return code to 0 (OK) for exit (93) function
    j     end              # unconditional jump to end before quit

error_seq:
    sub   a2, x0, a0       # a0 holds the negated errno value, so a2 = errno
    addi  a2, a2, 0x30     # add 0x30 (0 ASCII code), valid for a single digit only
    la    a1, error        # load address (pseudo-instruction) of error string
    sb    a2, 7(a1)        # put the (byte) value at position 7 of Error string (before \n)
    addi  a0, x0, 1        # write to STDOUT
    addi  a2, x0, 9        # set now length of our string (9 bytes)
    addi  a7, x0, 64       # set ecall to write function
    ecall                  # Call the function

    addi  a0, x0, -1       # set return code to -1 (error) for exit (93) function
end:
    addi    a7, x0, 93     # set ecall to exit (93) function
    ecall                  # Call linux to terminate the program

    .data
ok:     .ascii "OK\n"
error:  .ascii "Error:  \n"

RISC-V Longan nano

Compiling and executing on virtual environment

If you don't have RISC-V hardware (it can be found for as low as 3€ now), you need a cross compiler, and qemu to emulate the instructions or a whole installed system.

Packages needed for compiling on ArchLinux x86 or ARM, for example:

sudo pacman -S riscv64-linux-gnu-gcc riscv64-linux-gnu-glibc riscv64-elf-binutils riscv64-elf-gcc riscv64-elf-gdb

Newlib is a lightweight C library for bare metal, available for RV32 (32-bit RISC-V), that can be used instead of a whole GNU system on embedded devices with low memory capacity (such as the Longan Nano, less than 8€ with a screen, see picture below, or the 3€ Sipeed RV): riscv32-elf-newlib.

I made a simple shell script, so as not to have to remember the commands, to assemble the code from an x86 platform (it also works on ARM or RISC-V). It takes the name of the .s file, without its extension, as argument:

name=$1
riscv64-linux-gnu-as -march=rv64imac -o ${name}.o ${name}.s
riscv64-linux-gnu-ld -o ${name} ${name}.o
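Since the script appends .s itself, calling it with the full file name would look for a file ending in .s.s. A small sketch of one way to accept both forms; base_name is a hypothetical helper, not part of the original script.

```shell
# base_name ARG: strip a trailing ".s" if present, so that both
# "hello" and "hello.s" give "hello". Hypothetical helper.
base_name() {
    printf '%s\n' "${1%.s}"
}

name=$(base_name "${1:-hello.s}")   # "hello.s" is only a placeholder default
echo "would assemble ${name}.s into ${name}.o, then link ${name}"
```

In the script above, this would replace the `name=$1` line.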

You can also strip the binary, but it is better to avoid that if you need to debug it:

riscv64-elf-strip --strip-all ${name}

It can then be executed on non-RISC-V platforms with qemu-riscv64, as long as it doesn't depend on libraries or you have them installed. This lets you test it without a full RISC-V system installed; qemu is fantastic. On ArchLinux it is available in the qemu-arch-extra package:

qemu-riscv64 ${name}
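The program above exits with 0 on success and -1 on error. A shell only sees the low 8 bits of an exit status, so the error path shows up as 255 in `$?`. This sketch only reproduces that arithmetic; seen_status is a hypothetical helper, and it assumes qemu-riscv64 passes the guest's exit code through unchanged.

```shell
# seen_status VALUE: the status a shell reports for a program
# that called exit(VALUE). Hypothetical helper.
seen_status() {
    echo $(( $1 & 0xFF ))
}

seen_status 0     # the "OK" path
seen_status -1    # the error path
```

After a real run, `echo $?` right after the qemu-riscv64 command shows the value to compare with.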

It can also be disassembled (the output will probably use different instructions than your assembly code, because of RISC-V assembly pseudo-instructions):

riscv64-linux-gnu-objdump -d ${name}

Bronzebeard assembler and its baremetal environment for real hardware

RISC-V Longan nano

Update: Bronzebeard is an assembler with a light bare-metal environment builder for RISC-V, targeting GD32VF103 boards such as the Longan Nano (about 8€ with a screen, as shown in this article's pictures) and the Wio (a similar board with an added ESP8266 SoC). I made an AUR package of Bronzebeard, and someone made a Mandelbrot set demo in 918 bytes of pure RISC-V assembly. You can find some other examples in the Bronzebeard source. gd32vf103inator is a set of tools for GD32V that can be used from any simple text editor.

Bugs in firefox 52=>53 on ArchLinuxARM 32 bits (ARMv7h) and how to still use it

Firefox 53 currently doesn’t compile on ARMv7h, so only firefox 52 works on ALARM/armv7h. But as the 52 package is no longer in the current git version, the ALARM build system doesn’t rebuild it against the updated dependencies (ICU moved from 58 to 59, and hunspell was updated too). I didn’t manage to compile firefox-esr.

I compiled the former versions of these two libs to make firefox 52 work again.
You can find both the icu-58 and hunspell 1.5.4 packages here. PLEASE DON’T INSTALL THEM (if needed, you can reinstall the firefox package itself with pacman -U firefox-52.0.2-1-armv7h.pkg.tar.xz);
instead, unarchive them in a directory like this:

cd /tmp
mkdir unarc; cd unarc
wget https://popolon.org/depots/ArchLinuxARM/firefox/52/hunspell-1.5.4-1-armv7h.pkg.tar.xz
wget https://popolon.org/depots/ArchLinuxARM/firefox/52/icu-58.2-1-armv7h.pkg.tar.xz
tar xf hunspell-1.5.4-1-armv7h.pkg.tar.xz # lots of SCHILY.fflags errors will be displayed
tar xf icu-58.2-1-armv7h.pkg.tar.xz # lots of SCHILY.fflags errors will be displayed
cd usr/lib
sudo rsync -a libhunspell-1.5.so* libicu*.so.58* /usr/lib/
sudo rsync -a icu/58.2 /usr/lib/icu/
sudo ldconfig

That’s done. You can now type firefox to launch it :)

Create aligned disk partitions to improve performance and reduce SSD wear

Aligning disk partitions on cylinders, knowing that the most frequently used data fits in one cylinder, has several benefits:

* It avoids reading 2 cylinders instead of one => saves time on reads, reduces useless cache usage, saves bandwidth.
* It avoids writing 2 cylinders when one is enough => saves time, reduces useless cache usage, saves bandwidth, reduces wear.
* For the previous reason, it can greatly extend the lifetime of an SSD (the number of write cycles is limited on SLC, more limited on MLC, and even more limited on TLC, although TLC has the lowest price per GB).

A simple method to know if your partitions are well aligned on cylinders

Launch cfdisk on the disk you want to optimize (replace /dev/sda with the disk to optimize: /dev/sdb, /dev/sdc…):

cfdisk /dev/sda

If you see an asterisk (or star) at the far right of a partition line, that partition is not aligned on a cylinder. In my case, I have to recreate all the partitions except sda3:

                           cfdisk (util-linux 2.20.1)

                              Disk Drive: /dev/sda
                       Size: 240057409536 bytes, 240.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 29185

    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
                            Pri/Log   Free Space                           1.05*
    sda1        Boot        Primary   ext4                             51158.98*
                            Pri/Log   Free Space                           1.22*
    sda2        Boot        Primary   ext4                             53686.01*
                            Pri/Log   Free Space                           0.41*
    sda3                    Primary   ext4                            135207.16
                            Pri/Log   Free Space                           2.62*





     [   Help   ]  [   New    ]  [  Print   ]  [   Quit   ]  [  Units   ]
     [  Write   ]

                      Create new partition from free space
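The asterisk test can also be written as simple arithmetic. With the geometry reported above (255 heads, 63 sectors per track), one cylinder is 255 × 63 = 16065 sectors, so a partition is cylinder-aligned when its start sector is a multiple of 16065. A minimal sketch, where is_aligned is a hypothetical helper and the default unit is taken from that cfdisk output:

```shell
# is_aligned START_SECTOR [SECTORS_PER_UNIT]
# Succeeds when START_SECTOR is a multiple of the unit.
# Default unit: 255 heads * 63 sectors per track = 16065 sectors (one cylinder).
is_aligned() {
    start=$1
    unit=${2:-16065}
    [ $((start % unit)) -eq 0 ]
}

for start in 16065 32130 63; do
    if is_aligned "$start"; then
        echo "$start: aligned"
    else
        echo "$start: not aligned"
    fi
done
```

On Linux the start sector of a partition can usually be read from sysfs, for example /sys/block/sda/sda1/start. Passing another unit as second argument (for example 2048 sectors, i.e. 1 MiB with 512-byte sectors) checks a different boundary with the same function.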

How to create a single, well-aligned partition using the whole disk

This uses the old usage mode of fdisk. I found this tip in the Ubuntu documentation.

Replace /dev/sdX with the disk on which you want to create an aligned partition:

fdisk /dev/sdX
c
u
p
n
[return key]
[return key]
[return key]
[return key]
w

Et voilà, you will see a beautiful result in cfdisk.

To create several aligned partitions, choose the appropriate values instead of pressing [return key] four times, and repeat the n (new) command for each partition.

Date format problem with WordPress + qTranslate resolved

I found here the solution to the date display problem that occurs when using WordPress with qTranslate, a plugin for writing and displaying posts in several languages.

* Before: %A %e %B %Y
* After: Thursday August 1st, 2013

There is still a problem in Chinese: the date is written in English.

You only need to replace the double % in the wp-content/plugins/qtranslate-*/qtranslate_utils.php file:

$strftime_parameters[] = '%%';

with a single one:

$strftime_parameters[] = '%';
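Why a doubled % produces the "Before" output: in strftime-style format strings, %% stands for a literal percent sign, so an escaped format is printed as-is instead of being expanded. The date command follows the same convention, which makes it easy to try from a shell. This is only an illustration of the escaping rule, not the plugin's code.

```shell
# With every % doubled, the format string comes out verbatim.
escaped=$(date +'%%A %%e %%B %%Y')
echo "$escaped"   # prints: %A %e %B %Y

# With single %, each conversion is expanded to the real date parts.
expanded=$(LC_ALL=C date +'%A %e %B %Y')
echo "$expanded"
```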