The three ARM Cortex-A (ARMv7) SoC generations and some GCC optimization options for

I started writing this post in autumn 2013. Some things have changed since then; those changes are in bold in the text.

Some references:
* ARM Compiler toolchain Assembler Reference, on the ARM documentation site.
* GCC ARM Options, also available in the gcc man page.

Translation not finished, work in progress

The three series of the ARM architecture

Let's start with the ARMv7 architecture, the last of the 32-bit architectures from the British company ARM. It comes in three series:
* Cortex-A (for application): processors for computers (smartphones, tablets, personal computers, servers).
* Cortex-M (for microcontroller): microcontrollers for embedded systems (home automation, electronics…).
* Cortex-R (for real-time): for the real-time world (robotics, transportation, etc.).

The next ARM architecture, ARMv8 (AArch64, or arm64 in the Linux world), is a 64-bit architecture. The ARMv8 counterpart of the Cortex-A series is the Cortex-A50 series.

The three Cortex-A generations

The Cortex-A series, which this post focuses on, is divided into three generations. Each generation adds new features, improves energy efficiency, and raises the performance of its most powerful processor.

In the energy-efficient versions, some computing power is given up, for example by shortening the pipelines or reducing the total number of registers. In all cases, full compatibility is kept between processors of the same generation, but a piece of software optimized for one of them will probably be less optimal on another.
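In GCC terms, this compatible-but-tuned trade-off roughly corresponds to the difference between `-march` (which instruction set may be used) and `-mtune` (which core the code is scheduled for). Here is a minimal sketch of a build helper that keeps the output runnable on any ARMv7-A core while tuning for one of them. The helper name `gcc_command`, the file name `demo.c` and the cross-compiler name `arm-linux-gnueabihf-gcc` are assumptions made for illustration; whether a given `-mtune` value is accepted depends on your GCC version.

```python
# Sketch: build GCC command lines that stay compatible with any ARMv7-A core
# (-march) while tuning instruction scheduling for one specific core (-mtune).
# The helper name, compiler name and source file name are hypothetical.

ARMV7A_CORES = ("cortex-a5", "cortex-a7", "cortex-a8", "cortex-a9",
                "cortex-a12", "cortex-a15", "cortex-a17")


def gcc_command(source, tune_for, compiler="arm-linux-gnueabihf-gcc"):
    """Return a GCC argument list tuned for one core but not restricted to it."""
    if tune_for not in ARMV7A_CORES:
        raise ValueError(f"unknown ARMv7-A core: {tune_for}")
    return [
        compiler,
        "-O2",
        "-march=armv7-a",      # instruction set shared by the Cortex-A (ARMv7) cores
        f"-mtune={tune_for}",  # scheduling hints only, compatibility is unchanged
        "-c", source,
    ]


if __name__ == "__main__":
    # Same source, two different tunings; only the command lines are printed.
    for core in ("cortex-a8", "cortex-a9"):
        print(" ".join(gcc_command("demo.c", core)))
```

Using `-mcpu=<core>` instead would set both the architecture and the tuning at once, at the cost of possibly using features that other cores lack.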

* The first generation was limited to the Cortex-A8, with a single CPU core. It upgrades the ARMv6 SIMD instructions to NEON (also called Advanced SIMD) and the vector floating-point unit to VFPv3, though in a light version (ten times slower than in the following generations). It also adds Thumb-EE and improves Thumb-2, which allows 16-bit instructions to be used to make code more compact and therefore more efficient in caches and memory bandwidth.
* The second generation adds multiprocessor (or multicore) support, with only one kind of core at a time. The floating-point unit is still VFPv3, but in its full version. This generation includes the Cortex-A9, the most powerful one (called Cortex-A9 MPCore), and the Cortex-A5, the lower-power version.
* The third generation adds hardware virtualization and LPAE, which extends the physical addressing range to 40 bits (up to 1 TB); with 32-bit addressing, only 4 GB can be addressed (2^32 = 2^2 × 2^10 × 2^10 × 2^10 = 4 × 1024 × 1024 × 1024 = 4294967296 bytes). There are three versions: the Cortex-A15, the most powerful; the Cortex-A7, the most energy efficient; and the future Cortex-A12, intermediate in both power and energy consumption (the Cortex-A12 has since been dropped in favour of the Cortex-A17, which offers more computing power for far less energy; it is used, for example, in the top-of-the-range Rockchip RK3288, which has 4 Cortex-A17 cores). Finally, the big.LITTLE architecture is added, which allows cores of different power classes (such as A7 and A15) to sit on the same chip, improving energy efficiency when little power is needed and computing power when it is needed. The floating-point unit is updated to VFPv4.
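The addressing arithmetic for 32-bit and 40-bit (LPAE) addresses can be checked with a few lines of Python. This is only a worked example; the names `addressable_bytes`, `GIB` and `TIB` are mine, and sizes are counted in binary units (1 GiB = 1024 × 1024 × 1024 bytes).

```python
# Address-space arithmetic for plain 32-bit addressing and 40-bit LPAE addressing.

GIB = 2 ** 30  # 1 GiB = 1024 * 1024 * 1024 bytes
TIB = 2 ** 40  # 1 TiB = 1024 GiB


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


plain = addressable_bytes(32)  # classic 32-bit addressing
lpae = addressable_bytes(40)   # 40-bit physical addressing with LPAE

if __name__ == "__main__":
    print(f"32 bits: {plain} bytes = {plain // GIB} GiB")
    print(f"40 bits: {lpae} bytes = {lpae // TIB} TiB")
    print(f"The LPAE range is {lpae // plain} times larger")
```

The 8 extra address bits multiply the reachable physical memory by 2^8 = 256.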
