Part 1: The ARM Processor
Brian Pickard explains how anyone can program in ARM code.
Introduction
In these articles I hope to unravel the mystery of machine code programming on ARM RISC computers which run RISC OS. The examples I will give will run on all versions of RISC OS and on ALL ARM chips. The ARM code assembler built into BASIC will be used, since this is available to all users. I am assuming the reader has no prior knowledge, and therefore I apologise if I seem to state the obvious from time to time. By the end of the series I hope you will be able to produce useful routines which can complement your BASIC programming.
The Structure of a Computer
At its simplest the structure of a computer can be represented as shown below.
The Central Processing Unit (CPU) obtains data from the input, then modifies the data and sends it to the output. The memory is used for storing the program and any data required for later. The CPU uses two sets of wires to do the above. These are known as the address bus and the data bus. The CPU uses the address bus to send the address of a location, so that data can be sent to or retrieved from that location. The location could be a part of RAM, or an input/output device mapped to that location. Hence the CPU can get data from keyboards, or send data to the video chips etc., as well as read or write to RAM. The data bus is used to transfer the actual data to and/or from the required address. In the ARM the data bus is 32 bits wide, hence there are 32 wires, so the ARM chip is said to be a 32 bit chip.
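The idea that one address can select either RAM or a device can be modelled in a few lines of Python on any desktop machine. This is only a sketch: the `Bus` class, the addresses and the `screen` list are invented for illustration and do not correspond to any real ARM memory map.

```python
# Toy model of a CPU bus: an address selects RAM or a memory-mapped device.
# All addresses and names here are hypothetical, chosen only for illustration.

RAM_SIZE = 0x1000          # pretend RAM occupies addresses 0x0000-0x0FFF
VIDEO_ADDRESS = 0x2000     # pretend a video device is mapped at this address


class Bus:
    def __init__(self):
        self.ram = [0] * RAM_SIZE   # one 32-bit word per address, for simplicity
        self.screen = []            # values "sent to the video chip"

    def write(self, address, data):
        """Put `address` on the address bus and `data` on the data bus."""
        data &= 0xFFFFFFFF          # the data bus is 32 bits wide
        if address == VIDEO_ADDRESS:
            self.screen.append(data)
        elif 0 <= address < RAM_SIZE:
            self.ram[address] = data
        else:
            raise ValueError("nothing is mapped at this address")

    def read(self, address):
        """Put `address` on the address bus and return what comes back."""
        if 0 <= address < RAM_SIZE:
            return self.ram[address]
        raise ValueError("nothing readable is mapped at this address")


bus = Bus()
bus.write(0x0010, 1234)            # store a value in RAM
bus.write(VIDEO_ADDRESS, 65)       # send a value to the output device
print(bus.read(0x0010), bus.screen)
```

The same `write` call reaches either RAM or the device; only the address decides which.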
ARM better than Pentium?
It still seems amazing to me that all a computer does is add numbers. Even when running Desktop Publishers or Computer Aided Design Programs in a WIMP environment the CPU is only adding numbers, storing numbers or comparing numbers. So how does it seem to be so powerful?
The answer is in the speed at which it does these simple operations. In the first Archimedes A310 the ARM2 chip worked at about 4 million instructions per second (mips). Then came the ARM3, ARM610, ARM710 and StrongARM chips, which all work at greater speeds. The StrongARM currently runs at approximately 233 mips. I say approximately because the CPU has to have a clock and this 'ticks' at 233 million ticks per second (MHz). In the PC world chips run with clock speeds much greater than this; in fact the latest are approaching one thousand million (giga) ticks per second!
So why is the ARM chip more efficient than other designs?
One reason is that they have relatively few instructions, hence they are a Reduced Instruction Set Computer (RISC) design. Most of the instructions can be executed in one clock cycle. Other, Complex Instruction Set Computer (CISC), designs have many instructions, most requiring many clock cycles. Another effect of the RISC design is low power consumption (hence no heat sinks or fans required!). We therefore have a powerful yet simple CPU which is relatively easy to program.
How a CPU works: Fetch Decode and Execute
A CPU understands its own machine code, made up of commands. These commands are not in English or any other easily recognisable language, but are in binary. If you tried to read a pure machine code program all you would see is a continuous string of binary numbers! Our ARM CPUs run a machine code language called ARM code.
Any CPU runs code using three stages:- Fetch, Decode and Execute.
- 1) Fetch The instruction is located in memory by placing its memory address on the address bus, then reading the instruction's binary code on the data bus, and therefore loading the instruction.
- 2) Decode The CPU matches the instruction code to one in its internal 'dictionary' and hence recognises which instruction has been loaded.
- 3) Execute The final stage executes the instruction, so the operation required by the instruction is carried out.
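The three stages can be imitated with a short Python loop. This is a sketch under stated assumptions: the opcode numbers, the tuple layout of an instruction and the `run` function are made up for the illustration, and real ARM instructions are single 32 bit words with a quite different layout.

```python
# Toy fetch-decode-execute loop. The opcode numbers are hypothetical.

OP_HALT, OP_ADD, OP_MOV = 0, 1, 2

# The "internal dictionary" that the decode stage looks instructions up in.
DECODE_TABLE = {OP_ADD: "ADD", OP_MOV: "MOV", OP_HALT: "HALT"}


def run(memory, registers):
    pc = 0                                       # address of the next instruction
    while True:
        opcode, dest, src1, src2 = memory[pc]    # 1) fetch
        pc += 1
        name = DECODE_TABLE[opcode]              # 2) decode
        if name == "ADD":                        # 3) execute
            registers[dest] = registers[src1] + registers[src2]
        elif name == "MOV":
            registers[dest] = registers[src1]
        elif name == "HALT":
            return registers


program = [
    (OP_ADD, 0, 1, 2),     # R0 = R1 + R2
    (OP_MOV, 4, 0, 0),     # R4 = R0
    (OP_HALT, 0, 0, 0),    # stop the toy machine
]
print(run(program, [0, 5, 7, 0, 0]))
```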
Pipelining
At first sight this should take 3 clock cycles, one for each stage, but the ARM chip has a pipeline (fig 2).
Consider a set of instructions. At the start the first instruction is fetched (one clock cycle). Then in the next cycle this instruction is decoded while the next instruction is fetched. In the third cycle the first instruction is executed while the second instruction is decoded and a third instruction is fetched. The whole process then repeats until the end of the set of instructions. After the first few instructions the fetch, decode and execute stages for each instruction effectively take only one clock cycle, hence speeding up the CPU's running of programs.
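The saving can be put into numbers with a small Python model. It is idealised: it assumes every stage takes exactly one clock cycle and that nothing ever interrupts the flow, and the function names are my own.

```python
STAGES = ("fetch", "decode", "execute")


def cycles_without_pipeline(n_instructions):
    # Each instruction passes through all three stages before the next starts.
    return n_instructions * len(STAGES)


def cycles_with_pipeline(n_instructions):
    # The first instruction needs three cycles to pass through; after that
    # one instruction completes on every cycle.
    if n_instructions == 0:
        return 0
    return len(STAGES) + (n_instructions - 1)


def pipeline_timetable(n_instructions):
    """For each clock cycle, list which instruction occupies each stage."""
    table = []
    for cycle in range(cycles_with_pipeline(n_instructions)):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instruction = cycle - stage_index
            if 0 <= instruction < n_instructions:
                row[stage] = instruction + 1     # number instructions from 1
        table.append(row)
    return table


print(cycles_without_pipeline(100), cycles_with_pipeline(100))
for cycle, row in enumerate(pipeline_timetable(3), start=1):
    print("cycle", cycle, row)
```

For 100 instructions the model gives 300 cycles without the pipeline and 102 with it, which is why each instruction effectively costs about one cycle.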
Barrel Shifting.
Another section in the ARM chip is called a Barrel Shifter. This enables data to be altered BEFORE the execution of an instruction. The data bits can be shifted to the left or right, which in effect is multiplying or dividing the value of the data by a power of two.
Consider a data value of 7, which in binary is %00000111. (I will only show 8 bits, but remember the ARM uses 32 bits.) Shifting these bits three places to the left we get %00111000, which is the value 56 (7*2*2*2). For 5+7*8 to be calculated, some CPUs would have to do the 7*8 first and store the result, then add the 5 and store the result; this could take 5 clock cycles or more. The ARM, with its barrel shifter, can shift the bit pattern of the 7 three places to the left as part of the same instruction that does the addition. So the ARM chip is well designed (but then you knew that already!!).
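The shift arithmetic is easy to check in Python. This is only a model of the sums involved, not ARM code; the helper names `lsl`, `lsr` and `add_with_shift` are my own.

```python
MASK32 = 0xFFFFFFFF     # ARM registers hold 32 bits


def lsl(value, places):
    """Logical shift left, keeping only the low 32 bits."""
    return (value << places) & MASK32


def lsr(value, places):
    """Logical shift right on a 32 bit value."""
    return (value & MASK32) >> places


def add_with_shift(a, b, places):
    """Model of an addition whose second operand is shifted left first."""
    return (a + lsl(b, places)) & MASK32


print(format(7, "08b"), "->", format(lsl(7, 3), "08b"))   # 7 becomes 56
print(add_with_shift(5, 7, 3))                             # 5 + 7*8
```

Shifting left by three places multiplies by 8 and shifting right by three divides by 8, so `lsr(56, 3)` gives back 7.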
Registers
All CPUs require registers; these are like pigeon holes where data can be stored before any operation is carried out on it. Early CPUs like the BBC computer's 6502 had only 3 registers, only one of which could be used in arithmetic instructions. The ARM has 16 registers, usually known as R0 to R15. Most of these are free for the programmer to use in any instructions. Just three are best left for special uses. These are
- R13 for stack operations (details in a much later part)
- R14 is the link register (it is used for storing return addresses in the construction of sub routines)
- R15 is the program counter (PC) and status register.
The program counter always contains the address of the next instruction to be fetched. The status register is a set of bit values which reflect/set the current state of the CPU. The only bits we will use are the Negative, Zero, Carry and oVerflow bits (known as the NZCV status flags). From now on I will always call R15 by its other recognised name, PC.
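To give a feel for what the four flags mean, here is a Python sketch of how they would be worked out for a 32 bit addition. It models the arithmetic only; the function name and the dictionary of flags are my own, and it says nothing about which ARM instructions update the flags.

```python
MASK32 = 0xFFFFFFFF


def add_and_flags(a, b):
    """Add two 32 bit values; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                     # may need 33 bits
    result = full & MASK32           # what fits in a register
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "N": sign_r == 1,            # Negative: top bit of the result is set
        "Z": result == 0,            # Zero: the result is zero
        "C": full > MASK32,          # Carry: the sum did not fit in 32 bits
        # oVerflow: both inputs have the same sign but the result does not
        "V": sign_a == sign_b and sign_r != sign_a,
    }
    return result, flags


print(add_and_flags(1, 0xFFFFFFFF))     # 1 + (-1): zero, with a carry
print(add_and_flags(0x7FFFFFFF, 1))     # largest positive + 1: overflow
```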
ARM Instructions
But that's enough theory for now, let's get programming! Even though machine code is not easily read directly, we can make use of a language that will make programming easier. This language is called Assembler Language. Most of the instructions in assembler language are simple and straightforward.
ADD R0,R1,R2 this adds R1 and R2 and places the answer in R0
MOV R4,R5 this would copy the value in R5 into R4
CMP R0,R2 this would compare the value in R2 with the value in R0
As you can see, each instruction has a three letter name followed by register names which are separated by commas. The answer in arithmetic commands is placed in the register directly following the name of the command. To produce the pure machine code we need to assemble it by running a program called an Assembler. Do not worry, you won't have to buy one; it is built into BASIC. Just look at the first example program below.
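The layout of an assembler line (a three letter name, then registers separated by commas, with the destination first) can be shown by taking a line apart in Python. This is a sketch only: `parse` and `execute` are hypothetical helpers, they handle just the ADD, MOV and CMP forms with register operands, and the CMP here only reports whether the two values are equal.

```python
def parse(line):
    """Split 'ADD R0,R1,R2' into ('ADD', [0, 1, 2])."""
    name, operands = line.split(None, 1)
    registers = [int(op.strip().upper().lstrip("R")) for op in operands.split(",")]
    return name.upper(), registers


def execute(line, regs):
    """Carry out one ADD, MOV or CMP on the list `regs`."""
    name, r = parse(line)
    if name == "ADD":
        regs[r[0]] = (regs[r[1]] + regs[r[2]]) & 0xFFFFFFFF   # answer goes to the first register
    elif name == "MOV":
        regs[r[0]] = regs[r[1]]
    elif name == "CMP":
        return regs[r[0]] == regs[r[1]]       # simplified: equality only
    else:
        raise ValueError("instruction not handled by this sketch: " + name)
    return None


regs = [0] * 16
regs[1], regs[2] = 3, 4
execute("ADD R0,R1,R2", regs)
execute("MOV R4,R0", regs)
print(regs[0], regs[4], execute("CMP R0,R4", regs))
```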
The First Program
In all the examples no line numbers will be shown, so in !Edit select Basic options->Strip line numbers on its iconbar menu. This program is very simple and probably trivial, adding two numbers held in registers 1 and 2 and giving the answer in register 0. The instruction would be ADD R0,R1,R2, but how do we write assembler code and produce machine code in BASIC?
Consider the following.
MODE 28 :REM If you have an A310 etc. try MODE 15
DIM mcode% 1024:REM This line reserves 1024 bytes of memory
REM which start at position mcode%
P%=mcode% :REM P% is a reserved variable which acts as a pointer
REM while assembling the code
REM we wish the code to start at mcode%
[ :REM Note this is the square bracket (to the right
REM of the P key).
REM This tells BASIC to enter the ARM assembler, all
REM commands from now are in ARM assembler.
ADD R0,R1,R2 :REM Our ARM code instruction
MOV PC,R14 :REM This instruction copies the value in R14 (link
REM register contains the return address to return to
REM BASIC) into the program counter hence
REM jumping to the next BASIC line after running
REM our ARM code program.
] :REM Leave the ARM code assembler and return to BASIC.
I have found that it is easy for beginners to confuse the two processes of assembling a machine code program and running one. The above lines only assemble the code, i.e. produce the 32 bit binary code the ARM chip understands.
To run the code we need to use the BASIC instructions CALL or USR. The difference between these two commands is similar to the difference between a Procedure and a Function in a High level language. CALL is used when data needs to be given to the ARM code routine. USR is used when a single integer answer is required from the ARM code routine. In this example we need an integer answer so USR is used.
To test our program we need to get some data into R1 and R2. This is made easy in BASIC, since the values of the integer variables A% to H% are copied into R0 to R7.
INPUT"Enter an integer value "B%
INPUT"Enter another integer value"C%
REM These two lines therefore will give their values to R1 and R2.
A%=USR(mcode%) :REM This line runs the ARM code starting at mcode%
REM (our assembled code) returning the value in R0 to BASIC.
PRINT"The answer is ";A% :REM Print the answer.
END
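The way USR hands the integer variables to the routine and brings R0 back can be imitated in Python. This is a rough model, not how BASIC is implemented: `usr`, `to_signed` and `add_routine` are hypothetical names, and the "machine code" is just a Python function standing in for ADD R0,R1,R2.

```python
def to_signed(value):
    """Interpret a 32 bit pattern as a signed integer, as BASIC would show it."""
    return value - 0x100000000 if value & 0x80000000 else value


def usr(routine, variables):
    """Rough model of USR: load R0-R7 from A%-H%, run the routine, return R0."""
    regs = [0] * 16
    for index, letter in enumerate("ABCDEFGH"):
        regs[index] = variables.get(letter + "%", 0) & 0xFFFFFFFF
    routine(regs)
    return to_signed(regs[0])


def add_routine(regs):
    # Stand-in for the assembled code: ADD R0,R1,R2
    regs[0] = (regs[1] + regs[2]) & 0xFFFFFFFF


answer = usr(add_routine, {"B%": 20, "C%": 22})
print("The answer is", answer)
```

B% lands in R1 and C% in R2 simply because they are the second and third letters; A% would arrive in R0, where the answer then overwrites it.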
So the whole program listing is
MODE28
DIM mcode% 1024
P%=mcode%
[
ADD R0,R1,R2
MOV PC,R14
]
INPUT"Enter an integer value "B%
INPUT"Enter another integer value"C%
A%=USR(mcode%)
PRINT"The answer is ";A%
END
When running the above program you will find at the top of the screen something like this
00008FD8
00008FD8 E0810002 ADD R0,R1,R2
00008FDC E1A0F00E MOV PC,R14
This is a listing generated by the assembler part of the program showing in the first column the actual memory address of the instruction. The middle column is the 32 bit ARM code for the instruction and the third column the instruction in Assembler code as typed in. All the values are in hexadecimal (base 16). This listing will become of more use as the programs become more involved.
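For the curious, the eight hex digits in the middle column can be rebuilt by hand. The Python sketch below packs the fields of a register-only ARM data processing instruction (condition, operation code, first operand register, destination register, second operand register) into one 32 bit word. It covers only the two instructions used here, with no shifts, constants or conditions; the `encode` function and its argument order are my own.

```python
ALWAYS = 0xE                              # condition field: always execute
OPCODES = {"ADD": 0b0100, "MOV": 0b1101}  # operation codes for the two instructions


def encode(name, rd, rn, rm):
    """Encode NAME Rd,Rn,Rm (register operands only, no shift) as a 32 bit word."""
    return (
        (ALWAYS << 28)            # bits 31-28: condition
        | (OPCODES[name] << 21)   # bits 24-21: which operation
        | (rn << 16)              # bits 19-16: first operand register
        | (rd << 12)              # bits 15-12: destination register
        | rm                      # bits 3-0:   second operand register
    )


print(format(encode("ADD", 0, 1, 2), "08X"))     # ADD R0,R1,R2
print(format(encode("MOV", 15, 0, 14), "08X"))   # MOV PC,R14 (MOV has no first operand)
```

Each instruction is exactly 32 bits, which is why the addresses in the first column go up in steps of 4 bytes.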
Well, that's it for this time. I hope you have not been put off by all the preamble, but I feel some background knowledge at the start is valuable.
Problems to Solve.
- Modify the above to subtract the value in R2 from the value in R1, placing the answer in R0
- Modify the above to do the following R0=R1+R2+R3
- Modify the above to do the following R0=R1+R2-R3
- Modify the above to do the following R0=R1-R2-R3
Answers next time. Good luck.