From Python to CPU instructions: Part 2
This post is what happens when you get an intrusive thought about how your computer works in the middle of the night.
In the last post, From Python to CPU instructions: Part 1, I showed how we could rewrite our Python program in C. We learned what Python gives us for free and what we have to implement manually in C. In this post, I will rewrite our C program in assembly language.
I’m not doing this to teach anyone assembly, because: 1) you probably shouldn’t learn it, no matter how smart you are, and 2) I’m definitely not smart enough to teach anyone assembly. However, I want you to gain a deeper understanding of how the computer executes the instructions we write in code. Understanding the few basic operations a CPU can perform will help you build a strong foundation in computer science. In fact, no matter how complex or magical a new technological breakthrough seems, it’s all just 1’s and 0’s.
How a CPU works
Before we start rewriting the C program in assembly, let’s first look at how a simplified CPU could work. Bear in mind, this is a simplified version, meant only for educational purposes. In reality, there are a lot more subtleties, but that’s outside the scope of this article.
A CPU has a list of registers where it stores data it wants to use. Before the CPU can perform any operation on data, that data has to be loaded into one of these registers.
Let’s say we want to add the numbers 1 and 2. We could use the following set of assembly instructions:
mov X1, 1 // Write the number 1 to register X1
mov X2, 2 // Write the number 2 to register X2
add X1, X1, X2 // Write the result from adding X1 and X2, to X1
Now the value in X1 would be 3.
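To make the register model concrete, here is a rough C sketch that mimics the three instructions above, with an array standing in for the CPU's registers. The regs array and the mov_imm, add_reg and run_example helpers are names made up for this illustration; a real CPU does all of this in hardware.

```c
#include <stdint.h>

/* Hypothetical model of the 31 general-purpose 64-bit registers X0..X30. */
static uint64_t regs[31];

/* mov Xd, imm: write an immediate value into register d. */
static void mov_imm(int d, uint64_t imm) {
    regs[d] = imm;
}

/* add Xd, Xn, Xm: write the sum of registers n and m into register d. */
static void add_reg(int d, int n, int m) {
    regs[d] = regs[n] + regs[m];
}

/* Runs the three instructions from the text and returns the final X1. */
static uint64_t run_example(void) {
    mov_imm(1, 1);     /* mov X1, 1 */
    mov_imm(2, 2);     /* mov X2, 2 */
    add_reg(1, 1, 2);  /* add X1, X1, X2 */
    return regs[1];
}
```

Calling run_example() returns 3, the same value the text says ends up in X1.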
Saving information in assembly
When we want to work with data that’s more complex than small numbers, like text, we can save this information in the data section of our assembly program. This section is part of the final executable file, and it holds data that the program can both read and modify while it runs. EDIT: I originally wrote that this data does not change during the program’s execution, which is not correct. If we wanted to make sure the data could not change, we would add it to a read-only section such as .rodata. Thank you to reddit user HyperWinX for letting me know.
For example, let’s define the following data section:
.data
msg:
.ascii "Hello World"
msg_len = .-msg
First, we declare that this is the data section. Next, we define two variables:
1. msg: an ASCII-encoded text string "Hello World".
2. msg_len: the length of the string in bytes, computed as the current position (.) minus the address of msg.
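For comparison, a rough C equivalent of this data section might look like the sketch below. The character array is spelled out byte by byte so that, like .ascii, it has no terminating '\0', and sizeof plays the role of .-msg. The names mirror the assembly labels; this is only an analogy, not what the assembler actually produces.

```c
#include <stddef.h>

/* The 11 bytes of "Hello World", without a terminating '\0' (like .ascii). */
static const char msg[] = {'H', 'e', 'l', 'l', 'o', ' ',
                           'W', 'o', 'r', 'l', 'd'};

/* Like msg_len = .-msg: the size is worked out at build time. */
static const size_t msg_len = sizeof(msg);
```

Here msg_len is 11, one byte per ASCII character.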
Now, let’s say we want to load this message into the X1 register. We could do this using the following assembly instructions:
adrp X1, msg@page
add X1, X1, msg@pageoff
Here’s what’s happening:
1. The adrp instruction tells the CPU to load the page address of the msg data into the X1 register.
2. The add instruction then adds the page offset of msg to X1, giving us the exact memory address of msg.
To better understand this, it’s important to know that memory is organised into pages, which are further divided into slots where data is stored. The size of these pages and slots varies between different systems. For example, our msg data might be located on page 2, slot 3 in memory.
Revisiting the assembly code above:
adrp X1, msg@page // Tell our assembly code that msg is on page 2
add X1, X1, msg@pageoff // Tell it that it is in slot no. 3
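The page and slot idea can be sketched in C as plain bit arithmetic. This assumes the 4 KB granularity that adrp works with, where the low 12 bits of an address are the offset within the page. The function names and the example address are made up for illustration.

```c
#include <stdint.h>

/* adrp works with 4 KB-aligned pages: 4096 bytes, so 12 offset bits. */
#define ADRP_PAGE_SIZE 4096u

/* Roughly what adrp produces: the address with its low 12 bits cleared. */
static uint64_t page_base(uint64_t addr) {
    return addr & ~(uint64_t)(ADRP_PAGE_SIZE - 1);
}

/* Roughly what msg@pageoff stands for: the low 12 bits of the address. */
static uint64_t page_offset(uint64_t addr) {
    return addr & (uint64_t)(ADRP_PAGE_SIZE - 1);
}
```

For a hypothetical address 0x100003f5b, page_base gives 0x100003000 and page_offset gives 0xf5b. Adding the two, as the add instruction does, gives back the original address.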
Moving beyond simple addition
So far, so good, but how does the data actually get shown on the screen? This is where syscalls come into play. I think of them as “magic” because, honestly, I don’t fully understand how they work yet. The operating system designers decided that writing specific values to certain registers will make the computer do certain things. On my MacBook, the important registers for syscalls are:
• X16: Specifies the action you want to perform.
• X0-X5: Hold the arguments needed for the syscall (e.g., what data to write).
For example, to write the msg variable from our data section to the screen, we use the following assembly code:
mov X16, 4
mov X0, 1
adrp X1, msg@page
add X1, X1, msg@pageoff
mov X2, msg_len
svc 0
Here’s what this does:
1. X16 = 4: Tells the OS we want to perform a “write” syscall.
2. X0 = 1: Specifies that the output will go to the console (standard output).
3. X1: Holds the location of msg (using its page and slot address).
4. X2: The number of bytes to write (11 bytes, since “Hello World” is 11 characters and each ASCII character takes 1 byte).
5. svc 0: Executes the syscall, writing the data to the screen.
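In C, the same request goes through the write function from unistd.h, which essentially wraps this syscall. The sketch below assumes a POSIX system; write_hello and the hello array are hypothetical names, and the three arguments to write correspond to X0, X1 and X2 above.

```c
#include <unistd.h>

/* The 11 bytes of "Hello World", without a terminating '\0'. */
static const char hello[] = {'H', 'e', 'l', 'l', 'o', ' ',
                             'W', 'o', 'r', 'l', 'd'};

/* Asks the OS to write the message to the given file descriptor.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t write_hello(int fd) {
    return write(fd,             /* X0: where to write (1 = standard output) */
                 hello,          /* X1: address of the data                  */
                 sizeof(hello)); /* X2: number of bytes to write             */
}
```

Calling write_hello(1) sends the message to standard output, just like the assembly version.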
With this knowledge, we now have the foundation to rewrite our C program in assembly.
From C to assembly
The C program we wrote in the last post looks like this:
#include <stdio.h>
int main() {
    char input[256];
    printf("Enter the string you wish to reverse: \n");
    fgets(input, sizeof(input), stdin);
    int len = 0;
    for (int i = 0; input[i] != '\0'; i++) {
        len++;
    }
    for (int i = 0, j = len - 1; i <= j; i++, j--) {
        char c = input[i];
        input[i] = input[j];
        input[j] = c;
    }
    printf("%s\n", input);
    return 0;
}
Let’s start with the following line.
int main() {
In C, this tells the compiler where the program begins its execution—right at the main function. The operating system knows to look for this function when starting the program.
In assembly, there isn’t a function like main. Instead, we need to manually define where the program should start. We do this by specifying a label called _start and making it globally accessible using this line:
.global _start
This exports the _start label so that the linker can use it as the program’s entry point. When the program runs, the CPU will begin executing instructions from the address labeled _start.
The next thing our C program does is define two variables. The first is straightforward.
char input[256];
This creates a 256-byte array to store the user’s input. But there’s another variable hiding in plain sight—the text inside the printf function:
printf("Enter the string you wish to reverse: \n");
That string is also data that needs to be stored somewhere. In our assembly program, we place both the prompt and the input buffer in the data section. Here’s what the data section looks like:
.data
msg:
.ascii "Enter the string you wish to reverse: \n"
msg_len =.-msg
input:
.ds 256
input_len = .-input
In this section, we define two variables: msg for the prompt string and input for storing user input, just like in the C program. Now that we have the data ready, we can move on to writing the instructions that the CPU will execute.
The first step is to prompt the user. In assembly, we do this with a system call to write the message to the console.
_start:
mov X0, 1
adrp X1, msg@page
add X1, X1, msg@pageoff
mov X2, msg_len
mov X16, 4
svc 0
This code looks similar to the system call we discussed earlier. The _start: label tells the CPU where to begin execution, and the rest of the code sends our message to the console.
Next, we need to capture the user’s input. In C, this is done with fgets. In assembly, we use the read syscall to achieve the same result:
mov X16, 3 // tell the operating system we want to do a read operation
mov X0, 0 // tell the OS we want to read console input
adrp X1, input@page // tell the OS on what page to write the input
add X1, X1, input@pageoff // and where on the page
mov X2, 256 // tell the operating system to read at most 256 bytes
svc 0 // execute the system call
Just like with the write operation, this code tells the OS to store the user’s input in the input array in memory.
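The C counterpart is the read function from unistd.h. This is a minimal sketch, assuming a POSIX system; read_input is a hypothetical helper. Unlike fgets, read does not append a terminating '\0'. It returns the number of bytes it stored, a value the syscall leaves in X0 and which the assembly version ignores.

```c
#include <unistd.h>

/* 256-byte buffer, zero-filled at program start like ".ds 256". */
static char input[256];

/* Asks the OS to read up to 256 bytes from the given file descriptor
 * into the input buffer. Returns the number of bytes read, 0 at end of
 * input, or -1 on error. */
static ssize_t read_input(int fd) {
    return read(fd,             /* X0: where to read from (0 = standard input) */
                input,          /* X1: address of the buffer                   */
                sizeof(input)); /* X2: maximum number of bytes to read         */
}
```

Calling read_input(0) waits for a line from the console, the same as the assembly above.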
Once we’ve captured the input, we can start reversing the string. The reversal algorithm is the same as in our C program, with one difference: in assembly, we don’t calculate the string’s length. We simply reverse all 256 bytes of the input buffer.
The idea is simple: we store two pointers (a pointer is just an address in memory from which we can retrieve data) in two registers, one pointing at the start of the buffer and one at the end. We then load the bytes they point to into two more registers and write each byte back to the other pointer’s location, which swaps them.
When we have done this, we increase the pointer to the start of the text by 1 and decrease the pointer to the end of the text by 1. Then we repeat the operation.
This loop keeps running until the pointer from the front (X4) meets or passes the pointer from the back (X6). At that point, the buffer is reversed.
// Reverse string
adrp X1, input@page
add X1, X1, input@pageoff
mov X2, 256
add X4, X1, 0 // X4 = front pointer: a copy of the address in X1 (the first byte of input)
ldrb w5, [X4] // load the byte X4 points to into w5 (w5 is the lower 32 bits of X5, plenty for 1 byte); repeated inside the loop
add X6, X1, X2 // X6 = back pointer: X1 + X2, the address just past the last byte of input
loop:
sub X6, X6, 1 // Move the back pointer (X6) one byte towards the front
// Load the letters into two new registers
ldrb w5, [X4] // Load the byte at address X4 into w5
ldrb w7, [X6] // Load the byte at address X6 into w7
// Swap the letters by storing each one at the other pointer
strb w5, [X6] // Store the byte in w5 at address X6
strb w7, [X4] // Store the byte in w7 at address X4
// Go to the next letter from the front
add X4, X4, 1 // Move the front pointer (X4) one byte towards the back
cmp X4, X6 // Compare the front pointer with the back pointer
b.lt loop // if the front pointer is still before the back pointer, run the loop again
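For comparison, here is roughly what this loop looks like when written back in C with two pointers. It is a sketch that mirrors the assembly instruction by instruction, not the C program from Part 1. The name reverse_bytes is hypothetical, and the n == 0 guard is an addition, since the assembly always runs with 256 bytes.

```c
#include <stddef.h>

/* Reverses the n bytes starting at buf, in place. */
static void reverse_bytes(unsigned char *buf, size_t n) {
    if (n == 0) {
        return;                    /* added guard; the assembly assumes n = 256 */
    }
    unsigned char *front = buf;    /* X4: pointer to the first byte      */
    unsigned char *back = buf + n; /* X6: pointer just past the last one */
    do {
        back--;                    /* sub  X6, X6, 1 */
        unsigned char a = *front;  /* ldrb w5, [X4]  */
        unsigned char b = *back;   /* ldrb w7, [X6]  */
        *back = a;                 /* strb w5, [X6]  */
        *front = b;                /* strb w7, [X4]  */
        front++;                   /* add  X4, X4, 1 */
    } while (front < back);        /* cmp X4, X6 then b.lt loop */
}
```

Calling reverse_bytes on the 5 bytes of "hello" leaves "olleh" in the buffer. One consequence of reversing the whole 256-byte buffer, and not just the typed text, is that the typed characters end up at the end of the buffer, after the unused bytes (which are zero, assuming nothing else has written to them).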
Finally, we need to output the reversed string. Since we already know how to use the write syscall, the code is almost identical to the earlier prompt:
// Echo Output
mov X0, 1
adrp X1, input@page
add X1, X1, input@pageoff
mov X2, 256
mov X16, 4
svc 0
And when we’re done, we need to let the operating system know the program has finished successfully. We do this with the exit syscall:
// Exit program with code 0 (success)
mov X0, 0 // Set the return code to 0
mov X16, 1 // tell the operating system we want to call SYS_EXIT
svc 0 // execute the sys call.
In the end, our assembly program may look complex at first glance, but it’s really just a series of straightforward instructions that the CPU executes step by step. Here is the complete program:
.global _start
_start:
// Write msg to console
mov X0, 1
adrp X1, msg@page
add X1, X1, msg@pageoff
mov X2, msg_len
mov X16, 4
svc 0
// Read input from console
mov X0, 0
mov X2, 256
adrp X1, input@page
add X1, X1, input@pageoff
mov X16, 3
svc 0
// Reverse string (the length is not checked; all 256 bytes are reversed)
adrp X1, input@page
add X1, X1, input@pageoff
mov X2, 256
add X4, X1, 0 // X4 = front pointer: a copy of the address in X1 (the first byte of input)
ldrb w5, [X4] // load the byte X4 points to into w5 (w5 is the lower 32 bits of X5, plenty for 1 byte); repeated inside the loop
add X6, X1, X2 // X6 = back pointer: X1 + X2, the address just past the last byte of input
loop:
sub X6, X6, 1 // Move the back pointer (X6) one byte towards the front
// Load the letters into two new registers
ldrb w5, [X4] // Load the byte at address X4 into w5
ldrb w7, [X6] // Load the byte at address X6 into w7
// Swap the letters by storing each one at the other pointer
strb w5, [X6] // Store the byte in w5 at address X6
strb w7, [X4] // Store the byte in w7 at address X4
// Go to the next letter from the front
add X4, X4, 1 // Move the front pointer (X4) one byte towards the back
cmp X4, X6 // Compare the front pointer with the back pointer
b.lt loop // if the front pointer is still before the back pointer, run the loop again
// Echo Output
mov X0, 1
adrp X1, input@page
add X1, X1, input@pageoff
mov X2, 256
mov X16, 4
svc 0
// Exit program with code 0 (success)
mov X0, 0
mov X16, 1
svc 0
.data
msg:
.ascii "Enter the string you wish to reverse: \n"
msg_len =.-msg
input:
.ds 256
input_len = .-input
Conclusion
Rewriting our C program in assembly isn’t about becoming an assembly expert. It’s about pulling back the curtain on what’s really happening when your computer runs code. Sure, assembly looks complicated, but at the end of the day, it’s just a bunch of simple instructions telling the CPU what to do, one step at a time. By getting a taste of how registers, memory, and syscalls all fit together, you get a clearer picture of how computers work at their core. And while you’ll probably never need to write assembly in your day-to-day coding, understanding it can give you a solid foundation to tackle performance issues, system-level programming, or just satisfy your curiosity about what’s really happening under the hood. It’s all just 1’s and 0’s, after all!