From Python to CPU instructions: Part 1
This post is what happens when you get an intrusive thought about how your computer works in the middle of the night.
This article is part 1 in a 2 part series. You can read part 2 here.
I’ve been programming for a while, but I recently realised how much I take for granted when it comes to what a computer actually does. I typically work in high-level languages like Python, where many low-level operations are abstracted away.
This has created a gap in my understanding. I know programs are just simple CPU instructions, but how those instructions evolve into complex software is unclear to me. To bridge this gap, I wrote a simple Python program to reverse a string and then implemented the same task in C and in pure assembly.
I believe this is worth learning, and worth reading about, not because I think everyone should be a systems-level programmer, but because the better you understand your tools, the better you will be at using them. I’m a data engineer by trade, and most of the time I use fairly simple techniques to load and transform data. Sometimes, however, I work with billions of rows, and those simple techniques start failing. This is when understanding my tools comes in handy: because I know what happens under the hood, it is easier to know which knobs to turn.
In this first part, we’ll compare the same program written in Python and C to reveal what Python hides from us. In the second part, we’ll dive into how the C program translates into CPU instructions, exploring what the computer does with our human-readable code.
The Program
The program will be a simple command-line tool that asks the user for a text string and returns the reversed version. I chose this because it’s straightforward in concept but involves slightly more complex operations like handling user input and string manipulation.
I’ll implement it in three languages:
1. Python: A high-level language I’m most proficient in.
2. C: A lower-level language, often described as “human-readable assembly” by one of my favorite tech influencers, Low Level Learning.
3. Assembly: As close to machine language as I’ll likely get.
Each language will demonstrate a different level of abstraction.
The Python program
The first program is only two lines and looks like this:
str_input = input("Enter the string you wish to revert: ")
print(str_input[::-1])
The first line calls the input function and stores its return value in the variable str_input. The input function prints the prompt *Enter the string you wish to revert:* and returns whatever the user types next, without the trailing newline.
The second line does two things. First, the slice [::-1] steps through str_input backwards, from the last character to the first, and returns a new, reversed string. Second, the print function writes the result to the command line.
This is how the program looks on the command line:
> Enter the string you wish to revert: Hello World!
> !dlroW olleH
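To make the [::-1] slice a little less magical, here is a small sketch that compares it with a reversal built one index at a time. The variable names are chosen for this illustration only.

```python
text = "Hello World!"

# A slice has the form [start:stop:step]. Leaving start and stop empty with
# a step of -1 walks the whole string from the last character to the first.
reversed_by_slice = text[::-1]

# The same result built by hand, reading one index at a time from the end.
reversed_by_index = "".join(text[i] for i in range(len(text) - 1, -1, -1))

print(reversed_by_slice)                       # !dlroW olleH
print(reversed_by_index == reversed_by_slice)  # True

# Slicing returns a new string; the original is left untouched.
print(text)                                    # Hello World!
```

The second version is already closer to what the C program has to do: nothing is reversed until we say which character goes where.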
Conversion to C
Let’s convert these two lines to their C equivalent.
The first line
str_input = input("Enter the string you wish to revert: ")
becomes
char input[256]; // Declare an array of 256 characters named input
printf("Enter the string you wish to reverse: \n"); // Prompt the user
fgets(input, sizeof(input), stdin); // Read one line of user input into input
In Python, the input() function handles both prompting and capturing input. In C, I need to explicitly manage input size and types. Here, the buffer holds 256 bytes, so fgets reads at most 255 characters and uses the last byte for the terminating '\0'. I could avoid this fixed limit by allocating memory dynamically through pointers, but that comes with its own set of trade-offs.
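One detail that is easy to miss when moving from input() to fgets: input() drops the newline the user types at the end, while fgets keeps it in the buffer when there is room. Python's readline behaves like fgets in this respect, so the following sketch uses it to show the effect on a reversed string. The io.StringIO object and the 255-character limit are stand-ins chosen for illustration, not part of the original program.

```python
import io

# io.StringIO stands in for the keyboard so the example runs without typing.
fake_stdin = io.StringIO("Hello World!\n")

# readline(size) reads at most `size` characters and stops after a newline,
# which it keeps in the result, much like fgets keeps it in the buffer.
line = fake_stdin.readline(255)
print(repr(line))                      # 'Hello World!\n'

# Reversing the raw line moves the newline to the front.
print(repr(line[::-1]))                # '\n!dlroW olleH'

# Removing the newline first gives the result we actually want.
print(repr(line.rstrip("\n")[::-1]))   # '!dlroW olleH'
```

This is worth keeping in mind whenever a line read with fgets is reversed or compared with another string.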
The second line in Python:
print(str_input[::-1])
becomes
int len = 0;
for (int i = 0; input[i] != '\0'; i++) {
    len++;
}

for (int i = 0, j = len - 1; i <= j; i++, j--) {
    char c = input[i];
    input[i] = input[j];
    input[j] = c;
}

printf("%s\n", input);
This single line has turned into 10 lines of code (12 with empty lines), covering 3 different operations.
In Python we reverse a text string by slicing it with [::-1]; in C we have to implement a reversal algorithm ourselves. I chose to use the following algorithm:
The first thing we need to do is find the length of our array of characters.
We do this by iterating through the array and adding 1 for each character until we reach the end of the string:
int len = 0;
// Loop through the array and stop at the null terminator '\0' that marks the end of the string
for (int i = 0; input[i] != '\0'; i++) {
    len++;
}
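Python never needs this loop because every string stores its own length. To see what the C loop is doing, the following sketch imitates a C character array with a bytearray and counts up to the first zero byte. The helper name c_strlen is invented for this illustration; in real C code the standard library function strlen does the same job.

```python
# A C string is a run of bytes that ends with a zero byte ('\0').
# This buffer imitates `char input[256]` after "Hello" has been read into it.
buffer = bytearray(256)      # 256 zero bytes, like a cleared C array
buffer[0:5] = b"Hello"


def c_strlen(buf):
    """Count bytes until the first zero byte, like the C loop does."""
    length = 0
    while buf[length] != 0:
        length += 1
    return length


print(c_strlen(buffer))  # 5
print(len(buffer))       # 256: Python stores the size, C has to count
```

The buffer is 256 bytes long, but the string inside it is only 5 characters; the zero byte is the only thing that tells the loop where to stop.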
Once we have the length, we can start the algorithm. The steps are illustrated with the following example string.
We start by setting one variable to 0 and another to the length of the input string minus 1. These two indices act as pointers to the first and last characters in the string.
We then swap the two characters in 3 steps. See the picture below to follow the explanation:
1. Move the first character to a temporary variable (named c in our code).
2. Move the last character to the location of the first character.
3. Move the first character from the temporary variable to the location of the last character.
We then move the pointers to the second and second-to-last characters and repeat.
We keep doing this as long as the pointer for the first character has not passed the pointer for the last character, which ensures that no pair of characters is swapped twice. If the two pointers meet on a middle character, the code below swaps it with itself, which changes nothing.
This algorithm is implemented in the following C code.
for (int i = 0, j = len - 1; i <= j; i++, j--) {
    char c = input[i];
    input[i] = input[j];
    input[j] = c;
}
All the characters have now been swapped, and we end up with the following list.
In the Python program, the reversal algorithm is prewritten. In C, I have to make a conscious decision about how I want to do it.
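For comparison, here is roughly what the same two-pointer swap looks like when written by hand in Python instead of using [::-1]. It is only a sketch: the function name reverse_in_place is made up for this example, and it works on a list of characters because Python strings cannot be modified in place.

```python
def reverse_in_place(chars):
    """Reverse a list of characters by swapping from both ends inward."""
    i, j = 0, len(chars) - 1
    while i < j:
        temp = chars[i]        # 1. keep the first character in a temporary variable
        chars[i] = chars[j]    # 2. move the last character to the first position
        chars[j] = temp        # 3. move the saved character to the last position
        i += 1                 # step both indices toward the middle
        j -= 1
    return chars


letters = list("Hello World!")
print("".join(reverse_in_place(letters)))  # !dlroW olleH
```

This version stops as soon as the two indices meet, so a middle character is never swapped with itself. Either stopping condition gives the same result.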
The last thing the Python program does is print the new text string to the screen. In C, this is done with the following code:
printf("%s\n", input);
This looks a lot like what Python gives us. That is because printf is itself an abstraction, which we get from the C standard I/O library (stdio.h).
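Python's print plays a similar role. As a rough illustration, the sketch below shows that print amounts to converting the value to text, adding a newline and writing it to a stream. The two io.StringIO buffers are stand-ins for the terminal so the results can be compared.

```python
import io

reversed_text = "!dlroW olleH"

# Two in-memory buffers stand in for the terminal.
via_print = io.StringIO()
via_write = io.StringIO()

# print converts its argument to text, appends a newline and writes to the stream.
print(reversed_text, file=via_print)

# The same effect spelled out by hand.
via_write.write(str(reversed_text) + "\n")

print(via_print.getvalue() == via_write.getvalue())  # True
```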
The complete C program looks like this:
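#include <stdio.h>
#include <string.h>

int main(void) {
    char input[256];

    printf("Enter the string you wish to reverse: \n");
    fgets(input, sizeof(input), stdin);
    // fgets keeps the trailing newline; cut it off so it does not end up first in the output
    input[strcspn(input, "\n")] = '\0';

    // Count the characters up to the null terminator
    int len = 0;
    for (int i = 0; input[i] != '\0'; i++) {
        len++;
    }

    // Swap characters from both ends, moving toward the middle
    for (int i = 0, j = len - 1; i <= j; i++, j--) {
        char c = input[i];
        input[i] = input[j];
        input[j] = c;
    }

    printf("%s\n", input);
    return 0;
}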
Conclusion
The goal here isn’t to say one language is better than another. Python simplifies many tasks by abstracting low-level details, while C requires a more hands-on approach. My point is to highlight how much happens “under the hood” in high-level languages. Understanding these underlying processes can help us make better decisions as developers, regardless of the language we’re using.
Thank you for reading. In part 2, we will dive into how our C program could be translated into CPU instructions, using assembly.
Part 2 is now out, and can be read here: