Arduino Compiler Optimizations for Faster & Smaller Code
How to use compiler optimization options with the Arduino IDE.
The goal of Arduino is to make microcontroller development as easy as possible so that the programmer can concentrate on his or her goals instead of spending time on setting up toolchains and configuring hardware at the register level. Arduino provides easy-to-use function libraries that hide the complexity of low-level hardware access, such as configuring the serial port to make it work at the desired speed.
Arduino also hides the toolchain, which consists of the assembler, the compiler, the linker, and the programmer. By default, Arduino only tells you whether building the sketch succeeded or not; to get more information about the build process you must set some options in the ‘Preferences’ dialog. For most people the default is just fine, as this is exactly what they want.
However, as people become “better at Arduino” and their sketches grow bigger or get more demanding, they may run into problems related to execution speed or program size or even both.
Sketch too big?
What to do when the sketch doesn’t fit in the microcontroller’s memory? Change the board? Replacing the Uno by a Mega can be a solution as the Mega has eight times more program memory. But it is also bigger.
And what to do when the sketch can’t meet the timing you need? Again, replacing the board is a possible solution, but this time by a board with a faster MCU, for instance a Due or a Zero. But these boards are more expensive and not 100% compatible.
Do a code inspection
Before you decide to change the board, you must do a thorough code inspection to see if it can’t be optimized in some way so that it will fit or run fast enough.
gcc compiler options
But there are also compiler optimizations that you can try first to see if they can help you out quickly. These options are not accessible through the IDE, but that doesn’t mean that they cannot be used at all.
Arduino uses gcc as toolchain, and gcc features optimization levels. They range from ‘0’ (zero), which optimizes compilation time, to ‘3’, which is the most aggressive level. There is also level ‘s’ that optimizes for code size. Often, but not always, small code is also fast, and this is the optimization level used by Arduino.
Finally, there is the ‘fast’ level, which is like level ‘3’ plus a few optimizations that are not valid for all standard-compliant programs. This means that it is cool to use for hobby projects, but probably not in life-saving equipment.
Benchmarking optimization levels
To see the effects of gcc optimizations I downloaded the sketch Arduino_Speed_Tests. This sketch measures the execution times of some common functions and outputs the results in microseconds. It is also quite big so we can compare the influence on code size too.
To see in the IDE what we are doing, we must activate ‘verbose output during compilation’ in the 'File -> Preferences' dialog.
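As background on how a timing benchmark of this kind can work, here is a minimal sketch that averages the duration of many calls. It is not taken from Arduino_Speed_Tests; the names `average_micros`, `workload` and `sink` are hypothetical, and it uses `std::chrono` so that it runs on a PC, whereas an Arduino sketch would read `micros()` before and after the loop instead.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper, not part of Arduino_Speed_Tests.
// Calls fn() `iterations` times and returns the average time per call
// in microseconds. Note that the loop overhead is included in the result.
template <typename Fn>
double average_micros(Fn fn, std::uint32_t iterations) {
    if (iterations == 0) {
        return 0.0;  // nothing measured, avoid dividing by zero
    }
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Example workload. The volatile variable keeps the compiler from
// removing the work at higher optimization levels.
volatile std::uint32_t sink = 0;
inline void workload() { sink = sink + 1; }
```

Calling `average_micros(workload, 1000)` returns the mean time of one `workload()` call. Averaging over many calls matters because a single call is often shorter than the timer resolution.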
The optimization level must be added to the build command invoked by the IDE. It is specified in the file
[arduino.exe]\hardware\arduino\avr\platform.txt
This file is used for the standard Arduino boards; boards that come with their own boards package have their own ‘platform.txt’ file.
Edit platform.txt
After opening the file in a text editor, we can change the optimization level. It must be changed at three places. To make things easier you can create a variable for it so that the optimization level is defined in only one place. Since there are six levels to try, create six lines of which five are commented out:
# optimize_level = '-O0'
# optimize_level = '-O1'
# optimize_level = '-O2'
# optimize_level = '-O3'
optimize_level = '-Os'
# optimize_level = '-Ofast'
The Arduino optimization level is ‘-Os’. Replace all three occurrences of this parameter by your variable name nested in curly braces, like this:
-Os -> {optimize_level}
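As an illustration of the result, the three edited lines might look roughly like the excerpt below. This is a sketch under assumptions: the property names `compiler.c.flags`, `compiler.c.elf.flags` and `compiler.cpp.flags` are the ones that normally carry ‘-Os’ in the AVR core, but the surrounding flags are shown for illustration only and will probably differ in your version of the file. Only the ‘-Os’ part should be changed.

```
# Illustrative excerpt; flags other than {optimize_level} may differ in your file.
optimize_level = '-Os'

compiler.c.flags=-c -g {optimize_level} {compiler.warning_flags} -std=gnu11 -ffunction-sections -fdata-sections -MMD -flto -fno-fat-lto-objects
compiler.c.elf.flags={compiler.warning_flags} {optimize_level} -g -flto -fuse-linker-plugin -Wl,--gc-sections
compiler.cpp.flags=-c -g {optimize_level} {compiler.warning_flags} -std=gnu++11 -fpermissive -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -Wno-error=narrowing -MMD -flto
```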
Save the file.
A pre-modified platform.txt file is attached below.
Arduino IDE version 1.8.12 reads ‘platform.txt’ every time you click compile, which makes experimenting with it much quicker. If your IDE does not do this (previous versions), you must restart it every time you change the file, which is tedious.
Optimization level '-O0'
Clicking the ‘Upload’ button in the IDE with the 'O0' level activated will give an error as the sketch is too big. There is also a warning:
# warning "Compiler optimizations disabled; functions from utils/delay.h won't work as designed"
Level zero clearly is not the best level.
Other optimization levels
Wait until uploading is done and then open the Serial Monitor at 9,600 baud. The tests are executed and the results will scroll by slowly. After delayMicroseconds(100) the sketch is done and you can inspect the results.
Here are my results as a table. I removed the lines that had more or less the same results on all levels. Complete results are available from the download section.
Memory sizes are in bytes, all other values are in microseconds (µs). Level -O0 is not used in comparisons as it is worst on every line and the compiler says it should not be used. Its program memory size is also too big to fit in 32 KB.
Test | -O0 | -O1 | -O2 | -O3 | -Os | -Ofast |
---|---|---|---|---|---|---|
Program memory | 38858 | 22188 | 20920 | 32208 | 20730 | 31834 |
Dynamic memory | 382 | 240 | 240 | 240 | 240 | 232 |
digitalRead | 14.074 | 5.597 | 5.097 | 3.96 | 4.902 | 3.962 |
digitalWrite | 12.592 | 4.5 | 4.502 | 3.24 | 4.532 | 3.24 |
pinMode | 14.072 | 4.405 | 4.282 | 2.707 | 4.342 | 2.705 |
random() | 141.85 | 96.837 | 50.312 | 50.287 | 91.287 | 50.312 |
analogRead() | 148.75 | 111.987 | 111.937 | 111.987 | 111.987 | 111.987 |
analogWrite() PWM | 55.54 | 7.607 | 6.417 | 4.277 | 6.602 | 4.277 |
delay(1) | 1059 | 1006.487 | 1003.987 | 1000.487 | 1007.487 | 999.987 |
delay(100) | 100035 | 99999.984 | 99974.984 | 100024.984 | 100024.984 | 99999.992 |
delayMicroseconds(2) | 40.02 | 1.889 | 0.757 | 0.757 | 0.757 | 0.758 |
delayMicroseconds(5) | 43.048 | 4.909 | 3.775 | 3.776 | 3.775 | 3.776 |
delayMicroseconds(100) | 138.9 | 100.537 | 99.337 | 99.337 | 99.287 | 99.337 |
As we can see, the default ‘s’ level does pretty well. It produces the smallest code size and it is quite fast and seems very similar to level 2. Levels ‘3’ and ‘fast’ show better performance for digitalRead, digitalWrite, pinMode and analogWrite, but they need roughly 50% more program memory than level ‘s’.
Another interesting thing to notice is where the delay functions are most precise: delayMicroseconds is closest to its nominal value on level ‘1’, delay(1) on level ‘fast’, and delay(100) on levels ‘1’ and ‘fast’.
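To put numbers on this trade-off, the small sketch below computes two ratios from the table above: how much bigger the ‘3’ build is than the ‘s’ build, and how much faster digitalWrite gets in return. The constant and function names are hypothetical. The input values are copied from the table.

```cpp
// Values copied from the results table (bytes and microseconds).
constexpr double kFlashOs = 20730.0;        // program memory at -Os
constexpr double kFlashO3 = 32208.0;        // program memory at -O3
constexpr double kDigitalWriteOs = 4.532;   // digitalWrite at -Os
constexpr double kDigitalWriteO3 = 3.24;    // digitalWrite at -O3

// How many times larger (or slower) a is than b.
constexpr double ratio(double a, double b) { return a / b; }

// -O3 code is about 1.55 times the size of -Os code ...
constexpr double kSizeGrowth = ratio(kFlashO3, kFlashOs);
// ... while digitalWrite runs about 1.40 times as fast.
constexpr double kSpeedup = ratio(kDigitalWriteOs, kDigitalWriteO3);
```

For this benchmark, then, level ‘3’ costs about 55% more program memory for a speed gain of about 40% on digitalWrite.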
Random
The function random is much slower for level ‘s’ (and for level ‘1’) than for levels ‘2’, ‘3’ and ‘fast’, even though the basic math functions seem to perform the same on all levels.
To understand the reason for this, the source code of the random function must be analysed.
The code below, which I took from the AVR Libc Home Page (assuming it is the right code), shows integer maths with divides and remainders. As this is typical stuff that is used in many places, it might indicate that in general the ‘s’ option may produce slower code.
static long
do_random(unsigned long *ctx)
{
    /*
     * Compute x = (7^5 * x) mod (2^31 - 1)
     * without overflowing 31 bits:
     *      (2^31 - 1) = 127773 * (7^5) + 2836
     * From "Random number generators: good ones are hard to find",
     * Park and Miller, Communications of the ACM, vol. 31, no. 10,
     * October 1988, p. 1195.
     */
    long hi, lo, x;

    x = *ctx;
    /* Can't be initialized with 0, so use another value. */
    if (x == 0)
        x = 123459876L;
    hi = x / 127773L;
    lo = x % 127773L;
    x = 16807L * lo - 2836L * hi;
    if (x < 0)
        x += 0x7fffffffL;
    return ((*ctx = x) % ((unsigned long)RANDOM_MAX + 1));
}
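To experiment with this algorithm outside the library, the following is a minimal standalone sketch of the same Park-Miller step. It is not the AVR Libc code. The function names are hypothetical, and the final reduction by RANDOM_MAX is left out. Each step needs one 32-bit division and one 32-bit remainder, and since the 8-bit AVR has no divide instruction, both are done in software on that MCU.

```cpp
#include <cstdint>

// One step of the Park-Miller "minimal standard" generator,
// x -> (16807 * x) mod (2^31 - 1), using the same hi/lo split as
// do_random() so that no intermediate value overflows 32 bits.
// Expects 0 < x < 2^31 - 1.
inline std::int32_t park_miller_step(std::int32_t x) {
    const std::int32_t hi = x / 127773;  // one 32-bit division
    const std::int32_t lo = x % 127773;  // one 32-bit remainder
    x = 16807 * lo - 2836 * hi;
    if (x < 0) {
        x += 0x7fffffff;
    }
    return x;
}

// Reference version: the same step computed directly in 64-bit arithmetic.
inline std::int32_t park_miller_step_wide(std::int32_t x) {
    const std::uint64_t product = 16807ULL * static_cast<std::uint64_t>(x);
    return static_cast<std::int32_t>(product % 2147483647ULL);
}

// Hypothetical helper: applies `steps` generator steps to `seed`.
inline std::int32_t park_miller_advance(std::int32_t seed, std::uint32_t steps) {
    for (std::uint32_t i = 0; i < steps; ++i) {
        seed = park_miller_step(seed);
    }
    return seed;
}
```

A well-known check for this generator is that 10,000 steps starting from a seed of 1 give 1043618065, so `park_miller_advance(1, 10000)` should return that value. Timing `park_miller_step` on its own at different optimization levels would show whether the division and remainder are where the difference comes from.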
Sounding off
So, in conclusion, we can say that the default Arduino optimization level ‘s’ performs rather well, especially when it comes to code size. If your sketch doesn’t fit with ‘s’, then there is not a lot you can do besides taking a board with an MCU that has more memory.
Speed-wise there may be something to gain, but you may get better results by first optimizing your code with faster algorithms and maybe even using assembly language.
In general, it is not a good idea to make your code depend on a compiler optimization option, as you have no control over it and it makes your code compiler-version dependent and hard to port. It may break with the next version.
Twitter: @clemens_elektor