2025-09-15 8:42 PM
I am writing some firmware for a device with an alphanumeric display. The display language must be changeable. For most languages, this means using characters outside the A-Z range. UTF-8 covers most of the necessary characters.
In the project properties, I found this setting, and naively thought that the UTF-8 selection included the actual compiler:
Well, obviously not, as it is GCC that needs to know that the character set is to be UTF-8. The code
sprintf(pString, "Français");
is translated to
sprintf(pString, "FranÃ§ais");
(It might not be super-visible, but the ç in Français is a c with a cedilla, which has the UTF-8 code 0xE7.)
I tried using the flags -finput-charset=utf-8 and -fexec-charset=utf-8 in an attempt to tweak GCC to use UTF-8 for strings and characters, but that didn't make a difference. Also, I have a hunch that these flags only work for C++, but I'm not sure.
Is what I'm trying to do even possible?
I do have the option to replace ç with \xE7, but with loads of text for several languages, and numerous "special" characters, this becomes rather cumbersome. I have enough trouble with the Polish characters as it is...
2025-09-16 1:32 AM
CubeIDE and GCC use UTF-8 by default. I guess that some other piece of software you use for editing your files doesn't recognize your source text as UTF-8 and uses some other extended ASCII encoding, so your Unicode non-ASCII chars are displayed as sequences of 2 or more characters.
2025-09-16 1:51 AM
I don't use any other software for editing. STM32CubeIDE only.
I extracted this from the .hex file: 4672616EC3A7616973
This shows that the compiler interprets the ç as byte values 0xC3 0xA7 rather than 0xE7. This makes no sense to me, as these values have nothing to do with ç.
2025-09-16 2:03 AM
Update: I find this interesting.
If I copy the text (on screen) from Cube into a hex editor, I get what I expected: 00000000h: 46 72 61 6E E7 61 69 73 ; Français
However, if I open the file.c source file itself in the hex editor, I get this: 0000021ah: 46 72 61 6E C3 A7 61 69 73 ; Français
In my opinion, this shifts the blame from GCC to Cube. The file isn't saved as UTF-8.
2025-09-16 2:53 AM
Oh ***...
As it turns out, I didn't know enough about UTF-8, and simply relied on the tables I found. When digging a bit, however, a greater complexity turns up.
But that's perfectly fine. I have a function that converts UTF-8 to the rather odd codepage in the display. I will just need to expand that to handle some more conversion.
2025-09-16 2:54 AM
C3 A7 is the correct UTF-8 encoding of the character U+00E7 (ç). I cannot see a problem here.
2025-09-16 3:55 AM - edited 2025-09-16 3:56 AM
The problem was that I was a ***.
I should have used ISO 8859-1 encoding rather than UTF-8.
Well, I learned something today. That can't be half bad.
P.S. Interestingly, the website won't let me speak badly about myself...