Monthly Archives: March 2022

Code Tricky: why not optimize is_digit

Recently I need a super small bootloader for an arm chip, every bit need to be very careful used. When I read the code, I find an interesting function is_digit in printf.c, this blog will show common code and old expert code difference.

This function is used to check if a char is in the range of ‘0’ ~’9′, ascii is 48~57.

The common code is like this:

bool _is_digit_a(char ch)
{
    return (ch >= '0') && (ch <= '9');
}

This way is easy to read, but normally old fashion will write in another way, like this

bool _is_digit_b(char ch)
{
    return ((unsigned char)(ch - '0') <= 9);
}

Haha, actually they are same function, but second way looks like much smaller and faster.

Let’s do a simple test by gcc and its toolchain, first for x86 system:

call gcc -c test.c -o test.x86.o, then objdump -DSx test.x86.o, we can get machine code

0000000000000000 <_is_digit_a>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 89 f8 mov %edi,%eax
a: 88 45 fc mov %al,-0x4(%rbp)
d: 80 7d fc 2f cmpb $0x2f,-0x4(%rbp)
11: 7e 0d jle 20 <_is_digit_a+0x20>
13: 80 7d fc 39 cmpb $0x39,-0x4(%rbp)
17: 7f 07 jg 20 <_is_digit_a+0x20>
19: b8 01 00 00 00 mov $0x1,%eax
1e: eb 05 jmp 25 <_is_digit_a+0x25>
20: b8 00 00 00 00 mov $0x0,%eax
25: 83 e0 01 and $0x1,%eax
28: 5d pop %rbp
29: c3 retq
000000000000002a <_is_digit_b>:
2a: f3 0f 1e fa endbr64
2e: 55 push %rbp
2f: 48 89 e5 mov %rsp,%rbp
32: 89 f8 mov %edi,%eax
34: 88 45 fc mov %al,-0x4(%rbp)
37: 0f b6 45 fc movzbl -0x4(%rbp),%eax
3b: 83 e8 30 sub $0x30,%eax
3e: 3c 09 cmp $0x9,%al
40: 0f 96 c0 setbe %al
43: 5d pop %rbp
44: c3 retq

so A takes around 15 commands and 42 bytes, and B takes 11 commands and 27 bytes, save approx 30%.

Then check arm, call arm-none-eabi-gcc -c test.c -o test.arm.o, then arm-none-eabi-objdump -Dsx test.arm.o

00000000 <_is_digit_a>:
0: e52db004 push {fp} ; (str fp, [sp, #-4]!)
4: e28db000 add fp, sp, #0
8: e24dd00c sub sp, sp, #12
c: e1a03000 mov r3, r0
10: e54b3005 strb r3, [fp, #-5]
14: e55b3005 ldrb r3, [fp, #-5]
18: e353002f cmp r3, #47 ; 0x2f
1c: 9a000004 bls 34 <_is_digit_a+0x34>
20: e55b3005 ldrb r3, [fp, #-5]
24: e3530039 cmp r3, #57 ; 0x39
28: 8a000001 bhi 34 <_is_digit_a+0x34>
2c: e3a03001 mov r3, #1
30: ea000000 b 38 <_is_digit_a+0x38>
34: e3a03000 mov r3, #0
38: e2033001 and r3, r3, #1
3c: e20330ff and r3, r3, #255 ; 0xff
40: e1a00003 mov r0, r3
44: e28bd000 add sp, fp, #0
48: e49db004 pop {fp} ; (ldr fp, [sp], #4)
4c: e12fff1e bx lr
4c: R_ARM_V4BX ABS
00000050 <_is_digit_b>:
50: e52db004 push {fp} ; (str fp, [sp, #-4]!)
54: e28db000 add fp, sp, #0
58: e24dd00c sub sp, sp, #12
5c: e1a03000 mov r3, r0
60: e54b3005 strb r3, [fp, #-5]
64: e55b3005 ldrb r3, [fp, #-5]
68: e2433030 sub r3, r3, #48 ; 0x30
6c: e20330ff and r3, r3, #255 ; 0xff
70: e3530009 cmp r3, #9
74: 93a03001 movls r3, #1
78: 83a03000 movhi r3, #0
7c: e20330ff and r3, r3, #255 ; 0xff
80: e1a00003 mov r0, r3
84: e28bd000 add sp, fp, #0
88: e49db004 pop {fp} ; (ldr fp, [sp], #4)
8c: e12fff1e bx lr
8c: R_ARM_V4BX ABS

A uses 20 instructions and B uses 16 instructions, saves 20%. Also no branch, more friendly to CPU workflow.

Final, try riscv, call riscv-none-embed-gcc -c test.c -o test.riscv.o, then riscv-none-embed-objdump -Dsx test.riscv.o

00000000 <_is_digit_a>:
0: 1101 addi sp,sp,-32
2: ce22 sw s0,28(sp)
4: 1000 addi s0,sp,32
6: 87aa mv a5,a0
8: fef407a3 sb a5,-17(s0)
c: fef44703 lbu a4,-17(s0)
10: 02f00793 li a5,47
14: 00e7fa63 bgeu a5,a4,28 <.L2>
14: R_RISCV_BRANCH .L2
18: fef44703 lbu a4,-17(s0)
1c: 03900793 li a5,57
20: 00e7e463 bltu a5,a4,28 <.L2>
20: R_RISCV_BRANCH .L2
24: 4785 li a5,1
26: a011 j 2a <.L3>
26: R_RISCV_RVC_JUMP .L3
00000028 <.L2>:
28: 4781 li a5,0
0000002a <.L3>:
2a: 8b85 andi a5,a5,1
2c: 0ff7f793 andi a5,a5,255
30: 853e mv a0,a5
32: 4472 lw s0,28(sp)
34: 6105 addi sp,sp,32
36: 8082 ret
00000038 <_is_digit_b>:
38: 1101 addi sp,sp,-32
3a: ce22 sw s0,28(sp)
3c: 1000 addi s0,sp,32
3e: 87aa mv a5,a0
40: fef407a3 sb a5,-17(s0)
44: fef44783 lbu a5,-17(s0)
48: fd078793 addi a5,a5,-48
4c: 0ff7f793 andi a5,a5,255
50: 00a7b793 sltiu a5,a5,10
54: 0ff7f793 andi a5,a5,255
58: 853e mv a0,a5
5a: 4472 lw s0,28(sp)
5c: 6105 addi sp,sp,32
5e: 8082 ret

A takes 54bytes and 20 instructions; B takes 38bytes and 14 instructions, save 30%. Also no branch, more friendly to CPU workflow.

test source code like this:

#include <stdbool.h>
#include <stdio.h>

bool _is_digit_a(char ch)
{
    return (ch >= '0') && (ch <= '9');
}

bool _is_digit_b(char ch)
{
    return ((unsigned char)(ch - '0') <= 9);
}

void main(void)
{
    for (unsigned char i = 0; i < 255; i++) {
        printf("%d is%s digit\r\n", i, _is_digit_a((char)i) ? "" : " not");
        printf("%d is%s digit\r\n", i, _is_digit_b((char)i) ? "" : " not");
    }
}

PS: actually for modern compiler, any optimize will make both same code, only 1/3 of not optimized size. 🙂 nothing really need to improve now. Way A is the better way for it is more readable.

So final note, do not forget -O when you use gcc :p