Wednesday, October 8, 2008

Remove globals from libs

Global variables in libraries are bad, everyone knows that. Maybe I'll make another post explaining when they can be tolerated and how this can be done. But for now we assume that they are simply bad.

One common mistake is to declare static data (like tables) as non-const. This makes them practically variables. If the code never changes them, they cause no problem in terms of thread safety. But unfortunately the dynamic linker doesn't know that, so they will be mapped into r/w pages when the library is loaded. And those pages will, of course, not be shared between applications so you end up with predictable redundant blocks in your precious RAM.

Cleaning this up is simple: add const to all declarations where it's missing. But how does one find all these declarations in a larger source tree in a reasonable time? The ELF format is well documented and there are numerous tools to examine ELF files.

Let's take the following C file and pretend it's a library built from hundreds of source files with 100000 lines of code:

struct s
{
    char * str;
    char ** str_list;
    int i;
};

struct s static_data_1 =
{
    "String1",
    (char*[]){ "Str1", "Str2" },
    1,
};

char * static_string_1 = "String2";

int zeroinit = 0;
There are two sections in an ELF file that need special attention: the .data section contains statically initialized variables, and the .bss section contains data which is initialized to zero. After compiling the file with gcc -c, the sizes of the sections can be obtained with:
# size --format=SysV global.o
global.o  :
section           size   addr
.text                0      0
.data               56      0
.bss                 4      0
.rodata             26      0
.comment            42      0
.note.GNU-stack      0      0
Total              128
So we have 56 bytes in .data and 4 bytes in .bss. After successful cleanup all these should ideally end up in the .rodata section (read-only statically initialized data). Since we have 100000 lines of code, the next step is to find the variable names (linker symbols) contained in the sections:
# objdump -t global.o

global.o: file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*           0000000000000000 global.c
0000000000000000 l    d  .text           0000000000000000 .text
0000000000000000 l    d  .data           0000000000000000 .data
0000000000000000 l    d  .bss            0000000000000000 .bss
0000000000000000 l    d  .rodata         0000000000000000 .rodata
0000000000000020 l     O .data           0000000000000010 __compound_literal.0
0000000000000000 l    d  .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l    d  .comment        0000000000000000 .comment
0000000000000000 g     O .data           0000000000000018 static_data_1
0000000000000030 g     O .data           0000000000000008 static_string_1
0000000000000000 g     O .bss            0000000000000004 zeroinit
Now we know that the variables static_data_1, static_string_1 and zeroinit are affected.

The symbol __compound_literal.0 comes from the expression (char*[]){ "Str1", "Str2" }. The bad news is that compound literals are lvalues according to the C99 standard, so they won't be assumed const by gcc. You can declare them const, but they'll still be in the .data section, at least with gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7). The cleaned-up file looks like:
struct s
{
    const char * str;
    char ** const str_list;
    int i;
};

static const struct s static_data_1 =
{
    "String1",
    (char*[]){ "Str1", "Str2" },
    1,
};

char const * const static_string_1 = "String2";

const int zeroinit = 0;
The resulting symbol table:
0000000000000000 l    df *ABS*           0000000000000000 global1.c
0000000000000000 l    d  .text           0000000000000000 .text
0000000000000000 l    d  .data           0000000000000000 .data
0000000000000000 l    d  .bss            0000000000000000 .bss
0000000000000000 l    d  .rodata         0000000000000000 .rodata
0000000000000010 l     O .rodata         0000000000000018 static_data_1
0000000000000000 l     O .data           0000000000000010 __compound_literal.0
0000000000000000 l    d  .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l    d  .comment        0000000000000000 .comment
0000000000000040 g     O .rodata         0000000000000008 static_string_1
0000000000000048 g     O .rodata         0000000000000004 zeroinit
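
The leftover __compound_literal.0 entry can probably be avoided by giving the string list a name and declaring it const, instead of using a compound literal. The following is only a sketch under a few assumptions: the name static_data_1_strings and the accessor get_static_data() are made up for illustration, and the type of str_list is changed to const char * const *, which may not suit code that really needs to modify the list. Compiled with plain gcc -c as above, I would expect both objects to land in .rodata. With -fPIC, objects that contain pointers typically end up in a .data.rel.ro section instead, so check the symbol table of your own build.

```c
struct s
{
    const char * str;
    const char * const * str_list; /* assumption: the list is never modified */
    int i;
};

/* Hypothetical name: a named const array replaces the compound literal. */
static const char * const static_data_1_strings[] = { "Str1", "Str2" };

static const struct s static_data_1 =
{
    "String1",
    static_data_1_strings,
    1,
};

/* Hypothetical accessor, only here so the static data is actually used. */
const struct s * get_static_data(void)
{
    return &static_data_1;
}
```

Running objdump -t on the resulting object file shows whether any writable data symbol is left.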
Larger libraries have huge symbol tables, so you will of course filter the output with:

grep \\.data | grep -v __compound_literal

So if you want to contribute to a library which needs some cleanup, and you are of the "I know just a little C but I want to help"-type, this is a good idea for a patch :)
