Badly_Drawn_Squirrel - Blog


Memory Ramblings, Part 1: Static Allocation

This is the first post in a series about memory: memory allocation, memory management, etc. In these posts, I will work through a particular topic to show how various parts of software are involved in the management of the memory on a computer.

I am assuming the reader can understand C (and/or C++) and has an understanding of its memory model and build tools (compiler and linker). This post will have a bit of assembly for the Microchip PIC12 and for Intel/AMD x64.

Memory Allocation

Memory allocation is the process of reserving memory and dividing it up for specific tasks or purposes. There are two aspects to memory allocation - a low level that deals with requesting memory from the operating system and a high level that deals with dividing up memory and reserving it for specific purposes.

Low Level (Reserving Memory)

At the low level, there are two different approaches: static and runtime allocation. With static allocation , either a program is running alone on a computer (with no operating system) and can divide up physical memory as needed, or the program has memory reserved for it by the operating system as part of program loading. In runtime allocation , the program performs a system call to request more (or less) memory from the operating system at runtime.

Runtime allocation is usually called dynamic allocation , however, that term is also used to refer to heap allocators like malloc()/free() in C or new/delete in C++. I'll call it runtime allocation in this article to avoid the confusion and note that I am specifically referring to reserving memory from the operating system, not from the C/C++ heap. The C/C++ heap itself relies on system calls to get large chunks of memory which are then managed by a heap algorithm in order to implement malloc()/free() and new/delete .

On Linux, static allocation is performed through a coordination of the compiler, linker, and the program loader (specifically the ELF format loader). The compiler and linker result in a binary that describes what sections of memory need to be allocated (and where), and the OS sets that up when loading the program. Runtime allocation can be done with the mmap() , mmap2() , brk() , or sbrk() system calls.

On embedded platforms with no OS, there is no runtime allocation, since the program already has control and access to all of the system's memory. Instead, there is just static allocation in the assembler or compiler to help you divide up all the memory you have access to. You may still have calls used to allocate from a heap, or even map pages into virtual memory, but your program already owns and controls all the memory to begin with. You did not need to ask an OS for access to it.

High Level (Dividing Memory)

At the high level, you have memory but you need to divide it up in a useful, efficient way. In C, the language provides static allocation, manages memory on a call stack, and the C standard library provides a heap allocator through the malloc() / free() interface.

Other techniques include linear allocators (also called memory arenas), block or pool allocators, reference counting, and garbage collection.

Later posts will look a some of these allocation strategies.

Static Allocation

Static allocation is the most basic memory allocation technique, to the point that it is not usually thought of as such. Static allocation is used both to obtain memory from the OS and to divide up the memory for different variables and data structures.

With static allocation, you reserve and partition memory at compile time (or earlier). Because the memory layout is generated at compile time, it cannot change or "grow" during runtime; whatever variables, arrays, etc are declared in the program are all there will ever be. Statically allocated memory is available at the very start of the program and persists for the duration of the program.

Static allocation is closely related to program loading, and the same systems that allocate your static memory also set up the memory to store your (machine) code. The program loader is instructed on how to set up a process's virtual address space by reading the headers of the executable.

In this post, we'll look at how static allocation is implemented as well as the costs and benefits of using it. We will start by looking at assembly language programs on an extremely simple device (PIC12), and then work towards the implementation for C on modern computers (x64).

Manual Memory Layout

The simplest way to perform static allocation is to manually select the memory locations for all of your data. This is an approach used in assembly on single-program computers with no OS (ex, a NES game). The programmer manually assigns a purpose to each memory address (or to a range of them). The assembler may help you do this by supporting aliases, but that is not required - you could also use the memory addresses directly. There is little help from the compiler or assembler; it is a manual process done before compilation by the programmer.

For an example of manually laying out memory, we'll look at a program for the PIC12F508 microcontroller. The PIC12F508 is a very small, 4-pin microcontroller. It has:

You can read the data sheet here. See the Instruction Set section for details on the 33 instructions. If you don't understand the assembly perfectly, don't worry - I'll explain the important parts below the source. Here is the program:

; MPASM instructions are of the form:
;   label directive operand1, operand2 ...
;
; Most directives correspond to machine instructions, and cause the
; instruction to be output at the current address location, and then
; move the current location forward by their size.
;
; The label is optional. If there is a label, the label
; becomes an alias for the current address, which is typically
; where the directive following the label writes an instruction.
; You can also place a label on a line alone, without a directive.
;
; Some directives do not output instructions, but instead change
; assembler settings.

; Here is the program itself:

; Behold, a program more boring than hello world: A delayed output.
; MPASM assembler ceremony (set device and import definitions)
    list p=12f508 ; Set the device type
    #include p12f508.inc ; Include definitions (aliases) for the device
; Set chip configuration bits: (Use internal oscillator, no watchdog timer)
    __config _OSC_IntRC & _WDT_OFF & _CP_OFF & _MCLRE_ON

; Creates the reset vector (entry point on startup)
reset_vector    org    0x0000            
    goto main_program

; Declares an alias foo for memory location 0x7
foo equ 0x7

main_program    org    0x0001
    movlw 0x1E
    tris  GPIO   ; set gp0 pin to output mode, others to input mode
    movlw 0x6F   
    movwf foo    ; set foo to 0x6f
loop
    decf foo        ; foo--
    btfss STATUS, Z ; check foo == 0
    goto loop       ; repeat loop if false, otherwise:
    movlw 1         ; 
    movwf GPIO      ; set output on
    goto $          ; and halt
    end             ; MPASM demands this be here, it does nothing

This program initializes a byte at memory location 0x7 to the value 0x6F. It loops while decrementing the value in 0x7, and once it hits zero, it exits the loop, sets an output pin on the microcontroller to on, and then halts (loops in place forever). Exciting stuff.

The key line here is foo equ 0x7 . This assembly directive creates an alias foo for the value 0x7 . This is not a variable - its an alias, like a pre-processor definition in C. Wherever foo is used, 0x7 is substituted. This allows us to refer to memory location 0x7 with a mnemonic name. Essentially, as the programmer, we have decided that 0x7 will store a value we will call foo. By using many such aliases, a programmer can manually organize and reserve the available memory.

As an example, let's see how you would use this method to declare an array of 10 2D vectors, each of which contain a 16-bit X field, and a 16-bit Y field: (I am assuming we have a lot more RAM than the 12F508 now :-))

vec2d_size equ 0x4 ; Size of vector 
vec2d_x    equ 0x0 ; Offset of X field
vec2d_y    equ 0x2 ; Offset of Y field

my_vecs   equ 0x7  ; Location of 10 vectors
next_var  equ (my_vecs + vec2d_size * 10)

The size of the data structure and offsets to its fields are recorded manually. Space at 0x7 is reserved for the 10 vectors. The next variable in RAM has it position computed from the amount of space that the vectors take up, putting it at a location of 0x2F (47). Without an assembler that can compute arithmetic expressions, laying out memory this way becomes tedious and manual. Even with support for arithmetic expressions, your program will get verbose with all of the explicit address calculations clogging up your variable definitions. It would be nice if the assembler could do this for us, which it can, and which we will look at next.

Note that this assembly program also places the code at exact addresses. The org directive specifies the current program memory position to output code at. So, for example, the line main_program org 0x0001 makes the following assembly instructions write out to program memory address 0x0001 and makes main_program an alias for 0x0001 (the PIC has a separate address space for program and data memory, so this does not clash with foo at 0x7 ).

We can see the result of the equ and org directives by looking at the disassembly:

ADDRESS	DISM
000	GOTO  0x1
001	MOVLW 0x1E
002	TRIS  0x6
003	MOVLW 0x6F
004	MOVWF 0x7
005	DECF  0x7, F
006	BTFSS 0x3, 0x2
007	GOTO  0x5
008	MOVLW 0x1
009	MOVWF 0x6
00A	GOTO  0xA

Linker Based Static Allocation

The next stage after a manual layout (aka absolute addressed) assembly program is to make relocatable code and incorporate a linker. When using a linker, each file is assembled into a binary object file which contains the output in an intermediate form that has not applied absolute addresses yet. The linker then combines the object files into a final program binary.

In an absolute addressed assembly program like the last one, you can spread your code across multiple files with the include directive, but since all of the variables and code are manually placed at absolute addresses, these files need to be aware of what memory is being used by each other file. If you incorporate a third-party library into your program, you have to go into its code and shuffle everything around to make sure it does not use any of the memory you have already decided to use for your program. In general, every time you want to combine two source files together, you need to make sure their memory layouts do not conflict.

In contrast, when using a linker, it decides where to place everything, so the object file from the third-party library can be linked with your program's object files without any changes to its source. Your assembly files no longer reference absolute memory addresses (unless you are doing so explicitly to access hardware).

To use the linker, your program is assembled as a collection of "translation units". A translation unit generally consists of a single source file, but can contain multiple files via includes. The assembler will process the translation unit and produce a single object file. Each object file contains one or more "sections", which are memory regions that have to be placed at absolute addresses by the linker. Generally, there will be one code section and one data section, but more complex platforms will have more.

Let's look at an example of linking on the PIC12:

list p=12f508
    #include p12f508.inc
    __config _OSC_IntRC & _WDT_OFF & _CP_OFF & _MCLRE_ON
    global start
; The 'code' directive means:
;  The following directives' outputs are placed into the code section
reset_vector    code    0x0000            
    goto start

; The 'udata' directive means:
;  The following directives' outputs are placed into the udata section
udata 

; Reserve a memory location, assign alias foo:
foo res 1

; Go back to the code section
main_program    code
start 
    movlw 0x1E
    tris  GPIO   ; gp0 is output, others are inputs
    movlw 0x6F   
    movwf foo    ; set foo to 0x6f
loop
    decf foo        ; foo--
    btfss STATUS, Z ; check foo == 0
    goto loop       ; repeat if false
    movlw 1     
    movwf GPIO  ; set output
    goto $      ; halt
    end

This program does the same thing as the last one, but there are some big changes: the code directive replaces the org directive, res replaces the equ , the udata line has been added and there is a new global directive too.

The code directive sets the assembler to output into the code section. Optionally, the directive takes an absolute address to require code to output at a specific place; this is used to set interrupt vectors, which are located at fixed addresses. However, a majority of code will not have an absolute address. The udata directive has also been added. It sets the assembler to output into the udata (user RAM) section.

The res directive reserves memory and makes the label an alias for the address of the reserved space. The line foo res 1 reserves 1 byte of memory and makes foo an alias to the location of the reserved memory. The programmer does not know the location of the memory for foo; instead, it is decided by the linker when it combines the object files.

Code in separate object files do not share the same symbols, so a way is needed to indicate that a particular symbol comes from another object file. A file can export a symbol and make it visible to other translation units with the global directive. In other files, the export directive declares that a symbol comes from a different object file.

Lets look at the disassembly:

ADDRESS	DISM
000	GOTO  0x4
001	XOR   W, 0xFF
002	RETLW 0x0
003	RETLW 0x0
004	MOVLW 0x1E
005	TRIS  0x6
006	MOVLW 0x6F
007	MOVWF 0x7
008	DECF  0x7
009	BTFSS 0x3, 0x2
00A	GOTO  0x8
00B	MOVLW 0x1
00C	MOVWF 0x6
00D	GOTO  0xD

The linker decided that foo should be at 0x7 - the same as our chosen address in previous example. This is because it it the first general purpose RAM location, so the first reserved byte of memory gets placed there. It started the main_program code section at 0x004 instead of 0x001.

Lets repeat the example from last time, where we created an array of 10 vectors:

vec2d_size equ 0x4 ; Size of vector 
vec2d_x    equ 0x0 ; Offset of X field
vec2d_y    equ 0x2 ; Offset of Y field

my_vecs  res vec2d_size*10
next_var res 2
another  res 1

This is a huge improvement over the last version. Now you only need to know the size of your data structure and how many to allocate. You reserve that much space, and the linker takes care of finding the place to put it. The following variables do not need to compute any offsets. This is starting to look a lot like declarations in C! (We will look at those next.) Note that you do not need a linker combining object files to layout code and reserve memory this way - you could make an assembler or programming language that does the same thing all in one big compilation. However, historically, computers did not have enough memory to assemble and layout an entire program at once, so the linker was created to break it down into per file steps. This separate compilation and linking step is also (sadly) used by C and C++ to this day.

Static Allocation in C

Now its time to look at what happens when we statically allocate in a C program. In C, static allocation of variables is used for all global variables as well as any locals declared with the static keyword.

Here is the program:

#include <xc.h> // Include for PIC platform specific definitions
#include <stdint.h> // Standard integer sizes

uint8_t foo;
uint8_t baz;
uint8_t bar = 17;

void main(void) {
    TRIS = 0x1E;
    foo = 0x6F;

    do {
        foo--;
    } while (foo);

    GPIO = 1;

    while (1) {}
}

Once again, this program does the same thing as the last one. Just like the res directive, globals in C get placed in memory by the linker. However, the C language provides some guarantees about statically allocated variables: they are initialized to 0, or you can provide an initial value. To help demonstrate this, I've added an two extra variables, one with no value (init to zero) and the other initialized to 17. Lets look at the disassembly:

ADDRESS	DISM
000	GOTO 0x1
001	CALL 0x6
002	MOVWF 0x9
003	CLRF 0x8
004	CLRF FSR
005	GOTO 0x1F3
006	RETLW 0x11
... snip ...
1F3	MOVLW 0x1E
1F4	TRIS GPIO
1F5	MOVLW 0x6F
1F6	MOVWF 0x7
1F7	MOVLW 0x1
1F8	SUBWF 0x7, F
1F9	MOVF 0x7, W
1FA	BTFSS STATUS, 0x2
1FB	GOTO 0x1F7
1FC	MOVLW 0x1
1FD	MOVWF GPIO
1FE	GOTO 0x1FE
1FF	XORLW 0xFF

The main function starts at 0x1F3 (the Microchip xc8 C linker seems to start from the top of memory down). You'll notice there are some instructions at 0x001 that run before main. This code copies the value 17 from program memory into the location for the variable bar (0x9), and zeroes the memory location of baz (0x8).

Because the PIC architecture has a separate program and data space, there is no way to have data memory hold any known value at startup. So, on this architecture, there is a runtime cost to static allocation in C, since all of the data must be either cleared to zero or copied from constants stored in program memory. Generally speaking, any architecture with either a separate program and data address space or with program code stored in ROM will need to include runtime work to set up C static variables. Next, lets move to a modern architecture with a modern OS, and see what happens there.

Static Allocation with an OS and Virtual Memory

We will go back to assembly, but this time, we will use the NASM assembler on Linux on an Intel x64 architecture processor. If you are unfamiliar with x64 assembly, there are plenty of guides across the internet (like this) in addition to Intel's own documentation. Here's a basic x64 "hello world" for Linux:

; Enter the .bss section, the section for uninitialized
;   static data
section .bss
    ; declare an unitialized 64 bit variable 
    ; named state_uninit_var
    static_uninit_var: resq 1 

; Enter the .data section, the section for initialized
;   static data
section .data
    ; Create an initialized char/byte array, 
    ;   with a newline at the end (the byte value 10)
    ;   with a label/alias 'static_str'
    static_str:     db  "hello, world", 10

    ; Save the length of the string as an alias
    ; The $ represents the current output address,
    ; which is just after the string. So
    ; $ - static_str == length of the string
    static_str_len: equ $-static_str

    ; Declare an initialized 64 bit integer,
    ; initial value is 4
    static_number:     dq  4

; Enter the text section, the section for program instructions
section .text
global _start ; Export the entry point for the program
_start: 
    mov rax, [static_number]     ; rax  = *static_number
    add rax, 10                  ; rax += 10 
    mov [static_uninit_var], rax ; *static_uninit_var = rax
    cmp rax, 14                  ; compare rax and 14
    jne .no_print                ; jump to .no_print if not equal

    ; Call the write system call (Linux kernel interface):
    ; A Linux system call is performed by loading arguments
    ; into specific registers and then running the syscall
    ; instruction (on x64 at least)
    ;
    ; The first argument, in register rax, is the call number,
    ; denoting which operation to perform.
    mov rax, 1              ; write(
    mov rdi, 1              ;   STDOUT_NO,
    mov rsi, static_str     ;   static_str,
    mov rdx, static_str_len ;   static_str_len)
    syscall
.no_print:
    ; Call the exit system call
    mov rax, 60 ; exit(
    mov rdi, 0  ;   0)
    syscall
; Note - Build with: nasm -f elf64 -o prog1.o
;         Link with: ld prog1.o -o prog1

This program prints "hello, world\n" to stdout and then exits with an exit code of 0. The syscall instruction triggers a jump into the kernel's system call handler. Parameters are passed in specific registers as per the Linux ABI, with register rax containing the system call id. We use this to call two Linux system calls, write() and exit() .

Code and data are placed into one or more sections defined by the section directive. The text section contains all program instructions, the data section contains initialized static data and the bss section contains uninitialized static data. Just like the Microchip assembler, uninitialized data is declared with "res" directives. ( resq reserves quad-words, which are 8 bytes each).

New though is the stuff in the the ".data" section. Literal data can be added to the object file with the d* set of directives. db declares bytes and when followed by a string literal, declares all of the bytes in the string. static_str: db "hello, world", 10 reserves 13 bytes for the the "hello, world" string data followed by 10 (a newline). static_number: dq 4 reserves space for a 64-bit integer initialized to the value 4.

ELF64 Program Loading

So all this looks nice, but it raises the question of how the program and its data get loaded in the first place. On the Microchip architecture, the program is written to flash ROM and runs right at power-up. But in this case, the program comes from disk and the operating system must load and launch it.

Furthermore, you no longer have access to a simple block of physical memory. Instead, you have a virtual address space that is managed for you by the kernel. With a virtual address space, there is a layer of indirection between your program and the physical memory. The kernel configures a page table that maps address ranges in your program to pages of physical memory. When you access memory in your program, your virtual addresses are first translated into physical ones as per the page table, and then the lookup occurs. So, if the program expects the string "hello, world" to be in a specific place, it needs to get mapped there by the kernel.

To figure this out, we need to look at the executable file that NASM produces. The readelf program can print out text tables of the headers of an ELF64 binary. With the command readelf -S prog1 , we can get a list of the sections:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000b0  000000b0
       0000000000000041  0000000000000000  AX       0     0     16
  [ 2] .data             PROGBITS         00000000006000f4  000000f4
       0000000000000015  0000000000000000  WA       0     0     4
  [ 3] .bss              NOBITS           000000000060010c  00000109
       000000000000000c  0000000000000000  WA       0     0     4
  [ 4] .shstrtab         STRTAB           0000000000000000  000002cc
       000000000000002c  0000000000000000           0     0     1
  [ 5] .symtab           SYMTAB           0000000000000000  00000110
       0000000000000150  0000000000000018           6    10     8
  [ 6] .strtab           STRTAB           0000000000000000  00000260
       000000000000006c  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Here you can see the sections that we used, plus a few more. For example, the C language usually puts string constants in the read-only strtab section, not the data section like we have done. What's important here are the address and offset columns. The address value is the location in virtual memory where the section begins, and offset is the offset of that section from the beginning of the ELF64 segment.

If we look at the disassembly of the program, we can see that the program finds the "hello, world\n" string at address 0x6000f4, in the range the data section is set to be mapped:

Disassembly of section .text:
00000000004000b0 <_start>:
  4000b0:       48 8b 04 25 01 01 60    mov    rax,QWORD PTR ds:0x600101
  4000b7:       00
  4000b8:       48 83 c0 0a             add    rax,0xa
  4000bc:       48 89 04 25 0c 01 60    mov    QWORD PTR ds:0x60010c,rax
  4000c3:       00
  4000c4:       48 83 f8 0e             cmp    rax,0xe
  4000c8:       75 1b                   jne    4000e5 <_start.no_print>
  4000ca:       b8 01 00 00 00          mov    eax,0x1
  4000cf:       bf 01 00 00 00          mov    edi,0x1
  4000d4:       48 be f4 00 60 00 00    movabs rsi,0x6000f4
  4000db:       00 00 00
  4000de:       ba 0d 00 00 00          mov    edx,0xd
  4000e3:       0f 05                   syscall
00000000004000e5 <_start.no_print>:
  4000e5:       b8 3c 00 00 00          mov    eax,0x3c
  4000ea:       bf 00 00 00 00          mov    edi,0x0
  4000ef:       0f 05                   syscall

Next, let's list the ELF64 program headers, which will show us the segments that get loaded into memory, with readelf -l prog1 :

Elf file type is EXEC (Executable file)
Entry point 0x4000b0
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000f1 0x00000000000000f1  R E    200000
  LOAD           0x00000000000000f4 0x00000000006000f4 0x00000000006000f4
                 0x0000000000000015 0x0000000000000024  RW     200000

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .data .bss

An ELF64 binary contains a collection of segments. LOAD (PT_LOAD) segments are regions of memory that are supposed to be mapped into the process's virtual address space. The first LOAD segement contains the text section, and it gets placed at 0x400000 and is readable/executable. The second segment contains both the data and bss sections, which get mapped to 0x6000f4 and is readable/writable. Notice that the MemSiz of the second segement is larger than the FileSiz; this is because the bss section is all uninitialized data. Instead of getting data from the file, the bss section is initialized to zeroes.

So how do these get into the running process? The code for loading ELF format binaries is in fs/binfmt_elf.c. The function in question is load_elf_binary. Amidst a whole bunch of work, it eventually loops through all of the program headers looking for PT_LOAD entries, mmaps (vm_mmap_pgoff) each segment, and calls brk (vm_brk_flags) to map anonymous (zeroed memory) pages for uninitialized regions such as the bss section whenenever MemSiz > FileSiz .

So, when a program is loaded on a modern Linux system, statically allocated memory is set up mostly the same way as runtime allocations: mmap and sbrk . In this case, the file being memory mapped is the program binary itself, and it is all set up automatically by the ELF loading code. There must always be at least one mmap performed by the kernel, since, of course, your program cannot preemptively memory map itself. However, you could actually make a program that contains only a text section and does not have any data or bss.

Static Allocation and ASLR

Lastly, we will look at what happens with static allocations on a modern Linux C program. Here is the program:

#include <stdio.h>

const char *foo = "hello, world";

int main() {
	printf("%llx: %s\n", (long long)foo, foo);
	return 0;
}

When you compile and run this program (assuming you have a kernel configured normally for modern Linux distributions), something interesting will happen. Every time you run the program, the address of the string will change!

eric@ubuntu:~/statics$ ./prog5
562c07baa714: hello, world
eric@ubuntu:~/statics$ ./prog5
55e6f8c5c714: hello, world
eric@ubuntu:~/statics$ ./prog5
55c677a52714: hello, world

What changed? The first sign is in the ELF header. The assembly program produced an ELF binary of type ET_EXEC, but GCC produced an ET_DYN file - a file that is relocatable and dynamically loadable. A relocatable library is a binary that can be loaded to any address and still be set up to work correctly. This is typically meant for dynamic (aka shared) libraries, which need to be loaded at runtime by another program, meaning that they will not always be placed at the same address.

The disassembly will hint at us what's going on here (run objdump -M intel -d prog5 ). This disassembly is pretty long, so I wont paste it below - you can read it here. You can also see the list of sections in the ELF binary here.

Instead of generating a normal executable, where the program code and data are at known locations in virtual memory, the compiler has produced Position Independent Code. In position independent code, your statically allocated data is not addressed directly. Instead, the actual location of the data is looked up in a Global Offset Table (GOT).

The GOT is is a table of addresses loaded from an ELF executable and then populated by the dynamic library loader. The entries are used to find the location of statically allocated data in sections that have been dynamically loaded into the virtual address space. This allows dynamic libraries to be loaded to any location in memory, while still having their state accessible. There is a similar table, the procedure linkage table, for calling functions in dynamically loaded code.

So if this is an executable, not a dynamic library, why did GCC make it dynamic, relocatable code? It does this so that the program data can be intentionally loaded to a different random location on every run - a feature called address space layout randomization (ASLR). ASLR is a security feature that attempts to make it much harder for code injected with an overflow exploit (or similar) to be able to perform dangerous operations. For example, an exploit may overwrite a return address to jump to the libc system() function and launch a shell, or they may try to read a private key out of the program's memory. Attacks like these become much harder if the locations of such things in memory are random - now any attack has to have some way to search for and find everything.

Costs and Benefits of Static Allocation

Costs

The most obvious cost of static allocation is that it is fixed in size. All of the arrays, hashes, etc that you allocate statically are set to a size at compile time and that size won't change.

Another cost of static allocation is poor error reporting. If you rely on static allocation to allocate a large block of memory and it is too big for the system it is run on, the program will just not start. It might print a message like "Segmentation fault", which is confusing and unhelpful to the end user. If you allocate the memory yourself, you can check for failure and either print a useful error message or change the program behavior to require less memory. When you statically allocate, you are not just setting a maximum memory usage, but also a minimum usage.

On microcontroller platforms like the Microchip PIC, a large amount of static allocation may introduce a delay at startup during which the initialzed data is copied to RAM and the uninitialized data is cleared to zero. Depending on your design constraints, the time this adds to startup may be problematic. Plus, you pay twice for your initialized data - if it is non-constant, it has to be stored in program memory and copied into data memory; this can waste a lot of space on small devices.

C++ constructors do not play well with static allocation. Within a single translation unit, globals are constructed at startup in the order of declaration. Across translation units, the order is implementation specific. This may make static allocation hard to use when relying on C++ constructors for initialization, depending on how interdependent your declared objects are.

Benefits

The most obvious cost of static allocation is also the most obvious benefit: it is fixed in size at compile time. Since you know the whole data layout at compile time, the exact memory usage of your program can be guaranteed. This is extremely valuable on memory constrained devices like 8-bit microcontrollers, where you want to be certain that the program can never exceed its memory budget.

Because static allocations are fixed for the duration of a progam and known at compile time, static allocations do not require any runtime pointer management to prevent leaks. Additionally, they are all packed into one contiguous section so there is no external fragmentation.

In languages like C that let you set initial values for static data, static allocations allow you to store data directly in the source code, and have it available immediately at the start of the program.

A Common Misconception

A common misconception regarding static allocation is that using it will result in tons of global variable filled spaghetti code. The assumption that static allocation implies global state is false. Visibility is separate from how the memory was allocated. In C, for example, static allocations can have global scope, file scope (static global), or function scope (static local). Just because data is allocated statically does not mean it has to be accessed globally.

When writing your application, you can have one small area of the code own the static allocation, and have the rest of your program access it through a pointer that is passed in as an argument. This allows you to use static allocation without directly accessing global state, and also supports refactoring later to other allocation strategies.

Next Time

Well, that's it for this topic. The next article will be about the next most basic allocator, the call stack, and will have some C and x64 assembly. No more PIC12 :(.