# Antonin Carette

## A journey into a wild pointer

Anyone who has worked in systems programming knows that data alignment matters, both to save memory and to avoid runtime crashes when casting from one type to another.
After some months of porting games to console systems, I noticed that data alignment is sometimes a source of errors, and a topic many developers know little about.
This technical article discusses data alignment.

The examples in this article were run with the following setup:

• MacBook Air M1 (base model, 2020),
• clang 14.0.0 (Apple version), with C++17 support.

### The problem

What is data structure alignment? An alignment is an integer value representing the number of bytes between successive addresses at which a given struct, or object, can be allocated.
In short: it is a constraint on the addresses at which a value can be stored and accessed in memory.

As an example, on a 64-bit system (all the following examples are assumed to run on a 64-bit system), an `int` takes 4 bytes, a `char` takes 1 byte, a `double` takes 8 bytes, etc.
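
These sizes can be checked at compile time. The following is a minimal sketch, assuming a typical 64-bit target (such as macOS on Apple Silicon or Linux on x86-64); the C++ standard itself only guarantees `sizeof(char) == 1`, so the other values are platform assumptions:

```cpp
// Compile-time checks of the sizes quoted above.
// Assumption: a typical 64-bit target; only sizeof(char) == 1 is
// guaranteed by the C++ standard.
static_assert(sizeof(char) == 1, "char is always 1 byte");
static_assert(sizeof(int) == 4, "int is 4 bytes on this platform");
static_assert(sizeof(double) == 8, "double is 8 bytes on this platform");

// On such targets, the alignment of these basic types equals their size.
static_assert(alignof(char) == 1, "char can live at any address");
static_assert(alignof(int) == 4, "int must start at a multiple of 4");
static_assert(alignof(double) == 8, "double must start at a multiple of 8");
```

If one of these assertions fails on your platform, the byte counts used in the rest of the article should be adapted accordingly.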

But what about a `struct`? A first approach is to assume that the size of a `struct` is equal to the sum of the sizes of its fields.
As an example, we could compute the size of the following `Example` C/C++ structure as 4 + 1 bytes, which gives 5 bytes in total.

``````struct Example {
    int a;  // 4 bytes
    char b; // 1 byte
};
``````

However, `Example` is not 5 bytes but… 8 bytes, which is 3 bytes more than our hypothesis.
To explain this behaviour, we will walk through five different structs and compare our expectations with reality.
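
Before going through them, the 8-byte figure can be verified with `offsetof`, which reports where each field starts. This is a small sketch, assuming a platform where `int` is 4 bytes with a 4-byte alignment:

```cpp
#include <cstddef> // offsetof

struct Example {
    int a;  // offset 0, 4 bytes
    char b; // offset 4, 1 byte, followed by 3 bytes of trailing padding
};

// Assumption: int is 4 bytes and 4-byte aligned on this platform.
static_assert(offsetof(Example, a) == 0, "a starts the struct");
static_assert(offsetof(Example, b) == 4, "b directly follows a");
static_assert(alignof(Example) == 4, "the struct inherits the alignment of int");
static_assert(sizeof(Example) == 8, "5 bytes of data + 3 bytes of padding");
```

The trailing padding exists so that, in an array of `Example`, every element's `a` field still starts at a multiple of 4.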

### How to compute struct sizes?

#### DA01

``````struct DA01 {
    char a, b, c, d; // 4 * 1 byte
};
``````

For `DA01`, the largest field takes 1 byte. We have 4 fields of 1 byte each, which gives 4 bytes and no padding.

#### DA02

``````struct DA02 {
    char a; // 1 byte
    int b;  // 4 bytes
};
``````

For `DA02`, the largest field takes 4 bytes (`b`). We have a field of 1 byte and another of 4 bytes. The result is 8 bytes, because 3 padding bytes are inserted after `a` so that `b` starts at an offset that is a multiple of 4.

#### DA03

``````struct DA03 {
    char a;       // 1 byte
    int b;        // 4 bytes
    char c, d, e; // 3 * 1 byte
};
``````

For `DA03`, the largest field takes 4 bytes (`b`). We have a field of 1 byte, a second one of 4 bytes, and then three more 1-byte fields. The result is 12 bytes: 3 padding bytes after `a`, and 1 padding byte after `e` so that the total size is a multiple of 4.

#### DA04

``````struct DA04 {
    char a, b, c, d; // 4 * 1 byte
    int e;           // 4 bytes
};
``````

For `DA04`, the largest field takes 4 bytes (`e`). The 4 consecutive 1-byte fields fill exactly 4 bytes, so no padding is needed before `e`. The result is 8 bytes.

#### DA05

``````struct DA05 {
    char a;   // 1 byte
    double b; // 8 bytes
};
``````

Finally, for `DA05`, the largest field takes 8 bytes (`b`). The first field takes 1 byte, so 7 padding bytes must be inserted after `a` for `b` to start at an offset that is a multiple of 8. The result is 16 bytes in total.

#### Summary

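| Struct | Supposed size (in bytes) | Real size (in bytes) |
|--------|--------------------------|----------------------|
| DA01   | 4                        | 4                    |
| DA02   | 5                        | 8                    |
| DA03   | 8                        | 12                   |
| DA04   | 8                        | 8                    |
| DA05   | 9                        | 16                   |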

The size difference between `DA03` and `DA04`, which contain the same fields in a different order, should alert you to how much field order and data alignment matter when you want to save space in your programs.
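
The reasoning applied to `DA01` through `DA05` can be sketched as a small compile-time function. The `Field`, `alignUp` and `structSize` names below are hypothetical helpers written for this illustration, and the function only models the usual layout rule for simple structs (no bit-fields, no inheritance, no packing); the compiler remains the authority:

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: size and alignment of one field, in bytes.
struct Field {
    std::size_t size;
    std::size_t align;
};

// Rounds `offset` up to the next multiple of `align`.
constexpr std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Places each field at the next offset that is a multiple of its alignment,
// then rounds the total up to the largest field alignment.
constexpr std::size_t structSize(std::initializer_list<Field> fields) {
    std::size_t offset = 0;
    std::size_t maxAlign = 1;
    for (const Field& f : fields) {
        offset = alignUp(offset, f.align) + f.size;
        if (f.align > maxAlign) maxAlign = f.align;
    }
    return alignUp(offset, maxAlign);
}

// Same fields, different order: DA03 needs 12 bytes, DA04 only 8.
static_assert(structSize({{1, 1}, {4, 4}, {1, 1}, {1, 1}, {1, 1}}) == 12, "DA03");
static_assert(structSize({{1, 1}, {1, 1}, {1, 1}, {1, 1}, {4, 4}}) == 8, "DA04");
static_assert(structSize({{1, 1}, {8, 8}}) == 16, "DA05");
```

A common rule of thumb that follows from this model is to declare fields from the largest alignment to the smallest, which tends to minimize padding.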

### The benchmarks

If you want to reproduce the results yourself, here is the simple program I used:

``````#include <stdio.h>
#include <stdint.h> // uint8_t

struct DA01
{
    char a, b, c, d;
};

struct DA02
{
    char a;
    int b;
};

struct DA03
{
    char a;
    int b;
    char c, d, e;
};

struct DA04
{
    char a, b, c, d;
    int e;
};

struct DA05
{
    char a;
    double b;
};

// Prints the naive size (sum of the fields), the real size and the alignment of T.
template <typename T>
void getSize(T s, const uint8_t expected_size, const char* const name)
{
    // sizeof and alignof both yield a size_t, which is printed with %zu.
    // alignof(T) is the standard C++ operator; _Alignof is its C11 counterpart.
    printf("* %s: expected %d bytes, got %zu bytes (alignment of %zu byte(s))\n", name, expected_size, sizeof(s), alignof(T));
}

int main()
{
    {
        const DA01 s = {};
        getSize(
            s,
            sizeof(s.a) + sizeof(s.b) + sizeof(s.c) + sizeof(s.d),
            "DA01");
    }
    {
        const DA02 s = {};
        getSize(
            s,
            sizeof(s.a) + sizeof(s.b),
            "DA02");
    }
    {
        const DA03 s = {};
        getSize(
            s,
            sizeof(s.a) + sizeof(s.b) + sizeof(s.c) + sizeof(s.d) + sizeof(s.e),
            "DA03");
    }
    {
        const DA04 s = {};
        getSize(
            s,
            sizeof(s.a) + sizeof(s.b) + sizeof(s.c) + sizeof(s.d) + sizeof(s.e),
            "DA04");
    }
    {
        const DA05 s = {};
        getSize(
            s,
            sizeof(s.a) + sizeof(s.b),
            "DA05");
    }
    return 0;
}
``````

If you run the program, whatever optimization level (`-Oxxx`) you compile it with, you should obtain the following output:

``````* DA01: expected 4 bytes, got 4 bytes (alignment of 1 byte(s))
* DA02: expected 5 bytes, got 8 bytes (alignment of 4 byte(s))
* DA03: expected 8 bytes, got 12 bytes (alignment of 4 byte(s))
* DA04: expected 8 bytes, got 8 bytes (alignment of 4 byte(s))
* DA05: expected 9 bytes, got 16 bytes (alignment of 8 byte(s))
``````

### Why does this matter?

As you might know, the CPU accesses memory one memory word at a time.
As long as the memory word is at least as large as the largest supported data type, an aligned access always touches a single memory word. This may not be true for misaligned accesses, which can straddle two words.
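
Whether a given address is aligned for a type can be tested by checking that the address is a multiple of the type's alignment. The sketch below uses a hypothetical `isAlignedFor` helper (not part of any library) and assumes that `std::uint32_t` and `std::uint64_t` have alignments of 4 and 8 bytes, as on common 64-bit targets:

```cpp
#include <cassert>
#include <cstdint> // std::uintptr_t, std::uint32_t, std::uint64_t

// Hypothetical helper: true when `p` is a valid address for an object of type T,
// that is, when the address is a multiple of alignof(T).
template <typename T>
bool isAlignedFor(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Small self-check on a buffer whose start is forced to an 8-byte boundary.
void checkAlignmentHelper() {
    alignas(8) unsigned char buffer[16] = {};
    assert(isAlignedFor<std::uint64_t>(buffer));      // offset 0: multiple of 8
    assert(!isAlignedFor<std::uint32_t>(buffer + 1)); // offset 1: not a multiple of 4
    assert(isAlignedFor<std::uint32_t>(buffer + 4));  // offset 4: multiple of 4
}
```

A check like this can be useful in debug builds before reinterpreting a raw buffer as a typed value.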

#### Performance

As explained above, the system uses the alignment of a type to decide how to fetch and interpret a value.
As an example, there is absolutely no issue in reading the following structure with a manual alignment of 1 byte.

``````struct Example {
    char a, b, c, d; // 4 fields, 4 reads.
};
``````

However, let’s consider the following structure:

``````struct Example {
    int a; // 1 field, but 4 reads if the data alignment is set to 1 byte.
};
``````

As we know, on an x86-64 or ARM64 architecture, an `int` has a size of 4 bytes. If we specify a data alignment of 1 byte, the system may have to perform 4 reads in order to get the real value behind `a` and do something with it.
This could be up to 4 times slower than specifying a data alignment of 4 bytes for this `Example` structure, which lets the system grab the entire value in a single read (and this does not count the cost of consolidating the data).

And this is without mentioning misaligned consecutive fields in a struct, which can significantly slow down reading and parsing the data stored in it, and might result in undefined behavior.

Also, correctly aligning your data structures can greatly help the cache system.

#### Undefined behaviors

In systems programming, you may need to cast one type to another, be it a structure or a basic type.

As an example, you could theoretically cast a pointer to a `uint8_t` into a pointer to a `uint32_t`.
However, neither the size nor the data alignment matches.
So, a hardware fault (for example a bus error) could happen when reading the resulting value: a 32-bit value read from an address that is only guaranteed to be aligned for an 8-bit value… ouch!

Undefined behavior can also occur, and it is usually much harder to find and fix than a crash.
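
One common way to sidestep this class of problem is to copy the bytes with `std::memcpy` instead of dereferencing a casted pointer. The sketch below is an illustration under that assumption; `loadU32`, `storeU32` and `checkUnalignedRoundTrip` are hypothetical names, and the value is read in the machine's native byte order:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring> // std::memcpy

// Reads a 32-bit value starting at any byte address, aligned or not.
// std::memcpy has no alignment requirement, unlike dereferencing
// reinterpret_cast<const std::uint32_t*>(bytes), which is undefined
// behavior when `bytes` is not suitably aligned.
inline std::uint32_t loadU32(const std::uint8_t* bytes) {
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}

// Writes a 32-bit value at any byte address, aligned or not.
inline void storeU32(std::uint8_t* bytes, std::uint32_t value) {
    std::memcpy(bytes, &value, sizeof(value));
}

// Round trip through a deliberately misaligned address (buffer + 1).
void checkUnalignedRoundTrip() {
    std::uint8_t buffer[8] = {};
    storeU32(buffer + 1, 0x12345678u);
    assert(loadU32(buffer + 1) == 0x12345678u);
}
```

Compilers typically turn such a fixed-size `memcpy` into a plain load or store where the target allows it, so this pattern usually costs little.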

This may happen very frequently when you copy a data structure from one memory region to another that belongs to a different piece of hardware or chip. This is common on systems with APU(s).

### Alignment specification

#### In the code

There are different ways to specify the data alignment of one, or all, data structure(s).

As an example, you could specify a data alignment using pragmas:

• `#pragma pack`
• `#pragma data_align`
• `#pragma align`
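
As an illustration of the first pragma, here is a minimal sketch of `#pragma pack` with the `push`/`pop` form, which clang, GCC and MSVC accept. The `Natural` and `Packed` struct names are made up for this example, and the sizes assume a 4-byte `int`:

```cpp
#include <cstddef> // offsetof

// Default layout: 3 padding bytes are inserted after `a`.
struct Natural {
    char a;
    int b;
};

#pragma pack(push, 1) // from here on, align members on 1-byte boundaries
struct Packed {
    char a;
    int b; // now at offset 1: no padding, but the field is misaligned
};
#pragma pack(pop)     // restore the previous packing for the rest of the file

// Assumption: int is 4 bytes on this platform.
static_assert(sizeof(Natural) == 8, "1 + 3 padding + 4");
static_assert(sizeof(Packed) == 5, "1 + 4, no padding");
static_assert(alignof(Packed) == 1, "packing lowers the struct alignment");
static_assert(offsetof(Packed, b) == 1, "b directly follows a");
```

Using `push` and `pop` keeps the packing local to the structs that need it, instead of changing the layout of everything declared afterwards.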

You could also use the `alignas` specifier when declaring the data structure, like this:

``````struct alignas(32) AlignAsExample
{
    double a[4]; // 4 * 8 bytes, which gives 32 bytes. Nice!
};
``````

or using, for example, the `__declspec(align(#))` keyword.
However, as the official Microsoft documentation states for MSVC:

“The compiler doesn’t guarantee or attempt to preserve the alignment attribute of data during a copy or data transform operation.” See the Microsoft documentation on `align` for more details.

Specifying the alignment in the code may be the principal method to use, as you may want a different data alignment per struct rather than one for your entire program.
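
To see what a per-struct alignment changes in practice, here is a small sketch using `alignas`. The `Vec3` and `Holder` types are hypothetical, and the numbers assume a 4-byte `float`:

```cpp
#include <cstddef> // offsetof

// Hypothetical type: 12 bytes of data, forced onto 16-byte boundaries.
struct alignas(16) Vec3 {
    float x, y, z;
};

static_assert(alignof(Vec3) == 16, "alignment requested with alignas");
static_assert(sizeof(Vec3) == 16, "size is rounded up to a multiple of the alignment");

// The stricter alignment propagates to any struct that contains a Vec3.
struct Holder {
    char tag; // offset 0, followed by 15 bytes of padding
    Vec3 v;   // offset 16
};

static_assert(offsetof(Holder, v) == 16, "v starts on a 16-byte boundary");
static_assert(sizeof(Holder) == 32, "1 + 15 padding + 16");
```

Note that `alignas` can only make an alignment stricter than the natural one; asking for a weaker alignment than a member requires is rejected by the compiler.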

With my local `clang` compiler, the option to specify the maximum data alignment is `-fpack-struct`.
As an example, compiling the same code (from the “The benchmarks” section) with `-fpack-struct=1`, that is, a maximum alignment of 1 byte, outputs:
``````* DA01: expected 4 bytes, got 4 bytes (alignment of 1 bytes)