Memory optimization is crucial for writing performant software systems. When a program has only a finite amount of memory to work with, many issues can arise if that memory isn't used efficiently, which is why careful memory use is critical for good overall performance.
Go inherits many of C's advantageous features, but what I notice is that a large proportion of people who use it do not know the full power of this language. One reason may be a lack of knowledge about how it works at a low level, or a lack of experience with languages like C or C++. I mention C and C++ because the foundations of Go are largely built on the wonderful features of C/C++. It's no coincidence that I quote an interview with Ken Thompson at Google I/O 2012:
For me, the reason I was enthusiastic about Go is because just about the same time that we were starting on Go, I read (or tried to read) the C++0x proposed standard, and that was a convincer for me.
Today, we're going to talk about how we can optimize our Go programs and, more specifically, how to make good use of structs in Go. Let's first define what a struct is:
A struct is a user-defined data type that groups related variables of different types under a single name.
To fully understand where the problem lies, we should mention that modern processors do not read 1 byte at a time from memory. So how does the CPU fetch the data or instructions stored in memory?
In computer architecture, a word is a unit of data that a processor can handle in a single operation - generally the smallest addressable unit of memory. It's a fixed-size group of bits (binary digits). The word size of a processor determines its ability to handle data efficiently. Common word sizes include 8, 16, 32, and 64 bits. Some computer processor architectures support a half word, which is half the number of bits in a word, and a double word, which is two contiguous words.
Nowadays the most common architectures are 32-bit and 64-bit. A 32-bit processor can access 4 bytes at a time, which means its word size is 4 bytes. A 64-bit processor can access 8 bytes at a time, which means its word size is 8 bytes.
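A quick way to check this from a Go program is a sketch like the one below; it assumes the common case where the size of a uintptr and the size of int match the platform's word size:

package main

import (
    "fmt"
    "strconv"
    "unsafe"
)

func main() {
    // On a 64-bit build this prints 8 bytes and 64 bits;
    // on a 32-bit build it prints 4 bytes and 32 bits.
    fmt.Println("word size in bytes:", unsafe.Sizeof(uintptr(0)))
    fmt.Println("int size in bits:  ", strconv.IntSize)
}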
When we store the data in memory, each 32-bit data word has a unique address, as shown below.
Figure. 1 ‑ Word-Addressable Memory
We can read the data in memory and load it to one register using the load word (lw) instruction.
Now that we know the theory, let's look at the practice. To describe these cases I will use the struct data type and demonstrate with the C language. A struct in C is a composite data type that allows you to group multiple variables together and store them in the same block of memory. As we said earlier, the CPU accesses data according to the word size of the given architecture, and every data type in C has alignment requirements.

So let's say we have the following simple structures:
// structure 1
typedef struct example_1 {
    char c;
    short int s;
} struct1_t;

// structure 2
typedef struct example_2 {
    double d;
    int s;
    char c;
} struct2_t;
And now try to calculate the size of the following structures:
Size of Structure 1 = sizeof(char) + sizeof(short int) = 1 + 2 = 3 bytes.
Size of Structure 2 = sizeof(double) + sizeof(int) + sizeof(char) = 8 + 4 + 1 = 13 bytes.
The real sizes using a C program might surprise you.
#include <stdio.h>

// structure 1
typedef struct example_1 {
    char c;
    short int s;
} struct1_t;

// structure 2
typedef struct example_2 {
    double d;
    int s;
    char c;
} struct2_t;

int main() {
    printf("sizeof(struct1_t) = %lu\n", sizeof(struct1_t));
    printf("sizeof(struct2_t) = %lu\n", sizeof(struct2_t));
    return 0;
}
Output
sizeof(struct1_t) = 4
sizeof(struct2_t) = 16
As we can see, the size of the structures is different from those we calculated.
C and Go employ a technique known as "struct padding" to ensure data is appropriately aligned in memory, which can significantly affect performance due to hardware and architectural constraints. Data padding and alignment conform to the requirements of the system's architecture, mainly to optimize CPU access times by ensuring data boundaries align with word sizes.
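In Go you can inspect these alignment requirements directly with unsafe.Alignof. The sketch below is illustrative; the values in the comments assume a typical 64-bit platform:

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    // Alignment requirements of some basic Go types (64-bit platform).
    fmt.Println(unsafe.Alignof(bool(false))) // 1
    fmt.Println(unsafe.Alignof(int16(0)))    // 2
    fmt.Println(unsafe.Alignof(int32(0)))    // 4
    fmt.Println(unsafe.Alignof(int64(0)))    // 8
    fmt.Println(unsafe.Alignof(float64(0)))  // 8
    fmt.Println(unsafe.Alignof(""))          // 8 (string headers are pointer-aligned)
}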
Let's go through an example to illustrate how Go handles padding and alignment. Consider the following struct:
type Employee struct {
    IsAdmin bool
    Id      int64
    Age     int32
    Salary  float32
}
A bool is 1 byte, an int64 is 8 bytes, an int32 is 4 bytes, and a float32 is 4 bytes, so the naive total is 1 + 8 + 4 + 4 = 17 bytes.
Let's validate the struct size with a small Go program:
package main

import (
    "fmt"
    "unsafe"
)

type Employee struct {
    IsAdmin bool
    Id      int64
    Age     int32
    Salary  float32
}

func main() {
    var emp Employee
    fmt.Printf("Size of Employee: %d\n", unsafe.Sizeof(emp))
}
Output
Size of Employee: 24
The reported size is 24 bytes, not 17. This discrepancy is due to memory alignment. To understand how alignment works, we need to inspect the structure and visualize the memory it occupies.
Figure 2 - Unoptimized Memory Layout
The struct Employee consumes 8*3 = 24 bytes. You can see the problem now: there are a lot of empty holes in the layout of Employee (the gaps created by the alignment rules are called “padding”).
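One way to see where the holes are is to print each field's offset with unsafe.Offsetof; the values in the comments assume a 64-bit platform:

package main

import (
    "fmt"
    "unsafe"
)

type Employee struct {
    IsAdmin bool
    Id      int64
    Age     int32
    Salary  float32
}

func main() {
    var emp Employee
    // Wherever the gap between two consecutive offsets is larger than
    // the first field's size, the difference is padding.
    fmt.Println("IsAdmin offset:", unsafe.Offsetof(emp.IsAdmin)) // 0 (1 byte + 7 bytes of padding)
    fmt.Println("Id offset:     ", unsafe.Offsetof(emp.Id))      // 8
    fmt.Println("Age offset:    ", unsafe.Offsetof(emp.Age))     // 16
    fmt.Println("Salary offset: ", unsafe.Offsetof(emp.Salary))  // 20
    fmt.Println("Total size:    ", unsafe.Sizeof(emp))           // 24
}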
Understanding how memory alignment and padding can affect the performance of an application is crucial. Specifically, data alignment impacts the number of CPU cycles required to access fields within a struct. This influence arises mostly from CPU cache effects, rather than raw clock cycles themselves, as cache behavior depends heavily on data locality and alignment within memory blocks.
Modern CPUs fetch data from memory into a faster intermediary called cache, organized in fixed-size blocks (commonly 64 bytes). When data is well-aligned and localized within the same or fewer cache lines, the CPU can access it more quickly due to reduced cache loading operations.
Consider the following Go structures to illustrate poor versus optimal alignment:
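A sketch of such a pair is shown below; the passportId field comes from the discussion that follows, while the other field names are illustrative and the offsets in the comments assume a 64-bit platform:

// Poor ordering: a 1-byte field first forces 7 bytes of padding before passportId.
type Misaligned struct {
    isSenior   bool  // offset 0, followed by 7 bytes of padding
    passportId int64 // offset 8
    age        int32 // offset 16, followed by 4 bytes of trailing padding
} // unsafe.Sizeof: 24 bytes

// Better ordering: largest field first, no internal padding.
type Aligned struct {
    passportId int64 // offset 0
    age        int32 // offset 8
    isSenior   bool  // offset 12, followed by 3 bytes of trailing padding
} // unsafe.Sizeof: 16 bytes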
The CPU reads data in word-sized chunks rather than byte by byte. As described at the beginning, a word on a 64-bit system is 8 bytes, while a word on a 32-bit system is 4 bytes. In short, the CPU reads addresses in multiples of its word size. Without alignment, fetching the variable passportId would take the CPU two cycles instead of one: the first cycle would fetch bytes 0 to 7 and the next would fetch the rest. This is inefficient, which is why we need data structure alignment. By simply aligning the data, the computer ensures that passportId can be retrieved in ONE CPU cycle.
Figure 3 - Comparing Memory Access Efficiency
Padding is the key to achieving data alignment. Padding occurs because modern CPUs are optimized to read data from memory at aligned addresses. This alignment allows the CPU to read the data in a single operation.
Figure 4 - Simply aligning the data
Without padding, data may be misaligned, leading to multiple memory accesses and slower performance. Therefore, while padding might waste some memory, it ensures that your program runs efficiently.
The Aligned struct consumes less memory simply because its fields are ordered better than Misaligned's. Because of padding, two data structures holding the same 13 bytes of data end up occupying 16 bytes and 24 bytes respectively. Hence, you save extra memory by simply reordering your struct fields.
Figure 5 - Optimizing Field Order
Improperly aligned data can slow performance as the CPU might need multiple cycles to access misaligned fields. Conversely, correctly aligned data minimizes cache line loads, which is crucial for performance, especially in systems where memory speed is a bottleneck.
Let’s do a simple benchmark to prove it:
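A sketch of such a benchmark using Go's testing package; save it in a _test.go file and run go test -bench=. (the element count and field names are illustrative):

package alignment

import "testing"

// The same structs as above, repeated here so the benchmark is self-contained.
type Misaligned struct {
    isSenior   bool
    passportId int64
    age        int32
}

type Aligned struct {
    passportId int64
    age        int32
    isSenior   bool
}

const numItems = 1_000_000

// sink prevents the compiler from optimizing the traversals away.
var sink int64

func BenchmarkMisaligned(b *testing.B) {
    items := make([]Misaligned, numItems) // 24 bytes per element
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var sum int64
        for j := range items {
            sum += items[j].passportId
        }
        sink = sum
    }
}

func BenchmarkAligned(b *testing.B) {
    items := make([]Aligned, numItems) // 16 bytes per element
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var sum int64
        for j := range items {
            sum += items[j].passportId
        }
        sink = sum
    }
}

Each Aligned element is 16 bytes instead of 24, so the same number of elements fits into fewer cache lines and the traversal has to load less memory.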
Running this benchmark shows that traversing the Aligned slice indeed takes less time than its Misaligned counterpart.
Padding is added to make sure each struct field lines up properly in memory based on its alignment requirements, as we saw earlier. But while it enables efficient access, padding can also waste space if the fields aren't ordered well.
Understanding how to properly align struct fields to minimize memory waste due to padding is important for efficient memory usage, especially in performance-critical applications. Below, I will provide an example with a poorly aligned structure and then show an optimized version of the same structure.
In a poorly aligned struct, fields are ordered without considering their sizes and alignment requirements, which can lead to added padding and increased memory usage:
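A sketch of such a struct, consistent with the byte counts below (the field names are illustrative, and the sizes assume a 64-bit platform where a string header occupies 16 bytes):

type Person struct {
    IsAdmin bool    // 1 byte, followed by 7 bytes of padding
    Salary  float64 // 8 bytes
    Age     int32   // 4 bytes, followed by 4 bytes of padding
    Name    string  // 16 bytes (string header)
}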
The total memory could hence be 1 (bool) + 7 (padding) + 8 (float64) + 4 (int32) + 4 (padding) + 16 (string) = 40 bytes.
An optimized structure arranges fields from largest to smallest size, significantly reducing or eliminating the need for additional padding:
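A sketch of the reordered version, matching the arithmetic below (the name PersonOptimized and its field names are illustrative):

type PersonOptimized struct {
    Salary  float64 // 8 bytes
    Name    string  // 16 bytes
    Age     int32   // 4 bytes
    IsAdmin bool    // 1 byte, followed by 3 bytes of trailing padding
}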
The total memory would then neatly comprise 8 (float64) + 16 (string) + 4 (int32) + 1 (bool) + 3 (padding) = 32 bytes.
Let's prove the above:
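A sketch of a program that prints both sizes, using the illustrative Person and PersonOptimized definitions from above:

package main

import (
    "fmt"
    "unsafe"
)

type Person struct {
    IsAdmin bool
    Salary  float64
    Age     int32
    Name    string
}

type PersonOptimized struct {
    Salary  float64
    Name    string
    Age     int32
    IsAdmin bool
}

func main() {
    fmt.Printf("sizeof(Person) = %d\n", unsafe.Sizeof(Person{}))
    fmt.Printf("sizeof(PersonOptimized) = %d\n", unsafe.Sizeof(PersonOptimized{}))
}

On a 64-bit platform this prints 40 for Person and 32 for PersonOptimized.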
Reducing the structure size from 40 bytes to 32 bytes means a 20% reduction in memory usage per instance of Person. This can lead to considerable savings in applications where many such instances are created or stored, improving cache efficiency and potentially reducing the number of cache misses.
Data alignment is a pivotal factor in optimizing memory utilization and enhancing system performance. By arranging struct data correctly, memory usage becomes not only more efficient but also faster in terms of CPU read times, contributing significantly to overall system efficiency.