Performance Impact When Looping Over 8192 Elements
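Certain matrix operations exhibit performance anomalies when the matrix size, particularly the row length, is a multiple of 2048 (e.g., 8192). This phenomenon, referred to as super-alignment, arises from how CPU caches are organized: when the row length in bytes is a large power of two, elements in the same column of successive rows tend to map to the same few cache sets and evict one another.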
The code in question computes a matrix res[][] from a matrix img[][]. Timing it at matrix sizes 8191, 8192, and 8193 reveals a significant slowdown at exactly 8192.
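One way to see why 8192 stands out is to count how many cache sets a walk down a single column touches. The sketch below is a simplified model rather than a description of any particular CPU: the 64-byte line size, the 64 sets, the float element type, and the names cache_set and distinct_sets_in_column are all assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed cache geometry (hypothetical, not taken from the article):
   64-byte cache lines and 64 sets. */
enum { LINE_BYTES = 64, NUM_SETS = 64 };

/* Set index for a byte offset measured from the start of the matrix. */
static size_t cache_set(size_t byte_offset) {
    return (byte_offset / LINE_BYTES) % NUM_SETS;
}

/* Count the distinct sets touched when walking down column 0 of an
   n x n row-major matrix of float (the element type is an assumption). */
static int distinct_sets_in_column(size_t n) {
    int seen[NUM_SETS] = {0};
    int count = 0;
    for (size_t row = 0; row < n; row++) {
        size_t set = cache_set(row * n * sizeof(float));
        if (!seen[set]) {
            seen[set] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, distinct_sets_in_column(8192) returns 1, while 8191 and 8193 each return 64: with a power-of-two row length every row of the column competes for the same set, whereas an off-by-one row length spreads the column across the whole cache.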
Super-Alignment Effects
The performance variations stem from non-sequential memory access: the nested loops iterate column-wise over img[][], so the row index changes fastest. C stores two-dimensional arrays row by row, so each step jumps ahead by a full row instead of moving to the adjacent element. Modern CPUs operate far more efficiently with sequential memory access, and the penalty is largest when the row length is super-aligned.
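The size of that jump can be made concrete with a little address arithmetic. This is a minimal sketch assuming float elements and a row-major n x n layout; the helper names byte_offset, stride_rows_inner, and stride_cols_inner are hypothetical.

```c
#include <stddef.h>

/* Byte offset of element [row][col] in a row-major n x n float matrix
   (the float element type is an assumption). */
static size_t byte_offset(size_t n, size_t row, size_t col) {
    return (row * n + col) * sizeof(float);
}

/* Distance between consecutive accesses when the row index is the
   innermost loop variable (column-wise walk). */
static size_t stride_rows_inner(size_t n) {
    return byte_offset(n, 2, 1) - byte_offset(n, 1, 1);
}

/* Distance between consecutive accesses when the column index is the
   innermost loop variable (row-wise walk). */
static size_t stride_cols_inner(size_t n) {
    return byte_offset(n, 1, 2) - byte_offset(n, 1, 1);
}
```

For n = 8192 the column-wise stride is 8192 * sizeof(float) bytes, while the row-wise stride is just sizeof(float), so a column-wise walk touches a new cache line on every access.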
Resolution: Interchanging Outer Loops
The solution is to interchange the two outer loops so that the row index j is in the outer loop and the column index i is in the inner loop. Memory access then becomes sequential, which significantly improves performance:
for(j=1;j<SIZE-1;j++) {
    for(i=1;i<SIZE-1;i++) {
        // Code to compute res[j][i]
    }
}
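To check that the interchange changes only the access order and not the result, both orders can be written out side by side. The loop body is not given here, so the sketch below assumes a 3x3 mean filter as a stand-in; the function names and the flat row-major arrays (where res[j][i] corresponds to res[j * n + i]) are likewise illustrative choices.

```c
#include <stddef.h>

/* Assumed loop body (not given in the article): mean of the 3x3
   neighbourhood around img[j][i]. Matrices are flat row-major arrays,
   so img[j][i] corresponds to img[j * n + i]. */
static float mean3x3(int n, const float *img, int j, int i) {
    float sum = 0.0f;
    for (int dj = -1; dj <= 1; dj++)
        for (int di = -1; di <= 1; di++)
            sum += img[(size_t)(j + dj) * (size_t)n + (size_t)(i + di)];
    return sum / 9.0f;
}

/* Original order: i outer, j inner, so writes walk down a column. */
static void filter_column_wise(int n, const float *img, float *res) {
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            res[(size_t)j * (size_t)n + (size_t)i] = mean3x3(n, img, j, i);
}

/* Interchanged order: j outer, i inner, so writes walk along a row. */
static void filter_row_wise(int n, const float *img, float *res) {
    for (int j = 1; j < n - 1; j++)
        for (int i = 1; i < n - 1; i++)
            res[(size_t)j * (size_t)n + (size_t)i] = mean3x3(n, img, j, i);
}
```

Each output element depends only on the input matrix, so the two functions produce identical results for the same input; only the order in which memory is visited differs.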
Performance Results
The following performance results demonstrate the improvement achieved by interchanging the outer loops:
Matrix Size | Original Code (s) | Interchanged Loops (s) |
---|---|---|
8191 | 1.499 | 0.376 |
8192 | 2.122 | 0.357 |
8193 | 1.582 | 0.351 |
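This optimization makes every size roughly four to six times faster and nearly eliminates the gap for matrices with dimensions that are multiples of 2048. With the original loop order, 8192 is about 34 to 42 percent slower than its neighbors; with the interchanged loops, all three sizes run in 0.35 to 0.38 seconds.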