Efficiency - Is float slower to compute than double in C++?
巴扎黑 2017-04-17 13:06:16

In C++, is float slower to compute than double?

Floating-point arithmetic is carried out in double precision; even an expression involving only float operands is first converted to double and then computed.
So double is slightly faster than float.
The C++ standard requires float to be accurate to at least 6 significant decimal digits and to have a range of at least 1.0E-37 to 1.0E+37. float is normally 32 bits.
The C++ standard specifies the same minimum range for double as for float, 1.0E-37 to 1.0E+37, but requires double to be accurate to at least 10 significant decimal digits. double is normally 64 bits.

When compiling with VC, float is converted to double, so it is recommended to simply use double for floating-point arithmetic; that preserves precision and improves speed.
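For reference, these guarantees can be checked directly from the <float.h> limits; a minimal sketch (not part of the quoted answer, values depend on your implementation):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* FLT_DIG/DBL_DIG are the guaranteed decimal digits; FLT_MIN/MAX etc. the ranges. */
    printf("float : %d decimal digits, range %g .. %g, %u bits\n",
           FLT_DIG, FLT_MIN, FLT_MAX, (unsigned)(sizeof(float) * 8));
    printf("double: %d decimal digits, range %g .. %g, %u bits\n",
           DBL_DIG, DBL_MIN, DBL_MAX, (unsigned)(sizeof(double) * 8));
    return 0;
}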


This is an answer I found online. But many posts online simply say that float is faster, without specifying the compiler, environment, or language. So now I'm confused.

Replies (2)
伊谢尔伦

For VC++, there are generally two situations.

The first case is when you compile a 32-bit program: the compiler uses the x87 instruction set. The x87 unit has a small internal register stack, and each element of that stack is an 80-bit extended-precision float. Whether you use float, double, or something else, the value is normalized to that same 80-bit format when it is pushed, and only converted back to your type after all the calculations are done. So the speed is roughly the same for both.

The second case is when you compile a 64-bit program, enable MMX/SSE/AVX instruction-set optimizations, or use intrinsics to call those instruction sets directly. These instructions support float and double natively and do not convert them to the x87 format. A double is twice the size of a float, so a register of a given width can hold only half as many doubles as floats. Therefore, once the code is vectorized, float can be much faster than double.

Of course, in many cases the precision of float is simply not enough, and when you write intrinsics yourself, your own skill affects performance far more than the choice between float and double. So it depends on what you need; let the requirements decide.
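To make the register-width point concrete, here is a minimal sketch using SSE2 intrinsics (the names add_float and add_double are just illustrative): a 128-bit XMM register holds 4 floats but only 2 doubles, so the float loop processes twice as many elements per instruction.

#include <immintrin.h>
#include <stddef.h>

/* 4 floats per 128-bit addition; n is assumed to be a multiple of 4. */
void add_float(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
    }
}

/* Only 2 doubles per 128-bit addition; n is assumed to be a multiple of 2. */
void add_double(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(a + i, _mm_add_pd(va, vb));
    }
}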

大家讲道理

double is faster.
I wrote a test program (the float version just swaps the commented line):

#include <stdio.h>

int main(void)
{
    //float f1 = 0.0;
    double f1 = 0.0;
    int i, j;
    for (i = 0; i < 100000; i++) {
        for (j = 0; j < 10000; j++)
            f1 += 1.1;
        f1 -= 11000;
    }
    printf("%f\n", f1);
    return 0;
}
float:
root@i5a:~/test# time ./a.out
-1412.595703

real 0m3.063s
user 0m3.065s
sys 0m0.000s

double:
time ./a.out
0.000204

real 0m0.843s
user 0m0.840s
sys 0m0.004s

The difference is nearly 4x.
Now let's look at the gcc -c -S output, just the loop body.
double:

.L2:
        movl    $10000, %eax
        .p2align 4,,10
        .p2align 3
.L5:
        subl    $1, %eax
        addsd   %xmm1, %xmm0
        jne     .L5
        subl    $1, %edx
        subsd   %xmm2, %xmm0
        jne     .L2
        

Let’s look at float again:

.L2:
        movl    $10000, %eax
        .p2align 4,,10
        .p2align 3
.L5:
        unpcklps        %xmm0, %xmm0
        subl    $1, %eax
        cvtps2pd        %xmm0, %xmm0
        addsd   %xmm1, %xmm0
        unpcklpd        %xmm0, %xmm0
        cvtpd2ps        %xmm0, %xmm0
        jne     .L5
        subl    $1, %edx
        subss   %xmm2, %xmm0
        jne     .L2
        unpcklps        %xmm0, %xmm0
        movl    $.LC3, %edi
        movl    $1, %eax
        cvtps2pd        %xmm0, %xmm0
        jmp     printf
        
        

-O2 optimization has been turned on.
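The extra unpcklps/cvtps2pd/cvtpd2ps instructions in the float loop appear because 1.1 is a double literal: the float accumulator has to be widened to double for every addition and narrowed back afterwards. A hedged sketch of a float version that keeps the whole computation in single precision by using a float literal (this is my variant, not the code that produced the listings above):

#include <stdio.h>

int main(void)
{
    float f1 = 0.0f;
    int i, j;
    for (i = 0; i < 100000; i++) {
        /* 1.1f is a float literal, so the compiler does not need to
           widen the accumulator to double inside the inner loop. */
        for (j = 0; j < 10000; j++)
            f1 += 1.1f;
        f1 -= 11000.0f;
    }
    printf("%f\n", f1);
    return 0;
}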

Let’s take a look at compiling to 32-bit.
double:
.L8:
    fxch    %st(1)
.L2:
    movl    $10000, %eax
    .p2align 4,,7
    .p2align 3
.L5:
    subl    $1, %eax
    fadd    %st, %st(1)
    jne     .L5
    fxch    %st(1)
    subl    $1, %edx
    fsubs   .LC2
    jne     .L8

float:
.L9:
    fxch    %st(1)
.L2:
    movl    $10000, %eax
    jmp     .L5
    .p2align 4,,7
    .p2align 3
.L8:
    fxch    %st(1)
.L5:
    fadd    %st, %st(1)
    fxch    %st(1)
    subl    $1, %eax
    fstps   12(%esp)
    flds    12(%esp)
    jne     .L8
    subl    $1, %edx
    fsubs   .LC2
    jne     .L9

The results are similar to the 64-bit case: 0.85 seconds for double and 2.78 seconds for float, so the 32-bit float version is slightly faster than the 64-bit float one.
