How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?-C++-php.cn

How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?

DDD

Release： 2024-12-19 03:14:13

Original

465 people have browsed it

How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?

Modular arithmetics and NTT (finite field DFT) optimizations

Problem:

In a project involving squaring using fast bignum, the implementation with NTT is slow, even for large numbers (12000 bits or more).

Question:

Is there a way to optimize the NTT transform?
Is there a way to speed up modular arithmetics?

Solution:

Optimized NTT Transform

Separating the NTT_fast function into two functions, one with WW[] and the other with iWW[], leads to one less parameter in recursion calls. This provides a small performance improvement.

Optimized Modular Arithmetics

The original code used a series of branching and safety checks, which can be eliminated by utilizing bit tricks and rearranging the main loop. The assembly code for modmul was also modified for efficiency.

New Code

class fourier_NTT
{
    // ...
    // Modular arithmetics (optimized, but it works only for p >= 0x80000000!!!)
    inline uint32 modadd(uint32 a, uint32 b)
    {
        uint32 d;
        //d = a + b;
        //!if (d < a)
        //    d -= p;
        //!if (d >= p)
        //    d -= p;
         __asm {
            mov eax, a;
            add eax, b
        }
        return d;
    }

    inline uint32 modsub(uint32 a, uint32 b)
    {
        uint32 d;
        //d = a - b;
        //!if (a < b)
        // a - b < 0 when a < b, so no need to check
        //!    d += p;
        __asm {
            mov eax, a;
            sub eax, b
        }
        return d;
    }

    inline uint32 modmul(uint32 a, uint32 b)
    {
        uint32 _a = a;
        uint32 _b = b;   

        __asm
        {
        mov eax, a;
        mul b;
        mov ebx, eax;
        mov eax, 2863311530;
        mov ecx, edx;
        mul edx;
        shld edx, eax, 1;
        mov eax, 3221225473;

        mul edx;
        sub ebx, eax;
        mov eax, 3221225473;
        sbb ecx, edx;
        jc addback;

            neg ecx;
            and ecx, eax;
            sub ebx, ecx;

        sub ebx, eax;
        sbb edx, edx;
        and eax, edx;
            addback:
        add eax, ebx;          
        };  
    }
    // ...
};

Copy after login

Conclusion

These optimizations result in a 1.35x speedup of the NTT transform and modular arithmetics. The code is provided in C and inline assembly.

The above is the detailed content of How Can We Optimize NTT Transform and Modular Arithmetic for Faster Bignum Squaring?. For more information, please follow other related articles on the PHP Chinese website!