How are Atomic Floating-Point and Vector Operations Handled on x86_64 Architectures?

Mary-Kate Olsen

Release: 2024-12-07 06:08:15

Atomic Floating-Point Operations on x86_64

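C++11 supports std::atomic<double>, and it is lock-free on most platforms, including x86_64. Loads, stores, exchange and compare-exchange are available, but fetch_add and the other arithmetic members exist only for integer and pointer types until C++20 adds them for floating-point types. A floating-point read-modify-write is therefore typically built as a compare-and-swap (CAS) loop around the lock cmpxchg instruction.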

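x86_64 has no read-modify-write instruction that operates atomically on a whole vector. For plain loads and stores, Intel and AMD document that aligned 128-bit SSE/AVX loads and stores are atomic on processors that report AVX support. No such guarantee exists for 256-bit accesses, which some microarchitectures perform as two 128-bit halves. Unaligned vector accesses carry no atomicity guarantee, and an access that crosses a cache-line boundary can tear.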

Assembly-Level Support for Double and Vector Operations

At the assembly level, x86_64 offers the following atomicity for doubles and vectors:

Doubles: Loads and stores of a naturally aligned 8-byte value are atomic, whether done with movsd, movq or an integer mov. There is no atomic add, subtract or multiply: addsd, subsd and mulsd write only to a register, and the lock prefix cannot be applied to SSE instructions, so an atomic update needs a CAS loop with lock cmpxchg.
Vectors: Aligned 128-bit loads and stores are atomic on CPUs with AVX support. 256-bit accesses and unaligned accesses have no hardware guarantee of atomicity.
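As a minimal sketch of the load and store case for doubles, the following C++ uses std::atomic<double> with acquire and release ordering, which on x86_64 are expected to compile to ordinary 8-byte moves. The names shared_value, publish and observe are hypothetical and chosen only for illustration.

```cpp
#include <atomic>

// Hypothetical shared value, used only for illustration.
std::atomic<double> shared_value{0.0};

// Reports whether this platform implements std::atomic<double> without a lock.
bool double_atomic_is_lock_free() {
    return shared_value.is_lock_free();
}

// Release store: publishes v to other threads.
void publish(double v) {
    shared_value.store(v, std::memory_order_release);
}

// Acquire load: reads the most recently published value visible to this thread.
double observe() {
    return shared_value.load(std::memory_order_acquire);
}
```

Calling publish(1.5) in one thread and observe() in another never yields a torn value; the reader sees either the old or the new double in full.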

MSVC 2017 Implementation of Lock-Free atomic<double>

MSVC 2017 implements lock-free atomic<double> by moving the value through 64-bit integer registers. A load, for example, is not a CAS at all; it is roughly an ordinary aligned 8-byte move:

mov rax, QWORD PTR [src_addr]  // plain 64-bit load, atomic when 8-byte aligned

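The add operation has no single instruction and uses a 64-bit CAS on the bit pattern of the double:

lock cmpxchg QWORD PTR [dst_addr], rcx  // 64-bit CAS: expected bits in rax, new bits in rcx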

Atomic RMW (Read-Modify-Write) Operations

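Atomic read-modify-write (RMW) operations on floating-point values, such as fetch_add, require a CAS loop: load the old value, compute the new one, and retry the CAS until no other thread has changed the value in between. Integer fetch_add needs no loop because lock xadd exists. On x86_64, CAS is available for operands up to 16 bytes wide through cmpxchg16b, which takes its register operands implicitly:

lock cmpxchg16b XMMWORD PTR [dst_addr]  // 128-bit CAS: expected in rdx:rax, new value in rcx:rbx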

While CAS loops provide atomic RMW functionality, they are more expensive than atomic loads and stores, particularly when several threads contend for the same cache line.
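The CAS loop described in this section can be sketched in C++ as follows. This is an illustrative helper under the assumption of a C++11 or later compiler, not the code any particular standard library uses; the name fetch_add_double is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: emulates fetch_add for std::atomic<double> with a CAS loop.
// Returns the value that was stored before the addition, like integer fetch_add.
double fetch_add_double(std::atomic<double>& target, double delta) {
    // Start from a snapshot of the current value.
    double expected = target.load(std::memory_order_relaxed);
    // compare_exchange_weak compares the stored bits with 'expected'. On failure
    // it writes the value it actually found into 'expected', so the next
    // iteration recomputes expected + delta from fresh data.
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // Retry until no other thread modified the value in between.
    }
    return expected;
}
```

On x86_64 the compare_exchange_weak call is expected to become a lock cmpxchg on the 8-byte bit pattern of the double. With C++20, calling fetch_add directly on std::atomic<double> expresses the same operation.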

Additional Notes

Some non-x86 hardware supports atomic add operations for float/double types.
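Intel's Transactional Synchronization Extensions (TSX) can make an arbitrary sequence of FP or SIMD operations atomic by running it inside a hardware transaction, but TSX is absent or disabled on many CPUs, so a fallback path is still needed.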
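Compilers often generate inefficient code for atomic<double>, for example by bouncing values between xmm and integer registers, but improvements are being made.
Vector loads and stores on a shared array of aligned doubles should be safe at element granularity: each aligned 8-byte element is read or written atomically in practice, even though the vector as a whole may tear between elements. Unaligned accesses may tear within an element.
It is possible to implement atomic operations on 16-byte objects using lock cmpxchg16b, but performance will be poor: when it is also used for loads, even a pure read needs exclusive ownership of the cache line, so readers contend with each other.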
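For the shared-array case, a portable way to get per-element atomicity without relying on vector behavior is to make each element a std::atomic<double> and use relaxed accesses. The sketch below assumes a small fixed-size array; kCount, samples, write_sample and sum_samples are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical fixed-size shared array; each element is individually atomic.
constexpr std::size_t kCount = 4;
std::atomic<double> samples[kCount];

// Stores one element. A relaxed store is enough to rule out tearing of
// that element; it imposes no ordering relative to other elements.
void write_sample(std::size_t index, double value) {
    samples[index].store(value, std::memory_order_relaxed);
}

// Sums the array with one relaxed load per element. Each element is read
// without tearing, but the result is not a snapshot of the whole array:
// another thread may update elements while the loop runs.
double sum_samples() {
    double total = 0.0;
    for (std::size_t i = 0; i < kCount; ++i) {
        total += samples[i].load(std::memory_order_relaxed);
    }
    return total;
}
```

This trades the throughput of a single vector load for a guarantee the C++ memory model actually provides. If a consistent view of several elements is required, a lock or a sequence counter around the array is still needed.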
