On February 28, 2025, DeepSeek open-sourced the Fire-Flyer File System (3FS) and the Smallpond data processing framework. Both are designed to speed up data access and processing, particularly for AI training and inference workloads.
Day 5 of #OpenSourceWeek: 3FS, a powerful engine for all DeepSeek data access.
Fire-Flyer File System (3FS) – a parallel file system maximizing the bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput (180-node cluster) ⚡ 3.66 TiB/min…
— DeepSeek (@deepseek_ai) February 28, 2025
3FS is a high-performance, distributed file system built for modern SSDs and RDMA networks. It offers a robust shared storage solution, simplifying distributed application development.
Remote Direct Memory Access (RDMA) lets a network adapter read and write the memory of a remote machine directly, bypassing the operating system kernel on the data path. This lowers latency and CPU overhead compared with conventional network I/O.
The performance claims for 3FS are backed by published benchmarks. In a read stress test on a 180-node cluster, it sustained 6.6 TiB/s of aggregate read throughput while training jobs generated concurrent traffic.
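As a rough sanity check on that figure, the aggregate number can be divided across the cluster. The sketch below uses the 180-node count quoted in the announcement above and assumes load is spread evenly over all nodes, which is a simplification rather than something the benchmark states. The function name is illustrative only.

```python
# Back-of-the-envelope: average read throughput per node.
# Assumption: load is spread evenly over all nodes (a simplification).
AGGREGATE_TIB_PER_S = 6.6  # aggregate read throughput from the stress test
NODES = 180                # cluster size quoted in the announcement
GIB_PER_TIB = 1024


def per_node_gib_per_s(aggregate_tib_per_s: float, nodes: int) -> float:
    """Return the average per-node throughput in GiB/s."""
    return aggregate_tib_per_s * GIB_PER_TIB / nodes


if __name__ == "__main__":
    print(f"{per_node_gib_per_s(AGGREGATE_TIB_PER_S, NODES):.1f} GiB/s per node")
```

Under that even-spread assumption, each node would be serving roughly 37.5 GiB/s on average.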
Smallpond, designed to complement 3FS, is a lightweight, distributed data processing framework. It uses DuckDB as its compute engine and stores data in Parquet format on a distributed file system (like 3FS).
Clone the 3FS repository, fetch its submodules, and apply the bundled patches:
git clone https://github.com/deepseek-ai/3fs
cd 3fs
git submodule update --init --recursive
./patches/apply.sh
Consult the 3FS documentation for further details.
Ensure Python 3.8 or later is installed.
Install Smallpond: pip install smallpond
Initialize a Smallpond session: import smallpond; sp = smallpond.init()
Load Parquet data: df = sp.read_parquet("path/to/dataset/*.parquet")
Repartition data (examples):
df = df.repartition(3)  # repartition by files
df = df.repartition(3, by_row=True)  # repartition by rows
df = df.repartition(3, hash_by="host")  # repartition by hash of a column
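To make the hash-based variant more concrete, here is a conceptual sketch in plain Python. It is not Smallpond's implementation. The helper `hash_partition` and the sample rows are hypothetical, and it only illustrates the general idea that rows sharing a key value end up in the same partition.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a stable hash of row[key]."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return dict(partitions)


# Hypothetical sample data: two rows share the same "host" value.
rows = [
    {"host": "a.example", "bytes": 10},
    {"host": "b.example", "bytes": 20},
    {"host": "a.example", "bytes": 30},
]
parts = hash_partition(rows, key="host", num_partitions=3)
```

Because the partition index depends only on the key, both `a.example` rows land together. That property is what makes hash partitioning useful ahead of per-key aggregations.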
Transform data (examples):
df = df.map('a + b as c')
df = df.map(lambda row: {'c': row['a'] + row['b']})
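One way to check a row-wise transform before handing it to Smallpond is to try the callable on plain dictionaries. The sketch below does only that: it shows what the function computes for a single row and makes no claim about how Smallpond schedules the call or shapes the resulting dataset. The function name and sample rows are hypothetical.

```python
def add_a_and_b(row):
    """Return a new column 'c' holding the sum of columns 'a' and 'b'."""
    return {"c": row["a"] + row["b"]}


# Hypothetical sample rows standing in for records of a dataset.
sample_rows = [{"a": 1, "b": 2}, {"a": 10, "b": 5}]
results = [add_a_and_b(row) for row in sample_rows]
print(results)  # [{'c': 3}, {'c': 15}]
```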
Save data: df.write_parquet("path/to/output/dataset.parquet")
Run a Smallpond job: sp.run(df)
Smallpond offers monitoring and debugging tools. Log analysis helps resolve execution issues. Comprehensive documentation, tutorials, and use cases are available through the official support channels.
With 3FS and Smallpond now open source, developers and researchers have access to a high-throughput shared storage layer and a lightweight processing framework built on top of it. Together they form a practical foundation for data-intensive applications such as AI training and inference.