If you have idle devices lying around, you might want to give this a try: the hardware already in your hands can now flex its muscles in the field of AI.
By combining an iPhone, an iPad, and a MacBook, you can assemble a "heterogeneous cluster inference solution" and run the Llama 3 model smoothly.
It is worth mentioning that this heterogeneous cluster can mix Windows, Linux, and iOS systems, with support for Android coming soon.
According to the project author @evilsocket, this heterogeneous cluster includes an iPhone 15 Pro Max, an iPad Pro, a MacBook Pro (M1 Max), an NVIDIA GeForce RTX 3080, and two NVIDIA Titan X Pascal cards. All of the code has been uploaded to GitHub. Seeing this, netizens remarked that the author really knows what he is doing.
However, some netizens are starting to worry about energy consumption: speed aside, the electricity bill would be hard to afford, and moving data back and forth between devices wastes too much power.
Project Introduction

The functionality above is built on a Rust framework called Cake. Cake performs distributed inference of large models (such as Llama 3) and is designed to combine consumer-grade hardware into heterogeneous clusters. That hardware can run a variety of operating systems, including iOS, Android, macOS, Linux, and Windows, making AI more accessible.
Project address: https://github.com/evilsocket/cake
The main idea of Cake is to shard transformer blocks across multiple devices so that inference can run on models that typically do not fit in the GPU memory of a single device. Inference on consecutive transformer blocks assigned to the same worker is batched to minimize delays caused by data transfer.
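The sharding idea above can be sketched in a few lines of Rust. This is an illustrative toy, not Cake's actual scheduler: the function names and the memory-proportional heuristic are assumptions. It assigns each worker a contiguous slice of transformer blocks sized in proportion to its memory, so that consecutive blocks stay on the same worker and cross-device transfers are minimized.

```rust
// Hypothetical sketch of Cake's core idea: give each worker a contiguous
// range of transformer blocks, sized roughly in proportion to its memory.
// `shard_layers` and the proportional heuristic are illustrative assumptions.

#[derive(Debug, PartialEq)]
struct Shard {
    worker: &'static str,
    first_layer: usize,
    last_layer: usize, // inclusive, matching Cake's `model.layers.A-B` notation
}

fn shard_layers(total_layers: usize, workers: &[(&'static str, usize)]) -> Vec<Shard> {
    // `workers` pairs a name with its memory in GB. Layers are handed out
    // as contiguous ranges so consecutive blocks share a worker.
    let total_mem: usize = workers.iter().map(|(_, m)| m).sum();
    let mut shards = Vec::new();
    let mut next = 0;
    for (i, (name, mem)) in workers.iter().enumerate() {
        let count = if i == workers.len() - 1 {
            total_layers - next // last worker takes the remainder
        } else {
            (total_layers * mem) / total_mem
        };
        if count == 0 {
            continue;
        }
        shards.push(Shard {
            worker: name,
            first_layer: next,
            last_layer: next + count - 1,
        });
        next += count;
    }
    shards
}

fn main() {
    // Llama 3 8B has 32 transformer blocks (model.layers.0-31).
    let shards = shard_layers(32, &[("titan_x", 12), ("rtx_3080", 10), ("m1_max", 32)]);
    for s in &shards {
        println!("{}: model.layers.{}-{}", s.worker, s.first_layer, s.last_layer);
    }
}
```

A real scheduler would also account for compute speed and network latency, but the contiguity constraint is the key point: it is what keeps per-token data movement between devices low.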
Cake currently supports the following systems and devices:

Compile

After installing Rust, run the following code:

cargo build --release

To build for iOS, run:

make ios
Use

To run a worker node:

cake-cli --model /path/to/Meta-Llama-3-8B \ # model path, read below on how to optimize model size for workers
         --mode worker \                    # run as worker
         --name worker0 \                   # worker name in the topology file
         --topology topology.yml \          # topology file
         --address 0.0.0.0:10128            # bind address
To run the master node:

cake-cli --model /path/to/Meta-Llama-3-8B \
         --topology topology.yml

Here, topology.yml determines which layers are served by which worker, for example:
linux_server_1:
  host: 'linux_server.host:10128'
  description: 'NVIDIA Titan X Pascal (12GB)'
  layers:
    - 'model.layers.0-5'
linux_server_2:
  host: 'linux_server2.host:10128'
  description: 'NVIDIA GeForce 3080 (10GB)'
  layers:
    - 'model.layers.6-16'
iphone:
  host: 'iphone.host:10128'
  description: 'iPhone 15 Pro Max'
  layers:
    - 'model.layers.17'
ipad:
  host: 'ipad.host:10128'
  description: 'iPad'
  layers:
    - 'model.layers.18-19'
macbook:
  host: 'macbook.host:10128'
  description: 'M1 Max'
  layers:
    - 'model.layers.20-31'
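A topology like the one above only works if the listed ranges cover every layer of the model exactly once. The sketch below is not part of Cake; it is a hypothetical sanity check that parses Cake's `model.layers.A-B` notation and verifies that the five ranges above partition all 32 blocks of Llama 3 8B.

```rust
// Illustrative sanity check (not part of Cake): verify that the layer
// ranges in a topology cover layers 0..total exactly once, with no
// gaps or overlaps.

fn parse_range(spec: &str) -> (usize, usize) {
    // Accepts "model.layers.A-B" or "model.layers.A" (a single layer).
    let tail = spec.rsplit('.').next().unwrap();
    match tail.split_once('-') {
        Some((a, b)) => (a.parse().unwrap(), b.parse().unwrap()),
        None => {
            let n = tail.parse().unwrap();
            (n, n)
        }
    }
}

fn covers_all(specs: &[&str], total: usize) -> bool {
    let mut seen = vec![false; total];
    for spec in specs {
        let (a, b) = parse_range(spec);
        for i in a..=b {
            if i >= total || seen[i] {
                return false; // out of range, or the same layer assigned twice
            }
            seen[i] = true;
        }
    }
    seen.iter().all(|&s| s) // no gaps
}

fn main() {
    // The ranges from the topology file above.
    let specs = [
        "model.layers.0-5",   // linux_server_1
        "model.layers.6-16",  // linux_server_2
        "model.layers.17",    // iphone
        "model.layers.18-19", // ipad
        "model.layers.20-31", // macbook
    ];
    println!("topology covers all 32 layers: {}", covers_all(&specs, 32));
}
```

Note how the ranges in the example topology add up: 6 + 11 + 1 + 2 + 12 = 32 layers, matching the 32 transformer blocks of Llama 3 8B.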
To optimize the model size for workers, you can split the source model with cake-split-model:

cake-split-model --model-path path/to/Meta-Llama-3-8B \ # source model to split
                 --topology path/to/topology.yml \      # topology file
                 --output output-folder-name
The above is the detailed content of "So cool! Old iPhone, iPad, and MacBook devices form a heterogeneous cluster and can run Llama 3". For more information, please follow other related articles on the PHP Chinese website!