For setting up a distributed crawler, I think the main work is still in the crawler code itself: split it into several small services. You only need to design it to be scalable from the start. Put simply, if adding a node is just a matter of adding an IP and port, there is no difference between a single machine and multiple machines, so you can start on a single machine. Once the initial demo works, for example once it can crawl a page and hand the sub-resources on that page to different nodes to load (at first, those nodes are just other processes on the same machine), move it to multiple machines. As for the machines, AWS and Alibaba Cloud are both fine, or if you like tinkering, you can look into the Raspberry Pi.
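A minimal Python sketch of the "crawl a page, then hand out its sub-resources" step, using only the standard library. The node list `NODES`, the ports 9001 to 9003, and the helper names (`extract_sub_resources`, `pick_node`, `distribute`) are hypothetical. Nodes are plain `host:port` strings, so going from processes on one machine to several machines only means editing that list. Actually sending each URL to its node (over HTTP, a message queue, etc.) is left out.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical node list: each node is just "host:port". At first every entry
# can point at 127.0.0.1 with a different port (one process per port);
# scaling out later means adding another machine's address here.
NODES = ["127.0.0.1:9001", "127.0.0.1:9002", "127.0.0.1:9003"]


class SubResourceParser(HTMLParser):
    """Collect URLs of sub-resources (images, scripts, stylesheets, links)."""

    # Which attribute holds the URL for each tag we care about.
    URL_ATTRS = {"img": "src", "script": "src", "link": "href", "a": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_sub_resources(base_url, html):
    """Return absolute URLs of the sub-resources referenced by a page."""
    parser = SubResourceParser(base_url)
    parser.feed(html)
    return parser.urls


def pick_node(url, nodes):
    """Map a URL to one node by hashing it; the same URL always gets the same node."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(base_url, html, nodes=NODES):
    """Group a page's sub-resources by the node that should load them."""
    plan = {node: [] for node in nodes}
    for url in extract_sub_resources(base_url, html):
        plan[pick_node(url, nodes)].append(url)
    return plan


if __name__ == "__main__":
    page = '<img src="/a.png"><script src="app.js"></script><a href="http://example.com/x">x</a>'
    for node, urls in distribute("http://example.com/index.html", page).items():
        print(node, urls)
```

Plain modulo hashing keeps the sketch short, but it remaps most URLs whenever a node is added or removed; consistent hashing avoids that if the node list changes often.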
If your laptop has 8 GB of memory, running 4 or 5 server-edition (no GUI) Linux virtual machines is no big problem. The free VirtualBox, the paid VMware, and the Hyper-V that comes with Windows all work.
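A quick back-of-the-envelope check of the 8 GB figure, as a small Python function. The 2 GB reserved for the host (OS, editor, browser) is an assumption; adjust it to your machine.

```python
def per_vm_memory_mb(host_total_mb, host_reserved_mb, vm_count):
    """Split the RAM left after the host's own needs evenly across the VMs."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    available = host_total_mb - host_reserved_mb
    if available <= 0:
        raise ValueError("nothing left for VMs after the host reservation")
    return available // vm_count


if __name__ == "__main__":
    # Assumption: keep about 2 GB (2048 MB) of the 8 GB for the host itself.
    for vms in (4, 5):
        print(vms, "VMs ->", per_vm_memory_mb(8192, 2048, vms), "MB each")
```

Under that assumption this prints 1536 MB per VM for 4 VMs and 1228 MB for 5, which is usually ample for a headless Linux server running a crawler process.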
For cloud services, you can consider Amazon AWS: write the code locally, then start an instance on Amazon. A small instance is enough, and it is billed by usage time, so you can simply shut it down once the experiment is done.
Vagrant, a virtual development environment built on VirtualBox, can run multiple virtual machines on one machine and configure IP networking between them.
Official website: http://www.vagrantup.com/
Simple usage guide: http://segmentfault.com/a/1190000000264347
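Vagrant's multi-machine mode fits the "add a node by adding an IP" idea: each VM is a `config.vm.define` block with its own private-network IP. The Python sketch below generates such a Vagrantfile. `make_vagrantfile` is a hypothetical helper, and the box name `ubuntu/focal64`, the `192.168.56.x` addresses (chosen to fall in VirtualBox's default host-only range), and 512 MB per VM are assumptions.

```python
def make_vagrantfile(node_count, box="ubuntu/focal64", subnet="192.168.56",
                     first_host=11, memory_mb=512):
    """Return Vagrantfile text defining node_count VMs, each with its own private IP.

    Assumed defaults: box name, subnet, and memory are illustrative only.
    """
    lines = [
        'Vagrant.configure("2") do |config|',
        f'  config.vm.box = "{box}"',
    ]
    for i in range(1, node_count + 1):
        ip = f"{subnet}.{first_host + i - 1}"  # node1 -> .11, node2 -> .12, ...
        lines += [
            f'  config.vm.define "node{i}" do |node|',
            f'    node.vm.hostname = "node{i}"',
            f'    node.vm.network "private_network", ip: "{ip}"',
            '    node.vm.provider "virtualbox" do |vb|',
            f'      vb.memory = {memory_mb}',
            '    end',
            '  end',
        ]
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a 3-node Vagrantfile; redirect the output into a file named Vagrantfile.
    print(make_vagrantfile(3), end="")
```

Saved as, say, `make_vagrantfile.py` (a hypothetical name), it can be run as `python make_vagrantfile.py > Vagrantfile`; then `vagrant up` starts every node and `vagrant ssh node1` logs into the first one. Adding a node is just raising the count.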
Any common virtual machine software works, such as VirtualBox. Linux can run in as little as 128 MB of memory, though running a database on it will take more.
Heavyweight solution: virtual machines, using VirtualBox, VMware, or Vagrant to script them.
Lightweight solution: Docker.