What is Apache Spark?

藏色散人
2019-06-11 13:47:00

Apache Spark is an open-source cluster computing framework originally developed by the AMPLab at the University of California, Berkeley. Hadoop MapReduce writes intermediate data to disk after each step of a job. Spark instead uses in-memory computing: it keeps intermediate results in memory and performs its operations there before the data is written to the hard disk.
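
The difference can be sketched in plain Python. This is a deliberately simplified model, not Spark code: the hypothetical disk_pipeline writes the intermediate (word, 1) pairs of a word count to a temporary file and reads them back before reducing, roughly as MapReduce does between its map and reduce phases, while the hypothetical memory_pipeline hands the pairs straight to the next stage. Real Spark still writes shuffle data to local disk and spills when memory runs short.

```python
import json
import os
import tempfile
from collections import Counter

LINES = ["spark keeps data in memory", "mapreduce writes data to disk"]

def map_phase(lines):
    # Emit (word, 1) pairs, as a MapReduce mapper would.
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    # Sum the counts for each word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def disk_pipeline(lines):
    # MapReduce-style (simplified): materialise the intermediate pairs
    # to a file, then read them back before the reduce step.
    pairs = map_phase(lines)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(pairs, f)
        with open(path) as f:
            restored = [tuple(p) for p in json.load(f)]
    return reduce_phase(restored)

def memory_pipeline(lines):
    # Spark-style (simplified): pass the intermediate pairs to the
    # next stage directly in memory, with no round trip through disk.
    return reduce_phase(map_phase(lines))

print(memory_pipeline(LINES) == disk_pipeline(LINES))  # True
print(memory_pipeline(LINES)["data"])                  # 2
```

Both pipelines give the same answer; the only difference is whether the intermediate pairs make a round trip through the file system.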

Spark can run programs up to 100 times faster than Hadoop MapReduce when working in memory, and up to 10 times faster even when working on disk. Spark also lets users load data into a cluster's memory and query it repeatedly, which makes it well suited to machine learning algorithms that pass over the same data many times.
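
Why repeated queries matter for machine learning can be illustrated with a small pure-Python sketch, assuming a hypothetical CountingSource class that stands in for slow storage such as HDFS and counts every full read. The iterative fit_mean function (also hypothetical) runs gradient descent and touches the whole dataset once per iteration. Without caching, storage is read on every pass; with the data held in memory, it is read once. In PySpark the corresponding step would be calling cache() on an RDD or DataFrame; the code below does not use Spark.

```python
class CountingSource:
    """Hypothetical stand-in for slow storage; counts each full read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)

def fit_mean(load, iterations=50, lr=0.1):
    # Gradient descent on mean squared error; the optimum is the mean.
    # Like many ML algorithms, it makes one full pass over the data
    # per iteration, calling load() each time.
    w = 0.0
    for _ in range(iterations):
        data = load()
        grad = sum(w - x for x in data) / len(data)
        w -= lr * grad
    return w

values = [1.0, 2.0, 3.0, 4.0, 5.0]

uncached = CountingSource(values)
w1 = fit_mean(uncached.load)         # re-reads storage on every iteration

cached = CountingSource(values)
in_memory = cached.load()            # load once and keep it in memory
w2 = fit_mean(lambda: in_memory)     # every iteration reuses the same data

print(uncached.reads, cached.reads)  # 50 1
print(round(w1, 2), w1 == w2)        # 2.98 True
```

The result is identical either way; only the number of trips to storage changes, which is the cost in-memory reuse avoids.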

Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports standalone mode (a native Spark cluster), Hadoop YARN, and Apache Mesos.
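
The cluster manager is chosen through the master URL passed to spark-submit with --master, or to SparkSession.builder.master(...) in PySpark. The helper below is a hypothetical sketch that builds those URLs, assuming the documented forms spark://HOST:PORT (standalone, default port 7077), mesos://HOST:PORT (default port 5050), yarn (the cluster address comes from the Hadoop configuration), and local[N] or local[*] for running on one machine.

```python
def master_url(manager, host=None, port=None, cores=None):
    """Build a Spark master URL for a cluster manager (hypothetical helper)."""
    if manager == "local":
        # local[N] runs N worker threads; local[*] uses one per logical core.
        return "local[*]" if cores is None else f"local[{cores}]"
    if manager == "yarn":
        # YARN's location is read from HADOOP_CONF_DIR / YARN_CONF_DIR,
        # so the URL itself carries no host.
        return "yarn"
    if manager in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"{manager} needs a host")
        if manager == "standalone":
            return f"spark://{host}:{port or 7077}"
        return f"mesos://{host}:{port or 5050}"
    raise ValueError(f"unknown cluster manager: {manager!r}")

print(master_url("standalone", "master-host"))  # spark://master-host:7077
print(master_url("yarn"))                       # yarn
print(master_url("local", cores=4))             # local[4]
```

A URL produced this way would then be used as, for example, spark-submit --master spark://master-host:7077 app.py, where master-host and app.py are placeholders.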

For distributed storage, Spark can interface with HDFS, Cassandra, OpenStack Swift, and Amazon S3, among others. Spark also supports a pseudo-distributed local mode, usually used only for development or testing, in which the local file system takes the place of distributed storage. In that case Spark runs on a single machine with one executor per CPU core.
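
In local mode Spark runs its tasks as threads inside a single JVM, and the master URL local[*] requests one worker thread per logical core. The Python sketch below mimics that layout with a thread pool; run_local and split_into_partitions are hypothetical names, not Spark APIs. Because of CPython's global interpreter lock, these threads do not actually speed up pure-Python work, so the sketch shows how data is partitioned and scheduled per core rather than how fast it runs.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(data, num_partitions):
    # Round-robin split of the records into num_partitions lists.
    return [data[i::num_partitions] for i in range(num_partitions)]

def run_local(data, task, cores=None):
    # Mimic local[*]: one worker thread per logical CPU core on a single
    # machine, each running `task` on one partition of the data.
    workers = cores or os.cpu_count() or 1
    partitions = split_into_partitions(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(task, partitions))
    # Combine the per-partition results, like a final reduce step.
    return sum(partial_results), workers

total, workers = run_local(list(range(1, 101)), sum)
print(total)  # 5050
```

Passing cores=N corresponds to local[N]; leaving it unset corresponds to local[*].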

In 2014, Spark had more than 465 contributors, making it the most active project in the Apache Software Foundation and among open-source big data projects.