Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without understanding the underlying details of distribution. Make full use of the power of clusters for high-speed computing and storage.
Although Hadoop is written in java, Hadoop provides Hadoop stream. Hadoop stream provides an API that allows users to write map functions and reduce using any language. function. (Recommended learning: PHP video tutorial)
The key to Hadoop flow is that it uses UNIX standard flow as the interface between the program and Hadoop. Therefore, as long as any program can read data from the standard input stream and write data to the standard output stream, the map function and reduce function of the MapReduce program can be written in any language through the Hadoop stream.
For example:
bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper /usr/local/hadoop/mapper.php -reducer /usr/local/hadoop/reducer.php -input test/* -output out4
The package introduced by Hadoop streaming: hadoop-streaming-0.20.203.0.jar. There is no hadoop-streaming.jar in the Hadoop root directory because streaming is a contrib. So you have to find it under contrib. Taking hadoop-0.20.2 as an example, it is here:
-input: Specify the path to the input hdfs file
-output: Specify the path to the output hdfs file
-mapper: Specify the map function
-reducer: Specify the reduce function
mapper function
mapper.php file, write Enter the following code:
#!/usr/local/php/bin/php <?php $word2count = array(); // input comes from STDIN (standard input) // You can this code :$stdin = fopen(“php://stdin”, “r”); while (($line = fgets(STDIN)) !== false) { // remove leading and trailing whitespace and lowercase $line = strtolower(trim($line)); // split the line into words while removing any empty string $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY); // increase counters foreach ($words as $word) { $word2count[$word] += 1; } } // write the results to STDOUT (standard output) // what we output here will be the input for the // Reduce step, i.e. the input for reducer.py foreach ($word2count as $word => $count) { // tab-delimited echo $word, chr(9), $count, PHP_EOL; } ?>
The above is the detailed content of Can php use hadoop?. For more information, please follow other related articles on the PHP Chinese website!