In recent projects, I need to use thrift and php to read and write related data in HBase, so I sorted out the relevant classes and did some tests.
The main ways I use to operate HBase are as follows:
1. HBase Shell, mainly execute the shell after configuration to view the data in HBase through commands, such as count 'xxxx', scan 'xxxx', etc.
2. Through Native Java Api, we encapsulate a RESTfull Api and operate HBase through the provided Api (http) method
3. Use Thrift's serialization technology. Thrift supports C++, PHP, Python and other languages, which is suitable for other heterogeneous systems to operate HBase. I have just tried this
4. Use HBasExplorer, a graphical client written before to operate HBase, http://www.cnblogs.com/scotoma/archive/2012/12/18/2824311.html
5. Hive/Pig, this has not been really used yet.
Currently we are mainly talking about the third method Thrift, which is open sourced by Facebook. The official website is http://thrift.apache.org/ .
Download, install and start, please see the content in the reference article
Check whether the run is successful...
Use PHP class files to operate Hbase. How to generate class files, please see the production method in the reference article. However, the generation method I tested myself has bugs. The namespace in the generated class files is empty, but from the official source code The library generates namespace Hbase, so you need to pay attention here.
I debugged a driver class file and put it on github. You can download it if you need it.
https://github.com/xinqiyang/buddy/tree/master/Vender/thrift
Next, perform the test operation. Refer to the test class here at http://blog.csdn.net/hguisu/article/details/7298456, write a test, and debug it
<?php /*** Thrift Test Class by xinqiyang */ ini_set('display_error', E_ALL); $GLOBALS['THRIFT_ROOT'] = './lib'; /* Dependencies. In the proper order. */ require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Transport/TTransport.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Transport/TSocket.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Protocol/TProtocol.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Protocol/TBinaryProtocol.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Transport/TBufferedTransport.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Type/TMessageType.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Factory/TStringFuncFactory.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/StringFunc/TStringFunc.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/StringFunc/Core.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Type/TType.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Exception/TException.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Exception/TTransportException.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Thrift/Exception/TProtocolException.php'; /* Remember these two files? */ require_once $GLOBALS['THRIFT_ROOT'].'/Types.php'; require_once $GLOBALS['THRIFT_ROOT'].'/Hbase.php'; use Thrift\Protocol\TBinaryProtocol; use Thrift\Transport\TSocket; use Thrift\Transport\TSocketPool; use Thrift\Transport\TFramedTransport; use Thrift\Transport\TBufferedTransport; use Hbase\HbaseClient; //define host and port $host = '192.168.56.56'; $port = 9090; $socket = new Thrift\Transport\TSocket($host, $port); $transport = new TBufferedTransport($socket); $protocol = new TBinaryProtocol($transport); // Create a calculator client $client = new HbaseClient($protocol); $transport->open(); //echo "Time: " . $client -> time(); $tables = $client->getTableNames(); sort($tables); foreach ($tables as $name) { echo $name."\r\n"; } //create a fc and then create a table $columns = array( new \Hbase\ColumnDescriptor(array( 'name' => 'id:', 'maxVersions' => 10 )), new \Hbase\ColumnDescriptor(array( 'name' => 'name:' )), new \Hbase\ColumnDescriptor(array( 'name' => 'score:' )), ); $tableName = "student"; /* try { $client->createTable($tableName, $columns); } catch (AlreadyExists $ae) { var_dump( "WARN: {$ae->message}\n" ); } */ // get table descriptors $descriptors = $client->getColumnDescriptors($tableName); asort($descriptors); foreach ($descriptors as $col) { var_dump( " column: {$col->name}, maxVer: {$col->maxVersions}\n" ); } //set clomn //add update column data $time = time(); var_dump($time); $row = '2'; $valid = "foobar-".$time; $mutations = array( new \Hbase\Mutation(array( 'column' => 'score', 'value' => $valid )), ); $mutations1 = array( new \Hbase\Mutation(array( 'column' => 'score:a', 'value' => $time, )), ); $attributes = array ( ); //add row, write a row $row1 = $time; $client->mutateRow($tableName, $row1, $mutations1, $attributes); echo "-------write row $row1 ---\r\n"; //update row $client->mutateRow($tableName, $row, $mutations, $attributes); //get column data $row_name = $time; $fam_col_name = 'score:a'; $arr = $client->get($tableName, $row_name, $fam_col_name, $attributes); // $arr = array foreach ($arr as $k => $v) { // $k = TCell echo " ------ get one : value = {$v->value} , <br> "; echo " ------ get one : timestamp = {$v->timestamp} <br>"; } echo "----------\r\n"; $arr = $client->getRow($tableName, $row_name, $attributes); // $client->getRow return a array foreach ($arr as $k => $TRowResult) { // $k = 0 ; non-use // $TRowResult = TRowResult var_dump($TRowResult); } echo "----------\r\n"; /****** //no test public function scannerOpenWithScan($tableName, \Hbase\TScan $scan, $attributes); public function scannerOpen($tableName, $startRow, $columns, $attributes); public function scannerOpenWithStop($tableName, $startRow, $stopRow, $columns, $attributes); public function scannerOpenWithPrefix($tableName, $startAndPrefix, $columns, $attributes); public function scannerOpenTs($tableName, $startRow, $columns, $timestamp, $attributes); public function scannerOpenWithStopTs($tableName, $startRow, $stopRow, $columns, $timestamp, $attributes); public function scannerGet($id); public function scannerGetList($id, $nbRows); public function scannerClose($id); */ echo "----scanner get ------\r\n"; $startRow = '1'; $columns = array ('column' => 'score', ); // $scan = $client->scannerOpen($tableName, $startRow, $columns, $attributes); //$startAndPrefix = '13686667'; //$scan = $client->scannerOpenWithPrefix($tableName,$startAndPrefix,$columns,$attributes); //$startRow = '1'; //$stopRow = '2'; //$scan = $client->scannerOpenWithStop($tableName, $startRow, $stopRow, $columns, $attributes); //$arr = $client->scannerGet($scan); $nbRows = 1000; $arr = $client->scannerGetList($scan, $nbRows); var_dump('count of result :'.count($arr)); foreach ($arr as $k => $TRowResult) { // code... //var_dump($TRowResult); } $client->scannerClose($scan); //close transport $transport->close();
Here we operate createTable, Insert Row, Get Table, Update Row, Scan Table, these commonly used ones. Let’s get familiar with them first.
During actual operation, you need to pay attention to:
The version of 1.php needs to support namespace, so it needs support of 5.3 or above
2. Install thrift's php extension. It seems that this is not actually used. You still have to use the relevant php file. Who can write an extension? I don't know if the performance can be improved.
3. For scan-related operations, I tested start/stop and prefix Scan, and it seems to be OK.
4. I feel that the namespace of php is very frustrating, what should I do... The segmentation feels so unauthentic...
Next, if I have time, I will do several other operations, conduct stress testing, and deploy this to the cluster.
Everyone is welcome to communicate with Thrift. Thanks to hguisu for writing this article (reference article) so that everyone can get started as soon as possible.
Update content:
20130517 After starting Thrift on the cluster, I found that the write operation was still unstable and had serious timeouts. For this operation, the PHP operation class needs to be optimized. In fact, I feel that the operation class is still too written. It’s complicated.
Reference article:
http://blog.csdn.net/hguisu/article/details/7298456