Use HBase in Go language to implement efficient NoSQL database applications-Golang-php.cn

With the advent of the big data era, the storage and processing of massive data is particularly important. In terms of NoSQL databases, HBase is currently a widely used solution. As a statically strongly typed programming language, Go language is increasingly used in fields such as cloud computing, website development, and data science due to its simple syntax and excellent performance. This article will introduce how to use HBase in Go language to implement efficient NoSQL database applications.

HBase Introduction

HBase is a highly scalable, highly reliable, column-based distributed data storage system. It runs on a Hadoop cluster and can handle extremely large-scale data storage and processing tasks. HBase's data model is similar to Google's Bigtable, a column-based NoSQL database. HBase has the following characteristics:

Based on the Hadoop distributed computing platform, it can store PB-level data on thousands of machines.
Supports fast reading and writing of data, and the storage and access speed is very fast.
Supports multiple methods of data access such as random reading, scan reading, and full table scanning.
Supports the storage and query of multi-version data and can effectively process time series data.
Supports horizontal expansion and can easily expand storage and processing capabilities.
Provides a series of filters and encoders to support data processing and transformation.

Go language operates HBase

Go language provides the Thrift library to implement operations on HBase. Thrift is a cross-language framework under Apache that can generate code in multiple languages, including Java, Python, Ruby, C, etc. Thrift allows developers to define RPC services using a simple definition language and generate client-side and server-side code. In the Go language, you can use the thriftgo library for development.

2.1 Install Thrift

Before using Thrift, you first need to install the Thrift compiler. You can download the corresponding version of the compiler from the Thrift official website, decompress it and add it to the environment variables.

2.2 Define the Thrift interface of HBase

The Thrift definition file is called IDL (Interface Definition Language, interface definition language). The Thrift interface file of HBase is Hbase.thrift. It can be downloaded from the official documentation or from github via the git clone command.

$ git clone https://github.com/apache/hbase

All Thrift interface definitions of HBase can be found in the Hbase.thrift file, and we can choose to use them as needed. For example, the following is an interface definition that lists tables:

struct TColumnDescriptor {

1: required binary name,
2: binary value,
3: bool __isset.value,
4: optional CompressionType compression,
5: optional int32 maxVersions,
6: optional int32 minVersions,
7: optional int32 ttl,
8: optional bool inMemory,
9: optional BloomType bloomFilterType,
10: optional int32 scope,
11: optional bool __isset.compression,
12: optional bool __isset.maxVersions,
13: optional bool __isset.minVersions,
14: optional bool __isset.ttl,
15: optional bool __isset.inMemory,
16: optional bool __isset.bloomFilterType,
17: optional bool __isset.scope

Copy after login

}

TColumnDescriptor can be thought of as the definition of a column family, which includes the column family name , compression type, maximum version, expiration time, memory storage and other attributes. In Go language, you need to use the Thrift compiler to compile the Hbase.thrift file into Go language code. The thriftgo library needs to be installed before compilation.

$ go get -u github.com/apache/thrift/lib/go/thrift

Then, execute the following command in the HBase directory to generate Go language code.

$ thrift --gen go src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

After executing the command, it will be in the generated gen-go directory See all generated Go language code files.

2.3 Connecting to the HBase server

Connecting to the HBase server requires creating a Transport link and using a connection pool to manage the link. The connection pool can maintain multiple Transport links, and reusing these links can improve overall throughput. The following is a code example for connecting to HBase:

package main

import (

"context"
"fmt"
"sync"

"git.apache.org/thrift.git/lib/go/thrift"
"hbase"

Copy after login

)

type pool struct {

hosts    []string         // HBase服务器地址列表
timeout  thrift.TDuration // 连接超时时间
size     int              // 连接池大小
pool     chan *conn       // 连接池
curConns int              // 当前连接池中的连接数

lock sync.RWMutex

Copy after login

}

type conn struct {

trans hbase.THBaseServiceClient // HBase客户端
used  bool                      // 是否被使用

Copy after login

}

// NewPool initializes the connection pool
func NewPool(hosts []string, timeout int, size int) *pool {

p := &pool{
    hosts:    hosts,
    timeout:  thrift.NewTDuration(timeout * int(thrift.MILLISECOND)),
    size:     size,
    pool:     make(chan *conn, size),
    curConns: 0,
}

p.lock.Lock()
defer p.lock.Unlock()

for i := 0; i < size; i++ {
    p.newConn()
}

return p

Copy after login

}

// AddConn Add connection
func (p *pool) AddConn() {

p.lock.Lock()
defer p.lock.Unlock()

if p.curConns < p.size {
    p.newConn()
}

Copy after login

}

// Close Close the connection pool
func (p *pool) Close() {

p.lock.Lock()
defer p.lock.Unlock()

for i := 0; i < p.curConns; i++ {
    c := <-p.pool
    _ = c.trans.Close()
}

Copy after login

}

// GetConn Get the connection
func (p pool) GetConn() ( conn, error) {

select {
case conn := <-p.pool:
    if conn.used {
        return nil, fmt.Errorf("Connection is already in use")
    }

    return conn, nil
default:
    if p.curConns >= p.size {
        return nil, fmt.Errorf("Connection pool is full")
    }

    p.lock.Lock()
    defer p.lock.Unlock()

    return p.newConn(), nil
}

Copy after login

}

// PutConn returns the connection
func (p pool) PutConn(conn conn) {

conn.used = false
p.pool <- conn

Copy after login

}

// newConn Create connection
func (p pool) newConn() conn {

socket := thrift.NewTSocketTimeout(p.hosts[0], p.timeout)
transport := thrift.NewTFramedTransport(socket)
protocol := thrift.NewTBinaryProtocolTransport(transport, true, true)
client := hbase.NewTHBaseServiceClientFactory(transport, protocol)

if err := transport.Open(); err != nil {
    return nil
}

p.curConns++

return &conn{
    trans: client,
    used:  false,
}

Copy after login

}

Use The above code example can create a connection pool to connect to HBase. After setting parameters such as hosts, timeout and size, you can use the NewPool method to create a connection pool. Connections in the connection pool can be obtained using the GetConn method and returned by the PutConn method.

2.4 Operation on data

After connecting to the HBase server, you can use the connection in the connection pool to operate on the data. Here are some examples of operations on data:

// Get a list of tables
func GetTableNames(c *conn) ([]string, error) {

names, err := c.trans.GetTableNames(context.Background())
if err != nil {
    return nil, err
}

return names, nil

Copy after login

}

// Get a row of data
func GetRow(c conn, tableName string, rowKey string) (hbase.TRowResult_, error) {

// 构造Get请求
get := hbase.NewTGet()
get.Row = []byte(rowKey)
get.TableName = []byte(tableName)

result, err := c.trans.Get(context.Background(), get)
if err != nil {
    return nil, err
}

if len(result.Row) == 0 {
    return nil, fmt.Errorf("Row %s in table %s not found", rowKey, tableName)
}

return result, nil

Copy after login

}

// Write a row of data
func PutRow(c *conn, tableName string, rowKey string, columns map[string]map[string][]byte,

         timestamp int64) error {
// 构造Put请求
put := hbase.NewTPut()
put.Row = []byte(rowKey)
put.TableName = []byte(tableName)

for cf, cols := range columns {
    family := hbase.NewTColumnValueMap()

    for col, val := range cols {
        family.Set(map[string][]byte{
            col: val,
        })
    }

    put.ColumnValues[[]byte(cf)] = family
}

put.Timestamp = timestamp

_, err := c.trans.Put(context.Background(), put)
if err != nil {
    return err
}

return nil

Copy after login

}

## The #GetTableNames method can get a list of tables, the GetRow method can get a row of data, and the PutRow method can write a row of data. It should be noted that the TPut request needs to be constructed in the PutRow method.

Summary

This article introduces how to use HBase in Go language to implement efficient NoSQL database applications. From defining the Thrift interface, connecting to the HBase server to operating data, it explains step by step how to use Go language to operate HBase. With the high performance of the Go language and the cross-language features of the Thrift framework, efficient NoSQL database applications can be built.

The above is the detailed content of Use HBase in Go language to implement efficient NoSQL database applications. For more information, please follow other related articles on the PHP Chinese website!