[Recommended collection] Soul torture! Zookeeper's 31-shot cannon

Zookeeper core knowledge summary

Start with the questions:

What is ZooKeeper?
What does ZooKeeper offer?
Zookeeper file system
How does Zookeeper ensure that the status of master and slave nodes is synchronized?
Four types of data node Znode
Zookeeper Watcher mechanism--data change notification
How is client registration Watcher implemented?
How is the server-side processing of Watcher implemented?
How does the client call back Watcher?
Are you familiar with the ACL permission control mechanism?
Do you know about Chroot features?
Are you familiar with session management?
What are the roles of the server?
Server working status under Zookeeper
How is the data synchronized?
How does zookeeper ensure the sequential consistency of transactions?
Why is there a Master node in a distributed cluster?
How does zk deal with node downtime?
The difference between zookeeper load balancing and nginx load balancing
What are the deployment modes of Zookeeper?
What are the minimum number of machines required for a cluster? What are the cluster rules? There are 3 servers in the cluster, and one of the nodes is down. Can Zookeeper still be used at this time?
Does the cluster support dynamic addition of machines?
Is Zookeeper's watch notification for nodes permanent? Why is it not permanent?
What are the java clients of Zookeeper?
What is chubby, and how do you think it compares to zookeeper?
Let’s talk about some commonly used commands of zookeeper.
What are the connections and differences between ZAB and Paxos algorithms?
Typical application scenarios of Zookeeper
What are the functions of Zookeeper?
Tell me about Zookeeper’s notification mechanism?
The relationship between Zookeeper and Dubbo?

How many answers can you give?

1. What is ZooKeeper?

ZooKeeper is an open source distributed coordination service that provides consistency services for distributed applications. On top of it, distributed applications can implement data publishing/subscription, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, distributed queues, and similar functions.

The goal of ZooKeeper is to encapsulate complex and error-prone key services and provide users with simple and easy-to-use interfaces and a system with efficient performance and stable functions.

Zookeeper guarantees the following distributed consistency features:

Sequential consistency
Atomicity
Single view
Reliability
Real-time (eventual consistency)

The client's read request can be processed by any machine in the cluster. If the read request has a listener registered on the node, the listener will also be processed by the connected zookeeper machine. For write requests, these requests will be sent to other zookeeper machines at the same time and only after consensus is reached, the request will return successfully. Therefore, as the number of zookeeper cluster machines increases, the throughput of read requests will increase but the throughput of write requests will decrease.

Ordering is a very important feature of zookeeper. All updates are globally ordered, and each update carries a unique timestamp called the zxid (Zookeeper Transaction Id). Read requests are only ordered relative to updates; that is, the result of a read request carries the latest zxid of the zookeeper server that handled it.

2. What does ZooKeeper provide?

File system
Notification mechanism

3. Zookeeper file system

Zookeeper provides a multi-level node namespace (nodes are called znodes). Unlike a file system, every one of these nodes can hold associated data, whereas in a file system only file nodes can store data and directory nodes cannot.

To ensure high throughput and low latency, Zookeeper keeps this tree-like directory structure in memory. Because of this, Zookeeper cannot be used to store large amounts of data; the upper limit of data stored on each node is 1 MB.

4. How does Zookeeper ensure the status synchronization of master and slave nodes?

The core of Zookeeper is the atomic broadcast mechanism, which ensures synchronization between servers. The protocol that implements this mechanism is called the Zab protocol. The Zab protocol has two modes, namely recovery mode and broadcast mode.

Recovery Mode

Zab enters recovery mode when the service starts or after the leader crashes. Recovery mode ends once a leader has been elected and a majority of servers have completed state synchronization with it. State synchronization ensures that the leader and the servers have the same system state.

Broadcast mode

Once the leader has synchronized the status with most followers, it can start broadcasting messages, that is, entering broadcast state. At this time, when a server joins the ZooKeeper service, it will start in recovery mode, discover the leader, and synchronize its status with the leader. When synchronization is completed, it also participates in message broadcasting. The ZooKeeper service remains in the Broadcast state until the leader crashes or the leader loses most of its follower support.

5. Talk about what data nodes zookeeper has

PERSISTENT - Persistent node

Unless manually deleted, the node always exists on Zookeeper.

EPHEMERAL - Temporary node

The life cycle of a temporary node is bound to the client session. Once the client session expires (a disconnection between the client and zookeeper does not necessarily mean that the session has expired), all temporary nodes created by that client are removed.

PERSISTENT_SEQUENTIAL - Persistent sequential node

The basic characteristics are the same as those of a persistent node, with the addition of a sequence attribute: an auto-incrementing integer maintained by the parent node is appended to the node name.

EPHEMERAL_SEQUENTIAL - Temporary sequential node

The basic characteristics are the same as those of a temporary node, with the addition of a sequence attribute: an auto-incrementing integer maintained by the parent node is appended to the node name.
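The sequence attribute can be illustrated with a toy model. This is only a sketch and not ZooKeeper's own code: the class name SequentialNames and its create method are hypothetical, and the one fact borrowed from ZooKeeper is that the appended counter is formatted as a 10-digit, zero-padded number. On a real server the counter belongs to the parent znode and may skip values.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of sequential znode naming (hypothetical class, not ZooKeeper code).
final class SequentialNames {
    // One counter per parent path, mirroring "maintained by the parent node".
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns the full path that a *_SEQUENTIAL create would produce in this model.
    String create(String parent, String prefix) {
        // merge() stores 1 on the first call and increments afterwards,
        // so subtracting 1 yields the sequence 0, 1, 2, ...
        int seq = counters.merge(parent, 1, Integer::sum) - 1;
        return parent + "/" + prefix + String.format(Locale.ROOT, "%010d", seq);
    }
}
```

Because the counter is kept per parent, two different parents each start their own numbering, which is what lets lock and queue recipes order their children by name.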

6. Talk about the Zookeeper Watcher mechanism

Zookeeper allows a client to register a Watcher on a Znode on the server. When certain specified events occur on the server, they trigger this Watcher, and the server sends an event notification to the registered client. This implements the distributed notification function; the client then makes business changes based on the notification state and event type carried by the Watcher notification.

Working mechanism:

(1) Client registers watcher

(2) Server processes watcher

(3) Client callback watcher

Watcher feature summary:

(1) One-time

Whether on the server or the client, once a Watcher is triggered, Zookeeper removes it from the corresponding storage. This design effectively reduces the pressure on the server. Otherwise, for nodes that are updated very frequently, the server would continuously send event notifications to the client, which would put great pressure on both the network and the server.

(2) Client serial execution

The process of client Watcher callback is a serial synchronization process.

(3) Lightweight

3.1. Watcher notification is very simple. It will only tell the client that an event has occurred, but will not explain the specific content of the event.

3.2. When the client registers a Watcher with the server, it does not pass the client's real Watcher object entity to the server. It is only marked with a boolean type attribute in the client request.

(4) Watcher event is sent asynchronously

The watcher notification event is sent asynchronously from the server to the client. This creates a problem: different clients communicate with the server through sockets, and because of network delay or other factors, clients receive the event at different times. Zookeeper itself provides an ordering guarantee: a client will not perceive a change to a znode it watches until it has received the watch event. Therefore, when using Zookeeper, we cannot expect to observe every change of a node. Zookeeper can only guarantee eventual consistency, not strong consistency.

(5) Calls that register a watcher: getData, exists, getChildren

(6) Calls that trigger a watcher: create, delete, setData

(7) When a client connects to a new server, the watch will be triggered by any session event. While the connection to a server is lost, watches cannot be received. When the client reconnects, all previously registered watches are re-registered if necessary. Usually this is completely transparent. There is only one special case in which a watch may be lost: for an exists watch on a znode that has not been created yet, if the znode is created while the client is disconnected and then deleted before the client reconnects, the watch event may be lost.

7. How the client registers Watcher implementation

(1) Call one of the three APIs getData()/getChildren()/exists() and pass in the Watcher object

(2) Mark the request and wrap the Watcher in a WatchRegistration

(3) Wrap it in a Packet object and send the request to the server

(4) After receiving the server response, register the Watcher in ZKWatchManager for management

(5) The request returns and the registration is completed.

8. How the server handles the Watcher implementation

(1) The server receives the Watcher and stores it

The server receives the client request and determines whether a Watcher needs to be registered. If so, the node path of the data node and the ServerCnxn (ServerCnxn represents a connection between the client and the server and implements the Watcher's process interface, so at this point it can be regarded as a Watcher object) are stored in the watchTable and watch2Paths of WatchManager.

(2) Watcher trigger

Take the server receiving the setData() transaction request to trigger the NodeDataChanged event as an example:

2.1 Encapsulate WatchedEvent

The notification state (SyncConnected), event type (NodeDataChanged), and node path are encapsulated into a WatchedEvent object

2.2 Query Watcher

Find the Watcher by node path in the watchTable

2.3 Not found: no client has registered a Watcher on this data node

2.4 Found: extract the corresponding Watcher and delete it from watchTable and watch2Paths (this shows that a Watcher is one-time on the server side and becomes invalid after being triggered once)

(3) Call the process method to trigger the Watcher

The process here is mainly to send Watcher event notification through the TCP connection corresponding to ServerCnxn.

9. How the client calls back the Watcher

The client's SendThread thread receives the event notification and hands it to the EventThread thread, which calls back the Watcher.

The client's Watcher mechanism is also one-time. Once triggered, the Watcher becomes invalid.
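The one-time behaviour described in the last three sections can be sketched with a small in-memory model. This is not ZooKeeper's WatchManager; the class OneShotWatchTable and its methods are hypothetical names, and a watcher is reduced to a plain callback that receives the event type and path as a string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Toy in-memory model of one-time watches (hypothetical, not ZooKeeper code).
final class OneShotWatchTable {
    // path -> callbacks currently registered on that path
    private final Map<String, Set<Consumer<String>>> watchTable = new HashMap<>();

    // Register a callback on a path, as a read call with a watch would.
    void register(String path, Consumer<String> watcher) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
    }

    // Fire an event: the watchers are removed first, then each is notified once.
    // Returns how many watchers were notified.
    int trigger(String path, String eventType) {
        Set<Consumer<String>> watchers = watchTable.remove(path);
        if (watchers == null) {
            return 0; // no client has registered a watch on this node
        }
        for (Consumer<String> watcher : watchers) {
            // Lightweight notification: only the event type and path, no data.
            watcher.accept(eventType + ":" + path);
        }
        return watchers.size();
    }
}
```

In this model a second trigger on the same path notifies nobody until register is called again, which matches the advice to re-register a watch inside the callback when continuous monitoring is needed.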

10. Are you familiar with the ACL permission control mechanism?

UGO (User/Group/Others)

This is what Linux/Unix file systems currently use, and it is also the most widely used permission control method. It is a coarse-grained file system permission control mode.

ACL (Access Control List) access control list

Includes three aspects:

Permission Mode (Scheme)

(1) IP: Permission control at IP address granularity

(2) Digest: The most commonly used mode. It uses a permission identifier of the form username:password to configure permissions, which makes it easy to apply permission control separately to different applications

(3) World: The most open permission control method. It is a special digest mode with only one permission identifier, "world:anyone"

(4) Super: Super user

Authorization Object

The authorization object is the user or designated entity to which permission is granted, such as an IP address or a machine.

Permission

(1) CREATE: Data node creation permission, allowing the authorized object to create child nodes under the Znode

(2) DELETE: Child node deletion permission, allowing the authorized object to delete child nodes of the data node

(3) READ: Data node read permission, allowing the authorized object to access the data node and read its data content, its list of child nodes, etc.

(4) WRITE: Data node update permission, allowing the authorized object to update the data node

(5) ADMIN: Data node management permission, allowing the authorized object to perform ACL-related setting operations on the data node
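The five permissions combine as bit flags, which a short sketch can make concrete. The class AclPerms below is a hypothetical stand-alone helper; its constant values follow the bit layout of ZooKeeper's ZooDefs.Perms, and the letter form (c, d, r, w, a) is assumed here to match what the command-line client shows for ACLs.

```java
// Stand-alone sketch of ACL permission bits (hypothetical helper class).
final class AclPerms {
    // Bit values assumed to mirror ZooDefs.Perms in the ZooKeeper client library.
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;

    // True when every bit in 'required' is present in 'granted'.
    static boolean allows(int granted, int required) {
        return (granted & required) == required;
    }

    // Renders a permission mask in the compact letter form, e.g. "rw".
    static String letters(int perms) {
        StringBuilder sb = new StringBuilder();
        if ((perms & CREATE) != 0) sb.append('c');
        if ((perms & DELETE) != 0) sb.append('d');
        if ((perms & READ) != 0) sb.append('r');
        if ((perms & WRITE) != 0) sb.append('w');
        if ((perms & ADMIN) != 0) sb.append('a');
        return sb.toString();
    }
}
```

For example, a mask of READ | WRITE lets an authorized object read and update a node but not create or delete its children.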

11. Do you understand the Chroot feature

Version 3.2.0 added the Chroot feature, which allows each client to set a namespace for itself. If a client has a Chroot set, every operation it performs on the server is restricted to its own namespace.

By setting a Chroot, a client can be mapped onto a subtree of the Zookeeper server. When multiple applications share one Zookeeper cluster, this is very helpful for isolating the applications from one another.
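A Chroot is normally given as a path suffix on the client's connection string, for example zk1:2181,zk2:2181/apps/orders. The sketch below only illustrates the path mapping; the class Chroot, its method names, and the host names are hypothetical, and the real client performs this translation internally.

```java
// Sketch of how a chroot maps client paths onto a server subtree (hypothetical helper).
final class Chroot {
    // Extracts the chroot suffix from a connection string, or "" when none is set.
    static String chrootOf(String connectString) {
        int slash = connectString.indexOf('/');
        if (slash < 0) {
            return "";
        }
        String chroot = connectString.substring(slash);
        return chroot.equals("/") ? "" : chroot;
    }

    // Translates a path as seen by the client into the path stored on the server.
    static String serverPath(String chroot, String clientPath) {
        if (chroot.isEmpty()) {
            return clientPath;
        }
        // The client's root "/" is the chroot node itself.
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }
}
```

With the connection string above, a client that reads /config is actually reading /apps/orders/config on the server, so two applications with different Chroots never see each other's nodes.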

12. Are you familiar with session management?

Bucketing strategy: similar sessions are placed in the same bucket for management, so that Zookeeper can isolate the sessions in different buckets and process the sessions in one bucket together.

Distribution principle: "Next timeout time point" (ExpirationTime) of each session

Calculation formula:

ExpirationTime_ = currentTime + sessionTimeout

ExpirationTime = (ExpirationTime_ / ExpirationInterval + 1) * ExpirationInterval

ExpirationInterval is the Zookeeper session timeout check interval; the default is tickTime.
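The bucketing formula is easy to check with a few numbers. The following is a minimal sketch, assuming all times are in milliseconds and that the division is integer division; the class and method names are hypothetical.

```java
// Sketch of the session bucketing formula (hypothetical helper, times in milliseconds).
final class SessionBuckets {
    // Rounds a session's next timeout up to the next multiple of expirationInterval.
    static long expirationTime(long currentTime, long sessionTimeout, long expirationInterval) {
        long rawExpiration = currentTime + sessionTimeout;
        // Integer division drops the remainder, and the +1 moves to the next boundary.
        return (rawExpiration / expirationInterval + 1) * expirationInterval;
    }
}
```

With an interval of 2000, a session with currentTime 10500 and timeout 6000 and another with currentTime 11200 and timeout 5000 both land in the bucket at 18000, so a single check at that time point covers both.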

13. What are the roles of the server

Leader

(1) The sole scheduler and processor of transaction requests, which guarantees the order of cluster transaction processing

(2) The scheduler of each service within the cluster

Follower

(1) Process the client's non-transaction requests and forward transaction requests to the Leader server

(2) Participate in transaction request Proposal voting

(3) Participate in Leader election voting

Observer

(1) A server role introduced after version 3.0 to improve the non-transaction processing capability of the cluster without affecting its transaction processing capability

(2) Process the client's non-transaction requests and forward the transaction requests to the Leader server

(3) Do not participate in any form of voting

14. Server working status under Zookeeper

The server has four states: LOOKING, FOLLOWING, LEADING, and OBSERVING.

(1) LOOKING: Looking for Leader status. When the server is in this state, it will think that there is no leader in the current cluster, so it needs to enter the leader election state.

(2) FOLLOWING: follower status. Indicates that the current server role is Follower.

(3) LEADING: Leader status. Indicates that the current server role is Leader.

(4) OBSERVING: observer status. Indicates that the current server role is Observer.

15. Can you tell me how the data is synchronized?

After the entire cluster completes the Leader election, each Learner (the collective name for Followers and Observers) registers with the Leader server. Once a Learner server has completed registration with the Leader server, it enters the data synchronization phase.

Data synchronization process: (all performed by messaging)

Learner registers with Leader

Data synchronization

Synchronization confirmation

Zookeeper's data synchronization is usually divided into four categories:

(1) Direct differential synchronization (DIFF synchronization)

(2) Roll back first and then differential synchronization (TRUNC+DIFF synchronization)

(3) Only rollback synchronization (TRUNC synchronization)

(4) Full synchronization (SNAP synchronization)

Before data synchronization, the Leader server completes the data synchronization initialization:

peerLastZxid:

The lastZxid (the last ZXID processed by the Learner server), extracted from the ACKEPOCH message sent when the Learner server registers

minCommittedLog:

The minimum ZXID in the Leader server's Proposal cache queue committedLog

maxCommittedLog:

The maximum ZXID in the Leader server's Proposal cache queue committedLog

Direct differential synchronization (DIFF synchronization)

Scenario: peerLastZxid is between minCommittedLog and maxCommittedLog

Roll back first and then differential synchronization (TRUNC+DIFF synchronization)

Scenario: When the new Leader server discovers that a Learner server contains a transaction record that the Leader itself does not have, the Learner server must roll back to the ZXID that exists on the Leader server and is closest to peerLastZxid

Rollback-only synchronization (TRUNC synchronization)

Scenario: peerLastZxid is greater than maxCommittedLog

Full synchronization (SNAP synchronization)

Scenario 1: peerLastZxid is less than minCommittedLog
Scenario 2: There is no Proposal cache queue on the Leader server and peerLastZxid is not equal to lastProcessZxid
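The four scenarios can be condensed into one decision function. This is a simplified sketch of the rules as listed above, not the Leader's real code: the class SyncChooser is hypothetical, the Proposal cache queue is modelled as a sorted set of committed ZXIDs, and returning DIFF (with nothing to send) when the queue is empty but the Learner is already up to date is an assumption of this sketch.

```java
import java.util.NavigableSet;

// Simplified sketch of choosing a synchronization type (hypothetical, not ZooKeeper code).
final class SyncChooser {
    enum SyncType { DIFF, TRUNC_DIFF, TRUNC, SNAP }

    static SyncType choose(long peerLastZxid, long lastProcessedZxid, NavigableSet<Long> committedLog) {
        if (committedLog.isEmpty()) {
            // Assumption: an already up-to-date Learner needs no snapshot.
            return peerLastZxid == lastProcessedZxid ? SyncType.DIFF : SyncType.SNAP;
        }
        long minCommittedLog = committedLog.first();
        long maxCommittedLog = committedLog.last();
        if (peerLastZxid > maxCommittedLog) {
            return SyncType.TRUNC; // the Learner is ahead and must roll back
        }
        if (peerLastZxid < minCommittedLog) {
            return SyncType.SNAP; // too far behind for the cached proposals
        }
        // In range: if the Leader does not know this ZXID, roll back first, then send the diff.
        return committedLog.contains(peerLastZxid) ? SyncType.DIFF : SyncType.TRUNC_DIFF;
    }
}
```

The TRUNC_DIFF branch is the case where a former Leader logged a proposal locally and crashed before it was committed, so the new Leader's queue has no matching entry.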

16. How does zookeeper ensure the sequential consistency of transactions?

zookeeper uses a globally incrementing transaction ID to identify transactions. Every proposal is tagged with a zxid when it is proposed. The zxid is a 64-bit number: the high 32 bits are the epoch, which identifies the leader cycle and is incremented whenever a new leader is elected, and the low 32 bits are an incrementing counter. When a new proposal is generated, the leader first sends a transaction execution request to the other servers, following a process similar to a database two-phase commit. If more than half of the machines can execute it successfully, the transaction is committed.
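The layout of the zxid can be shown with a few bit operations. This is a minimal sketch with hypothetical names (Zxid, make, epoch, counter); it only demonstrates the 32/32 split described above.

```java
// Sketch of packing and unpacking a 64-bit zxid (hypothetical helper class).
final class Zxid {
    // High 32 bits: leader epoch. Low 32 bits: per-epoch counter.
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epoch(long zxid) {
        return zxid >> 32;
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```

Because the epoch occupies the high bits, any zxid from a newer leader compares greater than every zxid from an older leader, regardless of the counter value. This is what keeps the ordering global across leader changes.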

17. Why is there a Master node in a distributed cluster?

In a distributed environment, some business logic only needs to be executed by one machine in the cluster, and the other machines can share the result. This greatly reduces repeated computation and improves performance, which is why leader election is required.

18. How does zk deal with node downtime?

Zookeeper itself is also a cluster, and it is recommended to configure no fewer than 3 servers. Zookeeper also has to ensure that when one node goes down, the other nodes continue to provide service.

If a Follower goes down, there are still 2 servers providing access. Because the data on Zookeeper has multiple copies, no data is lost.

If the Leader goes down, Zookeeper elects a new Leader.

The mechanism of a ZK cluster is that as long as more than half of the nodes are healthy, the cluster can provide service normally. The cluster fails only when so many ZK nodes are down that half or fewer of the nodes can still work.

A cluster of 3 nodes can lose 1 node (the leader can still get 2 votes > 1.5)

A cluster of 2 nodes cannot lose any node (the leader can only get 1 vote <= 1)
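The majority rule behind these two examples fits in two small functions. This is a sketch with hypothetical names, assuming every node is a voting member (no Observers).

```java
// Sketch of the "more than half" rule (hypothetical helper, voting members only).
final class Quorum {
    // True when strictly more than half of the ensemble is alive.
    static boolean hasQuorum(int ensembleSize, int aliveNodes) {
        return aliveNodes * 2 > ensembleSize;
    }

    // Largest number of nodes that can fail while a majority remains.
    static int tolerableFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }
}
```

By this rule a 4-node cluster tolerates only 1 failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual recommendation.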

19. The difference between zookeeper load balancing and nginx load balancing

zk's load balancing is fully controllable, while nginx can only adjust weights; anything else you want to control, you have to write yourself. However, the throughput of nginx is much greater than that of zk, so which one to use should be chosen according to the business.

20. What are the deployment modes of Zookeeper?

Zookeeper has three deployment modes:

Single-machine deployment: runs on one machine;
Cluster deployment: runs on multiple machines;
Pseudo-cluster deployment: multiple Zookeeper instances are started and run on one machine.

21. What are the minimum number of machines required for a cluster? What are the cluster rules? There are 3 servers in the cluster, and one of the nodes is down. Can Zookeeper still be used at this time?

The cluster rule is 2N + 1 servers with N > 0, that is, at least 3 servers. With an odd number of servers, the cluster remains usable as long as more than half of the servers are still alive, so a 3-server cluster with one node down can still be used.

22. Does the cluster support dynamic addition of machines?

This is really horizontal scaling, which Zookeeper does not handle very well. There are two methods:

Restart all: shut down all Zookeeper services, modify the configuration, and then start them again. This does not affect previous client sessions.
Restart one by one: as long as more than half of the machines stay alive and available, restarting a single machine does not affect the service the cluster provides. This is the more commonly used method.

Version 3.5 and later support dynamic expansion.

23. Are Zookeeper's watch notifications for nodes permanent?

No. Official statement: A Watch event is a one-time trigger. When the data on which the Watch is set changes, the server sends the change to the client that set the Watch, in order to notify it.

Why is it not permanent? If the server changes frequently and there are many watching clients, every change would have to be sent to all of them, which puts a lot of pressure on the network and the server.

Generally, the client executes getData("/node A", true). If node A is changed or deleted, the client receives its watch event. If node A then changes again and the client has not set a new watch, no further event is sent to the client.

In practice, the client often does not need to know about every change on the server; it only needs the latest data.

24. What are the java clients of Zookeeper?

java client: zk’s own zkclient and Apache’s open source Curator.

25. What is chubby, and how do you think it compares to zookeeper?

Chubby is from Google, fully implements the Paxos algorithm, and is not open source. Zookeeper is an open source implementation of Chubby and uses the ZAB protocol, a variant of the Paxos algorithm.

26. Let's talk about some commonly used commands of zookeeper.

Commonly used commands: ls, get, set, create, delete, etc.

27. What are the connections and differences between ZAB and Paxos algorithms?

Same points:

(1) Both have a role similar to the Leader process, which is responsible for coordinating the running of multiple Follower processes

(2) The Leader process will wait for more than half of the Followers to give correct feedback before submitting a proposal

(3) In the ZAB protocol, each Proposal contains an epoch value that represents the current Leader cycle; the corresponding name in Paxos is Ballot

Differences:

ZAB is used to build a highly available distributed primary-backup data system (Zookeeper), while Paxos is used to build a distributed consistent state machine system.

28. Typical application scenarios of Zookeeper

Zookeeper is a typical distributed data management and coordination framework based on the publish/subscribe model, which developers can use to publish and subscribe to distributed data.

By cross-using the rich data nodes in Zookeeper and cooperating with the Watcher event notification mechanism, it is very convenient to build a series of core functions that will be involved in distributed applications, such as:

(1) Data publishing/subscription

(2) Load balancing

(3) Naming service

(4) Distributed coordination/notification

(5) Cluster management

(6) Master election

(7) Distributed lock

(8) Distributed queue

29. What functions does Zookeeper have?

Cluster management: monitor node survival status, running requests, etc.;

Master node election: after the master node goes down, a new round of master election can be started among the backup nodes. Master election refers to this election process, and Zookeeper can assist in completing it;

Distributed lock: Zookeeper provides two types of locks: exclusive locks and shared locks. An exclusive lock means that only one thread can use the resource at a time. A shared lock means that read locks are shared, and read and write are mutually exclusive, that is, multiple threads can read the same resource at the same time. If a write lock is used, only one thread can use it. Zookeeper can control distributed locks.

Naming service: In a distributed system, by using the naming service, the client application can obtain the address, provider and other information of the resource or service based on the specified name.

30. Tell me about Zookeeper's notification mechanism?

The client creates a watcher event on a certain znode. When the znode changes, the clients watching it receive a notification from zk, and each client can then make business changes in response to the znode change.

31. What is the relationship between Zookeeper and Dubbo?

The role of Zookeeper

zookeeper is used to register services and perform load balancing. The caller must know which machine provides which service; simply put, this is the mapping between IP addresses and service names. This mapping could of course be hard-coded in the caller's business code, but then if a machine providing the service goes down, the caller has no way of knowing, and unless the code is changed it will keep sending requests to the dead machine. Zookeeper can detect the dead machine through its heartbeat mechanism and remove that machine's IP-to-service mapping from the list. As for supporting high concurrency, this simply means horizontal scaling: increasing computing power by adding machines without changing the code. New machines register their services with ZooKeeper, and the more service providers there are, the more customers can be served.

dubbo

Dubbo is a tool for managing the middle layer. Between the business layer and the data warehouse there are many service consumers and service providers that need to be scheduled, and dubbo provides a framework to solve this problem.

Note that dubbo here is only a framework. What you put on the frame is entirely up to you, just as with the chassis of a car you still have to fit the wheels and the engine. To complete scheduling in this framework, there must be a distributed registry that stores the metadata of all services. You can use zk or something else, but everyone uses zk.

The relationship between Zookeeper and Dubbo:

Dubbo abstracts the registry and can plug in different storage media to back it, including ZooKeeper, Memcached, Redis, etc.

Introducing ZooKeeper as the storage medium also brings in ZooKeeper's features. The first is load balancing: the capacity of a single registry is limited, and once traffic reaches a certain level it has to be diverted. Load balancing exists to divert traffic, and a ZooKeeper group together with the corresponding web application can achieve it easily. The second is resource synchronization: load balancing alone is not enough, because data and resources must also be synchronized between nodes, and ZooKeeper clusters have this capability naturally. The third is the naming service: a tree structure is used to maintain the global list of service addresses. When a service provider starts, it writes its own URL under the /dubbo/${serviceName}/providers directory on the specified ZooKeeper node, which completes the publication of the service. Other features include Master election, distributed locks, etc.
