How to understand what distributed databases are
Distributed databases include: 1. Elasticsearch, whose shards can live on a single node or across multiple nodes; 2. Redis, which supports rich data types; 3. MongoDB, whose document model makes data more convenient to retrieve; 4. MySQL distributed cluster, which provides high availability.
Distributed databases include:
1. Elasticsearch database
1. Introduction to Elasticsearch
Elasticsearch is a distributed real-time document store in which every field is indexed and searchable, and it is also a distributed real-time analytics and search engine.
It can scale out to hundreds of servers and process petabytes of structured or unstructured data.
2. Elasticsearch application scenarios
As a distributed search and data analysis engine: full-text retrieval, structured retrieval, and data analysis.
Near-real-time processing of massive data: on-site search (e-commerce, recruitment, portals, etc.), IT system search (OA, CRM, ERP, etc.), and data analysis.
3. Advantages and disadvantages of Elasticsearch
Disadvantages: older versions ship without built-in user authentication and permission control; there is no concept of transactions and no rollback support; data deleted by mistake cannot be restored; a Java environment is required.
Advantages: Documents are split into different containers, called shards, which can live on a single node or on multiple nodes.
Each shard is replicated to provide a backup copy of the data, which prevents data loss when hardware fails.
A request sent to any node in the cluster is routed to the nodes that hold the data you need. When the cluster grows, new nodes are integrated without downtime, and shards are redistributed to recover the data of a lost node.
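The sharding and routing described above can be sketched in a few lines. This is only an illustration under stated assumptions: Elasticsearch documentation describes routing roughly as `hash(routing) % number_of_primary_shards`, but the snippet substitutes `zlib.crc32` for the real hash function, and the helper names (`pick_shard`, `assign_shards`, `route`), the node names, and the round-robin placement are all hypothetical.

```python
import zlib

def pick_shard(doc_id, num_primary_shards):
    """Map a document ID to a primary shard number (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def assign_shards(num_primary_shards, nodes):
    """Spread primary shards over the nodes round-robin (hypothetical placement)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_primary_shards)}

def route(doc_id, num_primary_shards, placement):
    """Return the shard for a document and the node that currently holds it."""
    shard = pick_shard(doc_id, num_primary_shards)
    return shard, placement[shard]

placement = assign_shards(5, ["node-a", "node-b", "node-c"])
shard, node = route("order-1001", 5, placement)
print(f"document order-1001 -> shard {shard} on {node}")
```

Because the shard is derived from the document ID alone, any node can compute where a document lives and forward the request there, which is the behavior the advantages list describes.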
4. Elasticsearch persistence solution
The gateway is the persistent storage mechanism for Elasticsearch indexes. By default, Elasticsearch keeps the index in memory first and persists it to the hard disk when memory is full. When the Elasticsearch cluster is shut down and restarted, index data is read back from the gateway. Elasticsearch supports several gateway types: the local file system (default), distributed file systems, Hadoop's HDFS, and Amazon's S3 cloud storage service.
Alongside this, Elasticsearch maintains a queue that automatically writes the index to the hard disk when the system is idle.
2. Redis database
1. Introduction to Redis
Redis is an open source, BSD-licensed advanced key-value storage system (NoSQL) that can store strings, hash structures, linked lists, and sets, so it is often used to provide data structure services. Redis supports data persistence: it can save in-memory data to disk and load it again on restart. Besides simple key-value data, it also stores data structures such as list, set, zset (sorted set), and hash. Redis supports data backup in the form of master-slave replication.
2. Redis application scenarios
A) Routine counters: number of followers, number of Weibo posts
B) User information changes
C) Caching, for example as a cache in front of MySQL
D) Queue systems: building a priority queue system or a log collection system
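As an illustration of scenario C, the sketch below shows the cache-aside pattern with key expiration. To stay self-contained it uses a small in-memory class, `TinyCache`, as a stand-in for Redis, and `load_user_from_mysql` is a hypothetical placeholder for a real database query. Neither name belongs to any Redis client library.

```python
import time

class TinyCache:
    """In-memory stand-in for a Redis-like cache with per-key expiration."""

    def __init__(self, clock=time.monotonic):
        self._data = {}        # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]    # lazily drop the expired key on access
            return None
        return value

db_calls = []

def load_user_from_mysql(user_id):
    """Hypothetical slow lookup; records each call so cache hits are visible."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(cache, user_id, ttl_seconds=60):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                        # cache hit
    user = load_user_from_mysql(user_id)     # cache miss: go to the database
    cache.set(key, user, ttl_seconds)
    return user

cache = TinyCache()
get_user(cache, 42)
get_user(cache, 42)
print("database lookups:", len(db_calls))
```

The second call is served from the cache, so the database function runs only once until the key expires.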
3. Advantages and disadvantages of Redis
Advantages:
(1) It is fast because the data is held in memory, much like a HashMap, whose lookups and updates take O(1) time.
(2) It supports rich data types: string, list, set, sorted set, and hash.
(3) It supports transactions: commands queued with MULTI are executed by EXEC as a single unit, with no commands from other clients interleaved. Note that Redis does not roll back commands that have already run if a later command in the transaction fails.
(4) Rich features: it can be used for caching and messaging, and a key can be given an expiration time after which it is deleted automatically.
Disadvantages:
(1) A plain master-slave Redis deployment has no automatic fault tolerance or recovery. When the master or a slave goes down, some front-end read and write requests fail until the machine restarts or the front-end IP is switched manually.
(2) If the master goes down before some data has been synchronized to the slave, switching the IP introduces data inconsistency, which reduces the availability of the system.
(3) Redis master-slave replication uses full replication. During replication the master forks a child process, which takes a snapshot of memory, saves it as a file, and sends it to the slave. The master must have enough free memory for this, and a large snapshot file noticeably reduces the cluster's capacity to serve requests. Replication runs whenever a slave newly joins the cluster or reconnects after a network disconnection, so network fluctuations trigger a full data copy between master and slave, which causes considerable trouble in production.
(4) Redis is hard to expand online. Once the cluster reaches its capacity limit, online expansion becomes very complicated. To avoid this, operations staff must provision enough space when the system goes live, which wastes a lot of resources.
4. Redis persistence solution
Redis provides two persistence methods. One is RDB persistence, which periodically dumps the Redis database records in memory to disk. The other is AOF (append only file) persistence, which appends Redis's operation log to a file.
RDB persistence writes a snapshot of the in-memory data set to disk at a specified time interval. In practice, Redis forks a child process that first writes the data set to a temporary file; once the write succeeds, the temporary file replaces the previous snapshot file, which is stored in a compressed binary format.
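The two approaches can be contrasted with a small sketch. It is not how Redis is implemented: it uses JSON instead of Redis's compressed binary format, it does not fork a child process, and the function and file names (`save_snapshot`, `append_command`, `dump.json`, `appendonly.log`) are made up for illustration. It only shows the two ideas from the text: write to a temporary file and then replace the old snapshot, versus append every write operation and replay the log later.

```python
import json
import os
import tempfile

def save_snapshot(data, path):
    """RDB-style: dump the whole data set to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(data, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, path)   # rename over the previous snapshot

def load_snapshot(path):
    with open(path) as f:
        return json.load(f)

def append_command(log_path, op, key, value=None):
    """AOF-style: append each write operation to the end of a log file."""
    with open(log_path, "a") as log:
        log.write(json.dumps([op, key, value]) + "\n")

def replay_log(log_path):
    """Rebuild the data set by re-running every logged operation in order."""
    data = {}
    with open(log_path) as log:
        for line in log:
            op, key, value = json.loads(line)
            if op == "set":
                data[key] = value
            elif op == "del":
                data.pop(key, None)
    return data

workdir = tempfile.mkdtemp()
snapshot_path = os.path.join(workdir, "dump.json")
log_path = os.path.join(workdir, "appendonly.log")

save_snapshot({"fans": 10}, snapshot_path)
append_command(log_path, "set", "fans", 10)
append_command(log_path, "set", "posts", 3)
append_command(log_path, "del", "fans")
print(load_snapshot(snapshot_path), replay_log(log_path))
```

Writing to a temporary file first means a crash in the middle of a dump leaves the previous snapshot untouched, while the append-only log trades a larger file for the ability to recover every logged write.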
3. MongoDB database
1. Introduction to MongoDB
MongoDB itself is a non-relational database. Each of its records is a Document, and each Document consists of a set of key-value pairs. Documents in MongoDB are similar to JSON objects. The values of fields in Document may include other Documents, arrays, etc.
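To make the document model concrete, here is a small sketch using a plain Python dictionary. The `user` document and the `get_path` helper are invented for illustration; the helper only mimics the idea of reaching into an embedded document with a dotted path and does not use a MongoDB driver.

```python
import json

# A hypothetical user document: key-value pairs whose values may be
# scalars, arrays, or other embedded documents.
user = {
    "_id": 1,
    "name": "Alice",
    "tags": ["admin", "dev"],
    "address": {"city": "Beijing", "zip": "100000"},
}

def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.city' through nested documents."""
    current = document
    for part in dotted_path.split("."):
        current = current[part]
    return current

print(json.dumps(user))
print(get_path(user, "address.city"))
```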
2. MongoDB application scenarios
MongoDB's main goal is to bridge key/value stores (high performance and high scalability) and traditional RDBMS systems (rich functionality), combining the strengths of both. MongoDB is suitable for the following scenarios:
a. Website data: MongoDB is well suited to real-time inserts, updates, and queries, and has the replication and high scalability required for real-time website data storage.
b. Caching: Thanks to its high performance, MongoDB is also suitable as a caching layer for information infrastructure. After the system restarts, the persistent cache built on MongoDB keeps the underlying data source from being overloaded.
c. Large-size, low-value data: Storing some data in a traditional relational database can be expensive. Before MongoDB, many programmers chose plain files for this kind of storage.
d. High scalability scenarios: MongoDB is well suited to databases made up of dozens or hundreds of servers.
e. Storage of objects and JSON data: MongoDB's BSON data format is well suited to storing and querying documents.
3. Advantages and disadvantages of MongoDB
Advantages:
(1) Weak consistency (eventual consistency), which can better ensure user access speed
(2) The storage method of document structure can obtain data more conveniently
(3) Built-in GridFS supports large-capacity storage
(4) In one use case with tens of millions of document objects (nearly 10 GB of data), queries on the indexed ID were no slower than MySQL, while queries on non-indexed fields were faster across the board.
Disadvantages:
(1) Does not support transactions (multi-document transactions were only added in MongoDB 4.0)
(2) Occupies too much space, causing disk waste
(3) Single-machine reliability is relatively poor
(4) When large amounts of data are inserted continuously, write performance fluctuates greatly
4. MongoDB's persistence solution/exception handling
When performing a write operation, MongoDB creates a journal entry containing the exact disk location and the changed bytes. If the server suddenly crashes, then on startup the journal replays any write operations that were not flushed to disk before the crash.
By default the data files are flushed to disk every 60s, so the journal only needs to hold the data written within the last 60s. The journal pre-allocates several empty files for this purpose, located in /data/db/journal and named _j.0, _j.1, etc.
After MongoDB has been running for a long time, you will see files such as _j.6217, _j.6218, and _j.6219 in the journal directory. These are the current journal files, and the numbers keep increasing as long as MongoDB keeps running. When MongoDB shuts down gracefully, these files are removed because the journal is no longer needed after a clean shutdown.
If the server crashes or is killed with kill -9, the journal files are replayed when MongoDB starts again, and lengthy, hard-to-read verification lines are printed, which indicate a normal recovery.
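The journaling behavior described in this section can be sketched as a toy model. This is not MongoDB's implementation: `JournaledStore` and its methods are hypothetical, the "disk" is a byte array, and the periodic flush is triggered by hand rather than by a 60s timer. It only shows the idea of recording (position, changed bytes) before the data file is flushed and replaying those records after a crash.

```python
class JournaledStore:
    """Toy store: the journal records (position, bytes) before data is flushed."""

    def __init__(self, size):
        self.disk = bytearray(size)      # stands in for the data file on disk
        self.memory = bytearray(size)    # in-memory view that receives writes
        self.journal = []                # durable list of (position, changed bytes)

    def write(self, position, payload):
        self.journal.append((position, payload))    # journal first
        self.memory[position:position + len(payload)] = payload

    def flush(self):
        """Periodic flush: data reaches disk, so the journal can be dropped."""
        self.disk[:] = self.memory
        self.journal.clear()

    def crash_and_recover(self):
        """Lose the in-memory view, then replay journal entries over the disk image."""
        self.memory = bytearray(self.disk)
        for position, payload in self.journal:
            self.memory[position:position + len(payload)] = payload
        self.flush()

store = JournaledStore(16)
store.write(0, b"abc")
store.flush()
store.write(3, b"def")       # not yet flushed when the crash happens
store.crash_and_recover()
print(bytes(store.disk[:6]))
```

The write of `b"def"` never reached the data file before the simulated crash, yet it is present after recovery because the journal entry was replayed.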
4. MySQL distributed cluster
1. Introduction to MySQL distributed cluster
MySQL Cluster is a shared-nothing storage solution based on a distributed node architecture, designed to provide fault tolerance and high performance.
Data updates use the read-committed isolation level to keep data consistent across all nodes, and a two-phase commit mechanism ensures that all nodes hold the same data (if any write operation fails, the whole update fails).
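The two-phase commit idea mentioned above can be sketched as follows. This is a simplified illustration, not MySQL Cluster's actual protocol: the `Node` class, its `healthy` flag, and the `two_phase_commit` function are hypothetical, and real implementations also deal with timeouts, logging, and coordinator failure.

```python
class Node:
    """Hypothetical data node that can vote on and apply an update."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        if not self.healthy:
            return False               # vote "no"
        self.pending = (key, value)    # stage the change without applying it
        return True

    def commit(self):
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self):
        self.pending = None

def two_phase_commit(nodes, key, value):
    # Phase 1: ask every node to prepare and collect the votes.
    votes = [node.prepare(key, value) for node in nodes]
    if all(votes):
        for node in nodes:             # Phase 2: everyone voted yes, so commit
            node.commit()
        return True
    for node in nodes:                 # Phase 2: at least one "no", so abort everywhere
        node.abort()
    return False

cluster = [Node("a"), Node("b"), Node("c")]
print(two_phase_commit(cluster, "stock", 5))
cluster[1].healthy = False
print(two_phase_commit(cluster, "stock", 4))
print([node.data for node in cluster])
```

In this run the first update is applied on all three nodes; the second is rejected because one node votes no, so every node keeps the first value and the copies stay identical.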
Shared-nothing peer nodes make update operations on one server immediately visible on other servers. Propagating updates uses a complex communication mechanism designed to provide high throughput across the network.
Load is distributed across multiple MySQL servers to maximize application performance, and storing data in different locations provides high availability and redundancy.
2. MySQL distributed cluster application scenarios
Solving mass storage problems, such as the MySQL distributed cluster used by Jingdong B2B.
Suitable for databases serving billions of page views (PV).
3. Advantages and disadvantages of MySQL distributed cluster
Advantages:
a) High availability
b) Fast automatic failover
c) Flexible distributed architecture, no single point of failure
d) High throughput and low latency
e) Strong scalability, supports online expansion
Disadvantages:
a) There are many limitations, such as: no support for foreign keys
b) Deployment, management, and configuration are complex
c) It takes up a lot of disk space and memory
d) Backup and recovery are inconvenient
e) On restart, data nodes take a long time to load data into memory
4. MySQL distributed cluster persistence solution
Load balancing.
Management node backup.