As the web develops, the number of elements that users see and interact with keeps increasing. These elements change the screen the user sees. Anything that changes the screen can be defined as 'state.'
For example, on an informational web page such as a landing page, the 'state' is just the one piece of information to be shown.
Next, in the case of GitHub, there is a variety of information: my profile, my repositories, the number of stars, and so on. Since the screen shown to the user varies depending on this information, all of it can be considered 'state'.
A more complex example is Figma. All the graphics on the screen, such as points, lines, and shapes, are 'state'. Moreover, collaboration features require sharing the state of people other than yourself.
State is all data. Information about the user, user-customized settings, and so on are all data stored somewhere, and this data in turn becomes the state of the screen the user sees. Usually, this data is stored on the server as a single source of truth. When you sign up for a website, your account is saved as a single row in the users table on that site's server.
The web is complicated these days. There are countless buttons and a lot of data displayed on one screen, and much of that information is time-sensitive. Whenever this state changes, the data must make a round trip to the server to ensure consistency. If you only need to receive the 'next page' about once a minute, as with a document, this is not a big problem. However, in cases like Notion, where users continuously modify data, it becomes a big problem. If I had to wait for a reload every time I changed something on the page, such as a setting, I would be upset.
Think of clicking the like button on a social media site like Instagram. When I click like, the client has to go to the server, save the fact that I liked the post, increase the post's like count by one, and then fetch the current like count and show it to me.
But on Instagram, when I click like, the count goes up with an animation almost instantly.
This is possible because the client's state is updated before the information even reaches the server. The idea is to update the client state on the assumption that the like will be recorded successfully on the server. In most cases, communication with the server succeeds, so we optimistically treat the request as a success. This technique is commonly called an optimistic update.
Of course, there are cases where the request sent to the server fails, so care must be taken to roll back the client state in case of failure.
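The optimistic update and rollback described above can be sketched in a few lines. This is only a minimal illustration, not how Instagram implements it: `PostState`, `send_like_to_server`, and the `should_fail` flag are hypothetical names invented here to simulate a request that either succeeds or fails.

```python
from dataclasses import dataclass


@dataclass
class PostState:
    """Client-side state for one post (hypothetical shape)."""
    liked: bool = False
    like_count: int = 0


def send_like_to_server(should_fail: bool) -> None:
    """Stand-in for the real network request; raises on failure."""
    if should_fail:
        raise ConnectionError("like request failed")


def like_optimistically(state: PostState, should_fail: bool = False) -> bool:
    """Update the client first, then roll back if the server call fails."""
    # Keep a copy of the state before the click so it can be restored.
    snapshot = PostState(state.liked, state.like_count)

    # 1. Optimistic update: the UI would re-render from this immediately.
    state.liked = True
    state.like_count += 1

    try:
        # 2. Only now talk to the server.
        send_like_to_server(should_fail)
        return True
    except ConnectionError:
        # 3. The request failed: roll back to the state before the click.
        state.liked = snapshot.liked
        state.like_count = snapshot.like_count
        return False
```

In a real client the request would be asynchronous, so the user keeps interacting while it is in flight. The snapshot-and-restore step is the part that matters here.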
It is very reasonable to optimistically show whether I clicked the like button or not. But while I am clicking, other people may be clicking too, so the real number of likes may have increased by more than one. How do I handle this?
This can be solved easily by relaxing data consistency a little. If the post is popular, the number of likes has almost certainly increased while I was viewing it, so the count on my screen was already slightly out of date. This is simply a policy decision of the software: for a quick response, some data consistency is sacrificed.
In the study of distributed systems, there is the CAP theorem. It states that a distributed system can guarantee only two of C, A, and P at the same time.
C stands for Consistency. No matter which node you read from, you must read the same data.
A is Availability, which means that every request receives a response even if some nodes die.
P is Partition tolerance, which means that the system keeps operating when the network connection between nodes is lost and can recover once the connection is restored.
According to this theorem, three kinds of systems are ultimately possible: CA, AP, and CP.
In theory, a distributed system can choose CA, but a system that stops operating when the network connection is lost is generally not called a distributed system.
In the end, if it is a distributed system, P must be guaranteed, and the real choice is between AP and CP.
Availability Over Consistency
When some nodes are disconnected from the network, each reachable node keeps responding with the value it has, even though not all nodes agree on the latest value. Therefore, the latest data may not match between disconnected nodes. However, users can continue to use the service as if they were receiving the latest data.
A representative example is social media. Although this is unlikely to happen in reality, let's assume that the network connection between Instagram's nodes in Europe and its nodes in Asia is lost. It is acceptable for the number of followers, likes, and so on seen by users in Asia and users in Europe to differ slightly during the disruption. The service itself keeps working.
Consistency Over Availability
This is a system that does not respond to user requests when, during a network failure, it cannot be sure that it has the latest data.
Examples usually involve money (transactions). Let's say a network partition occurs while there is only one hotel room left at a 50% discount. In an AP system, each disconnected node accepts a reservation on the assumption that the room is still available, so there is a possibility of overbooking. A CP system cannot be sure its data is up to date, so it postpones or rejects the request.
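The hotel-room scenario can be simulated with two replicas that each hold their own copy of the room count. This is a deliberately simplified sketch: the `Replica` class, its `mode` flag, and the `simulate` helper are hypothetical, and real CP systems usually let the side that still holds a majority keep serving requests rather than rejecting everything.

```python
from typing import List


class Replica:
    """One node holding its own copy of the remaining room count."""

    def __init__(self, rooms_left: int, mode: str):
        self.rooms_left = rooms_left
        self.mode = mode          # "AP" or "CP" (illustrative flag)
        self.partitioned = False  # True when cut off from the other node

    def book(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: cannot confirm the latest count, so refuse to answer.
            return "rejected"
        if self.rooms_left > 0:
            # AP (or a healthy network): answer from the local copy.
            self.rooms_left -= 1
            return "booked"
        return "sold out"


def simulate(mode: str) -> List[str]:
    """Two replicas, one room left, and a partition between them."""
    europe, asia = Replica(1, mode), Replica(1, mode)
    europe.partitioned = asia.partitioned = True
    # One customer on each side of the partition tries to book.
    return [europe.book(), asia.book()]
```

Here `simulate("AP")` returns two `"booked"` results for a single room, which is the overbooking case, while `simulate("CP")` returns two `"rejected"` results, trading availability for correctness.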
The CAP theorem is really a theorem about partitions. When a partition has occurred, you have to choose A or C.
But under normal circumstances, no partition occurs. The theory that applies in such situations is the PACELC theorem.
if (P) then (AC) else (LC)
In other words, when there is a partition, choose between A and C; otherwise, choose between L and C.
Latency & Consistency
Under normal circumstances, the system trades off Latency against Consistency. It sounds like a grandiose theory, but in fact this trade-off holds throughout computer engineering.
Trading off means settling on a certain degree of compromise between these two criteria.
Latency is intuitive, since it simply ranges from slow to fast, but it is harder to know intuitively what levels of consistency exist.
Strong consistency can be understood just from its name. No matter which node you access, you must see the same data. In other words, strong consistency holds only when all nodes have the same data.
A bank is a good example to think about.
Eventual consistency means that the data will become consistent someday. Not all clients will see the same value at the same time after a change, but they will all eventually see the same value once synchronization is completed.
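As a rough illustration of eventual consistency, the sketch below uses two replicas of a like counter that reconcile with a last-write-wins rule. The `RegisterReplica` class and its integer `version` (a logical timestamp) are assumptions made for this example. Last-write-wins is only one of several possible reconciliation strategies.

```python
from dataclasses import dataclass


@dataclass
class RegisterReplica:
    """One replica of a single value, tagged with the version of its last write."""
    value: int = 0
    version: int = 0  # logical timestamp of the last write (assumed)

    def write(self, value: int, version: int) -> None:
        """Accept a write locally without waiting for other replicas."""
        self.value, self.version = value, version

    def sync_with(self, other: "RegisterReplica") -> None:
        """Both replicas adopt whichever write has the higher version."""
        newest = self if self.version >= other.version else other
        self.value = other.value = newest.value
        self.version = other.version = newest.version


asia, europe = RegisterReplica(), RegisterReplica()
asia.write(101, version=1)   # a write lands on one replica only
stale = europe.value         # europe still holds the old value here
asia.sync_with(europe)       # after synchronization both replicas agree
```

Between the write and the sync, a client reading from `europe` sees a stale value. That window is the price paid for answering quickly from the local copy.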
Therefore, whether to preserve consistency at the cost of latency or to sacrifice consistency for a fast response is decided according to the characteristics of the software.