This article comes from the golang tutorial column and presents "Go rqlite author tells you: how important algorithms are when developing database software!" I hope it will be helpful to those who need it.
Writing database software is fascinating work. I have been heavily involved in open source database development for the past two years, and database programming is probably the most inspiring work you can do as a software developer.
What is truly striking, however, is how much my attitude toward databases has changed over the past 6 years. I went from having no interest in them to thinking that database systems are the pinnacle of software engineering.
For most of my career, my only experience with databases was reading about them, usually in a boring context. Open any undergraduate textbook on databases and you will see what I mean. You will usually find a table like the following presented as a typical use case for relational databases:
ID | FIRST | LAST | TITLE | DEPARTMENT |
---|---|---|---|---|
1 | Robert | Kelly | Director | Marketing |
2 | Tom | Burke | Representative | Sales |
3 | John | Smith | Vice President | Sales |
Could anything be more boring to read? If this was all there was to databases, I wanted nothing to do with them. What was the point? Software is much cooler than this, right? So I completely avoided anything to do with databases for a long time.
In 2009, after years of writing embedded software, Linux device drivers, and networking software, I found myself leading a team that needed to build a web-based system. The AWS cloud had arrived, and licensing technology based on MAC addresses no longer worked there. My team had to build a licensing portal for our new EC2-based software appliance. Since we had a lot of experience with Python, we chose Django, running on MySQL. Then something new happened: I actually started working with a database.
As our CRUD application rolled out into the field, I began to realize how important the database was and how central it was to our system. If we lost the database, all our software development would be in vain. If the database corrupted data, our customers' devices could become unlicensed and their networks would cease to function. If the database did not perform properly, thousands of people would be affected at once. But none of these things happened. The database always worked. It never let us down. I was impressed.
Later I discovered foreign key constraints, unique constraints, referential integrity, and indexes (remember, at that time I knew nothing about these things). The database could help me build a more robust system in many ways. I finally realized that modern databases are amazing. Databases are the most boring things in the world until you actually have to build a system with them.
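As a minimal sketch of what those constraints buy you, the following uses Python's built-in `sqlite3` module rather than the MySQL setup described above. The `department` and `employee` tables and the `try_insert` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def build_schema(conn):
    """Create two hypothetical tables linked by a foreign key."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE employee ("
        " id INTEGER PRIMARY KEY,"
        " last_name TEXT NOT NULL,"
        " department_id INTEGER NOT NULL REFERENCES department(id))"
    )


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
build_schema(conn)

first_dept = try_insert(conn, "INSERT INTO department (name) VALUES (?)", ("Sales",))
duplicate_dept = try_insert(conn, "INSERT INTO department (name) VALUES (?)", ("Sales",))
valid_employee = try_insert(
    conn, "INSERT INTO employee (last_name, department_id) VALUES (?, ?)", ("Burke", 1)
)
orphan_employee = try_insert(
    conn, "INSERT INTO employee (last_name, department_id) VALUES (?, ?)", ("Smith", 99)
)

print(first_dept, duplicate_dept, valid_employee, orphan_employee)
```

The second department insert is rejected by the unique constraint, and the employee pointing at a nonexistent department is rejected by the foreign key, so the application never has to detect either mistake itself.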
By 2012, I was leading a team that built a large key-value database based on a large indexing and search system, with Elasticsearch at its core. It was eye-opening to see what a system like Elasticsearch, a technology built on world-class indexing, can do even with terabytes of log data underneath.
By then I had also seen databases and search systems fail, but I was fascinated by database technology. In 2014, I joined a small dedicated team developing the core of an [open source time series database](https://github.com/influxdata/influxdb).
Only in database development does Big O analysis really come alive. Databases are one of the few applications where programmers still need to loop over, sort, and filter millions of objects. It is one of the few places where much of the boring material learned in CS classes actually matters.
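To make the Big O point concrete, here is a small sketch comparing an O(n) scan with an O(log n) binary search over one million sorted keys. The function names and the data set are made up for illustration and are not taken from any particular database.

```python
def linear_lookup(keys, target):
    """O(n): check keys one by one. Returns (position or -1, keys examined)."""
    for examined, key in enumerate(keys, start=1):
        if key == target:
            return examined - 1, examined
    return -1, len(keys)


def binary_lookup(sorted_keys, target):
    """O(log n): halve the search range each step. Requires sorted input."""
    lo, hi, examined = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return (lo if found else -1), examined


keys = list(range(0, 2_000_000, 2))  # one million sorted, even-numbered keys
target = keys[-1]                    # worst case for the linear scan

linear_pos, linear_steps = linear_lookup(keys, target)
binary_pos, binary_steps = binary_lookup(keys, target)
print(linear_steps, binary_steps)
```

Both functions return the same position, but the scan examines every key while the binary search examines at most about twenty. That gap is the kind of difference a sorted index makes.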
This is not the case in many other kinds of software development. Writing boot ROM firmware? No, algorithms were never important there. Tuner device drivers? No, they did not matter. Network device management software? CRUD applications? Hardly. All of these disciplines require different skills and knowledge. Most of the time, the only place I discussed runtime complexity was in interviews.
But with database development, all of this changed. It is a wonderful thing to see a system return the correct results in a fraction of the time because of an algorithm change, and to see it happen in your own code, in the system you built.
There is an old story in software that goes like this: a programmer wrote some code that ran ten times faster than the previous version. He showed it off, but someone pointed out that the data it produced was slightly different from the correct data. "But it's ten times faster," the programmer said. "Well, if it doesn't need to be correct, I can make a version that takes up no space at all and runs infinitely fast," replied the other.
This morality tale has always had a great impact on me. Being right is more important than anything else. That is true. But it also led me to believe that a program is valuable simply because it produces the right results.
For databases, this is not the case.
Performance is not just a feature. It is a requirement. Those who are willing to pay for databases often do so because they have large amounts of data. If the database does not perform well in that situation, if it does not return results quickly and efficiently, then it might as well not work at all.
I think the thing that shocks me most about developing databases is how complex query engines have become. I have a lot of experience building systems that write and store data to disk. Making these systems work well can be a significant challenge.
But this complexity is usually much less than that of the query engine. A flexible query system, in effect a system built to answer questions when you don't know in advance what the questions will be, requires serious design thinking. The query planner must be effective. Query systems must support many orthogonal requirements: filtering by certain dimensions, grouping by other dimensions, joining data from different tables, and sometimes pulling in data from external sources. Finally, the query system must be efficient and perform well. This creates a tension between abstraction and optimization in design and implementation, which takes real skill to manage well.
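One way to watch a query planner at work is SQLite's `EXPLAIN QUERY PLAN`, shown here through Python's `sqlite3` module. This is only a sketch: the `employee` table, the index name, and the `plan_for` helper are assumptions for illustration, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the planner's human-readable steps for running `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY, last_name TEXT, title TEXT, department TEXT)"
)

# Filter on one dimension, group by another.
query = "SELECT title, COUNT(*) FROM employee WHERE department = ? GROUP BY title"

plan_before = plan_for(conn, query, ("Sales",))
conn.execute("CREATE INDEX idx_employee_department ON employee (department)")
plan_after = plan_for(conn, query, ("Sales",))

print(plan_before)
print(plan_after)
```

The query text never changes, yet the planner switches from scanning the whole table to searching the new index. The caller states the question and the engine decides how to answer it, which is where the tension between abstraction and optimization shows up.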
Any serious database must support basic operations such as backup, recovery, shard management, and monitoring.
If I, as a serious operator, can't back up your database, I can't use it. It is as simple as that. It doesn't matter how quickly the database accepts writes. It doesn't matter how small its memory footprint is during a query. If I can't protect the data in the database from failures beyond the control of you, the database's creator, I will never be comfortable running it.
Of course, there are many ways to back up a database without the database's cooperation. But built-in methods are usually best. This is also my recommendation for rqlite v2.0. If I want anyone to use rqlite seriously, I have to solve the real-world problem that a system can fail completely and fall behind on data for a long time.
Therefore, when designing and implementing a database, build operational support from the beginning. Make it a fundamental part of your design. Your users will thank you for it.
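As an example of a built-in backup method, SQLite exposes an online backup API, which Python's `sqlite3` module wraps as `Connection.backup`. The sketch below is not how rqlite itself implements backup. The `licenses` table and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile


def backup_database(source, dest_path):
    """Copy a live database to dest_path using SQLite's online backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        source.backup(dest)  # Connection.backup is available in Python 3.7+
    finally:
        dest.close()


def count_rows(path):
    """Open the database file at `path` and count the rows in `licenses`."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM licenses").fetchone()[0]
    finally:
        conn.close()


live = sqlite3.connect(":memory:")  # stands in for the running database
live.execute("CREATE TABLE licenses (device TEXT PRIMARY KEY, valid INTEGER)")
with live:
    live.executemany(
        "INSERT INTO licenses VALUES (?, ?)", [("appliance-1", 1), ("appliance-2", 1)]
    )

with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "backup.db")
    backup_database(live, backup_path)
    restored_rows = count_rows(backup_path)

print(restored_rows)
```

Because the copy goes through the database engine, the operator gets a consistent snapshot without stopping the system or copying files out from under it.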
When you first start working with a database, especially as an operator, you often ask questions like these: At what rate can the system index? How quickly does it respond to queries? How much disk space do I need? How big can a shard be and still work? How can I speed it up? All asked without any qualification. I used to ask them myself.
Maybe you can talk to the database programmers and ask them these questions. And the answer you will often, perhaps always, get is: it depends. You have to benchmark; you have to measure. This can be irritating to hear, and it may seem as if they are avoiding responsibility.
Now, when I hear questions like these, I smile. They are so naive.
Indexing rate may depend on the size of the data, not just the number of documents or data points. It may depend on batching, the cardinality of the data, whether the database is clustered, which columns and fields in the data are indexed, whether it is new data or an update to existing data, the machine the database is running on, RAM, IO performance, and the replication used.
The variables that control performance never end.
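A rough sketch of what "you have to measure" can look like in practice: the hypothetical `measure_insert_rate` helper below times inserts into an in-memory SQLite table at different batch sizes. The table layout and row counts are arbitrary assumptions, and the rates it reports depend entirely on the machine it runs on.

```python
import sqlite3
import time


def measure_insert_rate(total_rows, batch_size):
    """Insert total_rows in batches; return (rows stored, rows per second)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (ts INTEGER, value REAL)")
    conn.execute("CREATE INDEX idx_points_ts ON points (ts)")
    rows = [(i, i * 0.5) for i in range(total_rows)]

    start = time.perf_counter()
    for offset in range(0, total_rows, batch_size):
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO points VALUES (?, ?)", rows[offset:offset + batch_size]
            )
    elapsed = time.perf_counter() - start

    stored = conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]
    conn.close()
    return stored, total_rows / elapsed


for batch_size in (1, 100, 10_000):
    stored, rate = measure_insert_rate(20_000, batch_size)
    print(f"batch={batch_size:>6}  stored={stored}  rows/sec={rate:,.0f}")
```

Changing only the batch size is usually enough to move the measured rate, and that is just one of the variables listed above.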
For queries, it may depend on the time range of the time series data. It depends on the number of records hit, the number of fields queried, whether a range scan is involved, whether the data is indexed, the type of index used, the number of shards that may be accessed, whether the data is local, and the characteristics of the machine. Is it in stock? Is it undergoing maintenance? Is the network busy?
It depends. Database designers are being honest. They can know everything about the system they built and still not know the answers to your questions.
Programming Bucket List
If there is one piece of advice for developers who want to improve their programming skills, it would be to join a database development team. My programming skills have improved tremendously because of database development - it's been a wonderful coding experience.
Original address: https://www.philipotoole.com/what-i-learned-from-programming-a-database/
Translation address: https://learnku.com/go/t/64605