01 Let’s first talk about the historical baggage of this system
Our advertising engine had been in development for about a year and a half before this refactoring. The first iteration focused on the search scenario, with a single line of business and a clear process. Two things then changed:
1. The business scenarios became more complex. In addition to search advertising, we also needed to support feed (information flow) recommendation and similar-item recommendation scenarios.
2. Advertising traffic began to grow rapidly. Beyond meeting functional requirements, we also had to take performance into account.
After sorting things out, we found that most of the engine's logic could be shared, so we defined a main framework and abstracted its extensible parts. Each scenario could then implement a set of common interfaces according to the particulars of its own business. In addition, for the sake of performance, we sacrificed some code readability and parallelized part of the logic.
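To make the idea of "a shared main framework with per-scenario extension points" concrete, here is a minimal sketch. The class names, the `recall`/`rank` split, and the sample inventory are all hypothetical; the article does not describe the engine's real interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class AdScenario(ABC):
    """Hypothetical extension points that each business scenario implements."""

    @abstractmethod
    def recall(self, request: Dict) -> List[Dict]:
        """Return the candidate ads for this request."""

    @abstractmethod
    def rank(self, candidates: List[Dict]) -> List[Dict]:
        """Order the candidates in a scenario-specific way."""


class SearchScenario(AdScenario):
    def recall(self, request):
        # Search recalls ads whose keywords match the query keyword.
        keyword = request["keyword"]
        return [ad for ad in request["inventory"] if keyword in ad["keywords"]]

    def rank(self, candidates):
        return sorted(candidates, key=lambda ad: ad["bid"], reverse=True)


class FeedScenario(AdScenario):
    def recall(self, request):
        # Feed has no query, so every ad in the inventory is a candidate.
        return list(request["inventory"])

    def rank(self, candidates):
        return sorted(candidates, key=lambda ad: ad["score"], reverse=True)


def run_engine(scenario: AdScenario, request: Dict, limit: int = 2) -> List[str]:
    """Shared main flow: recall, then rank, then truncate."""
    ranked = scenario.rank(scenario.recall(request))
    return [ad["id"] for ad in ranked[:limit]]


INVENTORY = [
    {"id": "a1", "keywords": ["shoes"], "bid": 1.0, "score": 0.2},
    {"id": "a2", "keywords": ["shoes", "boots"], "bid": 3.0, "score": 0.1},
    {"id": "a3", "keywords": ["hats"], "bid": 2.0, "score": 0.9},
]

search_ids = run_engine(SearchScenario(), {"keyword": "shoes", "inventory": INVENTORY})
feed_ids = run_engine(FeedScenario(), {"inventory": INVENTORY})
```

The main flow in `run_engine` never branches on the scenario type; anything scenario-specific lives behind the two abstract methods.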
As the business developed, the search scenario entered a period of rapid iteration, with more and more new strategies being added, and our main framework gradually became inflexible:
1. To stay compatible with search-specific logic, we had to add all kinds of if checks in the other scenarios to bypass that logic.
2. The number of advertising strategies kept growing, to dozens in total. Once the framework lost its clear structure, some strategies were implemented in an ad hoc way, without layering and without a pluggable abstract design.
The turning point came at the end of 2019. Because of the nature of the advertising business, traffic naturally began to decline. In addition, the product and operations teams were focused on planning for the following year, which gave us a very good window to start this refactoring.
We set the schedule at 1 month and ended up going live only one day later than planned. Two problems did appear in production, but both were caught and fixed during the grayscale rollout, and neither caused an online incident.
02 What preparations have we done before refactoring?
The amount of code refactored this time was very large, more than 30,000 lines, and it was the core engine of the advertising system. Before starting, we could anticipate the following difficulties:
1. Resistance on the business side: Advertising is extremely business-driven. Although this refactoring would improve long-term R&D efficiency, it would not directly increase business revenue, and the development cycle would not be short. How could we win the support of our business colleagues?
▍Let everyone see the pain points
As mentioned earlier, with each business iteration the main framework of our advertising engine had become blurred, and dozens of advertising strategies were scattered across different business scenarios with messy configurations.
To address these two pain points, we started sorting out the existing business one month in advance, reading the old code and going through earlier requirements documents. In the end, we organized the core processes and advertising strategies of the different scenarios into one clear table.
It was this table that allowed the engineering and product teams to see the full picture of our engine for the first time, and to understand both the complexity of the business and the current technical bottlenecks.
▍Clarify the goals and value of refactoring
Once everyone had felt the pain points, we set two core goals for this refactoring:
1. Refactor the main framework: modularize the main process and redefine the protocols between upper and lower layers to ensure clear interfaces; each layer also needs to be abstracted and easy to extend.
2. Make strategies flexible and configurable: advertising strategies are classified and abstracted by business intent, their execution conditions are dynamically configurable, and they can be plugged in and out at will.
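The second goal, pluggable strategies with configurable execution conditions, can be sketched as a registry plus a rule list. Everything below is an illustrative assumption: the strategy names, the `scenarios`/`enabled` config fields, and the in-memory `CONFIG` stand in for whatever the real engine and its configuration store use.

```python
from typing import Callable, Dict, List

Ads = List[Dict]

# Hypothetical registry: strategy name -> implementation.
STRATEGIES: Dict[str, Callable[[Ads], Ads]] = {}


def strategy(name: str):
    """Decorator that plugs a strategy into the registry under `name`."""
    def register(fn: Callable[[Ads], Ads]) -> Callable[[Ads], Ads]:
        STRATEGIES[name] = fn
        return fn
    return register


@strategy("filter_low_bid")
def filter_low_bid(ads: Ads) -> Ads:
    return [ad for ad in ads if ad["bid"] >= 1.0]


@strategy("boost_new_advertiser")
def boost_new_advertiser(ads: Ads) -> Ads:
    return [{**ad, "bid": ad["bid"] * 1.5} if ad.get("new") else ad for ad in ads]


# Hypothetical execution conditions; in a real system these would be
# loaded from a configuration service so they can change without a deploy.
CONFIG = [
    {"strategy": "filter_low_bid", "scenarios": ["search", "feed"], "enabled": True},
    {"strategy": "boost_new_advertiser", "scenarios": ["search"], "enabled": True},
]


def apply_strategies(ads: Ads, scenario: str, config: List[Dict] = CONFIG) -> Ads:
    """Run, in config order, every enabled strategy whose conditions match."""
    for rule in config:
        if rule["enabled"] and scenario in rule["scenarios"]:
            ads = STRATEGIES[rule["strategy"]](ads)
    return ads


ads = [{"id": "a", "bid": 0.5}, {"id": "b", "bid": 2.0, "new": True}]
search_result = apply_strategies(ads, "search")
feed_result = apply_strategies(ads, "feed")
```

Unplugging a strategy for one scenario is then a config change (remove the scenario from the rule or set `enabled` to false) rather than another if check in the main flow.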
In addition, we spelled out the benefits we expected from achieving these two core goals:
1. Technical benefits: the code structure becomes clearer and easier to understand and maintain; extensibility improves, and engine development efficiency rises further.
2. Business benefits: strategies can be configured and extended at a finer granularity, which supports the business better; improved R&D efficiency further speeds up business iteration.
▍Control of the overall rhythm
Controlling the overall rhythm is also very important, because it gives everyone a clear time expectation for the project.
First of all, we set the schedule at 1 month. On the one hand, this was the longest cycle the business side could accept, and technically we also wanted to get it done quickly. On the other hand, the Spring Festival was approaching: we had to go live before the company's pre-holiday release freeze, and we reserved a buffer of 1-2 weeks for unexpected situations.
03 What lessons from the implementation process are worth sharing?
1. High-quality technical design plan
This comes from our everyday practice: we write a technical design for any project with a development cycle of more than 3 days, and this refactoring was of course no exception.
The overall architecture of the framework, the protocol design between modules, and the extensibility design of the strategies were the focus of this technical design. The team discussed it no fewer than three times.
After the overall plan was finalized, the team further refined the shared parts such as the database, interface fields, cache structure, and log tracking points. Because several people were developing in parallel, the team agreed to use documents as the communication interface and to keep the documents in sync with the code at all times.
2. Refactor the framework code first
This PR was critical: it was the most important step in turning our technical design into code. We laid out the refactored package structure, the module boundaries, the API definitions between layers, and the abstractions for the different advertising strategies, leaving implementation details aside for the moment.
In this way, the main body of the code took shape and clearly depicted our ideal framework. We then held several focused code reviews and finally reached a consensus.
This step is a good way to avoid getting caught up in implementation details too early, which leads to insufficient attention to the main framework, unstable code, and later rework that drags down efficiency.
3. Frequent communication and paired code review mechanism
After entering the detailed implementation stage, one point was very important: understanding the existing logic. The engine code had been iterated on for a year and a half and had passed through many hands, but only three engineers took part in this refactoring.
Throughout the process, whenever we ran into code logic that was unclear, we communicated and verified repeatedly rather than guessing. This caution really matters.
In addition, for code review, we assigned each module to a colleague familiar with that part of the business. Reviewers worked in pairs, and the mechanism was kept flexible.
4. Effective test plan
Before refactoring, test first. This principle is emphasized in the book "Refactoring" and was also a focus of our technical design discussions. I am singling it out here to cover it in detail.
First of all, we agreed early on not to touch any of the old code and to build the refactored version in a completely new package. This makes it easy to compare results before and after refactoring and to run online grayscale experiments at the same time.
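Because the old and new packages coexist, the same captured request can be replayed through both and the responses compared field by field. The sketch below shows one way to do that; `old_engine` and `new_engine` are stand-in functions invented for the example, and the response shape is an assumption, not the real API.

```python
def diff_fields(old, new, path=""):
    """Return readable differences between two JSON-like values."""
    if isinstance(old, dict) and isinstance(new, dict):
        diffs = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else str(key)
            if key not in old:
                diffs.append(f"{child}: missing in old")
            elif key not in new:
                diffs.append(f"{child}: missing in new")
            else:
                diffs.extend(diff_fields(old[key], new[key], child))
        return diffs
    if isinstance(old, list) and isinstance(new, list):
        if len(old) != len(new):
            return [f"{path}: length {len(old)} != {len(new)}"]
        diffs = []
        for i, (o, n) in enumerate(zip(old, new)):
            diffs.extend(diff_fields(o, n, f"{path}[{i}]"))
        return diffs
    return [] if old == new else [f"{path}: {old!r} != {new!r}"]


def replay(cases, old_impl, new_impl):
    """Run each captured request through both implementations; keep mismatches."""
    report = {}
    for case_id, request in cases.items():
        diffs = diff_fields(old_impl(request), new_impl(request))
        if diffs:
            report[case_id] = diffs
    return report


# Stand-in engines for illustration only: the "new" one has a deliberate
# pricing difference in the feed scenario so the diff has something to find.
def old_engine(request):
    return {"scenario": request["scenario"], "ads": [{"id": "a1", "price": 10}]}


def new_engine(request):
    price = 10 if request["scenario"] == "search" else 12
    return {"scenario": request["scenario"], "ads": [{"id": "a1", "price": price}]}


cases = {"c1": {"scenario": "search"}, "c2": {"scenario": "feed"}}
report = replay(cases, old_engine, new_engine)
```

An empty report means the two implementations agreed on every replayed case; any entry points at the exact field that diverged.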
Regarding the test plan, the following 4 points are worth borrowing:
1. End-to-end testing: This refactoring involved no functional changes, so the behavior of the outer API should not change. That makes end-to-end testing the most effective method, and it was the primary means of testing for both R&D and QA.
2. Smoke testing: QA provides the smoke cases and developers run them. All smoke cases must pass before developers hand the build over to QA. This is not common practice at most Internet companies, but it is very effective for large projects.
3. Dual-run verification in a sandbox environment: As mentioned earlier, both the old and the refactored code are kept, so scripts can capture request parameters from the online environment as test cases, and an automated process can then compare the fields returned by the API one by one.
4. Grayscale experiment in the online environment: Grayscale rollout is very important for a refactoring. We used the existing ABTest platform to open up grayscale traffic gradually, from 5% to 10% to 30% and finally to 100%. We set a very cautious ramp-up pace and verified each step through logs and business metric monitoring.
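The 5% → 10% → 30% → 100% ramp in point 4 relies on traffic being split consistently, so that a user who entered the experiment at 5% stays in it at 10%. The article uses an existing ABTest platform whose internals are not described; the hash-based bucketing below is only an assumed, generic way to get that property, and the salt and function names are hypothetical.

```python
import hashlib

RAMP_STEPS = [5, 10, 30, 100]  # percentages taken from the text


def bucket(user_id: str, salt: str = "engine-refactor") -> int:
    """Map a user to a stable bucket in [0, 100) using a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_engine(user_id: str, percent: int) -> bool:
    """True if this user falls inside the current grayscale percentage."""
    return bucket(user_id) < percent


users = [f"user-{i}" for i in range(2000)]
share_by_step = {
    step: sum(use_new_engine(u, step) for u in users) / len(users)
    for step in RAMP_STEPS
}
```

Because each user's bucket never changes, raising the percentage only adds users to the new engine; rolling back to a lower step removes the most recently added ones first.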
Final words
Looking back over the whole refactoring, we can summarize it in the following 7 key points:
7. Verify carefully and be responsible for every line of code
Of course, the most critical factor is people. Refactoring a large project is an extreme test of a team's ability to collaborate. If everyone on the team is reliable, the refactoring is already half successful.