Home>Article>Operation and Maintenance> A classic algorithm question caused by a Linux command commonly used in projects
小时候家里定了《读者》的月刊,里面记录一个故事:说有有个偏僻的乡村一日突然来了一个美女,她携着万贯家财子女在当地安家落户,成了当地的乡绅。她让她的子女世世代代的保守这个秘密,直到这个秘密不会再对家族带来灾难。她就是陈圆圆。当年吴三桂领清兵入关,冲冠一怒为红颜,改写了中国的历史,自己却能全身而退的那个人。
周五例行公事的查看一下离线数据推送项目的数据和log。将log用awk分段之后,我想知道实时数据前10个被重复发送的数据ID都被重复发送了几次,从而找到进一步优化的入手点,天知道我对这个项目已经进行了多少次优化了。于是linux命令就是
cat transmission.log |grep 'IncrementAlbumService.java:146'|awk '{print $6}'|awk -F ',' '{print $1}'| sort |uniq -c| sort -nr |head
I feel very guilty for the results I got
(Data security, the ID rule part of our project is not displayed)
Although this is related to their operation, I originally It is time to send the data when it detects the change, but for such a large resend rate. Regardless of the interface of the update service or the offline service, there are still points that can be optimized. Girl, my thinking has always been different from that of those male idols who use hair dryers and artificial watering cans to create a picture when they appear. In addition to this result, I am also thinking about another classic algorithm problem: there is a text file with about 10,000 lines, one word in each line, and it is required to count the top ten most frequently occurring words.
For this algorithm problem, the above Linux command is sort|uniq -c |sort -nr | head. The time complexity is the largest of the following:
1> First do a sort,
Direct insertion sort: continuously insert elements into the ordered list, the worst time is complex The degree is O(n2)
Shell sorting: insertion sorting with reduced increment, unstable, depends on the selection of the increment factor sequence, the worst time complexity is O(n2)
Simple selection sorting: Select the smallest or largest number among the numbers to be sorted and exchange it with the first unsorted position. The worst time complexity is O(n2)
Binary selection sort: Each simple selection sort determines two elements, which can reduce the cycle by half.
Heap sort: tree selection sort, large root heap, small root heap. The worst time complexity is O(N*logN)
Bubble sorting: Each time two adjacent numbers are compared and exchanged, the worst time complexity is O(n2)
Quick sorting: select the base element and divide the elements to be sorted each time. The worst time complexity is O(n2)
Merge sorting: divide the two elements An ordered list is synthesized into a new ordered list. The worst-case time complexity is O(N*logN)
. Bucket sorting: an algorithm that trades space for time. The complexity is close to O(n)
Radix sorting: Allocate and collect according to hundreds of thousands of digits, the time complexity is O(dn)
2>uniq time complexity is O(n)
3> ;Sort time service degree is the same as 1>
4>The time complexity after sorting is O(1)
The algorithm used is also related to the size of the file. If the file is too large , if there is too much data, the files need to be split, sorted separately and then merged in multiple ways. So the number of words is mentioned here.
Without linux commands, the classic solution is to first use a dictionary tree to count word frequencies, and then use a big root heap. Let’s first introduce the dictionary tree, also called the tire tree. Because search engines often use this to make text word frequency statistics, and word segmentation algorithms also use this as a basic data structure, so I know a little bit about it. Its advantages are: minimizing unnecessary string comparisons, and query efficiency is higher than hash tables. The core idea is to exchange space for time and use public prefixes to reduce query time overhead. So when talking about statistics, the first thing that comes to mind is the dictionary tree. If you maintain an array of the top ten maximum word frequencies when counting word frequencies, the time complexity will be 10 times higher when compared in loop processing. Therefore, it is more appropriate to make statistics first and then select the top 10 in terms of time efficiency.
Actually, I don’t know much about algorithms, I just know how to use them. A former colleague of mine read an article I wrote on WeChat and asked me: "Is feed streaming a very technical job?" His question reminded me of Li Xiaoyao in "Sword of Immortals" who insisted on pretending to be tall, rich and handsome in the restaurant. When he said he wanted to order the most expensive dish: "Fried beef with vegetables", everyone laughed. Ling'er asked Li Xiaoyao: "Brother Xiaoyao, is fried beef with vegetables a very expensive dish?". Although my colleague was sincerely asking my opinion because he was in JD.com and was considering whether to go to Momo, I felt like the Li Xiaoyao who had never seen the world. The business logic of feed flow can be done in any way. Whether it has technical content depends on how it is done. I have written a patent to introduce a method of assembling feed streams. The process has not been completed, so I will not disclose the calculation method until then. But if you think hard, there are still many optimization points. The year before last, when I liked playing Moments, I would often find that the Moments I had deleted reappeared, or all the recent data in my or other people's Moments suddenly disappeared, leaving only very old data, such as two months ago a year ago. Data from a year ago will be automatically restored after one day. It's all a matter of strategy. There are many problems with WeChat Moments. Since one of our products, mm, is a family member of the WeChat architect, I won’t complain too much.
Although today is Sunday, you can use your imagination a little bit, but you also have to have a theme. The previous example has a classic top K problem. Because search engines often need to count the most popular query strings, the top K question is the basis. TopK problems use small root heaps. Maintain a small root heap of K size, traverse the elements to be compared, and compare them with the following elements respectively. If it is smaller than the root element, it means that it will definitely not enter the top K and will be eliminated. If it is greater than the root element, eliminate the root element. Then adjust the tree to the minimum heap and continue the comparison.
The minimum heap is a complete binary tree, and the value of each non-leaf node is not greater than the value of its child node. If this rule is broken, adjustments must be made from the first non-leaf node to the root node in a bottom-up order.
I decided to interview on Hulu next week, but I haven’t done so yet, so I probably won’t. Two years ago, my former colleague recommended Amazon, but I was not asked to go for an interview. To comfort myself, I guess they were not hiring at that time. I have never been to an interview with a foreign company like this before, so I don’t know what the routine is. If we start preparing now, we will probably be able to pass it after the National Day. I think it would be very disadvantageous for me to go to the interview by myself. It’s not going to be bad at all, it’s going to be very unstable. Friends who have read my articles may think that my articles are very messy and complicated. This is indeed the case for me in life. I have a wide range of knowledge, I am very whimsical, and I have no scruples. On the one hand, this lays the foundation for my creativity, but on the other hand, it is not conducive to my ability to express myself on the spot. The brain is like a computer. I have many parallel programs, the memory is not large enough, and there is a lot of data. Memory paging causes constant disk swapping. Time-sensitive actions like interviews can easily lead to timeout returns. I have so many technical invention patents, and now I think, I can’t even remember what I invented. I just took the bus. Because there were very few people, the driver asked me where to get off. He meant that it would stop wherever no one got off. It took me a long time to remember. My brain runs more in asynchronous non-blocking mode. In fact, synchronous blocking would be better for things like interviews. However, there is a solution to everything. If you can't find a solution, you just don't have enough ability. There's nothing wrong with that. However, the interview is to examine comprehensive abilities, such as teamwork, conversational skills, etc. I believe that no one in our department will have any objection to the sentence "Xiaojing is very smart." I also believe that colleagues with whom I collaborate in the department or work will not think that I am a difficult person to communicate with or get along with. But I tend to forget how to speak during interviews. But if I fail an interview because of this problem, I have no complaints. Because the interviewer is your future colleague and leader, if you are not in tune with the interviewer, you may not be able to use your abilities in the future. If you don’t do well in interviews and still feel that your abilities are sufficient, it’s likely that you are not high-level enough and have never seen what truly outstanding people look like. However, I am the kind of person who continues to do things even when I am determined to hit a wall. If I decide to give up on something, it’s because it’s not worth doing.
I like working. My goal is to still have a creative job when I am 60 years old. So I am afraid that domestic Internet companies will let me retire at the age of 40. There is one more important thing: I want to make my own search engine middleware. Domestic Internet companies mainly use it, so I am afraid it will be difficult for me to have the energy to do this. Of course, if you can’t go to Hulu, the search engine still has to do it. It’s just a matter of how to allocate your time.
I actually like to hit the wall, probably because I don’t want to grow up so quickly. If you act mature and elegant every day, you need to hide some things that you are not good at or things that may go wrong. As a result, I will be happy every day, but I may stay like this for the rest of my life. There are many famous figures in history who were originally playboys, but later became great men after their family fell into decline. In the book, there are two types of turning points in life: meeting noble people and encountering setbacks. When you are young and open-minded, you can have an epiphany when you meet a noble person and open your mind. As experience increases, people will be more selective in receiving the information around them. At this time, they may need to encounter great setbacks before they can rethink life. If I can see a better future, I am willing to go it alone and burn the boat. It’s better to have ups and downs than one day at a time. If you want to live, live a wonderful life~~
The above is the detailed content of A classic algorithm question caused by a Linux command commonly used in projects. For more information, please follow other related articles on the PHP Chinese website!