If you have ever built an IP address lookup feature, you have probably heard of the Innocence IP database. A lookup against the Innocence IP database looks something like this:
If you only need to find the location text for a user's IP and display it, a binary search that follows the rules of the IP library format is enough. (Detailed format explanation)
But what if you need to obtain the text description of the location based on IP, and then further associate it with your existing administrative region data table?
At first glance, both of these should be possible, but what about efficiency? It is very poor. Under even moderately high concurrency, neither method holds up.
So why not associate the data in the Innocence IP library (other IP libraries work too) with your own region data, replace the region description in the Innocence IP library with your own region ID, and finally build your own binary IP library file?
Let's get to the point and see how to build our own binary IP library file from the Innocence IP library data.
Note: this article only explains the general idea and does not include detailed code. Thank you.
We need to prepare two parts of data:
1. The decompressed txt file of the Innocence IP library.
The Innocence IP library download includes an ip.exe tool, and its decompress function generates this txt file.
The generated data is shown in Figure 1-1. My version has approximately 444,290 entries.
Figure 1-1
2. Your own cascading country, province and city data table.
Plenty of these are available online, and you can import one yourself. The table structure is similar to (area_id, area_level, area_name, area_pid), which represent the area ID, area level, area name, and parent area ID respectively.
Of course, you can also use different structures yourself, which will not affect our processing this time.
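As a minimal sketch of how such a table could be used later on, the rows can be loaded once and indexed by name. The sample rows, IDs and the function name buildNameIndex below are made up for illustration and are not from the original data.

```php
<?php
// Hypothetical sample rows in the (area_id, area_level, area_name, area_pid) layout.
// The IDs and names are invented for this example.
$areas = [
    ['area_id' => 1,   'area_level' => 0, 'area_name' => 'China',     'area_pid' => 0],
    ['area_id' => 10,  'area_level' => 1, 'area_name' => 'Guangdong', 'area_pid' => 1],
    ['area_id' => 101, 'area_level' => 2, 'area_name' => 'Shenzhen',  'area_pid' => 10],
];

// Build an area_name => area_id index once, so that matching region text
// against the table does not need a database query per IP record.
function buildNameIndex(array $areas): array {
    $index = [];
    foreach ($areas as $row) {
        // If two regions share a name, keep the first one seen.
        if (!isset($index[$row['area_name']])) {
            $index[$row['area_name']] = $row['area_id'];
        }
    }
    return $index;
}

$nameIndex = buildNameIndex($areas);
```

If your table contains duplicate names at different levels, you would probably want to key the index by name plus parent ID instead.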
The data is already available, now let’s plan the organization of the IP library we need to generate.
As you can see from the title, the IP library we need to generate is a binary data packet, not an ordinary text file. So what should the file structure of our IP library be?
As shown in the picture:
As you can see, our structure is like this:
The structure of the IP data packet has been determined, and the next step is to process it step by step.
1. Read the IP text file line by line, convert each IP into a 32-bit signed integer (with a custom ip2long), and parse the region text to obtain the final region
a. Each line of the IP text file follows this rule: the first 15 bytes are the start IP address, the next 15 bytes are the end IP address, and the rest is the region text description.
b. Stored as a 32-bit signed integer, an IP takes up only 4 bytes, and the conversion also solves the problem that the PHP function ip2long returns different values on 32-bit and 64-bit systems. The new function is as follows:
    function ip2Long32($ip) {
        $ip = unpack('l', pack('l', ip2long($ip)));
        return $ip[1];
    } // end func
Of course, you can also develop your own PHP extension, see here for details: http://www.cnblogs.com/iblaze/archive/2013/06/02/3112603.html
c. For the region, we need the names at every level (province, city, county, district and so on; for foreign addresses only the country is kept), as shown in the figure:
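Step 1 can be sketched roughly as follows. This is only an illustration: parseIpLine is a hypothetical helper, the sample line is made up, and splitting on runs of whitespace is an assumption that avoids depending on the exact column widths.

```php
<?php
// Convert a dotted IP to a signed 32-bit integer (same approach as ip2Long32 above).
function ip2Long32(string $ip): int {
    $packed = unpack('l', pack('l', ip2long($ip)));
    return $packed[1];
}

// Split one exported line into start IP, end IP and region text.
// Assumption: the three fields are separated by runs of whitespace.
function parseIpLine(string $line): ?array {
    $parts = preg_split('/\s+/', trim($line), 3);
    if (count($parts) < 2 || ip2long($parts[0]) === false || ip2long($parts[1]) === false) {
        return null; // not a data line
    }
    return [
        'start'  => ip2Long32($parts[0]),
        'end'    => ip2Long32($parts[1]),
        'region' => $parts[2] ?? '',
    ];
}

// Made-up sample line in the "start end description" shape.
$row = parseIpLine('192.168.0.0     192.168.255.255 LAN address');
// $row['start'] is -1062731776 and $row['end'] is -1062666241:
// addresses from 128.0.0.0 upward come out negative.
```

Splitting the region text into province, city and district names depends on the wording used by your IP library version, so that part is left out here.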
2. Convert the obtained region information into region ID
I can't describe this part of the process in much detail, because everyone's region data may differ. The general principle is to look up the ID by the lowest-level region name first (depending on your data, you may need to strip words such as "city" or "county" from the name); if it is not found, try the next level up, and so on until a region ID is obtained.
If the region ID is not found, it will be classified as unknown.
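The fallback lookup described above might look something like this. It is a sketch under assumptions: resolveRegionId, the sample index and the use of 0 as the "unknown" ID are all hypothetical, and suffix stripping is not handled.

```php
<?php
const UNKNOWN_REGION_ID = 0; // assumption: 0 marks "unknown"

// $names holds the region names from highest to lowest level,
// e.g. ['Guangdong', 'Shenzhen', 'Nanshan'].
// $nameIndex maps area_name => area_id.
// The lowest level is tried first, then each parent level in turn.
function resolveRegionId(array $names, array $nameIndex): int {
    for ($i = count($names) - 1; $i >= 0; $i--) {
        $name = $names[$i];
        if (isset($nameIndex[$name])) {
            return $nameIndex[$name];
        }
    }
    return UNKNOWN_REGION_ID;
}

// Hypothetical index; real data would come from your own region table.
$nameIndex = ['China' => 1, 'Guangdong' => 10, 'Shenzhen' => 101];

$exact    = resolveRegionId(['Guangdong', 'Shenzhen'], $nameIndex);            // found at city level
$fallback = resolveRegionId(['Guangdong', 'Shenzhen', 'Nanshan'], $nameIndex); // district missing, falls back to city
$unknown  = resolveRegionId(['Atlantis'], $nameIndex);                         // nothing matches
```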
3. Pack (compress) the data into the binary file. The resulting file is about 5.08 MB
The packing rules are as shown in the figure. Each value in the format column corresponds to a type code of PHP's pack function:
One point needs attention here. Because each IP is converted to a signed 32-bit integer, every IP from 128.0.0.0 upward becomes a negative number, so we need to detect the negative numbers and place them at the front of our IP library. After all, binary search requires ordered data.
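A minimal packing sketch follows. The record layout is an assumption, since the figure with the real format is not reproduced here: three signed 32-bit integers (start IP, end IP, region ID), 12 bytes per record. That guess is at least consistent with the numbers in this article, because 444,290 × 12 = 5,331,480 bytes, which is about 5.08 MB. packRecords and the sample records are hypothetical.

```php
<?php
// Assumed record layout (not confirmed by the article):
// start IP, end IP, region ID, each a signed 32-bit integer.
const RECORD_FORMAT = 'l3';
const RECORD_SIZE = 12;

// Sort records by signed start IP, then pack them back to back.
// With signed values, 128.0.0.0 and above are negative and therefore sort first.
function packRecords(array $records): string {
    usort($records, fn(array $a, array $b): int => $a[0] <=> $b[0]);
    $data = '';
    foreach ($records as [$start, $end, $regionId]) {
        $data .= pack(RECORD_FORMAT, $start, $end, $regionId);
    }
    return $data;
}

// Hypothetical records: [start, end, region ID], IPs already converted to signed 32-bit.
$records = [
    [16777216, 16777471, 7],        // 1.0.0.0 - 1.0.0.255
    [-1062731776, -1062666241, 9],  // 192.168.0.0 - 192.168.255.255
];
$data = packRecords($records);
// file_put_contents('ipdata.dat', $data); // hypothetical output file name
```

Note that the 'l' code uses the machine's byte order, so a file written this way should be read on a machine with the same endianness.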
4. To look up an IP, use binary search. About 440,000 records need at most 19 comparisons. The lookup is similar to the following:
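The lookup can be sketched as below, assuming 12-byte records (start IP, end IP, region ID as signed 32-bit integers) sorted by signed start IP. findRegionId and the sample data are hypothetical, and the record layout is a guess rather than the article's confirmed format. The 19-step bound follows from 2^18 = 262,144 < 444,290 < 524,288 = 2^19.

```php
<?php
const RECORD_SIZE = 12; // assumed layout: start IP, end IP, region ID (signed 32-bit each)

function ip2Long32(string $ip): int {
    $packed = unpack('l', pack('l', ip2long($ip)));
    return $packed[1];
}

// Binary search over packed records sorted by signed start IP.
// Returns the region ID, or null when no range contains the IP.
function findRegionId(string $data, string $ip): ?int {
    $target = ip2Long32($ip);
    $low = 0;
    $high = intdiv(strlen($data), RECORD_SIZE) - 1;
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        // Decode only the record in the middle of the current window.
        $rec = unpack('lstart/lend/lregion', $data, $mid * RECORD_SIZE);
        if ($target < $rec['start']) {
            $high = $mid - 1;
        } elseif ($target > $rec['end']) {
            $low = $mid + 1;
        } else {
            return $rec['region'];
        }
    }
    return null;
}

// Two made-up records, negative (>= 128.0.0.0) range first.
$data = pack('l3', -1062731776, -1062666241, 9)  // 192.168.0.0 - 192.168.255.255
      . pack('l3', 16777216, 16777471, 7);       // 1.0.0.0 - 1.0.0.255

$lan  = findRegionId($data, '192.168.1.1');
$one  = findRegionId($data, '1.0.0.10');
$miss = findRegionId($data, '8.8.8.8');
```

In a real service the packed file would be read from disk, for example once per process with file_get_contents, or record by record with fseek and fread if memory matters.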
5. A single lookup test: the speed seems okay
6. A simple stress test to see the effect
a. Run the ab (ApacheBench) stress test from the local machine
b. The script under test is on a Linux test machine (an ordinary PC)
c. The stress test script is as follows:
d. Stress test statement: ab -n 10000 -c 50 http://192.168.206.71/ipdata.php?type=php
Performance is pretty good. Haha
That's all. If you know a better approach, let's discuss it together. Thank you~