Professional analytics services such as Baidu Statistics, Google Analytics, and cnzz give webmasters the commonly used metrics: UV, PV, time online, IP count, and so on. However, for network reasons, I found that the numbers from Google Analytics and Baidu Statistics differ by several hundred IPs, so I wanted to write my own script to find out the real number of visits. Counts based on the nginx access log will be much larger than those from an analytics backend, because spider visits and requests for static files are counted as well. With a better algorithm, that useless data could be filtered out. Today I will share only the most basic statistics, partly as a way to learn and review the Python language.
For example, the nginx log on the server is as follows:
221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 200 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" "0.020"
221.221.155.53 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 200 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" "0.020"
221.221.155.54 - - [02/Aug/2014:15:16:11 +0800] "GET / HTTP/1.1" 200 8482 "http://www.zuidaima.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36" "-" "0.020"
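As a side note, the filtering idea mentioned earlier (skipping spiders and static files) could look roughly like the sketch below. This is not the stat_ip.py script, only an illustration under some assumptions: the regular expression is inferred from the three sample lines, and the names LOG_PATTERN, SPIDER_KEYWORDS, STATIC_SUFFIXES, parse_line, is_page_view, and count_pv_ip are made up for this example. The keyword and suffix lists are guesses and would need tuning for a real site.

```python
import re

# Assumed log layout, inferred from the sample lines above:
# ip - - [time] "METHOD path PROTOCOL" status size "referer" "user agent" ...
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical filter lists; adjust them to the traffic you actually see.
SPIDER_KEYWORDS = ("spider", "bot", "crawler")
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_page_view(record):
    """Reject requests from likely spiders and requests for static files."""
    agent = record["agent"].lower()
    if any(keyword in agent for keyword in SPIDER_KEYWORDS):
        return False
    # Drop the query string before checking the file suffix.
    path = record["path"].split("?", 1)[0].lower()
    return not path.endswith(STATIC_SUFFIXES)


def count_pv_ip(lines):
    """Return (page views, distinct IPs) over the lines that pass the filter."""
    pv = 0
    ips = set()
    for line in lines:
        record = parse_line(line)
        if record is None or not is_page_view(record):
            continue
        pv += 1
        ips.add(record["ip"])
    return pv, len(ips)
```

Calling count_pv_ip(open("access.log")) on a log in this format (the file name here is only a placeholder) would give the filtered page views and the number of distinct IPs. Distinct IPs are only a rough stand-in for UV, since several visitors can share one address.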
The statistics script is as follows:
stat_ip.py
#encoding=utf8