In Python 2, reading the contents of a web page is easy with urllib2:
import urllib2
print urllib2.urlopen('http://www.pythontab.com').read()
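urllib2 does not exist in Python 3. There, `urlopen` lives in `urllib.request`, and `read()` returns bytes rather than str. Below is a minimal Python 3 sketch of the same page read. It assumes the page is UTF-8 encoded, and the helper name `fetch` is hypothetical.

```python
from urllib.request import urlopen


def fetch(url, encoding="utf-8"):
    """Return the body of the page at `url` decoded as text (Python 3 sketch)."""
    # urlopen() returns a response object usable as a context manager;
    # read() gives bytes, so decode them (assumed encoding: UTF-8)
    with urlopen(url) as response:
        return response.read().decode(encoding)


# Usage: print(fetch("http://www.pythontab.com"))
```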
If the page requires a POST request, you need to supply the request headers, the POST data to submit, and the URL of the page being requested.
The POST data must first be encoded with urllib.urlencode(), which converts a dictionary into a string of the form "data1=value1&data2=value2" and percent-encodes any special characters.
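In Python 3, the same function is `urllib.parse.urlencode`. The short sketch below shows the output format, how special characters are escaped, and how `parse_qs` reverses the encoding. The example values are illustrative only.

```python
from urllib.parse import parse_qs, urlencode

# Python 3 location of urlencode (Python 2: urllib.urlencode)
postdata = {"data1": "value1", "data2": "value2"}
encoded = urlencode(postdata)
print(encoded)  # data1=value1&data2=value2

# Values are escaped: a space becomes '+', '&' becomes '%26'
special = urlencode({"q": "a b&c"})
print(special)  # q=a+b%26c

# parse_qs undoes the encoding; every value comes back as a list
decoded = parse_qs(special)
print(decoded)  # {'q': ['a b&c']}
```

Python 3.7+ dicts keep insertion order, so the pairs come out in the order they were written. In Python 2, dict order (and therefore parameter order) is arbitrary, which servers normally tolerate.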
import urllib
import urllib2

HEADER = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
    'Referer': 'http://202.206.1.163/logout.do'
}
POSTDATA = {
    'data1': 'value1',
    'data2': 'value2'
}
HOSTURL = 'http://xxx.com'

# encode the dict as "data1=value1&data2=value2"
enpostdata = urllib.urlencode(POSTDATA)
# supplying data makes urllib2 send a POST instead of a GET
urlrequest = urllib2.Request(HOSTURL, enpostdata, HEADER)
urlresponse = urllib2.urlopen(urlrequest)
print urlresponse.read()
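A rough Python 3 equivalent of the POST above follows. One difference matters: Python 3 requires the request body to be bytes, so the urlencoded string has to be encoded. The helper names `build_post_request` and `post` are hypothetical. Building the request is kept separate from sending it so the request can be inspected without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same example headers as the Python 2 listing; the Referer is site-specific
HEADER = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0",
    "Referer": "http://202.206.1.163/logout.do",
}


def build_post_request(url, postdata, headers=HEADER):
    """Build (but do not send) a POST request with form-encoded data."""
    # urlencode() returns str; the request body must be bytes in Python 3
    body = urlencode(postdata).encode("ascii")
    # A non-None data argument makes the request a POST
    return Request(url, data=body, headers=headers)


def post(url, postdata, headers=HEADER):
    """Send the POST request and return the raw response body (bytes)."""
    with urlopen(build_post_request(url, postdata, headers)) as response:
        return response.read()
```

`Request` stores header names with `str.capitalize()`, so `User-Agent` is looked up afterwards as `User-agent`.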
After such a request, the browser maintains a session. The session is tracked through a cookie: every subsequent page request carries that cookie in its request headers, and if the cookie is lost, the session is broken.
In Python, cookie handling has to be set up explicitly so that cookies are kept between requests:
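import urllib2
import cookielib

# cookie setup
# used to keep the session alive between requests
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
# install globally so later urllib2.urlopen() calls store and resend cookies
urllib2.install_opener(opener)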
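In Python 3, cookielib became `http.cookiejar`, and the opener functions moved to `urllib.request`. Below is a minimal sketch of the same setup. The function name `make_cookie_opener` and its `install` flag are hypothetical conveniences.

```python
from http.cookiejar import LWPCookieJar
from urllib.request import HTTPCookieProcessor, build_opener, install_opener


def make_cookie_opener(install=False):
    """Return (opener, jar): an opener that stores and resends cookies via jar."""
    jar = LWPCookieJar()
    # HTTPCookieProcessor saves Set-Cookie headers from responses into jar
    # and adds matching Cookie headers to later requests
    opener = build_opener(HTTPCookieProcessor(jar))
    if install:
        # Make plain urllib.request.urlopen() calls use this opener as well
        install_opener(opener)
    return opener, jar
```

`build_opener` already includes an HTTP handler by default, so passing one explicitly (as the urllib2 version does with `urllib2.HTTPHandler`) is optional. `LWPCookieJar` can also write its cookies to a file with `jar.save(filename, ignore_discard=True)`, which is one way to keep a session across runs. The `ignore_discard=True` argument is needed because session cookies are discarded by default.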
The following module wraps the points above into a small library for convenient reuse:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# filename: analogop.py
# author: 初行
# qq: 121866673
# mail: zxbd1016@163.com
# message: I need a python job
# time: 2014/10/8

import urllib
import urllib2
import cookielib

# cookie setup
# used to keep the session alive between requests
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)

# default headers
HEADER = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
    'Referer': 'http://202.206.1.163/logout.do'
}

# request helper
def geturlopen(hosturl, postdata={}, headers=HEADER):
    # encode the post data
    enpostdata = urllib.urlencode(postdata)
    # build the request
    urlrequest = urllib2.Request(hosturl, enpostdata, headers)
    # open the URL and return the response object
    urlresponse = urllib2.urlopen(urlrequest)
    return urlresponse
Below is a test script. Since readers do not have the author's test environment, you will need to set one up yourself or find a website to test against:
# filename: test.py
from analogop import geturlopen

postd = {
    'usernum': '2011411111',
    'upw': '124569',
    'userip': '192.168.10.1',
    'token': 'xxx'
}
# log in; the opener installed by analogop stores the session cookie
urlread = geturlopen('http://127.0.0.1:8000/login/', postd)
print urlread.read().decode('utf-8')
# request a page that depends on the logged-in session
urlread = geturlopen('http://127.0.0.1:8000/chafen/', {})
print urlread.read().decode('utf-8')
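If you have no site to test against, a tiny local server can stand in for one. The Python 3 sketch below is entirely hypothetical: the accepted credentials, the `sid` cookie name, and the response texts are invented. Its `/login/` and `/chafen/` behavior only mimics a typical login flow, not the author's real site. `/login/` sets a session cookie when the credentials match. `/chafen/` answers only if that cookie is sent back, and it treats any other POST (such as the empty one in test.py) like a GET. `run_demo_client` shows the cookie round trip from the client side.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener

# Hypothetical test account and session cookie; the real site's rules are unknown
VALID_USER = {"usernum": "2011411111", "upw": "124569"}
SESSION_COOKIE = "sid=demo-session"


class DemoHandler(BaseHTTPRequestHandler):
    def _reply(self, status, text, extra_headers=()):
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in extra_headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("ascii"))
        if self.path != "/login/":
            return self.do_GET()  # treat other POSTs like GETs
        if all(form.get(k) == [v] for k, v in VALID_USER.items()):
            # The Set-Cookie header starts the session
            return self._reply(200, "login ok",
                               [("Set-Cookie", SESSION_COOKIE + "; Path=/")])
        return self._reply(403, "bad credentials")

    def do_GET(self):
        if self.path != "/chafen/":
            return self._reply(404, "not found")
        # Only clients that send the session cookie back count as logged in
        if SESSION_COOKIE in self.headers.get("Cookie", ""):
            return self._reply(200, "scores: (demo data)")
        return self._reply(403, "not logged in")

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_server(port=0):
    """Serve DemoHandler on 127.0.0.1 in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_demo_client(base_url):
    """Log in, then fetch /chafen/ with the same cookie-aware opener."""
    # ProxyHandler({}) bypasses any proxy configured in the environment
    opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(CookieJar()))
    login = urlencode(VALID_USER).encode("ascii")
    first = opener.open(base_url + "/login/", login).read().decode("utf-8")
    second = opener.open(base_url + "/chafen/").read().decode("utf-8")
    return first, second
```

To point test.py at it, run the server on port 8000 in a process that stays alive. For example, use `HTTPServer(("127.0.0.1", 8000), DemoHandler).serve_forever()`, because `start_server` uses a daemon thread that exits with the main program.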