A month ago, on a whim, I wrote a simple douban.fm client in Python. The plan was to improve it gradually until it could replace the web version on Ubuntu, but too many other things came up and the project was shelved. Yesterday a reader mentioned implementing login in a comment. I still have plenty to do, but I suddenly felt like adding the feature. As it happens, I had needed to implement a website login in Python a few days earlier, and I guessed that logging in to douban.fm would not be very different.
About website authentication
HTTP is designed as a stateless protocol, but in practice many websites need to identify their users, and cookies were introduced for this purpose. When we browse a website, the browser handles cookies for us transparently. Since we now want a third-party program to log in to the website, we need some understanding of how cookies work.
In addition, many websites use a verification code (CAPTCHA) to prevent programs from logging in automatically. The verification code makes the login process more cumbersome, but it is not too hard to deal with.
The actual login process of douban.fm
To simulate a clean login (one that does not use any existing cookies), I use Chromium's incognito mode.
Looking at the request and response headers, you can see that the first request has no Cookie field, while the server's response contains a Set-Cookie field, which tells the browser which cookies to send with subsequent requests to the site.
I noticed something interesting here: visiting douban.fm actually goes through 3 redirects. Normally we do not need to care about this detail, because browsers and high-level HTTP libraries follow redirects transparently, but if you work with raw C sockets you have to handle these redirects yourself.
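To make the cookie round trip concrete, here is a minimal sketch using Python 3's http.cookies module (not the Python 2 libraries used in the rest of this post). The helper name cookie_header_from_set_cookie and the header values are made up for illustration; they are not real douban.fm cookies.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn Set-Cookie response values into a Cookie request header value."""
    remembered = {}
    for value in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(value)
        # Keep only name=value; attributes such as Path and Domain
        # are instructions for the client and are not sent back.
        for name, morsel in parsed.items():
            remembered[name] = morsel.value
    return "; ".join("%s=%s" % (name, val) for name, val in remembered.items())


# Hypothetical Set-Cookie values from a first response.
cookie_header = cookie_header_from_set_cookie([
    "bid=abc123; Path=/; Domain=.douban.fm",
    "flag=ok; Path=/",
])
print(cookie_header)
```

A real client also has to honor the domain, path and expiry rules, which is exactly the work a cookie-aware library takes off our hands.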
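For anyone curious what "handling redirects yourself" involves, the following Python 3 sketch follows 3xx responses by hand. The fetch callable, the FAKE_SITE table and the example.test URLs are all assumptions invented for this illustration, so that the logic can run without touching the network; they do not describe the actual redirect chain of douban.fm.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(fetch, url, max_hops=5):
    """Follow redirects manually.

    fetch is a hypothetical callable: url -> (status, headers, body).
    Returns the final URL, the final body and the list of URLs that redirected.
    """
    hops = []
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if status in REDIRECT_CODES and "Location" in headers:
            hops.append(url)
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["Location"])
            continue
        return url, body, hops
    raise RuntimeError("too many redirects")


# A made-up site that redirects 3 times before answering.
FAKE_SITE = {
    "http://example.test/": (302, {"Location": "/a"}, b""),
    "http://example.test/a": (302, {"Location": "http://example.test/b"}, b""),
    "http://example.test/b": (301, {"Location": "/final"}, b""),
    "http://example.test/final": (200, {}, b"hello"),
}


def fake_fetch(url):
    return FAKE_SITE[url]


final_url, body, hops = follow_redirects(fake_fetch, "http://example.test/")
```

The max_hops limit guards against redirect loops, which a hand-written client would otherwise follow forever.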
Click the login button and the browser issues several new requests. A few of them are crucial: they are the key to logging in to douban.fm from a third-party program.
The first is a request to http://douban.fm/j/new_captcha. The server answers with a random string. What is it for? (It turns out to be the ID of a verification code.)
Now look at the next request, http://douban.fm/misc/captcha?size=m&id=0iPlm837LsnSsJTMJrf5TZ7e, which returns the verification code image. So the flow is: request http://douban.fm/j/new_captcha, then use the string the server returns as the value of the id parameter in the next request.
We can write a few lines of Python to verify this idea.
It is worth noting that Python provides 3 HTTP libraries: httplib, urllib and urllib2. The one that can handle cookies transparently is urllib2. Handling cookies by hand with httplib, as I did before, was painful.
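The link between the two requests can be sketched as a small pure function. This is Python 3, and the name captcha_image_url is my own; the only things taken from the observed traffic are the two URLs, the size=m parameter and the fact that the returned string is wrapped in double quotes.

```python
from urllib.parse import urlencode

CAPTCHA_IMAGE_URL = "http://douban.fm/misc/captcha"


def captcha_image_url(new_captcha_body, size="m"):
    """Build the image URL from the body returned by /j/new_captcha."""
    # The server wraps the ID in double quotes, so strip them first.
    captcha_id = new_captcha_body.strip().strip('"')
    query = urlencode({"size": size, "id": captcha_id})
    return captcha_id, CAPTCHA_IMAGE_URL + "?" + query


# The ID below is the one seen in the browser session described above.
captcha_id, image_url = captcha_image_url('"0iPlm837LsnSsJTMJrf5TZ7e"')
print(image_url)
```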
The code is as follows:
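    import urllib2
    from cookielib import CookieJar

    # An opener with a cookie processor stores and resends cookies for us.
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))
    # Strip the surrounding double quotes from the returned ID.
    captcha_id = opener.open(urllib2.Request('http://douban.fm/j/new_captcha')).read().strip('"')
    captcha = opener.open(urllib2.Request('http://douban.fm/misc/captcha?size=m&id=' + captcha_id)).read()
    # Save the image so that it can be opened and read by a human.
    file = open('captcha.jpg', 'wb')
    file.write(captcha)
    file.close()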
This code implements the download of the verification code.
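The code in this post targets Python 2. In Python 3 the same pieces live in urllib.request and http.cookiejar, so an equivalent opener could be set up roughly as follows. The function name make_session is my own, and the sketch deliberately makes no request.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_session():
    """Create an opener that remembers cookies between requests."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_session()
# Nothing has been requested yet, so the jar is still empty.
print(len(jar))
```

Keeping a reference to the jar makes it possible to inspect which cookies the server has set after each request.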
Next, we fill out the form and submit it.
You can see that the target address of the login form is http://douban.fm/j/login, and the parameters are:
source: radio
alias: username
form_password: password
captcha_solution: verification code
captcha_id: Verification code ID
task: sync_channel_list
The next thing to do is to construct a form using python.
    import urllib

    # captcha_solution is the text read from captcha.jpg and typed in by the user.
    opener.open(
        urllib2.Request('http://douban.fm/j/login'),
        urllib.urlencode({
            'source': 'radio',
            'alias': username,
            'form_password': password,
            'captcha_solution': captcha_solution,
            'captcha_id': captcha_id,
            'task': 'sync_channel_list'}))
The data format returned by the server is json. The specific format will not be described here. You can test it yourself.
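One detail of the reply is enough for our purposes: the complete script in this post treats the presence of an err_msg key as a failed login. A Python 3 sketch of that check might look like this. The function name login_error and both sample replies are invented for illustration and are not real server output.

```python
import json


def login_error(response_body):
    """Return the server's error message, or None if there is no err_msg key."""
    reply = json.loads(response_body)
    return reply.get("err_msg")


# Made-up replies: one with an error message, one without.
failed = login_error('{"err_msg": "wrong verification code"}')
succeeded = login_error('{}')
```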
How do we know whether the login worked? The previous article mentioned that channel=-3 is the "Heart MHz" channel, that is, the user's list of favorite songs. The playlist of this channel cannot be fetched without logging in. Request http://douban.fm/j/mine/playlist?type=n&channel=-3 with the same opener; if it returns a list of your own favorite music, the login worked.
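As a rough sketch of that check in Python 3: the function below assumes the playlist reply is a JSON object with a song list, which is how the complete script in this post reads it, and that an empty list means the favorites could not be fetched. The name looks_logged_in and the sample bodies are made up.

```python
import json

FAVORITES_URL = "http://douban.fm/j/mine/playlist?type=n&channel=-3"


def looks_logged_in(playlist_body):
    """Guess the login state from the body returned by FAVORITES_URL."""
    playlist = json.loads(playlist_body)
    # A missing or empty song list is treated as "not logged in".
    return bool(playlist.get("song"))


# Made-up bodies standing in for real responses.
logged_out = looks_logged_in('{"song": []}')
logged_in = looks_logged_in('{"song": [{"title": "made-up title"}]}')
```

Note that an account with no favorites at all would presumably also produce an empty list, so this check is only a heuristic.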
Code organization
Combining the previous version with the new login function, and adding command-line argument handling and channel selection, gives a slightly improved douban.fm client.
    #!/usr/bin/python
    # coding: utf-8

    import sys
    import os
    import subprocess
    import getopt
    import time
    import json
    import urllib
    import urllib2
    import getpass
    import ConfigParser
    from cookielib import CookieJar


    # Save content to a file
    def save(filename, content):
        file = open(filename, 'wb')
        file.write(content)
        file.close()


    # Fetch the playlist
    def getPlayList(channel='0', opener=None):
        url = 'http://douban.fm/j/mine/playlist?type=n&channel=' + channel
        if opener is None:
            return json.loads(urllib.urlopen(url).read())
        else:
            return json.loads(opener.open(urllib2.Request(url)).read())


    # Send a desktop notification
    def notifySend(picture, title, content):
        subprocess.call([
            'notify-send',
            '-i',
            os.getcwd() + '/' + picture,
            title,
            content])


    # Log in to douban.fm
    def login(username, password):
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))
        # Keep trying with a fresh verification code until the login succeeds
        while True:
            print 'Fetching verification code...'
            captcha_id = opener.open(urllib2.Request(
                'http://douban.fm/j/new_captcha')).read().strip('"')
            save(
                'captcha.jpg',
                opener.open(urllib2.Request(
                    'http://douban.fm/misc/captcha?size=m&id=' + captcha_id
                )).read())
            captcha = raw_input('Verification code: ')
            print 'Logging in...'
            response = json.loads(opener.open(
                urllib2.Request('http://douban.fm/j/login'),
                urllib.urlencode({
                    'source': 'radio',
                    'alias': username,
                    'form_password': password,
                    'captcha_solution': captcha,
                    'captcha_id': captcha_id,
                    'task': 'sync_channel_list'})).read())
            if 'err_msg' in response.keys():
                print response['err_msg']
            else:
                print 'Login succeeded'
                return opener


    # Play douban.fm
    def play(channel='0', opener=None):
        while True:
            if opener is None:
                playlist = getPlayList(channel)
            else:
                playlist = getPlayList(channel, opener)

            if playlist['song'] == []:
                print 'Failed to fetch the playlist'
                break

            for song in playlist['song']:
                picture = 'picture/' + song['picture'].split('/')[-1]

                # Download the album cover (the picture/ directory must already exist)
                save(
                    picture,
                    urllib.urlopen(song['picture']).read())

                # Send a desktop notification
                notifySend(
                    picture,
                    song['title'],
                    song['artist'] + '\n' + song['albumtitle'])

                # Play the song, then stop the player once its length has elapsed
                player = subprocess.Popen(['mplayer', song['url']])
                time.sleep(song['length'])
                player.kill()


    def main(argv):
        # Default parameters
        channel = '0'
        user = ''
        password = ''

        # Get and parse the command-line arguments
        try:
            opts, args = getopt.getopt(
                argv, 'u:p:c:', ['user=', 'password=', 'channel='])
        except getopt.GetoptError as error:
            print str(error)
            sys.exit(1)

        # Handle the command-line arguments
        # (getopt reports long options without the trailing '=')
        for opt, arg in opts:
            if opt in ('-u', '--user'):
                user = arg
            elif opt in ('-p', '--password'):
                password = arg
            elif opt in ('-c', '--channel'):
                channel = arg

        if user == '':
            play(channel)
        else:
            if password == '':
                password = getpass.getpass('Password: ')
            opener = login(user, password)
            play(channel, opener)


    if __name__ == '__main__':
        main(sys.argv[1:])