I've recently been trying to work out how to scrape flight data from this airline: https://m.tigerair.com/booking/search
In Chrome DevTools I can see the client sending two similar requests back to back:
curl 'https://m.tigerair.com/booking/search' -H 'Cookie: PLAY_SESSION="1d7f16c847d5a596f468c9c0f764a8eabf83f48c-id=1585d391-c8a4-431d-8e24-edaa7bbaef57"' --data 'departureStation=SHE&arrivalStation=MAA&roundtrip=false&departureDate=2016-02-11&returnDate=&adults=1&children=0&infants=0&currency=CNY' --compressed
curl 'https://m.tigerair.com/booking/select' -H 'Cookie: PLAY_SESSION="30fdab1a897ba9ee088cba84ca28835efca28372-id=1585d391-c8a4-431d-8e24-edaa7bbaef57&searchForm=%7B%22currency%22%3A%22CNY%22%2C%22departureStation%22%3A%22SHE%22%2C%22arrivalStation%22%3A%22MAA%22%2C%22departureDate%22%3A1455148800000%2C%22children%22%3A0%2C%22adults%22%3A1%2C%22roundtrip%22%3Afalse%2C%22returnDate%22%3Anull%2C%22infants%22%3A0%2C%22switchMyFligthEnabled%22%3Afalse%7D"' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8'
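The first request looks like a POST that introduces the client to the server; the server responds with a cookie that the client is expected to remember.
The second request is then a GET that actually fetches the data.
The frustrating part is that I've spent a whole week on this and still can't reproduce this behavior in Python.
I found a suggestion on Google to use requests.Session, but that didn't help either.
Could any of the more experienced folks here point me in the right direction? I'm using Python requests for this.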
OP, can you check whether this snippet runs? I copied it over while trying things out on the command line, so it should be fine.
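Roughly, the idea is to let one requests.Session carry the PLAY_SESSION cookie from the POST over to the GET, instead of copying cookies by hand. This is only a minimal sketch of that flow, assuming the server needs nothing beyond the cookie it sets in response to the search POST. The helper names `build_search_form` and `fetch_flights` are made up for illustration; the URLs, form fields and header value are taken from the two curl commands in the question.

```python
SEARCH_URL = "https://m.tigerair.com/booking/search"
SELECT_URL = "https://m.tigerair.com/booking/select"


def build_search_form(origin, destination, date, adults=1, currency="CNY"):
    """Form fields for a one-way search, mirroring the first curl command."""
    return {
        "departureStation": origin,
        "arrivalStation": destination,
        "roundtrip": "false",
        "departureDate": date,  # e.g. "2016-02-11"
        "returnDate": "",
        "adults": str(adults),
        "children": "0",
        "infants": "0",
        "currency": currency,
    }


def fetch_flights(form, session=None):
    """POST the search form, then GET the results with the same session."""
    if session is None:
        import requests  # third-party package: pip install requests
        session = requests.Session()

    # Step 1: the POST should make the server send a Set-Cookie header for
    # PLAY_SESSION; the session object keeps it in its cookie jar.
    search = session.post(SEARCH_URL, data=form)
    search.raise_for_status()

    # Step 2: the GET reuses the stored cookie automatically.
    # (Assumption: the Accept-Language header from the second curl command
    # is harmless to send; it may not be required.)
    result = session.get(SELECT_URL, headers={"Accept-Language": "en-US,en;q=0.8"})
    result.raise_for_status()
    return result.text
```

You would call it as `fetch_flights(build_search_form("SHE", "MAA", "2016-02-11"))` and then parse the returned HTML. If that still comes back empty, compare `session.cookies` after the POST with the cookie Chrome sends on the second request; note that requests follows redirects by default, so the POST response may already be the results page.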