Auto-Login to Wacai with Requests

Use Python requests to log in to the Wacai BBS automatically and check in every day to collect copper coins, then configure a plist so the whole thing runs daily on its own.

Wacai users probably know the drill: in the "签到有礼" (check-in rewards) board, a new thread appears every day. Reply with the keyword given in the thread and you earn copper coins, which can reportedly be exchanged for cash or for books.

Perfect for someone who quit their job and is sitting at home unemployed; I'll collect them daily.

The basic approach:

  1. Use a packet-capture tool to analyze carefully what data gets POSTed at login and when replying, and where that data comes from (I used Firefox + Firebug, which works well; check "Persist" and "All" so every request stays visible even through page redirects);
  2. Use the Python requests library: requests.Session().post to log in and to reply, and get to read page content;
  3. After logging in, grab the BBS homepage HTML and, with a regex plus BeautifulSoup, find the relative URL of the "签到有礼" sub-forum and its forum ID; join the relative URL with the base URL to get the sub-forum URL;
  4. GET the sub-forum URL, and from its homepage HTML find today's check-in thread's relative URL and thread ID the same way;
  5. GET the check-in thread URL, find the keyword inside the thread, and POST a reply containing it;
  6. Then navigate on to my own profile page and check the copper-coin balance;
  7. Write all the intermediate steps and statuses along the way into a log file, so problems can be traced afterwards;
  8. Finally, wrap the Python program in a .sh script and configure a matching plist so it executes automatically every day (Mac OS).
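Steps 3 and 4 can be sketched with a tiny parsing example. The HTML snippet and the forum-19-1.html href below are made up for illustration, and to keep the sketch self-contained it uses only the stdlib re module rather than regex + BeautifulSoup as in the real script:

```python
# -*- coding: utf-8 -*-
import re

# A made-up stand-in for the BBS homepage HTML.
html = u'<ul><li><a href="forum-19-1.html">签到有礼</a></li></ul>'

# Find the link whose text is the check-in forum name and capture its href.
m = re.search(u'<a href="([^"]+)"[^>]*>签到有礼</a>', html)
relative_url = m.group(1)              # 'forum-19-1.html'

# The forum ID sits between the two dashes: 'forum-19-1.html' -> '19'.
forum_id = relative_url.split('-')[1]

base_url = 'http://bbs.wacai.com/'
forum_url = base_url + relative_url    # full sub-forum URL
```

The thread ID in step 4 is extracted the same way from the thread's href.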


First, the pitfall I hit:

  • The login POST returned a 200 status, i.e. success, yet every reply POST still complained I wasn't logged in. Clearly a cookie problem, but requests keeps cookies and the session automatically. After grinding away for most of a day, crab set me straight with one remark: look carefully at the captured packets, and especially at the response content of the login POST!!! This site's login response contains two links that look like some kind of API calls; if I guessed right, those links are what actually plant the cookies. The capture confirms it: immediately after the login POST there are two GETs, and the URLs requested are exactly the two URLs from the POST's response content.
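A minimal sketch of that fix, assuming a fabricated response body (the /api/a and /api/b URLs are placeholders; the real ones come back in the login POST's response):

```python
import re

# Made-up login response content containing the two cookie-seeding script URLs.
login_content = ('<script type="text/javascript" src="https://www.wacai.com/api/a"></script>'
                 '<script type="text/javascript" src="https://www.wacai.com/api/b"></script><script>...</script>')

# Pull out every src="..." value in one pass instead of slicing with find()/rfind().
srcs = re.findall(r'src="([^"]+)"', login_content)

# With a live requests.Session() s, the next step would be:
# for url in srcs:
#     s.get(url)   # each GET lets the server set the session cookies
```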

Quick tip on file modes:

  • f = open(file, 'r+')
  • f = open(file, 'w+')
  • At first glance r+ and w+ look the same, but they differ a lot: with r+ the file must already exist (otherwise it raises an error), and writes overwrite from the beginning of the file, replacing only as many bytes as you write; with w+ the file is created if it doesn't exist, and its content is truncated entirely before writing. a+ is also read/write, but always writes by appending.
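A quick demo of the difference, writing to a throwaway temp file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:     # 'w' creates the file
    f.write('hello world')

with open(path, 'r+') as f:    # 'r+': file must exist; writes overwrite from the start
    f.write('HEY')
with open(path) as f:
    after_rplus = f.read()     # 'HEYlo world' -- only the first 3 bytes replaced

with open(path, 'w+') as f:    # 'w+': truncates everything first
    f.write('fresh')
with open(path) as f:
    after_wplus = f.read()     # 'fresh'

with open(path, 'a+') as f:    # 'a+': always appends at the end
    f.write('!')
with open(path) as f:
    after_aplus = f.read()     # 'fresh!'
```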



How to run it: python path/Auto_Login_Reply.py username/password. The advantage of this form is that there's no need to edit the username and password inside the .py; just pass them as an argument.
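As a sketch, that argument can be split on the first '/' only, so a password that itself contains '/' still survives (parse_credentials is a hypothetical helper added here for illustration; the script below slices with find(), which behaves the same for the first '/'):

```python
# Split 'user/pwd' on the FIRST '/' only; everything after it is the password.
def parse_credentials(arg):
    username, _, pwd = arg.partition('/')
    return username, pwd

# Simulated sys.argv[1]:
username, pwd = parse_credentials('alice/s3cr/et')
print(username)  # alice
print(pwd)       # s3cr/et
```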

The program is procedural: straight through from start to finish, in one go.

Python is great: it supports both object-oriented and procedural styles. Flexible and clever!


#!/usr/bin/env python
#-*- coding:utf-8 -*-

__author__ = 'Sophie2805'

import re
import time
import requests
from bs4 import BeautifulSoup
import os.path
import sys

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
python Auto_Login_Reply.py user/pwd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

args = sys.argv[1]
#print args
username = args[0:args.find('/')]
pwd = args[args.find('/')+1:len(args)]
#print username , pwd

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
using log_list[] to log the whole process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

#print os.path.abspath('.')
log_list = []
log_list.append('+++++++++++++++++++++++++++++++++++++++++++++\n')
log_list.append('++++挖财签到有礼'+(time.strftime("%m.%d %T"))+' 每天签到得铜钱++++\n')
log_list.append('+++++++++++++++++++++++++++++++++++++++++++++\n')

s = requests.Session()

agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Firefox/38.0'
connection = 'keep-alive'

s.headers.update({'User-Agent':agent,
                  'Connection':connection})

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
post login request to this URL, observed in Firebug
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

login_url = 'https://www.wacai.com/user/user!login.action?cmd=null'

login_post_data ={
    'user.account':username,
    'user.pwd':pwd
}

try:
    login_r = s.post(login_url, data=login_post_data)
except Exception as e:
    log_list.append(time.strftime("%m.%d %T") + '--Login Exception: ' + str(e) + '.\n')
#print login_r.content

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
these two get() calls are very important!!!
login_r.content returns these 2 API URLs.
Without GET-ing these 2 URLs, the BBS will not treat our session as logged in.
I assume GET-ing these 2 URLs makes the server return some critical cookie.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

src1 = login_r.content[login_r.content.find('src')+5:login_r.content.find('"></script>')]
src2 = login_r.content[login_r.content.rfind('src')+5:login_r.content.rfind('"></script><script>')]
#print src1
#print src2
s.get(src1)
s.get(src2)

base_url = 'http://bbs.wacai.com/'
homepage_r = s.get(base_url)
if '我的挖财' in homepage_r.content:
    log_list.append(time.strftime("%m.%d %T") + '--Successfully login.\n')

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
find the checkin forum URL and ID, which is used as fid parameter in the reply post URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
pattern = '<.+>签到有礼<.+>'
p = re.compile(pattern)
soup = BeautifulSoup(p.findall(homepage_r.content)[0])
checkin_postfix = soup.a['href']
checkin_forum_url = base_url+ checkin_postfix
#print checkin_forum_url
forum_id = checkin_postfix[checkin_postfix.find('-')+1:checkin_postfix.rfind('-')]

if forum_id != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the checkin forum ID.\n')

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
get the checkin forum portal page and find today's thread URL and ID, which is used as tid parameter in the reply post URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
checkin_forum_page=s.get(checkin_forum_url)
#print checkin_forum_page.status_code
title = '签到有礼'+(time.strftime("%m.%d")+'0').strip('0')+'每天签到得铜钱,每人限回一次'
pattern_1 = '<.+>'+title + '<.+>'
p_1 = re.compile(pattern_1)
soup = BeautifulSoup(p_1.findall(checkin_forum_page.content)[0])
thread_postfix = soup.a['href']
thread_url = base_url + thread_postfix
thread_id= thread_postfix[thread_postfix.find('-')+1:thread_postfix.rfind('-')-2]
#print thread_id

if thread_id != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the thread ID.\n')
t = s.get(thread_url)

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
formhash is a must in the post data, observed in Firebug.
So get the formhash from the html of the page
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
pattern_2 = '<input type="hidden" name="formhash" .+/>'
p_2 = re.compile(pattern_2)
soup = BeautifulSoup(p_2.findall(t.content)[0])
formhash = soup.input['value']

pattern_3 = '回帖内容必须为'+'.+'+'</font>非此内容将收回铜钱奖励'
result_3 = re.compile(pattern_3).findall(t.content)
#print result_3
key = result_3[0][result_3[0].find('>')+1:result_3[0].rfind('<')-1]
if key != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the key word.\n')

'''~~~~~~~
auto reply
~~~~~~~~~~'''

host='bbs.wacai.com'
s.headers.update({'Referer':thread_url})
s.headers.update({'Host':host})
reply_data={
    'formhash':formhash,
    'message':key,
    'subject':'',
    'usesig':''
}
reply_post_url = 'http://bbs.wacai.com/forum.php?mod=post&action=reply&fid='+forum_id+'&tid='+thread_id+'&extra=&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1'
try:
    reply_r = s.post(reply_post_url, data=reply_data)
except Exception as e:
    log_list.append(time.strftime("%m.%d %T") + '--Reply exception: ' + str(e) + '.\n')
if '非常感谢,回复发布成功,现在将转入主题页,请稍候……' in reply_r.content:  # success
    log_list.append(time.strftime("%m.%d %T") + '--Successfully auto reply.\n')
else:
    log_list.append(time.strftime("%m.%d %T") + '--Fail to reply: ' + reply_r.content + '.\n')

'''~~~~~~~~~~~~~~
find my WaCai URL
~~~~~~~~~~~~~~~~~'''
pattern_4 = '<.+访问我的空间.+</a>'
p_4 = re.compile(pattern_4)
soup = BeautifulSoup(p_4.findall(t.content)[0])
if soup.a['href'] != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find my WaCai link.\n')
mywacai_url = base_url + soup.a['href']
mywacai_page = s.get(mywacai_url)

'''~~~~~~~~~~~~~
find my info URL
~~~~~~~~~~~~~~~~'''
pattern_5 = '<.+个人资料</a>'
p_5 = re.compile(pattern_5)
soup = BeautifulSoup(p_5.findall(mywacai_page.content)[0])
if soup.a['href'] != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find my info link.\n')
myinfo_url = base_url+ soup.a['href']
myinfo_page = s.get(myinfo_url)

'''~~~~~~~~~~~~~~
find my coin info
~~~~~~~~~~~~~~~~~'''
pattern_6 = '<em>铜钱.+\n.+\n'
p_6 = re.compile(pattern_6)
coin = p_6.findall(myinfo_page.content)[0]
coin = coin[coin.find('</em>')+5:coin.find('</li>')]
if int(coin.strip()) != 0:
    log_list.append(time.strftime("%m.%d %T") + '--Successfully get my coin amount: %s.\n' % int(coin.strip()))
log_list.append('\n')

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if log.txt does not exist under the current working directory, create it.
Write the log; if the log file has grown past 1 MB, empty it and start from the beginning.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

log_file = os.path.abspath('.') + '/log.txt'  # log.txt under the current working directory
if not os.path.isfile(log_file):  # does not exist yet, create an empty one
    f = open(log_file, 'w')
    f.close()

if os.path.getsize(log_file)/1024 > 1024:  # larger than 1MB, empty it
    f = open(log_file, 'w')
    try:
        f.write('')
    finally:
        f.close()

f = open(log_file, 'a')  # append this run's log lines
try:
    f.writelines(log_list)
finally:
    f.close()

Last, the plist. On a Mac this is how you configure the scheduled job; on Windows, you could presumably write a .bat and schedule it similarly.

First write a test.sh script, and remember to make it executable with chmod 777 test.sh:

cd /Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai

python Auto_Login_Reply.py username/password


Then go to ~/Library/LaunchAgents and create a new plist file named wacai.bbs.auto.login.reply.plist.

Note: the Label must not clash with any other job, so pick something distinctive. Put the absolute path of test.sh in ProgramArguments, and set the hour and minute to run in StartCalendarInterval. StandardOutPath and StandardErrorPath are optional, but better to have: if something fails you can inspect the error output.





<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>wacai.bbs.auto.login.reply</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/test.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>30</integer>
        <key>Hour</key>
        <integer>1</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/run.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/runerror.log</string>
</dict>
</plist>


Once the file is ready:

launchctl load wacai.bbs.auto.login.reply.plist enables the plist.


launchctl start wacai.bbs.auto.login.reply runs it once immediately. Note: this takes the Label value, without the .plist suffix.


After editing the plist, run launchctl unload … and then launchctl load … to reload it.


You can also check the job's status with launchctl list | grep wacai: normally, having a PID and a status of 0 means everything is fine; otherwise something went wrong during execution.


Finally, a screenshot of the Wacai site was attached here. Let's hope the page doesn't change too often, or I'll have to debug and fix the script = =#

Reposted from Yandao (http://go2live.cn).