Auto-Login to Wacai with Requests

Use Python requests to log in to the Wacai BBS automatically and check in every day to collect copper coins, then configure a plist so the whole thing runs daily on its own.

Wacai users probably know the drill: in the "签到有礼" (check-in rewards) board, a new thread appears every day. Reply with the keyword given in the thread and you earn copper coins, which can reportedly be exchanged for cash or for books.

Perfect for someone who quit their job and is sitting at home unemployed; I'll collect them daily.

The basic approach:

  1. Use a packet-capture tool to analyze carefully what data gets POSTed at login and when replying, and where that data comes from (I used Firefox + Firebug, which works well; check "Persist" and "All" so every request stays visible even through page redirects);
  2. Use the Python requests library: requests.Session().post to log in and to reply, and get to read page content;
  3. After logging in, grab the BBS homepage HTML and, with a regex plus BeautifulSoup, find the relative URL of the "签到有礼" sub-forum and its forum ID; join the relative URL with the base URL to get the sub-forum URL;
  4. GET the sub-forum URL, and from its homepage HTML find today's check-in thread's relative URL and thread ID the same way;
  5. GET the check-in thread URL, find the keyword inside the thread, and POST a reply containing it;
  6. Then navigate on to my own profile page and check the copper-coin balance;
  7. Write all the intermediate steps and statuses along the way into a log file, so problems can be traced afterwards;
  8. Finally, wrap the Python program in a .sh script and configure a matching plist so it executes automatically every day (Mac OS).
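Steps 3 and 4 can be sketched with a tiny parsing example. The HTML snippet and the forum-19-1.html href below are made up for illustration, and to keep the sketch self-contained it uses only the stdlib re module rather than regex + BeautifulSoup as in the real script:

```python
# -*- coding: utf-8 -*-
import re

# A made-up stand-in for the BBS homepage HTML.
html = u'<ul><li><a href="forum-19-1.html">签到有礼</a></li></ul>'

# Find the link whose text is the check-in forum name and capture its href.
m = re.search(u'<a href="([^"]+)"[^>]*>签到有礼</a>', html)
relative_url = m.group(1)              # 'forum-19-1.html'

# The forum ID sits between the two dashes: 'forum-19-1.html' -> '19'.
forum_id = relative_url.split('-')[1]

base_url = 'http://bbs.wacai.com/'
forum_url = base_url + relative_url    # full sub-forum URL
```

The thread ID in step 4 is extracted the same way from the thread's href.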


First, the pitfall I hit:

  • The login POST returned a 200 status, i.e. success, yet every reply POST still complained I wasn't logged in. Clearly a cookie problem, but requests keeps cookies and the session automatically. After grinding away for most of a day, crab set me straight with one remark: look carefully at the captured packets, and especially at the response content of the login POST!!! This site's login response contains two links that look like some kind of API calls; if I guessed right, those links are what actually plant the cookies. The capture confirms it: immediately after the login POST there are two GETs, and the URLs requested are exactly the two URLs from the POST's response content.
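A minimal sketch of that fix, assuming a fabricated response body (the /api/a and /api/b URLs are placeholders; the real ones come back in the login POST's response):

```python
import re

# Made-up login response content containing the two cookie-seeding script URLs.
login_content = ('<script type="text/javascript" src="https://www.wacai.com/api/a"></script>'
                 '<script type="text/javascript" src="https://www.wacai.com/api/b"></script><script>...</script>')

# Pull out every src="..." value in one pass instead of slicing with find()/rfind().
srcs = re.findall(r'src="([^"]+)"', login_content)

# With a live requests.Session() s, the next step would be:
# for url in srcs:
#     s.get(url)   # each GET lets the server set the session cookies
```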

Quick tip on file modes:

  • f = open(file, 'r+')
  • f = open(file, 'w+')
  • At first glance r+ and w+ look the same, but they differ a lot: with r+ the file must already exist (otherwise it raises an error), and writes overwrite from the beginning of the file, replacing only as many bytes as you write; with w+ the file is created if it doesn't exist, and its content is truncated entirely before writing. a+ is also read/write, but always writes by appending.
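A quick demo of the difference, writing to a throwaway temp file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:     # 'w' creates the file
    f.write('hello world')

with open(path, 'r+') as f:    # 'r+': file must exist; writes overwrite from the start
    f.write('HEY')
with open(path) as f:
    after_rplus = f.read()     # 'HEYlo world' -- only the first 3 bytes replaced

with open(path, 'w+') as f:    # 'w+': truncates everything first
    f.write('fresh')
with open(path) as f:
    after_wplus = f.read()     # 'fresh'

with open(path, 'a+') as f:    # 'a+': always appends at the end
    f.write('!')
with open(path) as f:
    after_aplus = f.read()     # 'fresh!'
```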



How to run it: python path/Auto_Login_Reply.py username/password. The advantage of this form is that there's no need to edit the username and password inside the .py; just pass them as an argument.
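As a sketch, that argument can be split on the first '/' only, so a password that itself contains '/' still survives (parse_credentials is a hypothetical helper added here for illustration; the script below slices with find(), which behaves the same for the first '/'):

```python
# Split 'user/pwd' on the FIRST '/' only; everything after it is the password.
def parse_credentials(arg):
    username, _, pwd = arg.partition('/')
    return username, pwd

# Simulated sys.argv[1]:
username, pwd = parse_credentials('alice/s3cr/et')
print(username)  # alice
print(pwd)       # s3cr/et
```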

The program is procedural: straight through from start to finish, in one go.

Python is great: it supports both object-oriented and procedural styles. Flexible and clever!


#!/usr/bin/env python
#-*- coding:utf-8 -*-

__author__ = 'Sophie2805'

import re
import time
import requests
from bs4 import BeautifulSoup
import os.path
import sys

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
python Auto_Login_Reply.py user/pwd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

args = sys.argv[1]
#print args
username = args[0:args.find('/')]
pwd = args[args.find('/')+1:len(args)]
#print username , pwd

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
using log_list[] to log the whole process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

#print os.path.abspath('.')
log_list = []
log_list.append('+++++++++++++++++++++++++++++++++++++++++++++\n')
log_list.append('++++挖财签到有礼'+(time.strftime("%m.%d %T"))+' 每天签到得铜钱++++\n')
log_list.append('+++++++++++++++++++++++++++++++++++++++++++++\n')

s = requests.Session()

agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Firefox/38.0'
connection = 'keep-alive'

s.headers.update({'User-Agent':agent,
                  'Connection':connection})

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
post login request to this URL, observed in Firebug
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

login_url = 'https://www.wacai.com/user/user!login.action?cmd=null'

login_post_data ={
    'user.account':username,
    'user.pwd':pwd
}

try:
    login_r = s.post(login_url, data=login_post_data)
except Exception as e:
    log_list.append(time.strftime("%m.%d %T") + '--Login Exception: ' + str(e) + '.\n')
#print login_r.content

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
these two get() calls are very important!!!
login_r.content returns these 2 API URLs.
Without GET-ing these 2 URLs, the BBS will not treat our session as logged in.
I assume GET-ing these 2 URLs makes the server return some critical cookie.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

src1 = login_r.content[login_r.content.find('src')+5:login_r.content.find('"></script>')]
src2 = login_r.content[login_r.content.rfind('src')+5:login_r.content.rfind('"></script><script>')]
#print src1
#print src2
s.get(src1)
s.get(src2)

base_url = 'http://bbs.wacai.com/'
homepage_r = s.get(base_url)
if '我的挖财' in homepage_r.content:
    log_list.append(time.strftime("%m.%d %T") + '--Successfully login.\n')

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
find the checkin forum URL and ID, which is used as fid parameter in the reply post URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
pattern = '<.+>签到有礼<.+>'
p = re.compile(pattern)
soup = BeautifulSoup(p.findall(homepage_r.content)[0])
checkin_postfix = soup.a['href']
checkin_forum_url = base_url+ checkin_postfix
#print checkin_forum_url
forum_id = checkin_postfix[checkin_postfix.find('-')+1:checkin_postfix.rfind('-')]

if forum_id != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the checkin forum ID.\n')

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
get the checkin forum portal page and find today's thread URL and ID, which is used as tid parameter in the reply post URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
checkin_forum_page=s.get(checkin_forum_url)
#print checkin_forum_page.status_code
title = '签到有礼'+(time.strftime("%m.%d")+'0').strip('0')+'每天签到得铜钱,每人限回一次'
pattern_1 = '<.+>'+title + '<.+>'
p_1 = re.compile(pattern_1)
soup = BeautifulSoup(p_1.findall(checkin_forum_page.content)[0])
thread_postfix = soup.a['href']
thread_url = base_url + thread_postfix
thread_id= thread_postfix[thread_postfix.find('-')+1:thread_postfix.rfind('-')-2]
#print thread_id

if thread_id != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the thread ID.\n')
t = s.get(thread_url)

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
formhash is a must in the post data, observed in Firebug.
So get the formhash from the html of the page
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''
pattern_2 = '<input type="hidden" name="formhash" .+/>'
p_2 = re.compile(pattern_2)
soup = BeautifulSoup(p_2.findall(t.content)[0])
formhash = soup.input['value']

pattern_3 = '回帖内容必须为'+'.+'+'</font>非此内容将收回铜钱奖励'
result_3 = re.compile(pattern_3).findall(t.content)
#print result_3
key = result_3[0][result_3[0].find('>')+1:result_3[0].rfind('<')-1]
if key != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find the key word.\n')

'''~~~~~~~
auto reply
~~~~~~~~~~'''

host='bbs.wacai.com'
s.headers.update({'Referer':thread_url})
s.headers.update({'Host':host})
reply_data={
    'formhash':formhash,
    'message':key,
    'subject':'',
    'usesig':''
}
reply_post_url = 'http://bbs.wacai.com/forum.php?mod=post&action=reply&fid='+forum_id+'&tid='+thread_id+'&extra=&replysubmit=yes&infloat=yes&handlekey=fastpost&inajax=1'
try:
    reply_r = s.post(reply_post_url, data=reply_data)
except Exception as e:
    log_list.append(time.strftime("%m.%d %T") + '--Reply exception: ' + str(e) + '.\n')
if '非常感谢,回复发布成功,现在将转入主题页,请稍候……' in reply_r.content:  # success
    log_list.append(time.strftime("%m.%d %T") + '--Successfully auto reply.\n')
else:
    log_list.append(time.strftime("%m.%d %T") + '--Fail to reply: ' + reply_r.content + '.\n')

'''~~~~~~~~~~~~~~
find my WaCai URL
~~~~~~~~~~~~~~~~~'''
pattern_4 = '<.+访问我的空间.+</a>'
p_4 = re.compile(pattern_4)
soup = BeautifulSoup(p_4.findall(t.content)[0])
if soup.a['href'] != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find my WaCai link.\n')
mywacai_url = base_url + soup.a['href']
mywacai_page = s.get(mywacai_url)

'''~~~~~~~~~~~~~
find my info URL
~~~~~~~~~~~~~~~~'''
pattern_5 = '<.+个人资料</a>'
p_5 = re.compile(pattern_5)
soup = BeautifulSoup(p_5.findall(mywacai_page.content)[0])
if soup.a['href'] != '':
    log_list.append(time.strftime("%m.%d %T") + '--Successfully find my info link.\n')
myinfo_url = base_url+ soup.a['href']
myinfo_page = s.get(myinfo_url)

'''~~~~~~~~~~~~~~
find my coin info
~~~~~~~~~~~~~~~~~'''
pattern_6 = '<em>铜钱.+\n.+\n'
p_6 = re.compile(pattern_6)
coin = p_6.findall(myinfo_page.content)[0]
coin = coin[coin.find('</em>')+5:coin.find('</li>')]
if int(coin.strip()) != 0:
    log_list.append(time.strftime("%m.%d %T") + '--Successfully get my coin amount: %s.\n' % int(coin.strip()))
log_list.append('\n')

'''~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if log.txt does not exist under the current working directory, create it.
Write the log; if the log file has grown past 1 MB, empty it and start from the beginning.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'''

log_file = os.path.abspath('.') + '/log.txt'  # log.txt under the current working directory
if not os.path.isfile(log_file):  # does not exist yet, create an empty one
    f = open(log_file, 'w')
    f.close()

if os.path.getsize(log_file)/1024 > 1024:  # larger than 1MB, empty it
    f = open(log_file, 'w')
    try:
        f.write('')
    finally:
        f.close()

f = open(log_file, 'a')  # append this run's log lines
try:
    f.writelines(log_list)
finally:
    f.close()

Last, the plist. On a Mac this is how you configure the scheduled job; on Windows, you could presumably write a .bat and schedule it similarly.

First write a test.sh script, and remember to make it executable with chmod 777 test.sh:

cd /Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai

python Auto_Login_Reply.py username/password


Then go to ~/Library/LaunchAgents and create a new plist file named wacai.bbs.auto.login.reply.plist.

Note: the Label must not clash with any other job, so pick something distinctive. Put the absolute path of test.sh in ProgramArguments, and set the hour and minute to run in StartCalendarInterval. StandardOutPath and StandardErrorPath are optional, but better to have: if something fails you can inspect the error output.





<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>wacai.bbs.auto.login.reply</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/test.sh</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Minute</key>
        <integer>30</integer>
        <key>Hour</key>
        <integer>1</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/run.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/Sophie/PycharmProjects/Auto_Login_Reply_BBS_WaCai/runerror.log</string>
</dict>
</plist>


Once the file is ready:

launchctl load wacai.bbs.auto.login.reply.plist enables the plist.


launchctl start wacai.bbs.auto.login.reply runs it once immediately. Note: this takes the Label value, without the .plist suffix.


After editing the plist, run launchctl unload … and then launchctl load … to reload it.


You can also check the job's status with launchctl list | grep wacai: normally, having a PID and a status of 0 means everything is fine; otherwise something went wrong during execution.


Finally, a screenshot of the Wacai site was attached here. Let's hope the page doesn't change too often, or I'll have to debug and fix the script = =#

Reposted from Yandao (http://go2live.cn).