For Tonight's Confession, I Scraped Zhihu's Most Popular Pickup Lines

March 2, 2011

Today is Qixi, so I put together something useful: I scraped the pickup lines on Zhihu, and I hope you can put them to good use.

Image from Pexels

At the end of the post I will share the 99 pickup lines I picked out. Without further ado, here is how the crawler works.

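Go to Zhihu and search for the 情话 (love lines) topic. Every Zhihu topic has a top-answers (精华) page, which lists the topic's content sorted by upvotes, with the most upvoted answers first.

Here is the top-answers page for the 情话 topic:

We scrape every page of content under this topic and store it in a database, then filter the database for the answers with the most upvotes. That is the whole process.

The idea is simple and the code is short. Here it is: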

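import json

import pymongo
import requests


def get_qinghua_by_page(page_no):
    """Fetch one page of answers, store it in MongoDB, and return whether it was the last page."""
    offset = page_no * 10  # 10 answers per page, matching limit=10
    # The base URL is missing here; only the query-string tail is shown.
    url = "&limit=10&offset={}".format(offset)
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    }
    # verify=False disables TLS certificate verification; remove it to keep verification on.
    r = requests.get(url, verify=False, headers=headers)
    data = json.loads(r.content.decode("utf-8"))
    is_end = data["paging"]["is_end"]
    items = data["data"]
    client = pymongo.MongoClient()  # defaults to localhost:27017
    db = client["qinghua"]
    if items:  # insert_many raises on an empty list
        db.answers.insert_many(items)
    return is_end


def get_qinghua():
    """Crawl page after page until the response reports the last one."""
    page_no = 0
    while True:
        print(page_no)  # progress indicator
        is_end = get_qinghua_by_page(page_no)
        page_no += 1
        if is_end:
            break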

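We define two functions:

get_qinghua_by_page: scrapes the answers on a single page and reports whether it was the last one.
get_qinghua: calls get_qinghua_by_page page after page until the last page is reached.

Running get_qinghua scrapes everything under the topic.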
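The paging logic can also be separated from the network and database calls, which makes it easy to try out offline. The sketch below is only an illustration under stated assumptions, not the code from this post: crawl_all_pages, fetch_page, fake_fetch, and max_pages are hypothetical names, and fetch_page stands in for any function that takes an offset and returns a dict shaped like the response used above (a "data" list and a "paging" dict with "is_end").

```python
from typing import Callable, Dict, List

PAGE_SIZE = 10  # same page size as the limit=10 query parameter


def crawl_all_pages(fetch_page: Callable[[int], Dict], max_pages: int = 1000) -> List[Dict]:
    """Collect items page by page until the response reports is_end.

    fetch_page is a hypothetical callable: it receives an offset and returns
    a dict with a "data" list and a "paging" dict containing "is_end".
    max_pages is a safety cap so a malformed response cannot loop forever.
    """
    items: List[Dict] = []
    for page_no in range(max_pages):
        offset = page_no * PAGE_SIZE
        page = fetch_page(offset)
        items.extend(page["data"])
        if page["paging"]["is_end"]:
            break
    return items


def fake_fetch(offset: int) -> Dict:
    """Stand-in for the real request: serves 25 made-up answers, 10 per page."""
    total = 25
    ids = range(offset, min(offset + PAGE_SIZE, total))
    return {
        "data": [{"id": i} for i in ids],
        "paging": {"is_end": offset + PAGE_SIZE >= total},
    }


answers = crawl_all_pages(fake_fetch)
```

Passing the fetcher in as an argument keeps the loop free of HTTP details, so the stopping condition can be checked against fake pages before pointing it at the real site.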

I ran the program above, scraped everything under the 情话 topic, sorted through the results, and picked out 99 lines, listed below.
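The selection step itself is not shown in the post. One possible way to shortlist candidates before picking by hand is to rank the stored answers by upvote count. The sketch below rests on assumptions: get_votes, top_answers, min_votes, and limit are hypothetical names, and the "voteup_count" field (at the top level or nested under "target") is a guess at how the stored documents are shaped, so inspect a stored document and adjust the key path.

```python
from typing import Dict, Iterable, List


def get_votes(doc: Dict) -> int:
    """Read the upvote count from one stored item.

    Assumption: the count lives in a "voteup_count" field, either at the top
    level or nested under "target". Adjust this to the real document shape.
    """
    answer = doc.get("target", doc)
    return int(answer.get("voteup_count", 0))


def top_answers(docs: Iterable[Dict], min_votes: int = 1000, limit: int = 99) -> List[Dict]:
    """Return up to `limit` items with at least `min_votes` upvotes, highest first."""
    popular = [d for d in docs if get_votes(d) >= min_votes]
    popular.sort(key=get_votes, reverse=True)
    return popular[:limit]


# Made-up sample data standing in for documents read from the database.
sample = [
    {"target": {"voteup_count": 5200, "excerpt": "a"}},
    {"target": {"voteup_count": 300, "excerpt": "b"}},
    {"voteup_count": 9100, "excerpt": "c"},
]
shortlist = top_answers(sample, min_votes=1000)
```

With the collection used in this post, the input could be db.answers.find(), since a pymongo cursor yields plain dicts.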

Read them often, learn them by heart, and use them when the mood is right and the feeling is real:

演道网

About The Author

shine
