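Speed comparison: Golang goroutines vs. Python coroutines
January 29, 2013
This experiment fetches 50 listing pages of poems. For each listing page it then fetches the HTML page behind every a tag (40 a tags per listing page) and does some simple parsing of that HTML. In total that is 50 + 50*40 = 2050 page requests, and the HTML of each of those pages is parsed.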
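The request count follows directly from the page structure described above. As a quick check, here is a minimal sketch; the function name `total_requests` is made up for illustration and does not appear in the test scripts.

```python
def total_requests(listing_pages: int, links_per_page: int) -> int:
    """Listing pages plus one detail page per link on each listing page."""
    return listing_pages + listing_pages * links_per_page


print(total_requests(50, 40))  # 2050
```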
1. Python speed
Total time: 31.947 s
Repeated runs came in at around 32 s.
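Since the figures here come from repeated runs, a small helper for timing several runs and reporting the spread may be useful. This is only a sketch: `time_runs` and `summarize` are hypothetical names, and the sleeping lambda is a placeholder for the real crawler.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Call func() `repeats` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (fastest, median, slowest) of the measured durations."""
    return min(durations), statistics.median(durations), max(durations)


if __name__ == '__main__':
    # Placeholder workload; in the real test this would be the crawler's entry point.
    runs = time_runs(lambda: time.sleep(0.01), repeats=3)
    print('fastest: %.3fs  median: %.3fs  slowest: %.3fs' % summarize(runs))
```

Reporting the fastest, median, and slowest run gives a fairer picture than a single total, because network latency varies a lot between runs.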

Source code of the Python test:

    from bs4 import BeautifulSoup
    import time
    import aiohttp
    import asyncio


    async def fetch(session, url):
        """GET url and return the body as bytes; raise on a non-200 status."""
        async with session.get(url) as resp:
            if resp.status != 200:
                raise Exception('http error, url:{} code:{}'.format(url, resp.status))
            return await resp.read()  # read() returns the raw bytes


    async def do_task(domain, pageUrl):
        # One session per listing page, reused for all of that page's requests.
        async with aiohttp.ClientSession() as session:
            html = await fetch(session, pageUrl)
            soup = BeautifulSoup(html, 'html.parser')
            # Follow every poem link on the listing page, one after another.
            for h in soup.select('h3>a'):
                url = ''.join([domain, h.get('href')])
                html = await fetch(session, url)
                print('url:{} title:{}'.format(url, parse_text(html)))


    def parse_text(html):
        # Extract the poem title from a detail page.
        soup = BeautifulSoup(html, 'html.parser')
        return str(soup.select('.shici-title')[0].get_text())


    async def main():
        domain = 'http://www.shicimingju.com'
        urlTemplate = domain + '/chaxun/zuozhe/9_{0}.html'
        pageNum = 50  # read 50 listing pages for the test
        # Pages are numbered 1..pageNum; one coroutine per listing page.
        tasks = [do_task(domain, urlTemplate.format(num)) for num in range(1, pageNum + 1)]
        await asyncio.gather(*tasks)


    if __name__ == '__main__':
        start = time.time()
        asyncio.run(main())
        print('Total time: %.3f s' % (time.time() - start))
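The concurrency structure of the Python script can be reproduced without any network access. The sketch below is a simulation, not the test itself: `fake_fetch` stands in for an HTTP request by sleeping for an assumed fixed latency, and all names in it are hypothetical. It starts one coroutine per listing page, and each coroutine then walks through its detail pages sequentially.

```python
import asyncio


async def fake_fetch(url, latency):
    """Stand-in for an HTTP GET: waits `latency` seconds, then returns the URL."""
    await asyncio.sleep(latency)
    return url


async def crawl_listing(page, links_per_page, latency, fetched):
    """One task per listing page; its detail pages are fetched one after another."""
    fetched.append(await fake_fetch('listing/%d' % page, latency))
    for link in range(links_per_page):
        fetched.append(await fake_fetch('detail/%d/%d' % (page, link), latency))


async def crawl(listing_pages, links_per_page, latency):
    """Run all listing-page tasks concurrently and return every fetched URL."""
    fetched = []
    await asyncio.gather(*(
        crawl_listing(page, links_per_page, latency, fetched)
        for page in range(1, listing_pages + 1)
    ))
    return fetched


if __name__ == '__main__':
    urls = asyncio.run(crawl(listing_pages=50, links_per_page=40, latency=0.001))
    print(len(urls))  # 2050
```

In this simulation only the listing-page tasks overlap, so with a fixed latency the wall-clock time is roughly (1 + 40) times the latency, however many listing pages there are. The Golang version has the same shape: one goroutine per listing page, with the detail pages fetched sequentially inside it.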
2. Golang speed
Total time: 15.366 s
Repeated runs were mostly around 15 s. The fastest few got down to 12 s or 13 s, and the slowest took 22 s.

Source code of the Golang test:

    package main

    import (
        "fmt"
        "strconv"
        "strings"
        "sync"
        "time"

        "github.com/PuerkitoBio/goquery"
    )

    var wg sync.WaitGroup

    // doTask fetches one listing page, then fetches and parses every poem
    // page it links to, one after another.
    func doTask(url string, domain string) {
        defer wg.Done() // mark this goroutine as finished on return
        // NewDocument performs the HTTP GET and parses the response body.
        p, err := goquery.NewDocument(url)
        if err != nil {
            panic(err)
        }
        p.Find("h3").Find("a").Each(func(i int, selection *goquery.Selection) {
            href, _ := selection.Attr("href")
            link := domain + href
            h, err := goquery.NewDocument(link)
            if err != nil {
                panic(err)
            }
            title := h.Find(".shici-title").Text()
            fmt.Printf("url:%s title:%s \n", link, title)
        })
    }

    func main() {
        start := time.Now()
        domain := "http://www.shicimingju.com"
        urlTemplate := domain + "/chaxun/zuozhe/9_{:num}.html"
        pageNum := 50
        wg.Add(pageNum) // one goroutine per listing page
        for page := 1; page <= pageNum; page++ {
            url := strings.Replace(urlTemplate, "{:num}", strconv.Itoa(page), -1)
            go doTask(url, domain)
        }
        wg.Wait()
        // time.Since avoids the precision loss of converting nanoseconds to float32.
        fmt.Printf("Total time: %.3f s \n", time.Since(start).Seconds())
    }
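3. Conclusion
In this test Golang finished in roughly half the time Python needed (about 15 s versus about 32 s), which is what one would hope for from a language designed for concurrency. PHP was not measured here, but I would expect Golang to come out ahead of it as well.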
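To put a number on the gap, the two reported totals can be divided directly. This is a small sketch using only the single-run figures quoted above, so it says nothing about run-to-run variation.

```python
python_seconds = 31.947  # total time reported for the Python run
golang_seconds = 15.366  # total time reported for the Golang run

speedup = python_seconds / golang_seconds
print('Golang was %.2fx faster in this run' % speedup)  # 2.08x
```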