【Python】 Notes on urllib and BeautifulSoup

Notes from writing a crawler in Python.

For the parser I use BeautifulSoup (the bs4 package) rather than the standard library's HTMLParser.
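To show what that choice saves, here is a rough sketch of collecting the text of every <li> element with the standard library's html.parser.HTMLParser alone. The LiTextCollector class and the sample markup are hypothetical illustrations; the real page's HTML is not shown in this memo.

```python
from html.parser import HTMLParser


class LiTextCollector(HTMLParser):
    """Hypothetical helper: gather the text content of every <li> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <li> elements we are currently inside
        self.current = []   # text fragments of the <li> being read
        self.items = []     # finished <li> texts

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == 'li':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'li' and self.depth > 0:
            self.depth -= 1
            # Flush only when leaving the outermost <li>; nested <li> text is folded in
            if self.depth == 0:
                self.items.append(''.join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        # Keep text only while inside an <li>, including text in child tags like <b>
        if self.depth > 0:
            self.current.append(data)


# Hypothetical markup, shaped like the records printed in the output below
sample_html = '<ul><li>vs <b>ソル</b>：295勝 10敗（96％）</li><li>vs カイ：204勝 1敗（99％）</li></ul>'
collector = LiTextCollector()
collector.feed(sample_html)
collector.close()
print(collector.items)
```

All of the state tracking is manual, and this sketch assumes every <li> is explicitly closed. With BeautifulSoup the equivalent is roughly `[li.get_text() for li in BeautifulSoup(html, 'html.parser').find_all('li')]`.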

【Example】

    # -*- coding: utf-8 -*-
    import re
    import urllib.parse
    import urllib.request

    from bs4 import BeautifulSoup

    if __name__ == '__main__':
        post_url = 'http://ggxrd.com/pg/member_record_battle_view.php'
        user_id = 1636
        character_name = 'sol'
        post_data = {
            'user_id': user_id,
            'character': character_name,
        }

        # Form-encode the fields; passing data= makes urlopen send a POST request
        encoded_post_data = urllib.parse.urlencode(post_data).encode('utf-8')
        with urllib.request.urlopen(post_url, data=encoded_post_data) as page:
            # Read the whole response body at once and decode it as UTF-8
            page_text = page.read().decode('utf-8')

        # Name the parser explicitly; otherwise bs4 picks one itself and emits a warning
        soup = BeautifulSoup(page_text, 'html.parser')
        for li in soup.find_all('li'):
            # Match against the element's text rather than its raw HTML
            record = re.search(r'vs.+％）', li.get_text())
            if record is not None:
                print(record.group(0))
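
The request step in the example can be inspected offline. The sketch below is an illustration, not part of the original crawler: it builds the same POST with urllib.request.Request instead of handing the URL straight to urlopen, which also gives a place to set headers. The User-Agent value is a hypothetical placeholder; whether the site expects one is not known.

```python
import urllib.parse
import urllib.request

# Same endpoint and form fields as the example above
post_url = 'http://ggxrd.com/pg/member_record_battle_view.php'
post_data = {'user_id': 1636, 'character': 'sol'}

# urlencode produces "key=value&key=value"; a request body must be bytes
body = urllib.parse.urlencode(post_data).encode('utf-8')

# Build the request object without sending it (User-Agent is a hypothetical placeholder)
request = urllib.request.Request(
    post_url,
    data=body,
    headers={'User-Agent': 'my-crawler/0.1 (hypothetical)'},
)

print(request.get_method())  # a Request that carries data defaults to POST
print(request.data)
```

Passing `request` to `urllib.request.urlopen()` sends it; when data is present and no Content-Type header is set, urllib adds `application/x-www-form-urlencoded` on its own. The output that follows comes from the main example.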

【Output】

vs ソル：295勝 10敗（96％）
vs カイ：204勝 1敗（99％）
vs メイ：65勝 1敗（98％）
vs ミリア：68勝 3敗（95％）
vs ザトー：77勝 14敗（84％）
vs ポチョムキン：18勝 1敗（94％）
vs チップ：25勝 2敗（92％）
vs ファウスト：93勝 37敗（71％）
vs アクセル：47勝 1敗（97％）
vs ヴェノム：21勝 2敗（91％）
vs スレイヤー：220勝 4敗（98％）
vs イノ：24勝 5敗（82％）
vs ベッドマン：31勝 1敗（96％）
vs ラムレザル：23勝 4敗（85％）
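
If the records are needed as data rather than printed text, each line can be split into fields with one more regular expression. This is a minimal sketch that assumes every record has exactly the shape shown above, "vs <name>：<W>勝 <L>敗（<P>％）", with a full-width colon, parentheses and percent sign; the Matchup and parse_record names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Matchup(NamedTuple):
    opponent: str
    wins: int
    losses: int
    win_rate: int  # percentage exactly as displayed


# Full-width "：", "（", "％", "）" are matched literally; the name is anything up to "："
RECORD_RE = re.compile(r'vs ([^：]+)：(\d+)勝 (\d+)敗（(\d+)％）')


def parse_record(text: str) -> Optional[Matchup]:
    """Turn one record line into a Matchup, or return None if it does not match."""
    m = RECORD_RE.search(text)
    if m is None:
        return None
    name, wins, losses, rate = m.groups()
    return Matchup(name, int(wins), int(losses), int(rate))


lines = [
    'vs ソル：295勝 10敗（96％）',
    'vs ファウスト：93勝 37敗（71％）',
]
records = [r for r in map(parse_record, lines) if r is not None]
for r in records:
    print(r.opponent, r.wins, r.losses, r.win_rate)
```

For every row above, the displayed percentage equals wins × 100 // (wins + losses), so the site appears to truncate rather than round (68勝 3敗 shows 95％, where rounding would give 96％). This is an observation from these rows only, not documented behaviour.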

【Reference】