新闻  |   论坛  |   博客  |   在线研讨会
Python 爬虫小记
tanry111 | 2018-09-11 10:19:25    阅读:254   发布文章

import urllib2

import re



def download(url):

    print "Downloading:",url

    try:

        html=urllib2.urlopen(url).read()

    except urllib2.URLError as e:

        print "Download error:",e.reason

        html=None

    return html

def crawl_sitemap(url):

    sitemap=download(url)

    links=re.findall('<a href="(.*?)" title',sitemap)

    txt=open("123.txt","w",)

    print links

    try:

        for link in links:

            html=download(link)

            page=re.findall('<p>(.*?)</p>',html)

    #        print page

            for i in page:

                txt.write(i)

                txt.write("\n")

    except Exception as e:

        print "Download error:",e

        html=None

    txt.close()

        

# Seed URL for the crawl (Douban Shanghai housing group).
url = "https://www.douban.com/group/shanghaizufang/"

# Guard the entry point so importing this module does not immediately
# start a network crawl (the original ran unconditionally at import time).
if __name__ == "__main__":
    crawl_sitemap(url)

      


参与讨论
登录后参与讨论
推荐文章
最近访客