"); //-->
import urllib2
import re


def download(url):
    # Fetch a page and return its HTML, or None on failure.
    print "Downloading:", url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print "Download error:", e.reason
        html = None
    return html


def crawl_sitemap(url):
    # Download the group page, collect the linked post URLs, then save the
    # <p> text of every linked page into 123.txt.
    sitemap = download(url)
    if sitemap is None:
        return
    links = re.findall('<a href="(.*?)" title', sitemap)
    print links
    txt = open("123.txt", "w")
    try:
        for link in links:
            html = download(link)
            if html is None:
                continue  # skip pages that failed to download
            page = re.findall('<p>(.*?)</p>', html)
            for i in page:
                txt.write(i)
                txt.write("\n")
    except Exception as e:
        print "Download error:", e
    finally:
        txt.close()


url = "https://www.douban.com/group/shanghaizufang/"
crawl_sitemap(url)
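
The script above is Python 2 code (urllib2, print statements). Below is a minimal sketch of the same crawler for Python 3, assuming only the standard library; urllib2 is split into urllib.request and urllib.error there. The browser-like User-Agent header is an added assumption, since some sites (Douban included) may refuse requests from the default Python client; it is not part of the original script.

import re
import urllib.error
import urllib.request


def download(url):
    # Fetch a page and return its decoded HTML, or None on failure.
    print("Downloading:", url)
    # Assumed header so the request looks like it comes from a browser.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        return urllib.request.urlopen(req).read().decode("utf-8", "ignore")
    except urllib.error.URLError as e:
        print("Download error:", e.reason)
        return None


def crawl_sitemap(url):
    # Same flow as above: collect post links, then write each page's <p> text to 123.txt.
    sitemap = download(url)
    if sitemap is None:
        return
    links = re.findall('<a href="(.*?)" title', sitemap)
    with open("123.txt", "w", encoding="utf-8") as txt:
        for link in links:
            html = download(link)
            if html is None:
                continue
            for paragraph in re.findall('<p>(.*?)</p>', html):
                txt.write(paragraph + "\n")


if __name__ == "__main__":
    crawl_sitemap("https://www.douban.com/group/shanghaizufang/")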