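1. Scrape the following stills from Douban Movie and save them locally:
This example scrapes only the first 5 pages of stills. First, build the links to those five pages: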
for i in range(5):
    url = 'https://movie.douban.com/subject/26794435/photos?type=S&start='+str(i*30)+'&sortby=like&size=a&subtype=a'
    print(url)
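The same five links can also be built from a dictionary of query parameters, which keeps the paging logic readable. This is only a sketch: the helper name `page_urls` and its `pages` and `page_size` arguments are made up for illustration, and the page size of 30 is inferred from the `i*30` offset in the loop above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/subject/26794435/photos'


def page_urls(pages=5, page_size=30):
    """Return the photo-list URLs for the first `pages` pages (hypothetical helper)."""
    urls = []
    for i in range(pages):
        # Same parameters, in the same order, as the hand-built URL string
        params = {
            'type': 'S',
            'start': i * page_size,
            'sortby': 'like',
            'size': 'a',
            'subtype': 'a',
        }
        urls.append(BASE_URL + '?' + urlencode(params))
    return urls


urls = page_urls()
for url in urls:
    print(url)
```

Only `start` changes between pages, so scraping more or fewer pages is a matter of changing `pages`.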
As the figure below shows, the stills are img elements inside a ul tag (class poster-col3 clearfix).
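To make that structure concrete, here is a small offline sketch that parses a hand-written HTML fragment. The fragment is an assumption that only imitates the layout described above (a ul with class poster-col3 clearfix containing img tags); the real page has more markup around it. The helper name `extract_image_links` and the example.com URLs are also made up for illustration, and the built-in `html.parser` is used here so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the gallery markup; not copied from the site
SAMPLE_HTML = """
<ul class="poster-col3 clearfix">
  <li><div class="cover"><a href="#"><img src="https://example.com/p1.jpg"></a></div></li>
  <li><div class="cover"><a href="#"><img src="https://example.com/p2.jpg"></a></div></li>
</ul>
"""


def extract_image_links(html):
    """Return the src of every <img> inside the gallery <ul>, or [] if it is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    gallery = soup.find('ul', class_='poster-col3 clearfix')
    if gallery is None:
        # find() returns None when the tag is absent, e.g. on an error page
        return []
    return [img['src'] for img in gallery.find_all('img') if img.get('src')]


links = extract_image_links(SAMPLE_HTML)
print(links)
```

Checking for `None` matters because chaining `.find_all()` directly onto `find()` raises an `AttributeError` when the server returns a page without the gallery.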
# Import the required modules
import os
import requests
from bs4 import BeautifulSoup

# Build the links to the first five pages (the start offset advances by 30 per page)
url_list = []
for i in range(5):
    url = 'https://movie.douban.com/subject/26794435/photos?type=S&start='+str(i*30)+'&sortby=like&size=a&subtype=a'
    url_list.append(url)

# Collect the img tags from every page
imag_link = []
for u in url_list:
    txt = requests.get(u).text  # request the page
    soup = BeautifulSoup(txt, 'lxml')  # parse the page
    tags = soup.find('ul', class_='poster-col3 clearfix').find_all('img')  # collect the image tags
    imag_link.extend(tags)
imgSrc = [x['src'] for x in imag_link]  # extract the image links

# Save the images locally
n = 0
for s in imgSrc:
    n += 1
    i = requests.get(s)
    with open('%s.jpg' % n, 'wb') as f:
        f.write(i.content)
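The save loop above writes into the current directory and assumes every request succeeds. A slightly more defensive variant might look like the sketch below. Everything named here is an assumption for illustration: the `HEADERS` value, the `stills` output folder, and the helpers `target_path` and `download_all` are not part of the original script. The browser-like User-Agent is included because some sites reject the default one sent by requests; whether it is needed here depends on the site's current behavior.

```python
import os
from urllib.parse import urlparse

import requests

# Assumption: a browser-like User-Agent; adjust or drop as needed
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def target_path(out_dir, index, url):
    """Build the local file path, reusing the extension from the URL (default .jpg)."""
    ext = os.path.splitext(urlparse(url).path)[1] or '.jpg'
    return os.path.join(out_dir, '%s%s' % (index, ext))


def download_all(urls, out_dir='stills'):
    """Download each URL into out_dir and return the list of saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for n, url in enumerate(urls, start=1):
        resp = requests.get(url, headers=HEADERS, timeout=10)
        if resp.status_code != 200:
            # Skip images that could not be fetched instead of saving an error page
            continue
        path = target_path(out_dir, n, url)
        with open(path, 'wb') as f:
            f.write(resp.content)
        saved.append(path)
    return saved
```

With the list from the script above, the call would be `download_all(imgSrc)`, and the returned paths show which images were actually written.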
3. You can now view the saved images: