# -*- coding: utf-8 -*-
# Scrape a novel with BeautifulSoup and merge all chapters into one txt file.
"""
======================
@Auther:CacheYu
@Time:2019/9/16:16:09
======================
"""
# -*- coding:utf-8 -*-
import urllib.request
import urllib.error
import bs4
from bs4 import BeautifulSoup
def readdown(url):
    """Fetch one chapter page and return its title plus body text.

    Parameters
    ----------
    url : str
        Absolute URL of a single chapter page.

    Returns
    -------
    str
        The chapter title, a newline, then the chapter body text.
    """
    # Use the response as a context manager so the socket is closed
    # promptly (the original leaked the open HTTP response).
    with urllib.request.urlopen(url) as response:
        soup = BeautifulSoup(response, 'html.parser')
    table = soup.find('table', attrs={'id': 'tabletxt'})
    # The <i> tag inside the chapter table carries the chapter title.
    title = table.find('i').string
    print(title)  # progress indicator while scraping
    # Chapter body lives in the first <div class="txt"> within the table.
    div = table.find_all('div', attrs={'class': 'txt'})
    content = div[0].get_text().strip()
    return title + '\n' + content
# --- Script body: scrape the chapter index, then download every chapter ---
page_url = 'https://www.dushiyanqing.net/book/90/90659/index.html'
book = r'E:\story\谁把风声听成离别歌.txt'
default_encode = 'utf-8'

# Parse the index page once; close the HTTP response when parsing is done.
with urllib.request.urlopen(page_url) as response:
    soup = BeautifulSoup(response, 'html.parser')

table = soup.find('table')
if isinstance(table, bs4.element.Tag):
    # Each chapter link sits inside a <td class="k4"> cell of the index table.
    tds = table.find_all('td', attrs={'class': 'k4'})
    print('开始写入,请稍等……')
    # Mode 'w' creates the file if missing and truncates stale content;
    # the original 'r+' raised FileNotFoundError on a first run and left
    # trailing bytes from any previous, longer file.
    with open(book, 'w', encoding=default_encode) as target_file_writer:
        for td in tds:
            a = td.find('a')
            if a is not None:
                # hrefs on the index page are site-relative; prefix the host.
                href = 'https://www.dushiyanqing.net' + a.get('href')
                target_file_writer.write(readdown(href))
    print('已完成!\n目录地址为:', book)