When first learning Python web scraping, most people start with the simplest approaches. Below are several common basic patterns, from a plain sequential loop to thread and process pools.
""" 简单的循环处理 """ import requests url_list = [ "https://www.baidu.com" , "https://www.cnblogs.com/" ] for url in url_list: result = requests.get(url) print (result.text) """ 线程池处理 """ import requests from concurrent.futures import ThreadPoolExecutor def fetch_request ( url ): result = requests.get( url ) print (result.text) url_list = [ "https://www.baidu.com/" , "https://www.cnblogs.com/" ] pool = ThreadPoolExecutor( 10 ) for url in url_list: # 线程池中获取线程,执行fetch_request方法 pool.submit(fetch_request , url) # 关闭线程池 pool.shutdown() """ 线程池+回调函数 """ import requests from concurrent.futures import ThreadPoolExecutor def fetch_async ( url ): response = requests.get( url ) return response def callback ( future ): print ( future .result().text) url_list = [ "https://www.baidu.com/" , "https://www.cnblogs.com/" ] pool = ThreadPoolExecutor( 10 ) for url in url_list: v = pool.submit(fetch_async , url) # 调用回调函数 v.add_done_callback(callback) pool.shutdown() """ 进程池处理 """ import requests from concurrent.futures import ProcessPoolExecutor def fetch_requst ( url ): result = requests.get( url ) print (result.text) url_list = [ "https://www.baidu.com/" , "https://www.cnblogs.com/" ] if __name__ == '__main__' : pool = ProcessPoolExecutor( max_workers = 10 ) for url in url_list: pool.submit(fetch_requst , url) pool.shutdown() """ 进程池+回调函数 """ import requests from concurrent.futures import ProcessPoolExecutor def fetch_async ( url ): response = requests.get( url ) return response def callback ( future ): print ( future .result().text) url_list = [ "https://www.baidu.com/" , "https://www.cnblogs.com/" ] if __name__ == '__main__' : pool = ProcessPoolExecutor( 10 ) for url in url_list: v = pool.submit(fetch_async , url) v.add_done_callback(callback) pool.shutdown()