I typed in the code exactly as the book shows it, but it keeps raising an error
Posted in: Python Book Q&A
2021-03-20
Book: 《Python网络爬虫从入门到实践》 (Python Web Crawlers: From Beginner to Practice), Chapter 5, "The requests Request Module", page 94
import requests
from lxml import etree
import pandas as pd

ip_list = []


def get_ip(usl, fl):
    response = requests.get(usl, headers=fl)
    response.encoding = 'utf-8'
    if response.status_code == 200:
        html = etree.HTML(response.text)
        li_all = html.xpath('//li[@class="f-list col-lg-12 col-md-12 col-sm-12 col-xs-12"]')
        for j in li_all:
            ip = j.xpath('span[@class="f-address"]/text()')
            port = j.xpath('span[@class="f-port"]/text()')
            ip_list.append(ip + ':' + port)
            print('Proxy IP:', ip, 'port:', port)


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/89.0.4389.82 Safari/537.36'}

if __name__ == '__main__':
    ip_table = pd.DataFrame(columns=['ip'])
    for i in range(1, 5):
        url = 'https://www.dieniao.com/FreeProxy/{page}.html'.format(page=i)
        get_ip(url, headers)
    ip_table['ip'] = ip_list
    ip_table.to_excel('ip.xlsx', sheet_name='data')
Could someone please help me sort this out?
Edited on 2021-03-20 09:10:45
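The post does not say which error is raised, so the following is only a likely explanation. lxml's `xpath()` returns a list for a query like `span[@class="f-address"]/text()`, even when exactly one node matches. That means `ip` and `port` in the line `ip_list.append(ip + ':' + port)` are both lists, and adding a string to a list raises a `TypeError`. Below is a minimal sketch of one way around it. The helper names `first_text` and `join_ip_port` are hypothetical, and the two sample lists are stand-ins for what `xpath()` would return, so the snippet runs without any network access.

```python
def first_text(nodes):
    """Return the first xpath text result, stripped, or None if nothing matched."""
    return nodes[0].strip() if nodes else None


def join_ip_port(ip_nodes, port_nodes):
    """Build an 'ip:port' string from two xpath result lists.

    Returns None when either list is empty or holds only whitespace.
    """
    ip = first_text(ip_nodes)
    port = first_text(port_nodes)
    if not ip or not port:
        return None
    return ip + ':' + port


# Stand-ins (assumed values) for the lists that j.xpath(...) would return.
ip_nodes = ['192.0.2.10 ']
port_nodes = ['8080']

# Reproduce the failure: list + str is not allowed.
error_message = None
try:
    ip_nodes + ':' + port_nodes
except TypeError as exc:
    error_message = str(exc)
print('Original expression fails with:', error_message)

# Take the first element of each list before joining.
proxy = join_ip_port(ip_nodes, port_nodes)
print('Joined proxy address:', proxy)
```

Inside `get_ip`, the same idea would be to call `join_ip_port(ip, port)` and append the result to `ip_list` only when it is not `None`, so rows where the page layout does not match are skipped instead of crashing the loop.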