天天C向上

Python crawler XPath problem

Posted in Python图书答疑 (Python book Q&A), 2020-02-20

Bounty: 30 credits. 《Python编程入门指南》, Chapter 16, web crawler frameworks, pages 270-271

Python version: 3.8 (64-bit)

OS version: Windows 10 Pro 1909

Crawler framework: Scrapy

Crawl target: https://www.ixigua.com/home/2352595849457134/video/

Source code

import scrapy  # import the framework
# import the CrawlerProcess class
from scrapy.crawler import CrawlerProcess
# import the helper that loads the project settings
from scrapy.utils.project import get_project_settings


class QuotesSpider(scrapy.Spider):
    name = "quotes"  # spider name

    def start_requests(self):
        # target URLs to crawl
        urls = [
            'https://www.ixigua.com/home/2352595849457134/video/',
        ]
        # send one request per URL
        for url in urls:
            # issue the network request
            yield scrapy.Request(url=url, callback=self.parse)

    # handle the response
    def parse(self, response):
        # select every card element; an attribute test is written [@class='...']
        for quote in response.xpath("//*[@class='HorizontalFeedCard']"):
            # title
            text = quote.xpath(".//*[@class='HorizontalFeedCard__title']/text()").extract_first()
            # publication time
            author = quote.xpath(".//*[@class='HorizontalFeedCard-accessories-bottomInfo__statistics']").extract_first()
            print(dict(text=text, author=author))


# program entry point
if __name__ == '__main__':
    # create the CrawlerProcess object with the project settings
    process = CrawlerProcess(get_project_settings())
    # name of the spider to start (looked up among the project's spiders)
    process.crawl('quotes')
    # start crawling
    process.start()
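
The attribute-predicate form `//*[@class='name']` can be tried offline before running the spider. The sketch below is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree`, which supports a subset of XPath, instead of Scrapy's selectors, and the `SAMPLE` markup and the `extract_cards` helper are hypothetical stand-ins, not the real page. ElementTree has no `/text()` step, so the text is read from the `.text` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page markup. The structure of
# the real page is an assumption and may differ.
SAMPLE = """
<div>
  <div class="HorizontalFeedCard">
    <a class="HorizontalFeedCard__title">First video</a>
    <span class="HorizontalFeedCard-accessories-bottomInfo__statistics">2020-02-01</span>
  </div>
  <div class="HorizontalFeedCard">
    <a class="HorizontalFeedCard__title">Second video</a>
    <span class="HorizontalFeedCard-accessories-bottomInfo__statistics">2020-02-10</span>
  </div>
</div>
"""


def extract_cards(markup):
    """Return one dict per card, mirroring the fields printed in parse()."""
    root = ET.fromstring(markup)
    rows = []
    # [@class='...'] matches only when the whole class attribute equals the value
    for card in root.findall(".//*[@class='HorizontalFeedCard']"):
        title = card.find(".//*[@class='HorizontalFeedCard__title']")
        stats = card.find(
            ".//*[@class='HorizontalFeedCard-accessories-bottomInfo__statistics']"
        )
        rows.append({
            "text": title.text if title is not None else None,
            "author": stats.text if stats is not None else None,
        })
    return rows


if __name__ == "__main__":
    for row in extract_cards(SAMPLE):
        print(row)
```

If the same expressions return nothing against the live response in Scrapy, the markup actually downloaded may simply not contain those class names, for example when the page builds its list with JavaScript after loading.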