Scrapy 2.3: Spider Arguments

Updated 2021-06-09 10:03

You can pass command-line arguments to your spiders by using the -a option when running them:

scrapy crawl quotes -O quotes-humor.json -a tag=humor

These arguments are passed to the spider's __init__ method and become spider attributes by default.
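
The idea can be sketched without Scrapy installed. MiniSpider below is a hypothetical stand-in class, not part of Scrapy; it only mimics, in simplified form, how keyword arguments received by __init__ can end up as attributes on the instance.

```python
class MiniSpider:
    """Hypothetical stand-in for a spider base class (not part of Scrapy)."""

    name = "quotes"

    def __init__(self, **kwargs):
        # Copy every keyword argument onto the instance, so that a value
        # passed as "-a tag=humor" would end up readable as self.tag.
        self.__dict__.update(kwargs)


# Roughly what happens for: scrapy crawl quotes -a tag=humor
spider = MiniSpider(tag="humor")
print(spider.tag)

# An argument that was never passed is simply absent from the instance,
# which is why reading it with getattr and a default is the safe pattern.
print(getattr(MiniSpider(), "tag", None))
```

Values given with -a arrive as strings, so anything meant to be a number or a list has to be converted inside the spider.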

In this example, the value provided for the tag argument is available as self.tag. You can use it to make your spider fetch only quotes with a specific tag, building the URL from the argument:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        url = 'http://quotes.toscrape.com/'
        tag = getattr(self, 'tag', None)
        if tag is not None:
            url = url + 'tag/' + tag
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }

        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

If you pass tag=humor to this spider, you will notice that it only visits URLs from the humor tag, such as http://quotes.toscrape.com/tag/humor.
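
The URL logic in start_requests can be checked on its own, without running a crawl. The sketch below pulls it into a helper; build_start_url and BASE_URL are hypothetical names introduced here for illustration and do not appear in the spider above.

```python
BASE_URL = "http://quotes.toscrape.com/"


def build_start_url(tag=None):
    """Return the start URL, narrowed to a single tag when one is given."""
    if tag is not None:
        # Same concatenation as in start_requests: base + 'tag/' + tag
        return BASE_URL + "tag/" + tag
    return BASE_URL


print(build_start_url("humor"))  # http://quotes.toscrape.com/tag/humor
print(build_start_url())         # http://quotes.toscrape.com/
```

Keeping the URL construction in a plain function like this is one possible design choice: it makes the with-tag and without-tag cases easy to test separately from the crawl.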
