Scrapy 2.3 Coroutines

Updated 2021-06-17 16:51

New in version 2.0.

Scrapy has partial support for the coroutine syntax.

Supported callables

The following callables may be defined as coroutines using `async def`, and can therefore use coroutine syntax (e.g. `await`, `async for`, `async with`):

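`Request` callbacks. Note: the callback output is not processed until the whole callback finishes. As a side effect, if the callback raises an exception, none of its output is processed. This is a known caveat of the current implementation that we aim to address in a future version of Scrapy.
The `process_item()` method of item pipelines.
The `process_request()`, `process_response()`, and `process_exception()` methods of downloader middlewares.
Signal handlers that support deferreds.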
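The caveat about callback output can be illustrated with a small standard-library model. This is only a sketch of the behavior described above, not Scrapy's implementation: the helper `collect` and both callbacks are hypothetical names, and the callbacks are plain async generators rather than real `Request` callbacks.

```python
import asyncio

async def collect(agen):
    """Gather every value from an async generator before returning any of them."""
    return [value async for value in agen]

async def good_callback():
    # Hypothetical callback that yields two items without error.
    yield {"n": 1}
    await asyncio.sleep(0)
    yield {"n": 2}

async def failing_callback():
    # Hypothetical callback that yields one item and then fails.
    yield {"n": 1}
    raise ValueError("boom")

async def run(callback):
    try:
        return await collect(callback())
    except ValueError:
        # The item yielded before the exception never reaches the caller.
        return []

good_output = asyncio.run(run(good_callback))
failed_output = asyncio.run(run(failing_callback))
print(good_output)
print(failed_output)
```

Because the output is only handed over once the callback has finished, the item produced before the exception is lost along with everything after it.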

Usage

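There are several use cases for coroutines in Scrapy. Code that would return Deferreds when written for previous Scrapy versions, such as downloader middlewares and signal handlers, can be rewritten to be shorter and cleaner: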

from itemadapter import ItemAdapter

# `db` is a placeholder for a client whose get_some_data() returns a Deferred.
class DbPipeline:
    def _update_item(self, data, item):
        # Runs once the Deferred fires with the fetched data.
        adapter = ItemAdapter(item)
        adapter['field'] = data
        return item

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        dfd = db.get_some_data(adapter['id'])
        dfd.addCallback(self._update_item, item)
        return dfd

becomes:

from itemadapter import ItemAdapter

# `db` is the same placeholder client; its result is awaited directly.
class DbPipeline:
    async def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        adapter['field'] = await db.get_some_data(adapter['id'])
        return item
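To try the coroutine form without a real database, a minimal standard-library sketch could look like the following. `FakeDb` is a hypothetical stand-in for the `db` object, the item is a plain dict rather than an `ItemAdapter`, and `asyncio.run` drives the coroutine directly, whereas in a crawl Scrapy itself calls `process_item`.

```python
import asyncio

class FakeDb:
    """Hypothetical stand-in for an asynchronous database client."""
    def __init__(self, rows):
        self.rows = rows

    async def get_some_data(self, key):
        await asyncio.sleep(0)  # simulate non-blocking I/O
        return self.rows[key]

db = FakeDb({1: "alpha", 2: "beta"})

class DbPipeline:
    async def process_item(self, item, spider):
        # Await the lookup and store the result on the item.
        item["field"] = await db.get_some_data(item["id"])
        return item

# Outside Scrapy, the coroutine has to be driven by hand.
result = asyncio.run(DbPipeline().process_item({"id": 2}, spider=None))
print(result)
```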

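Coroutines may be used to call asynchronous code. This includes other coroutines, functions that return Deferreds, and functions that return awaitable objects such as `Future`. This means you can use many useful Python libraries providing such code: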

import aiohttp
import treq
from scrapy import Spider

class MySpider(Spider):
    # ...
    async def parse_with_deferred(self, response):
        additional_response = await treq.get('https://additional.url')
        additional_data = await treq.content(additional_response)
        # ... use response and additional_data to yield items and requests

    async def parse_with_asyncio(self, response):
        async with aiohttp.ClientSession() as session:
            async with session.get('https://additional.url') as additional_response:
                additional_data = await additional_response.text()
        # ... use response and additional_data to yield items and requests
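As a smaller illustration of awaiting different kinds of objects, the sketch below awaits a coroutine and an `asyncio.Future` using only the standard library. The names `fetch_number` and `main` are hypothetical, and Deferreds are left out because they require Twisted.

```python
import asyncio

async def fetch_number():
    # A coroutine: awaiting it runs the body and returns its result.
    await asyncio.sleep(0)
    return 1

async def main():
    loop = asyncio.get_running_loop()
    # A Future: awaiting it suspends until someone sets a result.
    future = loop.create_future()
    loop.call_soon(future.set_result, 2)
    first = await fetch_number()
    second = await future
    return first + second

total = asyncio.run(main())
print(total)
```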

Note

Many libraries that use coroutines, such as aio-libs, require the asyncio loop; to use them you need to enable asyncio support in Scrapy.
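Enabling asyncio support is typically a matter of selecting the asyncio-based Twisted reactor in the project settings. A minimal fragment, assuming a standard project `settings.py`:

```python
# settings.py
# Install the asyncio-based Twisted reactor so asyncio libraries can be awaited.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```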

Common use cases for asynchronous code include:

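requesting data from websites, databases and other services (in callbacks, pipelines and middlewares);
storing data in databases (in pipelines and middlewares);
delaying the spider initialization until some external event (in the `spider_opened` handler);
calling asynchronous Scrapy methods like ExecutionEngine.download (see the screenshot pipeline example).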
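The idea of delaying spider initialization until an external event can be modeled with the standard library alone. In this sketch `ReadyGate` is a hypothetical extension class, the spider is represented by a plain string, and the handler is called by hand. In a real project the handler would be connected to the `spider_opened` signal, and waiting on an `asyncio.Event` would presumably also need asyncio support enabled.

```python
import asyncio

class ReadyGate:
    """Hypothetical extension whose handler blocks until an external event fires."""
    def __init__(self, ready):
        self.ready = ready
        self.log = []

    async def spider_opened(self, spider):
        self.log.append("waiting")
        await self.ready.wait()  # suspend until the event is set
        self.log.append("started " + spider)

async def main():
    ready = asyncio.Event()
    gate = ReadyGate(ready)
    handler = asyncio.ensure_future(gate.spider_opened("myspider"))
    await asyncio.sleep(0)  # let the handler start and block on the event
    gate.log.append("external event")
    ready.set()
    await handler
    return gate.log

log = asyncio.run(main())
print(log)
```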
