Scrapy 2.3: Using text nodes in a condition

Updated 2021-06-03 14:37

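When you need to pass text content as an argument to an XPath string function, avoid `.//text()` and use just `.` instead.

This is because the expression `.//text()` yields a collection of text nodes, that is, a node-set. When a node-set is converted to a string, which is what happens when it is passed as an argument to a string function such as `contains()` or `starts-with()`, the result is the text of the first node only.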

Example:

>>> from scrapy import Selector
>>> sel = Selector(text='<a href="#">Click here to go to the <strong>Next Page</strong></a>')

Converting a node-set to a string:

>>> sel.xpath('//a//text()').getall() # take a peek at the node-set
['Click here to go to the ', 'Next Page']
>>> sel.xpath("string(//a[1]//text())").getall() # convert it to string
['Click here to go to the ']

A single node converted to a string, however, combines its own text with the text of all its descendants:

>>> sel.xpath("//a[1]").getall() # select the first node
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']
>>> sel.xpath("string(//a[1])").getall() # convert it to string
['Click here to go to the Next Page']

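So, using the `.//text()` node-set in the condition selects nothing in this case, because only its first text node ('Click here to go to the ') is tested: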

>>> sel.xpath("//a[contains(.//text(), 'Next Page')]").getall()
[]

But using `.` to refer to the node itself works:

>>> sel.xpath("//a[contains(., 'Next Page')]").getall()
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']
