11.1 Interacting with HTTP Services as a Client

Updated 2018-02-24 15:27

Problem

You need to access various services over HTTP as a client, for example to download data or to interact with a REST-based API.

Solution

For simple things, the urllib.request module is usually enough. For example, to send a simple HTTP GET request to a remote service, do something like this:

from urllib import request, parse

# Base URL being accessed
url = 'http://httpbin.org/get'

# Dictionary of query parameters (if any)
parms = {
   'name1' : 'value1',
   'name2' : 'value2'
}

# Encode the query string
querystring = parse.urlencode(parms)

# Make a GET request and read the response
u = request.urlopen(url+'?' + querystring)
resp = u.read()
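
Before a request goes out, it can help to look at the exact URL that parse.urlencode() produces, since special characters in values are escaped. The following is a minimal offline sketch, not part of the original recipe; build_url is a hypothetical helper name and the parameter values are made up for illustration:

```python
from urllib import parse

def build_url(base, params):
    """Hypothetical helper: append URL-encoded query parameters to a base URL."""
    if not params:
        return base
    return base + '?' + parse.urlencode(params)

# The space and the ampersand inside the second value are escaped by urlencode()
full_url = build_url('http://httpbin.org/get',
                     {'name1': 'value1', 'q': 'a b&c'})
print(full_url)   # http://httpbin.org/get?name1=value1&q=a+b%26c
```

Note that u.read() in the recipe returns bytes; decoding them to text (for example with resp.decode('utf-8'), assuming the server sent UTF-8) is a separate step.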

If you need to send the query parameters in the request body using the POST method, encode them and supply them as an optional argument to urlopen(), like this:

from urllib import request, parse

# Base URL being accessed
url = 'http://httpbin.org/post'

# Dictionary of query parameters (if any)
parms = {
   'name1' : 'value1',
   'name2' : 'value2'
}

# Encode the query string
querystring = parse.urlencode(parms)

# Make a POST request and read the response
u = request.urlopen(url, querystring.encode('ascii'))
resp = u.read()
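
urllib picks the HTTP method based on whether a request body is present. A small offline sketch (nothing is sent over the network) that makes this visible by constructing Request objects and inspecting them:

```python
from urllib import parse, request

url = 'http://httpbin.org/post'
body = parse.urlencode({'name1': 'value1', 'name2': 'value2'}).encode('ascii')

# No network traffic here: the Request objects are only built and inspected
get_req = request.Request(url)
post_req = request.Request(url, data=body)

print(get_req.get_method())    # GET
print(post_req.get_method())   # POST
print(post_req.data)           # b'name1=value1&name2=value2'
```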

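If you need to supply custom HTTP headers in the outgoing request, such as a change to the user-agent field, make a dictionary containing their values, create a Request instance, and pass it to urlopen(), like this: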

from urllib import request, parse
...

# Extra headers
headers = {
    'User-agent' : 'none/ofyourbusiness',
    'Spam' : 'Eggs'
}

req = request.Request(url, querystring.encode('ascii'), headers=headers)

# Make a request and read the response
u = request.urlopen(req)
resp = u.read()
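
To check which headers a Request will carry before sending it, the object can be inspected directly. This is a minimal sketch with a made-up request body; nothing is sent:

```python
from urllib import request

headers = {'User-agent': 'none/ofyourbusiness', 'Spam': 'Eggs'}
req = request.Request('http://httpbin.org/post',
                      data=b'name1=value1', headers=headers)

# Inspect the headers stored on the Request object
print(req.has_header('Spam'))          # True
print(req.get_header('User-agent'))    # none/ofyourbusiness
print(sorted(req.header_items()))
```

Request stores header names in str.capitalize() form (first letter upper case, the rest lower case), so lookups with has_header() and get_header() should use that spelling.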

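If your interaction with a service is more complicated than the examples above, you should probably look at the requests library (https://pypi.python.org/pypi/requests). For example, here is equivalent requests code for the preceding operations: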

import requests

# Base URL being accessed
url = 'http://httpbin.org/post'

# Dictionary of query parameters (if any)
parms = {
   'name1' : 'value1',
   'name2' : 'value2'
}

# Extra headers
headers = {
    'User-agent' : 'none/ofyourbusiness',
    'Spam' : 'Eggs'
}

resp = requests.post(url, data=parms, headers=headers)

# Decoded text returned by the request
text = resp.text

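A notable feature of requests is the number of ways it can return the content of a response. As shown in the code above, resp.text gives you the Unicode-decoded text of the response. If you access resp.content instead, you get the raw binary content. And if you call resp.json(), you get the response content parsed as JSON.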
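
The relationship between these three views can be sketched with the standard library alone. This is only an approximation of what requests does: the payload is made up, and UTF-8 is assumed, whereas requests works out the encoding from the response itself.

```python
import json

# Stand-in for resp.content: raw bytes as they might arrive from a server
raw = '{"name": "Dave", "place": "caf\u00e9"}'.encode('utf-8')

text = raw.decode('utf-8')   # roughly what resp.text provides
data = json.loads(text)      # roughly what resp.json() provides

print(type(raw).__name__, type(text).__name__, type(data).__name__)
# bytes str dict
```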

Here is an example of using requests to make a HEAD request and extract a few fields of header data from the response:

import requests

resp = requests.head('http://www.python.org/index.html')

status = resp.status_code
last_modified = resp.headers['last-modified']
content_type = resp.headers['content-type']
content_length = resp.headers['content-length']

Here is a requests example that logs in to the Python Package Index using basic authentication:

import requests

resp = requests.get('http://pypi.python.org/pypi?:action=login',
                    auth=('user','password'))
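
For context, basic authentication amounts to sending an Authorization header that carries the Base64-encoded user name and password. A minimal sketch of how that header value is formed; basic_auth_header is a hypothetical helper name, and in practice the auth= argument above takes care of this:

```python
import base64

def basic_auth_header(user, password):
    """Hypothetical helper: build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(('%s:%s' % (user, password)).encode('utf-8'))
    return 'Basic ' + token.decode('ascii')

header_value = basic_auth_header('user', 'password')
print(header_value)   # Basic dXNlcjpwYXNzd29yZA==
```

The credentials are only encoded, not encrypted, so basic authentication is normally paired with HTTPS.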

Here is an example of using requests to pass HTTP cookies from one request to the
next:

import requests

# First request
resp1 = requests.get(url)
...

# Second request, with the cookies received on the first request
resp2 = requests.get(url, cookies=resp1.cookies)

Last, but not least, here is an example of using requests to upload content:

import requests

url = 'http://httpbin.org/post'

# Open the file in binary mode; the with block closes it once the upload is done
with open('data.csv', 'rb') as f:
    files = { 'file': ('data.csv', f) }
    r = requests.post(url, files=files)

Discussion

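For really simple HTTP client code, the built-in urllib module is usually fine. However, if you have to do anything beyond simple GET or POST requests, you really can't rely on its functionality. This is where a third-party module such as requests comes in handy.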

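For example, if you decide to stick with the standard library rather than a third-party library such as requests, you might have to implement your code using the low-level http.client module. The following code shows how to perform a HEAD request: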

from http.client import HTTPConnection

c = HTTPConnection('www.python.org', 80)
c.request('HEAD', '/index.html')
resp = c.getresponse()

print('Status', resp.status)
for name, value in resp.getheaders():
    print(name, value)

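Similarly, if you have to write code involving proxies, authentication, cookies, and other details, using urllib is awkward and verbose. For example, here is a sample of code that authenticates to the Python Package Index: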

import urllib.request

auth = urllib.request.HTTPBasicAuthHandler()
auth.add_password('pypi','http://pypi.python.org','username','password')
opener = urllib.request.build_opener(auth)

r = urllib.request.Request('http://pypi.python.org/pypi?:action=login')
u = opener.open(r)
resp = u.read()

# From here, you can access more pages using opener
...
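
Cookie handling with only the standard library follows the same opener-based pattern. A minimal sketch, assuming cookies should simply be kept in memory; no request is sent here:

```python
import http.cookiejar
import urllib.request

# A CookieJar collects cookies set by responses and replays them on later requests
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Requests made with opener.open(...) would now share cookies through jar.
# Nothing has been sent in this sketch, so the jar is still empty.
print(len(jar))   # 0
```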

Frankly, all of this is much easier in requests.

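Testing HTTP client code during development can often be frustrating because of all the tricky details you need to worry about (e.g., cookies, authentication, headers, and encodings). To help with this, consider using the httpbin service (http://httpbin.org). This site receives requests and then echoes information about them back to you as a JSON response. Here is an interactive example: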

>>> import requests
>>> r = requests.get('http://httpbin.org/get?name=Dave&n=37',
...     headers = { 'User-agent': 'goaway/1.0' })
>>> resp = r.json()
>>> resp['headers']
{'User-Agent': 'goaway/1.0', 'Content-Length': '', 'Content-Type': '',
'Accept-Encoding': 'gzip, deflate, compress', 'Connection':
'keep-alive', 'Host': 'httpbin.org', 'Accept': '*/*'}
>>> resp['args']
{'name': 'Dave', 'n': '37'}
>>>

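Experimenting against a site such as httpbin.org before interacting with a real site is often a good idea. It is especially useful when there is a risk that the real site will lock your account after three failed login attempts (i.e., don't try to learn how to write an HTTP authentication client by logging in to your bank).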
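
When no network is available, the same echo idea can be approximated locally. The following is a sketch only, not httpbin itself: EchoHandler and fetch_echo are hypothetical names, and the handler echoes just the query arguments and request headers of a GET request.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for httpbin: echoes query args and headers as JSON."""

    def do_GET(self):
        query = parse.urlsplit(self.path).query
        payload = {
            'args': dict(parse.parse_qsl(query)),
            'headers': dict(self.headers.items()),
        }
        body = json.dumps(payload).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Suppress the default per-request log line


def fetch_echo(params, headers):
    """Start a throwaway local server, send one GET to it, return the decoded JSON."""
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)   # Port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = 'http://127.0.0.1:%d/get?%s' % (server.server_port,
                                              parse.urlencode(params))
        # An empty ProxyHandler keeps the request from going through a configured proxy
        opener = request.build_opener(request.ProxyHandler({}))
        req = request.Request(url, headers=headers)
        with opener.open(req, timeout=5) as u:
            return json.loads(u.read().decode('utf-8'))
    finally:
        server.shutdown()
        server.server_close()


echo = fetch_echo({'name': 'Dave', 'n': '37'}, {'User-agent': 'goaway/1.0'})
print(echo['args'])   # {'name': 'Dave', 'n': '37'}
```

Because the server lives in the same process, a test can compare exactly what the client sent with what the server received, without touching a real account.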

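Although it is not discussed here, requests also supports many more advanced HTTP client protocols, such as OAuth. The requests documentation (http://docs.python-requests.org) is excellent (and frankly better than anything that could be provided in this short space). Go there for more information.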
