
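Source: PyTorch torch.cuda

This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation.

It is lazily initialized, so you can always import it and use is_available() to determine whether your system supports CUDA.

CUDA semantics has more details about working with CUDA.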
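
Because the package is lazily initialized, a script can import it unconditionally and fall back to the CPU when no GPU is present. The following is a minimal sketch of that pattern; the tensor shape is arbitrary and chosen only for illustration.

```python
import torch

# Importing torch.cuda never fails on a CPU-only machine; is_available()
# reports whether a usable CUDA device and driver were found.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors created with an explicit device behave the same on either backend.
x = torch.ones(2, 3, device=device)
total = x.sum().item()
print(device, total)
```

Keeping the device in one variable means the rest of the script does not need separate CPU and GPU code paths.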

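torch.cuda.current_blas_handle()

Returns a cublasHandle_t pointer to the current cuBLAS handle.

torch.cuda.current_device()

Returns the index of the currently selected device.

torch.cuda.current_stream(device=None)

Returns the currently selected Stream for a given device.

Parameters

device (torch.device or python:int, optional) – selected device. Returns the currently selected Stream for the current device, given by current_device(), if device is None (default).

torch.cuda.default_stream(device=None)

Returns the default Stream for a given device.

Parameters

device (torch.device or python:int, optional) – selected device. Returns the default Stream for the current device, given by current_device(), if device is None (default).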

class torch.cuda.device(device)

Context manager that changes the selected device.

Parameters

device (torch.device or python:int) – device index to select. It is a no-op if this argument is a negative integer or None.
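
As an illustration of the context manager, the sketch below switches to the last visible GPU for the duration of a with block and checks that the previous device is restored afterwards. It does nothing on a machine without CUDA; the variable names are illustrative only.

```python
import torch

if torch.cuda.is_available():
    last = torch.cuda.device_count() - 1
    before = torch.cuda.current_device()
    with torch.cuda.device(last):
        inside = torch.cuda.current_device()
        # A bare "cuda" device resolves to the currently selected GPU.
        x = torch.zeros(2, device="cuda")
    after = torch.cuda.current_device()
    restored = (after == before) and (inside == last) and (x.device.index == last)
else:
    restored = None  # no GPU: nothing to switch

print(restored)
```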

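torch.cuda.device_count()

Returns the number of GPUs available.

class torch.cuda.device_of(obj)

Context manager that changes the current device to that of the given object.

You can use both tensors and storages as arguments. If the given object is not allocated on a GPU, this is a no-op.

Parameters

obj (Tensor or Storage) – object allocated on the selected device.

torch.cuda.get_device_capability(device=None)

Gets the CUDA capability of a device.

Parameters

device (torch.device or python:int, optional) – device for which to return the device capability. This function is a no-op if this argument is a negative integer. It uses the current device, given by current_device(), if device is None (default).

Returns

The major and minor CUDA capability of the device.

Return type

tuple(int, int)

torch.cuda.get_device_name(device=None)

Gets the name of a device.

Parameters

device (torch.device or python:int, optional) – device for which to return the name. This function is a no-op if this argument is a negative integer. It uses the current device, given by current_device(), if device is None (default).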
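
The query functions above combine naturally into a short device inventory. This is a sketch rather than a recommended utility; on a machine without GPUs the loop body simply never runs.

```python
import torch

devices = []
for index in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(index)
    major, minor = torch.cuda.get_device_capability(index)
    devices.append((index, name, f"{major}.{minor}"))

for index, name, capability in devices:
    print(f"cuda:{index}: {name} (compute capability {capability})")
```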

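torch.cuda.init()

Initializes PyTorch's CUDA state. You may need to call this explicitly if you are interacting with PyTorch via its C API, because the Python bindings for CUDA functionality are not available until this initialization takes place. Ordinary users should not need this, as all of PyTorch's CUDA methods automatically initialize the CUDA state on demand.

Does nothing if the CUDA state is already initialized.

torch.cuda.ipc_collect()

Force-collects GPU memory after it has been released by CUDA IPC.

Note

Checks whether any sent CUDA tensors can be cleaned from memory. Force-closes the shared memory file used for reference counting if there are no active counters. Useful when the producer process has stopped actively sending tensors and wants to release unused memory.

torch.cuda.is_available()

Returns a bool indicating whether CUDA is currently available.

torch.cuda.is_initialized()

Returns whether PyTorch's CUDA state has been initialized.

torch.cuda.set_device(device)

Sets the current device.

Usage of this function is discouraged in favor of device. In most cases it is better to use the CUDA_VISIBLE_DEVICES environment variable.

Parameters

device (torch.device or python:int) – selected device. This function is a no-op if this argument is negative.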

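torch.cuda.stream(stream)

Context manager that selects a given stream.

All CUDA kernels queued within its context will be enqueued on the selected stream.

Parameters

stream (Stream) – selected stream. This manager is a no-op if it is None.

Note

Streams are per-device. If the selected stream is not on the current device, this function will also change the current device to match the stream.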
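
A short sketch of running work on a side stream follows. It assumes the usual ordering discipline: the side stream first waits for the current stream (so its input is ready), and the current stream later waits for the side stream before using the result. The tensor size is arbitrary, and nothing runs without CUDA.

```python
import torch

if torch.cuda.is_available():
    side = torch.cuda.Stream()
    x = torch.ones(1024, device="cuda")

    # The side stream must not start before x has been produced.
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        y = x * 2  # this kernel is enqueued on the side stream

    # Later work on the current stream must see the finished y.
    torch.cuda.current_stream().wait_stream(side)
    total = y.sum().item()
else:
    total = None

print(total)
```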

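torch.cuda.synchronize(device=None)

Waits for all kernels in all streams on a CUDA device to complete.

Parameters

device (torch.device or python:int, optional) – device to synchronize. It uses the current device, given by current_device(), if device is None (default).

Random Number Generator

torch.cuda.get_rng_state(device='cuda')

Returns the random number generator state of the specified GPU as a ByteTensor.

Parameters

device (torch.device or python:int, optional) – the device to return the RNG state of. Default: 'cuda' (i.e., torch.device('cuda'), the current CUDA device).

Warning

This function eagerly initializes CUDA.

torch.cuda.get_rng_state_all()

Returns a tuple of ByteTensor representing the random number states of all devices.

torch.cuda.set_rng_state(new_state, device='cuda')

Sets the random number generator state of the specified GPU.

Parameters

new_state (torch.ByteTensor) – the desired state
device (torch.device or python:int, optional) – the device to set the RNG state on. Default: 'cuda' (i.e., torch.device('cuda'), the current CUDA device).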
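
Saving and restoring the generator state makes it possible to replay a random draw. The sketch below assumes a single current GPU and is skipped when CUDA is unavailable, because get_rng_state() eagerly initializes CUDA.

```python
import torch

if torch.cuda.is_available():
    state = torch.cuda.get_rng_state()   # snapshot of the current GPU's RNG
    a = torch.rand(4, device="cuda")
    torch.cuda.set_rng_state(state)      # rewind the generator
    b = torch.rand(4, device="cuda")     # same draw as a
    replayed = torch.equal(a, b)
else:
    replayed = None

print(replayed)
```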

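torch.cuda.set_rng_state_all(new_states)

Sets the random number generator state of all devices.

Parameters

new_states (tuple of torch.ByteTensor) – the desired state for each device

torch.cuda.manual_seed(seed)

Sets the seed for generating random numbers for the current GPU. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

Parameters

seed (python:int) – the desired seed.

Warning

If you are working with a multi-GPU model, this function is insufficient to get determinism. To seed all GPUs, use manual_seed_all().

torch.cuda.manual_seed_all(seed)

Sets the seed for generating random numbers on all GPUs. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

Parameters

seed (python:int) – the desired seed.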
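
Since the seeding functions are safe to call without CUDA, the same seeding code can run on any machine. A minimal sketch, where sample() and the seed value are illustrative and not part of the API:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def sample(seed):
    """Draw three random numbers after seeding every generator."""
    torch.manual_seed(seed)            # seeds the default CPU generator
    torch.cuda.manual_seed_all(seed)   # seeds every GPU; ignored without CUDA
    return torch.rand(3, device=device)

first = sample(1234)
second = sample(1234)
same = torch.equal(first, second)
print(same)
```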

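torch.cuda.seed()

Sets the seed for generating random numbers to a random number for the current GPU. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

Warning

If you are working with a multi-GPU model, this function will only initialize the seed on one GPU. To initialize all GPUs, use seed_all().

torch.cuda.seed_all()

Sets the seed for generating random numbers to a random number on all GPUs. It is safe to call this function if CUDA is not available; in that case, it is silently ignored.

torch.cuda.initial_seed()

Returns the current random seed of the current GPU.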

Warning

This function eagerly initializes CUDA.

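Communication collectives

torch.cuda.comm.broadcast(tensor, devices)

Broadcasts a tensor to a number of GPUs.

Parameters

tensor (Tensor) – tensor to broadcast.
devices (Iterable) – an iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, …), the first element of which is the source device to broadcast from.

Returns

A tuple containing copies of the tensor, placed on devices corresponding to indices from devices.

torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)

Broadcasts a sequence of tensors to the specified GPUs. Small tensors are first coalesced into a buffer to reduce the number of synchronizations.

Parameters

tensors (sequence) – tensors to broadcast.
devices (Iterable) – an iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, …), the first element of which is the source device to broadcast from.
buffer_size (python:int) – maximum size of the buffer used for coalescing

Returns

A tuple containing copies of the tensor, placed on devices corresponding to indices from devices.

torch.cuda.comm.reduce_add(inputs, destination=None)

Sums tensors from multiple GPUs.

All inputs should have matching shapes.

Parameters

inputs (Iterable[Tensor]) – an iterable of tensors to add.
destination (python:int, optional) – a device on which the output will be placed (default: current device).

Returns

A tensor containing an elementwise sum of all inputs, placed on the destination device.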
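
The sketch below broadcasts a tensor from the first GPU to every visible GPU and then sums the copies back onto the first one. It assumes all visible GPUs are usable and treats devices[0] as the source; with n GPUs the result should equal n times the input. Nothing runs without CUDA.

```python
import torch

if torch.cuda.is_available():
    n = torch.cuda.device_count()
    devices = list(range(n))                      # devices[0] is the source
    x = torch.arange(4.0, device=f"cuda:{devices[0]}")

    copies = torch.cuda.comm.broadcast(x, devices)          # one copy per GPU
    total = torch.cuda.comm.reduce_add(copies, destination=devices[0])

    matches = torch.equal(total.cpu(), (x * n).cpu())
else:
    matches = None

print(matches)
```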

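torch.cuda.comm.scatter(tensor, devices, chunk_sizes=None, dim=0, streams=None)

Scatters a tensor across multiple GPUs.

Parameters

tensor (Tensor) – tensor to scatter.
devices (Iterable[python:int]) – iterable of ints, specifying among which devices the tensor should be scattered.
chunk_sizes (Iterable[python:int], optional) – sizes of chunks to be placed on each device. It should match devices in length and sum to tensor.size(dim). If not specified, the tensor will be divided into equal chunks.
dim (python:int, optional) – a dimension along which to chunk the tensor.

Returns

A tuple containing chunks of the tensor, spread across the given devices.

torch.cuda.comm.gather(tensors, dim=0, destination=None)

Gathers tensors from multiple GPUs.

Tensor sizes in all dimensions other than dim have to match.

Parameters

tensors (Iterable[Tensor]) – iterable of tensors to gather.
dim (python:int) – a dimension along which the tensors will be concatenated.
destination (python:int, optional) – output device (-1 means CPU, default: current device)

Returns

A tensor located on the destination device that is the result of concatenating tensors along dim.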
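
scatter() and gather() are inverses along the chosen dimension, which the following sketch checks with a round trip. It assumes every visible GPU is usable and sizes the input so that it divides evenly among them; the shapes are arbitrary. Nothing runs without CUDA.

```python
import torch

if torch.cuda.is_available():
    devices = list(range(torch.cuda.device_count()))
    rows = 2 * len(devices)                       # two rows per GPU
    x = torch.arange(float(rows * 3)).reshape(rows, 3)   # starts on the CPU

    chunks = torch.cuda.comm.scatter(x, devices, dim=0)
    restored = torch.cuda.comm.gather(chunks, dim=0, destination=devices[0])

    round_trip_ok = torch.equal(restored.cpu(), x)
else:
    round_trip_ok = None

print(round_trip_ok)
```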

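Streams and events

class torch.cuda.Stream

Wrapper around a CUDA stream.

A CUDA stream is a linear sequence of execution that belongs to a specific device, independent from other streams. See CUDA semantics for details.

Parameters

device (torch.device or python:int, optional) – a device on which to allocate the stream. If device is None (default) or a negative integer, this will use the current device.
priority (python:int, optional) – priority of the stream. Lower numbers represent higher priorities.

query()

Checks if all the work submitted has been completed.

Returns

A boolean indicating if all kernels in this stream are completed.

record_event(event=None)

Records an event.

Parameters

event (Event, optional) – event to record. If not given, a new one will be allocated.

Returns

Recorded event.

synchronize()

Waits for all the kernels in this stream to complete.

Note

This is a wrapper around cudaStreamSynchronize(): see CUDA Stream documentation for more info.

wait_event(event)

Makes all future work submitted to the stream wait for an event.

Parameters

event (Event) – an event to wait for.

Note

This is a wrapper around cudaStreamWaitEvent(): see CUDA Stream documentation for more info.

This function returns without waiting for event: only future operations are affected.

wait_stream(stream)

Synchronizes with another stream.

All future work submitted to this stream will wait until all kernels submitted to the given stream at the time of the call have completed.

Parameters

stream (Stream) – a stream to synchronize.

Note

This function returns without waiting for currently enqueued kernels in stream: only future operations are affected.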

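class torch.cuda.Event

Wrapper around a CUDA event.

CUDA events are synchronization markers that can be used to monitor the device's progress, to accurately measure timing, and to synchronize CUDA streams.

The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process. After creation, only streams on the same device may record the event. However, streams on any device can wait on the event.

Parameters

enable_timing (bool, optional) – indicates if the event should measure time (default: False)
blocking (bool, optional) – if True, wait() will be blocking (default: False)
interprocess (bool) – if True, the event can be shared between processes (default: False)

elapsed_time(end_event)

Returns the time elapsed in milliseconds after the event was recorded and before the end_event was recorded.

classmethod from_ipc_handle(device, handle)

Reconstructs an event from an IPC handle on the given device.

ipc_handle()

Returns an IPC handle of this event. If not recorded yet, the event will use the current device.

query()

Checks if all work currently captured by the event has completed.

Returns

A boolean indicating if all work currently captured by the event has completed.

record(stream=None)

Records the event in a given stream.

Uses torch.cuda.current_stream() if no stream is specified. The stream's device must match the event's device.

synchronize()

Waits for the event to complete.

Waits until the completion of all work currently captured in this event. This prevents the CPU thread from proceeding until the event completes.

Note

This is a wrapper around cudaEventSynchronize(): see CUDA Event documentation for more info.

wait(stream=None)

Makes all future work submitted to the given stream wait for this event.

Uses torch.cuda.current_stream() if no stream is specified.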
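
Events are the usual way to time GPU work, since kernels run asynchronously with respect to the CPU. The sketch below times one matrix multiplication; the matrix size is arbitrary, both events must be created with enable_timing=True, and the block is skipped without CUDA.

```python
import torch

if torch.cuda.is_available():
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    a = torch.randn(512, 512, device="cuda")

    start.record()             # recorded on the current stream
    b = a @ a
    end.record()

    end.synchronize()          # block the CPU until end has been reached
    elapsed_ms = start.elapsed_time(end)
else:
    elapsed_ms = None

print(elapsed_ms)
```

Calling elapsed_time() before the end event has completed raises an error, which is why the sketch synchronizes on it first.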

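Memory management

torch.cuda.empty_cache()

Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and becomes visible in nvidia-smi.

Note

empty_cache() doesn't increase the amount of GPU memory available for PyTorch. However, it may help reduce fragmentation of GPU memory in certain cases. See Memory management for more details about GPU memory management.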
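
The effect of empty_cache() can be observed through memory_reserved(), which reports what the caching allocator currently holds. In this sketch a tensor is allocated and freed so that the allocator has a cached block to give back; the size is arbitrary, and nothing runs without CUDA.

```python
import torch

if torch.cuda.is_available():
    x = torch.empty(1024, 1024, device="cuda")
    del x                                      # block returns to the cache, not the driver
    reserved_before = torch.cuda.memory_reserved()
    torch.cuda.empty_cache()                   # hand unoccupied blocks back to the driver
    reserved_after = torch.cuda.memory_reserved()
else:
    reserved_before = reserved_after = None

print(reserved_before, reserved_after)
```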

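torch.cuda.memory_stats(device=None)

Returns a dictionary of CUDA memory allocator statistics for a given device.

The return value of this function is a dictionary of statistics, each of which is a non-negative integer.

Core statistics:

"allocated.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of allocation requests received by the memory allocator.
"allocated_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of allocated memory.
"segment.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of reserved segments from cudaMalloc().
"reserved_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of reserved memory.
"active.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of active memory blocks.
"active_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of active memory.
"inactive_split.{all,large_pool,small_pool}.{current,peak,allocated,freed}": number of inactive, non-releasable memory blocks.
"inactive_split_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}": amount of inactive, non-releasable memory.

For these core statistics, values are broken down as follows.

Pool type:

all: combined statistics across all memory pools.
large_pool: statistics for the large allocation pool (as of October 2019, for allocations of size >= 1MB).
small_pool: statistics for the small allocation pool (as of October 2019, for allocations of size < 1MB).

Metric type:

current: current value of this metric.
peak: maximum value of this metric.
allocated: historical total increase in this metric.
freed: historical total decrease in this metric.

In addition to the core statistics, we also provide some simple event counters:

"num_alloc_retries": number of failed cudaMalloc calls that result in a cache flush and retry.
"num_ooms": number of out-of-memory errors thrown.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistics for the current device, given by current_device(), if device is None (default).

Note

See Memory management for more details about GPU memory management.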
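
The brace notation above expands to one flat dictionary key per combination of statistic, pool type, and metric type. The sketch below spells out that expansion and reads two entries; it uses dict.get() with a default because the dictionary may be empty when CUDA is unavailable or not yet initialized, and other PyTorch versions may expose additional keys.

```python
import itertools
import torch

STATS = ["allocated", "allocated_bytes", "segment", "reserved_bytes",
         "active", "active_bytes", "inactive_split", "inactive_split_bytes"]
POOLS = ["all", "large_pool", "small_pool"]
METRICS = ["current", "peak", "allocated", "freed"]

# Every core statistic key has the form "<stat>.<pool>.<metric>".
core_keys = [".".join(parts) for parts in itertools.product(STATS, POOLS, METRICS)]

stats = torch.cuda.memory_stats() if torch.cuda.is_available() else {}
peak_bytes = stats.get("allocated_bytes.all.peak", 0)
retries = stats.get("num_alloc_retries", 0)
print(len(core_keys), peak_bytes, retries)
```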

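torch.cuda.memory_summary(device=None, abbreviated=False)

Returns a human-readable printout of the current memory allocator statistics for a given device.

This can be useful to display periodically during training, or when handling out-of-memory exceptions.

Parameters

device (torch.device or python:int, optional) – selected device. Returns the printout for the current device, given by current_device(), if device is None (default).
abbreviated (bool, optional) – whether to return an abbreviated summary (default: False).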

Note

See Memory management for more details about GPU memory management.

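torch.cuda.memory_snapshot()

Returns a snapshot of the CUDA memory allocator state across all devices.

Interpreting the output of this function requires familiarity with the memory allocator internals.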

Note

See Memory management for more details about GPU memory management.

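torch.cuda.memory_allocated(device=None)

Returns the current GPU memory occupied by tensors in bytes for a given device.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

Note

This is likely less than the amount shown in nvidia-smi, since some unused memory can be held by the caching allocator and some context needs to be created on the GPU. See Memory management for more details about GPU memory management.

torch.cuda.max_memory_allocated(device=None)

Returns the maximum GPU memory occupied by tensors in bytes for a given device.

By default, this returns the peak allocated memory since the beginning of this program. reset_peak_memory_stats() can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).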

Note

See Memory management for more details about GPU memory management.
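
The per-iteration measurement described above can be sketched as follows. The loop body stands in for a real training step, and the tensor sizes are arbitrary; each iteration resets the peak counters and then reads the peak reached during that iteration. Nothing runs without CUDA.

```python
import torch

peaks = []
if torch.cuda.is_available():
    for step in range(3):
        torch.cuda.reset_peak_memory_stats()   # start a fresh peak for this step
        x = torch.randn(256, 256, device="cuda")
        y = x @ x                               # placeholder for a training step
        peaks.append(torch.cuda.max_memory_allocated())
        del x, y

print(peaks)
```

Note that reset_peak_memory_stats() resets all peak counters, not only the allocated-bytes one.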

torch.cuda.reset_max_memory_allocated(device=None)

Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device.

See max_memory_allocated() for details.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

Warning

This function now calls reset_peak_memory_stats(), which resets /all/ peak memory stats.

Note

See Memory management for more details about GPU memory management.

torch.cuda.memory_reserved(device=None)

Returns the current GPU memory managed by the caching allocator in bytes for a given device.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

Note

See Memory management for more details about GPU memory management.

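torch.cuda.max_memory_reserved(device=None)

Returns the maximum GPU memory managed by the caching allocator in bytes for a given device.

By default, this returns the peak cached memory since the beginning of this program. reset_peak_memory_stats() can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak cached memory amount of each iteration in a training loop.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).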

Note

See Memory management for more details about GPU memory management.

torch.cuda.memory_cached(device=None)

Deprecated; see memory_reserved().

torch.cuda.max_memory_cached(device=None)

Deprecated; see max_memory_reserved().

torch.cuda.reset_max_memory_cached(device=None)

Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device.

Parameters

device (torch.device or python:int, optional) – selected device. Returns statistic for the current device, given by current_device(), if device is None (default).

Warning

This function now calls reset_peak_memory_stats(), which resets /all/ peak memory stats.

Note

See Memory management for more details about GPU memory management.

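NVIDIA Tools Extension (NVTX)

torch.cuda.nvtx.mark(msg)

Describes an instantaneous event that occurred at some point.

Parameters

msg (string) – ASCII message to associate with the event.

torch.cuda.nvtx.range_push(msg)

Pushes a range onto a stack of nested range spans. Returns the zero-based depth of the range that is started.

Parameters

msg (string) – ASCII message to associate with the range

torch.cuda.nvtx.range_pop()

Pops a range off a stack of nested range spans. Returns the zero-based depth of the range that is ended.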
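
Because every range_push() needs a matching range_pop(), it can be convenient to pair them in a context manager. The helper below, nvtx_range, is a hypothetical convenience wrapper written for this example and not part of the API described here; it skips the NVTX calls when CUDA is unavailable so that the same code runs on CPU-only machines.

```python
from contextlib import contextmanager

import torch

@contextmanager
def nvtx_range(msg):
    """Hypothetical helper: wrap a block in an NVTX range when CUDA is present."""
    enabled = torch.cuda.is_available()
    if enabled:
        torch.cuda.nvtx.range_push(msg)
    try:
        yield
    finally:
        if enabled:
            torch.cuda.nvtx.range_pop()   # always pop, even if the block raised

with nvtx_range("outer step"):
    with nvtx_range("matmul"):            # nested ranges form a stack
        a = torch.ones(4, 4)
        result = (a @ a).sum().item()

print(result)
```

The ranges only become visible when the program runs under an NVTX-aware profiler.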
