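Thrust is an extremely powerful library of CUDA-accelerated algorithms. However, thrust is designed to work with vectors, not pitched matrices. This tutorial discusses how to wrap a cv::cuda::GpuMat in thrust iterators that can be used with thrust algorithms.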
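This tutorial will show you how to:
Wrap a GpuMat into a thrust iterator
Fill a GpuMat with random numbers
Sort a column of a GpuMat in place
Copy values greater than 0 to a new gpu matrix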
Use streams with thrust
Wrap a GpuMat into a thrust iterator
The following code produces begin and end iterators for a GpuMat:
/*
@Brief GpuMatBeginItr returns a thrust compatible iterator to the beginning of a GPU mat's memory.
@Param mat is the input matrix
@Param channel is the channel of the matrix that the iterator is accessing. If set to -1, the iterator will access every element in sequential order
*/
template<typename T>
thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<step_functor<T>, thrust::counting_iterator<int>>> GpuMatBeginItr(cv::cuda::GpuMat mat, int channel = 0)
{
    if (channel == -1)
    {
        mat = mat.reshape(1);
        channel = 0;
    }
    CV_Assert(mat.depth() == cv::DataType<T>::depth);
    CV_Assert(channel < mat.channels());
    return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0) + channel),
        thrust::make_transform_iterator(thrust::make_counting_iterator(0), step_functor<T>(mat.cols, mat.step / sizeof(T), mat.channels())));
}
/*
@Brief GpuMatEndItr returns a thrust compatible iterator to the end of a GPU mat's memory.
@Param mat is the input matrix
@Param channel is the channel of the matrix that the iterator is accessing. If set to -1, the iterator will access every element in sequential order
*/
template<typename T>
thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<step_functor<T>, thrust::counting_iterator<int>>> GpuMatEndItr(cv::cuda::GpuMat mat, int channel = 0)
{
    if (channel == -1)
    {
        mat = mat.reshape(1);
        channel = 0;
    }
    CV_Assert(mat.depth() == cv::DataType<T>::depth);
    CV_Assert(channel < mat.channels());
    return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0) + channel),
        thrust::make_transform_iterator(thrust::make_counting_iterator(mat.rows*mat.cols), step_functor<T>(mat.cols, mat.step / sizeof(T), mat.channels())));
}
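Both functions rely on step_functor, which is not defined in this section. Its job is to turn a linear element index into an offset inside the pitched buffer, so that the permutation iterator skips row padding and the other interleaved channels. The following is a minimal host-only sketch of that mapping, assuming a row-major layout and a step measured in elements of T. The name StepIndex and its fields are hypothetical, and a functor used inside thrust would also need __host__ __device__ qualifiers.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for the index mapping that a functor such
// as step_functor has to provide. This is not the tutorial's own definition.
struct StepIndex
{
    int columns;   // pixels per row (mat.cols)
    int step;      // row pitch in elements of T (mat.step / sizeof(T))
    int channels;  // interleaved channels per pixel (mat.channels())

    StepIndex(int columns_, int step_, int channels_ = 1)
        : columns(columns_), step(step_), channels(channels_)
    {
        assert(columns > 0 && channels > 0);
        assert(step >= columns * channels);  // the pitch covers at least one full row
    }

    // Map the x-th pixel (row-major order) to an element offset from the
    // start of the buffer. Adding a channel number to the base pointer, as
    // the iterator functions above do, selects which channel is read.
    int operator()(int x) const
    {
        int row = x / columns;
        int col = x % columns;
        return row * step + col * channels;
    }
};
```

For a 2-channel matrix with 3 columns and a pitch of 8 elements, StepIndex(3, 8, 2) maps pixel indices 0, 1, 2, 3 to offsets 0, 2, 4, 8: consecutive pixels in a row are two elements apart, and the fourth pixel starts at the next row's pitch boundary.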
{
    cv::cuda::GpuMat d_data(1, 100, CV_32SC2);
    // Thrust compatible begin and end iterators to channel 1 of this matrix
    auto keyBegin = GpuMatBeginItr<int>(d_data, 1);
    auto keyEnd = GpuMatEndItr<int>(d_data, 1);
    // Thrust compatible begin and end iterators to channel 0 of this matrix
    auto idxBegin = GpuMatBeginItr<int>(d_data, 0);
    auto idxEnd = GpuMatEndItr<int>(d_data, 0);
    // Fill the index channel with a sequence of numbers from 0 to 99
    thrust::sequence(idxBegin, idxEnd);
    // Fill the key channel with random numbers between 0 and 10. A counting iterator is used here to give an integer value for each location as an input to prg::operator()
    thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_data.cols), keyBegin, prg(0, 10));
    // Sort the key channel and index channel such that the keys and indices stay together
    thrust::sort_by_key(keyBegin, keyEnd, idxBegin);
    // Download the sorted result to the host
    cv::Mat h_idx(d_data);
}
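The example above relies on a prg functor that is not defined in this section. For thrust::transform to fill every element independently, such a functor has to produce its value from the element index alone. The host-only sketch below shows one common way to do that, assuming a fixed-seed engine that is advanced by the index before sampling. IndexedUniform and sortByKeyHost are hypothetical names, and the code uses the standard library's <random> and <algorithm> in place of thrust. sortByKeyHost mimics the effect of thrust::sort_by_key on the key and index channels using plain vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical host-side stand-in for a functor like prg(a, b): returns a
// value between a and b that depends only on the element index n.
struct IndexedUniform
{
    float a, b;
    IndexedUniform(float a_ = 0.f, float b_ = 1.f) : a(a_), b(b_) {}

    float operator()(unsigned int n) const
    {
        std::minstd_rand rng;  // default seed, so results are repeatable
        std::uniform_real_distribution<float> dist(a, b);
        rng.discard(n);        // skip ahead so each index draws from a different state
        return dist(rng);
    }
};

// Host analogue of a sort-by-key: put keys in ascending order and apply the
// same permutation to values so that each (key, value) pair stays together.
inline void sortByKeyHost(std::vector<int>& keys, std::vector<int>& values)
{
    assert(keys.size() == values.size());

    // Sort a list of positions by the key found at each position.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t l, std::size_t r) { return keys[l] < keys[r]; });

    // Gather keys and values through the sorted positions.
    std::vector<int> sortedKeys(keys.size()), sortedValues(values.size());
    for (std::size_t i = 0; i < order.size(); ++i)
    {
        sortedKeys[i] = keys[order[i]];
        sortedValues[i] = values[order[i]];
    }
    keys.swap(sortedKeys);
    values.swap(sortedValues);
}
```

This sketch uses a stable sort for predictability; thrust::sort_by_key makes no promise about the relative order of equal keys, and thrust::stable_sort_by_key is the variant to use when that order matters.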
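Copy values greater than 0 to a new gpu matrix while using streams
In this example we will see how cv::cuda::Stream can be used with thrust. Unfortunately, this specific example uses functions that must return their results to the CPU, so it is not the optimal use of streams.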
{
    cv::cuda::GpuMat d_value(1, 100, CV_32F);
    auto valueBegin = GpuMatBeginItr<float>(d_value);
    auto valueEnd = GpuMatEndItr<float>(d_value);
    cv::cuda::Stream stream;
    // Same as the random generation code from before except now the transformation is being performed on a stream
    thrust::transform(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_value.cols), valueBegin, prg(-1, 1));
    // Count the number of values we are going to copy
    int count = thrust::count_if(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), valueBegin, valueEnd, pred_greater<float>(0.0));
    // Allocate a destination for copied values
    cv::cuda::GpuMat d_valueGreater(1, count, CV_32F);
    // Copy values that satisfy the predicate.
    thrust::copy_if(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), valueBegin, valueEnd, GpuMatBeginItr<float>(d_valueGreater), pred_greater<float>(0.0));
    cv::Mat h_greater(d_valueGreater);
}
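Notice the use of thrust::system::cuda::par.on(...); this creates an execution policy for executing thrust code on a stream. There is a bug in the version of thrust distributed with the CUDA toolkit that, as of version 7.5, has not been fixed. The bug causes code not to execute on streams. It can, however, be fixed by using the newest version of thrust from the git repository (http://github.com/thrust/thrust.git). The example then uses thrust::count_if with the pred_greater predicate to determine how many values are greater than 0, so that the destination matrix can be allocated with the right size before thrust::copy_if fills it.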
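pred_greater is used in the example but is not defined in this section. A predicate for count_if and copy_if only needs to store the threshold and compare each element against it. The host-only sketch below assumes that shape and mirrors the count, allocate, copy sequence of the example with standard-library algorithms in place of thrust. GreaterThan and copyGreater are hypothetical names, and a device version would additionally need __host__ __device__ on the constructor and the call operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate with the shape a functor like pred_greater would need.
template <typename T>
struct GreaterThan
{
    T value;
    explicit GreaterThan(T value_) : value(value_) {}
    bool operator()(const T& v) const { return v > value; }
};

// Count first so the destination can be sized exactly, then copy the matches.
template <typename T>
std::vector<T> copyGreater(const std::vector<T>& src, T threshold)
{
    GreaterThan<T> pred(threshold);

    // Pass 1: how many elements satisfy the predicate?
    const std::size_t count =
        static_cast<std::size_t>(std::count_if(src.begin(), src.end(), pred));

    // Allocate exactly that many slots, as the example does for d_valueGreater.
    std::vector<T> dst(count);

    // Pass 2: copy the matching elements, preserving their order.
    auto last = std::copy_if(src.begin(), src.end(), dst.begin(), pred);
    assert(last == dst.end());  // both passes must agree on the count
    return dst;
}
```

The two-pass design is what forces a result back to the host in the stream example: the destination cannot be allocated until the count is known.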