gpufilter
GPU-Efficient Recursive Filtering and Summed-Area Tables
GPU Computation functions

Functions

__global__ void gpufilter::alg4_stage1 (float2 *g_transp_pybar, float2 *g_transp_ezhat)
 Algorithm 4 stage 1.
__global__ void gpufilter::alg4_stage2_3 (float2 *g_transp_pybar, float2 *g_transp_ezhat)
 Algorithm 4 stages 2 and 3 (fused)
__global__ void gpufilter::alg4_stage4 (float *g_transp_out, float2 *g_transp_py, float2 *g_transp_ez, float2 *g_pubar, float2 *g_evhat, int out_stride)
 Algorithm 4 stage 4.
__global__ void gpufilter::alg4_stage5_6 (float2 *g_transp_pybar, float2 *g_transp_ezhat)
 Algorithm 4 stages 5 and 6 (fused)
__global__ void gpufilter::alg4_stage7 (float *g_out, float2 *g_transp_py, float2 *g_transp_ez, int out_stride)
 Algorithm 4 stage 7.
__global__ void gpufilter::alg5_stage1 (float *g_transp_pybar, float *g_transp_ezhat, float *g_ptucheck, float *g_etvtilde)
 Algorithm 5 stage 1.
__global__ void gpufilter::alg5_stage2_3 (float *g_transp_pybar, float *g_transp_ezhat)
 Algorithm 5 stages 2 and 3 (fused)
__global__ void gpufilter::alg5_stage4_5 (float *g_ptucheck, float *g_etvtilde, const float *g_transp_py, const float *g_transp_ez)
 Algorithm 5 stages 4 and 5 (fused)
__global__ void gpufilter::alg5_stage6 (float *g_out, const float *g_transp_py, const float *g_transp_ez, const float *g_ptu, const float *g_etv)
 Algorithm 5 stage 6.
__global__ void gpufilter::algSAT_stage1 (const float *g_in, float *g_ybar, float *g_vhat)
 Algorithm SAT stage 1.
__global__ void gpufilter::algSAT_stage2 (float *g_ybar, float *g_ysum)
 Algorithm SAT stage 2.
__global__ void gpufilter::algSAT_stage3 (const float *g_ysum, float *g_vhat)
 Algorithm SAT stage 3.
__global__ void gpufilter::algSAT_stage4 (float *g_inout, const float *g_y, const float *g_v)
 Algorithm SAT stage 4.
__global__ void gpufilter::algSAT_stage4 (float *g_out, const float *g_in, const float *g_y, const float *g_v)
 Algorithm SAT stage 4 (not-in-place computation)

Function Documentation

__global__ void gpufilter::alg4_stage1 ( float2 *  g_transp_pybar,
float2 *  g_transp_ezhat 
)

Algorithm 4 stage 1.

This function computes stage 4.1 of the algorithm:

  • In parallel for all $m$ and $n$, compute and store $P_{m,n}(\bar{Y})$ and $E_{m,n}(\hat{Z})$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [out]  g_transp_pybar  All $P_{m,n}(\bar{Y})$
  [out]  g_transp_ezhat  All $E_{m,n}(\hat{Z})$
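
To make the "incomplete" carries of stage 4.1 concrete, here is a minimal sequential sketch of a 1D analogue. It is not the gpufilter kernel: the first-order filter (causal y[i] = x[i] + a*y[i-1], then anticausal z[i] = y[i] + a*z[i+1]), the scalar carries, and the names BlockCarries and incomplete_carries are all assumptions made for illustration. Each block is filtered with zero boundary conditions and only its two boundary values are kept.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D analogue of stage 4.1 (illustration only, not library code).
struct BlockCarries {
    std::vector<float> pybar;  // per block: last causal output with a zero prologue
    std::vector<float> ezhat;  // per block: first anticausal output with a zero epilogue
};

BlockCarries incomplete_carries(const std::vector<float>& x, std::size_t b, float a) {
    const std::size_t blocks = x.size() / b;  // assumes x.size() is a multiple of b
    BlockCarries c{std::vector<float>(blocks), std::vector<float>(blocks)};
    std::vector<float> ybar(b);
    // Blocks do not depend on each other here, which is what allows
    // this stage to run in parallel over all blocks on a GPU.
    for (std::size_t m = 0; m < blocks; ++m) {
        float prev = 0.0f;  // zero prologue
        for (std::size_t i = 0; i < b; ++i) {
            prev = x[m * b + i] + a * prev;
            ybar[i] = prev;
        }
        c.pybar[m] = prev;
        float next = 0.0f;  // zero epilogue
        for (std::size_t i = b; i-- > 0;) {
            next = ybar[i] + a * next;
        }
        c.ezhat[m] = next;
    }
    return c;
}
```

The carries are called incomplete because they ignore the neighbouring blocks; the later stages correct them.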
__global__ void gpufilter::alg4_stage2_3 ( float2 *  g_transp_pybar,
float2 *  g_transp_ezhat 
)

Algorithm 4 stages 2 and 3 (fused)

This function computes stages 4.2 and 4.3 of the algorithm:

  • Sequentially for each $m$, but in parallel for each $n$, compute and store $P_{m,n}(Y)$ using the previously computed $P_{m,n}(\bar{Y})$.
  • Sequentially for each $m$, but in parallel for each $n$, compute and store $E_{m,n}(Z)$ using the previously computed $P_{m-1,n}(Y)$ and $E_{m,n}(\hat{Z})$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [in,out]  g_transp_pybar  All $P_{m,n}(\bar{Y})$, corrected in place to $P_{m,n}(Y)$
  [in,out]  g_transp_ezhat  All $E_{m,n}(\hat{Z})$, corrected in place to $E_{m,n}(Z)$
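
The sequential correction in stages 4.2 and 4.3 can be sketched for a 1D analogue as follows. This is a sketch under stated assumptions, not the kernel's code: it assumes a first-order filter (causal y[i] = x[i] + a*y[i-1], anticausal z[i] = y[i] + a*z[i+1]), one scalar carry per block of size b, and the hypothetical name fix_carries. The factors a^b and a*(1 + a^2 + ... + a^(2(b-1))) follow from that assumed filter by linearity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D analogue of stages 4.2 and 4.3 (illustration only).
// On entry py/ez hold the incomplete carries computed with zero boundaries;
// on exit they hold the carries of the full-signal filter.
void fix_carries(std::vector<float>& py, std::vector<float>& ez, std::size_t b, float a) {
    const std::size_t blocks = py.size();  // ez is assumed to have the same size
    float ab = 1.0f;   // a^b: attenuation of a carry across one whole block
    float g = 0.0f;    // a * sum_{i<b} a^(2i): effect of a prologue on the block's epilogue
    float a2i = 1.0f;  // running a^(2i)
    for (std::size_t i = 0; i < b; ++i) {
        ab *= a;
        g += a2i;
        a2i *= a * a;
    }
    g *= a;
    // Causal pass: each prologue depends on the previous block, so m is sequential.
    for (std::size_t m = 1; m < blocks; ++m) {
        py[m] += ab * py[m - 1];
    }
    // Anticausal pass: each epilogue depends on the next block and on the
    // corrected prologue entering this block.
    for (std::size_t m = blocks; m-- > 0;) {
        if (m > 0) ez[m] += g * py[m - 1];
        if (m + 1 < blocks) ez[m] += ab * ez[m + 1];
    }
}
```

Only one value per block is touched in the sequential loops, which is why this stage stays cheap even though it cannot be parallelized along $m$.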
__global__ void gpufilter::alg4_stage4 ( float *  g_transp_out,
float2 *  g_transp_py,
float2 *  g_transp_ez,
float2 *  g_pubar,
float2 *  g_evhat,
int  out_stride 
)

Algorithm 4 stage 4.

This function computes stage 4.4 of the algorithm:

  • In parallel for all $m$ and $n$, compute $B_{m,n}(Y)$ using the previously computed $P_{m-1,n}(Y)$. Then compute and store $B_{m,n}(Z)$ using the previously computed $E_{m+1,n}(Z)$. Finally, compute and store both $P^T_{m,n}(\bar{U})$ and $E^T_{m,n}(\hat{V})$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [out]  g_transp_out  The transposed 2D output image
  [in]   g_transp_py   All $P_{m,n}(Y)$
  [in]   g_transp_ez   All $E_{m,n}(Z)$
  [out]  g_pubar       All $P^T_{m,n}(\bar{U})$
  [out]  g_evhat       All $E^T_{m,n}(\hat{V})$
  [in]   out_stride    Stride of the transposed output image
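
Once the corrected carries are available, every block can be finished independently. The sketch below shows this for a single block of a 1D analogue; it assumes a first-order filter (causal y[i] = x[i] + a*y[i-1], anticausal z[i] = y[i] + a*z[i+1]), and the function name filter_block and its scalar carries p and e are hypothetical, standing in for $P_{m-1,n}(Y)$ and $E_{m+1,n}(Z)$.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D analogue of the per-block work in stage 4.4 (illustration only).
// x: samples of one block
// p: corrected prologue (last causal output of the previous block)
// e: corrected epilogue (first anticausal output of the next block)
std::vector<float> filter_block(const std::vector<float>& x, float a, float p, float e) {
    std::vector<float> z(x.size());
    float prev = p;
    for (std::size_t i = 0; i < x.size(); ++i) {  // causal pass gives B(Y)
        prev = x[i] + a * prev;
        z[i] = prev;
    }
    float next = e;
    for (std::size_t i = x.size(); i-- > 0;) {    // anticausal pass gives B(Z)
        next = z[i] + a * next;
        z[i] = next;
    }
    return z;
}
```

For example, with the signal {1, 1, 1, 1} split into two blocks of two samples and a = 0.5, the second block is finished with p = 1.5 and e = 0, and the first block with p = 0 and e equal to the first output of the second block.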
__global__ void gpufilter::alg4_stage5_6 ( float2 *  g_transp_pybar,
float2 *  g_transp_ezhat 
)

Algorithm 4 stages 5 and 6 (fused)

This function computes stages 4.5 and 4.6 of the algorithm:

  • Sequentially for each $n$, but in parallel for each $m$, compute and store $P^T_{m,n}(U)$ from $P^T_{m,n}(\bar{U})$.
  • Sequentially for each $n$, but in parallel for each $m$, compute and store each $E^T_{m,n}(V)$ using the previously computed $P^T_{m,n-1}(U)$ and $E^T_{m,n}(\hat{V})$.
Note:
This function is exactly the same as alg4_stage2_3(), given the intrinsic similarities of the kernels.
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [in,out]  g_transp_pybar  All $P^T_{m,n}(\bar{U})$, corrected in place to $P^T_{m,n}(U)$
  [in,out]  g_transp_ezhat  All $E^T_{m,n}(\hat{V})$, corrected in place to $E^T_{m,n}(V)$
__global__ void gpufilter::alg4_stage7 ( float *  g_out,
float2 *  g_transp_py,
float2 *  g_transp_ez,
int  out_stride 
)

Algorithm 4 stage 7.

This function computes stage 4.7 of the algorithm:

  • In parallel for all $m$ and $n$, compute $B_{m,n}(U)$ using the previously computed $P^T_{m,n-1}(U)$ and $B_{m,n}(Z)$. Then compute and store $B_{m,n}(V)$ using the previously computed $E^T_{m,n+1}(V)$.
Note:
This function is exactly the same as alg4_stage4() given the intrinsic similarities of the kernels.
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [out]  g_out        The 2D output image
  [in]   g_transp_py  All $P^T_{m,n}(U)$
  [in]   g_transp_ez  All $E^T_{m,n}(V)$
  [in]   out_stride   Stride of the output image
__global__ void gpufilter::alg5_stage1 ( float *  g_transp_pybar,
float *  g_transp_ezhat,
float *  g_ptucheck,
float *  g_etvtilde 
)

Algorithm 5 stage 1.

This function computes stage 5.1 of the algorithm:

  • In parallel for all $m$ and $n$, compute and store each $P_{m,n}(\bar{Y})$, $E_{m,n}(\hat{Z})$, $P^T_{m,n}(\check{U})$, and $E^T_{m,n}(\tilde{V})$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [out]  g_transp_pybar  All $P_{m,n}(\bar{Y})$
  [out]  g_transp_ezhat  All $E_{m,n}(\hat{Z})$
  [out]  g_ptucheck      All $P^T_{m,n}(\check{U})$
  [out]  g_etvtilde      All $E^T_{m,n}(\tilde{V})$
__global__ void gpufilter::alg5_stage2_3 ( float *  g_transp_pybar,
float *  g_transp_ezhat 
)

Algorithm 5 stages 2 and 3 (fused)

This function computes stages 5.2 and 5.3 of the algorithm:

  • In parallel for all $n$, sequentially for each $m$, compute and store $P_{m,n}(Y)$ using the previously computed $P_{m,n}(\bar{Y})$.

fused, by simple kernel fusion (going through global memory), with:

  • In parallel for all $n$, sequentially for each $m$, compute and store $E_{m,n}(Z)$ using the previously computed $P_{m-1,n}(Y)$ and $E_{m+1,n}(\hat{Z})$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [in,out]  g_transp_pybar  All $P_{m,n}(\bar{Y})$, corrected in place to $P_{m,n}(Y)$
  [in,out]  g_transp_ezhat  All $E_{m,n}(\hat{Z})$, corrected in place to $E_{m,n}(Z)$
__global__ void gpufilter::alg5_stage4_5 ( float *  g_ptucheck,
float *  g_etvtilde,
const float *  g_transp_py,
const float *  g_transp_ez 
)

Algorithm 5 stages 4 and 5 (fused)

This function computes stages 5.4 and 5.5 of the algorithm:

  • In parallel for all $m$, sequentially for each $n$, compute and store $P^T_{m,n}(U)$ using the previously computed $P^T_{m,n}(\check{U})$, $P_{m-1,n}(Y)$, and $E_{m+1,n}(Z)$.

fused, by simple kernel fusion (going through global memory), with:

  • In parallel for all $m$, sequentially for each $n$, compute and store $E^T_{m,n}(V)$ using the previously computed $E^T_{m,n}(\tilde{V})$, $P^T_{m,n-1}(U)$, $P_{m-1,n}(Y)$, and $E_{m+1,n}(Z)$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [in,out]  g_ptucheck   All $P^T_{m,n}(\check{U})$, corrected in place to $P^T_{m,n}(U)$
  [in,out]  g_etvtilde   All $E^T_{m,n}(\tilde{V})$, corrected in place to $E^T_{m,n}(V)$
  [in]      g_transp_py  All $P_{m,n}(Y)$
  [in]      g_transp_ez  All $E_{m,n}(Z)$
__global__ void gpufilter::alg5_stage6 ( float *  g_out,
const float *  g_transp_py,
const float *  g_transp_ez,
const float *  g_ptu,
const float *  g_etv 
)

Algorithm 5 stage 6.

This function computes stage 5.6 of the algorithm:

  • In parallel for all $m$ and $n$, compute in sequence $B_{m,n}(Y)$, $B_{m,n}(Z)$, $B_{m,n}(U)$, and $B_{m,n}(V)$ using the previously computed $P_{m-1,n}(Y)$, $E_{m+1,n}(Z)$, $P^T_{m,n-1}(U)$, and $E^T_{m,n+1}(V)$. Store $B_{m,n}(V)$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5()
Parameters:
  [out]  g_out        The 2D output image
  [in]   g_transp_py  All $P_{m,n}(Y)$
  [in]   g_transp_ez  All $E_{m,n}(Z)$
  [in]   g_ptu        All $P^T_{m,n}(U)$
  [in]   g_etv        All $E^T_{m,n}(V)$
__global__ void gpufilter::algSAT_stage1 ( const float *  g_in,
float *  g_ybar,
float *  g_vhat 
)

Algorithm SAT stage 1.

This function computes stage S.1 of the algorithm:

  • In parallel for all $m$ and $n$, compute and store $P_{m,n}(\bar{Y})$ and $P^T_{m,n}(\hat{V})$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5() and figure in algSAT()
Parameters:
  [in]   g_in    The input image
  [out]  g_ybar  All $P_{m,n}(\bar{Y})$
  [out]  g_vhat  All $P^T_{m,n}(\hat{V})$
__global__ void gpufilter::algSAT_stage2 ( float *  g_ybar,
float *  g_ysum 
)

Algorithm SAT stage 2.

This function computes stage S.2 of the algorithm:

  • Sequentially for each $m$, but in parallel for each $n$, compute and store $P_{m,n}(Y)$ using the previously computed $P_{m,n}(\bar{Y})$. Compute and store $s(P_{m,n}(Y))$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5() and figure in algSAT()
Parameters:
  [in,out]  g_ybar  All $P_{m,n}(\bar{Y})$, corrected in place to $P_{m,n}(Y)$
  [out]     g_ysum  All $s(P_{m,n}(Y))$
__global__ void gpufilter::algSAT_stage3 ( const float *  g_ysum,
float *  g_vhat 
)

Algorithm SAT stage 3.

This function computes stage S.3 of the algorithm:

  • Sequentially for each $n$, but in parallel for each $m$, compute and store $P^T_{m,n}(V)$ using the previously computed $P_{m-1,n}(Y)$, $P^T_{m,n}(\hat{V})$, and $s(P_{m,n}(Y))$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5() and figure in algSAT()
Parameters:
  [in]      g_ysum  All $s(P_{m,n}(Y))$
  [in,out]  g_vhat  All $P^T_{m,n}(\hat{V})$, corrected in place to $P^T_{m,n}(V)$
__global__ void gpufilter::algSAT_stage4 ( float *  g_inout,
const float *  g_y,
const float *  g_v 
)

Algorithm SAT stage 4.

This function computes stage S.4 of the algorithm:

  • In parallel for all $m$ and $n$, compute $B_{m,n}(Y)$, then compute and store $B_{m,n}(V)$, using the previously computed $P_{m,n}(Y)$ and $P^T_{m,n}(V)$.
Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5() and figure in algSAT()
Parameters:
  [in,out]  g_inout  The input and output image
  [in]      g_y      All $P_{m,n}(Y)$
  [in]      g_v      All $P^T_{m,n}(V)$
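
To show how stage S.4 consumes the carries produced by stages S.1 to S.3, the following is a minimal sequential CPU sketch of a blocked summed-area table. It is an illustration under assumptions, not the library's implementation: the function name blocked_sat, the row-major layout, the convention that $m$ indexes block rows and $n$ block columns, and the requirement that both image dimensions are multiples of the block size b are all choices made here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch of the four SAT stages on a row-major h x w image.
std::vector<float> blocked_sat(const std::vector<float>& in,
                               std::size_t h, std::size_t w, std::size_t b) {
    const std::size_t mb = h / b, nb = w / b;  // block rows, block columns
    std::vector<float> ybar(mb * nb * b, 0.0f), vhat(mb * nb * b, 0.0f);
    std::vector<float> ysum(mb * nb, 0.0f);
    // S.1: per-block carries with zero boundaries (independent per block).
    for (std::size_t m = 0; m < mb; ++m)
        for (std::size_t n = 0; n < nb; ++n) {
            const std::size_t k = (m * nb + n) * b;
            for (std::size_t i = 0; i < b; ++i) {
                float row = 0.0f;
                for (std::size_t j = 0; j < b; ++j) {
                    const float x = in[(m * b + i) * w + n * b + j];
                    ybar[k + j] += x;  // column sums of the block
                    row += x;
                }
                vhat[k + i] = row + (i > 0 ? vhat[k + i - 1] : 0.0f);
            }
        }
    // S.2: sequential in m; accumulate column carries and their sums s(.).
    for (std::size_t m = 0; m < mb; ++m)
        for (std::size_t n = 0; n < nb; ++n) {
            const std::size_t k = (m * nb + n) * b;
            for (std::size_t j = 0; j < b; ++j) {
                if (m > 0) ybar[k + j] += ybar[k - nb * b + j];
                ysum[m * nb + n] += ybar[k + j];
            }
        }
    // S.3: sequential in n; accumulate row carries.
    for (std::size_t m = 0; m < mb; ++m)
        for (std::size_t n = 0; n < nb; ++n) {
            const std::size_t k = (m * nb + n) * b;
            for (std::size_t i = 0; i < b; ++i) {
                if (m > 0) vhat[k + i] += ysum[(m - 1) * nb + n];
                if (n > 0) vhat[k + i] += vhat[k - b + i];
            }
        }
    // S.4: finish every block independently from its incoming carries.
    std::vector<float> out(h * w);
    std::vector<float> col(b);
    for (std::size_t m = 0; m < mb; ++m)
        for (std::size_t n = 0; n < nb; ++n) {
            for (std::size_t j = 0; j < b; ++j)
                col[j] = m > 0 ? ybar[((m - 1) * nb + n) * b + j] : 0.0f;
            for (std::size_t i = 0; i < b; ++i) {
                float run = n > 0 ? vhat[(m * nb + n - 1) * b + i] : 0.0f;
                for (std::size_t j = 0; j < b; ++j) {
                    col[j] += in[(m * b + i) * w + n * b + j];  // B(Y): column prefix
                    run += col[j];                              // B(V): row prefix of Y
                    out[(m * b + i) * w + n * b + j] = run;
                }
            }
        }
    return out;
}
```

In this sketch each block in S.4 reads the column carry of the block above it and the row carry of the block to its left; S.1 and S.4 are the stages with no dependencies between blocks.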
__global__ void gpufilter::algSAT_stage4 ( float *  g_out,
const float *  g_in,
const float *  g_y,
const float *  g_v 
)

Algorithm SAT stage 4 (not-in-place computation)

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Note:
CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.
See also:
[Nehab:2011] cited in alg5() and figure in algSAT()
Parameters:
  [out]  g_out  The output image
  [in]   g_in   The input image
  [in]   g_y    All $P_{m,n}(Y)$
  [in]   g_v    All $P^T_{m,n}(V)$