Functions
__global__ void	gpufilter::alg4_stage1 (float2 *g_transp_pybar, float2 *g_transp_ezhat)
	Algorithm 4 stage 1.
__global__ void	gpufilter::alg4_stage2_3 (float2 *g_transp_pybar, float2 *g_transp_ezhat)
	Algorithm 4 stages 2 and 3 (fused)
__global__ void	gpufilter::alg4_stage4 (float *g_transp_out, float2 *g_transp_py, float2 *g_transp_ez, float2 *g_pubar, float2 *g_evhat, int out_stride)
	Algorithm 4 stage 4.
__global__ void	gpufilter::alg4_stage5_6 (float2 *g_transp_pybar, float2 *g_transp_ezhat)
	Algorithm 4 stages 5 and 6 (fused)
__global__ void	gpufilter::alg4_stage7 (float *g_out, float2 *g_transp_py, float2 *g_transp_ez, int out_stride)
	Algorithm 4 stage 7.
__global__ void	gpufilter::alg5_stage1 (float *g_transp_pybar, float *g_transp_ezhat, float *g_ptucheck, float *g_etvtilde)
	Algorithm 5 stage 1.
__global__ void	gpufilter::alg5_stage2_3 (float *g_transp_pybar, float *g_transp_ezhat)
	Algorithm 5 stages 2 and 3 (fused)
__global__ void	gpufilter::alg5_stage4_5 (float *g_ptucheck, float *g_etvtilde, const float *g_transp_py, const float *g_transp_ez)
	Algorithm 5 stages 4 and 5 (fused)
__global__ void	gpufilter::alg5_stage6 (float *g_out, const float *g_transp_py, const float *g_transp_ez, const float *g_ptu, const float *g_etv)
	Algorithm 5 stage 6.
__global__ void	gpufilter::algSAT_stage1 (const float *g_in, float *g_ybar, float *g_vhat)
	Algorithm SAT stage 1.
__global__ void	gpufilter::algSAT_stage2 (float *g_ybar, float *g_ysum)
	Algorithm SAT stage 2.
__global__ void	gpufilter::algSAT_stage3 (const float *g_ysum, float *g_vhat)
	Algorithm SAT stage 3.
__global__ void	gpufilter::algSAT_stage4 (float *g_inout, const float *g_y, const float *g_v)
	Algorithm SAT stage 4.
__global__ void	gpufilter::algSAT_stage4 (float *g_out, const float *g_in, const float *g_y, const float *g_v)
	Algorithm SAT stage 4 (not-in-place computation)

Function Documentation

__global__ void gpufilter::alg4_stage1	(	float2 *	g_transp_pybar,
		float2 *	g_transp_ezhat
	)

Algorithm 4 stage 1.

This function computes stage 4.1 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute and store each $P_{m,n}(\bar{Y})$ and $E_{m,n}(\hat{Z})$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[out]	g_transp_pybar	All $P_{m,n}(\bar{Y})$
[out]	g_transp_ezhat	All $E_{m,n}(\hat{Z})$
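
The per-block carries of stage 4.1 can be illustrated on the CPU. The sketch below is not the gpufilter kernel: it assumes a 1D signal, a first-order filter with the convention y[i] = x[i] + a*y[i-1] (the real kernels store float2 values and work on 2D blocks), and the names BlockCarries and block_carries_stage1 are hypothetical. Each block is filtered with zero boundary conditions, and only the last causal output and the first anticausal output are kept.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container, not part of gpufilter.
struct BlockCarries {
    std::vector<float> pybar; // P_m(Ybar): last causal output of block m
    std::vector<float> ezhat; // E_m(Zhat): first anticausal output of block m
};

// Assumed convention: y[i] = x[i] + a*y[i-1], z[i] = y[i] + a*z[i+1].
BlockCarries block_carries_stage1(const std::vector<float>& x,
                                  std::size_t b, float a) {
    assert(b > 0 && x.size() % b == 0);
    const std::size_t blocks = x.size() / b;
    BlockCarries c{std::vector<float>(blocks), std::vector<float>(blocks)};
    std::vector<float> ybar(b);
    for (std::size_t m = 0; m < blocks; ++m) { // blocks are independent
        float prev = 0.0f;                     // zero prologue
        for (std::size_t i = 0; i < b; ++i) {
            prev = x[m * b + i] + a * prev;
            ybar[i] = prev;
        }
        c.pybar[m] = ybar[b - 1];
        float next = 0.0f;                     // zero epilogue
        for (std::size_t i = b; i-- > 0;) {
            next = ybar[i] + a * next;
        }
        c.ezhat[m] = next;
    }
    return c;
}
```

Because no block reads another block's results, the outer loop over m is the part that maps to independent thread blocks.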

__global__ void gpufilter::alg4_stage2_3	(	float2 *	g_transp_pybar,
		float2 *	g_transp_ezhat
	)

Algorithm 4 stages 2 and 3 (fused)

This function computes stages 4.2 and 4.3 of the algorithm, as follows:

Sequentially for each $m$, but in parallel for each $n$, compute and store $P_{m,n}(Y)$ using the previously computed $P_{m,n}(\bar{Y})$.

Sequentially for each $m$, but in parallel for each $n$, compute and store $E_{m,n}(Z)$ using the previously computed $P_{m-1,n}(Y)$ and $E_{m,n}(\hat{Z})$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[in,out]	g_transp_pybar	All $P_{m,n}(\bar{Y})$ fixed to $P_{m,n}(Y)$
[in,out]	g_transp_ezhat	All $E_{m,n}(\hat{Z})$ fixed to $E_{m,n}(Z)$
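
The fix-up in stages 4.2 and 4.3 amounts to two short recurrences over the block carries. The following CPU sketch is an illustration under assumptions, not the library's code: it uses a 1D signal, a first-order filter with y[i] = x[i] + a*y[i-1] and z[i] = y[i] + a*z[i+1], block size b, and a hypothetical function name. Under those assumptions the true carries follow from the zero-boundary ones as P_m(Y) = P_m(Ybar) + a^b P_{m-1}(Y) and E_m(Z) = E_m(Zhat) + a^b E_{m+1}(Z) + (a + a^3 + ... + a^(2b-1)) P_{m-1}(Y).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: on entry p holds P_m(Ybar) and e holds E_m(Zhat);
// on return they hold P_m(Y) and E_m(Z). First-order filter assumed.
void fix_carries_stage2_3(std::vector<float>& p, std::vector<float>& e,
                          std::size_t b, float a) {
    assert(b > 0 && p.size() == e.size());
    float ab = 1.0f;   // becomes a^b
    float s = 0.0f;    // becomes a + a^3 + ... + a^(2b-1)
    float term = a;
    for (std::size_t i = 0; i < b; ++i) {
        ab *= a;
        s += term;
        term *= a * a;
    }
    const std::size_t blocks = p.size();
    // Stage 2: forward over m, each carry depends on the previous one.
    for (std::size_t m = 1; m < blocks; ++m) {
        p[m] += ab * p[m - 1];
    }
    // Stage 3: backward over m, using E_{m+1}(Z) and P_{m-1}(Y).
    for (std::size_t m = blocks; m-- > 0;) {
        const float next = (m + 1 < blocks) ? e[m + 1] : 0.0f;
        const float prev = (m > 0) ? p[m - 1] : 0.0f;
        e[m] += ab * next + s * prev;
    }
}
```

The loops over m are inherently sequential, which matches the "sequentially for each" wording; the parallelism in the kernel comes from the independent index.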

__global__ void gpufilter::alg4_stage4	(	float *	g_transp_out,
		float2 *	g_transp_py,
		float2 *	g_transp_ez,
		float2 *	g_pubar,
		float2 *	g_evhat,
		int	out_stride
	)

Algorithm 4 stage 4.

This function computes stage 4.4 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute $B_{m,n}(Y)$ using the previously computed $P_{m-1,n}(Y)$. Then compute and store $B_{m,n}(Z)$ using the previously computed $E_{m+1,n}(Z)$. Finally, compute and store both $P^T_{m,n}(\bar{U})$ and $E^T_{m,n}(\hat{V})$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[out]	g_transp_out	The output 2D image transposed
[in]	g_transp_py	All $P_{m,n}(Y)$
[in]	g_transp_ez	All $E_{m,n}(Z)$
[out]	g_pubar	All $P^T_{m,n}(\bar{U})$
[out]	g_evhat	All $E^T_{m,n}(\hat{V})$
[in]	out_stride	Transposed output image stride
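
The first half of stage 4.4 (computing $B_{m,n}(Y)$ and then $B_{m,n}(Z)$ from the incoming carries) can be sketched for a single block. This is a simplified CPU illustration, assuming a 1D block, a first-order filter with y[i] = x[i] + a*y[i-1] and z[i] = y[i] + a*z[i+1], and a hypothetical function name; it does not produce the $P^T_{m,n}(\bar{U})$ and $E^T_{m,n}(\hat{V})$ carries that the kernel also stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: filter one block given the true carries.
// py_prev plays the role of P_{m-1}(Y), ez_next the role of E_{m+1}(Z).
std::vector<float> filter_block_stage4(const std::vector<float>& x,
                                       float py_prev, float ez_next,
                                       float a) {
    assert(!x.empty());
    std::vector<float> out(x.size());
    float prev = py_prev;
    for (std::size_t i = 0; i < x.size(); ++i) {  // causal pass: B(Y)
        prev = x[i] + a * prev;
        out[i] = prev;
    }
    float next = ez_next;
    for (std::size_t i = x.size(); i-- > 0;) {    // anticausal pass: B(Z)
        next = out[i] + a * next;
        out[i] = next;
    }
    return out;
}
```

With correct carries, each block reproduces exactly the values a single sequential pass over the whole signal would give, so all blocks can be processed independently.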

__global__ void gpufilter::alg4_stage5_6	(	float2 *	g_transp_pybar,
		float2 *	g_transp_ezhat
	)

Algorithm 4 stages 5 and 6 (fused)

This function computes stages 4.5 and 4.6 of the algorithm, as follows:

Sequentially for each $n$, but in parallel for each $m$, compute and store $P^T_{m,n}(U)$ from $P^T_{m,n}(\bar{U})$.

Sequentially for each $n$, but in parallel for each $m$, compute and store each $E^T_{m,n}(V)$ using the previously computed $P^T_{m,n-1}(U)$ and $E^T_{m,n}(\hat{V})$.

Note: This function is exactly the same as alg4_stage2_3(), given the intrinsic similarities of the kernels. CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[in,out]	g_transp_pybar	All $P_{m,n}(\bar{Y})$ fixed to $P_{m,n}(Y)$
[in,out]	g_transp_ezhat	All $E_{m,n}(\hat{Z})$ fixed to $E_{m,n}(Z)$

__global__ void gpufilter::alg4_stage7	(	float *	g_out,
		float2 *	g_transp_py,
		float2 *	g_transp_ez,
		int	out_stride
	)

Algorithm 4 stage 7.

This function computes stage 4.7 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute $B_{m,n}(U)$ using the previously computed $P^T_{m,n-1}(U)$ and $B_{m,n}(Z)$. Then compute and store $B_{m,n}(V)$ using the previously computed $E^T_{m,n+1}(V)$.

Note: This function is exactly the same as alg4_stage4(), given the intrinsic similarities of the kernels. CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[out]	g_out	The output 2D image
[in]	g_transp_py	All $P_{m,n}(Y)$
[in]	g_transp_ez	All $E_{m,n}(Z)$
[in]	out_stride	Output image stride

__global__ void gpufilter::alg5_stage1	(	float *	g_transp_pybar,
		float *	g_transp_ezhat,
		float *	g_ptucheck,
		float *	g_etvtilde
	)

Algorithm 5 stage 1.

This function computes stage 5.1 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute and store each $P_{m,n}(\bar{Y})$, $E_{m,n}(\hat{Z})$, $P^T_{m,n}(\check{U})$, and $E^T_{m,n}(\tilde{V})$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[out]	g_transp_pybar	All $P_{m,n}(\bar{Y})$
[out]	g_transp_ezhat	All $E_{m,n}(\hat{Z})$
[out]	g_ptucheck	All $P^T_{m,n}(\check{U})$
[out]	g_etvtilde	All $E^T_{m,n}(\tilde{V})$

__global__ void gpufilter::alg5_stage2_3	(	float *	g_transp_pybar,
		float *	g_transp_ezhat
	)

Algorithm 5 stages 2 and 3 (fused)

This function computes stages 5.2 and 5.3 of the algorithm, as follows:

In parallel for all $n$, sequentially for each $m$, compute and store $P_{m,n}(Y)$ using the previously computed $P_{m,n}(\bar{Y})$.

This is combined by simple kernel fusion (going through global memory) with:

In parallel for all $n$, sequentially for each $m$, compute and store $E_{m,n}(Z)$ using the previously computed $P_{m-1,n}(Y)$ and $E_{m+1,n}(\hat{Z})$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[in,out]	g_transp_pybar	All $P_{m,n}(\bar{Y})$ fixed to $P_{m,n}(Y)$
[in,out]	g_transp_ezhat	All $E_{m,n}(\hat{Z})$ fixed to $E_{m,n}(Z)$

__global__ void gpufilter::alg5_stage4_5	(	float *	g_ptucheck,
		float *	g_etvtilde,
		const float *	g_transp_py,
		const float *	g_transp_ez
	)

Algorithm 5 stages 4 and 5 (fused)

This function computes stages 5.4 and 5.5 of the algorithm, as follows:

In parallel for all $m$, sequentially for each $n$, compute and store $P^T_{m,n}(U)$ using the previously computed $P^T_{m,n}(\check{U})$, $P_{m-1,n}(Y)$, and $E_{m+1,n}(Z)$.

This is combined by simple kernel fusion (going through global memory) with:

In parallel for all $m$, sequentially for each $n$, compute and store $E^T_{m,n}(V)$ using the previously computed $E^T_{m,n}(\tilde{V})$, $P^T_{m,n-1}(U)$, $P_{m-1,n}(Y)$, and $E_{m+1,n}(Z)$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[in,out]	g_ptucheck	All $P^T_{m,n}(\check{U})$ fixed to $P^T_{m,n}(\bar{U})$
[in,out]	g_etvtilde	All $E^T_{m,n}(\tilde{V})$ fixed to $E^T_{m,n}(\check{V})$
[in]	g_transp_py	All $P_{m,n}(Y)$
[in]	g_transp_ez	All $E_{m,n}(Z)$

__global__ void gpufilter::alg5_stage6	(	float *	g_out,
		const float *	g_transp_py,
		const float *	g_transp_ez,
		const float *	g_ptu,
		const float *	g_etv
	)

Algorithm 5 stage 6.

This function computes stage 5.6 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute one after the other $B_{m,n}(Y)$, $B_{m,n}(Z)$, $B_{m,n}(U)$, and $B_{m,n}(V)$ using the previously computed $P_{m-1,n}(Y)$, $E_{m+1,n}(Z)$, $P^T_{m,n-1}(U)$, and $E^T_{m,n+1}(V)$. Store $B_{m,n}(V)$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5()

Parameters:

[out]	g_out	The output 2D image
[in]	g_transp_py	All $P_{m,n}(Y)$
[in]	g_transp_ez	All $E_{m,n}(Z)$
[in]	g_ptu	All $P^T_{m,n}(U)$
[in]	g_etv	All $E^T_{m,n}(V)$

__global__ void gpufilter::algSAT_stage1	(	const float *	g_in,
		float *	g_ybar,
		float *	g_vhat
	)

Algorithm SAT stage 1.

This function computes stage S.1 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute and store $P_{m,n}(\bar{Y})$ and $P^T_{m,n}(\hat{V})$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5() and figure in algSAT()

Parameters:

[in]	g_in	Input image
[out]	g_ybar	All $P_{m,n}(\bar{Y})$
[out]	g_vhat	All $P^T_{m,n}(\hat{V})$
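
Stage S.1 can be pictured with a CPU reference. The sketch below is an assumption-laden illustration rather than the kernel itself: it takes a row-major h x w image split into b x b blocks, treats m as the block row and n as the block column, and stores the carries in a hypothetical layout (ybar as (h/b) x w, vhat as (w/b) x h). For each block, $P_{m,n}(\bar{Y})$ is the vector of column sums of the block, and $P^T_{m,n}(\hat{V})$ holds, for each row i of the block, the sum of all block entries in rows 0 through i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container and layout, not taken from gpufilter.
struct SatStage1Out {
    std::vector<float> ybar; // (h/b) x w : column sums per block row
    std::vector<float> vhat; // (w/b) x h : cumulative row sums per block column
};

SatStage1Out sat_stage1(const std::vector<float>& in,
                        std::size_t h, std::size_t w, std::size_t b) {
    assert(b > 0 && h % b == 0 && w % b == 0 && in.size() == h * w);
    SatStage1Out o{std::vector<float>((h / b) * w, 0.0f),
                   std::vector<float>((w / b) * h, 0.0f)};
    for (std::size_t m = 0; m < h / b; ++m) {
        for (std::size_t n = 0; n < w / b; ++n) {   // blocks are independent
            float running = 0.0f;                   // sum of rows 0..i of the block
            for (std::size_t i = 0; i < b; ++i) {
                for (std::size_t j = 0; j < b; ++j) {
                    const float x = in[(m * b + i) * w + (n * b + j)];
                    o.ybar[m * w + n * b + j] += x; // column sum
                    running += x;
                }
                o.vhat[n * h + m * b + i] = running;
            }
        }
    }
    return o;
}
```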

__global__ void gpufilter::algSAT_stage2	(	float *	g_ybar,
		float *	g_ysum
	)

Algorithm SAT stage 2.

This function computes stage S.2 of the algorithm, as follows:

Sequentially for each $m$, but in parallel for each $n$, compute and store $P_{m,n}(Y)$ using the previously computed $P_{m,n}(\bar{Y})$. Compute and store $s(P_{m,n}(Y))$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5() and figure in algSAT()

Parameters:

[in,out]	g_ybar	All $P_{m,n}(\bar{Y})$ fixed to $P_{m,n}(Y)$
[out]	g_ysum	All $s(P_{m,n}(Y))$
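
For a summed-area table the fix-up of stage S.2 reduces to a running sum down each column of carries. The sketch below is a CPU illustration under assumptions: ybar is stored as (h/b) x w in row-major order, ysum as (h/b) x (w/b), and the function name is hypothetical. It accumulates $P_{m,n}(Y) = P_{m-1,n}(Y) + P_{m,n}(\bar{Y})$ in place and records $s(P_{m,n}(Y))$, the sum of the b entries of each fixed carry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fixes ybar in place and returns the per-block sums.
std::vector<float> sat_stage2(std::vector<float>& ybar,
                              std::size_t w, std::size_t b) {
    assert(b > 0 && w > 0 && w % b == 0 && ybar.size() % w == 0);
    const std::size_t mb = ybar.size() / w;  // number of block rows
    const std::size_t nb = w / b;            // number of block columns
    std::vector<float> ysum(mb * nb, 0.0f);
    for (std::size_t m = 0; m < mb; ++m) {       // sequential in m
        for (std::size_t c = 0; c < w; ++c) {    // independent columns
            if (m > 0) {
                ybar[m * w + c] += ybar[(m - 1) * w + c];
            }
            ysum[m * nb + c / b] += ybar[m * w + c];
        }
    }
    return ysum;
}
```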

__global__ void gpufilter::algSAT_stage3	(	const float *	g_ysum,
		float *	g_vhat
	)

Algorithm SAT stage 3.

This function computes stage S.3 of the algorithm, as follows:

Sequentially for each $n$, but in parallel for each $m$, compute and store $P^T_{m,n}(V)$ using the previously computed $P_{m-1,n}(Y)$, $P^T_{m,n}(\hat{V})$, and $s(P_{m,n}(Y))$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5() and figure in algSAT()

Parameters:

[in]	g_ysum	All $s(P_{m,n}(Y))$
[in,out]	g_vhat	All $P^T_{m,n}(\hat{V})$ fixed to $P^T_{m,n}(V)$
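
One way to read stage S.3 is as the recurrence $P^T_{m,n}(V) = P^T_{m,n-1}(V) + s(P_{m-1,n}(Y)) + P^T_{m,n}(\hat{V})$, where the scalar $s(\cdot)$ is added to every entry. The CPU sketch below follows that reading; the storage layout (vhat as (w/b) x h, ysum as (h/b) x (w/b)) and the function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fixes vhat in place so it holds P^T_{m,n}(V).
void sat_stage3(const std::vector<float>& ysum, std::vector<float>& vhat,
                std::size_t h, std::size_t b) {
    assert(b > 0 && h > 0 && h % b == 0 && vhat.size() % h == 0);
    const std::size_t nb = vhat.size() / h;  // number of block columns
    assert(ysum.size() == (h / b) * nb);
    for (std::size_t n = 0; n < nb; ++n) {       // sequential in n
        for (std::size_t r = 0; r < h; ++r) {    // independent rows
            const std::size_t m = r / b;         // block row containing r
            const float left = (n > 0) ? vhat[(n - 1) * h + r] : 0.0f;
            const float above = (m > 0) ? ysum[(m - 1) * nb + n] : 0.0f;
            vhat[n * h + r] += left + above;
        }
    }
}
```

With b = 1 every block is a single pixel, so the fixed carries are the summed-area table itself, which gives a quick sanity check.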

__global__ void gpufilter::algSAT_stage4	(	float *	g_inout,
		const float *	g_y,
		const float *	g_v
	)

Algorithm SAT stage 4.

This function computes stage S.4 of the algorithm, as follows:

In parallel for all $m$ and $n$, compute $B_{m,n}(Y)$, then compute and store $B_{m,n}(V)$ using the previously computed $P_{m,n}(Y)$ and $P^T_{m,n}(V)$.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5() and figure in algSAT()

Parameters:

[in,out]	g_inout	The input and output image
[in]	g_y	All $P_{m,n}(Y)$
[in]	g_v	All $P^T_{m,n}(V)$
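
The in-place final stage can be sketched on the CPU as two prefix sums per block, each seeded with the carry from the neighbouring block. This is an illustration under assumptions, not the kernel: the image is row-major h x w with b x b blocks, y is stored as (h/b) x w, v as (w/b) x h, and the function name is hypothetical. Block (m, n) reads the carry of block row m-1 from y and the carry of block column n-1 from v.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns img into its summed-area table in place.
void sat_stage4(std::vector<float>& img, const std::vector<float>& y,
                const std::vector<float>& v,
                std::size_t h, std::size_t w, std::size_t b) {
    assert(b > 0 && h % b == 0 && w % b == 0 && img.size() == h * w);
    assert(y.size() == (h / b) * w && v.size() == (w / b) * h);
    for (std::size_t m = 0; m < h / b; ++m) {
        for (std::size_t n = 0; n < w / b; ++n) {  // blocks are independent
            // B(Y): prefix sum down each column, seeded with P_{m-1,n}(Y).
            for (std::size_t j = 0; j < b; ++j) {
                const std::size_t col = n * b + j;
                float acc = (m > 0) ? y[(m - 1) * w + col] : 0.0f;
                for (std::size_t i = 0; i < b; ++i) {
                    acc += img[(m * b + i) * w + col];
                    img[(m * b + i) * w + col] = acc;
                }
            }
            // B(V): prefix sum along each row, seeded with P^T_{m,n-1}(V).
            for (std::size_t i = 0; i < b; ++i) {
                const std::size_t row = m * b + i;
                float acc = (n > 0) ? v[(n - 1) * h + row] : 0.0f;
                for (std::size_t j = 0; j < b; ++j) {
                    acc += img[row * w + n * b + j];
                    img[row * w + n * b + j] = acc;
                }
            }
        }
    }
}
```

Working in place is safe here because each block touches only its own pixels and reads carries from separate arrays.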

__global__ void gpufilter::algSAT_stage4	(	float *	g_out,
		const float *	g_in,
		const float *	g_y,
		const float *	g_v
	)

Algorithm SAT stage 4 (not-in-place computation)

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Note: CUDA kernel functions such as this one have many idiosyncrasies and should not be used lightly.

See also: [Nehab:2011] cited in alg5() and figure in algSAT()

Parameters:

[out]	g_out	The output image
[in]	g_in	The input image
[in]	g_y	All $P_{m,n}(Y)$
[in]	g_v	All $P^T_{m,n}(V)$
