Conversation
|
|
||
| itr=4, # number of OSEM iterations | ||
| fwhm=0., # Gaussian Smoothing FWHM | ||
| fwhm_rm=0., # Resolution Modelling |
There was a problem hiding this comment.
| fwhm_rm=0., # Resolution Modelling | |
| fwhm_rm=2.5, # Resolution Modelling |
might be better to be 2.5mm by default.
There was a problem hiding this comment.
The default I would prefer to be raw with no added modelling
There was a problem hiding this comment.
but, when switching on rm, then this can be as default, no problem
There was a problem hiding this comment.
ok leaving as default=0 for now. "switching on RM" would be the same as specifying a non-zero value. We could also change this later to accepting an entire kernel rather than just a Gaussian FWHM.
| #define RECON_H | ||
|
|
||
| /* separable convolution */ | ||
| #define KERNEL_RADIUS 5 |
There was a problem hiding this comment.
radius 5 is wide enough that the edges are at floating point machine precision at 4.5 mm FWHM (<1 sigma)
| /// z: how many Z-slices to add | ||
| __global__ void pad(float *dst, float *src, const int z) { | ||
| int i = threadIdx.x + blockDim.x * blockIdx.x; | ||
| if (i >= SZ_IMX) | ||
| return; | ||
| int j = threadIdx.y + blockDim.y * blockIdx.y; | ||
| if (j >= SZ_IMY) | ||
| return; | ||
| src += i * SZ_IMY * SZ_IMZ + j * SZ_IMZ; | ||
| dst += i * SZ_IMY * (SZ_IMZ + z) + j * (SZ_IMZ + z); | ||
| for (int k = 0; k < SZ_IMZ; ++k) | ||
| dst[k] = src[k]; | ||
| } | ||
| void d_pad(float *dst, float *src, const int z = COLUMNS_BLOCKDIM_X - SZ_IMZ % COLUMNS_BLOCKDIM_X) { | ||
| HANDLE_ERROR(cudaMemset(dst, 0, SZ_IMX * SZ_IMY * (SZ_IMZ + z) * sizeof(float))); | ||
| dim3 BpG((SZ_IMX + NTHRDS / 32 - 1) / (NTHRDS / 32), (SZ_IMY + 31) / 32); | ||
| dim3 TpB(NTHRDS / 32, 32); | ||
| pad<<<BpG, TpB>>>(dst, src, z); | ||
| } |
There was a problem hiding this comment.
NB:
- This is required as z-padding occurs at regular intervals in
dst[]; it's not as simple as padding the end of the array. Look at this array pointer offset:
dst += i * SZ_IMY * (SZ_IMZ + z) + j * (SZ_IMZ + z);- alternatively using (vis pre 6df9dfd)
for (size_t i = 0; i < SZ_IMX; ++i)
for (size_t j = 0; j < SZ_IMY; ++j)
cudaMemcpy(&dst[i * SZ_IMY * (SZ_IMZ + z) + j * (SZ_IMZ + z)],
&src[i * SZ_IMY * SZ_IMZ + j * SZ_IMZ],
SZ_IMZ * sizeof(float), cudaMemcpyDeviceToDevice);is very slow. It's quicker to use a CUDA kernel rather than calling cudaMemcpy 320x320 times.
- alternatively, "fixing"
d_convto work onz=127is too painful for me at the moment.
fwhm_rm : floatargumentSIGMA_RMtoresources.py&Cntfwhm_rm=0doesn't do any filteringfwhm_rm=2.5mm by default? Better than nothing for Siddon projectors (see images below)mmrrecmmrsiminfImages
Gaussian FWHM [mm]: Left: 2.5, Right: 4.5
Using 3D filter 969fc2a
Using separable filter 6df9dfd (see diff 969fc2a...6df9dfd)
Profiling (extra time is mostly padding & unpadding Z 127<->128)
Comparison to raw Siddon a19133c (
fwhm_rm in [0, 2.5, 4.5])