Skip to content

Add image-space resolution modelling using Gaussian PSF#25

Merged
pjmark merged 17 commits intodev2from
resmod
Oct 27, 2020
Merged

Add image-space resolution modelling using Gaussian PSF#25
pjmark merged 17 commits intodev2from
resmod

Conversation

@casperdcl
Copy link
Member

@casperdcl casperdcl commented Oct 15, 2020

  • add a new fwhm_rm : float argument
    • add SIGMA_RM to resources.py & Cnt
    • in future, this could also support a custom kernel
    • use a faster separable convolution
      • profiling results for 1 iteration: f_prj: 120ms, b_prj: 350ms, filter: 35x2=70ms, scatter: the enemy
    • make sure fwhm_rm=0 doesn't do any filtering
    • use fwhm_rm=2.5mm by default? Better than nothing for Siddon projectors (see images below)
  • update mmrrec
  • update mmrsim
  • prevent nearly-zero division causing inf

Images

Gaussian FWHM [mm]: Left: 2.5, Right: 4.5

Using 3D filter 969fc2a

image

Using separable filter 6df9dfd (see diff 969fc2a...6df9dfd)

image

Profiling (extra time is mostly padding & unpadding Z 127<->128)

image

Comparison to raw Siddon a19133c (fwhm_rm in [0, 2.5, 4.5])

image

@casperdcl casperdcl self-assigned this Oct 15, 2020

itr=4, # number of OSEM iterations
fwhm=0., # Gaussian Smoothing FWHM
fwhm_rm=0., # Resolution Modelling
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fwhm_rm=0., # Resolution Modelling
fwhm_rm=2.5, # Resolution Modelling

might be better to be 2.5mm by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default I would prefer to be raw with no added modelling

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but, when switching on rm, then this can be as default, no problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok leaving as default=0 for now. "switching on RM" would be the same as specifying a non-zero value. We could also change this later to accepting an entire kernel rather than just a Gaussian FWHM.

@casperdcl casperdcl marked this pull request as ready for review October 17, 2020 01:30
@casperdcl casperdcl requested a review from pjmark October 17, 2020 01:30
#define RECON_H

/* separable convolution */
#define KERNEL_RADIUS 5
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

radius 5 is wide enough that the edges are at floating point machine precision at 4.5 mm FWHM (<1 sigma)

Comment on lines +16 to +34
/// z: how many Z-slices to add
__global__ void pad(float *dst, float *src, const int z) {
int i = threadIdx.x + blockDim.x * blockIdx.x;
if (i >= SZ_IMX)
return;
int j = threadIdx.y + blockDim.y * blockIdx.y;
if (j >= SZ_IMY)
return;
src += i * SZ_IMY * SZ_IMZ + j * SZ_IMZ;
dst += i * SZ_IMY * (SZ_IMZ + z) + j * (SZ_IMZ + z);
for (int k = 0; k < SZ_IMZ; ++k)
dst[k] = src[k];
}
void d_pad(float *dst, float *src, const int z = COLUMNS_BLOCKDIM_X - SZ_IMZ % COLUMNS_BLOCKDIM_X) {
HANDLE_ERROR(cudaMemset(dst, 0, SZ_IMX * SZ_IMY * (SZ_IMZ + z) * sizeof(float)));
dim3 BpG((SZ_IMX + NTHRDS / 32 - 1) / (NTHRDS / 32), (SZ_IMY + 31) / 32);
dim3 TpB(NTHRDS / 32, 32);
pad<<<BpG, TpB>>>(dst, src, z);
}
Copy link
Member Author

@casperdcl casperdcl Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB:

  • This is required as z-padding occurs at regular intervals in dst[]; it's not as simple as padding the end of the array. Look at this array pointer offset:
dst += i * SZ_IMY * (SZ_IMZ + z) + j * (SZ_IMZ + z);
  • alternatively using (vis pre 6df9dfd)
for (size_t i = 0; i < SZ_IMX; ++i)
  for (size_t j = 0; j < SZ_IMY; ++j)
    cudaMemcpy(&dst[i * SZ_IMY * (SZ_IMZ + z) + j * (SZ_IMZ + z)],
               &src[i * SZ_IMY * SZ_IMZ + j * SZ_IMZ],
               SZ_IMZ * sizeof(float), cudaMemcpyDeviceToDevice);

is very slow. It's quicker to use a CUDA kernel rather than calling cudaMemcpy 320x320 times.

  • alternatively, "fixing" d_conv to work on z=127 is too painful for me at the moment.

@pjmark pjmark merged commit dc7da7d into dev2 Oct 27, 2020
@pjmark pjmark deleted the resmod branch October 27, 2020 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants