well, if I can trust the changes I'm going to upload in a bit, then the previous custom setup was wrong and we need to regenerate the reference data using the correct setup. Still investigating
Paul Bauer (cd480c76) at 20 Mar 15:38
Fix buffer pinning
Fixes #5559
Paul Bauer (d03953d5) at 20 Mar 14:48
Paul Bauer (1f1ca835) at 20 Mar 14:48
Allow for multiple identical SETTLE setups
marking as draft, because things seem to have gotten broken in a serious way
or should this condition be something else, as it was for the old code?
why did you place the assert inside the branch here? It can by definition never be false if we entered the branch
Paul Bauer (9c4eb363) at 20 Mar 13:32
Need to use filler particles
This is pure refactoring, no behaviour changes apart from not rushing to destruct PmePpCommGpu.
Now a single object coordinates both communication and force-reduction aspects of PP ranks when they use a separate PME-only rank. Direct-GPU communication is handled by an object it owns. The coordinating object keeps track of init- and repartition-time constants, so these no longer have to be passed by do_force() every step.
The new object does not depend on t_forcerec, interaction_const_t, or StatePropagatorGpu (which the old code did).
Send-parameters and send-coordinates commands embed their own communication logic rather than call a common function that had little in common.
Signal-reset, switch-grid, and finish commands now send simple blocking MPI messages, rather than delegate to that common function, so their methods can be const.
Eliminated fields of gmx_domdec_t that related primarily to PP-PME communication. The DD builder still prepares the data needed to create PmePpComm, which can only be created after GPU task assignment is complete, so PmePpCommSettings is introduced to hold that data. The PmePpCommSettings is built in a new method of the DD builder, refactored out of setupGroupCommunication().
The NVSHMEM helper needs to know the rank of a possible PME-only rank, which is now supplied by a getter of PmePpComm rather than via gmx_domdec_t.
PME load balancing now directs PmePpComm to switch grids.
MPI requests are now handled in a std::vector
Fewer objects are made on the heap
Include statements have been adjusted to minimize dependencies
More const correctness
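As a rough illustration of the shape described above, here is a minimal sketch. It is not the actual GROMACS code: every name ending in `Sketch`, the `Request` struct (a stand-in for `MPI_Request`), and all method names are hypothetical, and no real MPI calls are made. It only shows a settings struct consumed at construction, an owned-by-value GPU helper, pending requests kept in a std::vector, and a const signalling method.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an MPI request handle (the real code would use MPI_Request).
struct Request
{
    int tag;
};

// Hypothetical settings prepared early by the DD builder and consumed later,
// once GPU task assignment is known.
struct PmePpCommSettingsSketch
{
    int  pmeRank;          // rank of the PME-only peer
    bool useGpuDirectComm; // assumed to be known only after GPU task assignment
};

// Hypothetical helper for direct-GPU communication, owned by value (no heap allocation).
struct GpuCommSketch
{
    int pmeRank;
};

class PmePpCommSketch
{
public:
    explicit PmePpCommSketch(const PmePpCommSettingsSketch& settings) :
        pmeRank_(settings.pmeRank)
    {
        if (settings.useGpuDirectComm)
        {
            gpuComm_.emplace(GpuCommSketch{ settings.pmeRank });
        }
    }

    // Getter that other helpers could query instead of reading a DD struct field.
    int pmeRank() const { return pmeRank_; }

    bool hasGpuComm() const { return gpuComm_.has_value(); }

    // Non-blocking send: records a pending request in the vector.
    void sendCoordinates() { requests_.push_back(Request{ 1 }); }

    // Completing the pending requests empties the vector.
    void waitForRequests() { requests_.clear(); }

    std::size_t numPendingRequests() const { return requests_.size(); }

    // A blocking signal modifies no member state, so the method can be const.
    // Here it only returns a description of the message it would send.
    std::string signalFinish() const
    {
        return "finish -> rank " + std::to_string(pmeRank_);
    }

private:
    int                          pmeRank_; // init-time constant, stored once
    std::optional<GpuCommSketch> gpuComm_; // owned helper, present only when needed
    std::vector<Request>         requests_;
};
```

Because the init-time constants live in the object, callers in this sketch never pass the PME rank per step, and `signalFinish()` can be called through a const reference.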
Refs #5539
my fault for only building with HIP locally
Paul Bauer (ea03f50e) at 20 Mar 12:19
Apply 1 suggestion(s) to 1 file(s)
Move towards using more of the existing code instead of reimplementing what already exists.
Closes #5589
Paul Bauer (2d83cf6f) at 20 Mar 09:32
Deduplicate NBNxM FEP GPU test setup
it was supposed to be used for EXCLUSION_FORCES, but that is not needed for the FEP kernel. I still want to use it for the regular kernels, but will do that in a follow-up. Removed now
sorry, I misunderstood your earlier comment. I can move things back as you suggest