We have files being saved to disk (see #125), but they can't be re-read yet. Much of this code already exists (see Hl7FromFile#readOnceAndQueue) but it needs to be joined up and made configurable.
It also needs to deal with the potential for duplication of data. Currently data is appended to the CSV files regardless of whether it's already present. De-dupe at the ->parquet stage?
Suggested implementation
waveform-reader usually listens on a TCP port for HL7 messages. I strongly suggest implementing this replay facility as a separate command line entry point to the code in waveform-reader. This allows the normal version to stay up and listening as usual, while a separate, ephemeral container reads from file but otherwise goes down a pretty similar code path.
Separate entry points can be implemented as Spring profiles. Set the active profile on the command line with -Dspring.profiles.active=replay and use @Profile("replay") to tag a Spring Bean as something that should only be used with that profile. The latter is also how you'd turn off the TCP listener when in this mode, e.g. by tagging the listener bean with @Profile("!replay").
See uk.ac.ucl.rits.inform.datasources.ids.IdsOperations#populateIDS as an example.
The closest existing functionality is WAVEFORM_HL7_TEST_DUMP_FILE. Since this option attempts to co-exist with TCP listening, and your input files are not necessarily "test" files, I suggest removing this option entirely. You could add command line parameters:
--time-start
--time-end
--source-location
--waveform-variable
Because the saved messages directory is in a predictable structure, these options allow you to filter down to exactly which files should be included. The exception is the waveform variable, which will involve more than just selective file inclusion/exclusion because one HL7 message can contain multiple variables. You will have to find a good place to put that filter.
For the case where an ad-hoc HL7 file needs to go in, you could have a param --hl7-file which is mutually exclusive with the params above. You could auto-detect bz2 vs uncompressed, or just require that all input files are bz2.
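If you go for auto-detection, one option is to sniff the bzip2 magic number: a bzip2 stream starts with the three bytes "BZh", whereas an HL7 v2 file would normally start with an MSH segment. Below is a minimal sketch using only the JDK. The class and method names (Bzip2Sniffer, looksLikeBzip2) are made up for illustration and are not existing code in waveform-reader.

```java
import java.io.BufferedInputStream;
import java.io.IOException;

/** Hypothetical helper, not part of waveform-reader: detects bzip2 input by its magic number. */
public class Bzip2Sniffer {
    /**
     * Peeks at the first three bytes of the stream and reports whether they are
     * the bzip2 magic number "BZh". The stream position is restored afterwards,
     * so the caller can still read the stream from the beginning.
     */
    public static boolean looksLikeBzip2(BufferedInputStream in) throws IOException {
        in.mark(3);                        // remember the current position
        byte[] header = in.readNBytes(3);  // may return fewer bytes for a very short file
        in.reset();                        // rewind to the marked position
        return header.length == 3
                && header[0] == 'B'
                && header[1] == 'Z'
                && header[2] == 'h';
    }
}
```

This only decides which path to take. The JDK has no bzip2 decompressor, so the compressed branch would still need a library such as Apache Commons Compress (BZip2CompressorInputStream), assuming that or something similar is already used for writing the files.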
Might want to define a new service in waveform-reader/docker-compose.yml called waveform-reader-replay with the following changes: mounts the save dir as a read-only volume, no listen port, different command line that sets the Spring profile. (Since you want to pass command line params, is it a Docker "command" or an "entry point" you want to use, I can never remember?)
You can invoke the container with something like emap docker run waveform-reader-replay --time-start '2026-01-22T00:00:01.000Z' --time-end '2026-01-22T12:00:01.000Z'.
Bear in mind the files are named after the start of the time interval that they cover, and the length is not fixed.
Add a dry run mode to say how many files would be included/excluded.
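To illustrate the file selection and the dry run together, here is a rough sketch in plain Java. It assumes that the start times have already been parsed from the file names and sorted, and that each file runs until the next file for the same stream starts (with the last file treated as open-ended). Both of those are assumptions on my part, and the names ReplayFileSelector, select and dryRunSummary are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not existing code: picks files whose interval overlaps a time window. */
public class ReplayFileSelector {
    /**
     * @param sortedStarts file start times in ascending order (assumed parsed from file names)
     * @param from         start of the requested window
     * @param to           end of the requested window
     * @return start times of the files that may contain messages inside the window
     */
    public static List<Instant> select(List<Instant> sortedStarts, Instant from, Instant to) {
        List<Instant> included = new ArrayList<>();
        for (int i = 0; i < sortedStarts.size(); i++) {
            Instant fileStart = sortedStarts.get(i);
            // File length is not fixed, so assume a file ends where the next one starts.
            Instant fileEnd = (i + 1 < sortedStarts.size()) ? sortedStarts.get(i + 1) : Instant.MAX;
            // Standard interval overlap test: starts before the window ends, ends after it starts.
            if (fileStart.isBefore(to) && fileEnd.isAfter(from)) {
                included.add(fileStart);
            }
        }
        return included;
    }

    /** Dry run: reports the counts without reading any file contents. */
    public static String dryRunSummary(List<Instant> sortedStarts, Instant from, Instant to) {
        int included = select(sortedStarts, from, to).size();
        return included + " included, " + (sortedStarts.size() - included) + " excluded";
    }
}
```

Note that a file starting before --time-start can still be needed, which is why the check looks at the next file's start rather than only the file's own name. The selected files will usually also contain messages outside the window, so a per-message timestamp filter is probably still needed further down the code path.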
Don't forget to disable file saving when replaying ;)