Use HL7 files on disk as a source (aka HL7 replay) #139

@jeremyestein

Description

We have files being saved to disk (see #125), but they can't be re-read yet. Much of this code already exists (see Hl7FromFile#readOnceAndQueue), but it needs to be joined up and made configurable.

This also needs to deal with the potential for duplication of data. Currently, data is appended to the CSV files regardless of whether it's already present. De-dupe at the ->parquet stage?

Suggested implementation

waveform-reader usually listens on a TCP port for HL7 messages. I strongly suggest implementing this replay facility as a separate command line entry point to the code in waveform-reader. This allows the normal version to stay up and listening as usual, while a separate, ephemeral container reads from file but otherwise goes down a pretty similar code path.

Separate entry points can be implemented as Spring profiles. Set the active profile on the command line with -Dspring.profiles.active=replay and use @Profile("replay") to tag a Spring Bean as something that should only be used with that profile. The same mechanism is how you'd turn off the TCP listener when in this mode, e.g. by tagging the listener bean with the negated form @Profile("!replay").

See uk.ac.ucl.rits.inform.datasources.ids.IdsOperations#populateIDS as an example.

The closest existing functionality is WAVEFORM_HL7_TEST_DUMP_FILE. Since this option attempts to co-exist with TCP listening, and your input files are not necessarily "test" files, I suggest removing this option entirely. You could add command line parameters:

  • --time-start
  • --time-end
  • --source-location
  • --waveform-variable

Because the saved messages directory is in a predictable structure, these options allow you to filter down to exactly which files should be included. The exception is the waveform variable, which will involve more than just selective file inclusion/exclusion because one HL7 message can contain multiple variables. You will have to find a good place to put that filter.
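As a rough illustration of how the four suggested parameters might be collected and validated, here is a minimal sketch in plain Java. The class name ReplayOptions and its fields are hypothetical and not part of the existing code base. It assumes space-separated "--name value" pairs and ISO-8601 UTC timestamps, as in the example invocation further down; a real implementation might instead lean on Spring's own argument handling.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical holder for the replay filter options (illustrative only). */
final class ReplayOptions {
    private static final Set<String> KNOWN_OPTIONS = Set.of(
            "--time-start", "--time-end", "--source-location", "--waveform-variable");

    // A null field means the option was not given, i.e. no filtering on it.
    final Instant timeStart;
    final Instant timeEnd;
    final String sourceLocation;
    final String waveformVariable;

    private ReplayOptions(Map<String, String> raw) {
        this.timeStart = parseInstantOrNull(raw.get("--time-start"));
        this.timeEnd = parseInstantOrNull(raw.get("--time-end"));
        this.sourceLocation = raw.get("--source-location");
        this.waveformVariable = raw.get("--waveform-variable");
    }

    private static Instant parseInstantOrNull(String value) {
        // Instant.parse accepts ISO-8601 instants such as 2026-01-22T00:00:01.000Z
        return value == null ? null : Instant.parse(value);
    }

    /** Parses "--name value" pairs, rejecting unknown, duplicate or valueless options. */
    static ReplayOptions parse(String... args) {
        Map<String, String> raw = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String name = args[i];
            if (!KNOWN_OPTIONS.contains(name)) {
                throw new IllegalArgumentException("Unknown option: " + name);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + name);
            }
            if (raw.put(name, args[i + 1]) != null) {
                throw new IllegalArgumentException("Duplicate option: " + name);
            }
        }
        ReplayOptions options = new ReplayOptions(raw);
        if (options.timeStart != null && options.timeEnd != null
                && !options.timeStart.isBefore(options.timeEnd)) {
            throw new IllegalArgumentException("--time-start must be before --time-end");
        }
        return options;
    }

    public static void main(String[] args) {
        ReplayOptions options = parse(args);
        System.out.println("start=" + options.timeStart + " end=" + options.timeEnd
                + " location=" + options.sourceLocation
                + " variable=" + options.waveformVariable);
    }
}
```

Failing fast on unknown or malformed options seems preferable here, since a typo in a time bound could otherwise silently replay far more data than intended.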

For the case where an ad-hoc HL7 file needs to go in, you could have a param --hl7-file which is mutually exclusive with the params above. You could auto-detect bz2 vs uncompressed, or just require that all input files are bz2.
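If auto-detection is preferred, one possible approach is to peek at the first few bytes of the file: a bzip2 stream starts with the ASCII characters "BZh" followed by a block-size digit from 1 to 9. The sketch below shows only the detection step; the class and method names are hypothetical. The JDK has no built-in bzip2 decoder, so the decompression itself would need a library (for example Apache Commons Compress), which is not shown here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that sniffs the bzip2 magic bytes (illustrative only). */
final class Bzip2Sniffer {
    /**
     * Returns true if the stream begins with 'B', 'Z', 'h' and a digit '1'..'9'.
     * The stream is rewound afterwards, so it must support mark/reset
     * (wrap a FileInputStream in a BufferedInputStream first).
     */
    static boolean looksLikeBzip2(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Stream must support mark/reset");
        }
        byte[] header = new byte[4];
        in.mark(header.length);
        int read = in.readNBytes(header, 0, header.length); // Java 9+
        in.reset(); // leave the stream positioned at the start for the real reader
        return read == header.length
                && header[0] == 'B' && header[1] == 'Z' && header[2] == 'h'
                && header[3] >= '1' && header[3] <= '9';
    }

    public static void main(String[] args) throws IOException {
        byte[] plainHl7 = "MSH|^~\\&|".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(plainHl7))) {
            System.out.println("bzip2? " + looksLikeBzip2(in));
        }
    }
}
```

An uncompressed HL7 file would normally start with a segment name such as MSH, so a false positive on the magic bytes seems unlikely, but requiring bz2 for all inputs remains the simpler option.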

Might want to define a new service in waveform-reader/docker-compose.yml called waveform-reader-replay with the following changes: mounts the save dir as a read-only volume, no listen port, different command line that sets the Spring profile. (Since you want to pass command line params, is it a Docker "command" or an "entry point" you want to use? I can never remember.)
You can invoke the container with something like emap docker run waveform-reader-replay --time-start '2026-01-22T00:00:01.000Z' --time-end '2026-01-22T12:00:01.000Z'.

Bear in mind the files are named after the start of the time interval that they cover, and the length is not fixed.
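The consequence for time filtering is that a file whose name is earlier than --time-start may still contain messages inside the requested window. A minimal sketch of one way to handle this is below. It assumes (this is not confirmed by the existing code) that the start times have already been parsed from the file names for a single stream, are sorted ascending, and that each file runs until the next one begins. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical selector for saved files overlapping a time window (illustrative only). */
final class ReplayFileSelector {
    /**
     * Returns the start times of files that may overlap [windowStart, windowEnd).
     * sortedStarts must be in ascending order and belong to one stream.
     */
    static List<Instant> select(List<Instant> sortedStarts, Instant windowStart, Instant windowEnd) {
        List<Instant> selected = new ArrayList<>();
        for (int i = 0; i < sortedStarts.size(); i++) {
            Instant start = sortedStarts.get(i);
            // The file's end is not in its name; assume it lasts until the next file starts.
            Instant assumedEnd = (i + 1 < sortedStarts.size())
                    ? sortedStarts.get(i + 1)
                    : Instant.MAX; // last file: end unknown, so treat it as open-ended
            if (start.isBefore(windowEnd) && assumedEnd.isAfter(windowStart)) {
                selected.add(start);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Instant> starts = List.of(
                Instant.parse("2026-01-22T00:00:00Z"),
                Instant.parse("2026-01-22T00:10:00Z"),
                Instant.parse("2026-01-22T00:25:00Z"),
                Instant.parse("2026-01-22T01:00:00Z"));
        System.out.println(select(starts,
                Instant.parse("2026-01-22T00:15:00Z"),
                Instant.parse("2026-01-22T00:30:00Z")));
    }
}
```

Selecting by file only narrows the input to candidates. Messages inside the first and last selected files can still fall outside the window, so a per-message timestamp check would likely be needed as well.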

Add a dry run mode to say how many files would be included/excluded.

Don't forget to disable file saving when replaying ;)
