Use HL7 files on disk as a source (aka HL7 replay)

We have files being saved to disk (see #125), but they can't be re-read yet. Much of the code for this already exists (see `Hl7FromFile#readOnceAndQueue`), but it needs to be joined up and made configurable.

It also needs to deal with the potential for duplication of data. Currently, data will be appended to CSV files regardless of whether it's already present. De-dupe at the CSV -> parquet stage?


## Suggested implementation
`waveform-reader` usually listens on a TCP port for HL7 messages. I strongly suggest implementing this replay facility as a separate command line entry point to the code in `waveform-reader`. This allows the normal version to stay up and listening as usual, while a separate, ephemeral container reads from file but otherwise goes down a pretty similar code path.

Separate entry points can be implemented as Spring profiles. Set the active profile on the command line with `-Dspring.profiles.active=replay` and use `@Profile("replay")` to tag a Spring Bean as something that should only be used with that profile. Profile annotations are also how you'd turn off the TCP listener when in this mode (e.g. `@Profile("!replay")` on the listener bean).

See `uk.ac.ucl.rits.inform.datasources.ids.IdsOperations#populateIDS` as an example.
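
A rough sketch of how the profile split might look. The class names `Hl7ReplayRunner` and `Hl7TcpListener` are made up for illustration, and this assumes the app is a Spring Boot app so that `ApplicationRunner` is available; the real wiring in `waveform-reader` may well differ.

```java
import java.util.List;

import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

/** Hypothetical replay entry point: only instantiated when the "replay" profile is active. */
@Component
@Profile("replay")
class Hl7ReplayRunner implements ApplicationRunner {
    @Override
    public void run(ApplicationArguments args) {
        // getOptionValues returns null if the option was not given at all
        List<String> timeStart = args.getOptionValues("time-start");
        List<String> timeEnd = args.getOptionValues("time-end");
        // ... resolve the files to read and hand them to the existing parsing code ...
    }
}

/** Hypothetical stand-in for the existing TCP listener bean: absent when replaying. */
@Component
@Profile("!replay")
class Hl7TcpListener {
    // existing listener set-up would live here
}
```

One thing to watch: Spring's default parsing behind `ApplicationArguments` only recognises the `--name=value` form. An option written as `--time-start '...'` with a space comes through as a valueless option plus a separate non-option argument, so either use `=` or parse the arguments yourself.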

The closest existing functionality is `WAVEFORM_HL7_TEST_DUMP_FILE`. Since this option attempts to co-exist with TCP listening, and your input files are not necessarily "test" files, I suggest removing this option entirely. Instead, you could add command line parameters:
- `--time-start`
- `--time-end`
- `--source-location`
- `--waveform-variable`

Because the saved messages directory has a predictable structure, these options allow you to filter down to exactly which files should be included. The exception is the waveform variable, which will involve more than just selective file inclusion/exclusion because one HL7 message can contain multiple variables. You will have to find a good place to put that filter.
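
One possible place for the variable filter is immediately after a message has been parsed into its per-variable parts. A minimal sketch follows. `ParsedWaveform` and `WaveformVariableFilter` are hypothetical names standing in for whatever the parser actually emits, and treating an empty set as "keep everything" is an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the per-variable object produced by parsing one HL7 message. */
final class ParsedWaveform {
    final String variableId;
    final String sourceLocation;

    ParsedWaveform(String variableId, String sourceLocation) {
        this.variableId = variableId;
        this.sourceLocation = sourceLocation;
    }
}

/** Hypothetical filter applied to the output of parsing a single message. */
final class WaveformVariableFilter {
    private final Set<String> wanted;

    WaveformVariableFilter(Set<String> wanted) {
        this.wanted = Set.copyOf(wanted);
    }

    /** Keeps only the requested variables; an empty filter set keeps everything. */
    List<ParsedWaveform> apply(List<ParsedWaveform> fromOneMessage) {
        if (wanted.isEmpty()) {
            return fromOneMessage;
        }
        return fromOneMessage.stream()
                .filter(w -> wanted.contains(w.variableId))
                .collect(Collectors.toList());
    }
}
```

Filtering after parsing means the file-level options can stay purely about which files to open, while this step drops unwanted variables from each message.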

For the case where an ad-hoc HL7 file needs to go in, you could have a param `--hl7-file` which is mutually exclusive with the params above. You could auto-detect bz2 vs uncompressed, or just require that all input files are bz2.
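
If you go for auto-detection, bzip2 streams start with the three bytes `BZh`, so a peek at the start of the file is enough. A minimal sketch, with `CompressionSniffer` being a made-up name:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper that peeks at a stream to decide whether it is bzip2-compressed. */
final class CompressionSniffer {
    private CompressionSniffer() {}

    /**
     * Returns true if the stream starts with the bzip2 magic bytes "BZh".
     * The stream is rewound afterwards, so the caller can still read it from the start.
     */
    static boolean looksLikeBzip2(BufferedInputStream in) {
        try {
            in.mark(3);
            byte[] magic = in.readNBytes(3);
            in.reset();
            return magic.length == 3
                    && magic[0] == 'B'
                    && magic[1] == 'Z'
                    && magic[2] == 'h';
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only detects the format. The JDK has no bzip2 decoder of its own, so the actual decompression would need a library (Apache Commons Compress has `BZip2CompressorInputStream`, for example) or whatever the file-saving code already uses.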

You might want to define a new service in `waveform-reader/docker-compose.yml` called `waveform-reader-replay` with the following changes: mount the save dir as a read-only volume, no listen port, and a different command line that sets the Spring profile. (Since you want to pass command line params, is it a Docker "command" or an "entry point" you want to use? I can never remember.)
You can invoke the container with something like `emap docker run waveform-reader-replay --time-start '2026-01-22T00:00:01.000Z' --time-end '2026-01-22T12:00:01.000Z'`.

Bear in mind the files are named after the *start* of the time interval that they cover, and the length is not fixed.
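
Since only the start of each file's interval is known, one way to select files is to treat a file as covering the time up to the start of the next file in the same stream. A minimal sketch under that assumption follows. `ReplayFileSelector` is a made-up name, and how the start `Instant` is parsed out of each file name is left to the caller.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: picks saved files whose coverage may overlap a requested window. */
final class ReplayFileSelector {
    private ReplayFileSelector() {}

    /**
     * @param fileStarts file name keyed by the start instant of that file, for ONE stream
     * @param from       inclusive start of the requested window
     * @param to         exclusive end of the requested window
     * @return names of files that may contain data inside the window, in time order
     */
    static List<String> select(Map<Instant, String> fileStarts, Instant from, Instant to) {
        TreeMap<Instant, String> sorted = new TreeMap<>(fileStarts);
        List<String> selected = new ArrayList<>();
        for (Map.Entry<Instant, String> entry : sorted.entrySet()) {
            Instant start = entry.getKey();
            // Assumption: a file's coverage ends where the next file begins.
            // The last file has no successor, so its end is treated as unbounded.
            Instant nextStart = sorted.higherKey(start);
            boolean startsBeforeWindowEnds = start.isBefore(to);
            boolean endsAfterWindowStarts = nextStart == null || nextStart.isAfter(from);
            if (startsBeforeWindowEnds && endsAfterWindowStarts) {
                selected.add(entry.getValue());
            }
        }
        return selected;
    }
}
```

Note that a file starting before `--time-start` can still be selected, because it may run on into the window. The messages inside the first and last selected files would then still need filtering on their own timestamps.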

Add a dry run mode to say how many files would be included/excluded.

Don't forget to disable file saving when replaying ;)
