TST: Calculate RMS and diff image in C++ #29102
Merged
story645 merged 1 commit into matplotlib:main on Jun 19, 2025
Member
Author
So I no longer have any memory-based skips on the PR adding WASM, but maybe we still want to do this to save memory in general?
Member
This seems to make sense! Should we also use this in
oscargus
approved these changes
Jun 5, 2025
story645
reviewed
Jun 13, 2025
Member
Author
story645
reviewed
Jun 18, 2025
The current implementation is not slow, but uses a lot of memory per image.

In `compare_images`, we have:

- one actual and one expected image as uint8 (2× image)
- both converted to int16 (though original is thrown away) (4×)

which adds up to 4× the image allocated in this function.

Then it calls `calculate_rms`, which has:

- a difference between them as int16 (2×)
- the difference cast to 64-bit float (8×)
- the square of the difference as 64-bit float (though possibly the original difference was thrown away) (8×)

which at its peak has 16× the image allocated in parallel.

If the RMS is over the desired tolerance, then `save_diff_image` is called, which:

- loads the actual and expected images _again_ as uint8 (2× image)
- converts both to 64-bit float (throwing away the original) (16×)
- calculates the difference (8×)
- calculates the absolute value (8×)
- multiplies that by 10 (in-place, so no allocation)
- clips to 0-255 (8×)
- casts to uint8 (1×)

which at peak uses 32× the image.

So at their peak, `compare_images`→`calculate_rms` will have 20× the image allocated (4× + 16×), and then `compare_images`→`save_diff_image` will have 36× the image allocated (4× + 32×). This is generally not a problem, but on resource-constrained platforms like WASM, it can sometimes run out of memory just in `calculate_rms`.

This implementation in C++ always allocates the diff image, even when not needed, but doesn't have all the temporaries, so it's a maximum of 3× the image size (plus a few scalar temporaries).
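To illustrate where the 3× figure comes from, here is a minimal sketch of a single-pass approach. It is not the code added by this PR: the function name `rms_and_diff` and the flat `std::vector` buffers are assumptions made for illustration only. The two uint8 inputs (2×) plus one uint8 diff buffer (1×) are the only image-sized allocations; everything else is a scalar.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical helper, not the function added by this PR.
// Computes the RMS difference and a 10x-amplified absolute-difference image
// in one pass over two equally sized uint8 buffers.
std::pair<double, std::vector<std::uint8_t>>
rms_and_diff(const std::vector<std::uint8_t>& expected,
             const std::vector<std::uint8_t>& actual)
{
    assert(expected.size() == actual.size());

    // The only image-sized allocation made here: one uint8 buffer (1x).
    std::vector<std::uint8_t> diff(expected.size());
    double sum_sq = 0.0;  // scalar accumulator for the squared differences

    for (std::size_t i = 0; i < expected.size(); ++i) {
        // Widen per element instead of converting whole arrays.
        int d = static_cast<int>(expected[i]) - static_cast<int>(actual[i]);
        sum_sq += static_cast<double>(d) * d;

        // Same steps as described for save_diff_image: abs, x10, clip to 0-255.
        int scaled = std::abs(d) * 10;
        diff[i] = static_cast<std::uint8_t>(scaled > 255 ? 255 : scaled);
    }

    // Root of the mean squared difference; an empty image yields 0.
    double rms = expected.empty() ? 0.0 : std::sqrt(sum_sq / expected.size());
    return {rms, std::move(diff)};
}
```

Because each pixel is widened, squared, and scaled inside the loop, no int16 or 64-bit float copy of the whole image is ever created, which is the source of the temporaries counted above.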
story645
approved these changes
Jun 18, 2025
Member
story645
left a comment
Not merging b/c you keep pushing, but you're welcome to merge when you're done tweaking. The memory improvements look awesome!
Member
Author
That was just fixing stubtest; it should be good now.

