Force-discard cache if cache format changed#20152
Conversation
This comment has been minimized.
This comment has been minimized.
JukkaL
left a comment
There was a problem hiding this comment.
Looks good, I only have a few potential improvements and a question.
| if meta[0] != cache_version() or meta[1] != CACHE_VERSION: | ||
| manager.log(f"Metadata abandoned for {id}: incompatible cache format") | ||
| return None | ||
| data_io = Buffer(meta[2:]) |
There was a problem hiding this comment.
The slice operation does an almost full copy of the meta buffer. It's probably not a big deal, but it would be nice if we could avoid it. What about adding read_byte operation in the internal API (that would be just an alias to read_tag for now I guess) that could be used to read the initial two bytes? (No need to do this in this PR.)
There was a problem hiding this comment.
Could also maybe slice the memoryview?
There was a problem hiding this comment.
FWIW I just measured this slice on a "micro-benchmark", it takes around 0.1 microsecond per file (interpreted). So even with 10K files in the build this will be an extra millisecond. It is probably even less when compiled. I will add a TODO here and below.
| meta.write(data_io) | ||
| meta_bytes = data_io.getvalue() | ||
| # Prefix with both low- and high-level cache format versions for future validation. | ||
| meta_bytes = bytes([cache_version(), CACHE_VERSION]) + data_io.getvalue() |
There was a problem hiding this comment.
Similar to above, the concatenate operation does a full copy of the cache data. We could add a write_byte operation to write the version bytes to a Buffer object in a future-proof way.
mypy/cache.py
Outdated
| conventionally *does not* read the start tag (to simplify logic for unions). Known exceptions | ||
| are MypyFile.read() and SymbolTableNode.read(), since those two never appear in a union. | ||
|
|
||
| If any of these details change, please bump CACHE_VERSION below. |
There was a problem hiding this comment.
Do we need to bump CACHE_VERSION also if the structure of the meta cache serialization format changes? We might not able to read the mypy version info field from the meta file.
There was a problem hiding this comment.
Yes, we do need to bump it. It will most likely fail with ValueError and thus will be caught and cache will be discarded, but the log line will be wrong. I will update this.
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
If either low-level (i.e.
librt) or high-level cache format changes, discard the cache. Note I intentionally don't uselibrtto read/write the first two bytes of cache meta, se we are 100% sure we can always read them.