Open
Labels: docs (Documentation in the Doc dir)
Description
Documentation
The tarfile docs do not make it clear how a programmer can read data from a tar file into memory without a round trip through the file system. As far as I understand, reading partial data from a tar file essentially amounts to the following steps:
    import tarfile

    # Open in binary mode: a tar archive is binary data.
    with open("myfile.tar", "rb") as f:
        tar = tarfile.TarFile(fileobj=f)
        # Pick the first regular file in the archive.
        tar_info = next(member for member in tar.getmembers() if member.isfile())
        # offset_data is where this member's contents start in the archive.
        f.seek(tar_info.offset_data)
        data = f.read(tar_info.size)

However, to arrive at this, you either need to be confident enough to read the CPython source code, or you need to know that tar files store the byte contents unchanged and that TarInfo.size is the size of the data without the file header. Neither of these is obvious to less experienced programmers.
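The assumption about TarInfo.size and offset_data can be checked with a small in-memory sketch. The helper names `build_archive` and `read_first_file_raw` are made up for illustration, and the archive is built with `io.BytesIO` so nothing touches the file system. This only holds for uncompressed archives, since a compressed stream cannot be sliced by byte offset.

```python
import io
import tarfile


def build_archive(name: str, payload: bytes) -> bytes:
    """Return an uncompressed tar archive holding one regular file (hypothetical helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)  # size of the file contents, header excluded
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_first_file_raw(archive: bytes) -> bytes:
    """Read the first regular file by seeking to its data offset (hypothetical helper)."""
    f = io.BytesIO(archive)
    # Mode "r:" accepts uncompressed archives only.
    with tarfile.open(fileobj=f, mode="r:") as tar:
        tar_info = next(m for m in tar.getmembers() if m.isfile())
    # Closing the TarFile leaves a caller-supplied file object open.
    f.seek(tar_info.offset_data)
    return f.read(tar_info.size)


payload = b"hello tar\n"
raw = read_first_file_raw(build_archive("hello.txt", payload))
print(raw == payload)
```

If the assumption is right, the bytes read at `offset_data` are exactly the bytes that were archived, and `size` matches their length.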
I suggest that we make two changes to the tarfile docs:
- Expand the documentation for TarInfo.size so it says more than just "Size in bytes". Size of what exactly? The archived file, as far as I can tell.
- Include a minimal example (like the one above, but perhaps slightly more pedagogical) in the Reading Examples section.
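As a rough idea of what such a Reading Examples entry might look like, here is a sketch that reads one member into memory with `TarFile.extractfile()`, which also works for compressed archives. The function names, the member name `notes.txt`, and the sample contents are placeholders, not proposed documentation wording.

```python
import io
import tarfile


def make_sample_archive() -> bytes:
    """Build a small gzip-compressed archive in memory (placeholder data)."""
    payload = b"remember the milk\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="notes.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Return the contents of one archive member without writing to disk."""
    # Mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        member = tar.getmember(name)  # raises KeyError if the name is absent
        extracted = tar.extractfile(member)
        if extracted is None:
            # extractfile() returns None for members such as directories.
            raise ValueError(f"{name!r} is not a regular file or link")
        with extracted:
            return extracted.read()


data = read_member(make_sample_archive(), "notes.txt")
print(len(data), "bytes read into memory")
```

For an archive on disk, the same pattern should apply with `tarfile.open("myfile.tar")` in place of the in-memory buffer.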
I can propose a PR with these changes if you think that is useful.