Allow parsing files with UTF-8 BOM #5

@jbvsmo

I don't know what the GEDCOM 5.5 specification says about this, but for the sake of simplicity, and because most text editors nowadays add one by default, this code should detect and ignore a UTF-8 BOM at the start of the file.

It is very hard to work out why loading failed, because the only message is `Line 1 of document violates GEDCOM format 5.5` and nothing more. Because these bytes are meant to be ignored, you can't see the problem on line 1 unless you load the file in Python and print a representation of that line.

One option is to use the utf-8-sig codec instead.
https://docs.python.org/3/library/codecs.html#module-encodings.utf_8_sig
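
To illustrate the suggestion, here is a minimal sketch of how the `utf-8-sig` codec behaves. The helper name `decode_gedcom_bytes` and the sample GEDCOM lines are hypothetical and are not taken from this library's code; only `codecs.BOM_UTF8` and the `utf-8-sig` codec come from the Python standard library.

```python
import codecs


def decode_gedcom_bytes(raw: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes, dropping a leading BOM if present."""
    # "utf-8-sig" skips a leading EF BB BF on decode and otherwise
    # behaves like plain "utf-8", so BOM-less input is unaffected.
    return raw.decode("utf-8-sig")


# Made-up sample data: the same two lines with and without a BOM.
without_bom = b"0 HEAD\n1 CHAR UTF-8\n"
with_bom = codecs.BOM_UTF8 + without_bom

# Plain "utf-8" keeps the BOM as an invisible U+FEFF before the "0",
# which is presumably what makes line 1 fail format validation.
first_line_plain = with_bom.decode("utf-8").splitlines()[0]
print(repr(first_line_plain))  # '\ufeff0 HEAD'

# With "utf-8-sig" both inputs decode to the same text.
print(decode_gedcom_bytes(with_bom) == decode_gedcom_bytes(without_bom))  # True
```

If the parser reads the file in text mode, passing `encoding="utf-8-sig"` to the built-in `open()` should have the same effect. Whether that fits the way this library opens files is an assumption that would need checking against its code.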

Labels: duplicate, enhancement