gedcom7code/java-converter

Incomplete Draft

This is an incomplete, work-in-progress 5.5.1-to-7.0 converter. Some parts are ported directly from the C converter (such as the ANSEL Charset and date and age parsing) while others are built from the ground up. The hope is that having two somewhat-separate implementations will allow me to use the two to test one another, a hope that has already resulted in a few bug fixes in the C version.

Updating to new versions of GEDCOM

The file edu/virginia/ged5to7/GedcomDefinitions.java contains preprocessed copies of the TSV files from https://github.com/FamilySearch/GEDCOM/tree/main/extracted-files. When a new (minor or major) version of the spec is released, incorporate the updates to those files by running:

javac DownloadDefinitions.java
java DownloadDefinitions

The above will overwrite the file edu/virginia/ged5to7/GedcomDefinitions.java with an updated version.

DownloadDefinitions.java is otherwise unneeded, and should not be included in distributions of the ged5to7 package.

Current status

  • Detect character encodings, as documented in ELF Serialisation.
  • Convert to UTF-8
  • Normalize line whitespace, including stripping leading spaces
  • Remove CONC
  • Fix @ usage
  • Limit character set of cross-reference identifiers
  • Normalize case of tags
  • Convert DATE
    • replace date_phrase with PHRASE structure
    • replace calendar escapes with calendar tags
    • change BC and B.C. to BCE and remove if found in unsupported calendars
    • replace dual years with single years and PHRASEs
    • replace just-year dual years in unqualified date with BET/AND
  • Convert AGE
    • change age words to canonical forms (stillborn as 0y, child as < 8y, infant as < 1y) with PHRASEs
    • Normalize spacing in AGE payloads
    • add missing y
  • Change any illegal tag XYZ into _EXT_XYZ
    • or to _XYZ and add a SCHMA entry for it
    • leave unchanged under extensions
  • change SOUR with text payload into pointer to SOUR with NOTE
  • change OBJE with no payload to pointer to new OBJE record
  • change NOTE record, or NOTE with pointer payload, into SNOTE
    • use heuristic to change some pointer-NOTE to nested-NOTE instead of SNOTE
  • Convert LANG payloads to BCP 47 tags, using FHISO's mapping
  • Convert MEDI.FORM payloads to media types
  • Enumerated values
    • Normalize case
    • Convert user-text to PHRASEs
  • Convert FONE and ROMN to TRAN and their TYPEs to BCP-47 LANGs
  • tag renaming, including
    • EMAI, _EMAIL → EMAIL
    • FORM.TYPE → FORM.MEDI
    • (deferred) _SDATE → SDATE -- _SDATE is also used as "accessed at" date for web resources by some applications so this change is not universally correct
    • _UID → UID
    • _ASSO → ASSO
    • _CRE, _CREAT → CREA
    • _DATE → DATE
    • other?
  • ASSO.RELA → ASSO.ROLE (changing payload OTHER + PHRASE)
  • change RFN, RIN, and AFN to EXID
  • change _FSFTID, _APID to EXID
  • remove SUBN, HEAD.FILE, HEAD.CHAR
    • (deferred) HEAD.PLAC was originally on this list, but has been deferred to a later version
  • change FILE payloads into URLs
    • Windows-style \ becomes /
    • Windows drive letter C:\WINDOWS becomes file:///c:/WINDOWS
    • POSIX-style /User/foo becomes file:///User/foo
  • update the GEDC.VERS to 7.0
  • (extra) change string-valued INDI.ALIA into NAME with TYPE AKA
  • (5.5) change base64-encoded OBJE into GEDZIP
  • add SCHMA for all used known extensions
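
As an illustration of the FILE payload rules listed above, here is a minimal sketch in Java. It is not the converter's actual code: the class name FilePayloadSketch and the method toUrl are hypothetical, and the handling of anything other than the two documented cases (for example relative paths, or characters that would need percent-encoding) is an assumption.

```java
public class FilePayloadSketch {
    /**
     * Hypothetical helper: turns a 5.5.1 FILE payload into a URL-like string,
     * following only the three rules listed in the status section.
     */
    public static String toUrl(String payload) {
        // Windows-style \ becomes /
        String s = payload.replace('\\', '/');

        // Windows drive letter: C:/WINDOWS becomes file:///c:/WINDOWS
        if (s.length() >= 3 && isAsciiLetter(s.charAt(0))
                && s.charAt(1) == ':' && s.charAt(2) == '/') {
            return "file:///" + Character.toLowerCase(s.charAt(0)) + s.substring(1);
        }

        // POSIX-style absolute path: /User/foo becomes file:///User/foo
        if (s.startsWith("/")) {
            return "file://" + s;
        }

        // Assumption: everything else is left as-is apart from the slash
        // change; no percent-encoding is attempted in this sketch.
        return s;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        System.out.println(toUrl("C:\\WINDOWS")); // file:///c:/WINDOWS
        System.out.println(toUrl("/User/foo"));   // file:///User/foo
    }
}
```

The drive letter is lowercased only because the example in the list shows c: in the result. Whether the converter does this in every case is not stated here.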
