This is an open-source Java implementation of a GEDCOM 5.5.1 to GEDCOM 7.0 converter. It is a fork of java-converter, which its author Luther Tychonievich released into the public domain.
The aim of this fork is to publish a somewhat polished, Maven-buildable version. I will also try to implement the missing functionality listed below.
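
Build with Maven:

```sh
mvn package
```

Convert a GEDCOM 5.5.1 file and print the GEDCOM 7 result to standard output, or redirect it into a file:

```sh
java -jar target/gedcom-5to7-1.0.2.jar data/gedcom551.ged
java -jar target/gedcom-5to7-1.0.2.jar data/gedcom551.ged > data/gedcom7.ged
```

Add the repository and the dependency to your application's `pom.xml`: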
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://maven.apache.org/POM/4.0.0"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    ...
    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>
    <dependencies>
        ...
        <dependency>
            <groupId>com.github.cbettinger</groupId>
            <artifactId>gedcom-5to7</artifactId>
            <version>1.0.2</version>
        </dependency>
    </dependencies>
</project>
```

Parse a GEDCOM 5.5.1 file and write a GEDCOM 7 file:
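```java
import bettinger.gedcom5to7.Converter;
import bettinger.gedcom5to7.Converter.ConvertException;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
...
// `source` (the GEDCOM 5.5.1 input) and `target` (the output file) are defined elsewhere
try (final OutputStream output = new FileOutputStream(target)) {
    // parse the GEDCOM 5.5.1 input and convert it to GEDCOM 7
    final Converter converter = Converter.parse(source);
    // serialize the converted GEDCOM 7 data to the output stream
    converter.write(output);
} catch (final ConvertException e) {
    System.err.println(e);
} catch (final IOException e) {
    System.err.println(e);
}
```

This implements all of the major pieces of a 5.5.1-to-7.0 converter. Some tests were performed during development, but not enough to be confident that it is free of bugs.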

Not yet implemented:

- change string-valued `INDI.ALIA` into `NAME` with `TYPE AKA`
- move base64-encoded `OBJE` into a GEDZIP file
- add `SCHMA` for all used known extensions
- add URIs (or standard tags) for all extensions from https://wiki-de.genealogy.net/GEDCOM/_Nutzerdef-Tag and http://www.gencom.org.nz/GEDCOM_tags.html

Implemented:

- Detect character encodings, as documented in ELF Serialisation
- Convert to UTF-8
- Normalize line whitespace, including stripping leading spaces
- Remove `CONC`
- Fix `@` usage
- Limit the character set of cross-reference identifiers
- Normalize the case of tags
- Convert `DATE`
  - replace date_phrase with a `PHRASE` structure
  - replace calendar escapes with calendar tags
  - change `BC` and `B.C.` to `BCE`, and remove them if found in unsupported calendars
  - replace dual years with single years and `PHRASE`s
  - replace just-year dual years in unqualified dates with `BET`/`AND`
- Convert `AGE`
  - change age words to canonical forms (stillborn as `0y`, child as `< 8y`, infant as `< 1y`) with `PHRASE`s
  - normalize spacing in `AGE` payloads
  - add missing `y`
- change `SOUR` with a text payload into a pointer to a `SOUR` record with a `NOTE`
- change `OBJE` with no payload into a pointer to a new `OBJE` record
- change `NOTE` records, and `NOTE`s with a pointer payload, into `SNOTE`
  - use a heuristic to change some pointer-`NOTE`s into nested `NOTE`s instead of `SNOTE`
- Convert `LANG` payloads to BCP 47 tags, using FHISO's mapping
- tag renaming, including
  - `EMAI`, `_EMAIL` → `EMAIL`
  - `FORM.TYPE` → `FORM.MEDI`
  - (deferred) `_SDATE` → `SDATE`: some applications also use `_SDATE` as an "accessed at" date for web resources, so this change is not universally correct
  - `_UID` → `UID`
  - `_ASSO` → `ASSO`
  - `_CRE`, `_CREAT` → `CREA`
  - `_DATE` → `DATE`
  - `ASSO.RELA` → `ASSO.ROLE`
  - other?
- Enumerated values
  - normalize case
  - convert user text to `PHRASE`s
- change `RFN`, `RIN`, and `AFN` to `EXID`
- change `_FSFTID`, `_APID` to `EXID`
- Convert `MEDI.FORM` payloads to media types
- Convert `FONE` and `ROMN` to `TRAN`, and their `TYPE`s to BCP 47 `LANG`s
- change `FILE` payloads into URLs
  - Windows-style `\` becomes `/`
  - Windows drive letter `C:\WINDOWS` becomes `file:///c:/WINDOWS`
  - POSIX-style `/User/foo` becomes `file:///User/foo`
- remove `SUBN`, `HEAD.FILE`, `HEAD.CHAR`
- update `GEDC.VERS` to `7.0`
- change any illegal tag `XYZ` into `_EXT_XYZ`
  - or to `_XYZ` and add a `SCHMA` entry for it
  - leave unchanged under extensions
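
The `CONC` removal listed above amounts to folding each `CONC` line into the line before it, because `CONC` continues a payload without a line break while `CONT` (still valid in GEDCOM 7) keeps one. The following is a minimal, line-based sketch under that reading; `ConcMerger` is a hypothetical helper, not part of the gedcom-5to7 API, and it assumes the input lines have already been decoded and whitespace-normalized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of gedcom-5to7: folds CONC lines into the preceding line. */
public final class ConcMerger {
    // "<level> CONC[ <payload>]"; exactly one space separates the tag from the payload
    private static final Pattern CONC = Pattern.compile("^\\s*\\d+ CONC(?: (.*))?$");
    // A line that already has a payload: level, optional @XREF@, tag, space, payload
    private static final Pattern HAS_PAYLOAD =
            Pattern.compile("^\\s*\\d+ (?:@[^@]+@ )?[^@ ][^ ]* .*$");

    private ConcMerger() {}

    public static List<String> merge(final List<String> lines) {
        final List<String> out = new ArrayList<>();
        for (final String line : lines) {
            final Matcher conc = CONC.matcher(line);
            if (!conc.matches() || out.isEmpty()) {
                out.add(line); // ordinary line; CONT lines are kept as they are
                continue;
            }
            final String payload = conc.group(1) == null ? "" : conc.group(1);
            final int last = out.size() - 1;
            final String previous = out.get(last);
            // CONC joins without a line break; add the tag/payload separator if still missing
            final String separator =
                    (payload.isEmpty() || HAS_PAYLOAD.matcher(previous).matches()) ? "" : " ";
            out.set(last, previous + separator + payload);
        }
        return out;
    }

    public static void main(final String[] args) {
        merge(List.of("1 NOTE This is a lo", "2 CONC ng note", "2 CONT second line"))
                .forEach(System.out::println);
        // 1 NOTE This is a long note
        // 2 CONT second line
    }
}
```

Because a `CONC` is merged into whatever line precedes it, a `CONC` that follows a `CONT` correctly extends that `CONT` line rather than the original payload.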
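
The two dual-year rules under `DATE` in the list above can be sketched as follows: a date that is nothing but a dual year such as `1699/00` becomes a `BET`/`AND` range, and any other date keeps a single year, with the original text preserved for a `PHRASE`. This is a sketch under stated assumptions, not the converter's code. `DualYears` is hypothetical, keeping the later ("new style") year is an assumption, and it uses a Java 16+ record.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of gedcom-5to7: rewrites 5.5.1 dual years such as "1699/00". */
public final class DualYears {
    /** The converted DATE payload and the PHRASE preserving the original (null if unchanged). */
    public record Converted(String date, String phrase) {}

    // A year followed by a slash and its (possibly abbreviated) alternative, e.g. "1699/00"
    private static final Pattern DUAL = Pattern.compile("\\b(\\d{3,4})/(\\d{2,4})\\b");

    private DualYears() {}

    public static Converted convert(final String payload) {
        final Matcher m = DUAL.matcher(payload);
        if (!m.find()) {
            return new Converted(payload, null); // no dual year: leave the payload alone
        }
        final int first = Integer.parseInt(m.group(1));
        final int second = alternateYear(first, m.group(2));
        if (payload.trim().equals(m.group())) {
            // Just-year dual date without a qualifier: keep both candidates as a range
            return new Converted("BET " + first + " AND " + second, payload);
        }
        // Otherwise keep one year; choosing the later ("new style") one is an assumption
        final String date = payload.substring(0, m.start()) + second + payload.substring(m.end());
        return new Converted(date, payload);
    }

    // Expands the abbreviated half: "1699/00" -> 1700, "1750/51" -> 1751, "1699/1700" -> 1700
    private static int alternateYear(final int first, final String suffix) {
        if (suffix.length() >= String.valueOf(first).length()) {
            return Integer.parseInt(suffix); // already written out in full
        }
        int modulus = 1;
        for (int i = 0; i < suffix.length(); i++) {
            modulus *= 10;
        }
        int candidate = first - first % modulus + Integer.parseInt(suffix);
        if (candidate <= first) {
            candidate += modulus; // the suffix rolled over, e.g. into the next century
        }
        return candidate;
    }

    public static void main(final String[] args) {
        System.out.println(convert("1699/00").date());        // BET 1699 AND 1700
        System.out.println(convert("12 FEB 1699/00").date()); // 12 FEB 1700
    }
}
```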
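
The `AGE` rules in the list above (canonical age words, spacing, missing `y`) can be illustrated with a few regular expressions. `AgeNormalizer` is a hypothetical helper, not the converter's implementation. It assumes that a bare number means years and that the 5.5.1 units are `y`, `m` and `d`, and it uses a Java 16+ record.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of gedcom-5to7: normalizes a 5.5.1 AGE payload. */
public final class AgeNormalizer {
    /** The GEDCOM 7 AGE payload and an optional PHRASE keeping the original word. */
    public record Age(String value, String phrase) {}

    // Canonical forms for the 5.5.1 age words, as listed in this README
    private static final Map<String, String> WORDS =
            Map.of("STILLBORN", "0y", "CHILD", "< 8y", "INFANT", "< 1y");

    private AgeNormalizer() {}

    public static Age convert(final String payload) {
        final String trimmed = payload.trim();
        final String canonical = WORDS.get(trimmed.toUpperCase(Locale.ROOT));
        if (canonical != null) {
            return new Age(canonical, trimmed); // the original word survives as PHRASE
        }
        String value = trimmed
                .replaceAll("^([<>])\\s*", "$1 ")    // exactly one space after '<' or '>'
                .replaceAll("([ymd])(?=\\d)", "$1 ") // split run-together parts: "1y2m" -> "1y 2m"
                .replaceAll("\\s+", " ");            // collapse remaining whitespace runs
        // A bare number such as "42" or "< 42" is taken to be a number of years
        value = value.replaceAll("^((?:[<>] )?)(\\d+)$", "$1$2y");
        return new Age(value, null);
    }

    public static void main(final String[] args) {
        System.out.println(convert("INFANT").value()); // < 1y
        System.out.println(convert("<8y").value());    // < 8y
        System.out.println(convert("1y2m").value());   // 1y 2m
        System.out.println(convert("42").value());     // 42y
    }
}
```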
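
The tag renaming and illegal-tag rules listed above could be organized as two lookup tables (one context-free, one keyed by parent tag) plus a fallback that prefixes unknown tags with `_EXT_`. This is a hypothetical sketch: `TagRenamer` is not the library's API, and `STANDARD` is a tiny stand-in for the full GEDCOM 7 tag list, which the real converter takes from its definition files. Following the list, the `insideExtension` flag only affects the illegal-tag rule.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of gedcom-5to7: applies the tag rules listed above. */
public final class TagRenamer {
    // Context-free renames from the list above
    private static final Map<String, String> RENAMES = Map.of(
            "EMAI", "EMAIL", "_EMAIL", "EMAIL", "_UID", "UID", "_ASSO", "ASSO",
            "_CRE", "CREA", "_CREAT", "CREA", "_DATE", "DATE");
    // Renames that depend on the parent tag, keyed "PARENT.CHILD"
    private static final Map<String, String> BY_PARENT = Map.of(
            "FORM.TYPE", "MEDI", "ASSO.RELA", "ROLE");
    // Tiny, hypothetical stand-in for the full list of GEDCOM 7 standard tags
    private static final Set<String> STANDARD = Set.of(
            "INDI", "FAM", "NAME", "BIRT", "DATE", "PLAC", "NOTE", "SOUR",
            "EMAIL", "UID", "ASSO", "ROLE", "CREA", "FORM", "MEDI");

    private TagRenamer() {}

    /**
     * @param parent          the already converted parent tag, or null for a record
     * @param tag             the tag to convert
     * @param insideExtension true if some ancestor is an extension structure
     */
    public static String convert(final String parent, final String tag, final boolean insideExtension) {
        if (parent != null) {
            final String contextual = BY_PARENT.get(parent + "." + tag);
            if (contextual != null) {
                return contextual;
            }
        }
        final String renamed = RENAMES.get(tag);
        if (renamed != null) {
            return renamed;
        }
        if (tag.startsWith("_") || insideExtension || STANDARD.contains(tag)) {
            return tag; // extension tags, tags under an extension, and known tags stay as-is
        }
        return "_EXT_" + tag; // unknown non-extension tag
    }

    public static void main(final String[] args) {
        System.out.println(convert("INDI", "_EMAIL", false)); // EMAIL
        System.out.println(convert("FORM", "TYPE", false));   // MEDI
        System.out.println(convert("INDI", "XYZ", false));    // _EXT_XYZ
        System.out.println(convert("_FOO", "XYZ", true));     // XYZ
    }
}
```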
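
The three `FILE` rules listed above (backslashes, drive letters, POSIX paths) could look roughly like this. `FilePaths.toUrl` is a hypothetical helper, not part of the library. It assumes that payloads already starting with a URI scheme are left alone and that relative paths stay relative. It also does not percent-encode spaces or other reserved characters, which a complete conversion would likely need.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of gedcom-5to7: turns a FILE payload into a URL. */
public final class FilePaths {
    // A drive letter, a colon and a separator, e.g. "C:\" or "c:/"
    private static final Pattern DRIVE = Pattern.compile("^([A-Za-z]):[\\\\/](.*)$");
    // Already a URL with a scheme of two or more characters, e.g. "http:" or "file:"
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]+:.*");

    private FilePaths() {}

    public static String toUrl(final String payload) {
        final Matcher drive = DRIVE.matcher(payload);
        if (drive.matches()) {
            // Windows drive path: lower-case the drive letter and use forward slashes
            final String rest = drive.group(2).replace('\\', '/');
            return "file:///" + drive.group(1).toLowerCase(Locale.ROOT) + ":/" + rest;
        }
        if (SCHEME.matcher(payload).matches()) {
            return payload; // already a URL: leave it untouched
        }
        final String path = payload.replace('\\', '/'); // Windows-style separators become '/'
        if (path.startsWith("/")) {
            return "file://" + path; // absolute POSIX path -> file:///...
        }
        return path; // relative path: kept as a relative URL reference
    }

    public static void main(final String[] args) {
        System.out.println(toUrl("C:\\WINDOWS"));      // file:///c:/WINDOWS
        System.out.println(toUrl("/User/foo"));        // file:///User/foo
        System.out.println(toUrl("media\\photo.jpg")); // media/photo.jpg
    }
}
```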

The folder `src/main/resources` contains copies of the TSV definition files from https://github.com/FamilySearch/GEDCOM/ and https://github.com/fhiso/legacy-format/, plus the IANA language subtag registry from https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry. All of them are used at runtime.

They can be updated by running the following from the project's root directory:

```sh
javac DownloadDefinitions.java
java DownloadDefinitions
```

`DownloadDefinitions.java` is not needed otherwise and should not be included in distributions of the gedcom-5to7 package.