- FAIRtracks - metadata standard for genomic tracks
FAIRtracks is a set of JSON Schemas developed through the ELIXIR implementation study: "FAIRification of Genomic Tracks", as a minimal standard for genomic track metadata. For more information on the implementation study, please check out:
FAIRtracks v1.0.2
- The FAIRtracks standard consists of a main JSON Schema and a set of sub-schemas. A JSON document of track metadata must validate against the main FAIRtracks JSON Schema to be said to follow the standard.
- The main FAIRtracks JSON Schema is simply named `fairtracks.schema.json` and is documented here:

  | Title | JSON Schema | Schema documentation | Example JSON document |
  |---|---|---|---|
  | FAIRtracks JSON Schema | fairtracks.schema.json | fairtracks.md | fairtracks.example.json |

- This top-level FAIRtracks JSON Schema contains, in addition to some general metadata fields, four arrays of JSON sub-documents for the four main object types in FAIRtracks: `studies`, `experiments`, `samples`, and `tracks`. Each of these object types is described in a separate sub-schema:

  | Title | JSON Schema | Schema documentation | Example JSON document |
  |---|---|---|---|
  | Study | fairtracks_study.schema.json | fairtracks_study.md | fairtracks_study.example.json |
  | Experiment | fairtracks_experiment.schema.json | fairtracks_experiment.md | fairtracks_experiment.example.json |
  | Sample | fairtracks_sample.schema.json | fairtracks_sample.md | fairtracks_sample.example.json |
  | Track | fairtracks_track.schema.json | fairtracks_track.md | fairtracks_track.example.json |

- FAIRtracks also contains the following convenience sub-schemas:

  | Title | JSON Schema | Schema documentation | Example JSON document |
  |---|---|---|---|
  | Phenotype | fairtracks_phenotype.schema.json | fairtracks_phenotype.md | fairtracks_phenotype.example.json |
  | Contact | fairtracks_contact.schema.json | fairtracks_contact.md | fairtracks_contact.example.json |
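
As a rough illustration of the top-level layout described above, the following Python sketch checks that a document carries the four main arrays. It is only a structural sanity check under the assumption that all four arrays are expected to be present; it is not a substitute for validating against `fairtracks.schema.json` with a proper JSON Schema validator. The helper name `missing_main_arrays` and the abbreviated document are hypothetical.

```python
import json

# The four main object types named by the standard.
MAIN_ARRAYS = ("studies", "experiments", "samples", "tracks")


def missing_main_arrays(document):
    """Return the names of the main arrays that are absent or not lists."""
    return [name for name in MAIN_ARRAYS
            if not isinstance(document.get(name), list)]


# Hypothetical, heavily abbreviated document; real documents carry many more fields.
doc = json.loads('{"studies": [], "experiments": [], "samples": [], "tracks": {}}')
print(missing_main_arrays(doc))  # "tracks" is reported because it is not an array
```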
- Linux-like shell with "bash". Mac OS X will do, but you probably need to install either Xcode (from the App Store) or the Xcode Command Line Tools.
- Python >= 3.6
- Node.js >= v10 and npm >= 3.10.8
- git (relatively recent version is probably best)
- On Mac OS X, all the above can be installed using Homebrew.
- An OPML editor is also recommended, but not required. See OPML editors below for more information.
- Create a personal fork in GitHub ("Fork" button).
- Clone the fork to your computer (e.g., `git clone https://github.com/myusername/fairtracks_standard.git`).
- Run `make raw`, and edit the raw OPML files to your liking. For more information about the `make` targets, see below.
- Run `make` or `make all`.
- Repeat the editing and `make` steps until you are satisfied with the changes.
- Run `make rawclean` to remove the raw OPML files before committing.
- Commit and push your changes to a feature branch in your personal fork and create a pull request, as described in the standard GitHub Flow workflow.
- Once the pull request is accepted:
  - Pull the latest changes in the `master` branch to your local repo.
  - Rebase your feature branch on top of `master`.
  - Make sure that all commits are consistently built. The automatically installed `git-hooks` will also check for consistency. To make a commit consistent, rebuild it with the `rebuild_all.sh` script. To clean up previous commits, use interactive rebase as described under 1b. make git-hooks below.
- Force push your feature branch to your personal fork, which should update the pull request, and notify us.
There is an inherent order to the different types of files in this repo,
defined in the Makefile. The FAIRtracks standard is almost fully defined in
the OPML files found under json/overview, with just a small bit of top-level
logic being handled by opml_to_json.py. All
the JSON Schema and JSON example files are automatically generated based upon
the OPML files. Such automatic file generation is handled by various make
targets:
These make targets are run automatically if needed by the other make
targets, but are also available for manual use if there is need.
a. make venv
- Autogenerates a Python virtual environment in the `.venv` directory, if not already present. In case the Python executable you want to link up to the virtual environment is located in a non-standard path, you can set the environment variable `PYTHON_EXE` before the first `make venv` command. For instance: `PYTHON_EXE=/path/to/my/python3 make .venv`
b. make git-hooks
- Installs the version-controlled git hooks into the local repo. The git hooks make sure that:
  - All changed files are committed together
  - All secondary files have been recompiled with `make`

  The checks are run before git commits or remote pushes are finalized.

  It is especially important that the git hooks are installed before merging or rebasing is done, as the SHA256 signatures of the JSON files may then need to be recalculated (by `make`) on merged/rebased commits. To fix such issues (which will appear when trying to push to GitHub) one will need to carry out an interactive rebase:
  - (i) Start interactive rebase: `git rebase -i $FIRST_COMMIT^`, where `$FIRST_COMMIT` is the first commit that needs editing (you can find this in the log messages from the failed remote push).
  - (ii) In the editor that appears, replace `pick` with `edit` for the commits that need editing. You should also at this point plan to clean up your commits by reordering or squashing them, as well as improving the commit messages.
  - (iii) `./rebuild_all.sh`
  - (iv) For all changed files: `git add $FILE`
  - (v) `git commit --amend`
  - (vi) `git rebase --continue`
  - (vii) Repeat iii-vi for all commits selected for editing.
c. make jsonschema2md
- Installs the node package "jsonschema2md" which is used to generate the JSON Schema documentation. The package is installed under "node_modules", together with all its dependencies.
The following process should be followed when changing the contents of the
FAIRtracks standard itself:
a. make raw
- This makes copies of the existing `*.opml` files into similarly named `*.raw.opml` files. The raw OPML files are made to be opened for editing in specialized outlining tools. As such tools vary in the exact content of the exported OPML files, the raw OPML files need to be compiled into standardized, cleaned-up versions before they are committed to git.
- You only need to run `make raw` once. If you accidentally run the command twice, any existing raw OPML files will be renamed to `*.raw.opml.old`.
- The raw OPML files are ignored by git and can be edited in an OPML editor of choice. See OPML file format below for more information.
- Be sure to delete the raw OPML files (with `make rawclean`) before carrying out any git commands. This is important, as e.g. changing branches will not change the raw OPML files, since they are ignored by git. Thus, if one fails to remove the raw OPML files before switching commits, `make` will just regenerate the previous commit on top of the new one.
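
The copy-and-rename behavior described above can be sketched in Python as follows. This only mimics the description; the actual logic lives in the Makefile and may differ in detail. The function name `make_raw_copies` and the temporary test file are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def make_raw_copies(directory):
    """Copy each *.opml file to *.raw.opml, keeping an earlier raw file as *.raw.opml.old."""
    created = []
    for opml in sorted(Path(directory).glob("*.opml")):
        if opml.name.endswith(".raw.opml"):
            continue  # raw files are not copied again
        raw = opml.with_name(opml.stem + ".raw.opml")
        if raw.exists():
            # A second run moves the existing raw file aside instead of overwriting it.
            raw.replace(raw.with_name(raw.name + ".old"))
        shutil.copyfile(opml, raw)
        created.append(raw.name)
    return created


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fairtracks.opml").write_text("<opml/>")
    make_raw_copies(tmp)
    make_raw_copies(tmp)  # accidental second run
    names = sorted(p.name for p in Path(tmp).iterdir())
    print(names)
```

After the second run the directory holds the original file, a fresh raw copy, and the earlier raw copy with the `.old` suffix.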
b. make or make all
- After the raw OPML files have been edited, `make` runs:
  - `make opml` to generate cleaned-up, standardized versions of the raw OPML files.
  - `make json` to generate JSON Schema files and related example JSON files from the cleaned-up OPML files.
  - `make docs` to generate Markdown documentation files under the `docs` directory.
All the generated JSON Schema files, as well as the top-level JSON example file, include a stable SHA256 signature of their contents.
a. make signature
- Computes and prints the stable SHA256 signature for all the JSON files.
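
To illustrate what a "stable" signature could mean, here is a minimal Python sketch that hashes a canonical serialization of a JSON document, so that key order and a previously stored signature do not affect the result. How the repository actually computes its signatures is not described here; the canonicalization rules and the `signature` key name below are assumptions made for illustration.

```python
import hashlib
import json


def stable_signature(document, signature_key="signature"):
    """Hash a canonical serialization of the document, ignoring any stored signature."""
    # Assumption: the signature is stored under a top-level key that must be excluded.
    payload = {k: v for k, v in document.items() if k != signature_key}
    # Sorted keys and fixed separators make the serialization independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"title": "Example", "version": "v1.0.2"}
b = {"version": "v1.0.2", "title": "Example", "signature": "stale-value"}
print(stable_signature(a) == stable_signature(b))
```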
b. make rawclean
- Removes all raw OPML and related .old files.
- Should only be run if you are sure that all changes in the raw OPML files have propagated to other files, i.e. you should make sure that you have run `make` first.
- Raw OPML files must be removed prior to running any `git` command, as explained above, section 2a.
c. make clean
- Runs `make rawclean`, in addition to removing the virtual environment in the `.venv` directory, the git hooks, and the `node_modules` directory.
OPML is a standard file format defined specifically for outlining software.
Raw OPML files can be edited with dedicated outlining tools, but as the format is a subtype of XML, one can also use generic XML editors:
- On Mac OS, we recommend using the commercial tool OmniOutliner, as there are really no open source alternatives with a similar user interface.
- As an open source, platform-agnostic alternative, we recommend TreeLine.
- The OPML files can of course also be edited manually, in which case you can ignore the raw OPML files completely.
- Each `<outline>` tag defines a JSON property, with the hierarchy defined by the XML hierarchy.
- The details for each JSON property are defined by a set of possible attributes for each tag. Many of the standard JSON Schema keywords are directly supported:

  | Attribute | Description |
  |---|---|
  | `_name` | The name of the JSON property. |
  | `const` | Constant value (the only value allowed). |
  | `default` | Default value if no value is provided. |
  | `description` | Human-readable description of the property. |
  | `enum` | Set of values allowed, separated by `\|`. |
  | `examples` | Set of example values, separated by `\|`. All properties must have the same number of examples (or none) within each JSON Schema. |
  | `format` | Format of current string property. Supports all of the standard JSON Schema formats, and in addition we support two custom formats: "curie" and "term", for respectively Identifiers.org-resolvable CURIEs and ontology terms. |
  | `minItems` | Minimal number of items in current array property. |
  | `pattern` | Regexp format for current string property. |
  | `ref` | JSON Pointer to another JSON Schema to import under the property. |
  | `required` | If `"true"` the current property is required. |
  | `title` | Title of the JSON Schema. |
  | `type` | Data type of the current property: string, object, array, number, or boolean. |

  In addition to the standard JSON keywords detailed above, a set of extended attributes has been defined:

  | Attribute | Description |
  |---|---|
  | `ancestors` | Ontology labels, separated by `\|`, used to validate properties in `term` format. At least one of these terms must be an ancestor of the value in one of the specified ontologies. |
  | `autogenerated` | If true, the contents of the current property will be filled automatically by the FAIRtracks autogenerate service (to be implemented later). |
  | `comments` | Comments that will remain in the OPML files only. |
  | `constIf` | If the specified `if_property` has the specified `if_value`, the current property must follow the specified `then_value`, interpreted as `const`. |
  | `foreignProperty` | JSON Pointer to a linked identifier property in another schema. Two JSON documents, one following the current schema and the other following the foreign schema, are related if the values in the two linked properties are the same. |
  | `matchType` | Validation rule. For properties in `curie` format: either `basic`, `loose`, or `canonical`. For properties in `term` format: either `exact`, `suffix`, or `label`. |
  | `namespace` | Namespaces, separated by `\|`, registered in http://identifiers.org. Used to validate `curie` values. |
  | `ontology` | URLs to downloadable ontologies in OWL format, separated by `\|`. To be used to validate properties in `term` format, which is used for ontology `term_id` properties. |
  | `ontologyTermPair` | Pair of JSON Pointers in the format `id=IDPTR;label=LABELPTR`, where `IDPTR` and `LABELPTR` are JSON Pointers to, respectively, an ontology term id and its corresponding (primary) label. Currently only pointers to child properties are supported, e.g. `id=0/term_id;label=0/term_label`. To be used in autogeneration and validation. |
  | `requireAnyOf` | For every level of the object hierarchy, at least one of the properties with `requireAnyOf="true"` at that level is `required`. |
  | `requireIf` | If the specified `if_property` has the specified `if_value`, the current property is `required`. |
  | `unique` | If `"true"` the value of the current property must be unique across all JSON documents. |

  For more information, please visit the FAIRtracks validator GitHub repository (see VALIDATION.md for directions).
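
To make the mapping from `<outline>` tags to JSON properties more concrete, here is a small Python sketch that converts a hypothetical OPML fragment into a nested JSON Schema fragment. It covers only a handful of the standard attributes (`_name`, `type`, `description`, `enum`, `required`) and is not the logic of `opml_to_json.py`, which may behave differently; the sample OPML content and the function name `outline_to_schema` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML fragment using attribute names from the table above.
OPML = """
<opml version="2.0"><body>
  <outline _name="sample" type="object">
    <outline _name="organism" type="string" required="true"
             description="Hypothetical example property" enum="human|mouse"/>
  </outline>
</body></opml>
"""


def outline_to_schema(node):
    """Translate one <outline> element (and its children) into a schema fragment."""
    schema = {}
    for key in ("type", "description"):
        if key in node.attrib:
            schema[key] = node.attrib[key]
    if "enum" in node.attrib:
        # Allowed values are separated by "|" in the OPML attribute.
        schema["enum"] = node.attrib["enum"].split("|")
    children = node.findall("outline")
    if children:
        # The XML hierarchy becomes the JSON property hierarchy.
        schema["properties"] = {c.attrib["_name"]: outline_to_schema(c)
                                for c in children}
        required = [c.attrib["_name"] for c in children
                    if c.attrib.get("required") == "true"]
        if required:
            schema["required"] = required
    return schema


top = ET.fromstring(OPML).find("body/outline")
schema = {top.attrib["_name"]: outline_to_schema(top)}
print(schema)
```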
- The `constIf` and `requireIf` attributes require the value to follow a specific pattern:

  | Pattern part | Description | Attribute(s) | Obligatory | Example |
  |---|---|---|---|---|
  | `if_property=` | Relative JSON Pointer to property to check | `constIf`, `requireIf` | Yes | `2/technique/term_id=` |
  | `if_value` | Value to check for | `constIf`, `requireIf` | Yes | `http://purl.obolibrary.org/obo/OBI_0001853` |
  | `;` | If-then delimiter | `constIf` | Yes | `;` |
  | `then_property=` | Relative JSON Pointer to property to acquire `const` value | `constIf` | No | `1/term_id=` |
  | `then_value` | `const` value for `then_property` | `constIf` | Yes | `http://purl.obolibrary.org/obo/SO_0000685` |
  | `\|` | Pattern delimiter (between patterns if more than one) | `constIf`, `requireIf` | No | |

- In order to support multiple OPML editors, the first `<outline>` tag in the OPML files (the one with `_text="#title"`) should contain all properties in alphabetical order, with an attached value (typically `"."` or `"0"`). These parameters are ignored for that line (as it is just used to generate the `title` of the JSON Schema).
- When adding, removing, or renaming attributes:
  - Please update the first `<outline>` tag (with `_text="#title"`) for all OPML files, as described directly above.
  - In most cases, new attributes should also be added to the `ATTRIBS_TO_IMPORT` constant in the opml_to_json.py script, in the order in which they should appear in the generated JSON Schemas.
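
The `constIf` and `requireIf` value pattern described above can be split into its parts with a few lines of Python. This is a minimal sketch under the assumption that values never contain the delimiter characters `=`, `;` or `|` themselves; the function name `parse_conditions` and the returned dictionary layout are hypothetical and do not reflect how the FAIRtracks tooling parses these attributes.

```python
def parse_conditions(value):
    """Split a constIf/requireIf attribute value into a list of condition dicts."""
    conditions = []
    for pattern in value.split("|"):  # "|" separates multiple patterns
        if_part, _, then_part = pattern.partition(";")  # ";" is the if-then delimiter
        if_property, _, if_value = if_part.partition("=")
        condition = {"if_property": if_property, "if_value": if_value}
        if then_part:  # only constIf patterns carry a then-part
            then_property, sep, then_value = then_part.partition("=")
            if not sep:
                # No "then_property=" prefix: the whole part is the then_value.
                then_property, then_value = None, then_part
            condition.update(then_property=then_property, then_value=then_value)
        conditions.append(condition)
    return conditions


# Pattern assembled from the example column of the table.
const_if = ("2/technique/term_id=http://purl.obolibrary.org/obo/OBI_0001853"
            ";1/term_id=http://purl.obolibrary.org/obo/SO_0000685")
parsed = parse_conditions(const_if)
print(parsed)
```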
- Please visit the VALIDATION.md document.
