AStorage-Java

Supported Formats:

Fasta
dbNSFP v4.3a
gnomAD v4
SpliceAI v1.3
PharmGKB
ClinVar
GTEx v8
GTF
GERP
dbSNP

Formats mapped in the universal variant query:

dbNSFP v4.3a
gnomAD v4
SpliceAI v1.3
ClinVar
GERP
dbSNP

Setup: Building and Running [Linux/macOS]

Clone the master branch and package the application as a JAR file:

git clone git@github.com:ForomePlatform/AStorage-Java.git
cd AStorage-Java
./mvnw clean package

The JAR file will be generated inside the target directory as astorage-java-1.0.0.jar.

On first run, the application creates a data folder named AStorage in the user's home directory unless configured otherwise.
The service listens on port 8080 unless configured otherwise.

Note	Both properties can be changed in a config.json file.

config.json example:

{
    "dataDirectoryPath": "/home/user/ExampleStorage",
    "serverPort": 3000
}

To start the application, run:

cd target
java -jar astorage-java-1.0.0.jar [config_json_path]
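
The configuration step can also be scripted. The following is a minimal sketch and not part of AStorage: write_config is a hypothetical helper, the directory and port in the example are illustrative, and only the two keys from the config.json example above are used.

```shell
# Hypothetical helper (not part of AStorage): writes a config.json containing
# the two keys from the example above.
# Usage: write_config <dataDirectoryPath> <serverPort> <output file>
# The path is inserted verbatim, so it must not contain quotes or backslashes.
write_config() {
    cat > "$3" <<EOF
{
    "dataDirectoryPath": "$1",
    "serverPort": $2
}
EOF
}

# Example (run from the target directory):
#   write_config "$HOME/ExampleStorage" 3000 ./config.json
#   java -jar astorage-java-1.0.0.jar ./config.json
```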

Note	AStorage logs (e.g. ingestion progress) are written to <dataDirectoryPath>/output_<currentTimeMillis>.log. Some output is also printed to the terminal where the program is running.
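
Because every run produces a new timestamped log file, it can help to locate the most recent one automatically. This is a sketch under the assumption that log names follow the output_<currentTimeMillis>.log pattern described above; latest_log is a hypothetical helper name.

```shell
# Hypothetical helper: prints the path of the newest AStorage log file in a
# data directory, based on the numeric timestamp embedded in the file name.
# Usage: latest_log <dataDirectoryPath>
latest_log() {
    # Extract the timestamps, sort them numerically, and keep the largest.
    _ts=$(ls "$1" | sed -n 's/^output_\([0-9][0-9]*\)\.log$/\1/p' | sort -n | tail -n 1)
    # Fail if the directory contains no matching log file.
    [ -n "$_ts" ] || return 1
    printf '%s/output_%s.log\n' "$1" "$_ts"
}

# Example: follow the most recent log while an ingestion is running.
#   tail -f "$(latest_log "$HOME/AStorage")"
```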

For the detailed API specification, open the Swagger UI at http://localhost:8080/api.

Setup: Ingestion

Important Note on Data Ingestion!

If a previous ingestion failed or encountered errors, drop the repository for the format being ingested with the provided Drop Repository API before ingesting again. This avoids overlaps, duplicates, and data inconsistencies.

Always ensure that any failed or corrupted repository is properly cleared before attempting another ingestion.

Note

The UniversalVariant repository stores normalized variants for supported formats when the normalize parameter is set to true during ingestion. For example, if an error occurs while ingesting ClinVar data, dropping the ClinVar repository using the provided API will automatically remove ClinVar-related data from the UniversalVariant repository. However, dropping the entire UniversalVariant repository will remove all normalized data across every format.
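
The drop-then-retry rule can be sketched as a small wrapper. This is not part of AStorage: ingest_with_cleanup is a hypothetical name, and the drop request is deliberately left to the caller because its exact form should be taken from the Drop Repository entry in the Swagger UI. The sketch also assumes the ingestion command reports failure through its exit status (for curl, that means adding --fail so HTTP errors are not silently ignored) and that the request returns only after ingestion has finished; neither is confirmed by this README.

```shell
# Hypothetical wrapper: runs an ingestion command; if it fails, runs the
# caller-supplied drop command and then retries the ingestion once.
# Usage: ingest_with_cleanup <ingest command string> <drop command string>
# Both arguments are evaluated by the shell, so pass only trusted strings.
ingest_with_cleanup() {
    if eval "$1"; then
        return 0
    fi
    echo "ingestion failed; dropping repository before retrying" >&2
    # Do not retry on top of a repository that could not be cleared.
    eval "$2" || return 1
    eval "$1"
}

# Example (drop_clinvar is your own function wrapping the Drop Repository call):
#   ingest_with_cleanup \
#     'curl --fail -X POST "http://localhost:8080/ingestion/clinvar?dataPath={dataPath}&dataSummaryPath={dataSummaryPath}&normalize=true"' \
#     'drop_clinvar'
```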

Fasta:

Download the reference genome: GRCh38.p14_genomic and its assembly report: GRCh38.p14_assembly_report, then run the ingestion:

curl -X POST "http://localhost:8080/ingestion/fasta?refBuild=GRCh38&dataPath={dataPath}&metadataPath={assemblyReportPath}"

API reference: Fasta Ingestion.
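
A request with a mistyped path only fails after it has been sent, so a pre-flight check can save a round trip. The sketch below assumes the script runs on the same machine as AStorage, so that checking the files locally is meaningful. fasta_ingest_url and the ASTORAGE_URL variable are hypothetical names, and paths containing spaces or & would additionally need URL encoding, which is not handled here.

```shell
# Hypothetical helper: verifies that both input files exist, then prints the
# Fasta ingestion URL for GRCh38.
# ASTORAGE_URL is an assumed variable; it defaults to the documented port 8080.
# Usage: fasta_ingest_url <dataPath> <assemblyReportPath>
fasta_ingest_url() {
    _base="${ASTORAGE_URL:-http://localhost:8080}"
    for _f in "$1" "$2"; do
        [ -f "$_f" ] || { echo "missing file: $_f" >&2; return 1; }
    done
    printf '%s/ingestion/fasta?refBuild=GRCh38&dataPath=%s&metadataPath=%s\n' "$_base" "$1" "$2"
}

# Example (file names are placeholders):
#   curl -X POST "$(fasta_ingest_url /data/genome.fna /data/assembly_report.txt)"
```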

dbNSFP:

Download the entire dbNSFP database: dbNSFP4.3a, extract the downloaded content, and run the ingestion for each chromosome variant file, one at a time:

curl -X POST "http://localhost:8080/ingestion/dbnsfp?dataPath={chrDataPath}"

API reference: dbNSFP Ingestion.
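
Ingesting the chromosome files one at a time can be scripted as a loop. This is a sketch, not an official tool: ingest_dbnsfp_dir, ASTORAGE_URL, and DRY_RUN are hypothetical names, and the *_variant.chr* pattern is an assumption about how the extracted files are named, so adjust it to match your download. It also assumes each request returns only after that file has been ingested.

```shell
# Hypothetical helper: sends one ingestion request per chromosome file in a
# directory, in glob order, and stops at the first failure.
# Set DRY_RUN=1 to print the requests instead of sending them.
# Usage: ingest_dbnsfp_dir <directory with extracted dbNSFP files>
ingest_dbnsfp_dir() {
    _base="${ASTORAGE_URL:-http://localhost:8080}"
    for _f in "$1"/*_variant.chr*; do
        # An unmatched glob stays literal, so this catches an empty directory.
        [ -e "$_f" ] || { echo "no chromosome files found in $1" >&2; return 1; }
        _url="$_base/ingestion/dbnsfp?dataPath=$_f"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'POST %s\n' "$_url"
        else
            # --fail makes curl exit non-zero on HTTP errors.
            curl --fail -X POST "$_url" || return 1
        fi
    done
}

# Example:
#   DRY_RUN=1; ingest_dbnsfp_dir /data/dbNSFP4.3a
```

If a file fails part-way through, the Important Note on Data Ingestion above still applies: drop the dbNSFP repository before running the loop again.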

gnomAD:

Download available exomes and genomes from: gnomAD v4 and ingest the downloaded files:

Note	If you set the normalize parameter to true, the Fasta GRCh38 reference must already be ingested into AStorage.

curl -X POST "http://localhost:8080/ingestion/gnomad?dataPath={dataPath}&sourceType={sourceType}&normalize=true&refBuild=GRCh38"

API reference: gnomAD Ingestion.

SpliceAI:

The SpliceAI annotations are available here: SpliceAI v1.3 (an Illumina account is required).

From the Projects tab of Illumina Sequence Hub, open the added project: Predicting splicing from primary sequence, then open genome_scores_v1.3, click FILES, and download spliceai_scores.raw.indel.hg38.vcf.gz and spliceai_scores.raw.snv.hg38.vcf.gz.

Run the ingestion for each data file:

Note	If you set the normalize parameter to true, the Fasta GRCh38 reference must already be ingested into AStorage.

curl -X POST "http://localhost:8080/ingestion/spliceai?dataPath={dataPath}&normalize=true&refBuild=GRCh38"

API reference: SpliceAI Ingestion.

PharmGKB:

Download the appropriate data files from: PharmGKB Downloads and ingest the downloaded files:

curl -X POST "http://localhost:8080/ingestion/pharmgkb?dataType={dataType}&dataPath={dataPath}"

API reference: PharmGKB Ingestion.

ClinVar:

Download the latest ClinVar release: ClinVarFullRelease_00-latest and its variant summary: variant_summary, then ingest the downloaded files:

Note	If you set the normalize parameter to true, the required Fasta reference genomes must already be ingested into AStorage.

curl -X POST "http://localhost:8080/ingestion/clinvar?dataPath={dataPath}&dataSummaryPath={dataSummaryPath}&normalize=true"

API reference: ClinVar Ingestion.

GTEx:

Download the GTEx v8 bulk tissue expression data: GTEx_Analysis_2017-06-05_v8 and ingest the downloaded file:

curl -X POST "http://localhost:8080/ingestion/gtex?dataPath={dataPath}"

API reference: GTEx Ingestion.

GTF:

Download the GRCh38 GTF data file: Homo_sapiens.GRCh38.111.chr and ingest the downloaded file:

curl -X POST "http://localhost:8080/ingestion/gtf?dataPath={dataPath}"

API reference: GTF Ingestion.

GERP:

Retrieve the necessary GERP rates files for each chromosome and ingest the downloaded files one by one:

curl -X POST "http://localhost:8080/ingestion/gerp?dataPath={dataPath}"

API reference: GERP Ingestion.

dbSNP:

Download the complete dbSNP data: 00-All and ingest the downloaded file:

curl -X POST "http://localhost:8080/ingestion/dbsnp?dataPath={dataPath}"

API reference: dbSNP Ingestion.

Additional Notes

Batch-query parameters match single-query parameters for every format.
To use the normalization service, the appropriate genome reference builds (e.g. GRCh38 and GRCh37) must be ingested into Fasta first.
Batch normalization uses the same approach as batch queries.
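
When several ingestions are scripted together, the Fasta-first requirement can be checked before anything is sent. The sketch below is a hypothetical lint over a list of ingestion URLs, one per line. It only looks at the order within that list, does not know what is already stored in AStorage, and does not compare reference builds.

```shell
# Hypothetical check: reads ingestion URLs (one per line) from stdin and fails
# if a request with normalize=true appears before any /ingestion/fasta request.
# Usage: check_ingestion_order < plan.txt
check_ingestion_order() {
    _fasta_seen=0
    while IFS= read -r _url; do
        case "$_url" in
            */ingestion/fasta\?*)
                _fasta_seen=1 ;;
            *normalize=true*)
                if [ "$_fasta_seen" = "0" ]; then
                    echo "normalize=true before Fasta ingestion: $_url" >&2
                    return 1
                fi ;;
        esac
    done
    return 0
}
```

If Fasta was already ingested in an earlier session, the check is too strict and can simply be skipped.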
