Clone the master branch and package the application as a JAR file:
git clone git@github.com:ForomePlatform/AStorage-Java.git
cd AStorage-Java
./mvnw clean package
The JAR file is generated in the target directory as astorage-java-1.0.0.jar.
- On first run, the application creates a data folder named AStorage in the user’s home directory, unless configured otherwise.
- The service listens on port 8080, unless configured otherwise.
Note: These properties can be adjusted using a config.json file.
config.json example:
{
"dataDirectoryPath": "/home/user/ExampleStorage",
"serverPort": 3000
}
To start the application, run:
cd target
java -jar astorage-java-1.0.0.jar [config_json_path]
Note: AStorage logs (e.g. ingestion progress) are written to <dataDirectoryPath>/output_<currentTimeMillis>.log. Some of the output is also printed to the terminal in which the program is running.
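As a rough sketch of the configuration step above, the helper below writes a config.json containing the two documented keys and shows, in comments, how the JAR might then be started with it. The function name write_config and the example paths are illustrative and not part of AStorage. Values are inserted verbatim, so a path containing quotes or backslashes would produce invalid JSON.

```shell
# Hypothetical helper: write a config.json with the two documented keys.
# Usage: write_config <config_path> <data_directory_path> <server_port>
write_config() {
  cat > "$1" <<EOF
{
  "dataDirectoryPath": "$2",
  "serverPort": $3
}
EOF
}

# Example (paths are placeholders):
#   write_config "$HOME/astorage-config.json" "/home/user/ExampleStorage" 3000
#   java -jar target/astorage-java-1.0.0.jar "$HOME/astorage-config.json"
```

If serverPort is changed this way, the curl commands in the rest of this document need that port in place of 8080.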
For the detailed API specification, open the Swagger UI at http://localhost:8080/api.
If a previous ingestion failed or encountered errors, drop the repository for the format being ingested, using the provided Drop Repository API, before ingesting again. Skipping this step can lead to overlaps, duplicates, or data inconsistencies.
Always ensure that any failed or corrupted repository is properly cleared before attempting another ingestion.
Note: The UniversalVariant repository stores normalized variants for supported formats when the normalize parameter is set to true during ingestion. For example, if an error occurs while ingesting ClinVar data, dropping the ClinVar repository using the provided API will automatically remove ClinVar-related data from the UniversalVariant repository. However, dropping the entire UniversalVariant repository will remove all normalized data across every format.
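To make a failed ingestion harder to miss, the request can be wrapped so that its HTTP status is checked. The sketch below assumes that the service answers a failed request with a non-2xx status, which this document does not confirm; the names ingest_checked, ASTORAGE_URL, and CURL are hypothetical, and CURL exists only so the command can be swapped out for a dry run. The log file described earlier remains the authoritative record, since a failure may only show up there.

```shell
# Hypothetical wrapper: POST to an ingestion endpoint and check the status.
# Usage: ingest_checked <format> <query_string>
# Returns non-zero unless the HTTP status is 2xx (assumed to mean success).
ingest_checked() {
  # -s silences progress, -o discards the body, -w prints only the status code.
  status=$(${CURL:-curl} -s -o /dev/null -w '%{http_code}' \
    -X POST "${ASTORAGE_URL:-http://localhost:8080}/ingestion/$1?$2")
  case "$status" in
    2??)
      echo "ingestion/$1 returned HTTP $status"
      ;;
    *)
      echo "ingestion/$1 returned HTTP $status: drop the $1 repository before retrying" >&2
      return 1
      ;;
  esac
}

# Example (the path is a placeholder):
#   ingest_checked gtex "dataPath=/data/gtex_file"
```

The wrapper only reports the problem. Dropping the repository is still a separate call to the Drop Repository API described in the Swagger UI.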
Download the reference genome (GRCh38.p14_genomic) and its assembly report (GRCh38.p14_assembly_report), then run the ingestion:
curl -X POST "http://localhost:8080/ingestion/fasta?refBuild=GRCh38&dataPath={dataPath}&metadataPath={assemblyReportPath}"
API reference: Fasta Ingestion.
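File paths that contain spaces, ampersands, or other reserved characters would break the query string if pasted into the URL unchanged. One possible workaround, sketched here for the Fasta request, is to let curl percent-encode each parameter. The names ingest_fasta, ASTORAGE_URL, and CURL are hypothetical (CURL is only a hook for dry runs), and the sketch assumes the service decodes percent-encoded query parameters in the usual way.

```shell
# Base URL of the running service (default port from this document).
ASTORAGE_URL="${ASTORAGE_URL:-http://localhost:8080}"

# Hypothetical helper: Fasta ingestion with URL-encoded parameters.
# Usage: ingest_fasta <dataPath> <assemblyReportPath>
# -G moves the --data-urlencode pairs into the query string;
# -X POST keeps the request method used elsewhere in this document.
ingest_fasta() {
  ${CURL:-curl} -G -X POST "$ASTORAGE_URL/ingestion/fasta" \
    --data-urlencode "refBuild=GRCh38" \
    --data-urlencode "dataPath=$1" \
    --data-urlencode "metadataPath=$2"
}

# Example (paths are placeholders):
#   ingest_fasta "/data/reference/genome file.fna" "/data/reference/assembly_report.txt"
```

The same pattern should carry over to the other ingestion endpoints by changing the path and the parameter names.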
Download the entire dbNSFP database (dbNSFP4.3a), extract the downloaded content, and run the ingestion for each per-chromosome variant file, one at a time:
curl -X POST "http://localhost:8080/ingestion/dbnsfp?dataPath={chrDataPath}"
API reference: dbNSFP Ingestion.
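Running the request once per chromosome file can be scripted. The loop below is a minimal sketch: it posts the files strictly one after another and stops at the first request that curl reports as failed. The names ingest_each, ASTORAGE_URL, and CURL are hypothetical, as are the directory and file pattern in the example. Whether the service reports ingestion errors through the HTTP status is an assumption.

```shell
# Hypothetical helper: ingest several files sequentially via one endpoint.
# Usage: ingest_each <endpoint> <file>...
# Sends one POST per file, in order, and stops at the first curl failure.
ingest_each() {
  endpoint=$1
  shift
  for file in "$@"; do
    echo "Ingesting $file" >&2
    # --fail makes curl exit non-zero on HTTP errors (status >= 400).
    ${CURL:-curl} --fail -G -X POST \
      "${ASTORAGE_URL:-http://localhost:8080}/ingestion/$endpoint" \
      --data-urlencode "dataPath=$file" || return 1
  done
}

# Example (directory and pattern are placeholders for the extracted files):
#   ingest_each dbnsfp /data/dbNSFP4.3a/*chr*
```

If the loop stops early, the advice above applies: drop the affected repository before starting the run again. The same helper could serve any other endpoint that takes only dataPath and is fed one file at a time.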
Download available exomes and genomes from: gnomAD v4 and ingest the downloaded files:
Note: If you set the normalize parameter to true, the GRCh38 Fasta reference needs to be ingested into AStorage first.
curl -X POST "http://localhost:8080/ingestion/gnomad?dataPath={dataPath}&sourceType={sourceType}&normalize=true&refBuild=GRCh38"
API reference: gnomAD Ingestion.
Access the SpliceAI annotations here: SpliceAI v1.3. You will need an Illumina account.
From the Illumina Sequence Hub Projects tab open the added project: Predicting splicing from primary sequence, then open genome_scores_v1.3, click on FILES and download spliceai_scores.raw.indel.hg38.vcf.gz and spliceai_scores.raw.snv.hg38.vcf.gz.
Run the ingestion for each data file:
Note: If you set the normalize parameter to true, the GRCh38 Fasta reference needs to be ingested into AStorage first.
curl -X POST "http://localhost:8080/ingestion/spliceai?dataPath={dataPath}&normalize=true&refBuild=GRCh38"
API reference: SpliceAI Ingestion.
Download the appropriate data files from: PharmGKB Downloads and ingest the downloaded files:
curl -X POST "http://localhost:8080/ingestion/pharmgkb?dataType={dataType}&dataPath={dataPath}"
API reference: PharmGKB Ingestion.
Download the latest ClinVar release: ClinVarFullRelease_00-latest and its variant summary: variant_summary and ingest the downloaded files:
Note: If you set the normalize parameter to true, the required Fasta reference genomes need to be ingested into AStorage first.
curl -X POST "http://localhost:8080/ingestion/clinvar?dataPath={dataPath}&dataSummaryPath={dataSummaryPath}&normalize=true"
API reference: ClinVar Ingestion.
Download the GTEx v8 bulk tissue expression data: GTEx_Analysis_2017-06-05_v8 and ingest the downloaded file:
curl -X POST "http://localhost:8080/ingestion/gtex?dataPath={dataPath}"
API reference: GTEx Ingestion.
Download the GRCh38 GTF data file: Homo_sapiens.GRCh38.111.chr and ingest the downloaded file:
curl -X POST "http://localhost:8080/ingestion/gtf?dataPath={dataPath}"
API reference: GTF Ingestion.
Retrieve the necessary GERP rates files for each chromosome and ingest the downloaded files one by one:
curl -X POST "http://localhost:8080/ingestion/gerp?dataPath={dataPath}"
API reference: GERP Ingestion.
Download the complete dbSNP data: 00-All and ingest the downloaded file:
curl -X POST "http://localhost:8080/ingestion/dbsnp?dataPath={dataPath}"
API reference: dbSNP Ingestion.