# docker/data-review-tool/README.md
This docker image contains `Finding Fossils`, a data review tool built with Dash (Python). It is used to visualize the outputs of the models and to verify the extracted entities for inclusion in the Neotoma Database.

The expected inputs are mounted onto the newly created container as volumes and can be dumped in any folder. An environment variable is set up to provide the path to this folder. It assumes the following:

1. A parquet file containing the outputs from the article relevance prediction component.
2. A zipped file containing the outputs from the named entity extraction component.
3. Once the articles have been verified, the same parquet file referenced by the environment variable `ARTICLE_RELEVANCE_BATCH` is updated with the entities verified by the steward and the review status of the article.
## Sample Docker Compose Setup
Update the environment variables and the volume paths defined under the `data-review-tool` service in the `docker-compose.yml` file under the root directory. The volume paths are:

- `INPUT_PATH`: The path to the directory where the data is dumped, e.g. `./data/data-review-tool` (recommended)
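Once the variables and paths are set, the image can be built and started with the standard compose commands (the `data-review-tool` service name comes from the compose file described above):

```bash
# Build the image, then start the data review tool service
docker-compose build
docker-compose up data-review-tool
```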
# docker/entity-extraction-pipeline/README.md
## Sample Docker Compose Setup
Update the environment variables defined under the `entity-extraction-pipeline` service in the `docker-compose.yml` file under the root directory. The volume paths are:

- `INPUT_PATH`: The folder containing the raw text `nlp352` TSV file, e.g. `./data/entity-extraction/raw/original_files/` (recommended)
- `OUTPUT_PATH`: The folder to dump the final JSON files, e.g. `./data/entity-extraction/processed/processed_articles/` (recommended)
Below is a sample docker compose configuration for running the image:
```yaml
services:
  entity-extraction-pipeline:
    ports:
      - "5000:5000"
    volumes:
      - {INPUT_PATH}:/app/inputs/
      - {OUTPUT_PATH}:/app/outputs/
    environment:
      - HF_NER_MODEL_NAME=finding-fossils/metaextractor
      - SPACY_NER_MODEL_NAME=en_metaextractor_spacy
      - USE_NER_MODEL_TYPE=huggingface
      - LOG_OUTPUT_DIR=/app/outputs/
      - MAX_SENTENCES=20
      - MAX_ARTICLES=1
```
Then build and run the docker image to install the required dependencies using `docker-compose` as follows:
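For example, from the root directory (the service name matches the compose configuration above):

```bash
# Build the image, then start the entity extraction service
docker-compose build
docker-compose up entity-extraction-pipeline
```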