src/entity_extraction/training/spacy_ner/README.md
1 addition & 2 deletions
@@ -12,11 +12,10 @@ This folder contains the training and evaluation scripts for the SpaCy Transform
## Training Workflow
A bash script is used to initialize a training job. Model training is fully customizable and users are encouraged to update the parameters in the `run_spacy_training.sh` and `spacy_transfomer_train.cfg` files prior to training. The training workflow is as follows:
-1. Create a new data directory and dump all the TXT files (contains annotations in the JSONLines format) from Label Studio.
+1. Create a new data directory and dump all the JSON files containing annotations from Label Studio and any reviewed parquet files.
2. Most parameters can be used with the default value, open the `run_spacy_training.sh` bash script and update the following fields with absolute paths or relative paths from the root of the repository:
   - `DATA_PATH`: path to directory with Label Studio labelled data
   - `DATA_OUTPUT_PATH`: path to directory to store the split dataset (train/val/test) as well as other data artifacts required for training.
-   - `MODEL_PATH`: If retraining, specify path to model artifacts. If training a model from scratch, pass empty string `""`
   - `MODEL_OUTPUT_PATH`: path to store new model artifacts
   - `VERSION`: Version can be updated to keep track of different training runs.
   - `--gpu-id`: While executing the `spacy train` command, GPU can be used, if available, by setting this flag to **0**.
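
For orientation, the fields listed above might be set along the following lines. This is only a minimal sketch and not the actual contents of `run_spacy_training.sh`: the directory values, the `train.spacy` and `val.spacy` file names, the versioned output folder, and the `GPU_ID` helper variable are all assumptions made for illustration. The `--paths.train` and `--paths.dev` overrides also assume the config defines a `[paths]` section, as generated spaCy configs usually do. The script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical sketch; the real run_spacy_training.sh may be organised differently.
set -eu

# Placeholder values (assumptions): absolute paths or paths relative to the repo root.
DATA_PATH="data/labelled"          # Label Studio JSON exports and reviewed parquet files
DATA_OUTPUT_PATH="data/processed"  # train/val/test split and other training artifacts
MODEL_OUTPUT_PATH="models/ner"     # where new model artifacts are written
VERSION="v1"                       # label used to tell training runs apart
GPU_ID=0                           # 0 selects the first GPU; spaCy uses -1 for CPU

# Assemble the spacy train call. The config file name is the one the README mentions;
# the .spacy file names and the versioned output directory are assumed.
TRAIN_CMD="python -m spacy train spacy_transfomer_train.cfg \
--output ${MODEL_OUTPUT_PATH}/${VERSION} \
--paths.train ${DATA_OUTPUT_PATH}/train.spacy \
--paths.dev ${DATA_OUTPUT_PATH}/val.spacy \
--gpu-id ${GPU_ID}"

# Print the command for review instead of launching training.
echo "Labelled data directory: ${DATA_PATH}"
echo "${TRAIN_CMD}"
```

Setting `GPU_ID` to `-1` in this sketch would fall back to CPU training, which is also the `spacy train` default when `--gpu-id` is omitted.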