Commit df97660

docs: setup training/spacy folder
1 parent b7713a0 commit df97660

7 files changed: 13 additions and 300 deletions
src/entity_extraction/training/spacy_ner/README.md renamed to src/entity_extraction/training/spacy/README.md

Lines changed: 5 additions & 5 deletions
@@ -22,11 +22,11 @@ A bash script is used to initialize a training job. Model training is fully cust
 - `--gpu-id`: While executing the `spacy train` command, GPU can be used, if available, by setting this flag to **0**.
 3. Make the training script executable:
 ```bash
-chmod +x src/entity_extraction/training/spacy_ner/run_spacy_training.sh
+chmod +x src/entity_extraction/training/spacy/run_spacy_training.sh
 ```
 4. Execute the training script from the :
 ```bash
-./src/entity_extraction/training/spacy_ner/run_spacy_training.sh
+./src/entity_extraction/training/spacy/run_spacy_training.sh
 ```

 ## Evaluation Workflow
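
The `--gpu-id` flag mentioned in the hunk above takes `0` for the first GPU and `-1` for CPU; the training script in this commit passes `-1`. As a rough sketch, the value could be chosen automatically. This assumes that a working `nvidia-smi` on the `PATH` is a good enough signal that a GPU is usable, which may not hold on every machine; the helper name `pick_gpu_id` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print 0 when an NVIDIA GPU
# appears to be available, otherwise -1 (spaCy's value for CPU).
pick_gpu_id() {
    # Assumption: nvidia-smi exists and `nvidia-smi -L` succeeds only when a GPU is usable.
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo 0
    else
        echo -1
    fi
}
```

The result could then be passed to the training command as `--gpu-id "$(pick_gpu_id)"` in place of a hard-coded value.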
@@ -40,11 +40,11 @@ To run full evaluation of the trained model to get detailed metrics and plots, f
 5. `GPU` - whether to use GPU or not.
 2. Make the evaluation script executable:
 ```bash
-chmod +x src/entity_extraction/training/spacy_ner/run_evaluation.sh
+chmod +x src/entity_extraction/training/spacy/run_evaluation.sh
 ```
 3. Run the evaluation script results will be generated in the `OUTPUT_DIR` folder. **This may take while on CPU and even GPU.**
 ```bash
-./src/entity_extraction/training/spacy_ner/run_evaluation.sh
+./src/entity_extraction/training/spacy/run_evaluation.sh
 ```

 ## Overall Process Diagram
@@ -93,7 +93,7 @@ This notebook sets up the NER model training on Google Colab with GPU. Use the f
 3. Create a `data` folder inside the folder you just created and upload the `train.spacy` and `val.spacy` files into it
 4. Create a `models` folder, this is where checkpoints will be saved during training
 5. Create an `evaluation-results` folder, this is where the evaluation results will be saved
-6. Create a copy of the `run_spacy_training.sh` and `run_evaluation.sh` files from `src/entity_extraction/training/spacy_ner` and place it in training run folder
+6. Create a copy of the `run_spacy_training.sh` and `run_evaluation.sh` files from `src/entity_extraction/training/spacy` and place it in training run folder
 7. Your folder structure should now look like:
 ```
 spacy-transformer-v1

src/entity_extraction/training/spacy_ner/colab_start_training.ipynb renamed to src/entity_extraction/training/spacy/colab_start_training.ipynb

File renamed without changes.

src/entity_extraction/training/spacy_ner/create_config.py renamed to src/entity_extraction/training/spacy/create_config.py

File renamed without changes.

src/entity_extraction/training/spacy_ner/run_spacy_training.sh renamed to src/entity_extraction/training/spacy/run_spacy_training.sh

Lines changed: 8 additions & 8 deletions
@@ -17,9 +17,9 @@ VAL_SPLIT=0.15
 TEST_SPLIT=0.15


-rm -f src/entity_extraction/training/spacy_ner/spacy_transformer_$VERSION.cfg
+rm -f src/entity_extraction/training/spacy/spacy_transformer_$VERSION.cfg

-python3 src/preprocessing/labelling_data_split.py \
+python3 src/entity_extraction/preprocessing/labelling_data_split.py \
 --raw_label_path $DATA_PATH \
 --output_path $DATA_OUTPUT_PATH \
 --train_split $TRAIN_SPLIT \
@@ -33,25 +33,25 @@ if [ -z "$MODEL_PATH" ]; then

 # Fill configuration with required fields
 python -m spacy init fill-config \
-src/entity_extraction/training/spacy_ner/spacy_transformer_train.cfg \
-src/entity_extraction/training/spacy_ner/spacy_transformer_$VERSION.cfg
+src/entity_extraction/training/spacy/spacy_transformer_train.cfg \
+src/entity_extraction/training/spacy/spacy_transformer_$VERSION.cfg

 # Execute the training job by pointing to the new config file
 python -m spacy train \
-src/entity_extraction/training/spacy_ner/spacy_transformer_$VERSION.cfg \
+src/entity_extraction/training/spacy/spacy_transformer_$VERSION.cfg \
 --paths.train $DATA_OUTPUT_PATH/train.spacy \
 --paths.dev $DATA_OUTPUT_PATH/val.spacy \
 --output $MODEL_OUTPUT_PATH \
 --gpu-id -1

 else
 # Else create a new config file to resume training
-python src/entity_extraction/training/spacy_ner/create_config.py \
+python src/entity_extraction/training/spacy/create_config.py \
 --model_path $MODEL_PATH \
---output_path src/entity_extraction/training/spacy_ner/spacy_transformer_$VERSION.cfg
+--output_path src/entity_extraction/training/spacy/spacy_transformer_$VERSION.cfg

 python -m spacy train \
-src/entity_extraction/training/spacy_ner/spacy_transformer_$VERSION.cfg \
+src/entity_extraction/training/spacy/spacy_transformer_$VERSION.cfg \
 --paths.train $DATA_OUTPUT_PATH/train.spacy \
 --paths.dev $DATA_OUTPUT_PATH/val.spacy \
 --components.ner.source $MODEL_PATH \

src/entity_extraction/training/spacy_ner/spacy_transformer_train.cfg renamed to src/entity_extraction/training/spacy/spacy_transformer_train.cfg

File renamed without changes.

src/entity_extraction/training/spacy_ner/run_evaluation.sh

Lines changed: 0 additions & 24 deletions
This file was deleted.

src/entity_extraction/training/spacy_ner/spacy_evaluate.py

Lines changed: 0 additions & 263 deletions
This file was deleted.
