- The data used is available here. Download the CSV file and update its path in the `config.yaml` file under `data.raw_data`, or in the environment variable `DATA_PATH`.
- Update the Python environment in the `.env` file.
- Install `poetry` if it is not already installed.
- Install the dependencies using poetry: `poetry install`
- Update the config and model parameters in the `config.yaml` file.
- Add `./src` to the `PYTHONPATH`: `export PYTHONPATH="${PYTHONPATH}:./src"`
- Run `python src/main.py` or `poetry run python src/main.py`.
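
The CSV location can be given either in `config.yaml` (`data.raw_data`) or through `DATA_PATH`. The following is a minimal sketch of how such a lookup might behave, assuming the environment variable takes precedence over the config file (the instructions do not state the order). `resolve_data_path` is a hypothetical helper, not part of the project, and the config is shown as an already-parsed dictionary rather than being loaded from `config.yaml`.

```python
import os
from typing import Mapping, Optional

# Default listed for DATA_PATH in the environment variables table
DEFAULT_DATA_PATH = "./artefacts/HousingData.csv"


def resolve_data_path(config: Mapping, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the raw-data CSV path (hypothetical helper).

    Assumed precedence: DATA_PATH environment variable, then
    data.raw_data from the config, then the documented default.
    """
    env = os.environ if env is None else env
    from_env = env.get("DATA_PATH")
    if from_env:
        return from_env
    from_config = (config.get("data") or {}).get("raw_data")
    return from_config or DEFAULT_DATA_PATH


# Stand-in for the parsed contents of config.yaml
config = {"data": {"raw_data": "./data/HousingData.csv"}}
print(resolve_data_path(config, env={}))
```

Passing `env` explicitly keeps the function easy to test; leaving it out falls back to the real process environment.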
- Build the Docker image: `docker build -t regression .`
- Bring up the dependencies using `docker compose up -d`.
- Run the container with the correct `DATA_PATH` and `MLFLOW_TRACKING_URI` as environment variables (refer to the Environment Variables table below for the complete list): `docker run -e DATA_PATH=/app/artefacts/HousingData.csv -e MLFLOW_TRACKING_URI=http://host.docker.internal:5000 -v ./artefacts:/app/artefacts --rm regression`
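
When the container is launched from a script rather than by hand, the same `docker run` invocation can be assembled from a dictionary of environment variables. This is only a sketch: `docker_run_args` is a hypothetical helper, and the values are the ones used in the command above.

```python
import shlex
from typing import Dict, List


def docker_run_args(image: str, env: Dict[str, str], volume: str) -> List[str]:
    """Build the argument list for `docker run` (hypothetical helper)."""
    args = ["docker", "run"]
    for name, value in env.items():
        # One -e flag per environment variable
        args += ["-e", f"{name}={value}"]
    args += ["-v", volume, "--rm", image]
    return args


args = docker_run_args(
    image="regression",
    env={
        "DATA_PATH": "/app/artefacts/HousingData.csv",
        "MLFLOW_TRACKING_URI": "http://host.docker.internal:5000",
    },
    volume="./artefacts:/app/artefacts",
)
# Print the shell-quoted command for inspection
print(shlex.join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` to start the container.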
The following environment variables can be set to configure the training:
| Variable | Default Value | Description |
|---|---|---|
| DATA_PATH | ./artefacts/HousingData.csv | File path to the raw CSV data used for training. |
| CONFIG_PATH | ./config.yaml | File path to the model training and other configuration file. |
| LOG_LEVEL | INFO | The logging level for the application. Valid values are DEBUG, INFO, WARNING, ERROR, and CRITICAL. |
| MLFLOW_TRACKING_URI | http://localhost:5000 | MLflow tracking URI. Use http://host.docker.internal:5000 if MLflow is running within a Docker container. |
| GITHUB_USERNAME | None | GitHub username. This is needed to pull the data from the DVC repo. |
| GITHUB_PASSWORD | None | GitHub token. This is needed to pull the data from the DVC repo. |
| DVC_REMOTE | s3://artifacts | DVC remote. |
| DVC_REMOTE_NAME | regression-model-remote | DVC remote name. |
| DVC_ENDPOINT_URL | http://minio | The URL endpoint for the DVC storage backend. This is typically the URL of an S3-compatible service, such as MinIO, used to store and manage datasets and model files. |
| AWS_DEFAULT_REGION | eu-west-2 | The region of the DVC remote S3 bucket. |
| DVC_ACCESS_KEY_ID | None | Access key ID for the DVC remote. Optional. Not needed if using IAM-based access for the DVC remote. |
| DVC_SECRET_ACCESS_KEY | None | Secret access key for the DVC remote. Optional. Not needed if using IAM-based access for the DVC remote. |
| DEPLOY_AS_CODE | False | Whether manual intervention or evaluation is needed before the trained model is registered. |
| DEPLOY_MODEL_NAME | house_price_prediction | The name with which the model will be registered. This name and alias will be used for deployment. |
| DEPLOY_MODEL_ALIAS | champion | The alias to be added to the model. This alias along with the name will be used for deployment. |
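
To illustrate how these variables and their defaults fit together, here is a minimal sketch that reads a subset of them into a settings object. `TrainingSettings` and `load_settings` are hypothetical names, and the rule for parsing `DEPLOY_AS_CODE` as a boolean is an assumption; the project's actual loading code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class TrainingSettings:
    """Subset of the documented variables (hypothetical container)."""
    data_path: str
    config_path: str
    log_level: str
    mlflow_tracking_uri: str
    deploy_as_code: bool
    deploy_model_name: str
    deploy_model_alias: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> TrainingSettings:
    """Read settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"Invalid LOG_LEVEL: {log_level!r}")
    # Assumption: "1", "true", and "yes" (any case) are treated as True
    raw_flag = env.get("DEPLOY_AS_CODE", "False").strip().lower()
    return TrainingSettings(
        data_path=env.get("DATA_PATH", "./artefacts/HousingData.csv"),
        config_path=env.get("CONFIG_PATH", "./config.yaml"),
        log_level=log_level,
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        deploy_as_code=raw_flag in {"1", "true", "yes"},
        deploy_model_name=env.get("DEPLOY_MODEL_NAME", "house_price_prediction"),
        deploy_model_alias=env.get("DEPLOY_MODEL_ALIAS", "champion"),
    )


# With an empty environment, every field falls back to its default
print(load_settings({}))
```

Validating `LOG_LEVEL` at load time surfaces a misconfigured value immediately instead of partway through a training run.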
Ensure that the project requirements are already set up by following the Model training instructions.
- Ensure `pytest` is installed. `poetry install` will install it as a dev dependency.
- For integration tests, set up the dependencies (MLflow) by running `docker-compose up -d`.
- Run the tests with `poetry run pytest ./tests`.
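
Because the integration tests depend on a running MLflow server, it can be useful to check that the tracking URI accepts connections before starting them. The following is a sketch using only the standard library; `tracking_server_reachable` is a hypothetical helper, and it only confirms that a TCP connection can be opened, not that the service behind it is MLflow.

```python
import socket
from urllib.parse import urlparse


def tracking_server_reachable(uri: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    try:
        parsed = urlparse(uri)
        host = parsed.hostname
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
    except ValueError:
        # Raised for malformed URIs, e.g. a non-numeric port
        return False
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable
        return False


# Default tracking URI from the environment variables table
print(tracking_server_reachable("http://localhost:5000"))
```

A check like this could back a skip condition for integration tests, so they are skipped rather than failed when the dependencies are not up.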