This script is designed to load data from a CSV file into an MS SQL Server database table. It performs various checks, including header row count, business date comparison, and target table existence, to ensure data integrity and reliability.
- Name: Soorya Prakash Parthiban
- GitHub: drdataSpp
- Date: 27 May, 2023
This Python script requires a parameter file to be passed when calling it. The parameter file should be in JSON format and contain necessary configuration values for the script to run.
- V1 (2023-05-27): Pushed draft files with Python libraries.
- V2 (2023-05-27): Added Python code to check the existence of the source CSV data file and parameter files & folders.
- V3 (2023-05-28): Added a LOGS folder to create .txt logs on failure. Implemented row count checks and business date checks.
- V4 (2023-05-28): Added code to create a processed data frame and establish an SQL Server connection using pyodbc.
- V4.1 (2023-05-28): Added SQL Truncate and Load Queries.
- V4.2 (2023-05-28): Added ROLLBACK & COMMIT if the correct data is inserted into the Target Table.
- V4.3 (2023-05-28): Added an archive feature when the table is successfully loaded.
- V5 (2023-05-30): Removed .txt Parameter Sets and added JSON Parameter sets.
- V5.1 (2023-05-30): Added an IF TABLE EXISTS clause before loading data.
- V5.2 (2023-05-30): Removed hardcoded values - now picking values from the JSON Parameter file.
- V6 (2023-06-02): Updated folder and file names.
- V6.1 (2023-06-02): Added a feature to pass a parameter file when calling the Python script.
- V6.2 (2023-06-02): Generating success message logs if data is loaded correctly.
To run this script, follow these steps:

1. Install the required Python libraries. You can use the following command to install them:

   pip install numpy pandas pyodbc

2. Prepare a JSON parameter file with the required values. The parameter file should contain the following fields:

   {
     "PROJECT_PARENT_PATH": "<parent_folder_path>",
     "FOLDER_NAME": "<source_folder_name>",
     "FILE_NAME": "<source_file_name>",
     "SQL_SERVER_NAME": "<sql_server_name>",
     "SQL_DATABASE_NAME": "<database_name>",
     "SQL_TARGET_TABLE": "<table_name>"
   }

3. Execute the script with one of the following commands:

   Python terminal: python script_name.py parameter_file.json
   Git Bash: ./run_pelt_script.sh parameter_file.json
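The script's first step can be sketched as follows. This is a minimal, hypothetical example of reading the JSON parameter file passed on the command line; the key names follow the template above, but `load_params` is an illustrative helper, not necessarily the function name used in the actual script:

```python
import json
import sys

# Keys the parameter file must provide (per the template above).
REQUIRED_KEYS = [
    "PROJECT_PARENT_PATH", "FOLDER_NAME", "FILE_NAME",
    "SQL_SERVER_NAME", "SQL_DATABASE_NAME", "SQL_TARGET_TABLE",
]

def load_params(path):
    """Load the JSON parameter file and fail fast if any key is missing."""
    with open(path) as f:
        params = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in params]
    if missing:
        raise KeyError(f"Parameter file is missing keys: {missing}")
    return params

# Usage (the file path comes in as the script's first argument):
# params = load_params(sys.argv[1])
```

Failing fast here means a typo in the parameter file is reported before any database connection is attempted.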
Before running the script, ensure that you have the following:
- The source CSV data file exists in the specified folder.
- The parameter file is present in the correct location and follows the JSON format.
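These pre-run checks can be sketched as below. This is a hypothetical illustration assuming the folder layout implied by the parameter template (PROJECT_PARENT_PATH/FOLDER_NAME/FILE_NAME); `check_inputs` is an illustrative name:

```python
import os

def check_inputs(parent_path, folder_name, file_name):
    """Verify the source folder and CSV file exist before loading anything."""
    source_folder = os.path.join(parent_path, folder_name)
    source_file = os.path.join(source_folder, file_name)
    if not os.path.isdir(source_folder):
        raise FileNotFoundError(f"Source folder not found: {source_folder}")
    if not os.path.isfile(source_file):
        raise FileNotFoundError(f"Source CSV file not found: {source_file}")
    return source_file
```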
The script will perform checks and load the data into the specified SQL Server target table if all conditions are met.
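The load itself (the IF TABLE EXISTS check, truncate-and-load, and COMMIT/ROLLBACK behaviour described in the version history) can be sketched roughly as follows. This is a hypothetical illustration assuming a DB-API connection such as one returned by pyodbc.connect against SQL Server; `table_exists` and `truncate_and_load` are illustrative names, and the real script may batch its inserts differently:

```python
def table_exists(cursor, table_name):
    # SQL Server: look the target table up in INFORMATION_SCHEMA first.
    cursor.execute(
        "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = ?",
        (table_name,),
    )
    return cursor.fetchone()[0] > 0

def truncate_and_load(conn, table_name, rows):
    """Truncate the target table and insert rows; commit only on full success."""
    cursor = conn.cursor()
    if not table_exists(cursor, table_name):
        raise RuntimeError(f"Target table does not exist: {table_name}")
    try:
        # Table name comes from the trusted parameter file, not user input.
        cursor.execute(f"TRUNCATE TABLE {table_name}")
        placeholders = ", ".join("?" for _ in rows[0])
        for row in rows:
            cursor.execute(
                f"INSERT INTO {table_name} VALUES ({placeholders})", row
            )
        conn.commit()    # COMMIT only when every row inserted cleanly
    except Exception:
        conn.rollback()  # ROLLBACK on any failure, leaving the table untouched
        raise
```

Wrapping the truncate and the inserts in a single transaction is what makes the rollback meaningful: a mid-load failure leaves the target table exactly as it was.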
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to submit a pull request or open an issue.
Credits to ChatGPT for generating this README file 😉 and to all informative YouTubers 🧑🏽‍💻.



