A Machine Learning project designed to predict used car prices in the UAE market. This repository demonstrates a complete data science workflow: from exploratory data analysis and logarithmic feature engineering to the deployment of a standalone inference engine.
The UAE car market presents a unique challenge for predictive modeling. Factors such as high luxury car volume, extreme weather conditions affecting vehicle longevity, and the significant price gap between GCC Specs and International Imports create a complex pricing environment.
This project focuses on the "Predictable Market" segment (cars priced ≤ 400,000 AED) to build a robust estimation tool for micro-creators and educators in the automotive space.
research.ipynb: The "Lab." Contains data cleaning, visualization, residual analysis, and model training.engine.py: The "Product." A standalone script that loads the trained model to predict prices for new inputs.uae_used_cars_10k.csv: The raw dataset of 10,000 vehicle listings.mappings.pkl&car_model.pkl: Serialized "knowledge" and model weights used by the engine.requirements.txt: List of necessary Python libraries.
The model is trained on a dataset of over 10,000 used car listings from the UAE/Gulf market.
- Source: https://www.kaggle.com/datasets/mohamedsaad254/uae-used-cars-analysis-full-project-v1-0
- Features: Make, Model, Year, Mileage, Region (GCC/Import), and Price.
Standard linear models struggle with car data because depreciation is exponential, not linear.
- Target Transformation: We modeled the
Log_Priceto stabilize variance. - Mileage Linearization: Applied
np.log1p(Mileage)to treat the first 10,000km of wear as more significant than the difference between 190,000km and 200,000km.
With hundreds of unique Makes and Models, standard "One-Hot Encoding" would create too many columns. Instead, we used Target Encoding (mapping each brand to its average log-price) to keep the model lightweight and fast.
We utilized a Random Forest Regressor with 300 estimators. This ensemble method was chosen for its ability to capture non-linear interactions between Age, Mileage, and Brand Value without overfitting.
During our Residual Analysis, we identified a distinct "Funnel" shape (Heteroscedasticity) in the errors.
Diagnostic: The model currently has a MAPE (Mean Absolute Percentage Error) of ~52%. Our analysis confirms that this error is primarily driven by Latent Variables—specifically the lack of a "Spec" (GCC vs. Import) column in the dataset. This insight provides a clear roadmap for Version 2.0.
pip install -r requirements.txtThis project is the foundation for a suite of automotive intelligence tools. The following milestones are planned to increase model accuracy and commercial utility:
- Spec Identification: Integrate a "Regional Spec" classifier (GCC vs. North American vs. Japanese) to resolve the primary source of price variance.
- Condition Analysis: Utilize Natural Language Processing (NLP) on the
Descriptioncolumn to extract key value-drivers like "Full Agency Contract," "First Owner," or "Major Accident History." - Market Sentiment: Scrape daily listing volume to include a "Market Demand" index for specific models.
- Hyperparameter Optimization: Implement
OptunaorGridSearchCVto fine-tune the Random Forest depth and leaf nodes. - XGBoost Implementation: Benchmark the current model against Gradient Boosting machines to see if they better handle the high-cardinality categorical data.
- Real-Time API: Wrap
engine.pyin a FastAPI container to allow external websites to request price audits. - Web Dashboard: Build a Streamlit or React frontend that visualizes depreciation curves for users.
- Automated Retraining: Set up a GitHub Action to retrain the model monthly as new UAE market data becomes available.
To contribute to this project, please set up your local environment using these steps:
git clone [https://github.com/Moazzam-Matin/Gulf_Auto_Price_Intelligence.git](https://github.com/Moazzam-Matin/Gulf_Auto_Price_Intelligence.git)
cd Gulf_Auto_Price_Intelligence- A huge thank you to the Kaggle community for providing the raw UAE car market dataset that made this project possible.
- Special thanks to the contributors and open-source developers whose libraries (Scikit-Learn, Pandas, Joblib) power this engine.
