London Real Estate Development | Predictions & Recommendations
Sherlock Homes is a data-driven Real Estate project focused on the London market. By combining digitalization with construction insights, we developed Watson V1.0, a machine learning model that predicts dwelling prices and acts as a strategic advisor to identify high-ROI renovation opportunities.
- Web App: Sherlock Homes App
- Tableau Dashboard: London Real Estate Analysis
- Project Presentation: Canva Slides
- Market Value: The London residential market is valued at approximately £2.64 trillion.
- Stability: After a correction in 2023 due to mortgage rate surges, the market has stabilized.
- Current State: As of late 2025 data, the average property price is £652,000, with median prices around £505,000.
The dataset aggregates two primary sources (filtered for data > 2023):
- London House Price Data (Kaggle): Transactional data, rooms, tenure, energy rates.
- London Building Stock Model (London Datastore): Conservation area status, building age bands.
Processing:
- Cleaned ~4.3M raw rows down to ~80,000 high-quality rows for training.
- Feature Engineering: Added neighbourhood median values (£/m²) and size buckets.
- Handling Missing Data: Null room counts replaced by the median of similar properties; outliers (e.g., short leaseholds <18 years) removed.
We experimented with several algorithms to optimize prediction accuracy. LightGBM was selected as the final model due to its superior performance and speed.
| Model | Algorithm |
|
MAPE | Status |
|---|---|---|---|---|
| ML-RF | Random Forest | 79.5% | 25.20% | Tested |
| ML-XG | XGBoost | 88.7% | 16.96% | Tested |
| ML-LGBM | LightGBM | 89.2% | 16.72% | Deployed (Watson) |
Model Specs:
- Target: Logarithmic transformation of Selling Price.
- Encoders: One-Hot (Property Type, Tenure), Label (Energy), Target Encoding (Location).
The Streamlit app (watsonV1_0.py) allows users to input property details and receive:
- Valuation: Estimated market value with a confidence interval (25%-75% variance).
- Sherlock's Strategic Advisor:
- Feasibility Check: Ensures recommended layouts (e.g., adding a bedroom) fit within the floor area.
- Market Reality Check: Verifies that the proposed layout actually exists among the nearest 50 neighbors.
- ROI Logic: Only recommends changes if the predicted value uplift is >10%.
1_Data/: Contains the processed dataset (df_cleaned.csv).2_Notebooks/:Sherlock_Homes-Analysis.ipynb: Exploratory Data Analysis (EDA).LBSM_datacleaning.ipynb: Cleaning external building stock data.Sherlock_Homes - ML-RF...: Random Forest training.Sherlock_Homes - ML-XG...: XGBoost training.Sherlock_Homes - ML-LGBM: Final LightGBM model training.Watson_LGBM.pkl: The pickled model used by the app.
3_App/:watsonV1_0.py: The main application script.requirements.txt: List of dependencies.SH2.png: Assets/Images.
To run the application locally:
-
Clone the repository:
git clone https://github.com/YourUsername/SherlockHomes.git cd SherlockHomes -
Install dependencies: (Note: The requirements file is located inside the
3_Appfolder)pip install -r 3_App/requirements.txt
-
Run the App:
streamlit run 3_App/watsonV1_0.py
António Galvão