Chevron Chefs

Pair Plots
Correlation Heat Map
TaiPy

Inspiration

In the chaotic realm of a hackathon, we four fearless data scientists set up base camp in the heart of our coding battleground: the 3rd-floor Duncan Hall kitchenette. We grew accustomed to the harmony between bytes and bites. With laptops open, we worked on our project late into the night and decided to name ourselves the Chevron Chefs.

What it does

The primary objective of the project is to develop models to predict peak oil rate given oil well data. In a real-world setting, this capability would be crucial for asset development teams. It enables them to make informed decisions before wells become operational.

How we built it

Visualize Given Data: Loaded the dataset using Pandas in Python. Used Seaborn to explore the data through visualizations. Checked for patterns, correlations, and outliers.
Create Baseline Models: Split our dataset into training and testing sets. Built simple baseline models without feature engineering to understand the initial performance.
Feature engineering: Extracted key features and filled in empty columns while maintaining a large sample of data to work with.
Imputation: Implemented a Linear Regression model alongside Imputation to fill missing values.
Random Forest: Trained a Random Forest model with scikit-learn on the training data, using 25% of the available data. Evaluated the model's performance on the testing data using metrics such as RMSE.
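The steps above can be sketched roughly as follows. This is a minimal illustration, not our actual notebook: the data is synthetic, the column names (lateral_length, total_proppant, total_fluid, peak_oil_rate) are hypothetical, median imputation stands in for the regression-based imputation, and holding out 25% of the rows for testing is an assumption about the split.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the well data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
wells = pd.DataFrame({
    "lateral_length": rng.normal(8000, 1500, n),
    "total_proppant": rng.normal(1.2e7, 3e6, n),
    "total_fluid": rng.normal(3.0e5, 8e4, n),
})
wells["peak_oil_rate"] = (
    0.03 * wells["lateral_length"]
    + 1e-5 * wells["total_proppant"]
    + rng.normal(0, 20, n)
)

# Blank out roughly 10% of one feature to mimic missing values.
missing = rng.random(n) < 0.10
wells.loc[missing, "total_fluid"] = np.nan

X = wells.drop(columns="peak_oil_rate")
y = wells["peak_oil_rate"]

# Assumption: 25% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Impute missing values, then fit the forest, in one pipeline so the
# imputer only ever sees training data when it learns the medians.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# Compare against a trivial baseline that always predicts the training mean.
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
baseline_pred = np.full(len(y_test), y_train.mean())
baseline_rmse = float(np.sqrt(mean_squared_error(y_test, baseline_pred)))

print(f"baseline RMSE: {baseline_rmse:.1f}")
print(f"random forest RMSE: {rmse:.1f}")
```

Wrapping the imputer and the model in a single pipeline keeps test rows from leaking into the imputation step, which matters when comparing against a baseline.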

Challenges we ran into

A significant challenge we encountered was the large number of null values in four columns: average_stage_length, average_proppant_per_stage, average_frac_fluid_per_stage, and number_of_stages. These null values amounted to 90% of the entire sample size. A data gap of this size is an obstacle to modeling, analysis, and accuracy. In our case, we achieved better results after dropping these columns along with other non-significant ones and using imputation to replace the remaining NaNs.
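One way to express that cleanup in Pandas is sketched below. The data is synthetic, only two of the four sparse columns are reproduced, the other column names are hypothetical, and both the 50% drop threshold and the median fill are assumptions chosen for illustration rather than the exact rules we used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "lateral_length": rng.normal(8000, 1500, n),   # hypothetical column
    "total_proppant": rng.normal(1.2e7, 3e6, n),   # hypothetical column
    "number_of_stages": rng.integers(20, 60, n).astype(float),
    "average_stage_length": rng.normal(200, 30, n),
})

# Mimic the problem: about 90% of the stage-related values are missing,
# while another column has only a few scattered gaps.
mostly_missing = rng.random(n) < 0.90
df.loc[mostly_missing, ["number_of_stages", "average_stage_length"]] = np.nan
df.loc[rng.random(n) < 0.05, "total_proppant"] = np.nan

# Fraction of nulls per column.
null_fraction = df.isna().mean()

# Assumption: drop any column that is more than half empty.
threshold = 0.5
sparse_columns = null_fraction[null_fraction > threshold].index.tolist()
cleaned = df.drop(columns=sparse_columns)

# Fill the remaining gaps with each column's median.
cleaned = cleaned.fillna(cleaned.median(numeric_only=True))

print("dropped:", sparse_columns)
print("remaining nulls:", int(cleaned.isna().sum().sum()))
```

Dropping a column loses whatever signal the 10% of populated rows carried, so the threshold is a trade-off between keeping that signal and keeping every row usable.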

Accomplishments that we're proud of

We are proud that our project can be useful in a real-world setting. We also take pride in the fact that we overcame our challenges in innovative ways.

What we learned

From our mentors, we gained a lot of practical knowledge about oil production and rigs. We also learned about different methods to deal with NaN values besides dropping those rows. As we progressed through the track, we realized that a lower RMSE does not always mean a model is "more accurate" than one with a higher RMSE. A low RMSE can result from overfitting, which we actively tried to avoid across all of the techniques we used in this project.
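The point about RMSE and overfitting can be seen in a small experiment. The sketch below uses synthetic data and scikit-learn decision trees (not our project models) purely as an illustration: a tree with no depth limit memorizes the training rows, so its training RMSE looks excellent even though its RMSE on held-out data is much worse.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root mean squared error as a plain float."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic noisy signal; nothing here comes from the competition data.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.5, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

scores = {}
for name, depth in [("shallow", 3), ("unlimited", None)]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=2)
    tree.fit(X_train, y_train)
    scores[name] = {
        "train": rmse(y_train, tree.predict(X_train)),
        "test": rmse(y_test, tree.predict(X_test)),
    }
    print(
        f"{name:9s} train RMSE: {scores[name]['train']:.3f}  "
        f"test RMSE: {scores[name]['test']:.3f}"
    )
```

The number to watch is the gap between the training and testing RMSE for each model, not the training RMSE on its own.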

What's next for Chevron Chefs

See y'all next year!

Built With

github
jupyter-notebook
numpy
pandas
python
r-studio
sci-kit-learn
seaborn
starknet
taipy
vs-code
xgboost

Submitted to

Rice Datathon 2024
- Winner Best Use of Starknet

Created by

I worked on visualizations, cleaning data, application knowledge, drawing conclusions, and the presentation.

Zahra Bukhari
Rohan Chaudhary
Hey there, I am a second-year Computer Science and Mathematics student at the University of Houston.
Private user
Kishan Yerneni