# IBM-DATA-SCIENCE

A collection of notebooks, projects, examples, and resources for learning and applying data science concepts, inspired by IBM's Data Science coursework and practical exercises. This repository is organized to make it easy to follow hands-on tutorials, reproduce experiments, and build small end-to-end projects.
Owner: ADVAIT135
## Table of contents

- About
- Repository structure
- Getting started
- Usage
- Notebooks & Projects
- Data
- Contributing
- License
- Contact
## About

This repository hosts educational and practical content for data science workflows, including:
- Exploratory data analysis (EDA)
- Data cleaning and feature engineering
- Statistical modeling and machine learning (scikit-learn, XGBoost, etc.)
- Model evaluation and visualization
- Short projects and capstone-style examples
- Jupyter notebooks demonstrating concepts step-by-step
Use this repo to learn and experiment, or as a starting point for small data science projects.
## Repository structure

A typical layout (adapt to the actual contents of this repo):

- `notebooks/` — Jupyter notebooks (.ipynb) for lessons, experiments, and demos
- `data/` — small sample datasets used by notebooks (not for large data)
- `src/` — reusable Python modules and helper scripts
- `reports/` — generated reports, figures, and export artifacts
- `requirements.txt` — Python package requirements for pip installs
- `environment.yml` — conda environment specification (optional)
- `README.md` — this file
- `LICENSE` — license for the repository (if present)
- `tests/` — unit / integration tests (optional)
If some of these files or folders are missing, create them as needed or update this README accordingly.
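If you want to create the missing pieces in one go, a small sketch using only the standard library (folder and file names follow the suggested layout above):

```python
from pathlib import Path

# Folders and placeholder files from the suggested layout above
FOLDERS = ["notebooks", "data", "src", "reports", "tests"]
FILES = ["requirements.txt", "environment.yml", "README.md"]

def scaffold(root: str = ".") -> None:
    """Create any missing folders and placeholder files from the layout."""
    base = Path(root)
    for name in FOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (base / name).touch(exist_ok=True)
```

Run `scaffold()` from the repository root; existing folders and files are left untouched.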
## Getting started

Prerequisites:

- Python 3.8+ (recommended)
- Git
- Optional: Anaconda/Miniconda if you prefer conda environments
- JupyterLab or Jupyter Notebook for interactive work
1. Clone the repo:

   ```bash
   git clone https://github.com/ADVAIT135/IBM-DATA-SCIENCE.git
   cd IBM-DATA-SCIENCE
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate   # macOS / Linux
   .venv\Scripts\activate      # Windows (PowerShell: .\.venv\Scripts\Activate.ps1)
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

3. Start Jupyter:

   ```bash
   jupyter lab   # or: jupyter notebook
   ```
If there is an `environment.yml`, use conda instead:

```bash
conda env create -f environment.yml
conda activate ibm-data-science
jupyter lab
```
## Usage

- Open JupyterLab or Jupyter Notebook and navigate to `notebooks/`.
- Execute cells top-to-bottom to reproduce analyses.
- If a notebook requires data in `data/`, ensure the files are present (see the Data section).
Automated execution (headless):

```bash
# execute a notebook and write the output notebook
jupyter nbconvert --to notebook --execute notebooks/example.ipynb --output notebooks/example-executed.ipynb
```
- Python scripts (utility modules) live in `src/`. Run them with:

  ```bash
  python src/some_script.py
  ```

- If tests exist (pytest):

  ```bash
  pip install -r requirements-dev.txt   # if provided
  pytest -q
  ```
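A minimal sketch of what a test in `tests/` could look like; the helper `clean_column_names` is hypothetical — adapt the import and names to the actual modules in `src/`:

```python
# tests/test_cleaning.py — hypothetical example
# In a real repo you would import the helper instead of defining it here:
# from src.cleaning import clean_column_names

def clean_column_names(columns):
    """Normalize column names: strip whitespace, lowercase, spaces -> underscores."""
    return [c.strip().lower().replace(" ", "_") for c in columns]

def test_clean_column_names():
    assert clean_column_names([" Sale Price ", "Zip Code"]) == ["sale_price", "zip_code"]
```

pytest discovers any `test_*` function in files named `test_*.py`, so `pytest -q` picks this up automatically.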
- Pin dependency versions in `requirements.txt` for reproducibility.
- Where randomness affects results, set random seeds inside notebooks/scripts (e.g., `np.random.seed(42)`, `random.seed(42)`, and framework-specific seeds).
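A seed-setting cell at the top of a notebook might look like this (only the standard library is used here; NumPy and framework seeds are shown as comments since availability depends on your environment):

```python
import random

SEED = 42
random.seed(SEED)

# If NumPy or ML frameworks are in use, seed those too:
# import numpy as np; np.random.seed(SEED)
# import torch; torch.manual_seed(SEED)

# With the seed fixed, sampling is reproducible across runs:
sample_a = [random.randint(0, 99) for _ in range(5)]
random.seed(SEED)
sample_b = [random.randint(0, 99) for _ in range(5)]
assert sample_a == sample_b  # identical sequences after reseeding
```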
## Notebooks & Projects

Each notebook should include at minimum:
- Problem statement / objective
- Data source and short description
- Code cells split into logical steps (load, clean, explore, model, evaluate)
- Clear visualizations and conclusions
- Dependencies listed in the notebook metadata or a corresponding cell
Suggested naming convention: `notebooks/XX-brief-title.ipynb`, where `XX` is a two-digit ordering number, e.g. `01-exploratory-data-analysis.ipynb`.
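The convention can be checked mechanically; a small sketch using a regular expression (the pattern is one reasonable reading of the convention above):

```python
import re

# Two-digit prefix, lowercase hyphenated title, .ipynb extension
NOTEBOOK_NAME = re.compile(r"^\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.ipynb$")

def follows_convention(filename: str) -> bool:
    """True if filename matches the XX-brief-title.ipynb convention."""
    return NOTEBOOK_NAME.fullmatch(filename) is not None

# follows_convention("01-exploratory-data-analysis.ipynb") -> True
# follows_convention("notes.ipynb") -> False
```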
## Data

Small sample datasets can be placed in `data/`. For large datasets, prefer external links and provide instructions to download. Never commit large or sensitive datasets to the repository.
Example:

- `data/sample.csv` — small anonymized sample used for demos
- For external datasets, include a `DATA_SOURCES.md` (or a section in this README) listing download links and any required preprocessing steps.
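Notebooks can fail confusingly when expected files are absent, so a pre-flight check at the top of a notebook can help. A minimal sketch (the file names passed in are illustrative):

```python
from pathlib import Path

def missing_data_files(required, data_dir="data"):
    """Return the subset of required file names not present in data_dir."""
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]

# Usage at the top of a notebook (file names are illustrative):
# missing = missing_data_files(["sample.csv"])
# if missing:
#     raise FileNotFoundError(f"Download these into data/: {missing}")
```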
## Contributing

Contributions are welcome. Suggested workflow:
- Fork the repository.
- Create a feature branch: `git checkout -b feature/my-change`
- Make changes, and add tests if relevant.
- Commit and push: `git push origin feature/my-change`
- Open a pull request explaining the change.
Please follow a consistent style (PEP 8 for Python). Add or update documentation and notebooks as needed. If you add new dependencies, update `requirements.txt` or `environment.yml`.
Consider adding:

- `CONTRIBUTING.md`
- `CODE_OF_CONDUCT.md`
## License

This project is provided under the terms of the GPL-3.0 license. See the LICENSE file for details.
## Contact

Repository owner: ADVAIT135
If you find issues, please open an issue in this repository. For questions or suggestions, you can also open a discussion or reach out via GitHub.
Thank you for using the IBM-DATA-SCIENCE repository — happy data exploring and modeling!