IBM-DATA-SCIENCE

Certificate

A collection of notebooks, projects, examples, and resources for learning and applying data science concepts inspired by IBM's Data Science coursework and practical exercises. This repository is organized to make it easy to follow hands-on tutorials, reproduce experiments, and build small end-to-end projects.

Owner: ADVAIT135


Table of Contents

  • About
  • Repository structure
  • Getting started
  • Usage
  • Reproducing results
  • Notebooks & Projects
  • Data
  • Contributing
  • License
  • Contact

About

This repository is intended to host educational and practical content for data science workflows, including:

  • Exploratory data analysis (EDA)
  • Data cleaning and feature engineering
  • Statistical modeling and machine learning (scikit-learn, XGBoost, etc.)
  • Model evaluation and visualization
  • Short projects and capstone-style examples
  • Jupyter notebooks demonstrating concepts step-by-step

Use this repository for learning and experimentation, and as a starting point for small data science projects.
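The first two workflow steps above (EDA and cleaning) typically start with a few standard pandas calls. This is a minimal sketch using an in-memory stand-in for a real file such as `data/sample.csv`; the column names are illustrative only:

```python
import pandas as pd

# A tiny in-memory dataset stands in for a real CSV
# (in practice: df = pd.read_csv("data/sample.csv"))
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29],
    "income": [38000, 52000, 61000, None, 45000],
    "churned": [0, 1, 0, 1, 0],
})

# Typical first-look EDA steps
print(df.shape)         # (rows, columns)
print(df.dtypes)        # column types
print(df.isna().sum())  # missing values per column
print(df.describe())    # summary statistics for numeric columns

# Simple cleaning step: fill numeric gaps with each column's median
df = df.fillna(df.median(numeric_only=True))
```

After the `fillna`, the frame has no missing values and is ready for feature engineering or modeling.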


Repository structure

A typical layout (adapt to actual contents in this repo):

  • notebooks/ — Jupyter notebooks (.ipynb) for lessons, experiments, and demos
  • data/ — small sample datasets used by notebooks (not for large data)
  • src/ — reusable Python modules and helper scripts
  • reports/ — generated reports, figures, and export artifacts
  • requirements.txt — Python package requirements for pip installs
  • environment.yml — conda environment specification (optional)
  • README.md — this file
  • LICENSE — license for the repository (if present)
  • tests/ — unit / integration tests (optional)

If some of these files or folders are missing, create them as needed or update this README accordingly.


Getting started

Prerequisites

  • Python 3.8+ (recommended)
  • Git
  • Optional: Anaconda/Miniconda if you prefer conda environments
  • JupyterLab or Jupyter Notebook for interactive work

Install (pip)

  1. Clone the repo

    git clone https://github.com/ADVAIT135/IBM-DATA-SCIENCE.git
    cd IBM-DATA-SCIENCE
    
  2. Create a virtual environment and install dependencies

    python -m venv .venv
    source .venv/bin/activate   # macOS / Linux
    .venv\Scripts\activate      # Windows (PowerShell: .\.venv\Scripts\Activate.ps1)
    pip install --upgrade pip
    pip install -r requirements.txt
    
  3. Start Jupyter

    jupyter lab
    # or
    jupyter notebook
    

Install (conda)

If there is an environment.yml:

conda env create -f environment.yml
conda activate ibm-data-science
jupyter lab

Usage

Running notebooks

  • Open JupyterLab or Jupyter Notebook and navigate to notebooks/.
  • Execute cells top-to-bottom to reproduce analyses.
  • If a notebook requires data in data/, ensure files are present (see Data section).

Automated execution (headless):

# execute a notebook and write output notebook
jupyter nbconvert --to notebook --execute notebooks/example.ipynb --output notebooks/example-executed.ipynb

Running scripts & tests

  • Python scripts (utility modules) live in src/. Run with:

    python src/some_script.py
    
  • If tests exist (pytest):

    pip install -r requirements-dev.txt   # if provided
    pytest -q
    

Reproducing results

  • Pin dependency versions in requirements.txt for reproducibility.
  • Where randomness affects results, set random seeds inside notebooks/scripts (e.g., np.random.seed(42), random.seed(42), and framework-specific seeds).
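The seed calls listed above can be collected into one helper that a notebook runs in its first cell. This is a sketch for plain Python and NumPy; framework-specific seeds (e.g. for PyTorch or TensorFlow) would be added in the same place if those libraries are used:

```python
import random

import numpy as np


def set_seeds(seed: int = 42) -> None:
    """Seed Python's and NumPy's RNGs so repeated runs give identical results."""
    random.seed(seed)
    np.random.seed(seed)
    # Framework-specific seeds would go here, e.g.:
    #   torch.manual_seed(seed)
    #   tf.random.set_seed(seed)


set_seeds(42)
print(np.random.rand(3))  # identical output on every run with the same seed
```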

Notebooks & Projects

Each notebook should include at minimum:

  • Problem statement / objective
  • Data source and short description
  • Code cells split into logical steps (load, clean, explore, model, evaluate)
  • Clear visualizations and conclusions
  • Dependencies listed in the notebook metadata or a corresponding cell
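The load → clean → explore → model → evaluate steps above can be sketched end-to-end with scikit-learn. A bundled toy dataset stands in for a project's own data, and the random forest is just an illustrative baseline, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load: a bundled dataset stands in for a project's own data
X, y = load_iris(return_X_y=True)

# Split: hold out a test set for honest evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Model: fit a simple baseline classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate: report accuracy on the held-out set
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

In a real notebook, each of these steps would be its own cell (or group of cells) with a short markdown explanation above it.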

Suggested naming convention:

  • notebooks/XX-brief-title.ipynb where XX is a two-digit ordering number, e.g. 01-exploratory-data-analysis.ipynb

Data

Small sample datasets can be placed in data/. For large datasets, prefer external links and provide instructions to download. Never commit large or sensitive datasets to the repository.

Example:

  • data/sample.csv — small anonymized sample used for demos
  • For external datasets, include a DATA_SOURCES.md (or a section in this README) listing download links and any required preprocessing steps.
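For external datasets, the download instructions can also be automated with a small helper so notebooks fetch files on first run. This is a sketch using only the standard library; the URL in the comment is hypothetical:

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_dataset(url: str, dest: Path, sha256: Optional[str] = None) -> Path:
    """Download an external dataset into data/ if it is not already present.

    If a sha256 hex digest is given, verify the file against it so a
    corrupted or tampered download fails loudly.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    if sha256 is not None:
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest != sha256:
            raise ValueError(f"checksum mismatch for {dest}")
    return dest


# Example usage (hypothetical URL):
# fetch_dataset("https://example.com/large-dataset.csv",
#               Path("data/large-dataset.csv"))
```

Keeping the expected checksum in `DATA_SOURCES.md` alongside the download link makes the preprocessing steps reproducible as well.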

Contributing

Contributions are welcome. Suggested workflow:

  1. Fork the repository.
  2. Create a feature branch: git checkout -b feature/my-change
  3. Make changes, add tests if relevant.
  4. Commit and push: git push origin feature/my-change
  5. Open a pull request explaining the change.

Please follow a consistent style (PEP8 for Python). Add or update documentation and notebooks as needed. If you add new dependencies, update requirements.txt or environment.yml.

Consider adding:

  • CONTRIBUTING.md
  • CODE_OF_CONDUCT.md

License

This project is provided under the terms of the GPL-3.0 license. See the LICENSE file for details.

Contact

Repository owner: ADVAIT135

If you find issues, please open an issue in this repository. For questions or suggestions, you can also open a discussion or reach out via GitHub.


Thank you for using the IBM-DATA-SCIENCE repository — happy data exploring and modeling!

About

This repository contains all the assignments that were part of IBM's Data Science certification, offered and hosted on Coursera.
