rcwang2024/Notes_on_Python

Python for Data Science, Machine Learning, and Computational Biology

A comprehensive collection of 20 Jupyter notebooks covering Python programming from fundamentals to advanced topics in data science, machine learning, and computational biology.

📚 Complete Curriculum Structure

Part 1: Python Fundamentals (01-05)

Core Python programming concepts essential for scientific computing.

  • 01_Python_Basics_DataTypes.ipynb - Data types, variables, operators, strings, lists, tuples, dictionaries, sets
  • 02_Python_ControlFlow.ipynb - Conditional statements, loops, list comprehensions, generators, exception handling
  • 03_Python_Functions.ipynb - Function definition, parameters, scope, lambda functions, decorators, recursion
  • 04_Python_Classes_OOP.ipynb - Object-oriented programming, inheritance, encapsulation, polymorphism
  • 05_Python_Modules_FileIO.ipynb - Modules, packages, file I/O, directory operations, context managers
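As a rough taste of the Part 1 topics, the sketch below combines a decorator, a generator, exception handling, and a list comprehension using only the standard library. The names `count_calls`, `parse_numbers`, and `safe_mean` are illustrative assumptions and are not taken from the notebooks.

```python
from functools import wraps


def count_calls(func):
    """Decorator that records how many times the wrapped function is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def parse_numbers(tokens):
    """Generator that yields floats lazily, skipping non-numeric tokens."""
    for token in tokens:
        try:
            yield float(token)
        except ValueError:
            continue  # ignore malformed entries instead of aborting


@count_calls
def safe_mean(tokens):
    """Return the mean of the numeric tokens, or None if there are none."""
    values = list(parse_numbers(tokens))
    return sum(values) / len(values) if values else None


# List comprehension: squares of the even numbers below 10
even_squares = [n * n for n in range(10) if n % 2 == 0]
```

Calling `safe_mean(["1", "2", "x", "3"])` averages only the three numeric tokens, and `safe_mean.calls` then reports how often the function has been invoked.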

Part 2: Scientific Computing Core (06-08)

Essential libraries for numerical computing and data manipulation.

  • 06_NumPy_Fundamentals.ipynb - Arrays, broadcasting, linear algebra, random sampling, vectorization
  • 07_Pandas_Basics.ipynb - Series, DataFrames, data selection, cleaning, basic operations
  • 08_Pandas_Advanced.ipynb - GroupBy, merge/join, pivot tables, time series, multi-index

Part 3: Data Visualization (09)

Creating informative and publication-quality visualizations.

  • 09_Matplotlib_Seaborn_Visualization.ipynb - Matplotlib basics, Seaborn statistical plots, customization

Part 4: Scientific & Statistical Computing (10-11)

Advanced numerical methods and statistical analysis.

  • 10_SciPy_Scientific_Computing.ipynb - Statistics, optimization, interpolation, signal processing
  • 11_Statistical_Analysis.ipynb - Probability distributions, hypothesis testing, confidence intervals

Part 5: Machine Learning (12)

Comprehensive machine learning with scikit-learn.

  • 12_Scikit_Learn_MachineLearning.ipynb - Classification, regression, clustering, model selection, pipelines

Part 6: Specialized Topics (13-14)

Domain-specific applications and tools.

  • 13_Text_Processing_Regex.ipynb - Regular expressions, text cleaning, pattern matching
  • 14_Biopython_Computational_Biology.ipynb - Sequence analysis, file parsing, bioinformatics workflows
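To give a feel for how regular expressions and sequence analysis meet, here is a minimal sketch that uses only the standard library `re` module as a stand-in. It does not use Biopython, and the helper names (`clean_sequence`, `gc_content`, `find_motif`) are assumptions made for illustration rather than code from the notebooks.

```python
import re


def clean_sequence(raw):
    """Upper-case the text and drop everything that is not A, C, G, T, or N."""
    return re.sub(r"[^ACGTN]", "", raw.upper())


def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(sequence, pattern):
    """Return 0-based start positions of all non-overlapping regex matches."""
    return [m.start() for m in re.finditer(pattern, sequence.upper())]
```

For example, `find_motif(clean_sequence(raw_text), "ATG")` would list candidate start-codon positions in a sequence pasted from a file with line numbers and whitespace.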

Part 7: Professional Python (15-17) ⭐ NEW

Professional development and production-ready code.

  • 15_Python_Standard_Library.ipynb - collections, itertools, functools, datetime, pathlib
  • 16_Advanced_Data_IO.ipynb - Pickle, Parquet, HDF5, SQL databases, REST APIs
  • 17_Testing_Debugging.ipynb - pytest, unittest, pdb debugging, logging, best practices
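The standard-library modules named for notebook 15 can be sketched together in a few lines. This is only an illustration under assumed function names (`most_common_words`, `group_by_extension`, `fib`); the notebooks may organize the material differently.

```python
from collections import Counter, defaultdict
from functools import lru_cache
from itertools import combinations
from pathlib import PurePosixPath


def most_common_words(text, n=2):
    """collections.Counter: the n most frequent whitespace-separated words."""
    return Counter(text.lower().split()).most_common(n)


def group_by_extension(filenames):
    """defaultdict + pathlib: group file names by their suffix."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePosixPath(name).suffix].append(name)
    return dict(groups)


@lru_cache(maxsize=None)
def fib(n):
    """functools.lru_cache: memoized recursive Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# itertools.combinations: every unordered pair of samples
pairs = list(combinations(["A", "B", "C"], 2))
```

Functions this small are also convenient targets for a first `pytest` test file, since each returns a plain value that can be compared with `assert`.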

Part 8: Advanced Topics (18-20) ⭐ NEW

Cutting-edge tools and techniques for modern data science.

  • 18_Deep_Learning_Basics.ipynb - TensorFlow/Keras, neural networks, transfer learning
  • 19_Performance_Parallelization.ipynb - Profiling, multiprocessing, joblib, numba optimization
  • 20_Web_Scraping_APIs.ipynb - requests, BeautifulSoup, API integration, web data collection

🎯 Learning Paths

For Complete Beginners

Start with Part 1 (01-05) to build a solid Python foundation, then progress through Parts 2-4.

For Data Scientists

Focus on Parts 2-4 (06-11) for data manipulation, visualization, and statistics. Add Part 7 (15-17) for professional skills.

For Machine Learning Practitioners

Review Part 2 (06-08) for data preprocessing, Part 5 (12) for classical ML, then Part 8 (18-20) for deep learning, performance optimization, and web data collection.

For Computational Biologists

Complete Parts 1-2 (01-08) for fundamentals, then focus on Part 6 (13-14) for specialized bioinformatics tools.

Full Professional Track

Complete all 20 notebooks in order for comprehensive coverage: 01 → 20
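The learning paths above can also be written down as plain data, which may help when scripting a study plan. The dictionary keys and the helper `notebook_prefixes` are hypothetical names chosen for this sketch; the notebook numbers are taken from the path descriptions above.

```python
# Notebook numbers per learning path, in suggested study order.
LEARNING_PATHS = {
    "complete_beginner": list(range(1, 12)),               # Part 1, then Parts 2-4
    "data_scientist": list(range(6, 12)) + [15, 16, 17],   # Parts 2-4, plus Part 7
    "ml_practitioner": [6, 7, 8, 12, 18, 19, 20],          # Parts 2, 5, and 8
    "computational_biologist": list(range(1, 9)) + [13, 14],  # Parts 1-2, then Part 6
    "full_professional": list(range(1, 21)),               # all 20 notebooks
}


def notebook_prefixes(path_name):
    """Return the two-digit file name prefixes for a path, in study order."""
    return [f"{n:02d}" for n in LEARNING_PATHS[path_name]]
```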

💡 Notebook Features

  • Theory + Practice: Each notebook combines conceptual explanations with hands-on examples
  • Progressive Complexity: Topics build from basic to advanced concepts
  • Real-world Examples: Practical workflows and complete analysis pipelines
  • Well-commented Code: Clear explanations using ###++++++++++ section markers
  • Interactive Output: All cells configured to display results
  • Production-Ready: Professional best practices throughout

🚀 Getting Started

Prerequisites

# Create conda environment
conda create -n python_ds python=3.9
conda activate python_ds

# Install core packages
conda install numpy pandas matplotlib seaborn scipy scikit-learn jupyterlab

# Install additional packages
conda install -c conda-forge biopython h5py pyarrow
pip install pytest requests beautifulsoup4 tensorflow joblib numba

Running Notebooks

# Start Jupyter Lab
jupyter lab

# Or Jupyter Notebook
jupyter notebook

📖 Usage Tips

  1. Follow the order: Notebooks are numbered for progressive learning
  2. Run all cells: Execute cells sequentially to understand the flow
  3. Experiment: Modify examples to deepen understanding
  4. Use as reference: Return to notebooks for syntax and method references
  5. Check NOTEBOOK_INDEX.md: Quick reference for all topics
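Because the notebooks are numbered, their study order can be recovered from the file names alone. The following sketch assumes the `NN_Name.ipynb` naming scheme shown in the curriculum list; the function names are illustrative and not part of the repository.

```python
import re
from pathlib import Path

# Two-digit prefix, underscore, anything, then the .ipynb extension
NOTEBOOK_PATTERN = re.compile(r"^(\d{2})_.+\.ipynb$")


def ordered_notebooks(names):
    """Return notebook file names sorted by their two-digit numeric prefix."""
    matched = []
    for name in names:
        m = NOTEBOOK_PATTERN.match(name)
        if m:  # skip README.md and other files without a numeric prefix
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


def notebooks_in(directory="."):
    """List the numbered notebooks found directly inside a directory."""
    return ordered_notebooks(p.name for p in Path(directory).glob("*.ipynb"))
```

Running `notebooks_in()` from the repository root should list the notebooks from 01 upward, which is a quick way to confirm that a clone is complete.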

🔧 Package Versions

Notebooks are compatible with:

  • Python 3.8+
  • NumPy 1.20+
  • Pandas 1.3+
  • Matplotlib 3.3+
  • Seaborn 0.11+
  • SciPy 1.7+
  • Scikit-learn 1.0+
  • TensorFlow 2.8+
  • Biopython 1.79+
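One way to check an environment against the minimums listed above is to query installed versions with the standard library's `importlib.metadata`. This is a minimal sketch: the simplified version parser and the names `MINIMUM_VERSIONS`, `meets_minimum`, and `check_environment` are assumptions for illustration, and pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above, keyed by distribution name.
MINIMUM_VERSIONS = {
    "numpy": "1.20",
    "pandas": "1.3",
    "matplotlib": "3.3",
    "seaborn": "0.11",
    "scipy": "1.7",
    "scikit-learn": "1.0",
    "tensorflow": "2.8",
    "biopython": "1.79",
}


def parse_version(text):
    """Turn '1.20.3' or '2.8.0rc1' into a tuple of leading integer components."""
    parts = []
    for piece in text.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUM_VERSIONS):
    """Map each package to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report
```

Comparing integer tuples rather than strings matters here: as strings, `"1.3"` would sort after `"1.20"`, even though Pandas 1.3 is older than a hypothetical 1.20 release.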

📊 Curriculum Statistics

| Metric | Value |
|---|---|
| Total Notebooks | 20 |
| Total Parts | 8 |
| Code Cells | ~450+ |
| Topics Covered | 100+ |
| Estimated Time | 50-70 hours |

🎓 What You'll Learn

Python Programming

✅ Core syntax, data structures, control flow
✅ Functions, decorators, generators
✅ Object-oriented programming
✅ Standard library mastery

Data Science

✅ NumPy for numerical computing
✅ Pandas for data manipulation
✅ Statistical analysis and hypothesis testing
✅ Data visualization

Machine Learning

✅ Classical ML algorithms (scikit-learn)
✅ Deep learning basics (TensorFlow/Keras)
✅ Model selection and evaluation
✅ Production pipelines

Professional Skills

✅ Testing and debugging
✅ Performance optimization
✅ Database and API integration
✅ Web scraping and data collection

Specialized Topics

✅ Bioinformatics with Biopython
✅ Text processing and NLP basics
✅ Time series analysis
✅ Parallel processing

📝 Documentation

  • README.md (this file) - Complete curriculum overview
  • NOTEBOOK_INDEX.md - Quick reference guide with topic index
  • COVERAGE_ANALYSIS.md - What's covered and why
  • MIGRATION_GUIDE.md - Change history and organization

🤝 Contributing

Feel free to:

  • Report issues or errors
  • Suggest improvements
  • Add new examples
  • Request additional topics

📄 License

These educational materials are provided for learning purposes.

🌟 Highlights

This curriculum covers:

  • Complete Python fundamentals from scratch
  • Industry-standard tools (NumPy, Pandas, scikit-learn, TensorFlow)
  • Professional practices (testing, debugging, optimization)
  • Real-world applications (APIs, databases, web scraping)
  • Specialized domains (bioinformatics, NLP, deep learning)

🎯 Assessment

After completing this curriculum, you will be able to:

  • Write production-quality Python code
  • Perform comprehensive data analysis
  • Build and deploy machine learning models
  • Work with various data sources (SQL, APIs, files)
  • Optimize code for performance
  • Debug and test professional applications
  • Apply Python to specialized domains

Current Version: 2.0 (20 notebooks - Complete Professional Edition)

Last Updated: 2025-10-06

Status: ✅ Complete comprehensive curriculum


Happy Learning! 🎓

For questions or feedback, please open an issue in this repository.

About

Project/Learning materials by Ruichao Wang
