Deep Learning Crash Course

Early Access - Use Code PREORDER for 25% Off
by Benjamin Midtvedt, Jesús Pineda, Henrik Klein Moberg, Harshith Bachimanchi, Joana B. Pereira, Carlo Manzo, Giovanni Volpe
No Starch Press, San Francisco (CA), 2025
ISBN-13: 9781718503922
https://nostarch.com/deep-learning-crash-course


  1. Dense Neural Networks for Classification

  2. Dense Neural Networks for Regression

  3. Convolutional Neural Networks for Image Analysis

  4. Encoders–Decoders for Latent Space Manipulation

  5. U-Nets for Image Transformation

  6. Self-Supervised Learning to Exploit Symmetries

  7. Recurrent Neural Networks for Timeseries Analysis

  8. Attention and Transformers for Sequence Processing
    Introduces attention mechanisms, transformer models, and vision transformers (ViT), with applications to natural language processing (NLP), including improved text translation and sentiment analysis, as well as to image classification.

  • Code 8-1: Understanding Attention
    Builds attention modules from scratch (dot-product, trainable dot-product, additive) and applies them to toy examples, visualizing attention maps that show which tokens focus on which other tokens. It illustrates how pre-trained embeddings (GloVe) can highlight semantic relationships (like she-her) even without fine-tuning. It clarifies the difference between non-learnable and learnable key/value embeddings.
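    The dot-product variant described above can be sketched in a few lines of NumPy (a minimal illustration under our own naming, not the book's listing):

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def dot_product_attention(queries, keys, values):
        # scores[i, j]: unnormalized affinity of query token i for key token j
        scores = queries @ keys.T / np.sqrt(keys.shape[-1])
        weights = softmax(scores)            # attention map; each row sums to 1
        return weights @ values, weights     # weighted mixture of value vectors

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 8))         # 4 toy token embeddings, dimension 8
    out, attn = dot_product_attention(tokens, tokens, tokens)  # self-attention
    ```

    Visualizing `attn` as a heatmap gives exactly the kind of attention map discussed above: row i shows how strongly token i focuses on every other token.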

  • Code 8-A: Translating with Attention
    Improves the seq2seq model from Chapter 7 with dot-product cross-attention to focus on the most relevant parts of source sentences during translation. It demonstrates how attention helps align multi-word phrases and resolves ambiguities. The model surpasses the earlier RNN-based approach by dynamically highlighting crucial source tokens, proving "attention is all you need" for better translations.
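    The cross-attention step can be sketched as a single decoder query attending over the encoder outputs (a simplified NumPy illustration with our own names and dimensions, not the book's seq2seq code):

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def cross_attention(decoder_state, encoder_outputs):
        # decoder_state: (d,) current decoder hidden state (the query)
        # encoder_outputs: (src_len, d) one vector per source token (keys/values)
        scores = encoder_outputs @ decoder_state / np.sqrt(decoder_state.size)
        weights = softmax(scores)             # alignment over the source tokens
        context = weights @ encoder_outputs   # weighted summary of the source
        return context, weights

    rng = np.random.default_rng(1)
    enc = rng.normal(size=(6, 16))   # 6 source-token encodings, hidden size 16
    dec = rng.normal(size=16)        # decoder state at the current target step
    context, weights = cross_attention(dec, enc)
    ```

    At each decoding step, `weights` re-ranks the source tokens, which is what lets the model align multi-word phrases instead of relying on a single fixed context vector.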

  • Code 8-B: Performing Sentiment Analysis with a Transformer
    Implements an encoder-only Transformer using multi-head self-attention and feedforward layers to classify the sentiment of IMDB reviews as positive or negative. Details the entire pipeline: tokenizing, building a vocabulary, batching sequences with padding masks, stacking multiple Transformer blocks, and adding a dense classification head for the binary prediction. The approach yields strong sentiment prediction accuracy, highlighting the parallel processing benefits of Transformers over RNN-based models.
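    A single encoder block of the kind stacked here can be sketched in NumPy (layer normalization, dropout, and masking are omitted for brevity; all names and sizes are illustrative, not the book's):

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
        dh = x.shape[-1] // n_heads          # per-head dimension
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        heads = []
        for h in range(n_heads):
            sl = slice(h * dh, (h + 1) * dh)
            scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh)  # all tokens in parallel
            heads.append(softmax(scores) @ v[:, sl])
        return np.concatenate(heads, axis=-1) @ Wo        # merge the heads

    def transformer_block(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2, n_heads=2):
        # residual connection around multi-head self-attention
        x = x + multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads)
        # residual connection around the position-wise feedforward network
        return x + (np.maximum(x @ W1 + b1, 0) @ W2 + b2)

    rng = np.random.default_rng(0)
    d, d_ff, seq_len = 16, 32, 5
    Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
    W1, b1 = rng.normal(size=(d, d_ff)) * 0.1, np.zeros(d_ff)
    W2, b2 = rng.normal(size=(d_ff, d)) * 0.1, np.zeros(d)
    tokens = rng.normal(size=(seq_len, d))   # embedded review tokens
    out = transformer_block(tokens, Wq, Wk, Wv, Wo, W1, b1, W2, b2)
    ```

    Because every token's scores are computed in one matrix product, the whole sequence is processed in parallel, which is the speed advantage over RNNs noted above.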

  • Code 8-C: Classifying Images with a Vision Transformer
    Shows how Transformers can replace convolutions for image tasks by splitting images into patch embeddings. The ViT model is trained from scratch on CIFAR-10, using CutMix to compensate for its weaker inductive biases compared to CNNs. It achieves notable performance, and excels further when a pretrained backbone is fine-tuned instead of training from scratch. This underscores ViT's flexibility and potential to rival or outperform CNNs on visual data.
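    The patch-embedding step that turns an image into a token sequence can be sketched as follows (a NumPy illustration with CIFAR-10-sized dimensions; the function and variable names are our own, not the book's):

    ```python
    import numpy as np

    def patchify(image, patch):
        # split an (H, W, C) image into non-overlapping patch x patch tiles,
        # each flattened into a single vector
        H, W, C = image.shape
        tiles = image.reshape(H // patch, patch, W // patch, patch, C)
        tiles = tiles.transpose(0, 2, 1, 3, 4)        # group tiles by position
        return tiles.reshape(-1, patch * patch * C)   # (num_patches, patch_dim)

    rng = np.random.default_rng(0)
    image = rng.normal(size=(32, 32, 3))      # one CIFAR-10-sized image
    patches = patchify(image, 4)              # 64 patches, each 4*4*3 = 48 values
    W_embed = rng.normal(size=(48, 128)) * 0.02   # learnable linear projection
    tokens = patches @ W_embed                # (64, 128) patch embeddings
    ```

    After adding position embeddings and a class token, these 64 vectors are fed to a standard Transformer encoder exactly like word embeddings in the text models above.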

  9. Generative Adversarial Networks for Image Synthesis

  10. Diffusion Models for Data Representation and Exploration

  11. Graph Neural Networks for Relational Data Analysis

  12. Active Learning for Continuous Learning

  13. Reinforcement Learning for Strategy Optimization

  14. Reservoir Computing for Predicting Chaos