A Convolutional Neural Network (CNN) analyzes images by learning filters that detect features like edges and textures. These filters build increasingly complex patterns layer by layer, enabling CNNs to recognize objects and understand spatial relationships, making them ideal for image-related tasks.
A CNN is preferable to a Fully Connected Neural Network when working with image data or other spatially structured data. This preference stems from several key advantages:
Parameter Efficiency: CNNs significantly reduce the number of parameters by sharing weights across spatial dimensions, making them more efficient and less prone to overfitting, especially in large-scale image processing tasks.
Feature Learning: CNNs automatically learn hierarchical feature representations from raw data, starting from low-level features like edges to high-level features like objects.
Translational Invariance: Through techniques like pooling and convolution, CNNs can recognize patterns regardless of their position in the input, making them highly effective for tasks like object detection and image classification.
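The parameter-efficiency advantage can be made concrete with a back-of-the-envelope comparison. The layer sizes below (a 224×224 RGB input, 1000 hidden units, 64 filters of size 3×3) are illustrative assumptions, not values from the text:

```python
# Fully connected: every input pixel connects to every hidden unit.
h, w, c = 224, 224, 3          # illustrative input: 224x224 RGB image
hidden_units = 1000
fc_params = (h * w * c) * hidden_units + hidden_units   # weights + biases

# Convolutional: 64 filters of size 3x3x3, shared across all spatial positions.
filters, k = 64, 3
conv_params = filters * (k * k * c) + filters           # weights + biases

print(f"fully connected: {fc_params:,}")   # 150,529,000
print(f"convolutional:   {conv_params:,}") # 1,792
```

Because the same small set of filter weights is reused at every spatial position, the convolutional layer needs orders of magnitude fewer parameters than the fully connected one.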
A filter, or kernel, is a small matrix of weights in a CNN that detects specific characteristics within the input data, such as edges, textures, or patterns. During the convolution operation, the filter slides across the input (e.g., an image), performing element-wise multiplication and summing the results. Each filter learns to capture a distinct feature of the image. Figure 3 illustrates how six different filters extract distinct characteristics from the same image.
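The slide-multiply-sum operation can be sketched in a few lines of plain Python. The vertical-edge filter below is a common illustrative choice, not one from the text, and the loop implements cross-correlation, which is what most deep-learning libraries actually compute under the name "convolution":

```python
def convolve2d(image, kernel):
    """Valid 2D convolution (cross-correlation): slide the kernel over the
    image, multiply element-wise, and sum each window."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter responds where intensity changes left to right:
img = [[0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 0, 1, 1]]
edge = [[-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]]
print(convolve2d(img, edge))  # → [[0, 3, 3], [0, 3, 3]]
```

The output is large exactly where the filter's window straddles the dark-to-bright boundary, which is how a single filter localizes one feature across the whole image.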
When an image is passed through a convolutional layer, the spatial dimensions of the output are determined by:

O = floor((I - K + 2P) / S) + 1

where:
- I: Input size (height or width)
- K: Kernel (filter) size
- P: Padding (number of pixels added to each side of the input)
- S: Stride (step size of the kernel)
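These quantities can be checked numerically with the standard output-size relation O = floor((I − K + 2P) / S) + 1; the 224×224 input size below is an illustrative assumption:

```python
def conv_output_size(i, k, p=0, s=1):
    """Spatial output size of a convolution: floor((I - K + 2P) / S) + 1."""
    return (i - k + 2 * p) // s + 1

# A 224x224 input with a 3x3 kernel, padding 1, stride 1 keeps its size:
print(conv_output_size(224, 3, p=1, s=1))  # → 224
# Stride 2 roughly halves each spatial dimension:
print(conv_output_size(224, 3, p=1, s=2))  # → 112
```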
Stride refers to the step size with which the convolution filter moves across the input data. A stride of 1 means the filter moves one pixel at a time, while a stride of 2 or more means the filter skips pixels as it moves. Larger strides reduce the spatial dimensions of the output but may result in a loss of information.
Padding involves adding extra pixels around the input data to control the spatial dimensions of the output. This is often done to preserve the original size of the input after convolution.
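Zero padding, the most common choice, is simply a border of zeros around the input. A minimal sketch (the 2×2 input is an illustrative assumption):

```python
def zero_pad(image, p):
    """Add p rows and columns of zeros around a 2D image."""
    w = len(image[0]) + 2 * p
    padded = [[0] * w for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * w for _ in range(p)]
    return padded

img = [[1, 2],
       [3, 4]]
print(zero_pad(img, 1))
# → [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```

With a 3×3 kernel and stride 1, a padding of 1 keeps this 2×2 input at 2×2 after convolution, which is the "preserve the original size" behavior described above.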
Pooling is a downsampling operation that reduces the spatial dimensions of the input volume, thereby decreasing the computational load and helping to achieve spatial invariance.
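Max pooling, the most widely used variant, can be sketched as taking the maximum over each window. The 4×4 feature map below is an illustrative assumption:

```python
def max_pool(image, size=2, stride=2):
    """Max pooling: take the maximum over each size x size window,
    moving by `stride` pixels between windows."""
    out = []
    for i in range(0, len(image) - size + 1, stride):
        row = []
        for j in range(0, len(image[0]) - size + 1, stride):
            row.append(max(image[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 4, 3, 8]]
print(max_pool(feature_map))  # → [[6, 4], [7, 9]]
```

Each 2×2 block collapses to its strongest activation, halving both spatial dimensions while keeping the most salient response in each region; this is also why small translations of a feature leave the pooled output unchanged.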






