Summary of the concepts of Filters, Strides and Padding

Jan 13, 2023

By Dr Mabrouka Abuhmida

In machine learning, a filter (also called a kernel) is a small matrix of weights that transforms the input data: at each position, the filter is multiplied element-wise with the patch of input it covers, the products are summed (a dot product), and a bias term is added. Filters are typically used in convolutional neural networks (CNNs), a type of neural network architecture particularly well-suited for image classification tasks.
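As a small illustration of that dot product, the sketch below applies one filter to one 3x3 patch. The patch values, the weights, and the bias are made up for the example and are not taken from any trained network.

```python
import numpy as np

# A 3x3 patch of the input and a 3x3 filter (values chosen only for illustration)
patch = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
weights = np.array([[0.0, 1.0, 0.0],
                    [1.0, -4.0, 1.0],
                    [0.0, 1.0, 0.0]])
bias = 0.5

# Element-wise product, summed over the window, plus the bias
response = float(np.sum(patch * weights) + bias)
print(response)
```

A convolutional layer repeats this single computation at every position the filter visits.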

The stride is the number of pixels the filter moves between successive applications to the input data. For example, if the input data is a matrix of size 28x28 and the filter is 3x3, a stride of 1 means that the filter is applied at every position where it fits entirely inside the input, resulting in an output matrix that is 26x26. A stride of 2 means that the filter moves two pixels at a time, skipping every other position, resulting in an output matrix of 13x13.

Here is an example of how to compute the output size of a convolutional layer given the input size, filter size, and stride:

Input size: H x W

Filter size: F

Stride: S

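Output size: (floor((H – F) / S) + 1) x (floor((W – F) / S) + 1)

Here floor rounds down, so a final position where the filter would not fit entirely inside the input is dropped. For example, if the input size is 28x28, the filter size is 3x3, and the stride is 1, the output size would be ((28 – 3) / 1 + 1) x ((28 – 3) / 1 + 1) = 26x26. With a stride of 2, it would be (floor(25 / 2) + 1) x (floor(25 / 2) + 1) = 13x13.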
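The same calculation can be sketched as a small helper. The function name conv_output_size is hypothetical, and the sketch uses the floor-based form of the formula, which stays an integer even when the stride does not divide evenly.

```python
def conv_output_size(n, f, s=1):
    """Output length along one dimension for input n, filter f, stride s, no padding."""
    if f > n:
        raise ValueError("filter is larger than the input")
    # Integer division rounds down, dropping positions where the filter would not fit
    return (n - f) // s + 1

for stride in (1, 2, 3):
    size = conv_output_size(28, 3, stride)
    print(f"stride {stride}: {size}x{size}")
```

For a non-square input, the helper would simply be called once for the height and once for the width.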

Here is a link to the code on GitHub

Convolutional layers with filters and strides are a key component of CNNs. Filters allow a CNN to automatically learn spatial hierarchies of features from the input data, which can be very useful for tasks such as object recognition in images.

One of the main benefits of using filters and strides in a CNN is that they allow the network to learn local patterns in the data while shrinking the output passed on to later layers: each filter looks only at a small window of the input, and a stride greater than 1 reduces the number of positions at which it is evaluated. This can be particularly useful for tasks where the input data is large and high-dimensional, as it can help reduce the amount of computation required and improve the model's efficiency.
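To make the saving concrete, here is a rough count of how many times a single 3x3 filter is evaluated on a 28x28 single-channel input. The helper window_positions is a hypothetical name, and the count ignores padding, bias additions, and multiple channels.

```python
def window_positions(height, width, f, s):
    """Number of positions at which an f x f filter is evaluated (no padding)."""
    rows = (height - f) // s + 1
    cols = (width - f) // s + 1
    return rows * cols

for s in (1, 2):
    positions = window_positions(28, 28, 3, s)
    # Each position costs f * f multiply-adds for a single-channel input
    print(f"stride {s}: {positions} positions, {positions * 9} multiply-adds")
```

Under these assumptions, doubling the stride cuts the number of filter evaluations to roughly a quarter.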

Filters and strides are best used in tasks where the input data has some form of spatial structure, such as images or audio signals. They are particularly useful for tasks where the local patterns in the data are important for making predictions, as the filters can learn to identify these patterns and use them to classify the input data.

Spatial structure refers to how the elements of a data set are arranged in space (or, for signals, in time). In machine learning, spatial structure is often present in data with a one-, two-, or three-dimensional layout, such as audio signals, images, and videos.

For example, in an image, the pixels are arranged in a two-dimensional grid, and the relationships between neighboring pixels can be used to extract useful features for image classification. Similarly, in an audio signal, the samples are arranged in a one-dimensional sequence, and the relationships between neighboring samples can be used to extract features for tasks such as speech recognition.
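The one-dimensional case can be sketched with NumPy's np.correlate, which slides a filter along a sequence. The toy signal and the two-tap difference filter below are invented for illustration; real audio features are learned rather than hand-written.

```python
import numpy as np

# A toy one-dimensional "signal": silence, a burst, silence
signal = np.array([0, 0, 1, 1, 1, 0, 0])

# A length-2 filter that responds to changes between neighboring samples
kernel = np.array([-1, 1])

# mode="valid" slides the filter only where it fully overlaps the signal
response = np.correlate(signal, kernel, mode="valid")
print(response.tolist())
```

The nonzero responses mark where the burst starts and ends, which is the kind of local pattern a learned filter can pick up.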

Other examples of data sets with spatial structure include maps, satellite imagery, and medical images such as MRI scans. Maps and satellite images are arranged on a two-dimensional grid, while an MRI scan is a three-dimensional volume; in both cases, the relationships between neighboring elements can be used to extract useful features for tasks such as image segmentation and object recognition.

Padding is a technique used in convolutional neural networks (CNNs) to control the size of the output produced by a convolutional layer. It involves adding extra pixels, or padding, around the edges of the input data before applying the filters.

There are two main types of padding: valid padding and same padding. Valid padding means that no padding is added to the input data, so the output is smaller than the input whenever the filter is larger than 1x1. Same padding means that enough padding is added for the output size to equal the input size (for a stride of 1).

The total amount of padding to add along each dimension can be calculated using the following formula:

Padding = ((Output size – 1) x Stride) + Filter size – Input size

For example, if the input size is 28x28, the filter size is 3x3, the stride is 1, and we want to use same padding to produce an output of the same size, we can calculate the amount of padding as follows:

Padding = ((28 – 1) x 1) + 3 – 28 = 2

This is the total padding along each dimension, so we need to add 1 pixel of padding on each side of the input data, resulting in a padded input of size 30x30. The filter can then be applied to the padded input data, resulting in an output of 28x28.
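The padding formula can be sketched as a helper that treats its result as the total padding along one dimension and then splits it between the two sides. The name same_padding is hypothetical, and two details are assumptions of this sketch: the target output length is taken to be ceil(input / stride), and any odd leftover pixel goes on the trailing side.

```python
import math

def same_padding(n, f, s=1):
    """Return (total, before, after) padding along one dimension.

    Assumes the target output length is ceil(n / s), which equals n when s == 1.
    """
    out = math.ceil(n / s)
    total = max((out - 1) * s + f - n, 0)
    before = total // 2
    after = total - before  # any odd pixel goes on the trailing side (a convention)
    return total, before, after

print(same_padding(28, 3, 1))
print(same_padding(28, 5, 1))
```

For the 28x28 input and 3x3 filter, the total of 2 splits into 1 pixel before and 1 pixel after, so the padded input is 30x30.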

Padding is often used in CNNs to preserve the spatial dimensions of the input data, as it allows the filters to be applied to the edges of the input data without reducing the output size. It can also be used to control the output size to match the dimensions of other layers in the network or to ensure that the output has the desired size for a particular task.

Here is an example of how you can use padding and a stride of 1 with a 3x3 filter in Python:

import numpy as np

# Input data with shape (1, 28, 28): a batch of one 28x28 image
input_data = np.random.rand(1, 28, 28)

# 3x3 filter
filter_data = np.random.rand(3, 3)

# Total padding for a same-size output with stride 1: (28 - 1) * 1 + 3 - 28 = 2,
# i.e. zero padding of size 1 on each side of the height and width axes
padding = ((28 - 1) * 1 + 3 - 28) // 2
padded_input = np.pad(input_data, ((0, 0), (padding, padding), (padding, padding)), 'constant')

# Output with shape (1, 28, 28)
output = np.zeros((1, 28, 28))

# Loop over the padded input data and apply the filter
for i in range(28):
    for j in range(28):
        # Multiply the 3x3 window by the filter and sum over height and width
        output[:, i, j] = np.sum(padded_input[:, i:i+3, j:j+3] * filter_data, axis=(1, 2))

print(output)

This code first creates a 3x3 filter and an input array of size (1, 28, 28). It then adds a padding of size 1 on each side of the input data using the np.pad function. The output array is initialized to zeros, and a loop is used to apply the filter to each position in the padded input data, resulting in an output of size (1, 28, 28).

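This example fixes the stride at 1 and the filter size at 3x3. To use a different filter size, change the shape of filter_data and the window size in the loop; to use a different stride, step the window by the stride and shrink the output array accordingly. You can control the amount of padding by adjusting the pad widths passed to the np.pad function.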
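One way to make the stride and padding adjustable is to wrap the loop in a function. The sketch below is a minimal, unoptimized version for a single two-dimensional image; the name conv2d and its stride and pad parameters are assumptions of this example, not part of NumPy.

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """Slide `kernel` over a zero-padded 2-D `image` with the given stride."""
    padded = np.pad(image, pad, mode="constant")
    f_h, f_w = kernel.shape
    # Output size follows floor((padded size - filter size) / stride) + 1
    out_h = (padded.shape[0] - f_h) // stride + 1
    out_w = (padded.shape[1] - f_w) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = padded[top:top + f_h, left:left + f_w]
            output[i, j] = np.sum(window * kernel)
    return output

image = np.ones((28, 28))
kernel = np.ones((3, 3))
print(conv2d(image, kernel, stride=1, pad=1).shape)
print(conv2d(image, kernel, stride=2, pad=0).shape)
```

With an all-ones image and filter, each output value counts how many real (unpadded) pixels the window covered, which makes the effect of zero padding at the corners easy to inspect.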

Padding is also built into well-known APIs such as Keras, a popular deep-learning library for Python. Here is an example of how you can use padding and a stride of 1 with a 3x3 filter in Keras:

import tensorflow as tf
import numpy as np

# Symbolic input for 28x28 single-channel images (the batch dimension is left unspecified)
inputs = tf.keras.layers.Input(shape=(28, 28, 1))

# One 3x3 filter with a stride of 1 and same padding
x = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3), strides=1, padding='same')(inputs)

# Create the model
model = tf.keras.models.Model(inputs=inputs, outputs=x)

# Generate some random input data with shape (1, 28, 28, 1)
input_data = np.random.rand(1, 28, 28, 1).astype(np.float32)

# Apply the model to the input data
output = model(input_data)

print(output)

This code defines an input layer for 28x28 single-channel images (the batch dimension is left unspecified) and a convolutional layer with a single 3x3 filter and a stride of 1. The padding='same' argument specifies that same padding should be used so that the output will have the same size as the input. The model is then applied to random input data of shape (1, 28, 28, 1), and the output is printed to the console.

You can adjust the stride and filter size as needed by changing the values of the strides and kernel_size arguments, and you can switch between padding modes by changing the value of the padding argument.

In Keras, the padding argument controls whether padding is added to the input data before applying the filters in a convolutional layer. The padding argument has two main options: ‘valid’ and ‘same’.

‘valid’ padding means that no padding is added to the input data, so the output size is smaller than the input size. This can be useful if you want to reduce the output size or if you are trying to reduce the complexity of the model.

‘same’ padding means that enough padding is added for the output size to equal the input size when the stride is 1; with a larger stride, the output size is the input size divided by the stride, rounded up. This can be useful if you want to preserve the spatial dimensions of the input data, as it allows the filters to be applied to the edges of the input data without reducing the size of the output.

For example, if you have input data of size (1, 28, 28, 1) and a 3x3 filter with a stride of 1, using ‘valid’ padding would result in an output of size (1, 26, 26, 1), while using ‘same’ padding would result in an output of size (1, 28, 28, 1).
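The two shapes above can be checked without running a model. The helper below is a hypothetical pure-Python function, not a Keras API; it assumes the usual TensorFlow convention for output lengths, ceil((n - f + 1) / s) for 'valid' and ceil(n / s) for 'same'.

```python
import math

def output_size(n, f, s, padding):
    """Output length along one dimension under 'valid' or 'same' padding."""
    if padding == "valid":
        return math.ceil((n - f + 1) / s)
    if padding == "same":
        return math.ceil(n / s)
    raise ValueError("padding must be 'valid' or 'same'")

for mode in ("valid", "same"):
    size = output_size(28, 3, 1, mode)
    print(f"{mode}: (1, {size}, {size}, 1)")
```

Trying a stride of 2 in the same helper shows that 'same' no longer preserves the input size once the stride exceeds 1.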

Here is an example of how to use ‘valid’ padding in Keras:

import tensorflow as tf

# Symbolic input for 28x28 single-channel images (batch dimension unspecified)
input_data = tf.keras.layers.Input(shape=(28, 28, 1))

# 3x3 filter with valid padding: output shape (None, 26, 26, 1)
x = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3), strides=1, padding='valid')(input_data)

And here is an example of how to use ‘same’ padding:


import tensorflow as tf

# Symbolic input for 28x28 single-channel images (batch dimension unspecified)
input_data = tf.keras.layers.Input(shape=(28, 28, 1))

# 3x3 filter with same padding: output shape (None, 28, 28, 1)
x = tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3), strides=1, padding='same')(input_data)

