Difference between "kernel" and "filter" in CNNs

37

What is the difference between the terms "kernel" and "filter" in the context of convolutional neural networks?

ryguy
5
They are the same! A filter, or kernel, is simply a group of weights shared across the entire input space.
yasin.yazici

Answers:

42

In the context of convolutional neural networks, kernel = filter = feature detector.


Here is a great illustration from Stanford's deep learning tutorial (also nicely explained by Denny Britz).

[Image: a yellow 3×3 filter sliding over the input image to produce the convolved feature]

The filter is the yellow sliding window, and its value is:

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$
Franck Dernoncourt
2
Is there a specific name for the filter? Like a "Blur" or an "Edge detect"
Pratik Khadloya
2
@PratikKhadloya no, those names are given subjectively by researchers based on what the filter appears to be detecting.
shadowtalker
1
@Franck Dernoncourt According to this link, towardsdatascience.com/…, they differ when there are several channels. With multiple channels, each filter is actually a collection of kernels, one kernel for every input channel to the layer, and each kernel is unique. I would be glad to hear your thoughts.
sariii
6

A feature detector is the same as a filter or "kernel" in this context; the feature map is the output it produces. The weights of the filter determine which features are detected.

For example, Franck has provided a great visual. Notice that his filter/feature detector has weights of 1 (shown as x1) along both diagonals and weights of 0 (shown as x0) everywhere else. This weighting therefore responds to pixels with a value of 1 that lie along the diagonals of each 3x3 section of the image.

Observe that the resulting convolved feature shows values of 4 where the image has 1s at the filter's diagonal positions (that is, where that 3x3 section of the image matches the filter's pattern), and lower values of 2 where the section matches the filter less strongly.
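
To make the arithmetic concrete, here is a minimal NumPy sketch of the sliding-window operation. The 5x5 binary input is an assumption chosen to resemble the illustration, and `correlate2d_valid` is a hypothetical helper name, not a library function. Like most CNN libraries, it slides the filter without flipping it (cross-correlation); for this symmetric filter the result equals a true convolution.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Assumed 5x5 binary input (illustrative, chosen to resemble the figure).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

# The 3x3 filter discussed here: 1s on both diagonals, 0s elsewhere.
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature_map = correlate2d_valid(image, kernel)
print(feature_map.tolist())
# [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

With this assumed input, the strongest responses (4) and the weakest (2) line up with the description above.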

JohnnyUtah
5

How about we use the term "kernel" for a 2D array of weights and the term "filter" for the 3D structure of multiple kernels stacked together? The dimensions of a filter are k×k×C (assuming square kernels). Each of the C kernels that compose a filter is convolved with one of the C channels of the input (input dimensions H_in×H_in×C, for example a 32×32 RGB image). It makes sense to use one word for a 2D array of weights and a different word for the 3D structure of weights, since the multiplication happens between 2D arrays and the results are then summed to compute the 3D operation.
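
Under this naming, the computation could be sketched as follows in NumPy. The shapes, the random values, and the helper names `correlate2d_valid` and `apply_filter` are assumptions for illustration only; real frameworks batch and vectorize this differently.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """2D sliding-window sum of products for one channel and one 2D kernel."""
    k1, k2 = kernel.shape
    out_h = channel.shape[0] - k1 + 1
    out_w = channel.shape[1] - k2 + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(channel[i:i + k1, j:j + k2] * kernel)
    return out

def apply_filter(image, filt):
    """Apply one k x k x C filter: one 2D kernel per input channel, results summed."""
    C = image.shape[2]
    per_channel = [correlate2d_valid(image[:, :, c], filt[:, :, c]) for c in range(C)]
    return np.sum(per_channel, axis=0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # H_in x H_in x C, e.g. a 32x32 RGB image
filt = rng.random((3, 3, 3))      # k x k x C: a "filter" made of C = 3 "kernels"

feature_map = apply_filter(image, filt)
print(feature_map.shape)  # (30, 30)
```

Each filter yields a single 2D output, so a layer with several filters produces one output channel per filter.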

Currently there is a problem with the nomenclature in this field. There are many terms describing the same thing and even terms used interchangeably for different concepts! Take as an example the terminology used to describe the output of a convolution layer: feature maps, channels, activations, tensors, planes, etc...

According to Wikipedia, "In image processing, a kernel, is a small matrix".

According to Wikipedia, "A matrix is a rectangular array arranged in rows and columns".

If a kernel is a rectangular array, then it cannot be the 3D structure of the weights, which in general is of k1×k2×C dimensions.

Well, I can't argue that this is the best terminology, but it is better than just using the terms "kernel" and "filter" interchangeably. Moreover, we do need a word to describe the distinct 2D arrays that form a filter.

Jim
1

The existing answers are excellent and answer the question comprehensively. I just want to add that filters in convolutional networks are shared across the entire image (i.e., the input is convolved with the filter, as visualized in Franck's answer). The receptive field of a particular neuron consists of all input units that affect that neuron. The receptive field of a neuron in a convolutional network is generally smaller than that of a neuron in a dense network, because each filter looks at only a small local region of the input; reusing the same filter weights at every position is called parameter sharing.
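
A rough way to see the effect of sharing is to count weights. The sketch below compares a convolutional layer with a dense layer that produces an output of the same size; the layer sizes (a 32×32×3 input, sixteen 3×3 filters, no padding, biases ignored) are assumptions picked only for illustration.

```python
# Assumed sizes, for illustration only.
H, W, C = 32, 32, 3      # input height, width, channels
k, F = 3, 16             # kernel size and number of filters

# Convolutional layer: each filter has k*k*C weights, reused at every position.
conv_weights = F * k * k * C

# Output size without padding, stride 1.
H_out, W_out = H - k + 1, W - k + 1

# Dense layer with the same input and output sizes: one weight per input-output pair.
dense_weights = (H * W * C) * (H_out * W_out * F)

print(conv_weights)   # 432
print(dense_weights)  # 44236800
```

The convolutional count does not depend on the image size at all, which is the practical consequence of sharing the filter weights.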

Parameter sharing gives CNNs a property called equivariance to translation: if the input is translated, the output is translated in the same way. Ian Goodfellow provides a great example in the Deep Learning book of how practitioners can capitalize on equivariance in CNNs:

When processing time-series data, this means that convolution produces a sort of timeline that shows when different features appear in the input. If we move an event later in time in the input, the exact same representation of it will appear in the output, just later. Similarly with images, convolution creates a 2-D map of where certain features appear in the input. If we move the object in the input, its representation will move the same amount in the output. This is useful for when we know that some function of a small number of neighboring pixels is useful when applied to multiple input locations. For example, when processing images, it is useful to detect edges in the first layer of a convolutional network. The same edges appear more or less everywhere in the image, so it is practical to share parameters across the entire image.
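
The time-series case in the quoted passage can be checked numerically. This is a small sketch with NumPy; the signal, the kernel, and the shift of 3 steps are made-up values, and the signal is padded with zeros so that `np.roll` does not wrap any nonzero samples around.

```python
import numpy as np

signal = np.array([0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0])  # an "event" near the start
kernel = np.array([1, 0, -1])                              # a simple difference filter

shift = 3
shifted_signal = np.roll(signal, shift)  # same event, 3 steps later

# Slide the same kernel over both signals (no padding).
out = np.correlate(signal, kernel, mode="valid")
out_shifted = np.correlate(shifted_signal, kernel, mode="valid")

print(out.tolist())          # [-1, -2, 0, 2, 1, 0, 0, 0, 0, 0]
print(out_shifted.tolist())  # [0, 0, 0, -1, -2, 0, 2, 1, 0, 0]

# Shifting the input shifts the output by the same amount.
print(np.array_equal(out_shifted, np.roll(out, shift)))  # True
```

The response pattern is identical in both outputs and simply appears 3 steps later, which is what equivariance to translation means here.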

Pranav Vempati