What is the difference between Conv1D and Conv2D?

19

I was going through the Keras convolution docs and found two types of convolution, Conv1D and Conv2D. I did some web searching, and this is what I understand about them: Conv1D is used for sequences and Conv2D is used for images.

I always thought that convolutional neural networks were used only for images, and I pictured a CNN this way:

An image is treated as a large matrix; a filter then slides over this matrix and computes the dot product at each position. I believe this is what Keras refers to as Conv2D. If Conv2D works this way, what is the mechanism of Conv1D, and how can we picture it?

— Eka

2

Take a look at this answer. I hope it helps.

— discente101

4

Convolution is a mathematical operation in which you "summarize" a tensor, a matrix, or a vector into a smaller one. If the input is one-dimensional, you summarize along that one dimension; if a tensor has n dimensions, you can summarize along all n dimensions. Conv1D and Conv2D summarize (convolve) along one or two dimensions, respectively.

For example, a vector $a$ with $n$ elements can be convolved with a weight vector $w$ with $m$ elements into a shorter vector $b$ with $n-m+1$ elements:

$b_i=\sum_{j=0}^{m-1} a_{i+j}\cdot w_j$

where

$i=[1,n-m+1]$

If the weight vector has the same length $n$ as the input and

$w_i=1/n$

then the output is a single value, the average of the input. If the weight vector is one element shorter than the input, the output is a moving average of length 2:

$\begin{bmatrix} a:&a_1 & a_2 & a_3\\ w:&1/2 & 1/2&\\ w:&&1/2 & 1/2\\ \end{bmatrix}=\begin{bmatrix} b:&\frac{a_1+a_2} 2 & \frac{a_2+a_3} 2 \end{bmatrix}$

The same works for a three-dimensional tensor:

$b_{ikl}=\sum_{j_1=0}^{m_1-1}\sum_{j_2=0}^{m_2-1}\sum_{j_3=0}^{m_3-1} a_{i+j_1,k+j_2,l+j_3}\cdot w_{j_1j_2j_3}$

where

$i=[1,n_1-m_1+1],k=[1,n_2-m_2+1],l=[1,n_3-m_3+1]$
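The two formulas above can be sketched in plain Python. This is only an illustration under some assumptions: the function names `conv1d_valid` and `conv3d_valid` are made up for this example, indices are zero-based instead of the one-based $i$ used in the formulas, and no padding or stride is applied.

```python
from itertools import product


def conv1d_valid(a, w):
    """b[i] = sum_j a[i + j] * w[j], for i in 0 .. n - m (zero-based)."""
    n, m = len(a), len(w)
    return [sum(a[i + j] * w[j] for j in range(m)) for i in range(n - m + 1)]


def conv3d_valid(a, w):
    """Same idea with three indices; a and w are nested lists."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    m1, m2, m3 = len(w), len(w[0]), len(w[0][0])
    return [
        [
            [
                sum(
                    a[i + j1][k + j2][l + j3] * w[j1][j2][j3]
                    for j1, j2, j3 in product(range(m1), range(m2), range(m3))
                )
                for l in range(n3 - m3 + 1)
            ]
            for k in range(n2 - m2 + 1)
        ]
        for i in range(n1 - m1 + 1)
    ]


# Moving average of length 2 over a three-element vector (the matrix example).
moving_avg = conv1d_valid([1.0, 3.0, 5.0], [0.5, 0.5])

# A 2x2x2 tensor of ones summarized by a 2x2x2 averaging kernel.
ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
avg_kernel = [[[0.125, 0.125], [0.125, 0.125]], [[0.125, 0.125], [0.125, 0.125]]]
summary = conv3d_valid(ones, avg_kernel)
```

Here `moving_avg` has two elements, matching $n-m+1 = 3-2+1$, and `summary` collapses to a single value because the kernel is as large as the input in every dimension.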

— Aksakal

3

This 1D convolution saves cost. It works the same way, but it takes a one-dimensional array and multiplies it element by element with the kernel. To visualize it, think of a matrix with a single row or a single column, i.e. one dimension: after the multiplication we get an array of the same shape but with lower or higher values, so it helps to maximize or minimize the intensity of the values.

This image might help:

For details, see https://www.youtube.com/watch?v=qVP574skyuM

— Reeves

1

I will use a PyTorch perspective; however, the logic remains the same.

When using Conv1d(), keep in mind that we are most likely working with two-dimensional inputs, such as one-hot-encoded DNA sequences or black-and-white images.

The only difference between the more conventional Conv2d() and Conv1d() is that the latter uses a one-dimensional kernel, as shown in the figure below.

Here, the height of the input data becomes the "depth" (or in_channels), and our rows become the kernel size. For example,

import torch
import torch.nn as nn

tensor = torch.randn(1, 100, 4)  # [batch, in_channels, length]
output = nn.Conv1d(in_channels=100, out_channels=1, kernel_size=1, stride=1)(tensor)
# output.shape == torch.Size([1, 1, 4])

We can see that the kernel automatically spans the height of the image (just as in Conv2d() the depth of the kernel automatically spans the image's channels), so all that is left for us to specify is the kernel size with respect to the span of the rows.

We just have to remember that if we are assuming a two-dimensional input, our filters become our columns and our rows become the kernel size.
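To make the "kernel spans all in_channels" idea concrete, here is a minimal plain-Python sketch of what a single-filter 1D convolution computes on a one-hot-encoded DNA sequence. It is not how PyTorch implements nn.Conv1d; the helper names `one_hot`, `conv1d_multichannel`, and `cg_detector` are hypothetical, there is no bias or batch dimension, and the stride is assumed to be 1.

```python
def one_hot(seq, alphabet="ACGT"):
    """Return a [in_channels][length] matrix: one row (channel) per letter."""
    return [[1.0 if ch == letter else 0.0 for ch in seq] for letter in alphabet]


def conv1d_multichannel(x, kernel):
    """Slide a [in_channels][kernel_size] kernel along the length axis only."""
    in_channels, length = len(x), len(x[0])
    kernel_size = len(kernel[0])
    assert len(kernel) == in_channels, "kernel must cover every input channel"
    return [
        sum(
            x[c][t + j] * kernel[c][j]
            for c in range(in_channels)
            for j in range(kernel_size)
        )
        for t in range(length - kernel_size + 1)
    ]


x = one_hot("ACGTA")  # 4 channels (A, C, G, T), length 5

# Hand-written kernel that responds to "C" followed by "G".
cg_detector = [
    [0.0, 0.0],  # A
    [1.0, 0.0],  # C at the first kernel position
    [0.0, 1.0],  # G at the second kernel position
    [0.0, 0.0],  # T
]

out = conv1d_multichannel(x, cg_detector)
```

The kernel moves in one direction only (along the sequence), while at every step it covers all four channels at once; that single direction of movement is what the "1D" refers to. The output peaks at position 1, where "CG" occurs.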

— Erick Platero

The image was taken from this earlier question: stackoverflow.com/questions/48859378/…

— Erick Platero

0

I would like to explain the difference visually and in detail (with comments in the code), using a very simple approach.

First, let's look at Conv2D in TensorFlow.

import tensorflow as tf

c1 = [[0, 0, 1, 0, 2], [1, 0, 2, 0, 1], [1, 0, 2, 2, 0], [2, 0, 0, 2, 0], [2, 1, 2, 2, 0]]
c2 = [[2, 1, 2, 1, 1], [2, 1, 2, 0, 1], [0, 2, 1, 0, 1], [1, 2, 2, 2, 2], [0, 1, 2, 0, 1]]
c3 = [[2, 1, 1, 2, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 2, 1, 0], [2, 2, 1, 1, 1]]
data = tf.transpose(tf.constant([[c1, c2, c3]], dtype=tf.float32), (0, 2, 3, 1))
# we transpose [batch, in_channels, in_height, in_width] to [batch, in_height, in_width, in_channels]
# where batch = 1, in_channels = 3 (c1, c2, c3, i.e. x[:, :, 0], x[:, :, 1], x[:, :, 2] in the gif), and in_height and in_width are both 5 (the size of the blue matrices without padding)
f2c1 = [[0, 1, -1], [0, -1, 0], [0, -1, 1]]
f2c2 = [[-1, 0, 0], [1, -1, 0], [1, -1, 0]]
f2c3 = [[-1, 1, -1], [0, -1, -1], [1, 0, 0]]
filters = tf.transpose(tf.constant([[f2c1, f2c2, f2c3]], dtype=tf.float32), (2, 3, 1, 0))
# we transpose [out_channels, in_channels, filter_height, filter_width] to [filter_height, filter_width, in_channels, out_channels]
# out_channels is 1 (in the gif it is 2; here we use only one filter, W1), in_channels is 3 because data has three channels (c1, c2, c3), and filter_height and filter_width are both 3 (the size of the filter W1)
# f2c1, f2c2, f2c3 are w1[:, :, 0], w1[:, :, 1] and w1[:, :, 2] in the gif
output = tf.squeeze(tf.nn.conv2d(data, filters, strides=2, padding=[[0, 0], [1, 1], [1, 1], [0, 0]]))
# this is just the o[:,:,1] in the gif
# <tf.Tensor: id=93, shape=(3, 3), dtype=float32, numpy=
# array([[-8., -8., -3.],
#        [-3.,  1.,  0.],
#        [-3., -8., -5.]], dtype=float32)>
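The (3, 3) output shape follows from the usual output-size arithmetic for a strided convolution with explicit padding. A small sketch, where `conv_output_size` is a hypothetical helper written for this explanation and not a TensorFlow function:

```python
def conv_output_size(in_size, filter_size, stride, pad_before=0, pad_after=0):
    """Number of filter positions along one axis of a strided convolution."""
    padded = in_size + pad_before + pad_after
    return (padded - filter_size) // stride + 1


# Conv2D example above: 5x5 input, 3x3 filter, stride 2, one row/column of padding per side.
out_height = conv_output_size(5, 3, stride=2, pad_before=1, pad_after=1)
out_width = conv_output_size(5, 3, stride=2, pad_before=1, pad_after=1)
```

Both `out_height` and `out_width` come out as 3, which matches the shape reported by TensorFlow.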

Conv1D is a special case of Conv2D, as stated in this paragraph from the TensorFlow documentation of Conv1D:

Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.

Let's see how we can turn a Conv1D problem into a Conv2D problem as well. Since Conv1D is usually used in NLP scenarios, we can illustrate it with the following NLP problem.

cat = [0.7, 0.4, 0.5]
sitting = [0.2, -0.1, 0.1]
there = [-0.5, 0.4, 0.1]
dog = [0.6, 0.3, 0.5]
resting = [0.3, -0.1, 0.2]
here = [-0.5, 0.4, 0.1]
sentence = tf.constant([[cat, sitting, there, dog, resting, here]])
# sentence[:,:,0] is equivalent to x[:,:,0] or c1 in the first example and the same for sentence[:,:,1] and sentence[:,:,2]
data = tf.reshape(sentence, (1, 1, 6, 3))
# we reshape [batch, in_width, in_channels] to [batch, 1, in_width, in_channels] according to the quote above
# each dimension in the embedding is a channel(three in_channels)
f3c1 = [0.6, 0.2]
# equivalent to f2c1 in the first code snippet or w1[:,:,0] in the gif
f3c2 = [0.4, -0.1]
# equivalent to f2c2 in the first code snippet or w1[:,:,1] in the gif
f3c3 = [0.5, 0.2]
# equivalent to f2c3 in the first code snippet or w1[:,:,2] in the gif
# filters = tf.constant([[f3c1, f3c2, f3c3]])
# [out_channels, in_channels, filter_width]: [1, 3, 2]
# here we have also only one filter and also three channels in it. please compare these three with the three channels in W1 for the Conv2D in the gif
filter1D = tf.transpose(tf.constant([[f3c1, f3c2, f3c3]]), (2, 1, 0))
# shape: [2, 3, 1] for the conv1d example
filters = tf.reshape(filter1D, (1, 2, 3, 1))  # equivalent to tf.expand_dims(filter1D, 0)
# transpose [out_channels, in_channels, filter_width] to [filter_width, in_channels, out_channels], then reshape the result to [1, filter_width, in_channels, out_channels] as described in the TensorFlow conv1d documentation quoted above
output = tf.squeeze(tf.nn.conv2d(data, filters, strides=(1, 1, 2, 1), padding="VALID"))
# the numbers for strides are for [batch, 1, in_width, in_channels] of the data input
# <tf.Tensor: id=119, shape=(3,), dtype=float32, numpy=array([0.9       , 0.09999999, 0.12      ], dtype=float32)>

Now let's do the same thing using Conv1D (also in TensorFlow):

output = tf.squeeze(tf.nn.conv1d(sentence, filter1D, stride=2, padding="VALID"))
# <tf.Tensor: id=135, shape=(3,), dtype=float32, numpy=array([0.9       , 0.09999999, 0.12      ], dtype=float32)>
# here stride defaults to be for the in_width

We can see that the 2D in Conv2D means that each channel of the input and of the filter is two-dimensional (as in the gif example), and the 1D in Conv1D means that each channel of the input and of the filter is one-dimensional (as in the cat and dog NLP example).
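The equivalence can also be checked without TensorFlow. The sketch below re-implements both operations in plain Python for a single example and a single output channel; the function names `conv1d` and `conv2d` and the nested-list layout are assumptions made for this illustration, not the TensorFlow API.

```python
def conv1d(x, f, stride):
    """x: [in_width][in_channels], f: [filter_width][in_channels]."""
    fw, ch = len(f), len(f[0])
    return [
        sum(x[t + j][c] * f[j][c] for j in range(fw) for c in range(ch))
        for t in range(0, len(x) - fw + 1, stride)
    ]


def conv2d(x, f, strides):
    """x: [in_height][in_width][in_channels], f: [fh][fw][in_channels]."""
    sh, sw = strides
    fh, fw, ch = len(f), len(f[0]), len(f[0][0])
    return [
        [
            sum(
                x[r + p][t + q][c] * f[p][q][c]
                for p in range(fh)
                for q in range(fw)
                for c in range(ch)
            )
            for t in range(0, len(x[0]) - fw + 1, sw)
        ]
        for r in range(0, len(x) - fh + 1, sh)
    ]


# Same embeddings as the cat/dog example: six words, three channels each.
sentence = [
    [0.7, 0.4, 0.5], [0.2, -0.1, 0.1], [-0.5, 0.4, 0.1],
    [0.6, 0.3, 0.5], [0.3, -0.1, 0.2], [-0.5, 0.4, 0.1],
]
# [filter_width][in_channels], i.e. f3c1, f3c2, f3c3 transposed.
filter1d = [[0.6, 0.4, 0.5], [0.2, -0.1, 0.2]]

out1d = conv1d(sentence, filter1d, stride=2)
# Wrap the input and the filter in a height-1 axis, as the quoted doc describes.
out2d = conv2d([sentence], [filter1d], strides=(1, 2))
```

`out2d` contains a single row that equals `out1d`, approximately 0.9, 0.1 and 0.12, the same values that TensorFlow prints above.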

— Lerner Zhang