From a statistical perspective: Fourier transform vs regression with Fourier basis



I am trying to understand whether the discrete Fourier transform gives the same representation of a curve as a regression using the Fourier basis. For example,

library(fda)
Y=daily$tempav[,1] ## my data
length(Y) ## =365

## create Fourier basis and estimate the coefficients
mybasis=create.fourier.basis(c(0,365),365)  
basisMat=eval.basis(1:365,mybasis)
regcoef=coef(lm(Y~basisMat-1))

## using Fourier transform
fftcoef=fft(Y)

## compare
head(fftcoef)
head(regcoef)

The FFT gives complex numbers, whereas the regression gives real numbers.

Do they convey the same information? Is there a one-to-one map between the two sets of numbers?

(I would appreciate it if the answer were written from a statistician's perspective rather than an engineer's. Many of the online materials I can find have engineering jargon all over the place, which makes them less palatable to me.)


I'm not familiar with the code snippet, so I can't say whether the following issue applies there. However, the DFT basis is typically defined in terms of integer ("whole number") frequencies, whereas a general "Fourier basis" for regression could use arbitrary frequency ratios (e.g. including irrational ones, at least in continuous arithmetic). This may also be of interest.
GeoMatt22

I think everyone would benefit if you wrote your question in mathematical terms (as opposed to code snippets). What is the regression problem you are solving? Which Fourier basis functions do you use? You'll be surprised how much the answers to your question improve.
Yair Daon

Answers:



They're the same. Here's how...

Doing a Regression

Say you fit the model

$$y_t = \sum_{j=1}^{n} A_j \cos(2\pi t [j/N] + \phi_j),$$

where $t = 1, \ldots, N$ and $n = \text{floor}(N/2)$. This isn't suitable for linear regression, because it is nonlinear in the phases $\phi_j$, so instead you use some trigonometry ($\cos(a+b) = \cos(a)\cos(b) - \sin(a)\sin(b)$) and fit the equivalent model

$$y_t = \sum_{j=1}^{n} \beta_{1,j} \cos(2\pi t [j/N]) + \beta_{2,j} \sin(2\pi t [j/N]),$$

with $\beta_{1,j} = A_j \cos\phi_j$ and $\beta_{2,j} = -A_j \sin\phi_j$.

Running linear regression on all of the Fourier frequencies $\{j/N : j = 1, \ldots, n\}$ gives you a bunch ($2n$) of betas: $\{\hat\beta_{i,j}\}$, $i = 1, 2$. For any $j$, if you wanted to calculate the pair by hand, you could use:

$$\hat\beta_{1,j} = \frac{\sum_{t=1}^{N} y_t \cos(2\pi t [j/N])}{\sum_{t=1}^{N} \cos^2(2\pi t [j/N])}$$

and

$$\hat\beta_{2,j} = \frac{\sum_{t=1}^{N} y_t \sin(2\pi t [j/N])}{\sum_{t=1}^{N} \sin^2(2\pi t [j/N])}.$$

These are standard regression formulas.
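As a quick check of the closed-form expressions for the regression coefficients, the following sketch compares them with what `lm()` returns. The simulated series `y`, the seed and the length `N = 51` are arbitrary assumptions made for illustration; they are not the data from the question.

```r
# Illustrative data (an assumption): any real series of odd length works.
set.seed(1)
N <- 51
n <- floor(N / 2)
tt <- 1:N
y <- rnorm(N)

# One cosine and one sine column per Fourier frequency j/N, j = 1, ..., n.
C <- sapply(1:n, function(j) cos(2 * pi * tt * j / N))
S <- sapply(1:n, function(j) sin(2 * pi * tt * j / N))
fit <- lm(y ~ C + S - 1)
beta_lm <- unname(coef(fit))

# Closed-form coefficients: sum(y * column) / sum(column^2) for each column.
beta1 <- colSums(y * C) / colSums(C^2)
beta2 <- colSums(y * S) / colSums(S^2)

# Both differences should be at the level of floating-point error.
err1 <- max(abs(beta_lm[1:n] - beta1))
err2 <- max(abs(beta_lm[n + (1:n)] - beta2))
c(err1, err2)
```

The per-column formula only matches the joint fit because the columns are mutually orthogonal; with a non-orthogonal design the two would differ.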

Doing a Discrete Fourier Transform

When you run a Fourier transform, you calculate, for $j = 1, \ldots, n$:

$$d(j/N) = N^{-1/2} \sum_{t=1}^{N} y_t \exp[-2\pi i t [j/N]] = N^{-1/2} \left( \sum_{t=1}^{N} y_t \cos(2\pi t [j/N]) - i \sum_{t=1}^{N} y_t \sin(2\pi t [j/N]) \right).$$

This is a complex number (notice the $i$). To see why that equality holds, keep in mind that $e^{ix} = \cos(x) + i \sin(x)$, $\cos(-x) = \cos(x)$ and $\sin(-x) = -\sin(x)$.

For each $j$, taking the squared modulus gives you the "periodogram":

$$|d(j/N)|^2 = N^{-1} \left( \sum_{t=1}^{N} y_t \cos(2\pi t [j/N]) \right)^2 + N^{-1} \left( \sum_{t=1}^{N} y_t \sin(2\pi t [j/N]) \right)^2.$$

In R, calculating this vector would be I <- abs(fft(Y))^2/length(Y), which is a bit odd, because fft() returns unnormalised sums and you have to do the scaling yourself.

You can also define the "scaled periodogram"

$$P(j/N) = \left( \frac{2}{N} \sum_{t=1}^{N} y_t \cos(2\pi t [j/N]) \right)^2 + \left( \frac{2}{N} \sum_{t=1}^{N} y_t \sin(2\pi t [j/N]) \right)^2.$$

Clearly $P(j/N) = \frac{4}{N} |d(j/N)|^2$. In R this would be P <- (4/length(Y))*I[2:(floor(length(Y)/2)+1)], because the first element of fft(Y) belongs to frequency zero, so frequency $j/N$ sits in element $j+1$.
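To make the relationship between the definition of $d(j/N)$ and R's `fft()` concrete, here is a minimal sketch on simulated data (the series, seed and `N = 51` are assumptions for illustration). It relies only on the documented behaviour of `fft()`: the transform is unnormalised, time is indexed from 0, and frequency 0 comes first.

```r
set.seed(3)
N <- 51
n <- floor(N / 2)
tt <- 1:N
y <- rnorm(N)

# d(j/N) computed directly from the definition, with t = 1, ..., N.
d <- sapply(1:n, function(j) sum(y * exp(-2i * pi * tt * j / N)) / sqrt(N))

# fft() output for frequencies j/N, j = 1, ..., n (element j + 1), rescaled.
f <- fft(y)[2:(n + 1)] / sqrt(N)

# The two differ only by the unit-modulus phase factor exp(-2i * pi * j / N),
# which comes from indexing time from 1 instead of 0 ...
phase_err <- max(Mod(d - f * exp(-2i * pi * (1:n) / N)))
# ... so the periodogram ordinates |d(j/N)|^2 agree.
pgram_err <- max(abs(Mod(d)^2 - Mod(f)^2))
c(phase_err, pgram_err)
```

The phase factor matters if you want the real and imaginary parts separately, but it drops out of anything based on the modulus, such as the periodogram.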

The Connection Between the Two

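It turns out the connection between the regression and the two periodograms is:

$$P(j/N) = \hat\beta_{1,j}^2 + \hat\beta_{2,j}^2.$$

Why? Because the basis you chose is orthogonal (though not orthonormal). You can show for each $j$ with $0 < j < N/2$ that $\sum_{t=1}^{N} \cos^2(2\pi t [j/N]) = \sum_{t=1}^{N} \sin^2(2\pi t [j/N]) = N/2$: write $\cos^2(x) = (1 + \cos(2x))/2$ and $\sin^2(x) = (1 - \cos(2x))/2$, and note that $\sum_{t=1}^{N} \cos(4\pi t [j/N]) = 0$. Plug that into the denominators of your formulas for the regression coefficients and voilà: $\hat\beta_{1,j}$ and $\hat\beta_{2,j}$ are exactly the two terms being squared in $P(j/N)$.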
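A numerical illustration of this identity, as a sketch on simulated data rather than the `daily` data from the question (the series, seed and `N = 51` are assumptions). The regression uses plain, unscaled sines and cosines built in base R, not the `fda` basis, whose scaling differs.

```r
set.seed(2)
N <- 51
n <- floor(N / 2)
tt <- 1:N
y <- rnorm(N)

# Regression on unscaled cosines and sines at the Fourier frequencies j/N.
C <- sapply(1:n, function(j) cos(2 * pi * tt * j / N))
S <- sapply(1:n, function(j) sin(2 * pi * tt * j / N))
beta <- unname(coef(lm(y ~ C + S - 1)))
beta1 <- beta[1:n]
beta2 <- beta[n + (1:n)]

# Scaled periodogram from fft(); element j + 1 belongs to frequency j/N.
pgram <- abs(fft(y))^2 / N
P <- (4 / N) * pgram[2:(n + 1)]

# Column sums of squares equal N/2, and P matches beta1^2 + beta2^2.
ss_err <- max(abs(c(colSums(C^2), colSums(S^2)) - N / 2))
id_err <- max(abs(P - (beta1^2 + beta2^2)))
c(ss_err, id_err)
```

The map from the complex FFT output to the pair of real coefficients is therefore one-to-one frequency by frequency, while the scaled periodogram keeps only the squared amplitude and discards the phase.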

Source: https://www.amazon.com/Time-Analysis-Its-Applications-Statistics/dp/144197864X


+1 for the answer and the source. It would also be good if you can demonstrate the result with the R objects I posted.
qoheleth

@qoheleth I'll leave that to you. Just be wary of the fact that fft() doesn't scale the way I wrote (I already mentioned this), that I haven't proved anything with intercepts, and that create.fourier.basis() scales the basis functions weirdly.
Taylor


They are strongly related. Your example is not reproducible because you didn't include your data, thus I'll make a new one. First of all, let's create a periodic function:

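T <- 10                          # period of the signal
omega <- 2*pi/T                  # fundamental angular frequency
N <- 21                          # number of grid points, both endpoints included
x <- seq(0, T, length.out = N)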
sum_sines_cosines <- function(x, omega){
    sin(omega*x)+2*cos(2*omega*x)+3*sin(4*omega*x)+4*cos(4*omega*x)
}
Yper <- sum_sines_cosines(x, omega)
Yper[N]-Yper[1] # numerically 0

x2 <- seq(0, T, len = 1000)
Yper2 <- sum_sines_cosines(x2, omega)
plot(x2, Yper2, col = "red", type = "l", xlab = "x", ylab = "Y")
points(x, Yper)


Now, let's create a Fourier basis for regression. Note that, with $N = 2k+1$, it doesn't really make sense to create more than $N-2$ basis functions, i.e., $N-3 = 2(k-1)$ non-constant sines and cosines, because higher frequency components are aliased on such a grid. For example, a sine of frequency $k\omega$ is indistinguishable from a constant, because it vanishes at every grid point: consider the case of $N = 3$, i.e., $k = 1$. Anyway, if you want to double check, just change N-2 to N in the snippet below and look at the last two columns: you'll see that they're actually useless (and they create issues for the fit, because the design matrix is now singular).
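The aliasing claim can be checked directly in base R, without `fda`. This is a small sketch on the same grid as above; the name `period` is used here in place of `T` purely for illustration.

```r
# Same grid as in the answer: N = 21 points on [0, period], endpoints included.
period <- 10
omega <- 2 * pi / period
N <- 21
k <- (N - 1) / 2                      # N = 2k + 1, so k = 10
x <- seq(0, period, length.out = N)

# A sine of frequency k * omega is zero (up to rounding) at every grid point.
sin_k <- max(abs(sin(k * omega * x)))

# Frequencies above k * omega fold back: on this grid the sine at
# (k + 1) * omega equals minus the sine at (k - 1) * omega.
alias <- max(abs(sin((k + 1) * omega * x) + sin((k - 1) * omega * x)))
c(sin_k, alias)
```

Both numbers should be at the level of rounding error, which is why columns at or beyond frequency $k\omega$ add nothing that the lower-frequency columns do not already carry.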

# Fourier Regression with fda
library(fda)
mybasis <- create.fourier.basis(c(0,T),N-2)
basisMat <- eval.basis(x, mybasis)
FDA_regression <- lm(Yper ~ basisMat-1)
FDA_coef <-coef(FDA_regression)
barplot(FDA_coef)


Note that the frequencies are exactly the right ones, but the amplitudes of nonzero components are not (1,2,3,4). The reason is that the fda Fourier basis functions are scaled in a weird way: their maximum value is not 1, as it would be for the usual Fourier basis $1, \sin\omega x, \cos\omega x, \ldots$. It's not $\frac{1}{\sqrt{\pi}}$ either, as it would have been for the orthonormal Fourier basis, $\frac{1}{\sqrt{2\pi}}, \frac{\sin\omega x}{\sqrt{\pi}}, \frac{\cos\omega x}{\sqrt{\pi}}, \ldots$.

# FDA basis has a weird scaling
max(abs(basisMat))
plot(mybasis)


You clearly see that:

  1. the maximum value is less than $\frac{1}{\sqrt{\pi}}$
  2. the Fourier basis (truncated to the first $N-2$ terms) contains a constant function (the black line), sines of increasing frequency (the curves which are equal to 0 at the domain boundaries) and cosines of increasing frequency (the curves which attain their maximum at the domain boundaries), as it should be

Simply scaling the Fourier basis given by fda, so that the usual Fourier basis is obtained, leads to regression coefficients having the expected values:

basisMat <- basisMat/max(abs(basisMat))
FDA_regression <- lm(Yper ~ basisMat-1)
FDA_coef <-coef(FDA_regression)
barplot(FDA_coef, names.arg = colnames(basisMat), main = "rescaled FDA coefficients")


Let's try fft now: note that since Yper is a periodic sequence, the last point doesn't really add any information (the DFT of a sequence is always periodic). Thus we can discard the last point when computing the FFT. Also, the FFT is just a fast numerical algorithm to compute the DFT, and the DFT of a sequence of real or complex numbers is complex. Thus, we really want the moduli of the FFT coefficients:

# FFT
fft_coef <- Mod(fft(Yper[1:(N-1)]))*2/(N-1)

We multiply by $\frac{2}{N-1}$ in order to have the same scaling as with the Fourier basis $1, \sin\omega x, \cos\omega x, \ldots$. If we didn't scale, we would still recover the correct frequencies, but the amplitudes would all be scaled by the same factor with respect to what we found before. Let's now plot the fft coefficients:

fft_coef <- fft_coef[1:((N-1)/2)]
terms <- paste0("exp",seq(0,(N-1)/2-1))
barplot(fft_coef, names.arg = terms, main = "FFT coefficients")


Ok: the frequencies are correct, but note that now the basis functions are not sines and cosines any more (they're complex exponentials $\exp(n i \omega x)$, where $i$ denotes the imaginary unit). Note also that instead of the set of nonzero amplitudes (1,2,3,4) we had before, we got the set (1,2,5). The reason is that a term $x_n \exp(n i \omega x)$ in this complex coefficient expansion (thus $x_n$ is complex) corresponds to two real terms $a_n \sin(n\omega x) + b_n \cos(n\omega x)$ in the trigonometric basis expansion, because of the Euler formula $\exp(ix) = \cos x + i \sin x$. The modulus of the complex coefficient is equal to the sum in quadrature of the two real coefficients, i.e., $|x_n| = \sqrt{a_n^2 + b_n^2}$. As a matter of fact, $5 = \sqrt{3^2 + 4^2}$.
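The sum-in-quadrature relation can be checked numerically with base R alone. The sketch below rebuilds the same test signal, fits an unscaled sine/cosine basis by least squares (an illustrative stand-in for the rescaled `fda` basis), and compares the result with the scaled FFT moduli; the names `period`, `S` and `Cmat` are assumptions introduced here.

```r
period <- 10
omega <- 2 * pi / period
N <- 21
x <- seq(0, period, length.out = N)[-N]   # drop the repeated last point
y <- sin(omega * x) + 2 * cos(2 * omega * x) +
  3 * sin(4 * omega * x) + 4 * cos(4 * omega * x)

# Real coefficients a_n (sines) and b_n (cosines) for n = 1, ..., 4.
S <- sapply(1:4, function(n) sin(n * omega * x))
Cmat <- sapply(1:4, function(n) cos(n * omega * x))
ab <- unname(coef(lm(y ~ S + Cmat - 1)))
a <- ab[1:4]
b <- ab[5:8]

# Scaled FFT moduli at the same frequencies (element n + 1 is frequency n).
m <- (Mod(fft(y)) * 2 / (N - 1))[2:5]

round(rbind(a = a, b = b, quadrature = sqrt(a^2 + b^2), fft_modulus = m), 10)
```

For this signal the last two rows should both read 1, 2, 0, 5 up to rounding, while the rows `a` and `b` show how the amplitude 5 at the fourth frequency splits into a sine part of 3 and a cosine part of 4.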


Thanks DeltaIV, the daily data set comes with the fda package.
qoheleth

@qoheleth I didn't know. This evening I will modify my answer using your dataset, and I will clarify a couple points.
DeltaIV
Licensed under cc by-sa 3.0 with attribution required.