С точки зрения статистики: преобразование Фурье против регрессии с базисом Фурье

13

Я пытаюсь понять, дает ли дискретное преобразование Фурье такое же представление кривой, как регрессия с использованием базиса Фурье. Например,

library(fda)
Y=daily$tempav[,1] ## my data
length(Y) ## =365

## create Fourier basis and estimate the coefficients
mybasis=create.fourier.basis(c(0,365),365)  
basisMat=eval.basis(1:365,mybasis)
regcoef=coef(lm(Y~basisMat-1))

## using Fourier transform
fftcoef=fft(Y)

## compare
head(fftcoef)
head(regcoef)

БПФ дает комплексное число, тогда как регрессия дает действительное число.

Они передают ту же информацию? Есть ли карта один к одному между двумя наборами чисел?

(Я был бы признателен, если бы ответ был написан с точки зрения статистики, а не с точки зрения инженера. Многие онлайн-материалы, которые я могу найти, содержат повсеместно технический жаргон, что делает их менее приемлемыми для меня.)

Qoheleth
источник
Я не знаком с вашим фрагментом кода, поэтому не могу сказать, имеет ли место следующая проблема. Однако, как правило, основа ДПФ определяется в терминах интегральных («целых чисел») частот, тогда как общий «базис Фурье» для регрессии может использовать произвольные отношения частот (например, включая иррациональные числа, по крайней мере, в непрерывной арифметике). Это также может представлять интерес.
GeoMatt22
Я думаю, что всем будет полезно, если вы напишите свой вопрос в математических терминах (в отличие от фрагментов кода). Какую проблему регрессии вы решаете? Какие базисные функции Фурье вы используете? Вы будете удивлены тем, как улучшатся ответы на ваш вопрос.
Яир

Ответы:

15

Они одинаковые. Вот как...

Делать регрессию

Допустим, вы подходите для модели где t = 1 , , N и n = этаж ( N / 2 ) . Это не подходит для линейной регрессии, поэтому вместо этого вы используете некоторую тригонометрию ( cos ( a + b ) = cos

yt=j=1nAjcos(2πt[j/N]+ϕj)
t=1,,Nn=floor(N/2) ) и соответствует эквивалентной модели: y t = n j = 1 β 1 , j cos ( 2 π t [ j / N ] ) + β 2 , j sin ( 2 π t [ j / N ] ) .cos(a+b)=cos(a)cos(b)sin(a)sin(b)
yt=j=1nβ1,jcos(2πt[j/N])+β2,jsin(2πt[j/N]).
Запуск линейной регрессии на всех частотах Фурье дает вам кучу ( 2 н ) из беты: { β я , J } , я = 1 , 2 . Для любого j , если вы хотите вычислить пару вручную, вы можете использовать:{j/N:j=1,,n}2n{β^i,j}i=1,2j

и β 2,J=Σ N т = 1 утгреха(2πт[J/N])

β^1,j=t=1Nytcos(2πt[j/N])t=1Ncos2(2πt[j/N])
Это стандартные формулы регрессии.
β^2,j=t=1Nytsin(2πt[j/N])t=1Nsin2(2πt[j/N]).

Выполнение дискретного преобразования Фурье

Когда вы запускаете преобразование Фурье, вы вычисляете для :j=1,,n

d(j/N)=N1/2t=1Nytexp[2πit[j/N]]=N1/2(t=1Nytcos(2πt[j/N])it=1Nytsin(2πt[j/N])).

ieix=cos(x)+isin(x)cos(x)=cos(x)sin(x)=sin(x)

j

|d(j/N)|2=N1(t=1Nytcos(2πt[j/N]))2+N1(t=1Nytsin(2πt[j/N]))2.
In R, calculating this vector would be I <- abs(fft(Y))^2/length(Y), which is sort of weird, because you have to scale it.

Also you can define the "scaled periodogram"

P(j/N)=(2Nt=1Nytcos(2πt[j/N]))2+(2Nt=1Nytsin(2πt[j/N]))2.
Clearly P(j/N)=4N|d(j/N)|2. In R this would be P <- (4/length(Y))*I[(1:floor(length(Y)/2))].

The Connection Between the Two

It turns out the connection between the regression and the two periodograms is:

P(j/N)=β^1,j2+β^2,j2.
Why? Because the basis you chose is orthogonal/orthonormal. You can show for each j that t=1Ncos2(2πt[j/N])=t=1Nsin2(2πt[j/N])=N/2. Plug that in to the denominators of your formulas for the regression coefficients and voila.

Source: https://www.amazon.com/Time-Analysis-Its-Applications-Statistics/dp/144197864X

Taylor
источник
1
+1 for the answer and the source. It would also be good if you can demonstrate the result with the R objects I posted.
qoheleth
@qoheleth I'll leave that to you. Just be weary of how fft() doesn't scale the way I wrote (I already mentioned this), that I haven't proved anything with intercepts, and that create.fourier.basis() scales the basis functions weirdly.
Taylor
6

They are strongly related. Your example is not reproducible because you didn't include your data, thus I'll make a new one. First of all, let's create a periodic function:

T <- 10
omega <- 2*pi/T
N <- 21
x <- seq(0, T, len = N)
sum_sines_cosines <- function(x, omega){
    sin(omega*x)+2*cos(2*omega*x)+3*sin(4*omega*x)+4*cos(4*omega*x)
}
Yper <- sum_sines_cosines(x, omega)
Yper[N]-Yper[1] # numerically 0

x2 <- seq(0, T, len = 1000)
Yper2 <- sum_sines_cosines(x2, omega)
plot(x2, Yper2, col = "red", type = "l", xlab = "x", ylab = "Y")
points(x, Yper)

enter image description here

Now, let's create a Fourier basis for regression. Note that, with N=2k+1, it doesn't really make sense to create more than N2 basis functions, i.e., N3=2(k1) non-constant sines and cosines, because higher frequency components are aliased on such a grid. For example, a sine of frequency kω is indistinguishable from a costant (sine): consider the case of N=3, i.e., k=1. Anyway, if you want to double check, just change N-2 to N in the snippet below and look at the last two columns: you'll see that they're actually useless (and they create issues for the fit, because the design matrix is now singular).

# Fourier Regression with fda
library(fda)
mybasis <- create.fourier.basis(c(0,T),N-2)
basisMat <- eval.basis(x, mybasis)
FDA_regression <- lm(Yper ~ basisMat-1)
FDA_coef <-coef(FDA_regression)
barplot(FDA_coef)

enter image description here

Note that the frequencies are exactly the right ones, but the amplitudes of nonzero components are not (1,2,3,4). The reason is that the fda Fourier basis functions are scaled in a weird way: their maximum value is not 1, as it would be for the usual Fourier basis 1,sinωx,cosωx,. It's not 1π either, as it would have been for the orthonormal Fourier basis, 12π,sinωxπ,cosωxπ,.

# FDA basis has a weird scaling
max(abs(basisMat))
plot(mybasis)

enter image description here

You clearly see that:

  1. the maximum value is less than 1π
  2. the Fourier basis (truncated to the first N2 terms) contains a constant function (the black line), sines of increasing frequency (the curves which are equal to 0 at the domain boundaries) and cosines of increasing frequency (the curves which are equal to 1 at the domain boundaries), as it should be

Simply scaling the Fourier basis given by fda, so that the usual Fourier basis is obtained, leads to regression coefficients having the expected values:

basisMat <- basisMat/max(abs(basisMat))
FDA_regression <- lm(Yper ~ basisMat-1)
FDA_coef <-coef(FDA_regression)
barplot(FDA_coef, names.arg = colnames(basisMat), main = "rescaled FDA coefficients")

enter image description here

Let's try fft now: note that since Yper is a periodic sequence, the last point doesn't really add any information (the DFT of a sequence is always periodic). Thus we can discard the last point when computing the FFT. Also, the FFT is just a fast numerical algorithm to compute the DFT, and the DFT of a sequence of real or complex numbers is complex. Thus, we really want the moduluses of the FFT coefficients:

# FFT
fft_coef <- Mod(fft(Yper[1:(N-1)]))*2/(N-1)

We multiply by 2N1 in order to have the same scaling as with the Fourier basis 1,sinωx,cosωx,. If we didn't scale, we would still recover the correct frequencies, but the amplitudes would all be scaled by the same factor with respect to what we found before. Let's now plot the fft coefficients:

fft_coef <- fft_coef[1:((N-1)/2)]
terms <- paste0("exp",seq(0,(N-1)/2-1))
barplot(fft_coef, names.arg = terms, main = "FFT coefficients")

enter image description here

Ok: the frequencies are correct, but note that now the basis functions are not sines and cosines any more (they're complex exponentials expniωx, where with i I denote the imaginary unit). Note also that instead than a set of nonzero frequencies (1,2,3,4) as before, we got a set (1,2,5). The reason is that a term xnexpniωx in this complex coefficient expansion (thus xn is complex) corresponds to two real terms ansin(nωx)+bncos(nωx) in the trigonometric basis expansion, because of the Euler formula expix=cosx+isinx. The modulus of the complex coefficient is equal to the sum in quadrature of the two real coefficients, i.e., |xn|=an2+bn2. As a matter of fact, 5=33+42.

DeltaIV
источник
1
thanks DeltaIV, the data daily comes with the fda package.
qoheleth
@qoheleth I didn't know. This evening I will modify my answer using your dataset, and I will clarify a couple points.
DeltaIV