13

Я пытаюсь понять, дает ли дискретное преобразование Фурье такое же представление кривой, как регрессия с использованием базиса Фурье. Например,

library(fda)
Y=daily$tempav[,1] ## my data
length(Y) ## =365

## create Fourier basis and estimate the coefficients
mybasis=create.fourier.basis(c(0,365),365)  
basisMat=eval.basis(1:365,mybasis)
regcoef=coef(lm(Y~basisMat-1))

## using Fourier transform
fftcoef=fft(Y)

## compare
head(fftcoef)
head(regcoef)

БПФ дает комплексное число, тогда как регрессия дает действительное число.

Они передают ту же информацию? Есть ли карта один к одному между двумя наборами чисел?

(Я был бы признателен, если бы ответ был написан с точки зрения статистики, а не с точки зрения инженера. Многие онлайн-материалы, которые я могу найти, содержат повсеместно технический жаргон, что делает их менее приемлемыми для меня.)

fourier-transform functional-data-analysis Qoheleth
источник

Я не знаком с вашим фрагментом кода, поэтому не могу сказать, имеет ли место следующая проблема. Однако, как правило, основа ДПФ определяется в терминах интегральных («целых чисел») частот, тогда как общий «базис Фурье» для регрессии может использовать произвольные отношения частот (например, включая иррациональные числа, по крайней мере, в непрерывной арифметике). Это также может представлять интерес.

GeoMatt22

Я думаю, что всем будет полезно, если вы напишите свой вопрос в математических терминах (в отличие от фрагментов кода). Какую проблему регрессии вы решаете? Какие базисные функции Фурье вы используете? Вы будете удивлены тем, как улучшатся ответы на ваш вопрос.

Яир

15

Они одинаковые. Вот как...

Делать регрессию

Допустим, вы подходите для модели где и . Это не подходит для линейной регрессии, поэтому вместо этого вы используете некоторую тригонометрию (

y_{t} = \sum_{j = 1}^{n} A_{j} \cos (2 π t [j / N] + ϕ_{j})

$y_t = \sum_{j=1}^n A_j \cos(2 \pi t [j/N] + \phi_j)$

t = 1, \dots, N

$t=1,\ldots,N$

n = floor (N / 2)

$n = \text{floor}(N/2)$

) и соответствует эквивалентной модели:

\cos (a + b) = \cos (a) \cos (b) - \sin (a) \sin (b)

$\cos(a + b) = \cos(a)\cos(b) - \sin(a)\sin(b)$

y_{t} = \sum_{j = 1}^{n} β_{1, j} \cos (2 π t [j / N]) + β_{2, j} \sin (2 π t [j / N]) .

$y_t = \sum_{j=1}^n \beta_{1,j} \cos(2 \pi t [j/N]) + \beta_{2,j}\sin(2 \pi t [j/N]).$ Запуск линейной регрессии на всех частотах Фурье

дает вам кучу (

) из

,

. Для любого

, если вы хотите вычислить пару вручную, вы можете использовать:

{j / N : j = 1, \dots, n}

$\{j/N : j = 1, \ldots, n \}$

2 n

$2n$

{{\hat{β}}_{i, j}}

$\{\hat{\beta}_{i,j}\}$

i = 1, 2

$i=1,2$

j

$j$

и

{\hat{β}}_{1, j} = \frac{\sum_{t = 1}^{N} y_{t} \cos (2 π t [j / N])}{\sum_{t = 1}^{N} \cos^{2} (2 π t [j / N])}

$\hat{\beta}_{1,j} = \frac{\sum_{t=1}^N y_t \cos(2 \pi t [j/N])}{\sum_{t=1}^N \cos^2(2 \pi t [j/N])}$

Это стандартные формулы регрессии.

{\hat{β}}_{2, j} = \frac{\sum_{t = 1}^{N} y_{t} \sin (2 π t [j / N])}{\sum_{t = 1}^{N} \sin^{2} (2 π t [j / N])} .

$\hat{\beta}_{2,j} = \frac{\sum_{t=1}^N y_t \sin(2 \pi t [j/N])}{\sum_{t=1}^N \sin^2(2 \pi t [j/N])}.$

Выполнение дискретного преобразования Фурье

Когда вы запускаете преобразование Фурье, вы вычисляете для : $j=1, \ldots, n$

\begin{aligned} d (j / N) & = N^{- 1 / 2} \sum_{t = 1}^{N} y_{t} \exp [- 2 π i t [j / N]] \\ = N^{- 1 / 2} (\sum_{t = 1}^{N} y_{t} \cos (2 π t [j / N]) - i \sum_{t = 1}^{N} y_{t} \sin (2 π t [j / N])) . \end{aligned}

$\begin{align*} d(j/N) &= N^{-1/2}\sum_{t=1}^N y_t \exp[- 2 \pi i t [j/N]] \\ &= N^{-1/2}\left(\sum_{t=1}^N y_t\cos(2 \pi t [j/N]) - i \sum_{t=1}^N y_t\sin(2 \pi t [j/N])\right) . \end{align*}$

$i$ $e^{ix} = \cos(x) + i\sin(x)$ $\cos(-x) = \cos(x)$ $\sin(-x) = -\sin(x)$

$j$

| d (j / N) |^{2} = N^{- 1} {(\sum_{t = 1}^{N} y_{t} \cos (2 π t [j / N]))}^{2} + N^{- 1} {(\sum_{t = 1}^{N} y_{t} \sin (2 π t [j / N]))}^{2} .

$|d(j/N)|^2 = N^{-1}\left(\sum_{t=1}^N y_t\cos(2 \pi t [j/N])\right)^2 + N^{-1}\left(\sum_{t=1}^N y_t\sin(2 \pi t [j/N])\right)^2.$ In R, calculating this vector would be I <- abs(fft(Y))^2/length(Y), which is sort of weird, because you have to scale it.

Also you can define the "scaled periodogram"

P (j / N) = {(\frac{2}{N} \sum_{t = 1}^{N} y_{t} \cos (2 π t [j / N]))}^{2} + {(\frac{2}{N} \sum_{t = 1}^{N} y_{t} \sin (2 π t [j / N]))}^{2} .

$P(j/N) = \left(\frac{2}{N}\sum_{t=1}^N y_t \cos(2 \pi t [j/N]) \right)^2 + \left(\frac{2}{N}\sum_{t=1}^N y_t \sin(2 \pi t [j/N]) \right)^2.$ Clearly

P (j / N) = \frac{4}{N} | d (j / N) |^{2}

$P(j/N) = \frac{4}{N}|d(j/N)|^2$ . In R this would be P <- (4/length(Y))*I[(1:floor(length(Y)/2))].

The Connection Between the Two

It turns out the connection between the regression and the two periodograms is:

P (j / N) = {\hat{β}}_{1, j}^{2} + {\hat{β}}_{2, j}^{2} .

$P(j/N) = \hat{\beta}_{1,j}^2 + \hat{\beta}_{2,j}^2.$ Why? Because the basis you chose is orthogonal/orthonormal. You can show for each

j

$j$ that

\sum_{t = 1}^{N} \cos^{2} (2 π t [j / N]) = \sum_{t = 1}^{N} \sin^{2} (2 π t [j / N]) = N / 2

$\sum_{t=1}^N \cos^2(2 \pi t [j/N]) = \sum_{t=1}^N \sin^2(2 \pi t [j/N]) = N/2$ . Plug that in to the denominators of your formulas for the regression coefficients and voila.

Source: https://www.amazon.com/Time-Analysis-Its-Applications-Statistics/dp/144197864X

Taylor
источник

1

+1 for the answer and the source. It would also be good if you can demonstrate the result with the R objects I posted.

qoheleth

@qoheleth I'll leave that to you. Just be weary of how fft() doesn't scale the way I wrote (I already mentioned this), that I haven't proved anything with intercepts, and that create.fourier.basis() scales the basis functions weirdly.

Taylor

6

They are strongly related. Your example is not reproducible because you didn't include your data, thus I'll make a new one. First of all, let's create a periodic function:

T <- 10
omega <- 2*pi/T
N <- 21
x <- seq(0, T, len = N)
sum_sines_cosines <- function(x, omega){
    sin(omega*x)+2*cos(2*omega*x)+3*sin(4*omega*x)+4*cos(4*omega*x)
}
Yper <- sum_sines_cosines(x, omega)
Yper[N]-Yper[1] # numerically 0

x2 <- seq(0, T, len = 1000)
Yper2 <- sum_sines_cosines(x2, omega)
plot(x2, Yper2, col = "red", type = "l", xlab = "x", ylab = "Y")
points(x, Yper)

Now, let's create a Fourier basis for regression. Note that, with $N = 2k+1$ , it doesn't really make sense to create more than $N-2$ basis functions, i.e., $N-3=2(k-1)$ non-constant sines and cosines, because higher frequency components are aliased on such a grid. For example, a sine of frequency $k\omega$ is indistinguishable from a costant (sine): consider the case of $N=3$ , i.e., $k=1$ . Anyway, if you want to double check, just change N-2 to N in the snippet below and look at the last two columns: you'll see that they're actually useless (and they create issues for the fit, because the design matrix is now singular).

# Fourier Regression with fda
library(fda)
mybasis <- create.fourier.basis(c(0,T),N-2)
basisMat <- eval.basis(x, mybasis)
FDA_regression <- lm(Yper ~ basisMat-1)
FDA_coef <-coef(FDA_regression)
barplot(FDA_coef)

Note that the frequencies are exactly the right ones, but the amplitudes of nonzero components are not (1,2,3,4). The reason is that the fda Fourier basis functions are scaled in a weird way: their maximum value is not 1, as it would be for the usual Fourier basis ${1,\sin{ \omega x},\cos{ \omega x},\dots}$ . It's not $\frac{1}{\sqrt \pi}$ either, as it would have been for the orthonormal Fourier basis, ${\frac{1}{\sqrt {2\pi}},\frac{\sin{ \omega x}}{\sqrt \pi},\frac{\cos{ \omega x}}{\sqrt \pi},\dots}$ .

# FDA basis has a weird scaling
max(abs(basisMat))
plot(mybasis)

You clearly see that:

the maximum value is less than $\frac{1}{\sqrt \pi}$
the Fourier basis (truncated to the first $N-2$ terms) contains a constant function (the black line), sines of increasing frequency (the curves which are equal to 0 at the domain boundaries) and cosines of increasing frequency (the curves which are equal to 1 at the domain boundaries), as it should be

Simply scaling the Fourier basis given by fda, so that the usual Fourier basis is obtained, leads to regression coefficients having the expected values:

basisMat <- basisMat/max(abs(basisMat))
FDA_regression <- lm(Yper ~ basisMat-1)
FDA_coef <-coef(FDA_regression)
barplot(FDA_coef, names.arg = colnames(basisMat), main = "rescaled FDA coefficients")

Let's try fft now: note that since Yper is a periodic sequence, the last point doesn't really add any information (the DFT of a sequence is always periodic). Thus we can discard the last point when computing the FFT. Also, the FFT is just a fast numerical algorithm to compute the DFT, and the DFT of a sequence of real or complex numbers is complex. Thus, we really want the moduluses of the FFT coefficients:

# FFT
fft_coef <- Mod(fft(Yper[1:(N-1)]))*2/(N-1)

We multiply by $\frac{2}{N-1}$ in order to have the same scaling as with the Fourier basis ${1,\sin{ \omega x},\cos{ \omega x},\dots}$ . If we didn't scale, we would still recover the correct frequencies, but the amplitudes would all be scaled by the same factor with respect to what we found before. Let's now plot the fft coefficients:

fft_coef <- fft_coef[1:((N-1)/2)]
terms <- paste0("exp",seq(0,(N-1)/2-1))
barplot(fft_coef, names.arg = terms, main = "FFT coefficients")

Ok: the frequencies are correct, but note that now the basis functions are not sines and cosines any more (they're complex exponentials $\exp{ni\omega x}$ , where with $i$ I denote the imaginary unit). Note also that instead than a set of nonzero frequencies (1,2,3,4) as before, we got a set (1,2,5). The reason is that a term $x_n\exp{ni\omega x}$ in this complex coefficient expansion (thus $x_n$ is complex) corresponds to two real terms $a_n sin(n\omega x)+b_n cos(n\omega x)$ in the trigonometric basis expansion, because of the Euler formula $\exp{ix}=\cos{x}+i\sin{x}$ . The modulus of the complex coefficient is equal to the sum in quadrature of the two real coefficients, i.e., $|x_n|=\sqrt{a_n^2+b_n^2}$ . As a matter of fact, $5=\sqrt{3^3+4^2}$ .

DeltaIV
источник

1

thanks DeltaIV, the data daily comes with the fda package.

qoheleth

@qoheleth I didn't know. This evening I will modify my answer using your dataset, and I will clarify a couple points.

DeltaIV

С точки зрения статистики: преобразование Фурье против регрессии с базисом Фурье

Ответы:

Делать регрессию

Выполнение дискретного преобразования Фурье

The Connection Between the Two