Analogue of Pearson correlation for 3 variables

17

I am interested in whether a "correlation" of three variables is something, and if so, what it would be.

The Pearson product-moment correlation coefficient is

$\frac{\mathrm{E}\{(X-\mu_X)(Y-\mu_Y)\}}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$

Now the question for 3 variables: is

$\frac{\mathrm{E}\{(X-\mu_X)(Y-\mu_Y)(Z-\mu_Z)\}} {\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)\mathrm{Var}(Z)}}$

anything?

In R it looks like something interpretable:

> a <- rnorm(100); b <- rnorm(100); c <- rnorm(100)
> mean((a-mean(a)) * (b-mean(b)) * (c-mean(c))) / (sd(a) * sd(b) * sd(c))
[1] -0.3476942
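For readers without R, here is a rough Python counterpart of the one-liner above. It is only a sketch: the function name `third_cross_moment` and the seed are made up for illustration, and it divides by n everywhere (population standard deviations), whereas the R snippet mixes `mean` with the n-1 based `sd`.

```python
import random
from statistics import fmean, pstdev

def third_cross_moment(a, b, c):
    """E[(A-mu_A)(B-mu_B)(C-mu_C)] / (sd_A * sd_B * sd_C), all with divisor n."""
    ma, mb, mc = fmean(a), fmean(b), fmean(c)
    numerator = fmean([(x - ma) * (y - mb) * (z - mc) for x, y, z in zip(a, b, c)])
    return numerator / (pstdev(a) * pstdev(b) * pstdev(c))

rng = random.Random(0)  # arbitrary seed, only for reproducibility
a = [rng.gauss(0, 1) for _ in range(100)]
b = [rng.gauss(0, 1) for _ in range(100)]
c = [rng.gauss(0, 1) for _ in range(100)]
print(third_cross_moment(a, b, c))
```

Like a correlation, the quantity is unchanged by shifting or positively rescaling any of the three variables, and it flips sign when one variable is negated.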

We normally look at the correlation between 2 variables given a fixed value of a third variable. Could someone clarify?

correlation pearson-r PascalVKooten

2

1) In your bivariate Pearson formula, if "E" (mean in your code) implies division by n, then the st. deviations must also be based on n (not n-1). 2) Let all three variables be the same variable. In this case we would expect the correlation to be 1 (as in the bivariate case), but alas...

ttnphns

For a trivariate normal distribution it is zero, regardless of what the correlations are.

Ray Koopman

1

I really think the title would benefit from changing to "Analogue of Pearson correlation for 3 variables" or similar - it would make links here more informative

Silverfish

1

@Silverfish I agree! I updated the title, thanks.

PascalVKooten

12

It is indeed something. To find out, we need to examine what we know about correlation itself.

1. The correlation matrix of a vector-valued random variable $\mathbf{X}=(X_1,X_2,\ldots,X_p)$ is the variance-covariance matrix, or simply the "variance," of the standardized version of $\mathbf{X}$. That is, each $X_i$ is replaced by its recentered, rescaled version.

2. The covariance of $X_i$ and $X_j$ is the expectation of the product of their centered versions. That is, writing $X^\prime_i = X_i - E[X_i]$ and $X^\prime_j = X_j - E[X_j]$, we have

$\operatorname{Cov}(X_i,X_j) = E[X^\prime_i X^\prime_j].$

3. The variance of $\mathbf{X}$, which I will write $\operatorname{Var}(\mathbf{X})$, is not a single number. It is the array of values

$\operatorname{Var}(\mathbf{X})_{ij}=\operatorname{Cov}(X_i,X_j).$

4. The way to think of the covariance for the intended generalization is to consider it a tensor. That means it is an entire collection of quantities $v_{ij}$, indexed by $i$ and $j$ ranging from $1$ through $p$, whose values change in a particularly simple, predictable way when $\mathbf{X}$ undergoes a linear transformation. Specifically, let $\mathbf{Y}=(Y_1,Y_2,\ldots,Y_q)$ be another vector-valued random variable defined by

$Y_i = \sum_{j=1}^p a_i^{\,j}X_j.$

The constants $a_i^{\,j}$ ($i$ and $j$ are indexes; $j$ is not a power) form a $q\times p$ array $\mathbb{A} = (a_i^{\,j})$, $j=1,\ldots, p$ and $i=1,\ldots, q$. The linearity of expectation implies

$\operatorname{Var}(\mathbf Y)_{ij} = \sum_{k,l} a_i^{\,k}a_j^{\,l}\operatorname{Var}(\mathbf X)_{kl} .$

In matrix notation,

$\operatorname{Var}(\mathbf Y) = \mathbb{A}\operatorname{Var}(\mathbf X) \mathbb{A}^\prime .$

5. All the components of $\operatorname{Var}(\mathbf{X})$ actually are univariate variances, due to the Polarization Identity

$4\operatorname{Cov}(X_i,X_j) = \operatorname{Var}(X_i+X_j) - \operatorname{Var}(X_i-X_j).$

This tells us that if you understand variances of univariate random variables, you already understand covariances of bivariate variables: they are "just" linear combinations of variances.
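The Polarization Identity for covariances can be checked numerically. The sketch below is illustrative only: the helper names `cov` and `var`, the seed, and the simulated data are assumptions, and the moments use the divisor n. Because the identity is algebraic, it holds for empirical moments up to rounding error, not just in expectation.

```python
import random
from statistics import fmean

def cov(x, y):
    """Empirical covariance with divisor n."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])

def var(x):
    """Empirical variance with divisor n."""
    return cov(x, x)

rng = random.Random(1)  # arbitrary seed
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # correlated with x by construction

lhs = 4 * cov(x, y)
rhs = var([a + b for a, b in zip(x, y)]) - var([a - b for a, b in zip(x, y)])
print(lhs, rhs)  # the two numbers should agree up to floating-point error
```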

The expression in the question is perfectly analogous: the variables $X_i$ have been standardized as in $(1)$. We can understand what it represents by considering what it means for any variable, standardized or not. We would replace each $X_i$ by its centered version, as in $(2)$, and form quantities having three indexes,

$\mu_3(\mathbf{X})_{ijk} = E[X_i^\prime X_j^\prime X_k^\prime].$

These are the central (multivariate) moments of degree $3$. As in $(4)$, they form a tensor: when $\mathbf{Y} = \mathbb{A}\mathbf{X}$, then

$\mu_3(\mathbf{Y})_{ijk} = \sum_{l,m,n} a_i^{\,l}a_j^{\,m}a_k^{\,n} \mu_3(\mathbf{X})_{lmn}.$

The indexes in this triple sum range over all combinations of integers from $1$ through $p$.
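As a sanity check on the transformation law for the third-moment tensor, the following sketch compares the tensor of $\mathbf{Y}=\mathbb{A}\mathbf{X}$ computed directly from data with the triple sum applied to the tensor of $\mathbf{X}$. Everything concrete here (the function name `mu3`, the matrix `A`, the exponential data, the seed) is a made-up example, and the moments are empirical ones with divisor n.

```python
import random
from itertools import product
from statistics import fmean

def mu3(columns):
    """Empirical third central moment tensor: t[i][j][k] = mean(X_i' X_j' X_k')."""
    centered = []
    for col in columns:
        m = fmean(col)
        centered.append([v - m for v in col])
    d = len(columns)
    return [[[fmean([a * b * c for a, b, c in zip(centered[i], centered[j], centered[k])])
              for k in range(d)] for j in range(d)] for i in range(d)]

rng = random.Random(2)  # arbitrary seed
n, p, q = 500, 2, 3
x1 = [rng.expovariate(1.0) for _ in range(n)]   # a skewed component
x2 = [u + rng.expovariate(2.0) for u in x1]     # correlated with x1, also skewed
X = [x1, x2]
A = [[1.0, 2.0], [-1.0, 0.5], [3.0, 0.0]]       # hypothetical q x p matrix

# Y_i = sum_j a_i^j X_j, applied observation by observation
Y = [[sum(A[i][j] * X[j][t] for j in range(p)) for t in range(n)] for i in range(q)]

direct = mu3(Y)   # third moments computed from the transformed data
mx = mu3(X)
via_law = [[[sum(A[i][l] * A[j][m] * A[k][r] * mx[l][m][r]
                 for l, m, r in product(range(p), repeat=3))
             for k in range(q)] for j in range(q)] for i in range(q)]

max_err = max(abs(direct[i][j][k] - via_law[i][j][k])
              for i, j, k in product(range(q), repeat=3))
print(max_err)  # should be at the level of floating-point rounding
```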

The analog of the Polarization Identity is

$\eqalign{&24\mu_3(\mathbf{X})_{ijk} = \\ &\mu_3(X_i+X_j+X_k) - \mu_3(X_i-X_j+X_k) - \mu_3(X_i+X_j-X_k) + \mu_3(X_i-X_j-X_k).}$

On the right hand side, $\mu_3$ refers to the (univariate) central third moment: the expected value of the cube of the centered variable. When the variables are standardized, this moment is usually called the skewness. Accordingly, we may think of $\mu_3(\mathbf{X})$ as being the multivariate skewness of $\mathbf{X}$. It is a tensor of rank three (that is, with three indices) whose values are linear combinations of the skewnesses of various sums and differences of the $X_i$. If we were to seek interpretations, then, we would think of these components as measuring in $p$ dimensions whatever the skewness is measuring in one dimension. In many cases,

The first moments measure the location of a distribution;
The second moments (the variance-covariance matrix) measure its spread;
The standardized second moments (the correlations) indicate how the spread varies in $p$-dimensional space; and
The standardized third and fourth moments are taken to measure the shape of a distribution relative to its spread.
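The third-moment Polarization Identity (the one with the factor 24) can be checked on data in the same spirit. This is a sketch under assumptions: the names `m3` and `cross3`, the seed, and the simulated skewed data are illustrative, and the moments are empirical with divisor n.

```python
import random
from statistics import fmean

def m3(v):
    """Univariate empirical third central moment."""
    m = fmean(v)
    return fmean([(a - m) ** 3 for a in v])

def cross3(x, y, z):
    """Empirical E[X' Y' Z'] for centered X', Y', Z'."""
    mx, my, mz = fmean(x), fmean(y), fmean(z)
    return fmean([(a - mx) * (b - my) * (c - mz) for a, b, c in zip(x, y, z)])

rng = random.Random(3)  # arbitrary seed
n = 1000
x = [rng.expovariate(1.0) for _ in range(n)]
y = [a + rng.expovariate(1.0) for a in x]
z = [a - b + rng.gauss(0, 1) for a, b in zip(x, y)]

def combo(sy, sz):
    """The sample of X + sy*Y + sz*Z for signs sy, sz in {+1, -1}."""
    return [a + sy * b + sz * c for a, b, c in zip(x, y, z)]

lhs = 24 * cross3(x, y, z)
rhs = m3(combo(1, 1)) - m3(combo(-1, 1)) - m3(combo(1, -1)) + m3(combo(-1, -1))
print(lhs, rhs)  # should agree up to floating-point error
```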


Reference

Alan Stuart and J. Keith Ord, Kendall's Advanced Theory of Statistics, Fifth Edition, Volume 1: Distribution Theory; Chapter 3, Moments and Cumulants. Oxford University Press (1987).

Appendix: Proof of the Polarization Identity

Let $x_1,\ldots, x_n$ be algebraic variables. There are $2^n$ ways to add and subtract all $n$ of them. When we raise each of these sums-and-differences to the $n^\text{th}$ power, pick a suitable sign for each of those results, and add them up, we will get a multiple of $x_1x_2\cdots x_n$.

More formally, let $S=\{1,-1\}^n$ be the set of all $n$-tuples of $\pm 1$, so that any element $s\in S$ is a vector $s=(s_1,s_2,\ldots,s_n)$ whose coefficients are all $\pm 1$. The claim is

$2^n n!\, x_1x_2\cdots x_n = \sum_{s\in S} \color{red}{s_1s_2\cdots s_n}(s_1x_1+s_2x_2+\cdots+s_nx_n)^n.\tag{1}$

Indeed, the Multinomial Theorem states that the coefficient of the monomial $x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}$ (where the $i_j$ are nonnegative integers summing to $n$) in the expansion of any term on the right hand side is

$\binom{n}{i_1,i_2,\ldots,i_n}s_1^{i_1}s_2^{i_2}\cdots s_n^{i_n}.$

In the sum $(1)$, the coefficients involving $x_1^{i_1}$ appear in pairs where one of each pair involves the case $s_1=1$, with coefficient proportional to $\color{red}{s_1}$ times $s_1^{i_1}$, equal to $1$, and the other of each pair involves the case $s_1=-1$, with coefficient proportional to $\color{red}{-1}$ times $(-1)^{i_1}$, equal to $(-1)^{i_1+1}$. They cancel in the sum whenever $i_1+1$ is odd. The same argument applies to $i_2, \ldots, i_n$. Consequently, the only monomials that occur with nonzero coefficients must have odd powers of all the $x_i$. The only such monomial is $x_1x_2\cdots x_n$. It appears with coefficient $\binom{n}{1,1,\ldots,1}=n!$ in all $2^n$ terms of the sum. Consequently its coefficient is $2^nn!$, QED.

We need take only half of each pair associated with $x_1$: that is, we can restrict the right hand side of $(1)$ to the terms with $s_1=1$ and halve the coefficient on the left hand side to $2^{n-1}n!$. That gives precisely the two versions of the Polarization Identity quoted in this answer for the cases $n=2$ and $n=3$: $2^{2-1}2! = 4$ and $2^{3-1}3!=24$.

Of course the Polarization Identity for algebraic variables immediately implies it for random variables: let each $x_i$ be a random variable $X_i$ . Take expectations of both sides. The result follows by linearity of expectation.
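Identity $(1)$ of this appendix and its halved form can also be spot-checked with exact integer arithmetic. The sketch below is not a proof, only a check at randomly chosen integer points; the function names and the seed are made up for illustration.

```python
import random
from itertools import product
from math import factorial, prod

def full_identity(xs):
    """Both sides of 2^n n! x_1...x_n = sum over all sign vectors s."""
    n = len(xs)
    lhs = 2 ** n * factorial(n) * prod(xs)
    rhs = sum(prod(s) * sum(si * xi for si, xi in zip(s, xs)) ** n
              for s in product((1, -1), repeat=n))
    return lhs, rhs

def halved_identity(xs):
    """Both sides of 2^(n-1) n! x_1...x_n = sum over sign vectors with s_1 = 1."""
    n = len(xs)
    lhs = 2 ** (n - 1) * factorial(n) * prod(xs)
    rhs = sum(prod(s) * sum(si * xi for si, xi in zip(s, xs)) ** n
              for s in product((1, -1), repeat=n) if s[0] == 1)
    return lhs, rhs

rng = random.Random(4)  # arbitrary seed
for n in range(1, 6):
    xs = [rng.randint(-9, 9) for _ in range(n)]
    print(n, full_identity(xs), halved_identity(xs))
```

For $n=2$ and $n=3$ the halved left hand side carries the factors 4 and 24 that appear in the identities used in the answer.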

whuber

Well done on explaining so far! Multivariate skewness kind of makes sense. Could you perhaps add an example that would show the importance of this multivariate skewness? Either as an issue in a statistical model or, perhaps more interesting, a real-life case that would be subject to multivariate skewness :)

PascalVKooten

3

Hmmm. If we run...

a <- rnorm(100);
b <- rnorm(100);
c <- rnorm(100)
mean((a-mean(a))*(b-mean(b))*(c-mean(c)))/
  (sd(a) * sd(b) * sd(c))

it does seem to center on 0 (I haven't done a real simulation), but as @ttnphns alludes, running this (all variables the same)

a <- rnorm(100)
mean((a-mean(a))*(a-mean(a))*(a-mean(a)))/
  (sd(a) * sd(a) * sd(a))

also seems to center on 0, which certainly makes me wonder what use this could be.
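Here is a rough version of the simulation that was not run above, written as a Python sketch. The function names, the number of replications, and the seed are arbitrary choices, and the standard deviations use the divisor n rather than R's n-1. The third case, with an exponential variable, is an extra illustration that is not in the R code: its population skewness is 2, so when all three variables are the same the statistic is just the sample skewness and need not center on 0.

```python
import random
from statistics import fmean, pstdev

def third_cross_moment(a, b, c):
    """Mean of the product of centered values, divided by the product of SDs."""
    ma, mb, mc = fmean(a), fmean(b), fmean(c)
    numerator = fmean([(x - ma) * (y - mb) * (z - mc) for x, y, z in zip(a, b, c)])
    return numerator / (pstdev(a) * pstdev(b) * pstdev(c))

def average_statistic(draw, same_variable, reps=1000, n=100, seed=0):
    """Average the statistic over `reps` simulated samples of size `n`."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        if same_variable:
            b = c = a                      # all three variables identical
        else:
            b = [draw(rng) for _ in range(n)]
            c = [draw(rng) for _ in range(n)]
        values.append(third_cross_moment(a, b, c))
    return fmean(values)

def normal(rng):
    return rng.gauss(0.0, 1.0)

def exponential(rng):
    return rng.expovariate(1.0)

avg_independent = average_statistic(normal, same_variable=False)
avg_same_normal = average_statistic(normal, same_variable=True)
avg_same_skewed = average_statistic(exponential, same_variable=True)
print(avg_independent, avg_same_normal, avg_same_skewed)
```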

Peter Flom - Reinstate Monica

2

The nonsense apparently comes from the fact that sd or variance is a function of squaring, as is covariance. But with 3 variables, cubing occurs in the numerator while denominator remains based on originally squared terms

ttnphns

2

Is that the root of it (pun intended)? Numerator and denominator have the same dimensions and units, which cancel, so that alone doesn't make the measure poorly formed.

Nick Cox

3

@Nick That's right. This is simply one of the multivariate central third moments. It is one component of a rank-three tensor giving the full set of third moments (which is closely related to the order-3 component of the multivariate cumulant generating function). In conjunction with the other components it could be of some use in describing asymmetries (higher-dimensional "skewness") in the distribution. It's not what anyone would call a "correlation," though: almost by definition, a correlation is a second-order property of the standardized variable.

whuber

1

If you need to calculate a "correlation" between three or more variables, you cannot use Pearson, because in this case it will differ for different orderings of the variables (have a look here). If you are interested in linear dependency, or in how well the data are fitted by a 3D line, you may use PCA: obtain the explained variance for the first PC, permute your data, and find the probability that this value may be due to random reasons. I've discussed something similar here (see Technical details below).

Matlab code

% Simulate our experimental data
x=normrnd(0,1,100,1);
y=2*x.*normrnd(1,0.1,100,1);
z=(-3*x+1.5*y).*normrnd(1,2,100,1);
% perform pca
[loadings, scores,variance]=pca([x,y,z]);
% Observed Explained Variance for first principal component
OEV1=variance(1)/sum(variance)
% perform permutations
permOEV1=[];
for iPermutation=1:1000
    permX=datasample(x,numel(x),'replace',false);
    permY=datasample(y,numel(y),'replace',false);
    permZ=datasample(z,numel(z),'replace',false);
    [loadings, scores,variance]=pca([permX,permY,permZ]);
    permOEV1(end+1)=variance(1)/sum(variance);
end

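% Calculate p-value; the observed statistic is counted as one of the
% permutations, so 1 is added to both the numerator and the denominator
p_value=(sum(permOEV1>=OEV1)+1)/(numel(permOEV1)+1)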

zlon
