Does MLE require iid data? Or just independent parameters?

16

Estimating parameters using maximum likelihood estimation (MLE) involves evaluating the likelihood function, which maps the probability of the sample (X) occurring to values (x) on the parameter space (θ), given a distribution family (P(X = x | θ)), over possible values of θ (note: am I right on this?). All examples I've seen involve calculating P(X = x | θ) by taking the product of F(X), where F is the distribution with the local value for θ and X is the sample (a vector).

Since we are just multiplying the data, does it follow that the data must be independent? For example, could we not use MLE to fit time-series data? Or do the parameters just have to be independent?

maximum-likelihood Felix

14

The likelihood function is defined as the probability of an event $E$ (the data set ${\bf x}$) as a function of the model parameters $\theta$:

${\mathcal L}(\theta;{\bf x})\propto {\mathbb P}(\text{Event }E;\theta)= {\mathbb P}(\text{observing } {\bf x};\theta).$

Therefore, there is no assumption of independence of the observations. In the classical approach there is no definition of independence of parameters, since they are not random variables; some related concepts could be identifiability, parameter orthogonality, and independence of the maximum likelihood estimators (which are random variables).

Some examples:

(1). Discrete case. If ${\bf x}=(x_1,...,x_n)$ is a sample of (independent) discrete observations with ${\mathbb P}(\text{observing } x_j ; \theta)>0$, then

${\mathcal L}(\theta;{\bf x})\propto \prod_{j=1}^n{\mathbb P}(\text{observing } x_j ; \theta).$

In particular, if $x_j\sim \text{Binomial}(N,\theta)$, with $N$ known, we have

${\mathcal L}(\theta;{\bf x})\propto \prod_{j=1}^n \theta^{x_j}(1-\theta)^{N-x_j}.$
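As a rough illustration of the binomial likelihood above, the sketch below maximizes its logarithm by a crude grid search and compares the result with the closed-form maximizer $\sum_j x_j/(nN)$. The data values, the grid resolution, and the function names are hypothetical choices made for this example only.

```python
import math

def binomial_log_likelihood(theta, xs, N):
    """Log of prod_j theta^{x_j} (1 - theta)^{N - x_j}; binomial coefficients are dropped
    because they do not depend on theta."""
    successes = sum(xs)
    failures = len(xs) * N - successes
    return successes * math.log(theta) + failures * math.log(1.0 - theta)

def binomial_mle_grid(xs, N, steps=10000):
    """Crude grid search over theta in the open interval (0, 1)."""
    grid = [(k + 0.5) / steps for k in range(steps)]
    return max(grid, key=lambda t: binomial_log_likelihood(t, xs, N))

xs = [3, 5, 4, 6, 2]  # hypothetical counts, each out of N trials
N = 10
theta_closed_form = sum(xs) / (len(xs) * N)  # analytic maximizer of the likelihood
theta_grid = binomial_mle_grid(xs, N)        # numerical maximizer, for comparison
```

The two estimates should agree up to the grid spacing, which is a quick sanity check that the product form really is the function being maximized.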

(2). Continuous approximation. Let ${\bf x}=(x_1,...,x_n)$ be a sample from a continuous random variable $X$, with distribution $F$ and density $f$, observed with measurement error $\epsilon$; that is, you observe the sets $(x_j-\epsilon,x_j+\epsilon)$. Then

${\mathcal L}(\theta;{\bf x})\propto \prod_{j=1}^n {\mathbb P}[\text{observing } (x_j-\epsilon,x_j+\epsilon);\theta] = \prod_{j=1}^n[F(x_j+\epsilon;\theta)-F(x_j-\epsilon;\theta)].$

When $\epsilon$ is small, this can be approximated (using the mean value theorem) by

${\mathcal L}(\theta;{\bf x})\propto \prod_{j=1}^n f(x_j;\theta).$
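The approximation can be checked numerically. The sketch below assumes a normal model (an assumption made only for this illustration) and compares the interval-based log-likelihood with the density-based one plus $n\ln(2\epsilon)$. The factor $2\epsilon$ does not depend on $\theta$, which is why it disappears into the proportionality sign. The sample, parameter values, and function names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def interval_log_likelihood(xs, mu, sigma, eps):
    """Sum over j of log[F(x_j + eps) - F(x_j - eps)]."""
    return sum(
        math.log(normal_cdf(x + eps, mu, sigma) - normal_cdf(x - eps, mu, sigma))
        for x in xs
    )

def density_log_likelihood(xs, mu, sigma):
    """Sum over j of log f(x_j)."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in xs)

xs = [0.3, -1.1, 0.8, 1.9, 0.2]  # hypothetical sample
eps = 1e-3                       # hypothetical measurement precision
exact = interval_log_likelihood(xs, 0.5, 1.2, eps)
# Each interval probability is roughly 2 * eps * f(x_j), hence the additive constant.
approx = density_log_likelihood(xs, 0.5, 1.2) + len(xs) * math.log(2.0 * eps)
```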

For an example with the normal case, take a look at this.

(3). Dependent and Markov model. Suppose that ${\bf x}=(x_1,...,x_n)$ is a set of possibly dependent observations and let $f$ be the joint density of ${\bf x}$; then

${\mathcal L}(\theta;{\bf x})\propto f({\bf x}; \theta).$

If, additionally, the Markov property is satisfied, then

${\mathcal L}(\theta;{\bf x})\propto f({\bf x}; \theta) = f(x_1;\theta)\prod_{j=1}^{n-1} f(x_{j+1} \vert x_j ;\theta).$
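To make the Markov factorization concrete for time-series data, here is a minimal sketch that assumes a stationary, zero-mean Gaussian AR(1) process, $x_{j+1} = \phi x_j + e_j$ with $e_j \sim N(0, \sigma^2)$. That model, the series, the fixed innovation variance, and the grid over $\phi$ are all assumptions chosen for illustration, not something stated in the thread.

```python
import math

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def ar1_log_likelihood(xs, phi, sigma2):
    """log f(x_1) + sum_j log f(x_{j+1} | x_j) for a stationary Gaussian AR(1)."""
    # x_1 is drawn from the stationary distribution N(0, sigma2 / (1 - phi^2)).
    loglik = normal_logpdf(xs[0], 0.0, sigma2 / (1.0 - phi ** 2))
    # Every later point depends on the past only through its predecessor.
    for prev, cur in zip(xs, xs[1:]):
        loglik += normal_logpdf(cur, phi * prev, sigma2)
    return loglik

xs = [0.5, 0.9, 0.4, -0.2, -0.6, -0.1, 0.3, 0.8]  # hypothetical series
phi_grid = [k / 100.0 for k in range(-99, 100)]   # stationarity requires |phi| < 1
phi_hat = max(phi_grid, key=lambda p: ar1_log_likelihood(xs, p, sigma2=0.25))
```

The observations are clearly dependent here, yet the likelihood is still a product: a product of conditional densities rather than of marginal ones.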

Also take a look at this.

Community

3

From the moment you write the likelihood function as a product, you are implicitly assuming a dependence structure among the observations. So for MLE one needs two assumptions: (a) one on the distribution of each individual outcome and (b) one on the dependence among the outcomes.

10

(+1) Very good question.

Minor thing, MLE stands for maximum likelihood estimate (not multiple), which means that you just maximize the likelihood. This does not specify that the likelihood has to be produced by IID sampling.

If the dependence of the sampling can be written in the statistical model, you just write the likelihood accordingly and maximize it as usual.

The one case worth mentioning when you do not assume independence is that of multivariate Gaussian sampling (in time series analysis, for example). The dependence between two Gaussian variables can be modelled by their covariance term, which you incorporate in the likelihood.

To give a simplistic example, assume that you draw a sample of size $2$ from correlated Gaussian variables with same mean and variance. You would write the likelihood as

$\frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}}\exp\left(-\frac{z}{2\sigma^2(1-\rho^2)}\right),$

where $z$ is

$z = (x_1-\mu)^2-2\rho(x_1-\mu)(x_2-\mu)+(x_2-\mu)^2.$

This is not the product of the individual likelihoods. Still, you would maximize this over the parameters $(\mu, \sigma, \rho)$ to get their MLE.
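A small numerical check of that statement, as a sketch: the code below evaluates the joint log-likelihood from the formula above and compares it with the sum of the two marginal Gaussian log-likelihoods. The observation pair and parameter values are hypothetical, as are the function names.

```python
import math

def bivariate_log_likelihood(x1, x2, mu, sigma, rho):
    """Log of the correlated-pair likelihood, with z as defined in the text."""
    z = (x1 - mu) ** 2 - 2.0 * rho * (x1 - mu) * (x2 - mu) + (x2 - mu) ** 2
    norm = 2.0 * math.pi * sigma ** 2 * math.sqrt(1.0 - rho ** 2)
    return -math.log(norm) - z / (2.0 * sigma ** 2 * (1.0 - rho ** 2))

def marginal_log_likelihood(x, mu, sigma):
    """Log-density of a single N(mu, sigma^2) observation."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

x1, x2 = 1.2, 0.7  # hypothetical pair of observations
joint = bivariate_log_likelihood(x1, x2, mu=1.0, sigma=0.5, rho=0.8)
product = marginal_log_likelihood(x1, 1.0, 0.5) + marginal_log_likelihood(x2, 1.0, 0.5)
# With rho = 0 the joint likelihood collapses to the product of the marginals.
joint_rho0 = bivariate_log_likelihood(x1, x2, mu=1.0, sigma=0.5, rho=0.0)
```

Only when $\rho = 0$ do `joint_rho0` and `product` coincide; for any other correlation the product form would be the wrong likelihood.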

gui11aume

2

These are good answers and examples. The only thing I would add, to put this in simple terms, is that likelihood estimation only requires that a model for the generation of the data be specified in functional form in terms of some unknown parameters.

Michael R. Chernick

(+1) Absolutely true! Do you have an example of a model that cannot be specified in those terms?

gui11aume

@gui11aume I think you are referring to my remark. I would say that I was not giving a direct answer to the question. The answer to the question is yes, because there are examples where the likelihood function can be expressed when the data are generated by dependent random variables.

Michael R. Chernick

2

Examples where this cannot be done would be where the data are given without any description of the data-generating mechanism, or where the model is not presented in a parametric form, such as when you are given two iid data sets and are asked to test whether they come from the same distribution, specifying only that the distributions are absolutely continuous.

Michael R. Chernick

4

Of course, Gaussian ARMA models possess a likelihood, as their covariance function can be derived explicitly. This is basically an extension of gui11aume's answer to more than 2 observations. Minimal googling produces papers like this one where the likelihood is given in the general form.

Another, to an extent, more intriguing, class of examples is given by multilevel random effect models. If you have data of the form

$y_{ij} = x_{ij}'\beta + u_i + \epsilon_{ij},$

where indices $j$ are nested in $i$ (think of students $j$ in classrooms $i$, say, for a classic application of multilevel models), then, assuming $\epsilon_{ij} \perp u_i$, the likelihood is

$\ln L \sim \sum_i \ln \int \prod_j f(y_{ij}|\beta,u_i) {\rm d}F(u_i)$

and is a sum over the likelihood contributions defined at the level of clusters, not individual observations. (Of course, in the Gaussian case, you can push the integrals around to produce an analytic ANOVA-like solution. However, if you have, say, a logit model for your response $y_{ij}$, then there is no way out of numerical integration.)
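As a rough sketch of what that numerical integration can look like, the code below evaluates the cluster-level likelihood for a random-intercept logit model. It assumes a single scalar covariate, a normal random intercept $u_i \sim N(0, \tau^2)$, and a plain midpoint rule truncated at $\pm 8\tau$; all of these are simplifying assumptions for illustration (in practice Gauss-Hermite or adaptive quadrature is the more usual choice), and the data and function names are hypothetical.

```python
import math

def cluster_likelihood(ys, xs, beta, tau, n_nodes=201):
    """Integral over u of prod_j f(y_ij | beta, u) dF(u), with u ~ N(0, tau^2),
    approximated by the midpoint rule on [-8 * tau, 8 * tau]."""
    lo, hi = -8.0 * tau, 8.0 * tau
    h = (hi - lo) / n_nodes
    total = 0.0
    for k in range(n_nodes):
        u = lo + (k + 0.5) * h
        weight = math.exp(-0.5 * (u / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        cond = 1.0
        # Given u, the responses within a cluster are independent Bernoulli draws.
        for y, x in zip(ys, xs):
            p = 1.0 / (1.0 + math.exp(-(x * beta + u)))
            cond *= p if y == 1 else 1.0 - p
        total += cond * weight * h
    return total

def log_likelihood(clusters, beta, tau):
    """Sum of log cluster contributions: the unit of independence is the cluster."""
    return sum(math.log(cluster_likelihood(ys, xs, beta, tau)) for ys, xs in clusters)

# Hypothetical data: two clusters, each a (responses, covariates) pair.
clusters = [([1, 0, 1], [0.2, -0.5, 1.0]), ([0, 0, 1, 0], [0.1, 0.4, 0.9, -1.2])]
ll = log_likelihood(clusters, beta=0.7, tau=1.0)
```

Passing `log_likelihood` to a numerical optimizer over `beta` and `tau` would then give the MLE, with the dependence inside each cluster handled entirely by the integral.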

StasK

2

@StasK and @gui11aume, these three answers are nice, but I think they miss a point: what about the consistency of the MLE for dependent data?

Stéphane Laurent
