There exist a number of robust estimators of scale. A notable example is the median absolute deviation (MAD), which relates to the standard deviation of a normal distribution as σ = 1.4826 · MAD. In a Bayesian framework there exist a number of ways to estimate the location of a roughly normal distribution robustly (say, a normal contaminated by outliers); for example, one could assume the data is distributed as a t distribution or as a Laplace distribution. Now my question:
What would be a Bayesian model for measuring the scale of a roughly normal distribution in a robust way, robust in the same sense as the MAD or similar robust estimators of scale?
As is the case with the MAD, it would be neat if the Bayesian model could approach the SD of a normal distribution in the case when the distribution of the data actually is normal.
Edit 1:
A typical example of a model that is robust to contamination/outliers, under the assumption that the data is roughly normal, uses a t distribution:

y_i ~ t(μ, s, ν)

where μ is the mean, s is the scale and ν is the degrees of freedom. With suitable priors on μ, s and ν, μ will be an estimate of the mean of y_i that is robust against outliers. However, s will not be a consistent estimate of the SD of y_i, as s depends on ν. For example, if ν were fixed at 4.0 and the model above were fitted to a huge number of samples from a Normal(μ = 0, σ = 1) distribution, then s would be around 0.82. What I'm looking for is a model that is robust, like the t model, but for the SD instead of (or in addition to) the mean.
Edit 2:
Here follows a coded example in R and JAGS of how the t model mentioned above is more robust with respect to the mean.
library(rjags)

# generating some contaminated data
y <- c(rnorm(100, mean = 10, sd = 10),
       rnorm(10, mean = 100, sd = 100))

#### A "standard" normal model ####
model_string <- "model{
  for(i in 1:length(y)) {
    y[i] ~ dnorm(mu, inv_sigma2)
  }
  mu ~ dnorm(0, 0.00001)
  inv_sigma2 ~ dgamma(0.0001, 0.0001)
  sigma <- 1 / sqrt(inv_sigma2)
}"
model <- jags.model(textConnection(model_string), list(y = y))
mcmc_samples <- coda.samples(model, "mu", n.iter=10000)
summary(mcmc_samples)
### The quantiles of the posterior of mu
## 2.5% 25% 50% 75% 97.5%
## 9.8 14.3 16.8 19.2 24.1
#### A (more) robust t-model ####
model_string <- "model{
  for(i in 1:length(y)) {
    y[i] ~ dt(mu, inv_s2, nu)
  }
  mu ~ dnorm(0, 0.00001)
  inv_s2 ~ dgamma(0.0001, 0.0001)
  s <- 1 / sqrt(inv_s2)
  nu ~ dexp(1/30)
}"
model <- jags.model(textConnection(model_string), list(y = y))
mcmc_samples <- coda.samples(model, "mu", n.iter=1000)
summary(mcmc_samples)
### The quantiles of the posterior of mu
## 2.5% 25% 50% 75% 97.5%
## 8.03 9.35 9.99 10.71 12.14
Answers:
Bayesian inference in a T noise model with an appropriate prior will give a robust estimate of location and scale. The precise conditions that the likelihood and prior need to satisfy are given in the paper Bayesian robustness modelling of location and scale parameters by Andrade and O'Hagan (2011). The estimates are robust in the sense that a single observation cannot make the estimates arbitrarily large, as demonstrated in figure 2 of the paper.
When the data is normally distributed, the SD of the fitted T distribution (for fixed ν) does not match the SD of the generating distribution. But this is easy to fix.
Let σ be the standard deviation of the generating distribution and let s be the standard deviation of the fitted T distribution.
If the data is scaled by 2, then from the form of the likelihood we know that s must scale by 2.
This implies that s = σ f(ν) for some fixed function f.
This function can be computed numerically by simulation from a standard normal.
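A minimal sketch of such a simulation (the maximum-likelihood fit via optim and the sample size are my assumptions, not the answer's original code):

# f(nu): SD of a t distribution ML-fitted to standard-normal draws (sigma = 1)
f_nu <- function(nu, n = 1e5) {
  y <- rnorm(n)  # standard normal data, so the generating sigma is 1
  # negative log-likelihood of a location-scale t with fixed nu
  nll <- function(par) {
    mu <- par[1]
    sc <- exp(par[2])  # log-parameterization keeps the scale positive
    -sum(dt((y - mu) / sc, df = nu, log = TRUE) - log(sc))
  }
  sc <- exp(optim(c(0, 0), nll)$par[2])
  sc * sqrt(nu / (nu - 2))  # SD of the fitted t (finite for nu > 2)
}
f_nu(4)  # should land near the value reported below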
For example, at ν = 4 I get f(ν) = 1.18.
The desired estimator is then σ̂ = s / f(ν).
источник
As you are asking a question about a very precise problem (robust estimation), I will offer you an equally precise answer. First, however, I will begin by trying to dispel an unwarranted assumption. It is not true that there is a robust Bayesian estimator of location (there are Bayesian estimators of location, but as I illustrate below they are not robust and, apparently, even the simplest robust estimator of location is not Bayesian). In my opinion, the reasons for the absence of overlap between the 'Bayesian' and 'robust' paradigms in the location case go a long way towards explaining why there also are no estimators of scatter that are both robust and Bayesian.
The OP proposed the following strategy: "I'm wholly unfamiliar with Bayesian analysis. However, I was wondering what is wrong with the following strategy, as it seems simple, effective and yet has not been considered in the other answers. The prior is that the good part of the data is drawn from a symmetric distribution F and that the rate of contamination is less than half. Then, a simple strategy would be to:"
Actually, no. The resulting estimates will only be robust in a very weak sense of the word. However, when we say that the median is robust to outliers we mean the word robust in a much stronger sense. That is, in robust statistics, the robustness of the median refers to the property that if you compute the median on a data set of observations drawn from a uni-modal, continuous model, and then replace fewer than half of these observations by arbitrary values, the value of the median computed on the contaminated data is close to the value you would have obtained on the original (uncontaminated) data set. It is then easy to show that the estimation strategy quoted above is definitely not robust in the sense in which the word is typically understood for the median.
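As a quick, concrete illustration of this stronger (breakdown) sense of robustness, here is a minimal R sketch (not part of the original answer):

set.seed(1)
x <- rnorm(100)  # clean data from a uni-modal, continuous model
median(x)        # about 0.1
x[1:49] <- 1e9   # replace fewer than half the observations by arbitrary values
median(x)        # stays within the range of the clean data
mean(x)          # the mean, by contrast, is dragged to about 5e8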
EDIT:
Thanks to the OP for providing self-contained R code to conduct a bona fide Bayesian analysis of the problem.
The code below compares the Bayesian approach suggested by the O.P. to its alternative from the robust statistics literature (e.g. the fitting method proposed by Gauss for the case where the data may contain as many as n/2 − 2 outliers and the distribution of the good part of the data is Gaussian). The central part of the data is N(1000, 1), to which some amount of contaminants is added; the index w takes value 1 for the outliers. I begin with the approach suggested by the O.P.
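A minimal sketch of these steps (the counts of good and bad observations and the location of the contaminants are illustrative assumptions; the model is the O.P.'s t model from above):

# 100 good points from N(1000, 1) plus 30 gross outliers (counts assumed)
set.seed(123)
n_good <- 100
n_bad  <- 30
y <- c(rnorm(n_good, mean = 1000, sd = 1),
       rnorm(n_bad,  mean = 2,    sd = 1))  # contaminants far below the bulk
w <- c(rep(0, n_good), rep(1, n_bad))       # w = 1 flags the outliers

# Fit the O.P.'s robust t model to the contaminated data
model <- jags.model(textConnection(model_string), list(y = y))
mcmc_samples <- coda.samples(model, c("mu", "s"), n.iter = 10000)
summary(mcmc_samples)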
The location and scale estimates I get from this model are quite far from the target values of 1000 and 1.
For the robust method, one gets estimates that are very close to the target values.
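A minimal sketch of such a robust alternative (assuming the Gauss-style method referred to is the classical median/MAD pair; R's mad() already includes the 1.4826 factor that makes it consistent for the SD at the normal):

mu_rob <- median(y)  # robust location: unaffected by up to ~half contamination
s_rob  <- mad(y)     # robust scale, consistent for the SD under normality
c(mu_rob, s_rob)     # close to the target values 1000 and 1

Observations can then be flagged as outliers from their robust z-scores, e.g. abs(y - mu_rob) / s_rob exceeding a cutoff.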
The second result is much closer to the real values. But it gets worse. If we classify as outliers those observations whose estimated z-score exceeds a chosen quantile of F (remember that the prior is that F is Gaussian), then the Bayesian approach finds that all the observations are outliers, whereas the robust procedure, in contrast, flags all and only the outliers as such. This also implies that if you were to run a usual (non-robust) Bayesian analysis on the data not classified as outliers by the robust procedure, you should do fine (e.g. fulfil the objectives stated in your question). This is just an example, but it's actually fairly straightforward to show (and it can be done formally; see, for example, chapter 2 of [1]) that the parameters of a Student t distribution fitted to contaminated data cannot be depended upon to reveal the outliers.
In Bayesian analysis, using the inverse-Gamma distribution as a prior for the variance (equivalently, a Gamma prior for the precision, the inverse of the variance) is a common choice, or the inverse-Wishart distribution for multivariate models. Adding a prior on the variance improves robustness against outliers.
There is a nice paper by Andrew Gelman, "Prior distributions for variance parameters in hierarchical models", where he discusses good choices for the priors on variances.
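One of the options discussed there is a half-Cauchy prior on the standard deviation. A minimal JAGS sketch of that choice (the scale value 5 and the rest of the model are illustrative assumptions, not taken from the paper):

model_string <- "model{
  for(i in 1:length(y)) {
    y[i] ~ dnorm(mu, 1 / pow(sigma, 2))
  }
  mu ~ dnorm(0, 0.00001)
  # Cauchy(0, 5) truncated to sigma > 0: a weakly informative prior on scale
  sigma ~ dt(0, 1 / pow(5, 2), 1) T(0, )
}"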
A robust estimator for the location parameter μ of a dataset of size N is obtained when one assigns a Jeffreys prior to the variance σ² of the normal distribution and computes the marginal posterior for μ, yielding a t distribution with N − 1 degrees of freedom.
Similarly, if you want a robust estimator for the standard deviation σ of some data D, we can do the following. First, we suppose that the data is normally distributed when its mean and standard deviation are known. Therefore,

p(D | μ, σ) = ∏ᵢ N(yᵢ | μ, σ²).
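The rest of the derivation can be completed along standard lines (this sketch is mine, assuming a flat prior on μ and the Jeffreys prior p(σ) ∝ 1/σ). Marginalizing out μ gives

p(D | σ) = ∫ p(D | μ, σ) dμ ∝ σ^−(N−1) exp(−(N−1)s² / (2σ²)), with s² = Σᵢ (yᵢ − ȳ)² / (N−1),

and multiplying by the Jeffreys prior yields

p(σ | D) ∝ σ^−N exp(−(N−1)s² / (2σ²)),

i.e. σ² follows a scaled inverse-χ² distribution with N − 1 degrees of freedom and scale s², so the posterior for σ concentrates around the sample SD when the data really are normal.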
I have followed the discussion from the original question. Rasmus, when you say robustness I am sure you mean robustness in the data (outliers, not mis-specification of distributions). I will take the distribution of the data to be the Laplace distribution instead of a t-distribution; then, as in normal regression where we model the mean, here we will model the median (which is very robust), a.k.a. median regression. Let the model be

Y = Xβ + ε, with the errors ε i.i.d. Laplace(0, σ).
Of course our goal is to estimate the model parameters. We expect our priors to be vague so as to have an objective model. The model at hand has a posterior of the form f(β, σ | Y, X). Giving β a normal prior with large variance makes that prior vague, and a chi-squared prior with small degrees of freedom, mimicking a Jeffreys prior (a vague prior), is given to σ². With a Gibbs sampler, what happens? Normal prior + Laplace likelihood = ??? We do not know the resulting conditional. Chi-squared prior + Laplace likelihood = ??? Again we do not know the distribution. Fortunately for us, there is a theorem in (Aslan, 2010) that transforms a Laplace likelihood into a scale mixture of normal distributions, which then enables us to enjoy the conjugate properties of our priors. I think the whole process described is fully robust in terms of outliers. In a multivariate setting the chi-squared becomes a Wishart distribution, and we use multivariate Laplace and normal distributions.
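A minimal JAGS sketch of such a median regression (JAGS provides the Laplace as ddexp, so for sampling one does not even need the scale-mixture representation; the simple regression structure and the priors are illustrative assumptions):

model_string <- "model{
  for(i in 1:length(y)) {
    y[i] ~ ddexp(mu[i], 1 / sigma)  # Laplace errors: mu[i] is the median of y[i]
    mu[i] <- beta0 + beta1 * x[i]
  }
  beta0 ~ dnorm(0, 0.00001)  # vague normal priors on the coefficients
  beta1 ~ dnorm(0, 0.00001)
  sigma ~ dunif(0, 100)      # an illustrative vague prior on the scale
}"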
Suppose that you have K groups and you want to model the distribution of their sample variances, perhaps in relation to some covariates x. That is, suppose that your data point for group k ∈ 1…K is Var(y_k) ∈ [0, ∞). The question here is: "What is a robust model for the likelihood of the sample variance?" One way to approach this is to model the transformed data ln[Var(y_k)] as coming from a t distribution, which, as you have already mentioned, is a robust version of the normal distribution (a minimal sketch follows below). If you don't feel like assuming that the transformed variance is approximately normal as n → ∞, then you could choose a probability distribution with positive real support that is known to have heavy tails compared to another distribution with the same location. For example, there is a recent answer to a question on Cross Validated about whether the lognormal or the gamma distribution has heavier tails, and it turns out that the lognormal does (thanks to @Glen_b for that contribution). In addition, you could explore the half-Cauchy family.
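A minimal JAGS sketch of that first suggestion (the regression structure, priors and variable names are illustrative assumptions; log_var[k] = ln(Var(y_k)) is computed from the raw data beforehand):

model_string <- "model{
  for(k in 1:K) {
    log_var[k] ~ dt(mu[k], inv_tau2, nu)  # robust t likelihood on log variances
    mu[k] <- alpha + beta * x[k]
  }
  alpha ~ dnorm(0, 0.00001)
  beta ~ dnorm(0, 0.00001)
  inv_tau2 ~ dgamma(0.0001, 0.0001)
  nu ~ dexp(1/30)
}"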
Similar reasoning applies if instead you are assigning a prior distribution to a scale parameter of a normal distribution. Tangentially, the lognormal and inverse-gamma distributions are not advisable if you want to form a boundary-avoiding prior for the purposes of posterior mode approximation, because they peak sharply if you parameterize them so that the mode is near zero. See BDA3, chapter 13, for discussion. So in addition to identifying a robust model in terms of tail thickness, keep in mind that kurtosis may matter to your inference too.
I hope this helps you as much as your answer to one of my recent questions helped me.