If the p-value is exactly 1 (1.0000000), what should the limits of the confidence interval be in order to support the null hypothesis being true? [closed]


This is a purely hypothetical question. A very common statement is that $H_0$ is never true, it's just a matter of sample size.

Suppose that in reality there is absolutely no measurable difference between two means ($\mu_1 = \mu_2$), drawn from a normally distributed population (with $\mu = 0$ and, approximately, $\sigma = 1$ for both). We assume $N = 16$ per group and use a $t$-test. This would mean that the $p$-value is $1.00000$, indicating that there is absolutely no discrepancy from $H_0$. It would also mean that the test statistic is $0$ and that the mean difference between the groups is $0$. What would be the limits of the $95\%$ confidence interval for the mean difference in this case? Would they be $[0.0, 0.0]$?

The main point of my question is: when can we really say that $H_0$ is true, i.e. $\mu_1 = \mu_2$ in this case? Or, in a frequentist framework, when can we really say "no difference" when comparing two means?

arkiaamu
I would say this has already been answered here: stats.stackexchange.com/questions/275677/… , but I won't insist on that.
Tim
I'm having trouble coming up with a way to get $p = 1$ with positive population variance.
Dave
"We assume N = 16 per group and use a t-test. This would mean that the p-value is 1.00000, indicating that there is absolutely no discrepancy from H0." Why do you claim that something (what does "this" refer to?) implies that the p-value is 1? More commonly, the p-value is uniformly distributed when $H_0$ is true, and $p = 1$ almost never happens.
Sextus Empiricus
@MartijnWeterings is absolutely right: the fact that you sample from two effectively identical distributions does not mean that comparing them will give you a p-value of 1. By definition, 5% of the time you will get a p-value below 0.05.
Nuclear Wang

Answers:


The confidence interval for the t-test is of the form $\bar{x}_1 - \bar{x}_2 \pm t_{\text{crit},\alpha}\, s_{\bar{x}_1 - \bar{x}_2}$, where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $t_{\text{crit},\alpha}$ is the critical value of $t$ at the given $\alpha$, and $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference in means. If $p = 1.0$, then $\bar{x}_1 - \bar{x}_2 = 0$. So the formula is simply $\pm t_{\text{crit},\alpha}\, s_{\bar{x}_1 - \bar{x}_2}$, and the limits are simply $\{-t_{\text{crit},\alpha}\, s_{\bar{x}_1 - \bar{x}_2},\; t_{\text{crit},\alpha}\, s_{\bar{x}_1 - \bar{x}_2}\}$.

I'm not sure why you would think the limits would be {0,0}. The critical t value is not zero and the standard error of the mean difference is not zero.
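
For the numbers in the question (n = 16 per group, an observed mean difference of 0, and sample SDs assumed to equal 1, since the question only says that $\sigma$ is approximately 1), a minimal R sketch of this formula:

n  <- 16
s1 <- 1                            # assumed sample SD in group 1
s2 <- 1                            # assumed sample SD in group 2
se_diff <- sqrt(s1^2/n + s2^2/n)   # standard error of the mean difference
t_crit  <- qt(0.975, df = 2*n - 2) # critical t for a 95% CI (pooled df = 30)
c(-1, 1) * t_crit * se_diff        # limits: roughly -0.72 and 0.72

So even when the observed mean difference is exactly zero, the interval has nonzero width.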

Noah

Being super-lazy, using R to solve the problem numerically rather than doing the calculations by hand:

Define a function that will give normally distributed values with a mean of (almost!) exactly zero and a SD of exactly 1:

rn2 <- function(n) {r <- rnorm(n); c(scale(r)) }
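
A quick sanity check of this helper (a hypothetical usage line, not needed for the rest of the answer):

x <- rn2(16)
c(mean = mean(x), sd = sd(x))   # mean is on the order of 1e-17 (floating point), sd is exactly 1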

Run a t-test:

t.test(rn2(16),rn2(16))

    Welch Two Sample t-test

data:  rn2(16) and rn2(16)
t = 1.7173e-17, df = 30, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7220524  0.7220524
sample estimates:
   mean of x    mean of y 
6.938894e-18 8.673617e-19 

The means are not exactly zero because of floating-point imprecision.

More directly, the CI limits are ± sqrt(1/8)*qt(0.975,df=30); the variance of each sample mean is 1/16, so the variance of the difference in means is 1/16 + 1/16 = 1/8.

Ben Bolker

The CI can have any limits, but it is centered exactly around zero

For a two-sample t-test (testing for a difference in the means of two populations), a p-value of exactly one corresponds to the case where the observed sample means are exactly equal. (The sample variances can take on any values.) To see this, note that the p-value function for the test is:

$$p \equiv p(\mathbf{x},\mathbf{y}) = \mathbb{P}\left(\left|\frac{\bar{X}-\bar{Y}}{\sqrt{S_X^2/n_X+S_Y^2/n_Y}}\right| \geq \left|\frac{\bar{x}-\bar{y}}{\sqrt{s_X^2/n_X+s_Y^2/n_Y}}\right|\right).$$

Thus, setting $\bar{x}=\bar{y}$ yields:

$$p(\mathbf{x},\mathbf{y}) = \mathbb{P}\left(\left|\frac{\bar{X}-\bar{Y}}{\sqrt{S_X^2/n_X+S_Y^2/n_Y}}\right| \geq 0\right) = 1.$$

Now, suppose you form the standard (approximate) confidence interval using the Welch-Satterthwaite approximation. In this case, assuming that $\bar{x}=\bar{y}$ (to give an exact p-value of one) gives the confidence interval:

$$\text{CI}(1-\alpha) = \left[\, 0 \pm t_{DF,\alpha/2} \sqrt{\frac{s_X^2}{n_X}+\frac{s_Y^2}{n_Y}} \,\right],$$

where the degrees of freedom $DF$ are determined by the Welch-Satterthwaite approximation. Depending on the observed sample variances in the problem, the confidence interval can be any finite interval centered around zero. That is, the confidence interval can have any limits, so long as it is centered exactly around zero.
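
As a toy illustration (made-up numbers, not taken from the question), two samples with exactly equal means but very different spreads still give $p = 1$, and the resulting Welch interval is symmetric around zero, with its width set by the sample variances:

x <- c(-3, -1, 0, 1, 3)       # mean 0, small spread
y <- c(-30, -10, 0, 10, 30)   # mean 0, large spread
t.test(x, y)                  # Welch two-sample t-test: t = 0, p-value = 1
t.test(x, y)$conf.int         # limits symmetric around 0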


Of course, if the underlying data actually come from a continuous distribution, this event occurs with probability zero, but let's assume it happens.

Ben - Reinstate Monica
The question says "σ estimated =1".
Acccumulation
That condition is not necessary to get a p-value of one, so I have dropped it.
Ben - Reinstate Monica

It is difficult to have a cogent philosophical discussion about things that have 0 probability of happening. So I will show you some examples that relate to your question.

If you have two enormous independent samples from the same distribution, then both samples will still have some variability, the pooled 2-sample t statistic will be near, but not exactly 0, the P-value will be distributed as Unif(0,1), and the 95% confidence interval will be very short and centered very near 0.

An example of one such dataset and t test:

set.seed(902)
x1 = rnorm(10^5, 100, 15)  
x2 = rnorm(10^5, 100, 15)
t.test(x1, x2, var.eq=T)

        Two Sample t-test

data:  x1 and x2
t = -0.41372, df = 2e+05, p-value = 0.6791
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1591659  0.1036827
sample estimates:
mean of x mean of y 
 99.96403  99.99177 

Here are summarized results from 10,000 such situations. First, the distribution of P-values.

set.seed(2019)
pv = replicate(10^4, 
   t.test(rnorm(10^5,100,15),rnorm(10^5,100,15),var.eq=T)$p.val)
mean(pv)
[1] 0.5007066   # aprx 1/2
hist(pv, prob=T, col="skyblue2", main="Simulated P-values")
 curve(dunif(x), add=T, col="red", lwd=2, n=10001)

[Figure: histogram of the simulated p-values with the Unif(0,1) density curve overlaid]

Next the test statistic:

set.seed(2019)  # same seed as above, so same 10^4 datasets
st = replicate(10^4, 
       t.test(rnorm(10^5,100,15),rnorm(10^5,100,15),var.eq=T)$stat)
mean(st)
[1] 0.002810332  # aprx 0
hist(st, prob=T, col="skyblue2", main="Simulated t statistics")
 curve(dt(x, df=2e+05), add=T, col="red", lwd=2, n=10001)

[Figure: histogram of the simulated t statistics with the t density (df = 2e+05) curve overlaid]

And so on for the width of the CI.

set.seed(2019)
w.ci = replicate(10^4, 
        diff(t.test(rnorm(10^5,100,15),
         rnorm(10^5,100,15),var.eq=T)$conf.int)) 
mean(w.ci)
[1] 0.2629603

It is almost impossible to get a P-value of exactly 1 when doing an exact test with continuous data whose assumptions are met. So much so that a wise statistician will ponder what might have gone wrong upon seeing a P-value of 1.

For example, you might give the software two identical large samples. The programming will carry on as if these are two independent samples, and give strange results. But even then the CI will not be of 0 width.

set.seed(902)
x1 = rnorm(10^5, 100, 15)  
x2 = x1
t.test(x1, x2, var.eq=T)

        Two Sample t-test

data:  x1 and x2
t = 0, df = 2e+05, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: 
 -0.1316593  0.1316593
sample estimates:
mean of x mean of y 
 99.96403  99.96403 
BruceET
That's all fair enough; however, because the normal distribution is continuous, the probability of any specific outcome is zero, no matter whether $\mu_1=\mu_2$ or $\mu_1-\mu_2=-0.977$ or whatever. I was tempted to comment along the lines of "this will never happen and chances are something went wrong in that case" as well, but then I thought, no, it makes some sense to say: assume this has happened, accepting that it has probability zero, as does any specific outcome.
Lewian
This is the right answer to the wrong question
David
@David Possibly so. If you can state what you believe to be the right question and suggest an answer, that might be helpful all around. I attempted only to address a few of what I thought were several misconceptions.
BruceET
The OP stated "A very common statement is that H0 is never true." @BruceET's answer demonstrates WHY H0 can never be accepted. The closer H0 comes to being true, the more uniformly random P becomes; that means a P between 0.98 and 0.99 is just as likely as a P between 0.10 and 0.11 when H0 is true.
Ron Jensen - We are all Monica

The straightforward answer (+1 to Noah) will explain that the confidence interval for the mean difference may still be of nonzero length because it depends on the observed variation in the sample in a different way than the p-value does.

However, you might still wonder why that is, since it is not so strange to imagine that a high p-value would also mean a small confidence interval. After all, they both correspond to something that is close to a confirmation of the null hypothesis. So why is this thought not correct?

A high p-value is not the same as a small confidence interval.

  • The p-value is an indicator of how extreme a particular observation is (extreme given some hypothesis), expressed as the probability of observing such a deviation. It is an expression of the observed effect size in relation to the accuracy of the experiment (a large observed effect size might not mean very much when the experiment is so 'inaccurate' that such observations are not extreme from a statistical/probabilistic point of view). When you observe a p-value of 1, this (only) means that you observed zero effect, because the probability of observing such a zero result or larger is equal to 1 (but this is not the same as there actually being zero effect).

    Sidenote: Why p-values? The p-value expresses the actually observed effect size in relation to the expected effect sizes (probabilities). This is relevant because experiments might, by design, generate observations of some relevant effect size by pure chance, due to common fluctuations in the data/observations. Requiring that an observation/experiment has a low p-value means that the experiment has high precision - that is: the observed effect size is less often/likely to be due to chance/fluctuations (and it may well be due to a true effect).

    Sidenote: for continuous variables this p-value of exactly 1 occurs almost never, because it is an event of measure zero (e.g. for a normally distributed variable $X \sim N(0,1)$ you have $P(X=0)=0$). But for a discrete variable, or a discretized continuous variable, it can happen (at least the probability is nonzero).

  • The confidence interval might be seen as the range of hypothesized values for which an α-level hypothesis test would not reject (for which the p-value is above α).

    You should note that a high p-value is not (necessarily) proof of, or support for, the null hypothesis. The high p-value only means that the observation is not remarkable/extreme under the given null hypothesis, but this might just as well be the case under the alternative hypothesis (i.e. the result is in accordance with both hypotheses, effect or no effect). This typically occurs when the data does not carry much information (e.g. high noise or a small sample).

Example: Imagine you have a bag of coins containing both fair and unfair coins, and you want to classify a certain coin by flipping it 20 times (say the coin is a Bernoulli variable with $p = 0.5$ for fair coins and $p \sim U(0,1)$ for unfair coins). In this case, when you observe 10 heads and 10 tails, you might say the p-value is equal to 1, but it should be obvious that an unfair coin might just as well produce this result, and we should not rule out the possibility that the coin is unfair.
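
A quick numeric sketch of this coin example (using R's exact binomial test, with $p = 0.6$ as one arbitrary example of an unfair coin):

binom.test(10, 20, p = 0.5)$p.value   # two-sided test against a fair coin: exactly 1
dbinom(10, 20, prob = 0.6)            # probability of the same 10/20 outcome if p = 0.6: about 0.12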

Sextus Empiricus

The main point of my question is: when can we really say that $H_0$ is true, i.e. $\mu_1=\mu_2$ in this case?

No, because "absence of evidence is not evidence of absence." Probability can be thought of as an extension of logic, with added uncertainty, so imagine for a moment that instead of real numbers on the unit interval, the hypothesis test returned only binary values: 0 (false) or 1 (true). In such a case, the basic rules of logic apply, as in the following example:

  • If it rained outside, then the ground being wet is likely.
  • The ground is wet.
  • Therefore, it rained outside.

The ground could very well be wet because it rained. Or it could be due to a sprinkler, someone cleaning their gutters, a water main broke, etc. More extreme examples can be found in the link above.

As for the confidence interval: if your sample is large and $\mu_1-\mu_2 \approx 0$, then the confidence interval for the difference will become extremely narrow, but it will not have zero width. As noticed by others, you could observe things like exact ones and zeros, but that would rather be because of floating-point precision limitations.

Even if you observed $p=1$ and a $\pm 0$ confidence interval, you would still need to keep in mind that the test gives you only an approximate answer. When doing hypothesis testing, we not only make the assumption that $H_0$ is true, but also a number of other assumptions, such as that the samples are independent and come from a normal distribution, which is never exactly the case for real-world data. The test gives an approximate answer to an ill-posed question, so it cannot "prove" the hypothesis; it can only say "under those unreasonable assumptions, this would be unlikely".

Tim

Nothing stops you from using the standard t- or Gaussian formulas for computing the confidence interval; all the information needed is given in your question. $p=1$ doesn't mean that there's anything wrong with that. Note that $p=1$ does not mean that you can be particularly sure that $H_0$ is true. Random variation is still present, and if equal sample means can occur under $H_0$, they can also occur when the true value of $\mu_0$ is slightly different from the true $\mu_1$, so there will be more in the confidence interval than just equality.

Lewian
I did some editing; I hope it's better defined now.
arkiaamu
OK, I removed references to what was ill-defined in the earlier version. The question has in the meantime been answered properly by others.
Lewian
Please use MathJax notation
David

A very common statement is that H0 is never true, it's just a matter of sample size.

Not among people who know what they're talking about, and are speaking precisely. Traditional hypothesis testing never concludes that the null is true, but whether the null is true or not is separate from whether the null is concluded to be true.

This would mean that the p-value is 1.00000

For a two-tailed test, yes.

indicating that there is absolutely no discrepancy from H0.

H0 is a statement about the distribution. The mode of the distribution given in H0 is 0, so there's no discrepancy between the observation and the mode of the distribution, but it's not quite correct to say there's no discrepancy from H0. No individual result would be a discrepancy, because any value could come from the distribution. Each p-value is equally likely. Getting a p-value of exactly .01 is just as likely as getting a p-value of exactly 1 (apart from discretization issues). If you had a bunch of independent samples, and their distribution didn't match what H0 predicts, that would much more legitimately be called a "discrepancy" than would merely seeing a single sample whose mean doesn't match the mode.

What would be the limits of the 95% confidence interval for the mean difference in this case?

To a first approximation, the limits of a 95% confidence interval are about twice the applicable standard error on either side of the estimate. There is no discontinuity at zero. If you find a function $f(\epsilon)$ that gives the 95% confidence interval for a difference in means of $\epsilon$, you can simply take $\lim_{\epsilon \to 0} f(\epsilon)$ to find the confidence interval for a mean difference of zero.
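
A rough R sketch of this limiting argument (with made-up data; f below is just an illustrative helper): shifting one sample by $\epsilon$ moves the interval but does not change its width, so as $\epsilon \to 0$ the interval converges smoothly to one centered at zero.

set.seed(1)
x <- rnorm(16)
y <- rnorm(16)
y <- y - mean(y) + mean(x)                       # force the observed means to be equal
f <- function(eps) t.test(x, y + eps)$conf.int   # 95% CI when one sample is shifted by eps
sapply(c(1, 0.1, 0.001, 0), f)                   # width stays the same; the center moves to 0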

The main point of my question is: when can we really say that $H_0$ is true, i.e. $\mu_1=\mu_2$ in this case?

We can say whatever we want. However, saying that a test shows the null to be true is not consistent with traditional hypothesis testing, regardless of the results. And doing so is not well-founded from an evidentiary standpoint. The alternative hypothesis, that the means are not the same, encompasses all possible differences in means. The alternative hypothesis is "The difference in means is 1, or 2, or 3, or .5, or .1, ..." We can posit an arbitrarily small difference in means, and that will be consistent with the alternative hypothesis. And with an arbitrarily small difference, the probability of the data given that difference is arbitrarily close to the probability given the null. Also, the alternative hypothesis encompasses not only the possibility that the parameters of the distributions, such as the mean, are different, but also that there's an entirely different distribution. For instance, the alternative hypothesis encompasses "The two samples will always have a difference in means that is either exactly 1 or exactly 0, with probability .5 for each." The results are more consistent with that than they are with the null.

Acccumulation