So, I have a data set of percentages, like this:
100 / 10000 = 1% (0.01)
2 / 5 = 40% (0.4)
4 / 3 = 133% (1.3)
1000 / 2000 = 50% (0.5)
I want to find the standard deviation of the percentages, but weighted by their data volume, i.e. the first and last data points should dominate the calculation.
How can I do this? And is there a simple way to do it in Excel?
Answers:
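The formula for the weighted standard deviation is:
$$\sigma_w = \sqrt{\frac{\sum_{i=1}^{N} w_i \left(x_i - \bar{x}^*\right)^2}{\frac{M-1}{M}\sum_{i=1}^{N} w_i}},$$
where $N$ is the number of observations, $M$ is the number of nonzero weights, $w_i$ are the weights, $x_i$ are the observations, and $\bar{x}^*$ is the weighted mean.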
Remember that the formula for the weighted mean is:
$$\bar{x}^* = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}.$$
Use the appropriate weights to get the desired result. In your case I would suggest using $w_i = \frac{\text{number of cases in segment}}{\text{total number of cases}}$.
To do this in Excel, you need to calculate the weighted mean first. Then calculate $(x_i - \bar{x}^*)^2$ in a separate column. The rest should be easy.
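A minimal Python sketch of the steps above, assuming the weighted standard deviation $\sqrt{\sum w_i (x_i - \bar{x}^*)^2 / (\frac{M-1}{M}\sum w_i)}$ with $M$ the number of nonzero weights, and weights equal to each segment's share of the total number of cases. The function name `weighted_std` is hypothetical; each step corresponds to one spreadsheet column.

```python
import math


def weighted_std(values, weights):
    """Weighted standard deviation with the (M - 1) / M correction,
    where M is the number of nonzero weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_w = sum(weights)
    # Step 1: weighted mean, x_bar* = sum(w_i * x_i) / sum(w_i)
    mean = sum(w * x for w, x in zip(weights, values)) / total_w
    # Step 2: the "separate column" of squared deviations (x_i - x_bar*)^2
    sq_dev = [(x - mean) ** 2 for x in values]
    # Step 3: weighted sum of squared deviations over the corrected total weight
    m = sum(1 for w in weights if w != 0)
    if m < 2:
        raise ValueError("need at least two nonzero weights")
    numerator = sum(w * d for w, d in zip(weights, sq_dev))
    denominator = (m - 1) / m * total_w
    return math.sqrt(numerator / denominator)


# The question's data: numerator / denominator for each segment.
numerators = [100, 2, 4, 1000]
denominators = [10000, 5, 3, 2000]
values = [k / n for k, n in zip(numerators, denominators)]
total_cases = sum(denominators)
# Weight = number of cases in segment / total number of cases.
weights = [n / total_cases for n in denominators]

print(weighted_std(values, weights))
```

Because both the numerator and the denominator scale with the weights, using the raw case counts instead of the shares gives the same result.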
The formulae are available in various places, including Wikipedia.
The key is to notice that it depends on what the weights mean. In particular, you will get different answers if the weights are frequencies (i.e. you are just trying to avoid adding up your whole sum), if the weights are in fact the variance of each measurement, or if they're just some external values you impose on your data.
In your case, the weights superficially look like frequencies, but they're not. You generate your data from frequencies, but it's not a simple matter of having 45 records of 3 and 15 records of 4 in your data set. Instead, you need to use the last method. (Actually, all of this is rubbish: you really need a more sophisticated model of the process that is generating these numbers! You apparently do not have something that spits out normally distributed numbers, so characterizing the system with the standard deviation is not the right thing to do.)
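To make the distinction concrete, here is a minimal Python sketch (the helper names are hypothetical) using the 45-records-of-3 and 15-records-of-4 example. When the weights are true frequencies, the weighted variance is exactly the sample variance of the expanded data set; and because frequency weights are counts, rescaling them describes a different data set and changes the answer.

```python
import statistics


def freq_weighted_variance(values, freqs):
    """Sample variance when each weight is an integer count of how many
    times the corresponding value occurs."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, values))
    return ss / (n - 1)


def expand(values, freqs):
    """Write out every record explicitly: value x repeated f times."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]


values = [3.0, 4.0]
freqs = [45, 15]
# Frequency weights are just a shortcut for the expanded data set...
print(freq_weighted_variance(values, freqs))
print(statistics.variance(expand(values, freqs)))
# ...so rescaling them changes the result: [3, 1] is a much smaller data set.
print(freq_weighted_variance(values, [3, 1]))
```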
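In any case, the formula for the variance (from which you calculate the standard deviation in the normal way) with "reliability" weights is
$$\sigma^2 = \frac{\sum w_i}{\left(\sum w_i\right)^2 - \sum w_i^2} \sum w_i \left(x_i - x^*\right)^2,$$
where $x^* = \sum w_i x_i / \sum w_i$ is the weighted mean.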
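The reliability-weight variance can be sketched in Python as below, assuming the formula $\sigma^2 = \frac{\sum w_i}{(\sum w_i)^2 - \sum w_i^2}\sum w_i (x_i - x^*)^2$. Using each ratio's denominator as its weight is a hypothetical choice for illustration only, since the answer points out that no estimate of the weights is actually available.

```python
import math


def reliability_weighted_variance(values, weights):
    """Unbiased variance with reliability weights:
    sum(w) / (sum(w)**2 - sum(w**2)) * sum(w * (x - x_star)**2)."""
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    if v1 * v1 - v2 <= 0:
        raise ValueError("need at least two nonzero weights")
    # Weighted mean x* = sum(w_i * x_i) / sum(w_i)
    x_star = sum(w * x for w, x in zip(weights, values)) / v1
    ss = sum(w * (x - x_star) ** 2 for w, x in zip(weights, values))
    return v1 / (v1 * v1 - v2) * ss


values = [100 / 10000, 2 / 5, 4 / 3, 1000 / 2000]
# Hypothetical weights: each ratio's denominator, for illustration only.
weights = [10000, 5, 3, 2000]
variance = reliability_weighted_variance(values, weights)
# Standard deviation "in the normal way": the square root of the variance.
print(variance, math.sqrt(variance))
```

Unlike frequency weights, multiplying every reliability weight by the same constant leaves the result unchanged, which is what you want when the weights only express relative reliability.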
You don't have an estimate for the weights, which I'm assuming you want to take to be proportional to reliability. Taking percentages the way you are is going to make analysis tricky even if they're generated by a Bernoulli process, because a score of 20 out of 0 gives an infinite percentage. Weighting by the inverse of the standard error of the mean (SEM) is a common and sometimes optimal thing to do. You should perhaps use a Bayesian estimate or the Wilson score interval.
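For a ratio that really is k successes out of n trials, the Wilson score interval can be computed as in this Python sketch. The function name `wilson_interval` and the default z = 1.96 (roughly 95% coverage) are assumptions. It does not apply to 4 / 3, because k > n is not a binomial proportion, so that row is left out.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k / n.
    Requires n > 0 and 0 <= k <= n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1 + z * z / n
    # Centre is pulled towards 1/2; the half-width stays finite even at k = 0 or k = n.
    center = (p_hat + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half


for k, n in [(100, 10000), (2, 5), (1000, 2000)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n}: [{lo:.4f}, {hi:.4f}]")
```

The small segment (2 / 5) gets a much wider interval than the large ones, which is the intuition behind letting the large segments dominate.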
Column G holds the weights and column H holds the values.
If we treat the weights like probabilities, then we build them as follows:
$$p_i = \frac{w_i}{\sum_j w_j}$$
Next, the weighted mean is obviously
$$\bar{x}^* = \sum_i p_i x_i$$
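Carrying this one step further, here is a Python sketch (the function name is hypothetical) that normalizes the weights to probabilities, takes the weighted mean, and then computes the variance $\sum_i p_i (x_i - \bar{x}^*)^2$ of that discrete distribution. The variance step is an assumed continuation, not something shown in the answer; it has no small-sample correction, so it comes out smaller than the bias-corrected formulas above.

```python
import math


def probability_weighted_moments(values, weights):
    """Normalize weights to probabilities p_i = w_i / sum(w), then return
    mean = sum(p_i * x_i) and variance = sum(p_i * (x_i - mean)**2)."""
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * x for p, x in zip(probs, values))
    variance = sum(p * (x - mean) ** 2 for p, x in zip(probs, values))
    return mean, variance


values = [0.01, 0.4, 4 / 3, 0.5]
weights = [10000, 5, 3, 2000]
mean, var = probability_weighted_moments(values, weights)
print(mean, math.sqrt(var))
```

With the case counts as weights, the weighted mean equals the pooled ratio: the sum of the numerators divided by the sum of the denominators.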