Absolute deviation
From Wikipedia, the free encyclopedia
In statistics, the absolute deviation of an element of a data set is the absolute difference between that element and a given point. Typically the point from which the deviation is measured is a measure of central tendency, most often the median or sometimes the mean of the data set.
$$D_i = |x_i - m(X)|$$
where $D_i$ is the absolute deviation, $x_i$ is the data element, and $m(X)$ is the chosen measure of central tendency of the data set: sometimes the mean ($\bar{x}$), but most often the median.
Measures of dispersion
Average absolute deviation
The average absolute deviation, or simply average deviation, of a data set is the average of the absolute deviations and is a summary statistic of statistical dispersion or variability. It is also called the mean absolute deviation, but this is easily confused with the median absolute deviation, since both are abbreviated MAD.
The average absolute deviation of a set {x1, x2, ..., xn} is
$$\frac{1}{n}\sum_{i=1}^{n} |x_i - m(X)|.$$
The choice of measure of central tendency, m(X), has a marked effect on the value of the average deviation.
For example, for the data set {2, 2, 3, 4, 14}:
Measure of central tendency m(X) | Average absolute deviation
Mean = 5 | (3 + 3 + 2 + 1 + 9) / 5 = 3.6
Median = 3 | (1 + 1 + 0 + 1 + 11) / 5 = 2.8
Mode = 2 | (0 + 0 + 1 + 2 + 12) / 5 = 3.0
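The averages for the example data set {2, 2, 3, 4, 14} can be recomputed with a short sketch. The helper name `average_absolute_deviation` is chosen here for illustration and is not part of any library; the centres come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode


def average_absolute_deviation(data, center):
    """Average of |x - center| over all elements of data."""
    return sum(abs(x - center) for x in data) / len(data)


data = [2, 2, 3, 4, 14]

# The three measures of central tendency used in the example.
centers = {"mean": mean(data), "median": median(data), "mode": mode(data)}

# Average absolute deviation about each centre.
results = {
    name: average_absolute_deviation(data, center)
    for name, center in centers.items()
}

for name, value in results.items():
    print(f"{name} = {centers[name]}: average absolute deviation = {value}")
```

For this data set the median gives the smallest of the three values, the mode the next, and the mean the largest.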
The average absolute deviation from
the median is less than or equal to the average absolute deviation from the
mean. In fact, the average absolute deviation from the median is always less
than or equal to the average absolute deviation from any other fixed number.
The average absolute deviation from
the mean is less than or equal to the standard deviation; one way of proving this relies on Jensen's inequality.
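One way to spell out the Jensen step is sketched here, writing $\mu$ for the mean and $\sigma$ for the standard deviation of a random variable $X$ (symbols introduced for illustration). Because the square function is convex, Jensen's inequality applied to $|X - \mu|$ gives

$$\left(E|X-\mu|\right)^2 \le E\left[|X-\mu|^2\right] = \sigma^2 \quad\Longrightarrow\quad E|X-\mu| \le \sigma.$$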
For the normal or "Gaussian" distribution, the ratio of mean absolute deviation to standard deviation is $\sqrt{2/\pi}$.[1] Thus if X is a normally distributed random variable with expected value 0 then
$$\frac{E|X|}{\sqrt{E(X^2)}} = \sqrt{\frac{2}{\pi}} \approx 0.7979.$$
In other words, for a Gaussian, mean absolute deviation is about 0.8 times the standard deviation.
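The Gaussian ratio can be checked numerically. The sketch below approximates E|X| for a zero-mean normal variable with a simple midpoint rule; the function name, the step count, the integration width, and the value `sigma = 2.0` are arbitrary choices made for illustration.

```python
import math


def gaussian_mean_abs_deviation(sigma, steps=100_000, width=10.0):
    """Approximate E|X| for X ~ N(0, sigma^2) using the midpoint rule."""
    upper = width * sigma          # integrate over [0, width * sigma]
    h = upper / steps              # width of each sub-interval
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h          # midpoint of the k-th sub-interval
        density = math.exp(-0.5 * (x / sigma) ** 2) / norm
        total += x * density * h
    # The integrand |x| * density is symmetric about 0, so double the half.
    return 2 * total


sigma = 2.0
ratio = gaussian_mean_abs_deviation(sigma) / sigma
print(ratio, math.sqrt(2 / math.pi))
```

Both printed numbers should agree to several decimal places, at roughly 0.798.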
Mean absolute deviation
The mean absolute deviation (MAD) is the average absolute deviation taken about the mean. A related quantity, the mean absolute error (MAE), is a common measure of forecast error in time series analysis, where it measures the average absolute deviation of observations from their forecasts.
Although the term mean deviation is used as a synonym for mean absolute deviation, strictly speaking the two are not the same: in its strict interpretation (namely, omitting the absolute value operation), the mean deviation of any data set from its mean is always zero.
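The difference between the signed mean deviation and the mean absolute deviation can be seen on the example data set {2, 2, 3, 4, 14}. This is a minimal sketch; the variable names are illustrative, and exact fractions are used so that the zero is not blurred by rounding.

```python
from fractions import Fraction

data = [2, 2, 3, 4, 14]
n = len(data)
mean = Fraction(sum(data), n)

# Without the absolute value, positive and negative deviations cancel.
signed_mean_deviation = sum(x - mean for x in data) / n

# With the absolute value, the deviations accumulate.
mean_absolute_deviation = sum(abs(x - mean) for x in data) / n

print(signed_mean_deviation, mean_absolute_deviation)
```

The cancellation is not specific to this data set: the signed deviations sum to the total minus n times the mean, which is zero by the definition of the mean.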
Median absolute deviation
The median absolute deviation (also MAD) is the median of the absolute deviations from the median. It is a robust estimator of dispersion.
For the example {2, 2, 3, 4, 14}: 3 is the median, so the absolute deviations from the median are {1, 1, 0, 1, 11} (or reordered as {0, 1, 1, 1, 11}) with a median absolute deviation of 1, in this case unaffected by the value of the outlier 14.
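The worked example can be reproduced with a few lines of Python. The function name `median_absolute_deviation` is an illustrative helper, not a library call, and the second data set with the outlier pushed to 1400 is a made-up variation added to show the robustness.

```python
from statistics import median


def median_absolute_deviation(data):
    """Median of the absolute deviations from the median of data."""
    center = median(data)
    return median(abs(x - center) for x in data)


data = [2, 2, 3, 4, 14]
mad = median_absolute_deviation(data)

# Hypothetical variation: make the outlier far more extreme.
extreme = [2, 2, 3, 4, 1400]
mad_extreme = median_absolute_deviation(extreme)

print(mad, mad_extreme)
```

In both cases the result is 1, since the outlier only affects the largest absolute deviation and never the middle one.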
Maximum absolute deviation
The maximum absolute deviation
about a point is the maximum of the absolute deviations of a sample from that
point. It is realized by the sample maximum
or sample minimum and cannot be less than half the range.
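A small sketch on the data set {2, 2, 3, 4, 14} illustrates both statements. The helper name and the choice of reference points are illustrative assumptions.

```python
def maximum_absolute_deviation(data, point):
    """Largest absolute deviation of any element of data from point."""
    return max(abs(x - point) for x in data)


data = [2, 2, 3, 4, 14]
lo, hi = min(data), max(data)
half_range = (hi - lo) / 2

about_median = maximum_absolute_deviation(data, 3)
about_midrange = maximum_absolute_deviation(data, (lo + hi) / 2)

# The maximum is always attained at the sample minimum or maximum.
points = [0, 3, 5, 8, 20]
attained_at_extreme = all(
    maximum_absolute_deviation(data, p) == max(abs(lo - p), abs(hi - p))
    for p in points
)

print(half_range, about_median, about_midrange, attained_at_extreme)
```

About the median the maximum absolute deviation is 11; about the midpoint of the range it drops to 6, which equals half the range, the lower bound.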
Minimization
The measures of statistical dispersion derived from absolute deviation characterize various measures of central tendency as minimizing dispersion. The median is the measure of central tendency most associated with the absolute deviation. Some location parameters can be compared as follows:
- L2 norm statistics: the mean minimizes the mean squared error,
- L1 norm statistics: the median minimizes average absolute deviation,
- L∞ norm statistics: the mid-range minimizes the maximum absolute deviation,
- trimmed L∞ norm statistics: for example, the midhinge (average of first and third quartiles), which minimizes the median absolute deviation of the whole distribution, also minimizes the maximum absolute deviation of the distribution after the top and bottom 25% have been trimmed off.
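The minimizing property of the median, and the analogous properties of the mean and mid-range, can be checked by brute force. The sketch below searches a grid of candidate centres for the data set {2, 2, 3, 4, 14}; the grid spacing of 0.01 and the function names are arbitrary illustrative choices, and a grid search is only a numerical check, not a proof.

```python
data = [2, 2, 3, 4, 14]

# Candidate centres 0.00, 0.01, ..., 16.00.
candidates = [i / 100 for i in range(0, 1601)]


def mean_squared_error(c):
    return sum((x - c) ** 2 for x in data) / len(data)


def average_absolute_deviation(c):
    return sum(abs(x - c) for x in data) / len(data)


def maximum_absolute_deviation(c):
    return max(abs(x - c) for x in data)


best_l2 = min(candidates, key=mean_squared_error)            # expect the mean
best_l1 = min(candidates, key=average_absolute_deviation)    # expect the median
best_linf = min(candidates, key=maximum_absolute_deviation)  # expect the mid-range

print(best_l2, best_l1, best_linf)
```

On this data set the three minimizers land on the mean (5), the median (3), and the mid-range (8) respectively.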
Estimation
The mean absolute deviation of a sample is a biased estimator of the mean absolute deviation of the population. For the absolute deviation to be an unbiased estimator, the expected value (average) of all the sample absolute deviations would have to equal the population absolute deviation. It does not. For the population 1, 2, 3 the population absolute deviation about the mean is 2/3, whereas the average of the absolute deviations about the sample mean, taken over all samples of size 3 that can be drawn with replacement from the population, is 44/81. Therefore the absolute deviation is a biased estimator.
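The bias can be verified by enumerating every sample. The sketch below assumes that samples of size 3 are drawn with replacement (27 ordered samples) and that each sample's deviations are measured about its own centre; both are assumptions of this illustration, as are the helper names. Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product
from statistics import median

population = [1, 2, 3]


def average_absolute_deviation(values, center):
    """Exact average of |v - center| over values."""
    return sum(abs(Fraction(v) - center) for v in values) / len(values)


def exact_mean(values):
    return Fraction(sum(values), len(values))


population_aad = average_absolute_deviation(population, exact_mean(population))

# All ordered samples of size 3 drawn with replacement.
samples = list(product(population, repeat=3))

avg_about_mean = sum(
    average_absolute_deviation(s, exact_mean(s)) for s in samples
) / len(samples)

avg_about_median = sum(
    average_absolute_deviation(s, median(s)) for s in samples
) / len(samples)

print(population_aad, avg_about_mean, avg_about_median)
```

Under these assumptions the enumeration gives 44/81 about the sample mean and 4/9 about the sample median, both below the population value of 2/3.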
See also
- Deviation (statistics)
- Errors and residuals in statistics
- Least absolute deviations
- Loss function
- Median absolute deviation