 ### 22.2 Outliers, robustness and the influence of single points

Now let’s take a closer look at the influence of a single point and at the role of the squared residuals. Again, we take a randomly selected linear model and estimate it (black line). Then we add an additional point A (blue cross) and estimate again (blue line). Initially, A lies on the estimated line (at x = -1), so it does not change the estimate. You can move the point to see how much influence a single point has on the result.

If you move point A toward the edge, you will notice that even a single point can change the estimate completely and can even reverse it. A point that lies far from the others receives disproportionate weight under the least-squares criterion, because its residual enters the criterion squared. Such a point is called an outlier. Estimators that are less sensitive to individual points are called robust; the median is one example.
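
To make the idea of robustness concrete, here is a minimal sketch comparing the mean and the median before and after one extreme value is added. The numbers are made up purely for illustration and do not come from the interactive example.

```python
import statistics

# Hypothetical sample (illustrative values only).
values = [2.0, 3.0, 3.0, 4.0, 5.0]
# The same sample with one extreme value appended.
with_outlier = values + [100.0]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # 3.4 -> 19.5
print("median:", median_before, "->", median_after)  # 3.0 -> 3.5
```

The mean is dragged far toward the extreme value, while the median barely moves.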

The table below shows the true parameter values, the values estimated by least-squares regression, and the least-squares estimates with the additional point A.

|   | true | estimated | estimated with A |
|---|------|-----------|------------------|
| α |      |           |                  |
| β |      |           |                  |

If you move A, the estimated parameters and the estimated line (blue) change. If you move A as far as possible from the original estimate (black), the line may even tilt: a positive slope can turn negative. A single point that lies far to the side can thus turn a positive estimated relationship into a negative one. It is therefore important to identify such outliers in the data and to check each one separately: is it a valid value or an error (measurement error, device defect, transmission error, transcription error, etc.)?
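
The effect can be reproduced offline with a short sketch. It assumes NumPy is available; the true parameters, the noise level, and the coordinates of A are hypothetical choices for illustration, not the values used by the interactive example.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    beta, alpha = np.polyfit(x, y, deg=1)  # polyfit returns the highest degree first
    return alpha, beta

# Hypothetical data from a linear model with a positive slope.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Estimate without A (the "black line").
alpha0, beta0 = fit_line(x, y)

# A placed exactly on the fitted line at x = -1: the estimate does not change.
alpha_on, beta_on = fit_line(np.append(x, -1.0),
                             np.append(y, alpha0 + beta0 * -1.0))

# A moved far to the side and far below the line (hypothetical position).
alpha_far, beta_far = fit_line(np.append(x, 10.0), np.append(y, -30.0))

print("slope without A:      ", beta0)
print("slope with A on line: ", beta_on)
print("slope with A far away:", beta_far)
```

With these settings, the single distant point is enough to turn the positive slope negative.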

Below are the values of the residuals. It is also instructive to compare the sum of the residuals over all points without A with the residual of A relative to the old estimate (the one without A), once for squared and once for absolute residuals. This illustrates the weight that point A would carry in the estimate. When A lies far to the side, its share is much higher under the quadratic criterion than under absolute values. This is why least-squares estimation reacts more strongly to outliers. Outliers can usually be spotted easily in the residual plot because they stick out.

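|                    | sum without A | A relative to the old estimate | ratio in percent |
|--------------------|---------------|--------------------------------|------------------|
| absolute residuals |               |                                |                  |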
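
The comparison between the quadratic and the absolute criterion can be sketched as follows. The data and the position of A are hypothetical, and the ratio is taken here as A's residual relative to the old fit divided by the sum over the original points, which is one plausible reading of the "ratio in percent" column.

```python
import numpy as np

# Hypothetical data from a linear model (illustrative values only).
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 20)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Old estimate, fitted without A.
beta, alpha = np.polyfit(x, y, deg=1)
res = y - (alpha + beta * x)

# Hypothetical point A, far below the old line.
x_a, y_a = 3.0, -4.0
res_a = y_a - (alpha + beta * x_a)  # residual of A relative to the old estimate

# Sums over the original points, without A.
sum_sq = np.sum(res ** 2)
sum_abs = np.sum(np.abs(res))

# A's contribution as a percentage of each sum.
ratio_sq = 100 * res_a ** 2 / sum_sq
ratio_abs = 100 * abs(res_a) / sum_abs

print(f"squared:  sum without A = {sum_sq:.2f}, A = {res_a ** 2:.2f}, ratio = {ratio_sq:.0f}%")
print(f"absolute: sum without A = {sum_abs:.2f}, A = {abs(res_a):.2f}, ratio = {ratio_abs:.0f}%")
```

Whenever A's residual is at least as large in magnitude as every other residual, its share under the squared criterion is at least as large as its share under the absolute criterion.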

|                                | true | estimated | new model with A |
|--------------------------------|------|-----------|------------------|
| mean of the residuals          |      |           |                  |
| mean of the absolute residuals |      |           |                  |
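
The two averages can be computed with a sketch like the one below. It assumes that the residuals of the new model are taken over all points including A, which the text does not state explicitly; the data and the position of A are again hypothetical.

```python
import numpy as np

def residual_summary(alpha, beta, xs, ys):
    """Return (mean residual, mean absolute residual) for the line alpha + beta * x."""
    res = ys - (alpha + beta * xs)
    return res.mean(), np.abs(res).mean()

# Hypothetical true model and data (illustrative values only).
rng = np.random.default_rng(2)
alpha_true, beta_true = 1.0, 0.5
x = np.linspace(-3, 3, 20)
y = alpha_true + beta_true * x + rng.normal(scale=0.3, size=x.size)

# Least-squares estimate without A.
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)

# Least-squares estimate with a hypothetical point A added.
x_with_a = np.append(x, 3.0)
y_with_a = np.append(y, -4.0)
beta_a, alpha_a = np.polyfit(x_with_a, y_with_a, deg=1)

summary_true = residual_summary(alpha_true, beta_true, x, y)
summary_hat = residual_summary(alpha_hat, beta_hat, x, y)
summary_a = residual_summary(alpha_a, beta_a, x_with_a, y_with_a)

for name, (mean_res, mean_abs) in [("true", summary_true),
                                   ("estimated", summary_hat),
                                   ("new model with A", summary_a)]:
    print(f"{name:>17}: mean residual = {mean_res:+.4f}, mean |residual| = {mean_abs:.4f}")
```

For a least-squares fit with an intercept, the mean of the residuals is zero up to rounding, so an outlier shows up in the mean of the absolute residuals rather than in the plain mean.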