WB-ILIAS | Continuing Education and Open Educational Resources

R

Boxplot

If the independent variable is a factor (e.g. gender), a standard scatterplot is not appropriate. Instead, you can use the function boxplot(). A boxplot is a graph that visualizes the distribution of a numeric variable for each of a limited number of categories. It shows the median, the first and third quartiles, and the range of values not deemed outliers (by default, values within 1.5 times the IQR of the first and third quartiles, i.e. of the edges of the box). boxplot() understands the formula notation: rather than splitting the data into groups manually, you can tell boxplot() which variable to show and which one to use as the splitting criterion.
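To see how R decides which values count as outliers, the following minimal sketch uses boxplot.stats(), the base R function that computes the numbers a boxplot is drawn from. The vector scores is made up for illustration. Note that R measures the 1.5 times IQR distance from the edges of the box (the "hinges"), which may differ slightly from the quartiles returned by quantile().

```r
# Hypothetical scores; 30 is deliberately far away from the rest.
scores <- c(1, 1, 5, 6, 8, 8, 9, 30)

# boxplot.stats() returns the numbers that boxplot() would draw.
s <- boxplot.stats(scores)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # values beyond the whiskers, drawn as individual points

# Upper limit for non-outliers: upper hinge + 1.5 * (upper hinge - lower hinge).
# The hinges are R's version of the first and third quartiles.
upper_limit <- s$stats[4] + 1.5 * (s$stats[4] - s$stats[2])
scores[scores > upper_limit]
```

The whisker ends at the most extreme data value that still lies inside the limit, not at the limit itself.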

For example, to see how the reading scores depend on the gender of the participants, use the command boxplot(data$reading_skill ~ data$gender). This produces a boxplot with one box per gender.
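The command can be tried out with the following sketch. The column names reading_skill and gender follow the text, but the values are simulated here as an assumption, so the resulting plot will not reproduce the figure discussed in the text.

```r
# Hypothetical data: column names follow the text, values are simulated.
set.seed(42)
data <- data.frame(
  gender = factor(rep(c("female", "male"), each = 50)),
  reading_skill = c(rnorm(50, mean = 100, sd = 10),
                    rnorm(50, mean = 99, sd = 13))
)

# Formula notation: numeric variable ~ grouping factor.
boxplot(data$reading_skill ~ data$gender)

# Equivalent call using the data argument, with axis labels added.
bp <- boxplot(reading_skill ~ gender, data = data,
              xlab = "Gender", ylab = "Reading score")

# boxplot() invisibly returns what it drew: one column of bp$stats per group
# (lower whisker, lower hinge, median, upper hinge, upper whisker).
bp$stats
bp$n      # number of observations in each group
```

Passing the data frame through the data argument keeps the formula short and gives the axes cleaner default labels than the data$ form.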

The plot shows that, although there is slightly more variation in the scores of men, there is only a small difference between the median scores of men and women. To decide whether this is worth further investigation, you would need to know the size of the sample, i.e. the number of participants. If the boxplot comes from 10 women and 10 men, such a small difference is unlikely to reflect any real trend; it is more likely due to chance. If the same boxplot is produced by the scores of 100,000 women and 100,000 men, even such a small difference may suggest a true difference between the groups.
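The effect of sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: both groups are drawn from the same normal distribution (mean 100, standard deviation 10, both values made up), so any difference between their medians is due to chance alone. The helper median_diff() is hypothetical, and the large groups use 10,000 rather than 100,000 values to keep the run short.

```r
set.seed(1)

# Hypothetical helper: difference between the medians of two groups of size n
# drawn from the SAME distribution, so no true difference exists.
median_diff <- function(n) {
  median(rnorm(n, mean = 100, sd = 10)) - median(rnorm(n, mean = 100, sd = 10))
}

# Repeat the experiment 200 times for small and for large groups.
diff_small <- replicate(200, median_diff(10))
diff_large <- replicate(200, median_diff(10000))

# Typical size of a purely accidental difference in medians.
sd(diff_small)
sd(diff_large)
```

With small groups, accidental differences in the medians are much larger than with large groups, which is why the same small difference is more convincing in a large sample. For real data, table(data$gender) reports the number of participants in each group.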


Median: the value separating the higher half of a data sample from the lower half. It is the "middle" value. For example, in the data set {1, 1, 5, 6, 8, 8, 9}, the median is 6, which is both the fourth largest and the fourth smallest number in the sample.
Quartiles: the three points that divide the ordered data set into four equal groups, each comprising a quarter of the data.
IQR: the "interquartile range", i.e. the difference between the third quartile and the first quartile.
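These three quantities can be computed directly in base R. The sketch below uses the example data set from the median definition. The quartile values depend on the interpolation method; the ones shown are for R's default (type 7) in quantile().

```r
# Example data set from the definition of the median.
x <- c(1, 1, 5, 6, 8, 8, 9)

median(x)                          # 6

# First quartile, median, and third quartile (default method, type 7).
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q                                  # 3, 6, 8

# Interquartile range: third quartile minus first quartile.
IQR(x)                             # 5
unname(q[3] - q[1])                # 5, the same value computed by hand
```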


