R
Previewing the data
To get a first impression of the data, you can view the whole data set, view a part of it, or view an automatic summary. You can also learn a lot about the data by visualizing it; visualization is discussed in a separate section of this chapter.
To view the whole data set, just type the name of the variable in which you stored the data. This works equally well for individual values, vectors, and data frames. The disadvantage is that the data is displayed in its entirety, which often makes the output hard to navigate.
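As a minimal sketch of this, the objects below are hypothetical and created only for illustration; typing a name at the prompt prints the whole object.

```r
# Hypothetical objects, invented for this illustration.
answer <- 42
ages   <- c(23, 35, 41, 29)
people <- data.frame(name = c("Ann", "Ben", "Cy"), age = c(23, 35, 41))

answer   # a single value
ages     # a vector
people   # a data frame: every row is printed
```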
Thus, it is often worth viewing only a section of the data. For this purpose, you can use the functions head() and tail(). Called on the variable you want to inspect, they display the first or last 6 rows of the data by default, together with the column names; the argument n changes the number of rows shown. This is an efficient way to verify that the data was imported correctly.
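A short sketch of both functions follows. It assumes the built-in iris data set as a stand-in for your own data; the variable names are chosen only for this example.

```r
# iris ships with R (datasets package), so nothing has to be imported.
first_rows <- head(iris)          # first 6 rows by default
last_rows  <- tail(iris)          # last 6 rows by default
top_three  <- head(iris, n = 3)   # n sets the number of rows

first_rows
last_rows
top_three
```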
Alternatively, the function str() reports the structure of the data: the number of observations and variables, and the data type assigned to each column.
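For instance, again assuming the built-in iris data set in place of your own data:

```r
# Prints one line per column with its type and first few values,
# preceded by the number of observations (rows) and variables (columns).
str(iris)
```

Here the four measurement columns are reported as numeric and Species as a factor with three levels.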
The function summary() then summarizes the individual columns for you, depending on their data type. For factors, it shows the most frequent values and their frequencies; for numbers, it describes the distribution of the values (smallest and largest value observed, mean, median, 1st and 3rd quartile). This lets you see whether a distribution may be skewed or is worth closer inspection.
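The sketch below applies summary() to the built-in iris data set (an assumption made for illustration) and then compares mean and median of one column as a rough check for skew.

```r
# Whole data frame: six-number summary per numeric column,
# level counts for the factor column Species.
summary(iris)

# A single numeric column.
sepal <- summary(iris$Sepal.Length)
sepal

# A mean clearly above the median hints at a right-skewed distribution,
# a mean clearly below it at a left-skewed one.
sepal[["Mean"]] - sepal[["Median"]]
```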
If you have a set of factors in your data and want to see whether they are distributed evenly across groups (e.g. whether you have more highly educated women than men in your sample), you can use the function ftable(). It uses the formula notation dependent ~ independent: the variable to the left of the formula operator ~ is the one you want to predict, while anything on the right is used for the prediction or division of the data. In the resulting table, the variable on the left forms the columns and the variable on the right forms the rows. Thus, ftable(data$education ~ data$gender) will show you the education levels of the participants with respect to their gender.
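A minimal sketch of such a table is given below. The data frame survey and its values are invented for illustration, and the formula is written with the data argument of ftable() instead of the $ notation, which yields the same counts with shorter labels.

```r
# Hypothetical survey data, invented for this illustration.
survey <- data.frame(
  gender    = factor(c("female", "female", "female", "female",
                       "male", "male", "male", "male")),
  education = factor(c("primary", "secondary", "tertiary", "tertiary",
                       "primary", "secondary", "secondary", "tertiary"))
)

# Left of ~ : column variable (education).
# Right of ~ : row variable (gender).
counts <- ftable(education ~ gender, data = survey)
counts
```

Each cell holds the number of participants with that combination of gender and education.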
If you are interested in relative rather than absolute frequencies, you can wrap ftable() in prop.table(). This returns the proportion (a value between 0 and 1) that each combination represents; multiply by 100 to obtain percentages. Most of the time, you will want to specify the margin over which the proportions are calculated by adding 1 (each row sums to 1) or 2 (each column sums to 1) after the data, e.g. prop.table(ftable(data$education ~ data$gender), 2).
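The effect of the margin argument can be sketched as follows, using a small hypothetical data frame survey whose values are invented for illustration.

```r
# Hypothetical survey data, invented for this illustration.
survey <- data.frame(
  gender    = factor(c("female", "female", "female", "female",
                       "male", "male", "male", "male")),
  education = factor(c("primary", "secondary", "tertiary", "tertiary",
                       "primary", "secondary", "secondary", "tertiary"))
)
counts <- ftable(education ~ gender, data = survey)  # rows: gender, columns: education

within_gender    <- prop.table(counts, 1)  # each row (gender) sums to 1
within_education <- prop.table(counts, 2)  # each column (education level) sums to 1

within_gender
within_education

# Percentages rounded to one decimal place.
round(100 * within_gender, 1)
```

With margin 1 you read the table row by row (the share of each education level within a gender); with margin 2 you read it column by column (the share of each gender within an education level).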