R
R data types
So far, we have only dealt with individual numbers. Yet the real power of R is in handling many numbers at the same time. Below, you will learn the most important types of data R distinguishes.
Individual variables in R can be numbers (-6, 55, 1.156 or 8.7e-15), strings (“male”, “University of Freiburg”, “IDA 2016”), factors (1, 2, “C1”, “B2”), logicals (TRUE, FALSE), vectors or data frames. While numbers and strings should be fairly straightforward to understand, the other types need some explanation.
Factors
A factor can look like a string or a number. Importantly, however, a factor can take only a limited set of values, called levels. For example, if you divide your participants into three groups, 1, 2 and 3, the labels are numbers. Nevertheless, it does not make sense to treat them as such: the scores observed in group 2 will not necessarily be the mean of the scores in groups 1 and 3. A factor expresses exactly this: its values are category labels drawn from a fixed set, not quantities to calculate with. Depending on your labels and your version of R, text labels such as A, B and C may be recognized as factors automatically when they are read in or stored in a data frame (this was the default before R 4.0; newer versions keep them as strings unless told otherwise). If the names of your groups are numbers, R will treat them as numbers. In either case, you can convert them manually, using the function factor().
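A minimal sketch of such a manual conversion. The vector group and its values are made up for illustration, and c(), which combines several values, is explained in the section on vectors.

```r
# Hypothetical group labels for six participants, coded as numbers
group <- c(1, 2, 3, 1, 2, 3)

# As plain numbers, R would happily average the labels
mean(group)

# Convert the labels to a factor so that R treats them as categories
group_f <- factor(group)

levels(group_f)      # the distinct categories, stored as text
is.factor(group_f)   # TRUE
```

After the conversion, the three labels are levels of a factor rather than quantities.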
Logicals
A logical is a value expressing the result of a logical comparison and can be either TRUE or FALSE. The comparison can be made using the standard logical operators:
<   less than
>   greater than
<=  less than or equal to
>=  greater than or equal to
==  equal to
!=  not equal to
|   element-wise or
&   element-wise and
These operators are used in the form value1 operator value2, e.g. 7 > 5, which evaluates to TRUE.
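The operators can be tried out as follows. This is only a sketch: the values in score are invented, and c() is used to compare several values at once (it is introduced in the section on vectors).

```r
# Hypothetical scores, chosen only for illustration
score <- c(3, 7, 5)

above_four <- score > 4                 # FALSE TRUE TRUE
is_five    <- score == 5                # FALSE FALSE TRUE
not_five   <- score != 5                # TRUE TRUE FALSE
between    <- score > 4 & score < 6     # FALSE FALSE TRUE
extreme    <- score < 4 | score > 6     # TRUE TRUE FALSE

between
extreme
```

Each comparison returns one logical per value, which is what "element-wise" refers to for & and |.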
Vectors
A vector is a sequence of values of the same type, stored under one name. Thus 1, 2, 3, 4 and 5 can be stored as individual values a, b, c, d, e or as one vector of length 5. To create a vector, combine the values with the function c() and assign the result a name with the usual assignment operator, e.g. a <- c(1,2,3,4,5).
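As a short sketch, creating a vector and checking its length could look like this. The name groups and the use of length() are additions for illustration.

```r
# Combine five values into one vector and give it a name
a <- c(1, 2, 3, 4, 5)

a           # prints all five values
length(a)   # number of elements: 5

# Strings can be combined in the same way
groups <- c("A", "B", "C")
length(groups)   # 3
```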
If you use two vectors in an arithmetic operation, e.g.:
u <- c(1,2)
v <- c(4,5)
u + v
R will perform the operation element-wise, i.e. 1+4 and 2+5. If the lengths of the vectors do not match, R will recycle the shorter vector. Thus:
u <- c(1,2,3,4)
v <- c(4,3)
u + v
will lead to R repeatedly drawing values from the shorter vector to match the longer one. Here, 1+4, 2+3, 3+4 and 4+3 will be performed. This works cleanly only if the length of the longer vector is a whole multiple of the length of the shorter one; otherwise R still returns a result but issues a warning.
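The sketch below runs the example above and adds a case in which the lengths do not divide evenly. The vector w is hypothetical and only serves to show that R then still computes a result, accompanied by a warning.

```r
u <- c(1, 2, 3, 4)
v <- c(4, 3)

# v is reused as 4, 3, 4, 3 to match the length of u
even_sum <- u + v
even_sum     # 5 5 7 7

# Length 3 is not a multiple of length 2: R still reuses v (4, 3, 4),
# but warns that the lengths do not fit
w <- c(1, 2, 3)
uneven_sum <- w + v
uneven_sum   # 5 5 7
```

The warning is worth taking seriously, since mismatched lengths usually point to a mistake in the data.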
Data frames
Data frames are used to store several vectors of (possibly) different data types together as the columns of one table, under one variable name. In order to create a data frame, the function data.frame() is used.
For example, if we have three participants and for each of them we counted the number of rhotic and non-rhotic pronunciations they used in an interview, we can store our results like this:
participants <- c("Mary", "Ali", "John")
rhotic <- c(50,37,0)
nonrhotic <- c(1,31,24)
data <- data.frame(participants, rhotic, nonrhotic)
If you now type in data, the data frame will be displayed:
  participants rhotic nonrhotic
1         Mary     50         1
2          Ali     37        31
3         John      0        24
The column names are automatically inferred from the variables you used to create the data frame. If you want a column to have a different name, you can change it inside the data.frame() function. Note that a name such as non-rhotic would be read as a subtraction, so an underscore is used instead:
data <- data.frame(first_name=participants, rhotic, non_rhotic=nonrhotic)
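To check that the renaming worked, the data frame can be inspected as sketched below. The functions names() and nrow() and the $ operator for selecting a column are not introduced in the text above. The column name non_rhotic uses an underscore because a hyphen is not allowed in a name written this way.

```r
participants <- c("Mary", "Ali", "John")
rhotic <- c(50, 37, 0)
nonrhotic <- c(1, 31, 24)

# Rename two columns while building the data frame
data <- data.frame(first_name = participants, rhotic, non_rhotic = nonrhotic)

names(data)    # "first_name" "rhotic" "non_rhotic"
data$rhotic    # one column, returned as a vector
nrow(data)     # 3 rows, one per participant
```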