Data filtering - vectors

Most of the time, we do not want to work with our data as one full table. Rather, we are interested in comparing subsets of the data with each other: do the values in the age column predict the values in the verb_ommission column? What is the average age of people with a high school education in our sample? This section shows you how to select the data points you need.

How you select data points depends to a degree on how the data are stored. If a variable holds a single string, number, factor or logical value, typing its name retrieves that value. In the case of vectors or data frames, typing the name retrieves the whole vector or data frame.

To retrieve a specific item from a vector, use its index number in square brackets after the vector’s name. Note that R starts counting at 1, so the first item has index 1. Typing

v <- c(17,45,307)
v[2]

will return


[1] 45

You can also use the assignment operator to replace the value retrieved in this way:

v <- c(1,1,0)
v[3] <- 1
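As a quick check, printing the vector after the assignment shows that only the third position has changed. A minimal sketch using the vector from above:

```r
v <- c(1, 1, 0)
v[3] <- 1      # overwrite the third element only
print(v)       # [1] 1 1 1
```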

EXERCISE:

Create a vector containing the values 1, 2, 3, “NA”, 5. Now, replace the value “NA” with the number 4.

Using logical operators, you can also retrieve all items of a vector that fulfill a condition specified in the square brackets. Each comparison produces a logical mask, and you can combine several masks with & (AND) or | (OR) to narrow the filter. The following command retrieves all elements of vector v that are smaller than 100 AND larger than 30.

v[v < 100 & v > 30]
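To see what happens inside the square brackets, it can help to look at the mask on its own. The following sketch assumes the vector from the first example (17, 45, 307); the name mask is only chosen for illustration.

```r
v <- c(17, 45, 307)

# Each comparison yields a logical vector of the same length as v;
# & combines the two comparisons element by element.
mask <- v < 100 & v > 30
print(mask)      # [1] FALSE  TRUE FALSE

# Indexing with the mask keeps only the positions that are TRUE.
print(v[mask])   # [1] 45
```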

If you follow such a filter with the assignment operator and a single value, every item selected by the filter is replaced with that value.
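A small sketch of such a replacement, using made-up measurement values (the vector temps is hypothetical and not part of the course data):

```r
temps <- c(3, -2, 8, -5, 0)   # hypothetical example data
temps[temps < 0] <- 0          # every negative value is replaced by 0
print(temps)                   # [1] 3 0 8 0 0
```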

EXERCISE:

Create a vector containing the values 10, 13, 75, 14, 50, 185, 78. Then, replace all values lower than 50 with the string “low” and all values equal to or larger than 50 with the string “high”.

If the assignment operator after your filter is followed by a vector, R recycles that vector across the selected items (other languages call this broadcasting): its values are reused in order until every selected item has been replaced. This works cleanly if the number of items to be replaced is a multiple of the length of the vector after the assignment operator. If it is not, R still performs the replacement but issues a warning (number of items to replace is not a multiple of replacement length). Typing

v <- c(0,1,1,0,0,1,0,1)
v[v == 1] <- c("A", "B")

will create vector v with these contents:

[1] "0" "A" "B" "0" "0" "A" "0" "B"

Note that the remaining numbers were also converted to strings by this process, because all elements of a vector must have the same data type.
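The type change can be checked with class(). The sketch below also shows, with a hypothetical second vector w, what happens when the number of selected items is not a multiple of the replacement length: R emits a warning and still fills the selected positions.

```r
v <- c(0, 1, 1, 0, 0, 1, 0, 1)
class(v)                    # "numeric"

# Four items are selected, the replacement has length 2: it is used twice.
v[v == 1] <- c("A", "B")
class(v)                    # "character"

# Three selected items cannot be filled evenly by a replacement of length 2.
w <- c(1, 1, 1, 0)          # hypothetical example data
w[w == 1] <- c(10, 20)      # warning: number of items to replace is not a
                            # multiple of replacement length
print(w)                    # [1] 10 20 10  0
```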

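Instead of replacing the filtered values with a fixed value, you can also compute the replacement from the values themselves: put the same filtered selection on the right-hand side of the assignment and apply an arithmetic expression to it. Every selected value is then transformed in the same way.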
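As an illustration of the pattern, the following sketch multiplies all values below 10 by 10. The vector scores and the name low are hypothetical and only serve as an example.

```r
scores <- c(3, 45, 7, 80)          # hypothetical example data
low <- scores < 10                 # logical mask for the values to change

# The same selection appears on both sides of the assignment:
# the selected values are read, transformed and written back.
scores[low] <- scores[low] * 10
print(scores)                      # [1] 30 45 70 80
```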

EXERCISE:

Create a vector v containing the values 17, 33, 201, 75, 147. Now, divide all the values larger than 100 by 2.

Lastly, if you use a specific filter several times in your calculations, you may want to save its result under a new name. Remember, however, that this creates an independent copy: changing the original vector afterwards will not affect the saved result.
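The copy behaviour can be sketched as follows, again assuming the vector from the first example; the name small is only chosen for illustration.

```r
v <- c(17, 45, 307)
small <- v[v < 100]   # store the filtered values under a new name
print(small)          # [1] 17 45

v[1] <- 99            # change the original vector afterwards
print(small)          # [1] 17 45  (the saved copy is unchanged)
print(v[v < 100])     # [1] 99 45  (re-running the filter reflects the change)
```

If you need the filter to follow later changes, save the condition and re-apply it rather than saving the filtered values.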


