Data Manipulation
R has been widely adopted in the scientific community, in part because of its powerful tools for data manipulation. This chapter explains how these tools can be used to manipulate genetic and genomic data. It also introduces the data sets used in the case studies that illustrate the methods presented in the following chapters.
Basic Data Manipulation in R
Subsetting, Replacement, and Deletion
Subsetting, replacement, and deletion are the three basic operations of data manipulation, and R performs them efficiently with the ‘[’ operator. This indexing system is very powerful and is commonly used for data filtering, quality checks, and other upstream data management before the actual analyses. The ‘[’ operator is generic, and methods exist for all the classes described in the previous chapter. As a result, the same syntax can be used for all types of data (real numbers, integers, bases, genotypes, alleles, etc.).
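As a minimal sketch of these operations applied to quality control, the code below filters a small data frame. The data frame qc, its column call_rate, and the 0.9 threshold are hypothetical and chosen only for illustration. The last lines show that ‘[’ dispatches on the class of its argument: applied to a factor, it returns a factor.

```r
# Hypothetical quality-check table: one row per individual (made-up values)
qc <- data.frame(id = c("ind1", "ind2", "ind3", "ind4"),
                 call_rate = c(0.99, 0.80, 0.95, 0.70),
                 stringsAsFactors = FALSE)

# Subsetting (filtering): keep individuals with a call rate of at least 0.9
passed <- qc[qc$call_rate >= 0.9, ]

# Replacement: set call rates below 0.9 to NA (threshold chosen arbitrarily)
qc$call_rate[qc$call_rate < 0.9] <- NA

# Deletion: drop the first row with a negative index
qc_minus1 <- qc[-1, ]

# '[' is generic: on a factor it returns a factor that keeps all the levels
alleles <- factor(c("A", "C", "G", "T"))
two_alleles <- alleles[1:2]
class(two_alleles)   # "factor"
levels(two_alleles)  # "A" "C" "G" "T"
```

Keeping all levels after subsetting is the default behaviour of the factor method; unused levels can be dropped afterwards with droplevels().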
Table 4.1 shows simple examples illustrating the three types of indexing. These examples apply to vectors and lists, but the same approach works with matrices (and arrays) by giving two or more sets of indices separated by commas, one set per dimension. For instance, x[, 1:2] selects the first and second columns of the matrix x, and x[, , 1:2] selects the first two slices along the third dimension of a 3-d array.
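A minimal sketch of indexing along several dimensions, using arbitrary toy objects (the matrix m and the array a are not from the original text). An empty position before or after a comma selects everything along that dimension. The drop = FALSE argument of ‘[’ keeps the result as a matrix when only one row is selected.

```r
# Toy objects: a 3 x 4 matrix and a 2 x 3 x 4 array (arbitrary values)
m <- matrix(1:12, nrow = 3)
a <- array(1:24, dim = c(2, 3, 4))

# First and second columns of the matrix (all rows): a 3 x 2 matrix
m_cols <- m[, 1:2]

# Second row: by default the result is simplified to a plain vector...
m_row <- m[2, ]
# ...unless drop = FALSE, which keeps a 1 x 4 matrix
m_row_mat <- m[2, , drop = FALSE]

# Logical indexing per dimension: columns whose sum is greater than 15
m_big <- m[, colSums(m) > 15]

# First two slices along the third dimension of the array: 2 x 3 x 2
a_slices <- a[, , 1:2]
```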
The classes "loci" and "DNAbin" are S3 classes: their objects are ordinary R objects (vector, matrix, list, ...) with a class attribute, so that R “knows” they must be treated specially. The other classes described in the previous chapter are S4 classes: their objects are made of “slots”, which are accessed or modified with the ‘@’ operator. There are other differences between S3 and S4, but they matter mainly to developers writing functions that manipulate these objects, not here.
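The contrast between S3 and S4 can be sketched with two toy classes. The class names toyS3 and ToyS4 are hypothetical, and this is not how "loci", "DNAbin", or the S4 classes of the previous chapter are actually implemented. The sketch shows a class attribute on a plain list, slot access with ‘@’, and a ‘[’ method for each system, so that the same subsetting syntax works on both objects.

```r
library(methods)  # setClass(), new(), setMethod(); attached by default in most sessions

## S3: an ordinary R object (here a list) carrying a class attribute.
## "toyS3" is a hypothetical class name used only for illustration.
s3 <- structure(list(seq = c("a", "c", "g", "t")), class = "toyS3")

# A '[' method for the S3 class: subset the sequence and keep the class
"[.toyS3" <- function(x, i) {
  structure(list(seq = unclass(x)$seq[i]), class = "toyS3")
}

## S4: a formally defined class whose data live in slots.
## "ToyS4" is likewise hypothetical.
setClass("ToyS4", slots = c(seq = "character", id = "character"))
s4 <- new("ToyS4", seq = c("a", "c", "g", "t"), id = "sample1")

# Slots are read and modified with '@'
s4@id <- "sample2"

# A '[' method for the S4 class, registered with setMethod()
setMethod("[", "ToyS4", function(x, i, j, ..., drop = TRUE) {
  x@seq <- x@seq[i]
  x
})

# The same '[' syntax now works on both objects
s3_sub <- s3[2:3]
s4_sub <- s4[2:3]
```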
Table 4.1
The three types of data indexing with the ‘[’ operator. The vector x was created with: x <- 1:10; names(x) <- letters[1:10]. The three types of indexing give the same output for each operation.
| Operation | Numeric indexing | Logical indexing | Indexing with names |
|---|---|---|---|
| Subsetting | x <- x[1:2] | x <- x[x < 3] | x <- x[c("a", "b")] |
| Replacement | x[1:2] <- 0 | x[x < 3] <- 0 | x[c("a", "b")] <- 0 |
| Deletion | x <- x[-(1:2)] | x <- x[x > 2] | x <- x[!names(x) %in% c("a", "b")] |
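The equivalences stated in the caption of Table 4.1 can be checked directly. In this sketch, make_x() is a small helper introduced here (it is not part of the original text) so that each replacement starts from a fresh copy of x.

```r
# Re-create the vector used in Table 4.1
make_x <- function() {
  x <- 1:10
  names(x) <- letters[1:10]
  x
}
x <- make_x()

# Subsetting with numeric, logical, and name indices
sub_num  <- x[1:2]
sub_lgl  <- x[x < 3]
sub_name <- x[c("a", "b")]

# Replacement: each version modifies its own fresh copy of x
rep_num  <- make_x(); rep_num[1:2] <- 0
rep_lgl  <- make_x(); rep_lgl[rep_lgl < 3] <- 0
rep_name <- make_x(); rep_name[c("a", "b")] <- 0

# Deletion
del_num  <- x[-(1:2)]
del_lgl  <- x[x > 2]
del_name <- x[!names(x) %in% c("a", "b")]

# All three indexing types agree for each operation
stopifnot(identical(sub_num, sub_lgl), identical(sub_num, sub_name))
stopifnot(identical(rep_num, rep_lgl), identical(rep_num, rep_name))
stopifnot(identical(del_num, del_lgl), identical(del_num, del_name))
```

The parentheses in x[-(1:2)] matter: in R, unary minus binds more tightly than ‘:’, so x[-1:2] would index with c(-1, 0, 1, 2) and fail because positive and negative indices cannot be mixed. Note also that the logical versions match the others here only because the values of x happen to equal their positions.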