% File src/library/base/man/Extract.data.frame.Rd
% Part of the R package, https://www.R-project.org
% Copyright 1995-2020 R Core Team
% Distributed under GPL 2 or later

\name{Extract.data.frame}
\alias{[.data.frame}
\alias{[[.data.frame}
\alias{[<-.data.frame}
\alias{[[<-.data.frame}
% \alias{$.data.frame}
\alias{$<-.data.frame}
\title{Extract or Replace Parts of a Data Frame}
\description{
  Extract or replace subsets of data frames.
}
\usage{
\method{[}{data.frame}(x, i, j, drop = )
\method{[}{data.frame}(x, i, j) <- value
\method{[[}{data.frame}(x, ..., exact = TRUE)
\method{[[}{data.frame}(x, i, j) <- value
% \method{$}{data.frame}(x, name)
\method{$}{data.frame}(x, name) <- value
}
\arguments{
  \item{x}{data frame.}
  \item{i, j, ...}{elements to extract or replace.  For \code{[} and
    \code{[[}, these are \code{numeric} or \code{character} or, for
    \code{[} only, empty or \code{logical}.  Numeric values are coerced to
    integer as if by \code{\link{as.integer}}.  For replacement by
    \code{[}, a logical matrix is allowed.}
  \item{name}{
    a literal character string or a \link{name} (possibly \link{backtick}
    quoted).}
  \item{drop}{logical.  If \code{TRUE} the result is coerced to the
    lowest possible dimension.  The default is to drop if only one
    column is left, but \bold{not} to drop if only one row is left.}
  \item{value}{a suitable replacement value: it will be repeated a whole
    number of times if necessary and it may be coerced: see the Coercion
    section.  If \code{NULL}, deletes the column if a single column is
    selected.}
  \item{exact}{logical: see \code{\link{[}}, and applies to column names.}
}
\details{
  Data frames can be indexed in several modes.  When \code{[} and
  \code{[[} are used with a single vector index (\code{x[i]} or
  \code{x[[i]]}), they index the data frame as if it were a list.  In
  this usage a \code{drop} argument is ignored, with a warning.
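
  As a minimal sketch of this list-style indexing (the data frame
  \code{df} is purely illustrative and is not one of the examples):
  \preformatted{
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
df[2]               # a one-column data frame, as for a list
df[[2]]             # the column itself: c(2.5, 3.5, 4.5)
df[2, drop = TRUE]  # same as df[2]; 'drop' is ignored with a warning
}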

  There is no \code{data.frame} method for \code{$}, so \code{x$name}
  uses the default method which treats \code{x} as a list (with partial
  matching of column names if the match is unique, see
  \code{\link{Extract}}).  The replacement method (for \code{$}) checks
  \code{value} for the correct number of rows, and replicates it if
  necessary.

  When \code{[} and \code{[[} are used with two indices (\code{x[i, j]}
  and \code{x[[i, j]]}) they act like indexing a matrix:  \code{[[} can
  only be used to select one element.  Note that for each selected
  column, \code{xj} say, typically (if it is not matrix-like), the
  resulting column will be \code{xj[i]}, and hence relies on the
  corresponding \code{[} method, see the examples section.

  If \code{[} returns a data frame it will have unique (and non-missing)
  row names, if necessary transforming the row names using
  \code{\link{make.unique}}.  Similarly, if columns are selected, column
  names will be transformed to be unique if necessary (e.g., if columns
  are selected more than once, or if the data frame has duplicate
  column names and more than one column of a given name is selected).

  When \code{drop = TRUE}, this is applied to the subsetting of any
  matrices contained in the data frame as well as to the data frame
  itself.

  The replacement methods can be used to add whole column(s) by
  specifying non-existent column(s), in which case the column(s) are
  added at the right-hand edge of the data frame and numerical indices
  must be contiguous to existing indices.  On the other hand, rows can
  be added at any row after the current last row, and the columns will
  be in-filled with missing values.  Missing values in the indices are
  not allowed for replacement.

  For \code{[} the replacement value can be a list: each element of the
  list is used to replace (part of) one column, recycling the list as
  necessary.  If columns specified by number are created, the names
  (if any) of the corresponding list elements are used to name the
  columns.
  If the replacement is not selecting rows, list values can contain
  \code{NULL} elements which will cause the corresponding columns to be
  deleted.  (See the Examples.)

  Matrix indexing (\code{x[i]} with a logical or a 2-column integer
  matrix \code{i}) using \code{[} is not recommended.  For extraction,
  \code{x} is first coerced to a matrix.  For replacement, logical
  matrix indices must be of the same dimension as \code{x}.
  Replacements are done one column at a time, with multiple type
  coercions possibly taking place.

  Both \code{[} and \code{[[} extraction methods partially match row
  names.  By default neither partially matches column names, but
  \code{[[} will if \code{exact = FALSE} (and with a warning if
  \code{exact = NA}).  If you want exact matching on row names, use
  \code{\link{match}}, as in the examples.
}
\section{Coercion}{
  The question of when replacement values are coerced is a complicated
  one, and one that has changed during \R's development.  This section
  is a guide only.

  When \code{[} and \code{[[} are used to add or replace a whole column,
  no coercion takes place but \code{value} will be replicated (by
  calling the generic function \code{\link{rep}}) to the right length if
  an exact number of repeats can be used.

  When \code{[} is used with a logical matrix, each value is coerced to
  the type of the column into which it is to be placed.

  When \code{[} and \code{[[} are used with two indices, the column will
  be coerced as necessary to accommodate the value.

  Note that when the replacement value is an array (including a
  matrix) it is \emph{not} treated as a series of columns (as
  \code{\link{data.frame}} and \code{\link{as.data.frame}} do) but
  inserted as a single column.
}
\section{Warning}{
  The default behaviour when only one \emph{row} is left is equivalent to
  specifying \code{drop = FALSE}.  To drop from a data frame to a list,
  \code{drop = TRUE} has to be specified explicitly.
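
  For instance, with a small illustrative data frame \code{df} (an
  assumption of this sketch, not one of the examples), the one-column
  and one-row defaults differ:
  \preformatted{
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
df[, 1]               # one column left: dropped to a vector by default
df[1, ]               # one row left: still a one-row data frame
df[1, , drop = TRUE]  # explicit drop: a plain list with elements a and b
}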

  Arguments other than \code{drop} and \code{exact} should not be named:
  there is a warning if they are and the behaviour differs from the
  description here.
}
\value{
  For \code{[} a data frame, list or a single column (the latter two
  only when dimensions have been dropped).  If matrix indexing is used for
  extraction a vector results.  If the result would be a data frame an
  error results if undefined columns are selected (as there is no general
  concept of a 'missing' column in a data frame).  Otherwise if a single
  column is selected and this is undefined the result is \code{NULL}.

  For \code{[[} a column of the data frame or \code{NULL}
  (extraction with one index)
  or a length-one vector (extraction with two indices).

  For \code{$}, a column of the data frame (or \code{NULL}).

  For \code{[<-}, \code{[[<-} and \code{$<-}, a data frame.
}
\seealso{
  \code{\link{subset}} which is often easier for extraction,
  \code{\link{data.frame}}, \code{\link{Extract}}.
}
\examples{
sw <- swiss[1:5, 1:4]  # select a manageable subset

sw[1:3]      # select columns
sw[, 1:3]    # same
sw[4:5, 1:3] # select rows and columns
sw[1]        # a one-column data frame
sw[, 1, drop = FALSE]  # the same
sw[, 1]      # a (unnamed) vector
sw[[1]]      # the same
sw$Fert      # the same (possibly w/ warning, see ?Extract)

sw[1,]       # a one-row data frame
sw[1,, drop = TRUE]  # a list

sw["C", ] # partially matches
sw[match("C", row.names(sw)), ] # no exact match
try(sw[, "Ferti"]) # column names must match exactly
\dontshow{
stopifnot(identical(sw[, 1], sw[[1]]),
          identical(sw[, 1][1], 80.2),
          identical(sw[, 1, drop = FALSE], sw[1]),
          is.data.frame(sw[1 ]), dim(sw[1 ]) == c(5, 1),
          is.data.frame(sw[1,]), dim(sw[1,]) == c(1, 4),
          is.list(s1 <- sw[1, , drop = TRUE]), identical(s1$Fertility, 80.2))
tools::assertError(sw[, "Ferti"])
}
sw[sw$Fertility > 90,] # logical indexing, see also ?subset
sw[c(1, 1:2), ]        # duplicate row, unique row names are created

sw[sw <= 6] <- 6  # logical matrix indexing
sw

## adding a column
sw["new1"] <- LETTERS[1:5]   # adds a character column
sw[["new2"]] <- letters[1:5] # ditto
sw[, "new3"] <- LETTERS[1:5] # ditto
sw$new4 <- 1:5
sapply(sw, class)
sw$new  # -> NULL: no unique partial match
sw$new4 <- NULL              # delete the column
sw
sw[6:8] <- list(letters[10:14], NULL, aa = 1:5) # update col. 6, delete 7, append
sw

## matrices in a data frame
A <- data.frame(x = 1:3, y = I(matrix(4:9, 3, 2)),
                         z = I(matrix(letters[1:9], 3, 3)))
A[1:3, "y"] # a matrix
A[1:3, "z"] # a matrix
A[, "y"]    # a matrix
stopifnot(identical(colnames(A), c("x", "y", "z")), ncol(A) == 3L,
          identical(A[,"y"], A[1:3, "y"]),
          inherits(A[,"y"], "AsIs"))

## keeping special attributes: use a class with a
## "as.data.frame" and "[" method;
## "avector" := vector that keeps attributes.  Could provide a constructor
##  avector <- function(x) { class(x) <- c("avector", class(x)); x }
as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

d <- data.frame(i = 0:7, f = gl(2,4),
                u = structure(11:18, unit = "kg", class = "avector"))
str(d[2:4, -1]) # 'u' keeps its "unit"
\dontshow{
stopifnot(identical(d[2:4,-1][,"u"],
                    structure(12:14, unit = "kg", class = "avector")))
}
}
\keyword{array}