\name{sqlite.data.frame}
\alias{sqlite.data.frame}
\alias{sdf}
\title{SQLite Data Frame}
\description{
Creates a SQLite data frame (SDF) from an ordinary data frame.
}
\usage{
sqlite.data.frame(x, name=NULL)
}
\arguments{
\item{x}{The object to be coerced into a data frame, which is then stored in a
SQLite database. \code{as.data.frame} is called on \code{x} before the SDF
database is created.
}
\item{name}{The internal name of the SDF. If none is provided, the generic name
\emph{data} is used (e.g. data1, data2, etc.). Each SDF should have a unique
internal name that is also a valid R symbol. Numbers are appended to names to
resolve duplicates: if \code{name} is \emph{iris} and an SDF with that name
already exists, the new SDF is named \emph{iris1}; if that name is also taken,
it is named \emph{iris2}, and so on.
}
}
\details{
SQLite data frames (SDFs) are data frames whose data are stored in a SQLite
database. SQLite is an open source, lightweight database engine that is
powerful for its size. It stores each database (composed of tables, indices,
etc.) in a single file. Since a single SDF occupies a whole database, each SDF
is contained in a single file. Each SDF file contains the following tables:
\itemize{
\item{sdf\_attributes}{a key-value table that contains the SDF attributes.
Currently, only \emph{name} is used, representing the SDF's internal name.}
\item{sdf\_data}{contains the actual data. Factors and ordered variables are
stored as integers; their levels are stored in other tables. Numeric (real)
variables are stored as double, character variables as text, and integers as
int. Complex numbers are currently not supported. Column names correspond
exactly to the variable names of the ordinary data frame, e.g. Petal.Length
will have the column name Petal.Length in the table. This is possible because
SQLite allows almost any kind of column name as long as it is quoted in square
brackets ([ ]). No guarantees are made for column names that deliberately try
to break this quoting.
Also, an extra column named \emph{row name} (with a space between the words) of
type text stores the data frame row names and is set as the table's primary
key. Do not use \emph{row name} as a variable name.}
\item{[factor ] and [ordered ]}{store the levels and level labels for each
factor variable in the SDF. One such table is created for every factor or
ordered variable, even if two variables share the same level labels. Besides
storing the level data, the table marks its column as being a factor.}
}

SDFs are managed in a workspace separate from R's. When SQLiteDF is loaded, it
searches for the file \code{workspace.db} inside the subdirectory
\code{.SQLiteDF} of the current working directory. This file contains a list of
the SDFs created or used in the previous session (i.e. SQLiteDF sessions are
saved automatically), including their full and relative paths and attach
information. The workspace is managed with the SQLite engine by opening
\code{workspace.db} as the main database and then attaching (SQLite's attach)
the SDFs. Unfortunately, the number of attached databases is limited to 31
(actually 32, but 1 is reserved for the temp db). SDFs are therefore
\emph{scored} according to the number of times each has been used. When the
maximum number of attachments is reached, the least-used attached SDF is
detached and the needed one is attached in its place. When the package is
compiled, the configure script modifies the bundled SQLite source so that the
constant controlling the maximum number of attachments is set to 31 (the
default is 10).

When SQLiteDF is loaded and \code{workspace.db} has been opened, the SDFs in
the stored list are sorted by their number of uses in the previous session and
the first 30 are attached. The relative path is used to find each SQLite file.
If a file cannot be found, its entry is removed from the SQLiteDF workspace
(with a warning message). The scores are then all reset.
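
The attach-and-evict policy described above can be sketched in a few lines of
base R. This is only an illustration under stated assumptions, not the
package's actual implementation: the names \code{new_workspace},
\code{use_sdf} and \code{max_attached} are hypothetical, the slot limit is a
parameter of the sketch, and no SQLite database is touched.

```r
# Hypothetical sketch of the scoring policy; not SQLiteDF's own code.
max_attached <- 30L  # assumed number of slots available to SDFs

# Workspace state: use counts of the attached SDFs, keyed by internal name.
new_workspace <- function() list(scores = integer(0))

# Record one use of `name`, attaching it first if it is not attached yet.
# When every slot is taken, the attached SDF with the lowest score is
# detached to make room.
use_sdf <- function(ws, name, limit = max_attached) {
  if (is.na(match(name, names(ws$scores)))) {
    if (length(ws$scores) >= limit) {
      victim <- names(ws$scores)[which.min(ws$scores)]
      ws$scores <- ws$scores[names(ws$scores) != victim]
    }
    ws$scores[name] <- 0L
  }
  ws$scores[name] <- ws$scores[name] + 1L
  ws
}

# Usage with a deliberately small limit of two slots.
ws <- new_workspace()
for (n in c("iris", "iris", "cars", "iris", "cars")) {
  ws <- use_sdf(ws, n, limit = 2L)
}
ws <- use_sdf(ws, "trees", limit = 2L)  # "cars" (2 uses) gives way; "iris" (3 uses) stays
names(ws$scores)
```

In this sketch a freshly attached SDF starts with a single use, so it is the
first candidate for detachment until it has been used again.
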

A sqlite.data.frame object is a list with a single element:
\itemize{
\item{iname}{the internal name of the SDF.}
}
and the following attributes:
\itemize{
\item{class}{The S3 class vector \code{c("sqlite.data.frame", "data.frame")}}
\item{row.names}{A sqlite.vector of mode character containing the row names of
the SDF}
}

All SDFs created in the session have their SQLite files stored in the
subdirectory \code{.SQLiteDF} of the current working directory. SDFs created in
another session, which may reside anywhere in the file system, can be imported
(attached) into the current SDF workspace using \code{attachSdf}.
}
\value{
An S3 object representing the SDF. The SDF database is created in the
\code{.SQLiteDF} subdirectory, with a file name derived by appending the
extension \emph{db} to the passed internal name, or to the default internal
name if none is provided.
}
\author{Miguel A. R. Manese}
\note{
The full path is stored so that the same database is not attached twice: its
relative path may differ if the user changes directory after loading SQLiteDF
(see \code{attachSdf}).
}
\seealso{
    \code{\link[SQLiteDF]{lsSdf}}
    \code{\link[SQLiteDF]{getSdf}}
    \code{\link[SQLiteDF]{attachSdf}}
    \code{\link[SQLiteDF]{detachSdf}}
}
\examples{
    library(datasets)
    # create an SDF from an ordinary data frame
    iris.sdf <- sqlite.data.frame(iris)
    names(iris.sdf)
    class(iris.sdf)
    # extract a column with $ or [[ and take its first 10 values
    iris.sdf$Petal.Length[1:10]
    iris.sdf[["Petal.Length"]][1:10]
    # first row, columns selected with a logical index
    iris.sdf[1,c(TRUE,FALSE)]
    #apply(iris.sdf[1:4], 2, mean)
}
\keyword{data}
\keyword{manip}
\keyword{classes}