This branch contains modifications to the representation of data in STRSXPs that try to avoid creating CHARSXP objects, or at least to defer their creation until they are needed.

Integers that are converted to strings are initially stored as integers and are converted only if and when the string form is needed, thus avoiding both the sprintf and the creation of the CHARSXP. (This is also done for doubles, but that probably isn't all that useful.)

Short ASCII strings are stored directly rather than being boxed in CHARSXPs.

The hope is that reducing the number of CHARSXPs will reduce pressure on the GC. At this point the main example where this helps is in avoiding the creation of case labels in lm(). In R-devel,

    > n <- 10000000
    > p <- 5
    > x <- matrix(rnorm(n * p), n, p)
    > y <- rnorm(n)
    >
    > system.time(lm(y ~ x))
       user  system elapsed
     34.039   3.584  37.611
    > system.time(lm(y ~ x))
       user  system elapsed
     22.645   3.118  25.755
    > system.time(lm(y ~ x))
       user  system elapsed
     20.743   2.735  23.471

With these changes, the timings are

    > system.time(lm(y ~ x))
       user  system elapsed
      8.889   3.422  12.308
    > system.time(lm(y ~ x))
       user  system elapsed
      8.688   3.400  12.083
    > system.time(lm(y ~ x))
       user  system elapsed
      8.657   3.381  12.035

This is clearly a useful improvement, but it can be had in other ways, so the question is whether there are benefits in other settings.

Avoiding CHARSXP creation only for shorter strings may not be enough; an alternative is to initially allocate all string data contiguously, and to create CHARSXPs only if elements are modified and the new values do not fit.

To go further it would be useful to have some benchmarks of code using character data.
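
For readers who want a concrete picture of the deferred representation described at the top of this note, here is a minimal standalone C sketch. It is not the code in this branch and does not use R's internal API: the type str_elem, the limit INLINE_MAX, the counter boxes_created and all function names are hypothetical, and a malloc'ed string merely stands in for a CHARSXP. It only illustrates the idea that an element can hold an integer, a short ASCII string stored inline, or a boxed string, with the text of an integer produced on demand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch only; not the layout used in this branch or in R. */

#define INLINE_MAX 7   /* assumed limit for strings stored inline */

typedef enum { EL_INT, EL_INLINE, EL_BOXED } el_kind;

typedef struct {
    el_kind kind;
    union {
        int ival;                  /* integer whose text is not built yet */
        char inl[INLINE_MAX + 1];  /* short ASCII string stored directly */
        char *boxed;               /* heap string, standing in for a CHARSXP */
    } u;
} str_elem;

/* Counts heap allocations, i.e. the "CHARSXPs" we would like to avoid. */
static int boxes_created = 0;

static int is_short_ascii(const char *s) {
    size_t n = strlen(s);
    if (n > INLINE_MAX) return 0;
    for (size_t i = 0; i < n; i++)
        if ((unsigned char)s[i] > 127) return 0;
    return 1;
}

/* Store the integer as-is: no sprintf, no allocation. */
static str_elem elem_from_int(int v) {
    str_elem e;
    e.kind = EL_INT;
    e.u.ival = v;
    return e;
}

/* Store short ASCII strings inline; box everything else. */
static str_elem elem_from_cstr(const char *s) {
    str_elem e;
    if (is_short_ascii(s)) {
        e.kind = EL_INLINE;
        strcpy(e.u.inl, s);
    } else {
        size_t n = strlen(s) + 1;
        e.kind = EL_BOXED;
        e.u.boxed = malloc(n);
        if (e.u.boxed == NULL) abort();
        memcpy(e.u.boxed, s, n);
        boxes_created++;
    }
    return e;
}

/* Produce the text only when asked. For integers the digits are
   formatted into the caller's buffer; otherwise the stored string
   is returned and buf is left untouched. */
static const char *elem_text(const str_elem *e, char *buf, size_t buflen) {
    assert(buf != NULL && buflen > 0);
    switch (e->kind) {
    case EL_INT:
        snprintf(buf, buflen, "%d", e->u.ival);
        return buf;
    case EL_INLINE:
        return e->u.inl;
    default:
        return e->u.boxed;
    }
}

static void elem_free(str_elem *e) {
    if (e->kind == EL_BOXED) free(e->u.boxed);
}
```

In this sketch, a vector of case labels built with elem_from_int would cost no formatting and no allocation unless elem_text is actually called on an element, which is the effect the lm() timings above are attributed to.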