Valgrind and the R memory manager
Valgrind is a set of tools for detecting memory management bugs.
Previously it ran only on x86 Linux, but version 3.0 supports AMD64
Linux, and support for FreeBSD and for PowerPC Linux is under
development.
Typically Valgrind is used with unmodified binaries. It runs the
binary in a CPU emulator and tracks memory allocations and
initializations. This approach is limited when a program does its own
memory management. In R, memory that has been garbage collected should
never be touched by a correctly functioning program, and newly
allocated integer, logical and numeric vectors are uninitialized. But
because R recycles most of this memory itself rather than returning it
to malloc and free, Valgrind does not know either of these facts.
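To make the limitation concrete, here is a minimal sketch of a hand-rolled pool allocator that recycles fixed-size slots without ever calling free, so Valgrind on its own would see every slot as valid, initialized memory. It uses Memcheck's client-request macros to describe each slot's state. The pool and its functions are hypothetical and are not R's allocator. The macro names are assumed from current releases of valgrind/memcheck.h (older releases used different names for these requests), and no-op fallbacks let the sketch build where the header is absent.

```c
#include <stddef.h>

/* Use Memcheck's client requests if the header is installed; otherwise
   fall back to no-op macros so the sketch still builds and runs. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)(addr), (void)(len))
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#endif

#define NSLOTS 8
#define SLOT_DOUBLES 8

/* Hypothetical pool: a static block carved into fixed-size slots. */
static double pool[NSLOTS][SLOT_DOUBLES];
static int in_use[NSLOTS];

/* Nothing in the pool may be touched until it has been handed out. */
void pool_init(void)
{
    for (int i = 0; i < NSLOTS; i++)
        in_use[i] = 0;
    VALGRIND_MAKE_MEM_NOACCESS(pool, sizeof pool);
}

/* Hand out a slot: addressable, but its contents are undefined, as with malloc. */
double *pool_alloc(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            VALGRIND_MAKE_MEM_UNDEFINED(pool[i], sizeof pool[i]);
            return pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* Take a slot back: any later access through a stale pointer is an error. */
void pool_free(double *p)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (p == pool[i]) {
            in_use[i] = 0;
            VALGRIND_MAKE_MEM_NOACCESS(pool[i], sizeof pool[i]);
            return;
        }
    }
}
```

Under Valgrind, using a value read from a freshly allocated slot before writing it, or touching a slot after pool_free(), would be reported; outside Valgrind the requests do nothing.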
Valgrind provides a "client request mechanism" that lets a program
describe its own memory management to Valgrind. R-devel now uses this
mechanism. There are four levels of instrumentation, governed by the
macro VALGRIND_LEVEL.
- VALGRIND_LEVEL = 0 removes all Valgrind
instrumentation. This is currently the default.
- VALGRIND_LEVEL = 1 marks newly allocated numeric, integer,
and logical vectors as uninitialized, as if they had been obtained from
malloc. This allows Valgrind to catch use of uninitialized
values. It will also catch some pointer protection bugs: if an
unprotected vector is garbage collected, its memory is reallocated to
a numeric, integer, or logical vector, and that memory is read before
being written, Valgrind will report an uninitialized read. Level 1 adds
little overhead when running under Valgrind, and I hope it will become
the default on suitable platforms.
- VALGRIND_LEVEL = 2 also marks the DATAPTR() section of
each node as inaccessible when the node is garbage collected or when a
new page is obtained, and as accessible when it is allocated. This
catches quite a lot of potential bugs, but makes Valgrind run much
more slowly. The performance penalty seems smaller with Valgrind 3.0 on
AMD64.
- VALGRIND_LEVEL = 3 marks the ATTRIB pointer and the first
three bytes of the SEXPREC_HEADER for all nodes, and the three words of
data in non-vector nodes, as inaccessible on garbage collection and as
uninitialized when first allocated.
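To make the level scheme concrete, here is a minimal sketch of how allocation and collection hooks might be gated on VALGRIND_LEVEL. It covers only levels 1 and 2, uses a hypothetical toy_node type whose data member stands in for the DATAPTR() section, and is not the code in R's memory manager. The Memcheck macro names are assumed from current valgrind/memcheck.h, with no-op fallbacks when that header is missing.

```c
#include <stddef.h>

/* Hypothetical default for this sketch only; R's default is 0. */
#ifndef VALGRIND_LEVEL
#  define VALGRIND_LEVEL 2
#endif

#if VALGRIND_LEVEL > 0 && defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H  /* no-op fallbacks so the sketch builds anywhere */
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)(addr), (void)(len))
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#endif

/* Hypothetical node types: a numeric vector and a pointer-holding list. */
typedef enum { TOY_REAL, TOY_LIST } toy_type;

typedef struct {
    toy_type type;
    union { double real[4]; void *elt[4]; } data; /* stands in for DATAPTR() */
} toy_node;

/* Called when the allocator hands out a node. */
void toy_on_allocate(toy_node *n, toy_type type)
{
    n->type = type;
#if VALGRIND_LEVEL >= 2
    /* Level 2: the data section becomes addressable again, contents undefined. */
    VALGRIND_MAKE_MEM_UNDEFINED(&n->data, sizeof n->data);
#elif VALGRIND_LEVEL == 1
    /* Level 1: only numeric data is flagged, as if it came from malloc. */
    if (type == TOY_REAL)
        VALGRIND_MAKE_MEM_UNDEFINED(&n->data, sizeof n->data);
#endif
    /* Pointer-holding nodes are filled in at once, as R does for new lists. */
    if (type == TOY_LIST)
        for (int i = 0; i < 4; i++)
            n->data.elt[i] = NULL;
}

/* Called when the collector reclaims a node. */
void toy_on_collect(toy_node *n)
{
#if VALGRIND_LEVEL >= 2
    /* Level 2: a collected node's data must not be read or written. */
    VALGRIND_MAKE_MEM_NOACCESS(&n->data, sizeof n->data);
#else
    (void)n;
#endif
}
```

Compiling with -DVALGRIND_LEVEL=0 removes every request, which is the point of level 0: no cost when Valgrind is not in use.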
There is a configure option to set VALGRIND_LEVEL:
    configure --with-valgrind-instrumentation=##
where ## can be 0, 1, 2, or 3. The default is 0. At the moment,
configure does not check that the platform is compatible with
Valgrind when a level above 0 is requested; any problems will show up
at compile time. Problems are possible on x86 platforms other than
Win32 and Linux, and on PowerPC platforms other than Linux.
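All levels of instrumentation above 0 catch more bugs when used in
conjunction with gctorture(TRUE), which provokes a garbage collection
on (nearly) every allocation and so makes it much more likely that an
unprotected object is collected and its memory reused. I have added
targets test-Valgrind and test-Vgct to tests/Makefile. These run the
same code as test-Gct under Valgrind and under Valgrind + gctorture()
respectively, and send all Valgrind messages to standard output.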
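The interaction between level 1 and gctorture() can be imitated in a few lines. This is a minimal sketch under stated assumptions: a hypothetical two-slot heap whose allocator runs a collection before every allocation (roughly what gctorture(TRUE) arranges) and marks recycled numeric storage as undefined, as level 1 does. All function names are made up for illustration, and the Memcheck macro names are assumed from current valgrind/memcheck.h, with no-op fallbacks.

```c
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H  /* no-op fallbacks so the sketch builds anywhere */
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void)(addr), (void)(len))
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)(addr), (void)(len))
#endif

#define NSLOTS 2
#define SLOT_LEN 4

/* Hypothetical two-slot heap for numeric vectors. */
static double heap[NSLOTS][SLOT_LEN];
static int live[NSLOTS], prot[NSLOTS];

/* Torture-mode collector: reclaim every live slot that is not protected. */
static void toy_gc(void)
{
    for (int i = 0; i < NSLOTS; i++)
        if (live[i] && !prot[i]) {
            live[i] = 0;
            VALGRIND_MAKE_MEM_NOACCESS(heap[i], sizeof heap[i]);
        }
}

/* Allocate a numeric vector, collecting first as gctorture(TRUE) would. */
double *toy_alloc_real(void)
{
    toy_gc();
    for (int i = 0; i < NSLOTS; i++)
        if (!live[i]) {
            live[i] = 1;
            /* Level 1: fresh numeric storage is undefined. */
            VALGRIND_MAKE_MEM_UNDEFINED(heap[i], sizeof heap[i]);
            return heap[i];
        }
    return NULL;
}

static int slot_of(const double *p)
{
    for (int i = 0; i < NSLOTS; i++)
        if (p == heap[i]) return i;
    return -1;
}

void toy_protect(double *p)   { int i = slot_of(p); if (i >= 0) prot[i] = 1; }
void toy_unprotect(double *p) { int i = slot_of(p); if (i >= 0) prot[i] = 0; }

/* Bug: x is unprotected, so allocating y collects x and hands its slot to y.
   The read of x[0] then hits memory that was just marked undefined. */
double toy_buggy(void)
{
    double *x = toy_alloc_real();
    x[0] = 42.0;
    double *y = toy_alloc_real();   /* same slot as x */
    (void)y;
    return x[0];
}

/* Fix: keep x protected across the second allocation. */
double toy_fixed(void)
{
    double *x = toy_alloc_real();
    toy_protect(x);
    x[0] = 42.0;
    double *y = toy_alloc_real();   /* a different slot */
    y[0] = 0.0;
    double v = x[0];
    toy_unprotect(x);
    return v;
}
```

Run natively, toy_buggy() still returns 42, which is why such bugs survive ordinary testing; under Memcheck, the value it returns is flagged as undefined as soon as the caller branches on it or prints it.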
It may be useful to extend the instrumentation to cover the remaining
header fields of the memory nodes.
Running test-Gct and no-segfault.R under Valgrind has found five
bugs so far. One is purely theoretical (when using unary "!" with two
arguments). Two are pointers left briefly unprotected, which could in
principle cause heisenbugs. The final two are real, if not terribly
major: parse(,n=0) used a status variable that was never set, and
regexpr applied STRING_ELT to the pattern argument before coercing it
to a string, so that a logical NA and a character NA gave different
results:
    > regexpr(NA,"NANA")
    [1] 1
    attr(,"match.length")
    [1] 2
    > regexpr(as.character(NA),"NANA")
    Error in regexpr(pattern, text, extended, fixed, useBytes) :
      invalid argument
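The parse(,n=0) bug follows a common pattern that Memcheck catches well: a status variable that is only assigned inside a loop, then tested after a loop that ran zero times. The sketch below is hypothetical (the types and functions are invented for illustration and are not R's parser code) and shows the pattern next to the usual fix of giving the variable a value that is correct when nothing is parsed.

```c
#include <stddef.h>

/* Hypothetical status codes, not R's. */
typedef enum { PARSE_OK, PARSE_ERROR } parse_status;

/* Hypothetical single-expression "parser": empty input is an error. */
static parse_status parse_one(const char *src)
{
    return (src != NULL && src[0] != '\0') ? PARSE_OK : PARSE_ERROR;
}

/* Buggy pattern: when n == 0 the loop never runs, so 'status' is tested
   without ever having been set. Memcheck typically reports this as
   "Conditional jump or move depends on uninitialised value(s)".
   Only safe to call with n > 0. */
int parse_n_buggy(const char *const exprs[], int n)
{
    parse_status status;
    int count = 0;
    for (int i = 0; i < n; i++) {
        status = parse_one(exprs[i]);
        if (status != PARSE_OK)
            break;
        count++;
    }
    return status == PARSE_OK ? count : -1;
}

/* Fixed: start from the answer that is correct when nothing is parsed. */
int parse_n_fixed(const char *const exprs[], int n)
{
    parse_status status = PARSE_OK;
    int count = 0;
    for (int i = 0; i < n; i++) {
        status = parse_one(exprs[i]);
        if (status != PARSE_OK)
            break;
        count++;
    }
    return status == PARSE_OK ? count : -1;
}
```

An optimizing compiler may also warn about the buggy version at build time, but only Valgrind shows the uninitialized read actually happening at run time.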
Thomas Lumley. 2005-8-9