--- title: "Concordances" author: "Duncan Murdoch" date: "2022-10-20" output: html_document --- One of the strengths of R is its ability to help in producing documents. `Sweave` and `knitr` can work with `.Rnw` files, evaluating and automatically inserting the results of R code to produce a LaTeX document in a `.tex` file. We call this "preprocessing", since the later steps were originally designed with the assumption that the `.tex` file was directly edited by the user and then processed to produce PDF or other output formats. R Markdown (using `knitr`) does the same for documents written in the Markdown language. A difficulty with preprocessors is that errors arising in the later steps will produce error messages that refer to the intermediate files: for example, LaTeX errors will refer to the `.tex` file rather than the `.Rnw` file that is the true source. Errors in the HTML code generated from help files are reported by the HTML Tidy utility according to their line in the `.html` file, not the `.Rd` or `.R` file which the user originally wrote. Concordances address this issue. A concordance is a mapping between lines in the intermediate file and lines in the input file. If an error is reported at `"file:line"` by LaTeX or HTML Tidy, the concordance allows that location to be translated into the corresponding location in the `.Rnw` or `.Rd` file. I added concordances to `Sweave` many years ago, and wrote the `patchDVI` package to use them with previewers and to translate LaTeX error messages. (See the details in the history below.) With the upcoming release 4.3.0 of R, concordances have been extended to help files. Messages from HTML Tidy will be reported with both the `.html` file location and the `.Rd` file location. For example, the file `hello.Rd` could contain this code: ``` \name{hello} \alias{hello} \title{Hello, World!} \usage{ hello() } \description{ Prints 'Hello, world!'. \out{} } ``` The second last line inserts the literal text `` into the output. This is not a legal HTML token, and HTML Tidy will complain. With the new changes, the complaint will be shown as ``` * checking HTML version of manual ... NOTE Found the following HTML validation problems: hello.html:25:1 (hello.Rd:10): Error: is not recognized! hello.html:25:1 (hello.Rd:10): Warning: discarding unexpected ``` This indicates that the bad token was spotted by HTML Tidy in column 1 of line 25 of the `hello.html` file, and that line originated from line 10 of `hello.Rd`. There may also be an error reported in producing the PDF version of the manual; at present those are not automatically translated by R, but as shown below, the location can be found manually. ## Concordance code The concordance code is mainly intended for internal use, but it is being made available to package writers. One package that might be able to use it is `roxygen2`; among other things, it creates help files from `.R` source files. The new code would allow it to embed its own concordance in the `.Rd` file so that HTML Tidy would report a reference to the true source in the `.R` file. (There are some difficult issues in producing that concordance due to Pandoc limitations, so this might not happen soon.) ### Some details about the new code There's a new class named `"Rconcordance"`, and three related functions exported by the `tools` package. The `"Rconcordance"` objects are simple lists with three fields: - `offset`: If only part of the output file is related to the input file, the initial `offset` lines can be skipped. - `srcLine`: This is a vector of line numbers from the original source file corresponding to a range of lines of the output file starting at line `offset + 1`. - `srcFile`: In simple cases, this is a single filename for the source file; in more complicated cases, it can be a vector of filenames of the same length as `srcLine`, possibly giving a different source file for each of those lines. There is a `print` method for the class: ``` library(tools) concordance <- structure(list(offset = 5, srcLine = 20:30, srcFile = "myHelpfile.Rd"), class = "Rconcordance") concordance ``` ``` ## srcFile srcLine ## 6 myHelpfile.Rd 20 ## 7 myHelpfile.Rd 21 ## 8 myHelpfile.Rd 22 ## 9 myHelpfile.Rd 23 ## 10 myHelpfile.Rd 24 ## 11 myHelpfile.Rd 25 ## 12 myHelpfile.Rd 26 ## 13 myHelpfile.Rd 27 ## 14 myHelpfile.Rd 28 ## 15 myHelpfile.Rd 29 ## 16 myHelpfile.Rd 30 ``` The row labels are the output line numbers, the columns give the source filename and line corresponding to each. The `as.character` method for `"Rconcordance"` objects converts them into one or more fairly compact strings, suitable for inclusion into a final document. For example, ``` conc_as_char <- as.character(concordance) conc_as_char ``` ``` ## [1] "concordance::myHelpfile.Rd:ofs 5:20 10 1" ``` The `as.Rconcordance` function is a generic function, with a default method defined. That method looks for strings like the one above in its input, and combines all of them into a single concordance object. For example: ``` newconcordance <- as.Rconcordance(conc_as_char) newconcordance ``` ``` ## srcFile srcLine ## 6 myHelpfile.Rd 20 ## 7 myHelpfile.Rd 21 ## 8 myHelpfile.Rd 22 ## 9 myHelpfile.Rd 23 ## 10 myHelpfile.Rd 24 ## 11 myHelpfile.Rd 25 ## 12 myHelpfile.Rd 26 ## 13 myHelpfile.Rd 27 ## 14 myHelpfile.Rd 28 ## 15 myHelpfile.Rd 29 ## 16 myHelpfile.Rd 30 ``` Finally, the `tools::matchConcordance` function does the translation of locations in intermediate files to locations in the source file. For example, when proofreading the HTML help files, you may have noticed "Hello, world!" on lines 1, 19 and 23 of the `hello.html` file and decided to change it, but because your actual help file was so large, this isn't the trivial problem it would be with my example. So what you could do is the following: 1. Run `tools::Rd2HTML("hello.Rd", concordance = TRUE)`. This will print the HTML source for the help page, ending with ``` ``` 2. Convert that string to a concordance object using ``` concordance <- tools::as.Rconcordance("") ``` 3. Find the source corresponding to lines 1, 19 and 23 using ``` tools::matchConcordance(c(1, 19, 23), concordance) ``` ``` ## srcFile srcLine ## [1,] "hello.Rd" "3" ## [2,] "hello.Rd" "3" ## [3,] "hello.Rd" "8" ``` The first two arose from the `\title{}` specification, and the third one came from a line of text in the `\description` section. ## Ancient History Many years ago I used `Sweave` for writing papers, presentations, exams, etc. It took `.Rnw` files as input, and produced `.tex` files as output. I would run those files through `latex` to get `.dvi` files which I could preview, print, or convert to PDF for distribution. Previewers existed in those days that let you click on a particular word in the preview, and they'd tell your text editor to jump to the corresponding location in the `.tex` file. That was kind of nice, but also kind of irritating: I then had to figure out the right location in the `.Rnw` file to make my edits, or make the edits in the `.tex` file and be frustrated when they got wiped out by `Sweave` on the next run! My first solution to this problem was to get `Sweave` in `R` 2.5.0 to keep a record of the correspondence between the lines of the `.Rnw` file and the `.tex` file it produced, which I called the "concordance". Given a line in the `.tex` file, it was then possible to find the corresponding line in the `.Rnw` file. By embedding this record in the `latex` output, this could be made automatic. I wrote the `patchDVI` package to modify the links in the `.dvi` file so that the previewer would automatically jump to the right place in the right file. Happiness! Over the years there were lots of developments. I started using `pdflatex` which skipped the `.dvi` stage, but supported `synctex`, so I added support for that into `Sweave` and `patchDVI`. `knitr` arrived to improve on `Sweave`, and included concordance support. I switched text editors and previewers several times, writing new scripts each time to connect things. Unfortunately, R Markdown is processed by Pandoc, and as far as I know, Pandoc doesn't support any way to relate input lines to output lines. I'd love to be corrected if I'm wrong about that! So concordances don't work with R Markdown or other processors like `Quarto` that rely on Pandoc. I believe `roxygen2` uses Pandoc for processing some help files, so it will also be difficult.