A friend recently bought The R Book and I said I would tell him of problems that I’ve noticed with it. You can eavesdrop.
The word “library” is used instead of “package”. This (common) error substantially raises the blood pressure of some people — probably to an unwarranted extent.
An R package is a group of functions, data and their documentation. These are the things that are in repositories like CRAN (where there are over two thousand packages). A package is installed onto your machine into a library.
You are unlikely to call a book a library; don’t call a package a library.
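The distinction can be seen from within R itself. What follows is a minimal sketch, not anything taken from the book: it assumes a standard R installation and uses the base package stats purely as a convenient example of a package that is always installed.

```r
# Libraries are directories: these are the ones R searches for packages.
libs <- .libPaths()
print(libs)

# Packages are what is installed inside those directories.
# Each row is one package, together with the library that holds it.
pkgs <- installed.packages()[, c("Package", "LibPath"), drop = FALSE]
print(head(pkgs))

# The package "stats" sits in a subdirectory (named "stats") of a library.
stats_path <- find.package("stats")
stats_lib <- dirname(stats_path)
print(stats_path)
print(stats_lib)
```

On a typical machine `libs` has one or two entries while `pkgs` has many rows: few libraries, lots of packages.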
Part of the problem is that packages are attached with the library function:
That is why some instructions have you do the same thing via:
Some of the people whose blood pressure is abnormally raised by seeing this mistake are very important to R, so please get this right.
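As a hedged illustration of the two ways of attaching a package: the alternative referred to above is presumably `require` (that is an assumption on my part), and the package names here are stand-ins, with `notARealPackage` deliberately made up.

```r
# Attach a package with library(); this throws an error
# if the package is not installed.
library(stats)

# require() attaches the package too, but returns TRUE or FALSE
# rather than stopping with an error.
ok <- require(stats)
print(ok)

# Hypothetical package name, chosen so that it will not be found.
missing_ok <- suppressWarnings(
  require("notARealPackage", character.only = TRUE, quietly = TRUE)
)
print(missing_ok)
```

Either way, what gets attached is a package. The library is merely where R goes to look for it.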
An example value is:
that is, a complex number. This is in the chapter called “Essentials of the R Language”. I’ve been using R and a language not unlike R for a quarter century. The only time I recall using complex numbers is when documenting them. Complex numbers don’t match my definition of “essential”.
There is a certain amount of irony for a 600-word blog post to take n lines to complain about a 900-page book wasting one line. However, the complex number is an extreme example of a common occurrence in the chapter. There is a lot of the chapter that I don’t find particularly essential.
My take on “essential” is “Some hints for the R beginner”.
These are two examples of a general feature: while the author’s keyboard seems to work perfectly fine for text, the space-bar is mysteriously broken for R code.
It is clearer to write these as:
> A <- 1:10
> B <- c(2, 4, 8)
The assignment arrow shows up as a separate entity. Spacesaidunderstanding.
The same thing, but this time it’s serious.
really, really should have spaces around the less-than operator.
There is no trouble with this particular example, but what if the example were with minus five?
Written without spaces, x<-5 does not give you a logical vector with TRUE values where x is less than minus five. It changes x to have the single value 5.
This and a whole bunch of other R gotchas are in The R Inferno.
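The gotcha is easy to reproduce. Here is a small sketch; the vector `x` and its values are made up for illustration.

```r
x <- c(-10, -5, 0, 5, 10)

# With spaces this is a comparison: is each element less than minus five?
is_small <- x < -5
print(is_small)

# Without spaces the parser sees the assignment arrow:
# x is overwritten with the single value 5.
x<-5
print(x)
```

After the last statement the original five values of `x` are gone, and no warning is given.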
The values in the body of a matrix can only be numbers.
That is a false statement. In particular, if x is a numeric matrix, then the result of
x < -5
is a matrix of logical values (and is the same dimension as x).
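A quick check of this, as a sketch with made-up values. It also shows a character matrix as a second counterexample, which is my addition rather than something from the book.

```r
# A small numeric matrix (filled column by column).
x <- matrix(c(-10, -7, -5, 0, 3, 8), nrow = 2)

# Comparing it with minus five gives a logical matrix of the same shape.
lg <- x < -5
print(lg)
print(is.logical(lg))
print(dim(lg))

# The body of a matrix can hold character strings as well.
ch <- matrix(letters[1:6], nrow = 2)
print(ch)
print(is.character(ch))
```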
This be praise, not quibble.
The book uses “explanatory variables” and “response” in the statistical regression context. It doesn’t enter into the dependent-independent muddle.
Amazon has several reviews of The R Book. There is a range of opinions from very positive to quite negative. A common complaint is that the material is disorganized.
The points I have raised are from a quick glance through the book. Are there other things in the book that should be pointed out to help the unwary?
I don’t think there is such a thing as the best book on R. There can be the best book on R for you as an individual. Which one is the best will depend on where you are and where you want to go. A partial list of your choices is Books related to R.
I started with the R book by Crawley, and I would cautiously endorse some of your comments. I worked my way through it until around chapter 13, and I found that the structure militated against my understanding at first.
As I improved with R though, I came to appreciate the way the book is put together much more. That being said, it took me more than 100 pages to get to possibly the most important task when learning R: reading in your own data from a spreadsheet. He was very helpful here.
The best part of this book (IMO) is the material on statistical modelling, and the problems therein. That chapter (I think it's 9) is wonderful, and I have returned again and again to it to check some subtleties.
Almost all of the examples are biological, so if you aren't in this area you may miss some of your field's favourite methods. That being said, I’m a psychologist and I found that the examples were useful as he’s good at explaining assumptions and issues with each of the procedures used.
All in all, it's probably not the best book for the beginner (his introductory statistics using R is much more user friendly), but it's a great reference book for those wishing to learn more.
Thanks for your comments — it’s good to hear from a perspective that I couldn’t possibly have.
No problem. Actually, commenting on this post has made me want to review some of the many R books I’ve used throughout my learning experience.
I only read the part about libraries and packages so far and I’d’ve never thought this could be a problem.
Maybe the only real problem here is that many people don’t draw an appropriate distinction between concepts (“package”) and functions (`library`, `require`). But the sentence “A package is installed onto your machine into a library” introduces a sophisti-bloody-cated difference that makes no difference for an ordinary user, although it might have a theoretical importance to the CS theorists. Moreover, it is not clear. Are you saying that an installed package becomes a library? Then an ordinary user should never complain about packages because all he sees is the installed version on his machine. They should complain about libraries. Or is it that a package is installed into a folder called “library”? Yes it is, but so what? Does this mean that we should call an installed package a “library”? By this logic, the workhorse of R should be called a “bin”!
Thanks for the comment. I agree that for a typical user the issue of package versus library is very trivial.
However, I think it is not so trivial when a supposedly authoritative book makes the mistake. Partly because it is a red flag that there might be other mistakes in the book — a couple of which I spotted.
I agree that it should be correct in the book. But your explanation of what a library is was a bit confusing: one could understand it to mean that an installed version of a package becomes a library (but it could be my poor English :). I checked the R documentation and it actually says that “library” is intended to mean …
“directories in the file system containing a subdirectory for each package installed there.”
So in a typical case there would be just one library (corresponding to the folder called “library”), regardless of how many packages are installed. And having more than two or three libraries would usually not make sense.
Yes, typically only one library per version of R. I interpreted my sentence to mean that — I can see that others might interpret it differently.
The major criticism I have of this book is the bizarre way it introduces vectors and data frames. They are used well before they are defined.
Factors are introduced (unclearly) and then we are launched into an example using read.table and data frames!
Similarly, tapply is used out of the blue, and we get the terms “response variable” (never defined in the book as far as I can see) and “categorical explanatory variable” (a factor, I think he means here), and we are still waiting to understand what a data.frame is.
Thankfully I have been doing a few courses with R, so I had already read how important these structures are.
To me it looks like he has not gone over his chaotic first draft and reorganised everything.
Hoping for better as I get more into it but definitely not a first book on R.
Luckily people have kindly put a lot of stuff to help us R newbies
Do you have the second edition or the first edition?
Second edition (the green book). I am now on page 61!