R Inferno-ism: order is not rank

Do not use order when you want rank.

Background

The update of “A comparison of some heuristic optimization methods” is due to the bug that Luca Scrucca spotted.

Actually, it is two bugs:

    • I used order when I meant rank
    • This somehow escaped being in The R Inferno

 

Problem

What I said in my code was (essentially):

ord <- order(x)

Now what I wanted was the rank of each value in x: 1 for the smallest, 2 for the next smallest, and so on.  What I got was the permutation of indices that would put x into sorted order.  The two are inverse permutations of each other, so only under the rarest of circumstances are these the same.  But they sound oh so similar.

What I really wanted to say was:

ord <- rank(x, ties.method="first")

(But see below.)
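A small example may make the difference concrete. This is only a sketch: the vector below is made up for illustration and is not from the code that had the bug.

```r
# Toy vector with distinct values (chosen for illustration only)
x <- c(30, 10, 40, 20)

ord <- order(x)                        # indices that would sort x
rnk <- rank(x, ties.method = "first")  # position of each value in the sorted vector

print(ord)  # [1] 2 4 1 3
print(rnk)  # [1] 3 1 4 2

# order answers "which element goes into slot i?"
print(x[ord])  # [1] 10 20 30 40

# rank answers "which slot does element i go into?"
sorted <- sort(x)
print(sorted[rnk])  # [1] 30 10 40 20, which is x itself
```

The first element of x is 30, the third smallest value, so its rank is 3. The first element of order(x) is 2 because the smallest value sits in position 2.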

Timing

Using order in this case doesn’t get us where we want to go.  The advantage is that it gets us there really fast.  The rank function is much slower. (Timings in R version 2.15.0.)

> x10 <- runif(10)
> system.time(for(i in 1:1e4) order(x10))
   user  system elapsed 
   0.11    0.00    0.11 
> system.time(for(i in 1:1e4) rank(x10, ties.method="first"))
   user  system elapsed 
   1.22    0.00    1.34 
> x100 <- runif(100)
> system.time(for(i in 1:1e4) order(x100))
   user  system elapsed 
   0.14    0.00    0.17 
> system.time(for(i in 1:1e4) rank(x100, ties.method="first"))
   user  system elapsed 
   1.61    0.00    1.64 
> x1000 <- runif(1000)
> system.time(for(i in 1:1e4) order(x1000))
   user  system elapsed 
   1.14    0.02    1.15 
> system.time(for(i in 1:1e4) rank(x1000, ties.method="first"))
   user  system elapsed 
   3.76    0.00    3.82

rank is clearly slower than order. The whole point, though, is that these two commands give us different things.  The command order(order(x)) is another way to get what our rank command gives us.  Even though it is a bit kludgy, it can be significantly faster, especially for short vectors:

> system.time(for(i in 1:1e4) rank(x10, ties.method="first"))
   user  system elapsed 
   1.39    0.00    1.39 
> system.time(for(i in 1:1e4) order(order(x10)))
   user  system elapsed 
   0.23    0.00    0.24 
> system.time(for(i in 1:1e4) rank(x100, ties.method="first"))
   user  system elapsed 
   1.56    0.00    1.56 
> system.time(for(i in 1:1e4) order(order(x100)))
   user  system elapsed 
   0.36    0.00    0.38 
> system.time(for(i in 1:1e4) rank(x1000, ties.method="first"))
   user  system elapsed 
   3.94    0.00    4.00 
> system.time(for(i in 1:1e4) order(order(x1000)))
   user  system elapsed 
   2.17    0.00    2.17 
> x10000 <- runif(10000)
> system.time(for(i in 1:1e4) rank(x10000, ties.method="first"))
   user  system elapsed 
  34.88    0.00   35.01
> system.time(for(i in 1:1e4) order(order(x10000)))
   user  system elapsed 
  29.51    0.00   29.94
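Before swapping one for the other, it seems worth checking that the two commands really agree, including when there are ties. The sketch below assumes nothing beyond base R; the seed, vector lengths, and variable names are arbitrary choices for illustration. It relies on order leaving tied values in their original order, which is the same way ties.method="first" breaks ties.

```r
set.seed(1)  # arbitrary seed so the check is repeatable

# Distinct values, as in the runif() timings
x <- runif(1000)
same_distinct <- identical(order(order(x)), rank(x, ties.method = "first"))

# Heavy ties: only five distinct values among 1000
y <- sample(1:5, 1000, replace = TRUE)
same_ties <- identical(order(order(y)), rank(y, ties.method = "first"))

print(c(distinct = same_distinct, ties = same_ties))
```

If either comparison came back FALSE, order(order(x)) would not be a safe replacement for that kind of input. Note that the agreement is only with ties.method="first"; the default ties.method="average" gives different results when there are ties.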

 
