Comment by ur-whale

4 years ago

Is there code?

malshe 4 years ago

Yes, the author has shared a link to the R package here:

https://cran.r-project.org/web/packages/XICOR/index.html

Edit: R code from Dr. Chatterjee's Stanford page is here - https://souravchatterjee.su.domains//xi.R

If you have never worked with R, the code may seem clunky, so I suggest checking out the Python implementation on GitHub here:

https://github.com/czbiohub/xicor

The Python library is not from the original author, though, but the code is easy to read and it works with pandas as well.

tpaschalis 4 years ago

If anyone is interested, I've also published a Go implementation [1] of the code for float64 slices.
Results seem to exactly match the R and Python implementations, so there will be a second pass focusing on performance, stability, and support for categorical variables.

[1] https://github.com/tpaschalis/xicor-go

zmachinaz 4 years ago

The current version of the Python lib seems to be extremely badly written. Or is the algo itself that slow? It takes something like 21s to compute the correlation for just 10k samples.
- flyingmutant 4 years ago
  
  This issue contains simple code that is claimed to be >300x faster: https://github.com/czbiohub/xicor/issues/17
  
ur-whale 4 years ago

Thanks, the Python code is very clear and simple and makes it super easy to understand the idea without having to digest the paper.

loxias 4 years ago

The equation is on the second page, and if you know enough to know what correlation is, you know enough to implement it from the equation given. A straightforward implementation takes O(N log N) time to run, though, because you have to sort your data.
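
To make that concrete, here is a minimal Python sketch of the no-ties form of the coefficient: sort the pairs by x, take the ranks r_i of the y values in that order, and compute 1 - 3 * sum(|r_{i+1} - r_i|) / (n^2 - 1). The function name `xi_correlation` is made up for illustration and is not the API of XICOR, czbiohub/xicor, or xicor-go. It assumes there are no ties in x or y; the tie-handling variant is not covered here.

```python
def xi_correlation(x, y):
    """Sketch of Chatterjee's xi coefficient, assuming no ties in x or y."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")

    # Indices of the observations ordered by x: the O(n log n) sort.
    order_by_x = sorted(range(n), key=lambda i: x[i])

    # Rank of each y value (1 = smallest), computed with a second sort.
    order_by_y = sorted(range(n), key=lambda i: y[i])
    rank = [0] * n
    for r, i in enumerate(order_by_y, start=1):
        rank[i] = r

    # y ranks rearranged so that the corresponding x values are increasing.
    r_sorted = [rank[i] for i in order_by_x]

    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(r_sorted[k + 1] - r_sorted[k]) for k in range(n - 1))

    return 1 - 3 * total / (n * n - 1)


if __name__ == "__main__":
    xs = list(range(100))
    print(xi_correlation(xs, xs))                              # monotone relationship
    print(xi_correlation(xs, [(v - 49.7) ** 2 for v in xs]))   # non-monotone but functional
```

Everything except the two sorts is linear, so the sorts account for the N log N cost. Note that even for a perfectly monotone relationship the no-ties formula gives 1 - 3/(n + 1) rather than exactly 1, so the value only approaches 1 as n grows.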