Comment by malshe

4 years ago

Yes, the author has shared a link to the R package here:

https://cran.r-project.org/web/packages/XICOR/index.html

Edit: R code from Dr. Chatterjee's Stanford page is here - https://souravchatterjee.su.domains//xi.R

If you have never worked with R, the code may seem clunky, so I suggest checking out the Python implementation on GitHub here:

The Python library is not from the original author, though. But the code is easy to read, and it works with pandas as well.


tpaschalis 4 years ago

If anyone is interested, I've also published a Go implementation [1] of the code for float64 slices.

Results seem to match the R and Python implementations exactly, so there will be a second pass focusing on performance, stability, and support for categorical variables.

[1] https://github.com/tpaschalis/xicor-go

zmachinaz 4 years ago

The current version of the Python lib seems to be extremely badly written code. Or is the algorithm itself that slow? It takes something like 21s to compute the correlation for just 10k samples.

flyingmutant 4 years ago

This issue contains simple code that is claimed to be >300x faster: https://github.com/czbiohub/xicor/issues/17

malshe 4 years ago

Thanks for locating the solution. I didn't check the Python code myself, so I wasn't sure what was going on with the slow processing.

ur-whale 4 years ago

Thanks, the Python code is very clear and simple and makes it super easy to understand the idea without having to digest the paper.
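
For anyone who wants the idea in a few lines, here is a minimal NumPy sketch of the xi coefficient, using the general formula from Chatterjee's paper that allows ties in y. It is not the code from the XICOR package, the Python library discussed above, or the linked GitHub issue; the function name `xi_correlation` and the `seed` parameter are made up for illustration. It relies on sorting and `np.searchsorted` instead of Python-level loops, so the cost should be roughly O(n log n).

```python
import numpy as np


def xi_correlation(x, y, seed=0):
    """Sketch of Chatterjee's xi coefficient for 1-D numeric data.

    Hypothetical helper for illustration only; `seed` controls how
    ties in x are broken (the paper breaks them uniformly at random).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if y.size != n or n < 2:
        raise ValueError("x and y must have the same length >= 2")

    # Shuffle first, then stable-sort by x: tied x values end up in random order.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    order = perm[np.argsort(x[perm], kind="stable")]
    y_by_x = y[order]

    # r[i] = number of j with y_j <= y_i, l[i] = number of j with y_j >= y_i.
    y_sorted = np.sort(y_by_x)
    r = np.searchsorted(y_sorted, y_by_x, side="right")
    l = n - np.searchsorted(y_sorted, y_by_x, side="left")

    denom = 2.0 * np.sum(l * (n - l))
    if denom == 0.0:
        return float("nan")  # y is constant, so xi is undefined

    # xi = 1 - n * sum|r[i+1] - r[i]| / (2 * sum l[i] * (n - l[i]))
    return 1.0 - n * np.sum(np.abs(np.diff(r))) / denom
```

With no ties in y the denominator reduces to n(n^2 - 1)/3, which gives the simpler 1 - 3 * sum|r[i+1] - r[i]| / (n^2 - 1) form. Note that xi is not symmetric, so `xi_correlation(x, y)` and `xi_correlation(y, x)` can differ.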