Comment by daemonk

9 years ago

I am a bioinformatician. Is that 130 TB of raw reads or processed data? Are you trying to build a general-purpose platform for all *-seq, or focusing on something specific (genotyping)?

I think you might be replying to my comment. We just took delivery of a 20K WGS callset that is 30 TB gzip-compressed (about 240 TB uncompressed), and we expect something twice as big by the end of the year. We're trying to build something pretty general for variant-level data (post-calling, no reads), annotation, and phenotype data. Currently we focus on QC, rare and common variant association, and tools for studying rare disease. Everything is open source; we develop in the open, and we're trying hard to make it easy for others to develop methods on our infrastructure. Feel free to email if you'd like to know more.