Comment by nutate

11 years ago

Yeah, I did a presentation at the DC Area Apache Spark meetup. Slides: http://www.slideshare.net/RichardSeymour3/2015-0224-washingt... Associated blog post: https://www.endgame.com/blog/streaming-data-processing-pyspa...

I've done a bit of Scala Spark as well, and my initial thought was to prototype in PySpark and then rewrite in Scala if necessary. Just this week Databricks announced they are working on changing some of the data structures behind RDDs to cut down on unnecessary Java object creation: https://databricks.com/blog/2015/04/28/project-tungsten-brin...

That and the SQL compiler thing seem pretty darn awesome. Spark has the nice benefit of being plug and play (with a joyful time of compiling and deploying) with legacy HDFS/Hadoop systems. That alone will keep it in toolboxes for a long time to come.