Pacific Northwest Scala 2013

This presentation, by Charles Francis, is licensed under a Creative Commons Attribution ShareAlike 3.0 license.

I first started using Scala in the context of a mostly Python project that visualized and built models of a pretty massive textual dataset. My first frustration with the language was, accordingly, how much work it took just to parse a CSV file (don't even get me started on memory usage back in 2.6...)! Then, of course, there was a complete absence of idiomatic Scala libraries for statistical and numerical processing; one was left to the terrible inconsistencies of Java's numeric types. The situation has improved in recent years, as evidenced by very high quality libraries meant to aid this type of workflow, such as Saddle, Spire, Breeze, and Scala Notebook. While Scala has had quite a lot of success as a language for Hadoop-style data processing, in this talk I want to focus instead on the current state of the language as a general statistical platform: less big data, more analysts working with in-memory datasets for high-performance analytics and visualization. I will focus on how libraries like the ones mentioned above work together (or don't) and how Scala competes with the dominant languages in this space, R and Python. I intend to illustrate a simple analytics workflow using these tools, showing what works well and what problems one will come up against, and indicating some fruitful lines for future work and unification.
