Pacific Northwest Scala 2014

This presentation, by Marc Millstone, is licensed under a Creative Commons Attribution-ShareAlike 3.0 license.

Streaming algorithms are a set of techniques for computing on data streams, in which the data arrives as a sequence of elements and each element is processed only once. These methods are also restricted to a constant memory footprint, independent of the size of the input stream. These constraints force tradeoffs: solutions are often approximate, correct only to within an error threshold. Understanding the wide range of applications of these methods, and how best to integrate them into software, should be a core competency in the modern engineer’s toolbox. This talk weaves together four threads: real-world applications of streaming algorithms encountered here at Socrata; the types and data structures (with open-source implementations in Scala) used internally to implement our streaming algorithm data pipeline; the theory and data structures behind approximation algorithms, including the popular Count-Min Sketch and HyperLogLog as well as other lesser-known techniques; and references to, and examples of, integrating JVM-based implementations of these methods, found in Twitter’s Algebird and Clearspring’s stream-lib, into the working engineer’s data pipeline.
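
To make the one-pass, fixed-memory constraint concrete, here is a minimal Count-Min Sketch in Scala. It is an illustrative sketch only, not the implementation from the talk, Algebird, or stream-lib. The class name `CountMinSketch`, the restriction to `String` items, and the use of `MurmurHash3` seeded by the row index are all assumptions made for this example. Seeding one hash function per row is a simplification and does not carry the formal pairwise-independence guarantee that the error bounds rely on.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical minimal Count-Min Sketch: `depth` rows of `width` counters,
// with one (seeded) hash function per row. Memory is fixed at depth * width
// counters no matter how many elements the stream contains.
final class CountMinSketch(depth: Int, width: Int) {
  require(depth > 0 && width > 0, "depth and width must be positive")

  private val table = Array.ofDim[Long](depth, width)

  // Column index for `item` in `row`; the row number doubles as the hash seed
  // (an assumption of this sketch, not a property of any library).
  private def bucket(item: String, row: Int): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(item, row), width)

  // Process one stream element exactly once: O(depth) work, no allocation.
  def add(item: String, count: Long = 1L): Unit = {
    require(count >= 0, "count must be non-negative")
    var row = 0
    while (row < depth) {
      table(row)(bucket(item, row)) += count
      row += 1
    }
  }

  // Estimated frequency: the minimum counter across rows. Hash collisions can
  // only inflate a counter, so the estimate never falls below the true count.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min
}
```

After creating `new CountMinSketch(depth = 4, width = 1024)` and calling `add` once per stream element, `estimate(x)` returns a value that is at least the true count of `x` and may overshoot it when items collide. This is the "correct to a threshold" tradeoff the abstract describes: a wider table lowers the expected overcount, and more rows lower the chance of a bad estimate.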
