Data-parallel processing frameworks are being introduced at a fast pace and Erlang seems to be particularly well suited to soft real-time applications with high level of data parallelism and processing concurrency where reliability is important. With increased memory size and multi-core computing capabilities of modern processors and introduction of high-performance persistent storage, Erlang covers increasing portions of response time-data volume chart and complements very well existing large-scale analytics platforms like Hadoop.
This talk aims at presenting a case for building soft real-time, scalable data-parallel processing pipelines in Erlang.
We present architecture and simple specification language for building data-parallel flows in Erlang and share use cases covering data-parallel methods such as map-reduce and iterative graph algorithms to illustrate flexibility of the proposed approach. We discuss other important elements of the architecture such as capacity planning for typical use cases, relationship with other ecosystem components, instrumentation and monitoring, scheduling, replication and failover.