PyData SV 2014
Ferry is a Python-based, open-source tool that helps developers share and run big data applications. Users can provision Hadoop, Cassandra, GlusterFS, and Open MPI clusters on their local machine using YAML, then distribute their applications via Dockerfiles. These capabilities are useful for data scientists experimenting with big data technologies, developers who need an accessible big data development environment, and developers simply interested in sharing their big data applications. In this presentation, I'll introduce you to Docker, show you how to create a simple big data application in Ferry, and discuss ways the Python community can contribute to the open-source project. I'll also discuss future directions for Ferry, with a focus on better application sharing and operational deployments.