Pwrake: Distributed Workflow Engine based on Rake
Pwrake aims at high-performance parallel execution of data-intensive scientific workflows on multi-node clusters with 10,000 cores. In designing Pwrake, I made use of existing powerful tools. First, Pwrake is implemented as an extension to Rake. In this talk, I show that Rake is powerful enough to enable portable definitions of workflow DAGs composed of many tasks. Second, Pwrake can optionally use the Gfarm distributed file system for high-performance parallel file I/O. I will also talk about other studies on Pwrake, such as locality-aware task scheduling.
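To make the idea of defining a workflow DAG in Rake concrete, here is a minimal sketch using only the standard Rake DSL. The task names (calibrate_a, calibrate_b, combine) are hypothetical and are not taken from any real Pwrake workflow. The script runs under plain Ruby with the rake gem, so the tasks execute one after another; nothing here is Pwrake-specific.

```ruby
require "rake"

# Make `task` callable in a plain Ruby script, not only inside a Rakefile.
extend Rake::DSL

order = []  # records the order in which tasks actually run

# Hypothetical inputs: each one gets its own independent "calibrate" task.
inputs = %w[a b]

inputs.each do |name|
  task "calibrate_#{name}" do |t|
    order << t.name
  end
end

# "combine" depends on every calibrate task, which gives a small fan-in DAG:
#   calibrate_a --\
#                  +--> combine
#   calibrate_b --/
task "combine" => inputs.map { |n| "calibrate_#{n}" } do |t|
  order << t.name
end

# Invoking the final task runs its prerequisites first.
Rake::Task["combine"].invoke
puts order.join(" -> ")
```

The two calibrate tasks do not depend on each other, so only the dependency edges constrain their order. That independence is what a parallel engine can exploit when scheduling tasks across cores.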
Masahiro TANAKA, @masa16tanaka
Research Fellow at CCS, University of Tsukuba. The author of NArray, a numerical library for Ruby.