OpenStack Summit Atlanta 2014

Videos provided by OpenStack Summit via OpenStack Foundation YouTube Channel

During the Havana release cycle we discovered that Tempest was getting comprehensive enough to expose interesting timing problems in OpenStack in the OpenStack gate. Developers were used to calling these "flaky tests" and ignoring the negative results; however, we saw a pattern emerge in which the same failure signature could be seen multiple times. These "statistical failures", where a given scenario fails 1% of the time, become real issues when you end up with 60+ of them in the code base and when you create 30,000 clouds per week. We believed we had a couple of interesting race conditions to nail down, and started building a system based on Elasticsearch to identify these things automatically. This system first started reporting data back to developers at the very end of the Havana release. This talk will discuss the whole problem space of finding low-percentage failures in the code base: the toolchain we build upon for the problem, the fingerprint and reporting approach that we use to help the OpenStack development community prioritize these issues, how this is informing our thinking about the OpenStack gating system, and where we are headed in the future. Because this whole toolchain is open source, it's something we expect others might extend to their own projects as well.
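To see why 1% failures stop being ignorable at this scale, here is a rough back-of-the-envelope sketch using the figures from the abstract (1% failure rate, 60 distinct failures, 30,000 runs per week). It assumes, purely for illustration, that the failures are independent, that every run can hit every one of them, and that each cloud corresponds to one run; the function names are hypothetical and not part of any OpenStack tooling.

```python
def prob_any_failure(per_bug_rate: float, bug_count: int) -> float:
    """Chance that a single run hits at least one of the bugs.

    Assumes the bugs fail independently of each other, which is a
    simplification made for this sketch.
    """
    return 1.0 - (1.0 - per_bug_rate) ** bug_count


def expected_failed_runs(per_bug_rate: float, bug_count: int, runs: int) -> float:
    """Expected number of runs that hit at least one bug."""
    return runs * prob_any_failure(per_bug_rate, bug_count)


# Figures taken from the abstract: 1% per failure, 60 failures, 30,000 runs a week.
p_fail = prob_any_failure(0.01, 60)
weekly = expected_failed_runs(0.01, 60, 30_000)
print(f"chance a run fails spuriously: {p_fail:.1%}")
print(f"expected spurious failures per week: {weekly:.0f}")
```

Under these assumptions, a little under half of all runs would hit at least one spurious failure, which is why each individual 1% race condition is worth fingerprinting and tracking.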
