An application's processing throughput is often limited by how fast its CPU-bound algorithms run. Most modern production systems contain between 2 and 16 processor cores, yet highly concurrent, compute-heavy algorithms can still hit the ceiling of what those cores can deliver. In recent years, vendors such as Nvidia and Apple have formalized specifications and APIs that let any developer run thousands of concurrent threads on a piece of hardware already present in the machine: the GPU.