CUDA MMA (Matrix Multiply Accumulate) and the Roofline model
Graphics Processing Units (GPUs) are hardware accelerators for compute. When writing an algorithm, we want to squeeze the most out of our machine, i.e. we want to maximize the number of floating point ...