This is the kind of practical engineering I love to see. Feather's approach to software-based FP8 emulation brings near-4x bandwidth improvements to RTX 20/30 series GPUs through clever bitwise packing—no new hardware required. Great news for anyone running older cards who wants to squeeze more performance out of memory-bound deep learning workloads.
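For anyone wondering what "bitwise packing" means in practice, here is a rough sketch of the general idea, not Feather's actual code. It assumes the E4M3 flavor of FP8 (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits), and it assumes the near-4x figure is measured against FP32, since four 8-bit codes fit in one 32-bit word. The function names `pack4`, `unpack4`, and `decode_e4m3` are made up for illustration.

```python
import math


def pack4(b0, b1, b2, b3):
    """Pack four 8-bit codes into one 32-bit word, b0 in the lowest byte."""
    for b in (b0, b1, b2, b3):
        if not 0 <= b <= 0xFF:
            raise ValueError("each code must fit in 8 bits")
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit codes."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


def decode_e4m3(code):
    """Decode one 8-bit code as E4M3 (assumed format: bias 7, no infinities)."""
    sign = -1.0 if code & 0x80 else 1.0
    exponent = (code >> 3) & 0x0F
    mantissa = code & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN pattern in this E4M3 variant
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 2**-6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Four FP8 codes travel through memory as a single 32-bit word.
word = pack4(0x38, 0x40, 0xC0, 0x00)
values = [decode_e4m3(c) for c in unpack4(word)]
```

On a GPU the same shifts and masks would run inside a kernel, so each 32-bit load brings in four values instead of one. That is where the bandwidth saving for memory-bound workloads would come from, while the arithmetic itself still happens in a wider format after decoding.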