something that CUDA C/C++ and other languages cannot enable. Once PTX is into SASS, it is optimized for a specific generation of Nvidia GPUs. For example, when training its V3 model, DeepSeek ...