This is a note about Kokkos, a C++ programming model for performance portability.
Kokkos:
Implemented as a template library on top of CUDA, HIP,
OpenMP, ...
Aims to be descriptive, not prescriptive.
Why do we need Kokkos?
A full-time software engineer writes about 10 lines of production code per hour, or roughly 20k LOC per year. A typical HPC production app has 300k-600k lines. Just switching programming models therefore costs multiple person-years per app!
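The arithmetic behind that estimate can be sketched as below. This is only a back-of-the-envelope check: the helper names and the "percent of lines touched" parameter are assumptions made for illustration, not figures from the note.

```cpp
// Figures quoted in the note; treat them as rough estimates.
constexpr long kLinesPerHour = 10;
constexpr long kLinesPerYear = 20000;

// Hypothetical helper: productive hours per year implied by the two rates.
constexpr long implied_hours_per_year() {
    return kLinesPerYear / kLinesPerHour;
}

// Hypothetical helper: person-years needed to rewrite `percent_touched`
// percent of an app with `app_lines` lines at kLinesPerYear.
// `percent_touched` is an illustrative assumption; 100 means a full rewrite.
constexpr double person_years(long app_lines, long percent_touched) {
    return static_cast<double>(app_lines * percent_touched / 100) / kLinesPerYear;
}

// Compile-time checks of the numbers in the note.
static_assert(implied_hours_per_year() == 2000, "10 LOC/hour over 2000 hours");
static_assert(person_years(300000, 100) == 15.0, "full rewrite, small app");
static_assert(person_years(600000, 100) == 30.0, "full rewrite, large app");
static_assert(person_years(300000, 10) == 1.5, "10% of the small app");
```

A full rewrite of a 300k-600k line app comes out at 15-30 person-years at this rate. Even if only a tenth of the code had to change (an assumed fraction), the smaller app would still need about 1.5 person-years.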
Kokkos tools:
- KernelLogger: prints kernel logs at runtime.
- SimpleKernelTimer: prints timing information after the run.