Speaker
Mr
Jeonghwan Pak
(Seoul National University)
Description
Recent NVIDIA technologies such as dynamic parallelism and GPUDirect reduce the communication time between GPUs and CPUs. NVIDIA Kepler GPUs also provide new memory-management features for CUDA codes, which improve the performance of memory access, allocation, and deallocation. We optimize our conjugate gradient code for staggered quarks to obtain the full performance of the GTX Titan Black GPU. We also apply various optimization schemes to the Xeon Phi coprocessor. One is vectorization of the code using 512-bit SIMD instructions, which is essential for programming on the Xeon Phi. The other is hybrid programming with MPI and OpenMP. In particular, OpenMP threads share memory, which can in principle reduce the communication overhead significantly.
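As a rough illustration of the shared-memory and SIMD ideas in the description, the following is a minimal conjugate gradient sketch in C++ with OpenMP. It is not the authors' code. The operator `apply_operator` is a hypothetical stand-in (a small tridiagonal symmetric positive-definite matrix) for the staggered-quark operator, and the names `Vec`, `dot`, `axpy`, and `cg_solve` are assumptions made for this example. The MPI layer and any Xeon Phi or CUDA specifics are omitted.

```cpp
#include <cassert>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in operator: y = A x with A = tridiag(-1, 4, -1),
// which is symmetric positive definite. A lattice code would apply the
// staggered-quark normal operator here instead.
void apply_operator(const Vec& x, Vec& y) {
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        double v = 4.0 * x[i];
        if (i > 0) v -= x[i - 1];
        if (i + 1 < n) v -= x[i + 1];
        y[i] = v;
    }
}

// Dot product: threads share the vectors, each loop chunk is vectorized.
double dot(const Vec& a, const Vec& b) {
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+ : sum)
    for (long i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// y += alpha * x
void axpy(double alpha, const Vec& x, Vec& y) {
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Solves A x = b starting from the initial guess in x.
// Stops when |r| <= tol * |b| or after max_iter iterations.
// Returns the number of iterations performed.
int cg_solve(const Vec& b, Vec& x, double tol, int max_iter) {
    assert(x.size() == b.size());
    const long n = static_cast<long>(b.size());
    Vec r(b), p(n), ap(n);

    apply_operator(x, ap);
    axpy(-1.0, ap, r);  // r = b - A x
    p = r;

    double rr = dot(r, r);
    const double stop = tol * tol * dot(b, b);
    int it = 0;
    while (it < max_iter && rr > stop) {
        apply_operator(p, ap);
        const double alpha = rr / dot(p, ap);
        axpy(alpha, p, x);    // x += alpha * p
        axpy(-alpha, ap, r);  // r -= alpha * A p
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        #pragma omp parallel for simd
        for (long i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];  // p = r + beta * p
        rr = rr_new;
        ++it;
    }
    return it;
}
```

Calling `cg_solve(b, x, 1e-10, 1000)` with `x` initialized to zero overwrites `x` with the approximate solution. When compiled without OpenMP support the pragmas are ignored and the code runs serially, which makes it easy to compare against the threaded build.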
Primary authors
Mr
Hwancheol Jeong
(Seoul National University)
Mr
Jeonghwan Pak
(Seoul National University)
Prof.
Weonjong Lee
(Seoul National University)
Ms
Yuree Chung
(Hankuk Academy of Foreign Studies)