23–28 Jun 2014
Columbia University
US/Eastern timezone

Optimization of Lattice QCD Calculation on GTX Titan Black GPU and Xeon Phi Coprocessor

24 Jun 2014, 18:10
Low library


Board: 33
Poster: Algorithms and Machines Poster session


Mr Jeonghwan Pak (Seoul National University)


NVIDIA has introduced new technologies, such as dynamic parallelism and GPUDirect, that reduce the communication time between GPUs and CPUs. NVIDIA Kepler GPUs also provide new features that improve memory usage in CUDA codes, giving better performance in memory access, allocation, and deallocation. We optimize our conjugate gradient code for staggered quarks to obtain the full performance of the GTX Titan Black GPU. We also apply various optimization schemes to the Xeon Phi coprocessor. One is vectorization of the code using 512-bit SIMD instructions, which is essential when programming the Xeon Phi. The other is hybrid programming with MPI and OpenMP. In particular, OpenMP threads share memory, which can, in principle, significantly reduce the communication overhead.
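The abstract does not show the solver itself. As a rough illustration only, the following Python sketch shows the conjugate gradient iteration applied to the staggered normal equations (M^dagger M) x = b, where M = m + D and the hopping term D is anti-Hermitian, so M^dagger M = m^2 - D^2 is positive definite. It is not the authors' GPU or Xeon Phi code: the one-dimensional operator in apply_D is a toy stand-in (an assumption) for the real 4D staggered operator with gauge links and staggered phases, and the function names apply_D, apply_normal, and cg are hypothetical.

```python
import math


def apply_D(x):
    """Toy 1D anti-Hermitian hopping term (assumption, not the real 4D
    staggered operator): (D x)_i = (x_{i+1} - x_{i-1}) / 2, periodic."""
    n = len(x)
    return [0.5 * (x[(i + 1) % n] - x[(i - 1) % n]) for i in range(n)]


def apply_normal(x, mass):
    """(M^dagger M) x with M = mass + D and D^T = -D, i.e. mass^2 x - D(D x)."""
    ddx = apply_D(apply_D(x))
    return [mass * mass * xi - di for xi, di in zip(x, ddx)]


def dot(a, b):
    """Global inner product; in a parallel code this is a reduction."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cg(b, mass, tol=1e-10, max_iter=1000):
    """Conjugate gradient for (M^dagger M) x = b, starting from x = 0.
    Stops when |r|/|b| <= tol or after max_iter iterations."""
    x = [0.0] * len(b)
    r = list(b)                      # r = b - A x, with x = 0
    p = list(r)                      # first search direction
    rr = dot(r, r)
    target = tol * tol * dot(b, b)
    it = 0
    while rr > target and it < max_iter:
        ap = apply_normal(p, mass)
        alpha = rr / dot(p, ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr           # keeps new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        it += 1
    return x, it


# Usage: solve on a small toy lattice and check the true residual.
n = 32
mass = 0.1
b = [math.sin(0.3 * i) + 1.0 for i in range(n)]
x, iters = cg(b, mass)
res = [bi - ai for bi, ai in zip(b, apply_normal(x, mass))]
rel_res = math.sqrt(dot(res, res) / dot(b, b))
print(f"CG converged in {iters} iterations, relative residual {rel_res:.2e}")
```

In an optimized implementation, the per-site loops inside apply_D and the vector updates are the parts that would typically be spread over GPU threads, 512-bit SIMD lanes, or OpenMP threads, while each dot product is a global reduction that, in a hybrid MPI version, would also need a sum across ranks.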

Primary authors

Mr Hwancheol Jeong (Seoul National University), Mr Jeonghwan Pak (Seoul National University), Prof. Weonjong Lee (Seoul National University), Ms Yuree Chung (Hankuk Academy of Foreign Studies)
