CSI HPC Seminar Series 2022


Biweekly seminar presentations hosted by the High Performance Computing group at the Computational Science Initiative of Brookhaven National Laboratory, covering a wide range of topics in programming models, compilers, and application optimizations on high performance computing systems and architectures.

Regular time: 2pm US Eastern, on Wednesdays 

    • 12:00 13:00
      Performance portability with the SYCL programming model 1h

      Advancements in high performance computing (HPC) have provided unprecedented potential for scientific research and discovery. Numerous programming models that aim for performance portability are under development to help address the “many platforms problem”, which stems from major semiconductor vendors all staking their claim in the market. This talk will discuss such programming models and present recent studies on performance portability, with a focus on SYCL: a single-source heterogeneous programming paradigm from the Khronos Group.

      Speaker: Vincent Pascuzzi (Brookhaven National Laboratory)
    • 14:00 15:00
      Preparing the PIConGPU for the next-generation computing systems 1h

      This talk will highlight the journey thus far in preparing the high performance computing software stack for large, complex scientific applications such as PIConGPU, an OLCF CAAR project, for Frontier. The talk will cover recent results and the tools and programming models used to prepare PIConGPU on pre-exascale systems. OLCF’s Center for Accelerated Application Readiness (CAAR) was created to ready applications for the facility’s next-generation supercomputers. PIConGPU is one of the 8 CAAR projects chosen for Frontier.

      Speaker: Sunita Chandrasekaran (University of Delaware and Brookhaven National Laboratory)
    • 14:00 15:00
      High-Performance Tensor Algebra for Chemistry and Materials 1h

      Abstract: Tensor algebra is the foundation of quantum simulation in all contexts, including predictive chemistry and materials simulation. Compared with the linear algebra of vectors and matrices, tensor algebra is significantly richer, less well understood formally, has a less mature software ecosystem, and, most importantly, puts more emphasis on exploiting data sparsity. In this talk I will review the key computational challenges of tensor algebra, especially on modern large-scale heterogeneous HPC platforms, and highlight some of our recent work on the open-source TiledArray tensor framework for data-sparse tensor algebra on distributed-memory and heterogeneous platforms and its applications in the context of computational chemistry.

      Speaker: Eduard Valeyev (Virginia Tech)
    • 14:00 15:00
      Covariant programme: a programming approach to target both SIMD and SIMT execution 1h

      Abstract: Discussion of the cross-platform programming approach to modern HPC systems taken by “Grid”, a high performance QCD C++ library. It targets both SIMD intrinsics vectorisation on modern CPUs and SIMT offload models with HIP, SYCL and CUDA back ends. It allows single-source, high performance kernels to be developed that support all of these targets. I discuss the software approaches and (to the extent allowed) the performance of the code on a number of current or planned platforms including Perlmutter, Frontier and Aurora.

      Speaker: Prof. Peter Boyle (BNL)
    • 14:00 15:00
      Dynamic Loop Scheduling across Multi-xPUs Heterogeneous Processors in Nodes of DoE's Exascale Supercomputers 1h

      Abstract: The performance of science and engineering simulations on supercomputers depends on communication across nodes and computation performance within a node. Given Moore's Law and the cost of data movement across the interconnect network, the next-generation supercomputers, particularly those in the DoE, will have the same number of nodes, but the nodes will become more powerful and extremely heterogeneous, each with a set of CPUs (multi-cores) and a set of GPUs (multi-devices). Managing the computational resources on these nodes is challenging, particularly because of application load imbalance, load imbalance due to system noise, and the complexity of the node's hardware. In this talk, I will discuss support in the DoE Exascale Computing Project (ECP) Software Stack to parallelize MPI+OpenMP offload ECP applications across heterogeneous processors/accelerators through user-defined and custom-tuned locality-sensitive loop scheduling with LLVM’s OpenMP, along with interoperability of the MPI and OpenMP runtime systems.

      Speaker: Vivek Kale (BNL)
    • 14:00 15:00
      Designing Efficient Graph Algorithms Through Proxy-Driven Codesign and Analysis 1h

      Abstract: Developing scalable graph algorithms is challenging because of the inherent irregularities in the graph structure and the memory-access-intensive computational pattern. Proxy-application-driven software-hardware codesign plays a vital role in driving innovation across the development of applications, software infrastructure and hardware architecture. Proxy applications are self-contained, simplified codes that are intended to model the performance-critical computations within applications.

      In this talk, I will discuss facilitating software-hardware codesign through proxy applications with the goal of improving the performance of graph analytics workflows on heterogeneous systems. However, even representative proxy applications may be insufficient to diagnose performance bottlenecks of common graph computational patterns at scale. Therefore, we also discuss the role of derivative benchmarks in enhancing graph applications on HPC systems. We will drive the discussion using three case studies: graph matching, clustering/community detection, and triangle counting, which have applications in the domains of proteomics, computational biology, cybersecurity, numerical analysis and other data science scenarios.

      Speaker: Sayan Ghosh (Pacific Northwest National Laboratory)
    • 14:00 15:00
      Experiences with Ookami – a Fujitsu A64FX testbed 1h

      Abstract: Stony Brook’s computing technology testbed, Ookami, provides researchers worldwide with access to Fujitsu A64FX processors. This processor was developed by RIKEN and Fujitsu for the Japanese path to exascale computing and is currently deployed in the fastest computer in the world, Fugaku. Ookami is the first open deployment of this technology outside of Japan. This Cray Apollo 80 system has entered its second year of operations. In this presentation we will share the experiences we gained during this exciting first project period. This includes a project overview and details of processes such as onboarding users, account administration, user support and training, and outreach. The talk will also give technical details such as an overview of the compilers, which play a crucial role in achieving good performance. To help users use the system efficiently, we offer opportunities such as webinars and hands-on sessions, and we also try to sustain an active user community that enables exchange between the different research groups. In February 2022, the first Ookami user group meeting took place. We will present the key findings and give an outlook on the next project year, in which Ookami will become an XSEDE service provider.

      Speaker: Eva Siegmann (Stony Brook University)