CSI High Performance Computing Seminar Series 2024

US/Eastern
    • 1
      Programming future heterogeneous quantum-classical supercomputing architectures

      Speaker: Alexander McCaskey, Quantum Computing Software Architect, NVIDIA

      Abstract: Supercomputing architectures based on GPU acceleration have greatly improved our scientific computing workflows and applications over the past decade. Quantum computing has recently been proposed as a potential addition to this heterogeneous compute architecture, serving as another node-level accelerator to continue problem scalability in domains such as quantum many-body physics and artificial intelligence. As stand-alone quantum processing units (QPUs) continue to evolve and improve, the applied computational science community is left to wonder: how do we build, program, and deploy large-scale quantum-classical heterogeneous architectures that incorporate both GPUs and QPUs? In this talk, we will demonstrate how NVIDIA is leveraging its current suite of multi-GPU platforms to define and deploy the NVIDIA quantum platform. Specifically, we will highlight CUDA Quantum - a quantum-classical programming model in C++ with Python bindings, together with an associated compiler toolchain built on the MLIR and LLVM frameworks. This talk will focus on technical details of the programming model and compiler architecture and demonstrate the utility of CUDA Quantum when targeting both real and emulated quantum coprocessors.
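      As a rough illustration of the kernel-plus-sample pattern such a programming model exposes - here emulated with a hand-rolled statevector in plain Python, not the CUDA Quantum API - a Bell-state kernel can be built on the host and sampled as if from an emulated quantum coprocessor:

```python
import random
from collections import Counter

# Minimal statevector emulation of a two-qubit Bell-state kernel.
N_QUBITS = 2

def apply_h(state, q):
    # Hadamard on qubit q: |0> -> (|0>+|1>)/sqrt(2), |1> -> (|0>-|1>)/sqrt(2)
    s = 1 / 2 ** 0.5
    out = [0j] * len(state)
    for i, a in enumerate(state):
        j = i ^ (1 << q)
        if (i >> q) & 1 == 0:
            out[i] += s * a
            out[j] += s * a
        else:
            out[j] += s * a
            out[i] -= s * a
    return out

def apply_cnot(state, ctrl, tgt):
    # Flip the target qubit on every basis state where the control qubit is 1
    out = [0j] * len(state)
    for i, a in enumerate(state):
        j = i ^ (1 << tgt) if (i >> ctrl) & 1 else i
        out[j] += a
    return out

def sample(state, shots=1000, seed=7):
    # Host-side sampling of measurement outcomes from the emulated "QPU"
    rng = random.Random(seed)
    probs = [abs(a) ** 2 for a in state]
    draws = rng.choices(range(len(state)), weights=probs, k=shots)
    return Counter(format(i, f"0{N_QUBITS}b") for i in draws)

state = [0j] * (1 << N_QUBITS)
state[0] = 1 + 0j                 # start in |00>
state = apply_h(state, 0)         # superposition on qubit 0
state = apply_cnot(state, 0, 1)   # entangle -> (|00> + |11>)/sqrt(2)
counts = sample(state)
```

      With the fixed seed, the counts split roughly evenly between "00" and "11", the signature of the entangled Bell state; only the sampling step would run on real quantum hardware.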

      Speaker Bio: Alexander McCaskey is a quantum computing software architect at NVIDIA, and the manager of the Quantum Computing Architecture team. His work is focused on programming models, compilers, and languages for heterogeneous quantum-classical computing. He is the lead architect for the CUDA Quantum project, a novel quantum-classical programming model in C++ and Python enabling performant workflows on heterogeneous architectures. Previously, he was a Staff Scientist at Oak Ridge National Laboratory where he led the development of the XACC system-level quantum framework and the QCOR quantum-classical C++ compiler platform. He received B.Sc. degrees in 2010 in Physics and Mathematics from the University of Tennessee, and an M.Sc. degree in Physics from the Virginia Polytechnic Institute and State University in 2014.

    • 2
      Advancing Intelligent Scheduling for Complex Large-Scale Systems

      Speaker: Jing Li, Assistant Professor, Department of Computer Science at New Jersey Institute of Technology

      Abstract: As computer architecture and software continue to evolve, large-scale systems like high-performance computing and supercomputers are becoming increasingly complex, consisting of diverse processing units, specialized accelerators, and complex memory hierarchies. Concurrently, scientific workflows are also growing in complexity and dynamism. Maintaining optimal application performance for timely processing while efficiently utilizing resources poses a significant challenge, exacerbated by the intricate scheduling problems inherent in these systems. Traditional ad hoc heuristic-based approaches are no longer sufficient, and manual resource allocation decisions are cumbersome and time-consuming for developers. To address these challenges, there is a pressing need for an intelligent scheduling framework capable of automating resource allocation to enhance system performance. However, existing learning-based approaches face limitations in handling combinatorial optimization, long-distance dependencies, and generalizing across diverse workflows. This talk will discuss potential avenues to leverage theoretical insights in resource allocation problems and develop efficient reinforcement learning formulations to tackle these challenges head-on.
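      For context on the hand-crafted heuristics the talk argues are no longer sufficient, a classic example is Longest-Processing-Time-first (LPT) list scheduling on identical machines. The sketch below, with a hypothetical six-task workload, shows the kind of fixed rule that learned schedulers aim to replace:

```python
import heapq

def lpt_schedule(job_times, n_machines):
    """LPT list scheduling: place each job, longest first,
    on the currently least-loaded machine."""
    loads = [(0, m) for m in range(n_machines)]  # (load, machine id) min-heap
    heapq.heapify(loads)
    assignment = {}
    for job, t in sorted(enumerate(job_times), key=lambda p: p[1], reverse=True):
        load, m = heapq.heappop(loads)
        assignment[job] = m
        heapq.heappush(loads, (load + t, m))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

# Hypothetical workload: six tasks on two identical nodes
assignment, makespan = lpt_schedule([4, 3, 3, 2, 2, 2], 2)
```

      Here LPT happens to find the optimal makespan of 8, but on adversarial or dynamic workloads such fixed rules can be far from optimal, which motivates learning-based formulations.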

      Speaker Bio: Jing Li is an assistant professor in the Department of Computer Science at New Jersey Institute of Technology. She received her Ph.D. degree from Washington University in St. Louis in 2017. Her research interests include parallel computing, real-time systems, and reinforcement learning for system design and optimization. She has published high-impact work in top conferences, earning three outstanding paper awards. Jing is the recipient of the NSF CAREER Award in 2024 and the Department of Energy Early Career Research Program (ECRP) Award in 2023.

    • 3
      Supercomputer-based in-silico virtual humans: the future of medicine NOW

      Speaker: Mariano Vazquez, CTO / CSO, ELEM Biotech

      Abstract: ELEM Biotech is a startup company of the Barcelona Supercomputing Center (BSC). We develop Virtual Humans based on supercomputing and high-fidelity multiscale/multiphysics modeling. Alongside supercomputing power and accurate modeling, we develop mathematical tools to create populations of Virtual Humans representative of real ones. The goal is to put in the hands of biomedical stakeholders a tool to (a) run in-silico clinical trials and (b) personalize a virtual human to a given real patient under a certain condition. Our tools make it possible to improve and optimize therapies. Today we are focused on cardiac and vascular diseases. In this talk we will discuss our latest achievements.

      Speaker Bio: MV is co-founder and CTO/CSO of ELEM Biotech (The Virtual Humans Factory), a spinoff company of the Spanish Barcelona Supercomputing Center (BSC), founded with the goal of speeding up the transfer of BSC modeling technology to the biomedical domain, in particular the code Alya. He is also one of the two leaders of the Alya Development Team at the BSC, which counts more than 70 scientists and developers. He graduated in Physical Sciences from the University of Buenos Aires, Argentina, in 1993, completing his bachelor's thesis on chaos in dynamical systems. He received his doctorate in Physical Sciences from the Polytechnic University of Catalonia (UPC), Spain, in 1999, with a doctoral thesis in computational fluid mechanics (on numerical schemes for stabilization of compressible flow equations for finite elements). He carried out post-doctoral stays at the Pôle Scientifique Univ. Paris VI / Dassault Aviation (on multigrid for compressible and incompressible turbulent flow, funded by a Marie Curie scholarship from the EC) and at INRIA Sophia Antipolis (shape optimization using the adjoint method), both in France, over 3 years. He was a consultant for the company Gridsystems (grid computing) in Palma de Mallorca, Spain, and a lecturer at the University of Girona, Spain. Since 2012 he has been a senior scientist at the CSIC, on leave since July 2018, when he co-founded ELEM. In 2004, his scientific interests fell under the irresistible grasp of computational biomedicine, where they remain to this day (and counting).

    • 4
      Time-Series Hamiltonian Kernels: A Parallel Quantum-Classical Approach for Temporal Data Classification

      Speaker: Santosh Kumar Radha, Agnostiq

      Abstract: This talk introduces a novel hybrid quantum-classical machine-learning framework for time-series classification. We present the Time-Series Hamiltonian Kernel (TSHK), constructed using quantum states evolved through parameterized time evolution operators and integrated into a Quantum-Classical-Convex neural network (QCC-net). This end-to-end learnable system produces dataset-generalized kernel functions that are purpose-tuned for temporal and ordered data. We demonstrate its performance on synthetic and real-world datasets and showcase an efficient parallel implementation on superconducting quantum processors using Quantum Multi-Programming (QMP). Our approach exploits the quantum-native property of time-series evolution in quantum processes to map and identify a corresponding process that best represents the classification of temporal data, addressing the challenges of temporal data analysis in the NISQ era.
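      A minimal sketch of the fidelity-kernel idea behind such constructions, reduced to a single qubit evolving under H = Z (the Hamiltonian, initial state, and scalar feature encoding here are illustrative assumptions, not the TSHK construction itself):

```python
import cmath

def evolved_state(x):
    # |+> evolved under H = Z for "time" x: (e^{-ix}|0> + e^{ix}|1>)/sqrt(2)
    s = 1 / 2 ** 0.5
    return [s * cmath.exp(-1j * x), s * cmath.exp(1j * x)]

def hamiltonian_kernel(x, y):
    # Fidelity-style kernel between two evolved states: |<psi(x)|psi(y)>|^2
    overlap = sum(a.conjugate() * b
                  for a, b in zip(evolved_state(x), evolved_state(y)))
    return abs(overlap) ** 2

k_same = hamiltonian_kernel(0.3, 0.3)           # identical inputs
k_orth = hamiltonian_kernel(0.0, cmath.pi / 2)  # a quarter-turn apart
```

      For this toy Hamiltonian the kernel reduces to cos^2(x - y): identical inputs give similarity 1 and inputs a quarter-turn apart give 0. TSHK generalizes this idea with parameterized, learnable evolution operators over many qubits.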

      Speaker Bio: Santosh is the Head of R&D and Product at Agnostiq, where he is working on various R&D projects involving quantum algorithms and software. Santosh holds a Ph.D. in theoretical physics from Case Western Reserve University, where he started working on massive gravity and moved to condensed matter physics. His research focused on theoretically and computationally understanding the topological effects occurring in quantum systems as a result of "knotted" wave functions in both interacting and non-interacting fermionic systems, and their impact on lower-dimensional entanglement. Currently, Santosh plays a pivotal role in shaping Agnostiq's product strategy, particularly by enhancing the scalability and performance of next-generation AI applications and large-scale scientific simulations across multi-cloud environments.

    • 5
      On Training Large Foundation Models on Frontier

      Speaker: Sajal Dash, ORNL

      Abstract: Training large language models (LLMs) presents significant computational challenges, particularly for models with billions to trillions of parameters. This talk explores efficient distributed training strategies on Frontier, the world's first exascale supercomputer, to tackle these challenges. We examine various parallelism techniques - tensor, pipeline, and sharded data parallelism - to train trillion-parameter models. Through empirical analysis and hyperparameter tuning, we achieve GPU throughputs of 31.96% to 38.38% across models of different sizes and demonstrate 100% weak scaling efficiency on up to 3072 MI250X GPUs. Additionally, we explore the potential of sparsely activated models, such as those using mixture-of-experts mechanisms, as a more resource-efficient alternative to dense LLMs, providing insights into their design and performance.
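      As a back-of-the-envelope sketch of how the three parallelism degrees compose, and of what weak scaling efficiency measures (the specific degrees and timings below are hypothetical, chosen only to reach the 3072-GPU scale mentioned in the abstract):

```python
def gpus_required(tensor_par, pipeline_par, data_par):
    # 3D parallelism: the total device count is the product of the
    # tensor-parallel, pipeline-parallel, and data-parallel degrees
    return tensor_par * pipeline_par * data_par

def weak_scaling_efficiency(t_base, t_scaled):
    # Weak scaling grows the problem with the machine; the ideal is
    # constant time per iteration, i.e. efficiency 1.0 (100%)
    return t_base / t_scaled

# Hypothetical decomposition: 8-way tensor x 48-way pipeline x 8-way data
total = gpus_required(8, 48, 8)
eff = weak_scaling_efficiency(41.5, 41.5)  # hypothetical iteration times (s)
```

      The same total GPU count can be reached by many (tensor, pipeline, data) decompositions; the talk's empirical tuning is about choosing among them.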

      Speaker Bio: As a research scientist, Sajal Dash explores scaling approaches for large-scale deep learning applications, focusing on convergence behavior and problems associated with large batch sizes. He also continues his research on mitigating catastrophic forgetting during incremental training of deep learning models in a streaming setting. Before joining Oak Ridge National Laboratory, Sajal completed his Ph.D. in Computer Science at Virginia Tech. His Ph.D. dissertation, titled "Exploring the Landscape of Big Data Analytics Through Domain-Aware Algorithm Design," focused on solving large-scale domain problems by combining domain knowledge with properties of big data. His dissertation solved a big data problem in cancer biology by efficiently distributing the combinatorial workload across nodes while regularizing memory access patterns. Dr. Dash's Ph.D. was greatly shaped by two summer internships at Oak Ridge National Laboratory in 2018 and 2019 under the mentorship of Dr. Junqi Yin and Dr. Mallikarjun Shankar. Sajal received his B.Sc. in Computer Science and Engineering from BUET, Bangladesh, and an M.S. in Computer Science from UNC Chapel Hill before getting his Ph.D. in Computer Science from Virginia Tech.