Biography
Dr. Xiao Wang is a research staff scientist in the Computational Science and Engineering Division at Oak Ridge National Laboratory (ORNL). He earned dual Bachelor's degrees in Mathematics and Computer Science from Saint John's University, MN (2012), and completed his M.S. and Ph.D. in Electrical and Computer Engineering at Purdue University (2016–2017) under Dr. Charles Bouman and Dr. Samuel Midkiff. Before joining ORNL in 2021, he conducted postdoctoral research at Harvard Medical School and Boston Children's Hospital, focusing on medical imaging. Dr. Wang's research lies at the intersection of artificial intelligence (AI), high-performance computing (HPC), and computational imaging. He develops algorithms that integrate AI, imaging physics, and HPC to enable high-resolution, data-efficient imaging across modalities such as X-ray, CT, MRI, electron tomography, and satellite imaging, with applications in medicine, biology, climate science, and national security. He received the 2022 AAPM Truth CT Reconstruction Challenge award, was a finalist for the ACM Gordon Bell Prize in 2017 and 2024, and received the 2024 HPCWire Top Supercomputing Achievement Award. His current work focuses on scalable, energy-efficient, and trustworthy Vision Transformer foundation models for large-scale imaging applications.
SRP Project Title
Computing-Efficient Training for Large-Scale Vision Transformer Foundation Models
NAIRR Project
Computing-Efficient Training for Large-Scale Vision Transformer Foundation Models
Topical Areas
Applied Computer Science; Artificial Intelligence and Intelligent Systems; Computer Science
Abstract
The Vision Transformer (ViT) is a powerful AI architecture for computer vision that underpins most imaging foundation models due to its effectiveness in discerning complex visual patterns across many tasks. However, training large-scale ViT foundation models requires considerable computing resources, leading to a significant energy footprint. For example, OpenAI's Sora video generator was trained on more than 10,000 NVIDIA H100 GPUs, and the training took more than a month on a supercomputer. The energy consumed in training Sora was equivalent to the total annual energy consumption of 300 US households. This one-year project aims to improve the computing efficiency of ViT scaling algorithms, reducing the AI development cycle and training time. We will develop a training framework optimized for hardware-conscious scaling and computing efficiency, specifically tailored to large-scale ViT models.
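The household-equivalence claim can be sanity-checked with a quick back-of-envelope calculation. The figures below (average per-GPU power draw, run length, and household consumption) are illustrative assumptions, not published numbers for Sora's training run:

```python
# Back-of-envelope estimate of large-scale training energy.
# All constants are illustrative assumptions for this sketch.

NUM_GPUS = 10_000                 # assumed GPU count, per the abstract
AVG_DRAW_KW = 0.45                # assumed average per-GPU draw (below H100 TDP)
RUN_HOURS = 30 * 24               # assumed one-month training run
HOUSEHOLD_KWH_PER_YEAR = 10_500   # assumed average US household annual usage

training_kwh = NUM_GPUS * AVG_DRAW_KW * RUN_HOURS
households = training_kwh / HOUSEHOLD_KWH_PER_YEAR

print(f"Estimated training energy: {training_kwh / 1e6:.2f} GWh")
print(f"Equivalent annual usage of ~{households:.0f} US households")
```

Under these assumptions the run consumes roughly 3.2 GWh, on the order of the annual usage of 300 US households, consistent with the figure quoted above; actual draw depends on utilization, cooling overhead, and cluster efficiency.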
Desired Skills
AI, efficient computing
Lightning Talk Title
Energy-Efficient Vision Transformer Training Framework for Exascale Foundation Models
Keywords
vision transformer, exascale foundation model, high performance computing, energy efficiency