October 13, 2024
Pioneering Energy-Efficient Supercomputing: Computer Engineers at Oak Ridge Lead the Way

As the demand for electricity to power data centers housing artificial intelligence (AI) systems continues to surge, computer engineers are exploring innovative approaches to make supercomputing more energy-efficient. The construction of massive data centers, particularly in states like Virginia and Texas, is driving commercial electricity demand, which is projected to increase by 3% in 2024 alone, according to the U.S. Energy Information Administration.

The inventory of North American data centers grew by 24.4% year over year in the first quarter of 2024, with new centers being built with capacities of 100 to 1,000 megawatts, equivalent to the power requirements of 80,000 to 800,000 homes. Data centers are expected to consume up to 6.8% of total U.S. electricity generation by 2030, according to the Electric Power Research Institute (EPRI).

To address this challenge, computer engineers at the Oak Ridge Leadership Computing Facility (OLCF) have been investigating new methods for energy-efficient supercomputing since the facility’s inception in 2004. The OLCF has fielded five generations of world-class supercomputing systems, achieving a nearly 2,000-fold increase in energy efficiency, measured in floating point operations per second (flops) delivered per watt. Frontier, the OLCF’s latest supercomputer, ranks first in the TOP500 list of the world’s most powerful computers and first in the Green500 list of the world’s most energy-efficient computers.
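The Green500 ranks systems by sustained floating point throughput per watt. A minimal sketch of that metric, using illustrative placeholder numbers rather than official TOP500 or OLCF figures, shows how an efficiency gain across generations can be compared:

```python
# Illustrative sketch of the Green500-style efficiency metric (flops per watt).
# The system numbers below are placeholders, not official TOP500/Green500 or OLCF figures.

def gflops_per_watt(rmax_gflops: float, power_watts: float) -> float:
    """Energy efficiency: sustained GFLOPS divided by average power draw."""
    return rmax_gflops / power_watts

# Hypothetical mid-2000s system vs. a hypothetical exascale-class system.
early_gen = gflops_per_watt(rmax_gflops=25_000, power_watts=1_000_000)           # ~0.025 GFLOPS/W
recent_gen = gflops_per_watt(rmax_gflops=1_100_000_000, power_watts=21_000_000)  # ~52 GFLOPS/W

print(f"Improvement factor: {recent_gen / early_gen:,.0f}x")  # on the order of 2,000x
```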

As private companies enter the high-performance computing (HPC) market with larger, more power-consuming systems, the OLCF’s expertise in energy efficiency could prove invaluable. With decades of experience in making HPC more energy efficient, the OLCF is uniquely positioned to influence the full energy-efficiency ecosystem, from applications to hardware to facilities.

One significant computational efficiency advancement originated from the video game industry’s need for increasingly sophisticated in-game graphics. Chip makers competing to meet this demand developed dedicated graphics processing units (GPUs), which are more energy-efficient than traditional central processing units (CPUs) for floating point operations. Today, GPUs are an essential component of most supercomputers, especially those used for training AI models.

The OLCF’s pioneering use of GPUs in leadership-scale HPC with its Titan supercomputer in 2012 required computational scientists to adapt their codes to fully exploit the GPU’s ability to churn through simple, highly parallel calculations and speed up the time to solution. A GPU is built almost entirely of floating point units, making it more energy-efficient than a CPU for that kind of work. The OLCF’s gamble on GPUs paid off, resulting in progressively more energy-efficient systems as each generation of OLCF supercomputer added more, and faster, GPUs.
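As a loose illustration of what "adapting a code" means, the same data-parallel arithmetic can be expressed once and executed on either a CPU or a GPU. The sketch below assumes the CuPy library and a compatible GPU are available; it is not OLCF or Titan code, just a minimal example of offloading simple floating point work:

```python
# Minimal sketch of moving a data-parallel calculation from the CPU to a GPU.
# Assumes CuPy and a compatible GPU are available; falls back to NumPy otherwise.
import numpy as np

try:
    import cupy as xp          # GPU arrays if available
    on_gpu = True
except ImportError:
    xp = np                    # fall back to the CPU
    on_gpu = False

n = 10_000_000
a = xp.random.random(n)
b = xp.random.random(n)

# The same simple floating point work (a fused multiply-add) runs on either device;
# on a GPU, thousands of lightweight cores execute it in parallel.
c = 2.5 * a + b

print("ran on GPU" if on_gpu else "ran on CPU", float(c.sum()))
```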

However, as exascale discussions began in 2008, the Exascale Study Group foresaw an electric bill of potentially $500 million a year for a stripped-down 1-exaflop system. To address this challenge, the U.S. Department of Energy’s (DOE’s) Office of Science launched the FastForward and DesignForward programs to work with vendors on new technologies to improve performance, power consumption, and resiliency.
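For a rough sense of scale, that projected bill can be back-calculated into a continuous power draw. The sketch below uses an assumed round-number electricity rate, not a figure from the study:

```python
# Back-of-the-envelope conversion from an annual electric bill to a continuous power draw.
# The $0.10/kWh rate is an assumed round number, not taken from the Exascale Study Group report.

annual_bill_usd = 500_000_000       # projected worst-case bill for a 1-exaflop system
price_per_kwh = 0.10                # assumed electricity rate, USD per kilowatt-hour
hours_per_year = 8_760

continuous_draw_mw = annual_bill_usd / (price_per_kwh * hours_per_year) / 1_000
print(f"Implied continuous draw: ~{continuous_draw_mw:,.0f} MW")   # roughly 570 MW
```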

As a result, semiconductor chip vendor AMD developed a more powerful and energy-efficient compute node for Frontier, consisting of a 64-core 3rd Gen EPYC CPU and four Instinct MI250X GPUs. AMD also figured out a way to make the GPUs more efficient by turning off sections of the chips that are not being used and then turning them back on when needed in just a few milliseconds.
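The payoff of gating off idle chip sections depends on how much time the hardware spends idle versus the few milliseconds it takes to wake back up. A toy model, with assumed rather than measured power figures, illustrates the trade-off:

```python
# Toy model of power gating: turn off idle sections and pay a small wake-up latency.
# All power figures and duty cycles are assumptions for illustration only.

active_power_w = 500.0      # assumed power of a GPU section while computing
idle_power_w = 100.0        # assumed power if the idle section is left powered on
gated_power_w = 5.0         # assumed power while the section is gated off
wake_time_s = 0.003         # "a few milliseconds" to turn sections back on

def avg_power(busy_fraction: float, period_s: float = 1.0) -> tuple[float, float]:
    """Average power over one period without and with power gating."""
    busy = busy_fraction * period_s
    idle = period_s - busy
    without = (busy * active_power_w + idle * idle_power_w) / period_s
    wake = min(wake_time_s, idle)            # time spent waking the section back up
    with_gating = (busy * active_power_w + wake * active_power_w
                   + (idle - wake) * gated_power_w) / period_s
    return without, with_gating

for frac in (0.9, 0.5, 0.1):
    base, gated = avg_power(frac)
    print(f"busy {frac:.0%}: {base:.0f} W -> {gated:.0f} W")
```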

In the future, processor vendors will need to focus on small, incremental improvements to energy efficiency, as Moore’s Law, which drove transistors to become smaller, cheaper, and faster, has come to an end. A more integrated, holistic approach to energy efficiency is expected to yield the most significant improvements.

OLCF researchers are also exploring ways to optimize the energy usage of supercomputers through data analysis and modeling. For example, they are constructing a digital twin of the Frontier supercomputer to experiment with energy-saving scenarios before implementing them on the real machine. This virtual Frontier can be run on a desktop computer and used to predict the future power and cooling needs of Discovery, the OLCF’s next supercomputer.
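As a loose illustration of the idea, a digital twin lets operators ask "what if" questions in software before touching the real machine. The sketch below is a deliberately simplified, hypothetical model, not the OLCF’s actual twin, and all node counts and power figures are illustrative assumptions:

```python
# Hypothetical, heavily simplified "digital twin"-style model: estimate power and
# cooling load for a candidate job mix before trying it on real hardware.
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    busy_nodes: int
    idle_nodes: int
    node_busy_w: float = 3_000.0    # assumed per-node draw under load
    node_idle_w: float = 800.0      # assumed per-node draw when idle
    cooling_overhead: float = 0.10  # assumed fraction of IT power spent on cooling

    def it_power_mw(self) -> float:
        return (self.busy_nodes * self.node_busy_w + self.idle_nodes * self.node_idle_w) / 1e6

    def total_power_mw(self) -> float:
        return self.it_power_mw() * (1.0 + self.cooling_overhead)

for s in (Scenario("full load", busy_nodes=9_400, idle_nodes=0),
          Scenario("night schedule", busy_nodes=6_000, idle_nodes=3_400)):
    print(f"{s.name}: IT {s.it_power_mw():.1f} MW, total {s.total_power_mw():.1f} MW")
```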

From a 10,000-foot viewpoint, a supercomputer is essentially a giant heater that requires significant amounts of electricity to operate and cool down. The OLCF’s cooling systems have evolved alongside chip technology, reducing the energy needed for cooling by 10 times since 2009. The team continues to make cooling optimizations to ensure that the next generation of supercomputers, such as Discovery, can operate efficiently while minimizing energy usage.
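One common way to track that cooling overhead is power usage effectiveness (PUE), the ratio of total facility power to the power delivered to the computer itself. A short sketch with illustrative round numbers, not measured OLCF values, shows how a 10x cut in cooling energy shows up in that metric:

```python
# Illustrative PUE (power usage effectiveness) comparison; the figures are assumed
# round numbers, not measured OLCF values.

def pue(it_power_mw: float, cooling_power_mw: float) -> float:
    """Total facility power divided by power delivered to the computer itself."""
    return (it_power_mw + cooling_power_mw) / it_power_mw

older_facility = pue(it_power_mw=10.0, cooling_power_mw=5.0)    # PUE 1.50
modern_facility = pue(it_power_mw=10.0, cooling_power_mw=0.5)   # PUE 1.05 after a 10x cooling cut

print(f"older: {older_facility:.2f}, modern: {modern_facility:.2f}")
```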

Money Singh

Money Singh is a seasoned content writer with over four years of experience in the market research sector. Her expertise spans industries including food and beverages, biotechnology, chemicals and materials, defense and aerospace, and consumer goods.
