How Google’s Custom AI Chip is Disrupting the Tech Industry


Ironwood is Google’s latest tensor processing unit

Nvidia’s dominance of the AI chip market is being challenged by a new specialized chip from Google, with several companies, including Meta and Anthropic, planning to invest billions in Google’s tensor processing units.

What is a TPU?

The growth of the AI industry relies heavily on graphics processing units (GPUs), which are designed to execute many calculations in parallel, unlike the largely sequential processing of the central processing units (CPUs) found in most computers.

Originally engineered for graphics and gaming, GPUs can handle operations involving multiple pixels simultaneously, as stated by Francesco Conti from the University of Bologna, Italy. This parallel processing is advantageous for training and executing AI models, particularly with tasks relying on matrix multiplication across extensive grids. “GPUs have proven effective due to their architecture fitting well with tasks needing high parallelism,” Conti explains.

However, their initial design for non-AI applications introduces some inefficiencies in how GPUs handle computations. Google launched Tensor Processing Units (TPUs) in 2016, which are optimized specifically for matrix multiplication, the primary operation for training and executing large-scale AI models, according to Conti.
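To see why matrix multiplication is worth building dedicated hardware around, consider a minimal Python sketch (illustrative only, with made-up layer sizes, and not Google’s code) of a single dense neural-network layer: almost all of its arithmetic is one large matrix product, which is exactly the operation GPUs and TPUs parallelize.

import numpy as np

# Hypothetical sizes: a batch of 512 inputs, each with 1,024 features,
# passed through a dense layer with 4,096 output units.
batch, d_in, d_out = 512, 1024, 4096

x = np.random.randn(batch, d_in).astype(np.float32)   # input activations
w = np.random.randn(d_in, d_out).astype(np.float32)   # layer weights
b = np.zeros(d_out, dtype=np.float32)                 # biases

# The layer itself: one large matrix multiplication plus a cheap
# element-wise step. The matmul alone is roughly two billion
# multiply-add operations, all of which can run in parallel,
# which is what GPU and TPU hardware is built to exploit.
y = np.maximum(x @ w + b, 0.0)   # ReLU(x W + b)

print(y.shape)  # (512, 4096)

Training and running a large model repeats products like this billions of times, which is why even modest per-operation efficiency gains from specialized hardware add up quickly.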

This year, Google introduced Ironwood, its seventh-generation TPU, which powers many of the company’s AI models, including Gemini and AlphaFold, used for protein modeling.

Are TPUs Superior to GPUs for AI?

In some ways, TPUs can be considered a specialized subset of GPUs rather than an entirely separate kind of chip, says Simon McIntosh-Smith at the University of Bristol, UK. “TPUs concentrate on GPU capabilities tailored for AI training and inference, but they still share similarities.” However, this tailored design means TPUs can make AI workloads significantly more efficient, potentially saving millions of dollars, he notes.

Nonetheless, this focus on specialization can lead to challenges, Conti adds, as TPUs may lack flexibility for significant shifts in AI model requirements over generations. “A lack of adaptability can slow down operations, especially when data center CPUs are under heavy load,” asserts Conti.

Historically, Nvidia GPUs have enjoyed an advantage due to accessible software that assists AI developers in managing code on their chips. When TPUs were first introduced, similar support was absent. However, Conti believes that they have now reached a maturity level that allows more seamless usage. “With TPUs, we can now achieve similar functionality as with GPUs,” he states. “The ease of access is becoming increasingly crucial.”

Who Is Behind the Development of TPUs?

While Google was the first to launch TPUs, many prominent AI firms, known as hyperscalers, as well as smaller enterprises are now developing their own custom AI chips. Amazon, for example, has created its Trainium chips for AI training.

“Many hyperscalers are establishing their internal chip programs due to the soaring prices of GPUs, driven by demand exceeding supply, making self-designed solutions more cost-effective,” McIntosh-Smith explains.

What Will Be the TPU’s Influence on the AI Industry?

For over a decade, Google has been refining its TPUs, primarily using them for its own AI models. Recently, that has begun to change, with other large corporations such as Meta and Anthropic investing in considerable amounts of Google’s TPU computing power. “While I haven’t seen a major shift of big clients yet, it may begin to transpire as the technology matures and the supply increases,” says McIntosh-Smith. “The chips are now sufficiently advanced and prevalent.”

Besides giving large enterprises more options, diversifying their chip suppliers could also make economic sense, he notes. “This could lead to more favorable negotiations with Nvidia in the future,” he adds.


Source: www.newscientist.com

High-profile ocean models accelerated by custom software

This figure shows surface currents simulated by MPAS-Ocean. Credit: Los Alamos National Laboratory, E3SM, U.S. Department of Energy

A new solver algorithm for the MPAS-Ocean model will significantly enhance climate research by reducing computational time and improving accuracy. This breakthrough in integrating Fortran and C++ programming is a step forward in efficient and reliable climate modeling.

On the beach, ocean waves provide soothing white noise. However, in scientific laboratories, they play an important role in weather forecasting and climate research. The ocean, along with the atmosphere, is typically one of the largest and most computationally intensive components of Earth system models, such as the Department of Energy’s Energy Exascale Earth System Model (E3SM).

A breakthrough in ocean modeling

Most modern ocean models split the motion into two categories of waves: barotropic modes, whose waves propagate quickly, and baroclinic modes, whose waves propagate slowly. To address the challenge of simulating these two modes simultaneously, a team from DOE’s Oak Ridge National Laboratory, Los Alamos National Laboratory, and Sandia National Laboratories has developed a new solver algorithm that reduces the run time of MPAS-Ocean, E3SM’s ocean circulation model, by 45%.

The researchers tested the software on the Summit supercomputer at ORNL’s Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility, and on the Compy supercomputer at Pacific Northwest National Laboratory. They ran the main simulations on the Cori and Perlmutter supercomputers at the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory, and their results were published in the International Journal of High Performance Computing Applications.

Computing innovations for climate modeling

Because Trilinos, a collection of open-source software libraries well suited to solving scientific problems on supercomputers, is written in the C++ programming language, while Earth system models like E3SM are typically written in Fortran, the team took advantage of ForTrilinos, an associated software library that adds Fortran interfaces to the existing C++ packages, to design and customize a new solver focused on barotropic waves.

“A nice feature of this interface is that you can use all the components of the C++ package from Fortran, so you don’t have to translate anything, which is very convenient,” said lead author Hyun-Gyu Kang, a computational Earth systems scientist at ORNL.

Improvements to MPAS-Ocean

This work builds on research results previously published in the Journal of Advances in Modeling Earth Systems, in which researchers at ORNL and Los Alamos National Laboratory hand-coded improvements to MPAS-Ocean. This time, the ForTrilinos-enabled solver overcomes the remaining shortcomings of the solver from those earlier studies, especially when users run MPAS-Ocean with a small number of computing cores for a given problem size.

MPAS-Ocean’s default solver relies on explicit subcycling, a technique that uses a large number of small time intervals, or time steps, to compute the barotropic wave properties alongside the baroclinic calculations without destabilizing the model. If the baroclinic and barotropic waves are advanced with time step sizes of 300 and 15 seconds, respectively, then to keep pace the barotropic calculation must complete 20 times as many iterations, which requires a huge amount of computational power.
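To make that arithmetic concrete, here is a schematic Python sketch of explicit subcycling (an illustration using the step sizes quoted above, not actual MPAS-Ocean code; the update functions are placeholders for the real physics):

# Schematic cost of explicit barotropic subcycling (illustrative only).
dt_baroclinic = 300.0   # seconds per baroclinic (slow-wave) step
dt_barotropic = 15.0    # seconds per barotropic (fast-wave) substep
simulated_days = 10.0

n_baroclinic = int(simulated_days * 86400 / dt_baroclinic)   # 2,880 steps
substeps = int(dt_baroclinic / dt_barotropic)                # 20 substeps each

def advance_baroclinic(state, dt):
    """Placeholder for the slow, three-dimensional baroclinic update."""
    return state

def advance_barotropic(state, dt):
    """Placeholder for the fast, two-dimensional barotropic update."""
    return state

state = {"velocity": 0.0, "surface_height": 0.0}   # stand-in for model state
for _ in range(n_baroclinic):
    state = advance_baroclinic(state, dt_baroclinic)
    # To stay stable, the explicit scheme must subcycle the fast waves
    # 20 times for every single baroclinic step.
    for _ in range(substeps):
        state = advance_barotropic(state, dt_barotropic)

print(n_baroclinic, n_baroclinic * substeps)   # 2880 baroclinic vs 57600 barotropic updates

Those 20 extra fast-wave updates per step are what the new solver is designed to eliminate.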

In contrast, the new solver for the barotropic system is semi-implicit, meaning it is unconditionally stable. This allows researchers to use the same large time steps as the baroclinic calculation without sacrificing accuracy, saving significant time and computational power.
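The idea can be sketched on a toy one-dimensional gravity-wave system in Python (an illustration of the semi-implicit principle only, not the MPAS-Ocean equations; the depth, grid, and step size are assumed values): treating the fast wave terms implicitly turns each step into a linear solve, and the integration remains stable even at the 300-second step size that would blow up an explicit scheme.

import numpy as np

# Toy 1-D gravity-wave system (illustrative, not the MPAS-Ocean formulation):
#   du/dt   = -g * d(eta)/dx
#   deta/dt = -H * du/dx
g, H = 9.81, 4000.0            # gravity and ocean depth: wave speed ~200 m/s
n, L = 200, 4.0e6              # grid points and domain length in metres
dx = L / n                     # 20 km grid spacing
dt = 300.0                     # the large, baroclinic-sized time step

# Centred-difference derivative operator with periodic boundaries.
D = np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
D[0, -1], D[-1, 0] = -1.0, 1.0
D /= 2 * dx

# Coupled operator acting on the stacked state z = [u, eta].
A = np.block([[np.zeros((n, n)), -g * D],
              [-H * D, np.zeros((n, n))]])

# Semi-implicit (backward Euler) step: solve (I - dt*A) z_new = z_old.
# One linear solve replaces the 20 explicit substeps and is stable for any dt.
M = np.eye(2 * n) - dt * A

z = np.zeros(2 * n)
z[n:] = np.exp(-((np.arange(n) - n / 2) * dx / 2.0e5) ** 2)   # initial bump in eta

for _ in range(100):
    z = np.linalg.solve(M, z)

print(float(np.max(np.abs(z))))   # stays bounded: no blow-up at dt = 300 s

An explicit step of the same system would violate the stability limit here, since the waves travel roughly 200 metres per second and would cross about three 20-kilometre grid cells in one 300-second step; the implicit solve removes that restriction, at the price of the inter-processor communication discussed below.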

A community of software developers has spent years optimizing Trilinos and ForTrilinos for a variety of scientific and climate applications. As a result, a modern MPAS-Ocean solver that leverages these resources can outperform hand-crafted solvers and enable other scientists to accelerate their climate research efforts.

“If we had to code every algorithm individually, it would require much more effort and expertise,” Kang said. “But with this software, you can run simulations quickly and easily by incorporating the optimized algorithms into your programs.”

Future enhancements and impact

The current solver performs very well up to a certain number of processors, but it still has scalability limitations on high-performance computing systems. This drawback exists because the semi-implicit method requires all processors to communicate with each other at least 10 times per time step, which can degrade model performance. To overcome this obstacle, the researchers are now optimizing processor communication and porting the solver to GPUs.

In addition, the team updated the time-stepping method of the baroclinic system to further improve the efficiency of MPAS-Ocean. Through these advances, the researchers aim to make climate predictions faster, more reliable, and more accurate, an essential upgrade for ensuring climate security and enabling timely decision-making and high-resolution forecasting.

“This barotropic mode solver enables faster calculation and more stable integration of models, especially for MPAS-Ocean,” said Kang. “Extensive use of computational resources requires enormous amounts of power and energy, but by accelerating this model we can reduce energy usage and improve simulations, making it easier to predict the effects of climate change decades and even thousands of years into the future.”

Reference: “MPAS-Ocean semi-implicit barotropic mode solver using a modern Fortran solver interface” by Hyun-Gyu Kang, Raymond S. Tuminaro, Andrey Prokopenko, Seth R. Johnson, Andrew G. Salinger and Katherine J. Evans, 17 November 2023, International Journal of High Performance Computing Applications.
DOI: 10.1177/10943420231205601

This research was supported by E3SM and the Exascale Computing Project (ECP). E3SM is sponsored by the DOE Office of Science’s Biological and Environmental Research Program, and ECP is managed by DOE and the National Nuclear Security Administration. The DOE Office of Science’s Advanced Scientific Computing Research Program funds OLCF and NERSC.

Source: scitechdaily.com