TPUs, HBM, and How SK hynix & Samsung Could Shift the Global AI Chip Power Balance

TPUs, HBM, and Korea’s Rising AI Semiconductor Power

An insight report on the first real crack in NVIDIA’s dominance and the strategic position of SK hynix and Samsung around HBM.

AI infrastructure · Semiconductor industry
📌 TPU · GPU · HBM · SK hynix · Samsung Electronics
📘 Part 1

The Real Shift in the 2025 AI Chip Market — A Massive Tectonic Move Triggered by TPUs

At first glance, the global AI semiconductor market in 2025 still looks like NVIDIA’s one-man show.

Its financial results keep hitting record highs, data center GPU sales are not slowing down,

and all over the world you keep hearing the same complaints: “There are no H100s,” “There are no B100s.”

But if you look deeper inside the industry,

you start to see a very different current forming.

You can’t fully see it on the surface yet, but the direction has already begun to change.

🔹 At the center of that shift — the return of Google’s TPU

Google’s Tensor Processing Unit (TPU) was first introduced in 2015

as an in-house “AI accelerator” designed to run Google’s own services—

search, ads, YouTube recommendations, Maps, Translate—

as efficiently as possible.

In 2025, the story started to change.

Google’s latest AI model, **Gemini 3**, reached a level that could seriously threaten ChatGPT,

and the chip that trained it, the **7th-generation TPU (Ironwood)**,

suddenly became the focus of industry attention.

This is not just “Google launched another new chip.”

It’s the first visible crack in the near-10-year period

during which the global AI infrastructure market effectively ran on

an “NVIDIA-only” regime.

---

🧩 Why the era of “GPUs alone” is breaking down

Since the launch of ChatGPT, the world has seen an explosion

in the amount of data and computation generated by AI—

to the point where it has become almost impossible to keep up.

From GPT-3 → GPT-4: training compute increased by roughly 20x

From GPT-4 → GPT-4o: multimodal compute requirements surged

From Gemini Ultra → Gemini 3: model size, parameters, and context window all expanded

Meta’s Llama family: more models, and a sharp increase in training clusters

Trying to satisfy this exploding demand with GPUs alone

has created bottlenecks everywhere.

✔ GPU production cannot be ramped as quickly as demand is growing  

✔ It’s not just the chip; HBM, substrates, and packaging are all bottlenecks  

✔ Power consumption has soared, and data center costs are exploding  

✔ Pure-GPU clusters face worsening power-efficiency issues

By now, everyone in the industry accepts a simple truth:

> “A single GPU architecture cannot carry this entire market by itself.”

That’s exactly the point at which TPUs re-entered the picture.

---

🧠 Why TPUs are back in the spotlight — the sheer efficiency of specialization

Google was the first to recognize this.

> “Instead of forcing our models to fit a GPU,

> it’s more efficient to build chips that fit our models.”

That’s why TPUs are fundamentally different from GPUs.

✔ GPUs

AI + graphics + scientific computing + gaming

A “general-purpose engine” that can do everything

✔ TPUs

Designed around tensor/matrix operations

Optimized for very large LLMs

Architected for search, ads, and YouTube recommendation workloads

A “sports-car engine” tuned for specific use cases

For example,

the types of operations central to large-scale search algorithms

or to sparse operations in ad recommendation systems

often run more efficiently on TPU architectures than on GPUs.

TPUs are also built from the ground up assuming large-scale clusters (Pods),

which means performance scales dramatically when you connect thousands of them.

---

⚡ 7th-Gen TPU “Ironwood” — a new benchmark for AI training

When Google unveiled

its 7th-generation TPU, Ironwood, at Google Cloud Next 2025 in Las Vegas,

the AI hardware world took notice.

The chip has several key characteristics:

Used to train Google’s Gemini 2 and 3 in production

Delivers a major improvement in power efficiency over TPU v5p

Reworked with a bandwidth-centric architecture

Equipped with 8-high HBM3E (with SK hynix as the primary supplier)  

and projected to move to 12-high HBM3E in the enhanced 7e generation

One especially important point:

TPUs typically demand even more HBM than GPUs.

For today’s AI training, **memory bandwidth** has become more critical than raw FLOPs,

and TPUs, by design, impose extremely high demands on HBM I/O.

In practice, the configuration looks like this:

✔ 1 GPU → typically needs 6–8 HBM stacks

✔ 1 TPU → needs 6–8 HBM stacks, or more

(The exact ratio can be even higher depending on the architecture

Google and Broadcom roll out.)

The implication is straightforward:

> As the number of TPUs grows, HBM demand grows even faster

> than the incremental demand coming from GPUs alone.
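The stack arithmetic above can be put in code. A minimal sketch: the per-chip stack counts simply take the high end of the 6–8 range cited in the text, and the fleet sizes are hypothetical round numbers, not real deployments.

```python
# Illustrative aggregate HBM demand for a mixed accelerator fleet.
# Stack counts use the high end of the 6-8-per-chip range cited above;
# fleet sizes are hypothetical round numbers, not real deployments.
STACKS_PER_GPU = 8
STACKS_PER_TPU = 8  # TPUs need as many stacks as GPUs, or more

def hbm_stacks_needed(num_gpus: int, num_tpus: int) -> int:
    """Total HBM stacks consumed by a fleet of this composition."""
    return num_gpus * STACKS_PER_GPU + num_tpus * STACKS_PER_TPU

gpu_only = hbm_stacks_needed(100_000, 0)       # a GPU-only buildout
hybrid = hbm_stacks_needed(100_000, 50_000)    # same buildout + TPU expansion
print(gpu_only)  # 800000
print(hybrid)    # 1200000: the TPU side alone adds 50% more HBM demand
```

Scaled to real fleet sizes, this is why even partial TPU adoption by a single hyperscaler moves the whole HBM market.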

---

🌍 The market is no longer “GPU vs TPU,” but “GPU + TPU”

On the surface, NVIDIA’s GPUs and Google’s TPUs look like competitors,

but their actual roles are different enough that they function as complements.

For general-purpose AI → GPUs

For internal services and tightly optimized models → TPUs

For massive cluster-level efficiency → TPU Pods

For broad developer ecosystems → GPUs

For power efficiency and cost efficiency → TPUs

For high-performance inference at scale → TPUs

For supporting a wide variety of models and workloads → GPUs

So the market is undergoing the following transition:

> “From a GPU-only market → to a hybrid world where GPUs and TPUs are deployed together.”

Meta’s moves have shaken the market especially hard,

because Meta operates one of the world’s largest AI clusters.

Once reports emerged that Meta was considering adopting TPUs,

the AI industry rapidly converged on a new conclusion:

Running everything on GPUs alone is too expensive

The larger the Llama series becomes, the more attractive TPUs look

TPUs are no longer just Google’s internal chip—they could be shared across big tech

As TPUs scale out, HBM demand grows exponentially

In the end, the core message the 2025 market is converging on is this:

> “The real battle is not GPUs vs TPUs.

> It’s about who secures **more HBM**.”

        
📘 Part 2

“The Age of HBM” — SK hynix and Samsung Tighten Their Grip on Memory Power

---

One of the biggest misconceptions in the 2025 AI semiconductor market

is the idea that “chip performance is ultimately defined by GPU or TPU cores.”

On the ground, engineers say something very different:

> “For modern AI chips, memory—not cores—is what decides real performance.”

That may sound like an exaggeration,

but if you actually look at how models like GPT-4o, Gemini 3, and Llama 3.1 run,

most of the bottlenecks show up not in raw compute,

but in **memory bandwidth**.

As AI models grow larger, they demand:

More parameters

Longer context windows

Larger batch sizes

More multimodal data

all being processed at once.

That is exactly where **HBM (High Bandwidth Memory)** comes in.

HBM is, quite literally, an “ultra-high-speed data highway”

and now accounts for a huge share of what determines GPU/TPU performance.
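To see why the highway metaphor holds, consider autoregressive decoding: at batch size 1, generating each token streams roughly the full set of weights through memory, so throughput is capped at bandwidth divided by model size. A minimal sketch, using rough public ballpark figures (a 70B-parameter model in FP16 on a chip with ~3.35 TB/s of HBM bandwidth) as assumptions:

```python
# Upper bound on decode throughput for a memory-bandwidth-bound LLM.
# Assumption: each generated token reads every weight once (batch size 1,
# KV-cache traffic ignored), so tokens/s <= bandwidth / model size in bytes.
def max_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       hbm_bandwidth_tb_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    return (hbm_bandwidth_tb_s * 1e12) / model_bytes

# A 70B-parameter model in FP16 (2 bytes/param) on ~3.35 TB/s of HBM:
print(round(max_tokens_per_sec(70, 2, 3.35), 1))  # 23.9 tokens/s ceiling
```

No amount of extra FLOPs raises that ceiling; only more HBM bandwidth does.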

---

🔥 1. Why HBM has become the most critical resource of the AI era

Unlike conventional DRAM,

HBM uses a 3D stack of memory dies built vertically.

This is made possible by **TSV (Through-Silicon Via)** technology—

a process extremely difficult in terms of precision and yield control,

to the point where, among the four or five companies involved,

only two Korean firms can currently mass-produce it reliably at scale.

Here’s why HBM is decisive in AI chips:

GPUs and TPUs perform hundreds to thousands of trillions of operations per second

If the “data feed” to those operations is too slow,

the chips can never reach their theoretical performance

In fact, a significant portion of the H100’s real-world bottlenecks

come from HBM bandwidth constraints

For extremely large models, more HBM bandwidth means higher accuracy,

higher throughput, and better efficiency—all at once
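That bottleneck claim can be sanity-checked with roofline arithmetic: a kernel is memory-bound whenever its arithmetic intensity (FLOPs per byte moved) falls below the chip's ratio of peak FLOPs to peak bandwidth. The H100-like figures below are approximate public specs, used here as assumptions:

```python
# Roofline balance point: the FLOPs-per-byte a kernel needs to keep the
# compute units busy. Figures approximate an H100 SXM (FP16 dense ~990
# TFLOPS, HBM3 ~3.35 TB/s); treat both as ballpark assumptions.
PEAK_TFLOPS = 990.0
PEAK_BW_TB_S = 3.35

balance = (PEAK_TFLOPS * 1e12) / (PEAK_BW_TB_S * 1e12)  # FLOPs per byte
print(round(balance))  # 296: below ~296 FLOPs/byte, the chip waits on HBM

# GEMV-style decode does ~2 FLOPs per 2-byte weight read, i.e. intensity ~1,
# so single-stream decoding sits far below the balance point:
decode_intensity = 1.0
print(balance / decode_intensity > 100)  # True: bandwidth-bound by >100x
```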

As a result, chipmakers, cloud providers, and AI developers

are all saying essentially the same thing:

> “Without HBM, neither GPUs nor TPUs matter.”

---

🏆 2. The dominant players in the 2025 HBM market — SK hynix and Samsung

(Compiled from securities and IB research)

The current global HBM landscape looks roughly like this:

✔ Global HBM market share

SK hynix: #1

Samsung Electronics: #2

Micron: #3

On paper, that looks like a simple ranking.

In reality, the gap is much bigger.

The HBM market is not just about manufacturing;

it also involves:

TSV yield management

Advanced packaging

Customer qualification

Compatibility with GPUs and TPUs

Long-term supply agreements

all combined into one integrated value chain.

That makes HBM an industry where new entrants

are practically nonexistent.

SK hynix in particular

has dominated the HBM space with performance-leading products

from HBM2E to HBM3 and HBM3E,

earning extremely high levels of trust across the industry.

---

📦 3. Capacity tells the story of “overwhelming dominance”

(Estimates for the end of 2025)

The coldest, clearest metric is **production capacity (WPM: wafers per month)**.

◼ Monthly HBM production capacity (WPM)

SK hynix: 160,000 wafers / month

Samsung Electronics: 150,000 wafers / month

Micron: 55,000 wafers / month

These numbers speak for themselves.

Micron’s capacity is only about one-third that of the Korean players.

On top of that, its relative lack of TSV experience

makes it more difficult to respond quickly and competitively

to the HBM4 generation and beyond.

The market has already reached its verdict:

> “From 2025 to 2027, as AI infrastructure demand explodes,

> two Korean companies will effectively act as the only meaningful ‘HBM suppliers’.”

---

🔍 4. TPUs and HBM — why SK hynix stands to gain the most

The key property of TPUs is that

they generally require more HBM than GPUs

and are even more dependent on memory bandwidth.

That’s why, in the TPU supply chain,

Google has effectively positioned SK hynix

as its first-tier partner.

Google’s TPU supply structure

7th-generation TPU (Ironwood) → 8-high HBM3E, with SK hynix as primary supplier

Enhanced 7e generation → projected 12-high HBM3E with SK hynix as exclusive supplier

(based on BofA Global Research analysis)

SK hynix is also in a strong position with:

AWS

Broadcom

Other ASIC customers

The rise of ASIC-based AI accelerators,

outside of the NVIDIA ecosystem,

is especially favorable for SK hynix,

because it means AI compute is proliferating

far beyond the boundaries of traditional GPUs.

---

🧲 5. Samsung’s counterattack — firmly entering the NVIDIA supply chain

Samsung was slower than SK hynix,

but between 2024 and 2025 it finally passed NVIDIA’s HBM qualification process

and is now in the B100 and B200 supply chain.

This is a very meaningful milestone.

NVIDIA still controls roughly **80–90%** of the world’s AI GPU market,

so simply being inside NVIDIA’s supply chain

almost guarantees a stable volume of demand

over the next several years.

Samsung is a fully integrated semiconductor player, with:

Foundry

Advanced packaging

DRAM

HBM

all under one roof,

which positions it to benefit from both TPUs and GPUs

as AI infrastructure expands.

---

🚀 6. The arrival of HBM4 — the game-changer for 2026–2027

At SEDEX 2025, SK hynix unveiled

a physical demo of HBM4, the next (6th) generation of HBM.

HBM4 offers:

Even higher bandwidth

Lower power consumption

More memory layers per stack

making it the “baseline spec” for AI infrastructure competition

from 2026 onward.

Once HBM4 enters full mass production,

it will underpin:

Google’s 8th-generation TPUs

NVIDIA’s next-gen X-series GPUs

Custom AI chips from AWS and Meta

In other words, the HBM4 era

is likely to be the point at which Korean companies

move even further up the value chain

in AI infrastructure.

---

📌 7. Bottom line — As the HBM market grows, Korean semiconductor influence scales non-linearly

As the TPU market grows, the HBM market grows.

As the HBM market grows,

the influence of SK hynix and Samsung at its core

expands naturally and non-linearly.

The structure of AI chips can now be summarized as:

> “It’s not the cores, but the bandwidth that defines AI chip performance—

> and bandwidth, ultimately, is defined by HBM.”

And right now, the only country that can mass-produce HBM

at the required scale and reliability

is, effectively, South Korea.

This dynamic is likely to dominate the entire AI industry

from 2025 through 2026–2027.

The “AI chip war” is quietly shifting into a new form:

it’s no longer about who wins among GPU vendors or TPU vendors,

but about which companies can manufacture the most HBM

and keep it flowing reliably into the world’s data centers.

        
📘 Part 3

TPU Architecture vs. GPUs — Moving from Competition to Coexistence

---

One of the most common misunderstandings

when people look at the AI infrastructure market in 2025

is the belief that

“TPUs and GPUs are in a winner-takes-all battle, and only one will survive.”

On the surface, it can feel that way:

Google’s TPUs have made rapid performance gains

and are rising fast,

while NVIDIA’s GPUs still hold an undeniable dominant position.

But once you look more closely at the actual architecture,

it becomes clear that the two technologies

play fundamentally different, non-substitutable roles.

Understanding this helps explain why, beyond 2025,

the AI chip market is more about **coexistence** than zero-sum competition.

---

🔹 1) TPUs are specialized; GPUs are general-purpose — different goals from day one

The single most important lens for understanding AI chips

is their **design objective**.

✔ GPUs (NVIDIA) — “the general-purpose engine that runs everything”

GPUs were originally designed for gaming, graphics, and 3D rendering.

But because their parallel compute capabilities were so strong,

they naturally took over much of the AI workload

once deep learning arrived.

GPUs are now a central component in:

Image and video processing

Physical simulations

Game engines

Autonomous driving

LLM training and inference

Scientific computing and quantitative finance

On top of that, NVIDIA’s CUDA ecosystem

has drawn in the vast majority of AI developers worldwide.

In short, the GPU’s biggest advantage is:

> “It can handle almost any workload,

> and it owns the developer ecosystem.”

---

✔ TPUs (Google) — “a custom engine that makes specific operations blazingly fast”

TPUs, on the other hand, were born with a very different mission.

They are ASICs (application-specific integrated circuits),

designed by Google to drive one thing extremely well:

the **tensor/matrix operations** at the heart of deep learning.

In other words, TPUs are optimized to run

Google’s own massive internal workloads at minimum cost and maximum efficiency:

Search algorithms

Ad recommendation systems

YouTube personalization

Google Translate

The Gemini model family

Viewed from that angle, Google’s decision framework is simple:

> “We don’t need to run these billions of identical operations

> on a general-purpose GPU architecture.

> We can design chips tailored to our services.”

That’s why TPUs can deliver higher power efficiency than GPUs

for specific model architectures,

and why, in some use cases,

they can cut costs by **30–50%** compared with GPUs.

This is the fundamental reason

Google has never stopped investing in TPUs.
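As a back-of-the-envelope illustration of what a 30–50% efficiency edge can mean at fleet scale (every dollar figure below is a hypothetical placeholder, not real pricing):

```python
# Hypothetical cost-per-token comparison. Every number is a placeholder
# chosen only to show how a 30-50% ASIC efficiency edge compounds at scale.
def monthly_bill(tokens_per_month: float, cost_per_million_tokens: float) -> float:
    return tokens_per_month / 1e6 * cost_per_million_tokens

MONTHLY_TOKENS = 1e12                  # one trillion tokens served per month
GPU_COST_PER_M = 1.00                  # assumed $ per 1M tokens on GPUs
TPU_COST_PER_M = GPU_COST_PER_M * 0.6  # midpoint of a 30-50% saving

gpu_bill = monthly_bill(MONTHLY_TOKENS, GPU_COST_PER_M)
tpu_bill = monthly_bill(MONTHLY_TOKENS, TPU_COST_PER_M)
print(round(gpu_bill - tpu_bill))  # 400000: dollars saved per month here
```

Under these assumptions a 40% saving on a trillion tokens a month is material, and the gap only widens as serving volume grows.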

---

🔹 2) TPU generational performance — step-function improvements

With each generation,

TPUs have improved not just incrementally,

but structurally.

Based on Google Cloud TPU documentation and public information,

the core milestones look like this:

| TPU generation | Key characteristics |
| --- | --- |
| v2 (2017) | 45 TFLOPS |
| v3 (2018) | 90 TFLOPS (2x v2), liquid cooling |
| v4 (2021) | ~275 TFLOPS architecture, large-scale Pods |
| v5e (2023) | 3x efficiency vs. v4, for both training and inference |
| v5p (Q4 2023) | Designed for large-scale LLM training, 2.8x over v4 |
| 7th-gen Ironwood (2025) | HBM3E, major upgrades to power efficiency and bandwidth |

The 7th-generation TPU, Ironwood,

has been used and proven in production

for training Gemini 2 and 3,

and is highly rated in terms of “real” performance—throughput at scale.

A crucial point here is:

> “TPUs are designed to achieve their true performance

> not as individual chips, but as Pods—large-scale clusters.”

In other words, TPUs are not about having a single, ultra-strong chip.

They are about achieving optimal performance

when thousands of them are tightly connected.

---

🔹 3) TPUs vs. NVIDIA’s B100/B200 — two engines in the same domain, with different roles

As of 2025, NVIDIA’s B200 is the flagship GPU in the AI space.

If we break down where each side has an edge, the picture looks like this:

✔ Single-chip peak performance

B200 is clearly ahead

Its FP8 performance, HBM3e configuration, and NVLink interconnect

maximize the traditional strengths of the GPU architecture

✔ Large-scale cluster efficiency

TPU Pods often have the upper hand

Google’s control over the network topology and software stack

enables extremely tight integration at the cluster level

✔ Power and cost efficiency

TPUs come out ahead

As ASICs, they can reduce power consumption

for the same workload compared with GPUs

✔ Versatility

GPUs are overwhelmingly superior

They are compatible with virtually every model,

service, and platform in the AI ecosystem

Summarizing that, you get:

> “NVIDIA’s GPUs are the beating heart of global AI,

> while Google’s TPUs are the core engine

> for ultra-large models and search/ad/recommendation systems.”

This is why the relationship between GPUs and TPUs

is fundamentally one of **coexistence**, not direct replacement.

Each fills in the gaps the other leaves,

and together they expand the overall AI market.

        
📘 Part 4

Conclusion — The Real Battle in 2025 Is Not Chips, But HBM

---

Putting together the latest news, official documents, and company disclosures,

one thing becomes clear:

the core issue in the 2025 AI market

is not whether GPUs or TPUs “win.”

What really matters is:

how much HBM each player can secure.

Unless the basic structure of giant LLMs changes dramatically,

Model sizes will keep increasing

Context windows will keep getting longer

Multimodal data will keep growing

Inference requests will keep surging

—and all of that translates directly

into an explosive increase in HBM demand.

The structure can be summarized like this:

◼ Continued upgrades in GPT, Gemini, and Llama

→ Require more GPUs and more TPUs

◼ More GPUs and more TPUs

→ Require far more HBM

◼ HBM market

→ Dominated by SK hynix and Samsung as a two-player oligopoly

◼ Micron

→ With limited capacity and slower TSV ramp, effectively pushed into a secondary role

In that sense, the GPU vs. TPU competition is secondary.

The real core of the AI semiconductor market

is the emerging **memory power game**.

And as of now,

the companies holding that power

are SK hynix and Samsung.

Tech giants like NVIDIA, Google, Meta, AWS, and Broadcom

are all racing to sign long-term agreements with Korean suppliers

for one simple reason:

you cannot build or operate AI services at scale

without a stable supply of HBM.

From 2025 through 2027,

we are already in the middle of a structural shift

in which some of the industry’s leverage

is moving away from chip designers

and toward memory manufacturers.

---

📝 Reference Notes

(Summarized pointers to official and primary sources; paraphrased rather than quoted directly.)

Google Cloud TPU Architecture Docs (performance data for v2–v5p generations)

Google Cloud Next 2025 sessions (announcement and technical overview of 7th-gen TPU “Ironwood”)

NVIDIA official documentation (B100/B200 architecture and HBM3e configurations)

SK hynix and Samsung Electronics IR materials and press releases (HBM3E/HBM4 specs and roadmap)

Yonhap News: coverage on “Each TPU requiring 6–8 HBM stacks and SK hynix as primary supplier” (Nov 2025)

Equity research from Meritz, Korea Investment & Securities, UBS, BofA, HSBC on the global HBM market

        
```0
