TPUs, HBM, and How SK hynix & Samsung Could Shift the Global AI Chip Power Balance
TPUs, HBM, and Korea’s Rising AI Semiconductor Power
An insight report on the first real crack in NVIDIA’s dominance and the strategic position of SK hynix and Samsung around HBM.
AI infrastructure · Semiconductor industry
📌 TPU · GPU · HBM · SK hynix · Samsung Electronics
📘 Part 1. The Real Shift in the 2025 AI Chip Market — A Massive Tectonic Move Triggered by TPUs
At first glance, the global AI semiconductor market in 2025 still looks like NVIDIA’s one-man show.
Its financial results keep hitting record highs, data center GPU sales are not slowing down,
and all over the world you keep hearing the same complaints: “There are no H100s,” “There are no B100s.”
But if you look deeper inside the industry,
you start to see a very different current forming.
You can’t fully see it on the surface yet, but the direction has already begun to change.
🔹 At the center of that shift — the return of Google’s TPU
Google’s Tensor Processing Unit (TPU) was first introduced in 2015
as an in-house “AI accelerator” designed to run Google’s own services—
search, ads, YouTube recommendations, Maps, Translate—
as efficiently as possible.
In 2025, the story started to change.
Google’s latest AI model, **Gemini 3**, reached a level that could seriously threaten ChatGPT,
and the chip that trained it, the **7th-generation TPU (Ironwood)**,
suddenly became the focus of industry attention.
This is not just “Google launched another new chip.”
It’s the first visible crack in the near-10-year period
during which the global AI infrastructure market effectively ran on
an “NVIDIA-only” regime.
---
🧩 Why the era of “GPUs alone” is breaking down
Since the launch of ChatGPT, the world has seen an explosion
in the amount of data and computation that AI consumes—
to the point where the infrastructure can barely keep up.
From GPT-3 → GPT-4: training compute increased by roughly 20x
From GPT-4 → GPT-4o: multimodal compute requirements surged
From Gemini Ultra → Gemini 3: model size, parameters, and context window all expanded
Meta’s Llama family: more models, and a sharp increase in training clusters
Trying to satisfy this exploding demand with GPUs alone
has created bottlenecks everywhere.
✔ GPU production cannot be ramped as quickly as demand is growing
✔ It’s not just the chip; HBM, substrates, and packaging are all bottlenecks
✔ Power consumption has soared, and data center costs are exploding
✔ Pure-GPU clusters face worsening power-efficiency issues
By now, everyone in the industry accepts a simple truth:
> “A single GPU architecture cannot carry this entire market by itself.”
That’s exactly the point at which TPUs re-entered the picture.
---
🧠 Why TPUs are back in the spotlight — the sheer efficiency of specialization
Google was the first to recognize this.
> “Instead of forcing our models to fit a GPU,
> it’s more efficient to build chips that fit our models.”
That’s why TPUs are fundamentally different from GPUs.
✔ GPUs
AI + graphics + scientific computing + gaming
A “general-purpose engine” that can do everything
✔ TPUs
Designed around tensor/matrix operations
Optimized for very large LLMs
Architected for search, ads, and YouTube recommendation workloads
A “sports-car engine” tuned for specific use cases
For example,
the types of operations central to large-scale search algorithms
or to sparse operations in ad recommendation systems
often run more efficiently on TPU architectures than on GPUs.
TPUs are also built from the ground up assuming large-scale clusters (Pods),
which means performance scales dramatically when you connect thousands of them.
---
⚡ 7th-Gen TPU “Ironwood” — a new benchmark for AI training
When Google unveiled
its 7th-generation TPU, Ironwood, at Google Cloud Next 2025 in Las Vegas,
the AI hardware world took notice.
The chip has several key characteristics:
Used to train Google’s Gemini 2 and 3 in production
Delivers a major improvement in power efficiency over TPU v5p
Reworked with a bandwidth-centric architecture
Equipped with 8-high HBM3E (with SK hynix as the primary supplier)
and projected to move to 12-high HBM3E in the enhanced 7e generation
One especially important point:
TPUs typically demand even more HBM than GPUs.
For today’s AI training, **memory bandwidth** has become more critical than raw FLOPs,
and TPUs, by design, impose extremely high demands on HBM I/O.
In practice, the configuration looks like this:
✔ 1 GPU → typically needs 6–8 HBM stacks
✔ 1 TPU → needs 6–8 HBM stacks, or more
(The exact ratio can be even higher depending on the architecture
Google and Broadcom roll out.)
The implication is straightforward:
> As the number of TPUs grows, HBM demand grows even faster
> than the incremental demand coming from GPUs alone.
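To make that arithmetic concrete, here is a minimal back-of-envelope sketch in Python. The fleet sizes and per-chip stack counts are illustrative assumptions (loosely based on the 6–8-stack range above), not vendor figures:

```python
# Back-of-envelope estimate of total HBM stack demand as TPU deployments grow.
# Every input below is an illustrative assumption, not a vendor figure.

def hbm_stacks_needed(num_accelerators: int, stacks_per_chip: int) -> int:
    """Total HBM stacks required for a fleet of accelerators."""
    return num_accelerators * stacks_per_chip

gpu_fleet = 1_000_000        # assumed installed GPUs
tpu_fleet_before = 200_000   # assumed TPUs before a big ramp
tpu_fleet_after = 600_000    # assumed TPUs after a hypothetical 3x ramp

GPU_STACKS = 8               # upper end of the 6-8 stacks-per-GPU range above
TPU_STACKS = 8               # per the text, TPUs sit at the same level or higher

baseline = (hbm_stacks_needed(gpu_fleet, GPU_STACKS)
            + hbm_stacks_needed(tpu_fleet_before, TPU_STACKS))
ramped = (hbm_stacks_needed(gpu_fleet, GPU_STACKS)
          + hbm_stacks_needed(tpu_fleet_after, TPU_STACKS))

print(f"HBM stacks before the TPU ramp: {baseline:,}")
print(f"HBM stacks after the TPU ramp:  {ramped:,}")
print(f"Incremental stacks from TPUs alone: {ramped - baseline:,}")
```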
---
🌍 The market is no longer “GPU vs TPU,” but “GPU + TPU”
On the surface, NVIDIA’s GPUs and Google’s TPUs look like competitors,
but their actual roles are different enough that they function as complements.
For general-purpose AI → GPUs
For internal services and tightly optimized models → TPUs
For massive cluster-level efficiency → TPU Pods
For broad developer ecosystems → GPUs
For power efficiency and cost efficiency → TPUs
For high-performance inference at scale → TPUs
For supporting a wide variety of models and workloads → GPUs
So the market is undergoing the following transition:
> “From a GPU-only market → to a hybrid world where GPUs and TPUs are deployed together.”
Meta’s moves have shaken the market especially hard,
because Meta operates one of the world’s largest AI clusters.
Once reports emerged that Meta was considering adopting TPUs,
the AI industry rapidly converged on a new conclusion:
Running everything on GPUs alone is too expensive
The larger the Llama series becomes, the more attractive TPUs look
TPUs are no longer just Google’s internal chip—they could be shared across big tech
As TPUs scale out, HBM demand climbs sharply on top of existing GPU demand
In the end, the core message the 2025 market is converging on is this:
> “The real battle is not GPUs vs TPUs.
> It’s about who secures **more HBM**.”
---
📘 Part 2. “The Age of HBM” — SK hynix and Samsung Tighten Their Grip on Memory Power
One of the biggest misconceptions in the 2025 AI semiconductor market
is the idea that “chip performance is ultimately defined by GPU or TPU cores.”
On the ground, engineers say something very different:
> “For modern AI chips, memory—not cores—is what decides real performance.”
That may sound like an exaggeration,
but if you actually look at how models like GPT-4o, Gemini 3, and Llama 3.1 run,
most of the bottlenecks show up not in raw compute,
but in **memory bandwidth**.
As AI models grow larger, they demand:
More parameters
Longer context windows
Larger batch sizes
More multimodal data
all being processed at once.
That is exactly where **HBM (High Bandwidth Memory)** comes in.
HBM is, quite literally, an “ultra-high-speed data highway”
and now accounts for a huge share of what determines GPU/TPU performance.
---
🔥 1. Why HBM has become the most critical resource of the AI era
Unlike conventional DRAM,
HBM uses a 3D stack of memory dies built vertically.
This is made possible by **TSV (Through-Silicon Via)** technology—
a process so demanding in precision and yield control that,
of the four or five companies that have attempted it,
only the two Korean firms currently mass-produce it reliably at scale.
Here’s why HBM is decisive in AI chips:
GPUs and TPUs perform hundreds to thousands of trillions of operations per second
If the “data feed” to those operations is too slow,
the chips can never reach their theoretical performance
In fact, a significant portion of the H100’s real-world bottlenecks
come from HBM bandwidth constraints
For extremely large models, more HBM bandwidth means higher accuracy,
higher throughput, and better efficiency—all at once
As a result, chipmakers, cloud providers, and AI developers
are all saying essentially the same thing:
> “Without HBM, neither GPUs nor TPUs matter.”
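One way to see why bandwidth, rather than peak FLOPs, sets the real ceiling is a simple roofline-style estimate. The sketch below uses placeholder numbers for peak compute and HBM bandwidth (assumptions, not any specific chip’s spec); workloads with low arithmetic intensity—token-by-token LLM decoding is the classic case—hit the bandwidth wall long before they get anywhere near peak compute:

```python
# Roofline-style sketch: attainable throughput is capped by the smaller of
# peak compute and (memory bandwidth x arithmetic intensity).
# The two hardware numbers are placeholder assumptions, not a real chip's spec.

PEAK_TFLOPS = 1000.0   # assumed peak compute, TFLOP/s
HBM_BW_TBPS = 4.0      # assumed HBM bandwidth, TB/s

def attainable_tflops(flops_per_byte: float) -> float:
    """Roofline model: min(compute roof, bandwidth roof)."""
    bandwidth_roof = HBM_BW_TBPS * flops_per_byte  # TB/s * FLOP/byte = TFLOP/s
    return min(PEAK_TFLOPS, bandwidth_roof)

# LLM decoding re-reads every weight for each generated token, so its
# arithmetic intensity (FLOPs per byte moved) is typically very low.
for intensity in (1, 10, 100, 500):
    achieved = attainable_tflops(intensity)
    print(f"intensity {intensity:>3} FLOP/byte -> "
          f"{achieved:7.1f} TFLOP/s ({achieved / PEAK_TFLOPS:6.1%} of peak)")
```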
---
🏆 2. The dominant players in the 2025 HBM market — SK hynix and Samsung
(Compiled from securities and IB research)
The current global HBM landscape looks roughly like this:
✔ Global HBM market share
SK hynix: #1
Samsung Electronics: #2
Micron: #3
On paper, that looks like a simple ranking.
In reality, the gap is much bigger.
The HBM market is not just about manufacturing;
it also involves:
TSV yield management
Advanced packaging
Customer qualification
Compatibility with GPUs and TPUs
Long-term supply agreements
all combined into one integrated value chain.
That makes HBM an industry where new entrants
are practically nonexistent.
SK hynix in particular
has dominated the HBM space with performance-leading products
from HBM2E to HBM3 and HBM3E,
earning extremely high levels of trust across the industry.
---
📦 3. Capacity tells the story of “overwhelming dominance”
(Estimates for the end of 2025)
The coldest, clearest metric is **production capacity (WPM: wafers per month)**.
◼ Monthly HBM production capacity (WPM)
SK hynix: 160,000 wafers / month
Samsung Electronics: 150,000 wafers / month
Micron: 55,000 wafers / month
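A quick sketch turning those estimates into shares of the tracked total (the inputs are the research estimates quoted above, not company disclosures):

```python
# Share of the end-of-2025 HBM capacity estimates quoted above
# (WPM = wafers per month). Research estimates, not company disclosures.
capacity_wpm = {
    "SK hynix": 160_000,
    "Samsung":  150_000,
    "Micron":    55_000,
}

total = sum(capacity_wpm.values())
for vendor, wpm in capacity_wpm.items():
    print(f"{vendor:>9}: {wpm:>7,} WPM ({wpm / total:5.1%} of tracked capacity)")

korean = capacity_wpm["SK hynix"] + capacity_wpm["Samsung"]
print(f"Korean suppliers combined: {korean / total:.1%}")
```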
These numbers speak for themselves.
Micron’s capacity is only about one-third that of the Korean players.
On top of that, its relative lack of TSV experience
makes it more difficult to respond quickly and competitively
to the HBM4 generation and beyond.
The market has already reached its verdict:
> “From 2025 to 2027, as AI infrastructure demand explodes,
> two Korean companies will effectively act as the only meaningful ‘HBM suppliers’.”
---
🔍 4. TPUs and HBM — why SK hynix stands to gain the most
The key property of TPUs is that
they generally require more HBM than GPUs
and are even more dependent on memory bandwidth.
That’s why, in the TPU supply chain,
Google has effectively positioned SK hynix
as its first-tier partner.
Google’s TPU supply structure
7th-generation TPU (Ironwood) → 8-high HBM3E, with SK hynix as primary supplier
Enhanced 7e generation → projected 12-high HBM3E with SK hynix as exclusive supplier
(based on BofA Global Research analysis)
SK hynix is also in a strong position with:
AWS
Broadcom
Other ASIC customers
The rise of ASIC-based AI accelerators,
outside of the NVIDIA ecosystem,
is especially favorable for SK hynix,
because it means AI compute is proliferating
far beyond the boundaries of traditional GPUs.
---
🧲 5. Samsung’s counterattack — firmly entering the NVIDIA supply chain
Samsung was slower than SK hynix,
but between 2024 and 2025 it finally passed NVIDIA’s HBM qualification process
and is now in the B100 and B200 supply chain.
This is a very meaningful milestone.
NVIDIA still controls roughly **80–90%** of the world’s AI GPU market,
so simply being inside NVIDIA’s supply chain
almost guarantees a stable volume of demand
over the next several years.
Samsung is a fully integrated semiconductor player, with:
Foundry
Advanced packaging
DRAM
HBM
all under one roof,
which positions it to benefit from both TPUs and GPUs
as AI infrastructure expands.
---
🚀 6. The arrival of HBM4 — the game-changer for 2026–2027
At SEDEX 2025, SK hynix unveiled
a physical demo of HBM4—the sixth generation of HBM and the next step in the roadmap.
HBM4 offers:
Even higher bandwidth
Lower power consumption
More memory layers per stack
making it the “baseline spec” for AI infrastructure competition
from 2026 onward.
Once HBM4 enters full mass production,
it will underpin:
Google’s 8th-generation TPUs
NVIDIA’s next-gen X-series GPUs
Custom AI chips from AWS and Meta
In other words, the HBM4 era
is likely to be the point at which Korean companies
move even further up the value chain
in AI infrastructure.
---
📌 7. Bottom line — As the HBM market grows, Korean semiconductor influence scales non-linearly
As the TPU market grows, the HBM market grows.
As the HBM market grows,
the influence of SK hynix and Samsung at its core
expands naturally and non-linearly.
The structure of AI chips can now be summarized as:
> “It’s not the cores, but the bandwidth that defines AI chip performance—
> and bandwidth, ultimately, is defined by HBM.”
And right now, the only country that can mass-produce HBM
at the required scale and reliability
is, effectively, South Korea.
This dynamic is likely to dominate the entire AI industry
from 2025 through 2026–2027.
The “AI chip war” is quietly shifting into a new form:
it’s no longer about who wins among GPU vendors or TPU vendors,
but about which companies can manufacture the most HBM
and keep it flowing reliably into the world’s data centers.
---
📘 Part 3. TPU Architecture vs. GPUs — Moving from Competition to Coexistence
One of the most common misunderstandings
when people look at the AI infrastructure market in 2025
is the belief that
“TPUs and GPUs are in a winner-takes-all battle, and only one will survive.”
On the surface, it can feel that way:
Google’s TPUs have made rapid performance gains
and are rising fast,
while NVIDIA’s GPUs still hold an undeniable dominant position.
But once you look more closely at the actual architecture,
it becomes clear that the two technologies
play fundamentally different, non-substitutable roles.
Understanding this helps explain why, beyond 2025,
the AI chip market is more about **coexistence** than zero-sum competition.
---
🔹 1) TPUs are specialized; GPUs are general-purpose — different goals from day one
The single most important lens for understanding AI chips
is their **design objective**.
✔ GPUs (NVIDIA) — “the general-purpose engine that runs everything”
GPUs were originally designed for gaming, graphics, and 3D rendering.
But because their parallel compute capabilities were so strong,
they naturally took over much of the AI workload
once deep learning arrived.
GPUs are now a central component in:
Image and video processing
Physical simulations
Game engines
Autonomous driving
LLM training and inference
Scientific computing and quantitative finance
On top of that, NVIDIA’s CUDA ecosystem
has drawn in the vast majority of AI developers worldwide.
In short, the GPU’s biggest advantage is:
> “It can handle almost any workload,
> and it owns the developer ecosystem.”
---
✔ TPUs (Google) — “a custom engine that makes specific operations blazingly fast”
TPUs, on the other hand, were born with a very different mission.
They are ASICs (application-specific integrated circuits),
designed by Google to drive one thing extremely well:
the **tensor/matrix operations** at the heart of deep learning.
In other words, TPUs are optimized to run
Google’s own massive internal workloads at minimum cost and maximum efficiency:
Search algorithms
Ad recommendation systems
YouTube personalization
Google Translate
The Gemini model family
Viewed from that angle, Google’s decision framework is simple:
> “We don’t need to run these billions of identical operations
> on a general-purpose GPU architecture.
> We can design chips tailored to our services.”
That’s why TPUs can deliver higher power efficiency than GPUs
for specific model architectures,
and why, in some use cases,
they can cut costs by **30–50%** compared with GPUs.
This is the fundamental reason
Google has never stopped investing in TPUs.
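Concretely, the “tensor/matrix operations” in question are large batched matrix multiplies. Below is a minimal sketch in JAX, one of the frameworks Google uses to target TPUs; the shapes are arbitrary illustrative values, and on a machine without a TPU the same code simply runs on CPU or GPU:

```python
# A minimal sketch of the core operation TPUs are designed around: a batched
# matrix multiply, written in JAX. Shapes are arbitrary illustrative values.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k_act, k_w = jax.random.split(key)

# Toy "layer": a batch of activation matrices times a shared weight matrix.
activations = jax.random.normal(k_act, (8, 512, 1024))  # (batch, seq, d_model)
weights = jax.random.normal(k_w, (1024, 4096))          # (d_model, d_ff)

@jax.jit  # XLA compiles this; on a TPU it maps onto the matrix units
def layer(x, w):
    return jnp.einsum("bsd,df->bsf", x, w)

out = layer(activations, weights)
print("output shape:", out.shape)                # (8, 512, 4096)
print("running on:", jax.devices()[0].platform)  # 'tpu', 'gpu', or 'cpu'
```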
---
🔹 2) TPU generational performance — step-function improvements
With each generation,
TPUs have improved not just incrementally,
but structurally.
Based on Google Cloud TPU documentation and public information,
the core milestones look like this:
| TPU generation | Key characteristics |
| --- | --- |
| v2 (2017) | 45 TFLOPS |
| v3 (2018) | 90 TFLOPS (2x v2), liquid cooling added |
| v4 (2021) | ~275 TFLOPS, large-scale Pods |
| v5e (2023) | 3x efficiency vs. v4, for both training and inference |
| v5p (Q4 2023) | Designed for large-scale LLM training, 2.8x over v4 |
| 7th-gen Ironwood (2025) | HBM3E, major upgrades to power efficiency and bandwidth |
The 7th-generation TPU, Ironwood,
has been used and proven in production
for training Gemini 2 and 3,
and is highly rated in terms of “real” performance—throughput at scale.
A crucial point here is:
> “TPUs are designed to achieve their true performance
> not as individual chips, but as Pods—large-scale clusters.”
In other words, TPUs are not about having a single, ultra-strong chip.
They are about achieving optimal performance
when thousands of them are tightly connected.
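A rough illustration of that point: with placeholder values for per-chip throughput, Pod size, and scaling efficiency (assumptions, not official Ironwood figures), the aggregate numbers dwarf any single chip:

```python
# Rough illustration of Pod-level aggregate throughput vs. a single chip.
# All three inputs are placeholder assumptions, not official TPU figures.

per_chip_tflops = 1_000      # assumed per-chip throughput, TFLOP/s
chips_per_pod = 9_216        # assumed Pod size (illustrative)
scaling_efficiency = 0.8     # assumed fraction of linear scaling achieved

pod_tflops = per_chip_tflops * chips_per_pod * scaling_efficiency
print(f"Single chip: {per_chip_tflops:,} TFLOP/s")
print(f"Full Pod:    {pod_tflops:,.0f} TFLOP/s "
      f"(~{pod_tflops / 1_000_000:.1f} EFLOP/s)")
```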
---
🔹 3) TPUs vs. NVIDIA’s B100/B200 — two engines in the same domain, with different roles
As of 2025, NVIDIA’s B200 is the flagship GPU in the AI space.
If we break down where each side has an edge, the picture looks like this:
✔ Single-chip peak performance
B200 is clearly ahead
Its FP8 performance, HBM3e configuration, and NVLink interconnect
maximize the traditional strengths of the GPU architecture
✔ Large-scale cluster efficiency
TPU Pods often have the upper hand
Google’s control over the network topology and software stack
enables extremely tight integration at the cluster level
✔ Power and cost efficiency
TPUs come out ahead
As ASICs, they can reduce power consumption
for the same workload compared with GPUs
✔ Versatility
GPUs are overwhelmingly superior
They are compatible with virtually every model,
service, and platform in the AI ecosystem
Summarizing that, you get:
> “NVIDIA’s GPUs are the beating heart of global AI,
> while Google’s TPUs are the core engine
> for ultra-large models and search/ad/recommendation systems.”
This is why the relationship between GPUs and TPUs
is fundamentally one of **coexistence**, not direct replacement.
Each fills in the gaps the other leaves,
and together they expand the overall AI market.
---
📘 Part 4. Conclusion — The Real Battle in 2025 Is Not Chips, But HBM
Putting together the latest news, official documents, and company disclosures,
one thing becomes clear:
the core issue in the 2025 AI market
is not whether GPUs or TPUs “win.”
What really matters is:
how much HBM each player can secure.
Unless the basic structure of giant LLMs changes dramatically,
Model sizes will keep increasing
Context windows will keep getting longer
Multimodal data will keep growing
Inference requests will keep surging
—and all of that translates directly
into an explosive increase in HBM demand.
The structure can be summarized like this:
◼ Continued upgrades in GPT, Gemini, and Llama
→ Require more GPUs and more TPUs
◼ More GPUs and more TPUs
→ Require far more HBM
◼ HBM market
→ Dominated by SK hynix and Samsung as a two-player oligopoly
◼ Micron
→ With limited capacity and slower TSV ramp, effectively pushed into a secondary role
In that sense, the GPU vs. TPU competition is secondary.
The real core of the AI semiconductor market
is the emerging **memory power game**.
And as of now,
the companies holding that power
are SK hynix and Samsung.
Tech giants like NVIDIA, Google, Meta, AWS, and Broadcom
are all racing to sign long-term agreements with Korean suppliers
for one simple reason:
you cannot build or operate AI services at scale
without a stable supply of HBM.
From 2025 through 2027,
we are already in the middle of a structural shift
in which some of the industry’s leverage
is moving away from chip designers
and toward memory manufacturers.
---
📝 Reference Notes
(Summarized pointers to the official and primary sources drawn on above, rather than direct quotations.)
Google Cloud TPU Architecture Docs (performance data for v2–v5p generations)
Google Cloud Next 2025 sessions (announcement and technical overview of 7th-gen TPU “Ironwood”)
NVIDIA official documentation (B100/B200 architecture and HBM3e configurations)
SK hynix and Samsung Electronics IR materials and press releases (HBM3E/HBM4 specs and roadmap)
Yonhap News: coverage on “Each TPU requiring 6–8 HBM stacks and SK hynix as primary supplier” (Nov 2025)
Equity research from Meritz, Korea Investment & Securities, UBS, BofA, HSBC on the global HBM market
