AI Hardware: A Lot of Marketing, a Bit of Reality

We’ve all seen the recent ads: Apple pushing iPhones that support Apple Intelligence, Microsoft adding a Copilot key to some keyboards, and suddenly every new machine being described as “AI ready.”

Let’s be honest: part of that pitch feels like companies are trying to sell AI the same way they sell a better graphics card for gaming. More FPS, more pixels, more AI. Except it is not that simple.

Artificial intelligence, especially large language models like ChatGPT, Claude, or Gemini, is one of the most demanding compute workloads consumer tech has ever had to explain to normal people. To answer a single prompt, a model has to run calculations over billions of parameters on GPUs or specialized accelerators, and it repeats that work for each piece of text it generates. That creates heat, consumes power, needs memory, and does not look much like normal phone usage.

The gaming comparison

One intuition needs correcting: one prompt is not literally the same thing as a full GTA V session.

A gaming PC pulling roughly 250 W for an hour consumes around 250 Wh. A text request on a large cloud model, depending on prompt length, answer length, and the model being used, can land somewhere from fractions of a Wh to a few Wh. One study on LLM inference energy estimated a median of 0.34 Wh for a frontier-model query, and 4.32 Wh for much longer queries with more test-time compute.
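To put those numbers side by side, here is a minimal sketch of the arithmetic. It only uses the figures quoted above (250 W, one hour, 0.34 Wh, 4.32 Wh); the function and constant names are mine, and real consumption varies with hardware, model, and prompt.

```python
# Back-of-the-envelope comparison using only the figures quoted above.
GAMING_PC_WATTS = 250.0   # rough draw of a gaming PC under load
SESSION_HOURS = 1.0       # one hour of play
MEDIAN_QUERY_WH = 0.34    # the study's median estimate for a frontier-model query
LONG_QUERY_WH = 4.32      # the study's estimate for much longer queries


def energy_wh(watts: float, hours: float) -> float:
    """Energy in watt-hours is power (W) multiplied by time (h)."""
    return watts * hours


def equivalent_queries(session_wh: float, query_wh: float) -> float:
    """Number of queries that use the same energy as one gaming session."""
    return session_wh / query_wh


session_wh = energy_wh(GAMING_PC_WATTS, SESSION_HOURS)
print(f"Gaming session: {session_wh:.0f} Wh")
print(f"Median queries per session: {equivalent_queries(session_wh, MEDIAN_QUERY_WH):.0f}")
print(f"Long queries per session: {equivalent_queries(session_wh, LONG_QUERY_WH):.0f}")
```

With these inputs, one hour of gaming lands in the same range as roughly 735 median queries, or about 58 of the long ones.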

So no, a single prompt is not necessarily “heavier” than an evening of gaming. The real issue is scale. One person asking one question is not the problem. Hundreds of millions of people asking billions of questions, generating images, summarizing videos, running agents, and doing it again tomorrow is where this becomes serious infrastructure.

The International Energy Agency estimated that data centers used around 415 TWh of electricity globally in 2024, and that this could rise beyond 900 TWh by 2030. AI is not the only reason, but it is clearly one of the major drivers.
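For a sense of what that trajectory means per year, here is a small sketch that turns the two IEA figures into an implied compound growth rate. It assumes steady growth between 2024 and 2030, which is my simplification, and it treats 900 TWh as a floor since the estimate is "beyond 900."

```python
# Implied average yearly growth between the two figures quoted above.
START_YEAR, START_TWH = 2024, 415.0
END_YEAR, END_TWH = 2030, 900.0  # "beyond 900", so this is a lower bound


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


rate = compound_annual_growth(START_TWH, END_TWH, END_YEAR - START_YEAR)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption, the result is a little under 14% per year, compounding for six years in a row.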

A phone is not a data center

This is where the marketing gets weird.

If our phones can still struggle to run a demanding game for a long time without heating up, we should be careful when marketing implies that a phone is suddenly going to run a full ChatGPT or Claude equivalent locally.

Yes, modern chips have NPUs, or Neural Processing Units. Yes, they can accelerate specific tasks: image recognition, transcription, photo cleanup, local summaries, camera effects, small assistant features. And yes, Apple officially lists Apple Intelligence support for iPhone 15 Pro models, iPhone 16 models, and later devices, with on-device models that download to the device.

But that does not mean the phone replaces the infrastructure behind a large LLM. In many cases, the hardware mainly improves interaction, privacy for certain local tasks, or the efficiency of smaller features. The heavy processing still often lives on machines that are massively more powerful than anything we put in a pocket.

In short: a button, a camera, or a shortcut does not magically turn a device into an AI server.

The Copilot key and the camera button

Microsoft’s Copilot key is a good example. Microsoft introduced it as a new entry point into AI on Windows, and there is also a real Copilot+ PC category that requires an NPU rated at 40 TOPS (trillions of operations per second) or more. So it is not completely empty: some features really do depend on specialized hardware.

But a button is still a button.

It can make access faster. It can create a new user habit. It can even make sense if the OS is built around local AI features. But the button is not what makes the AI. The real system is the models, software, chips, servers, data, and integrations behind it.

Same thing with the iPhone 16, its Camera Control button, and Visual Intelligence. Personally, I do not find that revolutionary, but I can admit it fits in the same category: if the interface helps capture context for an AI system faster, then the hardware is part of the experience.

But by that logic, any good interface could be called “AI ready.” If a phone app lets me talk to a model faster, does that suddenly make the app an AI hardware revolution? I think that is a bit much.

The marketing moved too fast

That is exactly where the discomfort starts. Big tech companies sometimes push AI hardware so aggressively that it feels like the machine itself has been transformed, when in many cases we are really talking about software, cloud processing, and a few locally accelerated features.

Apple ended up in a major controversy around this. In May 2026, the company agreed to pay $250 million to settle a U.S. class action lawsuit over how Siri and Apple Intelligence were marketed around the iPhone 16 and some compatible iPhones. Apple did not admit wrongdoing, but the message is still pretty clear: selling AI promises before the features are actually ready can get expensive.

That does not mean recent phones are identical to older ones. Chips change, memory changes, accelerators change. But there is a big difference between “this phone can run some AI features” and “this phone is a revolutionary AI machine.”

The nuance: local LLMs are coming

This is where I want to add some nuance, because things are moving fast.

Two years ago, saying that a normal consumer would run a local LLM sounded almost like science fiction. Today, that is no longer true. You can run smaller models like Gemma, Phi, Qwen, or Llama on powerful personal computers. Google, for example, documents Gemma models ranging from 270M to 27B parameters, with memory requirements that vary heavily depending on size and quantization.

Small asterisk: when I say “powerful personal computer,” I do not mean the old family laptop used mostly for email. For a decent experience, we are often talking about 16 to 32 GB of RAM, sometimes more, and ideally a GPU or a very efficient chip. For a developer, that is realistic. For a normal user, it is still a lot.

And even then, installing a local LLM is not like opening ChatGPT and selecting a model from a dropdown. You need to understand model size, quantization, available memory, inference speed, context length, benchmarks, and the actual use case. If the computer starts swapping or the model eats all the memory, the experience gets painful very quickly.
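To make the size and quantization point concrete, here is a rough sketch of how much memory the weights alone take. It assumes memory is simply parameters times bits per weight, with 1 GB counted as 10^9 bytes, and it ignores the context cache, activations, and runtime overhead, so real usage is higher. The function name and the choice of 16, 8, and 4 bits are mine, not taken from Google's documentation.

```python
# Rough estimate of memory needed for model weights only.
# Assumption: bytes = parameters * bits_per_weight / 8, with 1 GB = 1e9 bytes.
# Context cache, activations, and runtime overhead are NOT included.


def weights_gb(parameters: float, bits_per_weight: int) -> float:
    """Approximate size of the weights in gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


# Smallest and largest Gemma sizes mentioned above.
MODELS = {"270M": 270e6, "27B": 27e9}

for name, params in MODELS.items():
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weights_gb(params, bits):.2f} GB of weights")
```

Under these assumptions, the 27B model needs about 54 GB at 16-bit, which is well past a 32 GB machine, and about 13.5 GB at 4-bit, which starts to fit. The 270M model stays around half a gigabyte even at 16-bit. That spread is why model size and quantization are the first two things to check.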

So yes, AI hardware is becoming real. But it is not yet as simple, invisible, and consumer-friendly as the marketing often suggests.

I am optimistic, but careful

I am not against specialized AI hardware. Actually, I am very interested in it. NPUs, more efficient GPUs, unified memory, smaller models, better optimization: all of this will probably make local AI much more useful over the next few years.

I would not be surprised if, in five years or even sooner, we talk about AI hardware in a much more concrete way. Capable local models, real offline features, better privacy guarantees, assistants that do not always depend on the cloud: that is where things get interesting.

But right now, I still feel like the marketing is often selling the trophy before the race is finished. AI is pushing hardware forward, that part is true. But the hardware sold today as “AI hardware” is often mostly an interface to systems that are much bigger, much more expensive, and much more energy-hungry.

What about you? Have you already fallen for AI marketing that was a little too enthusiastic?