
What is Edge AI?
Edge AI (also called on-device AI) refers to running AI models directly on local devices (smartphones, laptops, IoT devices, embedded systems) rather than sending data to cloud servers for processing. The model inference happens at the "edge" of the network, close to where the data is generated.
Why It Matters
Edge AI enables AI functionality without internet connectivity, with lower latency, better privacy, and reduced cloud costs. Apple Intelligence runs on iPhone, Google's Gemini Nano runs on Pixel, and Microsoft's Copilot+ PCs process AI locally. As models get smaller and more efficient through techniques like quantization and distillation, edge AI is becoming the default for many consumer applications.
How It Works
Why run AI on-device?
- Privacy: data never leaves the device (voice commands, health data, photos)
- Latency: instant responses without network round-trips (real-time object detection, AR)
- Reliability: works offline (aircraft systems, remote sensors)
- Cost: no cloud API fees for inference
- Bandwidth: process data locally instead of streaming to the cloud
Making models fit on-device:
- Quantization: reduce weight precision from 32-bit floats to 8-bit or 4-bit integers (75-87% size reduction)
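The size reduction from quantization can be seen with a minimal sketch. This is a toy illustration of symmetric int8 quantization using NumPy, not a production scheme: a randomly generated weight matrix stands in for real model weights, and the scale is derived from the maximum absolute value.

```python
import numpy as np

# Toy float32 weight matrix standing in for real model weights.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

# Symmetric int8 quantization: map the observed weight range onto [-127, 127].
scale = np.max(np.abs(weights)) / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

# At inference time, dequantize to approximate the original weights.
deq = q_weights.astype(np.float32) * scale

print(f"original:  {weights.nbytes} bytes")    # float32 storage
print(f"quantized: {q_weights.nbytes} bytes")  # int8 storage, 75% smaller
print(f"max abs error: {np.max(np.abs(weights - deq)):.4f}")
```

Storing int8 instead of float32 cuts the weight memory by 4x (the 75% figure above); 4-bit schemes halve it again, at the cost of larger rounding error per weight.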