OfflineLLM 5: Local AI Gets a Lot More Useful With MCP and Agents

Bilaal Rashid

07 June 2026

When people talk about local AI, the conversation usually revolves around privacy.

That’s understandable. Running an AI model on your own device means your chats, files, and prompts stay under your control. Nothing needs to be sent to a cloud server.

But privacy alone isn’t enough.

The biggest advantage of cloud AI services has never been the models themselves. It’s everything around them. They can search the web, access your calendar, interact with your tools, and automate tasks.

Local AI has traditionally been isolated.

With OfflineLLM 5, we’re closing that gap.

Today we’re releasing the biggest update in OfflineLLM’s history, introducing Model Context Protocol (MCP) support, new chat personalities, and significant performance improvements across our custom Apple Silicon execution engine.

MCP Support

The headline feature in OfflineLLM 5 is support for Model Context Protocol, better known as MCP.

If you’ve been following the AI space recently, you’ve probably seen MCP becoming the standard way for AI models to connect to external tools and services.

In practical terms, MCP allows your local AI assistant to do much more than answer questions.

You can connect OfflineLLM to services such as:

GitHub
Zapier
Atlassian
Calendar
Reminders
Apple Music
Web Search
And any others that you choose to bring along

Instead of being limited to the information already inside a conversation, your AI assistant can interact with the tools you use every day.

The result is something that feels less like a chatbot and more like an assistant.

From Chatbot to AI Agent

One of the most interesting things about MCP is that it changes what local AI can actually do.

You can ask a model to search for information, access project documentation, manage tasks, trigger automations, or interact with external services through connected tools.

For a long time, these kinds of workflows were mostly associated with cloud-based AI platforms.

Now they’re available in OfflineLLM while still giving users control over their models, data, and integrations.

As AI agents become more capable, we believe privacy will become even more important. The more responsibility you give an AI system, the more valuable it becomes to know where your data is going.

Faster Across the Board

Performance has always been a major focus for OfflineLLM.

Version 5 includes a number of improvements to our custom execution engine, resulting in faster generation speeds, lower latency, and a more responsive experience throughout the app.

OfflineLLM is built specifically for Apple Silicon and is designed to take full advantage of Apple’s GPU architecture. Every release includes ongoing optimisation work, and Version 5 continues that trend.

Whether you’re chatting with a model, analysing documents through RAG, using voice chat, or working with vision models, interactions should feel noticeably faster.

New Chat Personalities

Not every conversation needs the same kind of assistant.

OfflineLLM 5 introduces built-in chat personalities that make it easy to tailor your AI assistant to different tasks without manually editing system prompts. The initial options include Assistant, a balanced general-purpose helper for productivity, research, and everyday questions; Coder, which is tuned for programming, debugging, and technical problem-solving; and Storyteller, designed for creative writing, brainstorming, roleplay, and long-form content generation. We’ll continue expanding these personalities and capabilities in future updates.

Everything Else

While MCP is the star of this release, OfflineLLM continues to support everything users already rely on:

Run AI models entirely on-device
Install your own models, including DeepSeek, Llama, Qwen, Gemma, Phi, Mistral, and more
Live Voice Chat with two-way conversations
Vision models for image understanding
Retrieval Augmented Generation (RAG)
Apple Intelligence integration
OpenAI-compatible API support
Siri Shortcuts
Widgets
Advanced model configuration
Custom system prompts
Beginner and Advanced modes
No ads, tracking, or subscriptions

And, of course, everything continues to work with privacy as the default.

Why We Built OfflineLLM

We started OfflineLLM because we believed AI shouldn’t require handing over your data to a remote server.

That belief hasn’t changed.

What has changed is what’s possible on modern Apple hardware.

Every year, iPhones, iPads, Macs, and Vision Pro devices become more capable of running increasingly powerful AI models locally. Features that once required expensive cloud infrastructure can now run directly on consumer devices.

OfflineLLM exists to take advantage of that shift.

Our goal is simple: give users access to powerful AI without requiring them to trade away privacy, ownership, or control.

Available Now

OfflineLLM 5 is available now from the App Store for iPhone, iPad, Mac, and Apple Vision Pro.

If you’ve ever wanted the flexibility of modern AI tools without relying on the cloud, this is the biggest step forward we’ve made so far.

We’re excited to see what you build with it.