What's Trending on Ollama Right Now (February 2026)
The local AI scene moves fast. Here's what's topping the charts on Ollama this month — from massive reasoning models to lightweight on-device runners.
The State of Ollama in February 2026
Ollama has become the default way to run AI models locally. Every month, new models land on the platform — and this February has been especially busy. The standout theme? Agentic AI — models designed not just to chat, but to reason through complex problems and take actions.
Here's what's trending right now, sorted by popularity.
🥇 GLM-5 — The 744B Reasoning Giant
19.6K pulls | Cloud-only | From Z.ai
GLM-5 is a massive 744-billion-parameter Mixture-of-Experts model that activates only 40 billion parameters at a time. It's built for complex systems engineering and long-horizon reasoning tasks: think multi-step problem solving that requires planning across dozens of actions.
Who it's for: Researchers and advanced users running cloud-connected Ollama instances. This isn't a model you'll run on a consumer GPU — it's a preview of where frontier-class open models are heading.
🥈 MiniMax-M2.5 — The Productivity Powerhouse
17.6K pulls | Cloud-only | From MiniMax
MiniMax-M2.5 is designed for real-world productivity: writing, analysis, summarisation, and coding. It's gaining traction as a practical workhorse for professionals who need reliable, fast responses for everyday tasks.
Who it's for: Professionals who want a capable general-purpose model for writing and coding tasks. Currently cloud-only on Ollama, but worth watching for local versions.
🥉 GLM-4.7-Flash — Lightweight Champion
227.9K pulls | Runs locally | From Z.ai
This is the one that matters most for local users. GLM-4.7-Flash is being called the strongest model in the 30B class — and at 227K+ pulls, it's clearly resonating. It supports tool use and thinking modes, meaning it can reason step-by-step through complex problems while still running on consumer hardware.
Who it's for: Anyone with an 8-12 GB GPU who wants a local model that punches well above its weight. This is one of the best local models available right now.
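Tool use with a local model like this runs over Ollama's local REST API: you declare callable functions in the request and the model decides when to invoke them. Here's a minimal sketch of a `/api/chat` request body — the model tag and the `get_weather` tool are illustrative assumptions, not real entries; check the Ollama library for current names:

```python
import json

# Model tag is hypothetical -- check the Ollama library for the real one.
MODEL = "glm-4.7-flash"

def build_tool_chat_request(prompt: str) -> dict:
    """Build a /api/chat request body that offers the model one callable tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Tools use the OpenAI-style function schema that Ollama accepts.
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a real API
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "stream": False,
    }

payload = build_tool_chat_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

POST that payload to `http://localhost:11434/api/chat` and the reply will either answer directly or ask you to run the tool and send back the result.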
💻 Qwen3-Coder-Next — Agentic Coding Model
80K pulls | Cloud | From Alibaba Qwen
Alibaba's Qwen team has released a coding-specific model optimised for agentic coding workflows — meaning it doesn't just generate code, it can plan, reason about code structure, and iterate on solutions. It's designed for local development workflows where you need an AI that understands your codebase context.
Who it's for: Developers who want a dedicated coding assistant. Pair it with Overlay for voice-driven coding prompts.
👁️ Kimi K2.5 — Multimodal + Agentic
66.7K pulls | Cloud | From Moonshot AI
Kimi K2.5 is notable because it combines vision and language understanding with agentic capabilities. It can look at images, understand them, and take actions based on what it sees — all while supporting both instant and step-by-step thinking modes.
Who it's for: Users who need multimodal AI — processing screenshots, documents, diagrams, and photos alongside text prompts.
📱 LFM2.5-Thinking — Built for Your Device
65.6K pulls | Runs locally (1.2B+) | From Liquid AI
This is the most exciting trend on the list. LFM2.5 is a hybrid model family designed specifically for on-device deployment. The smallest version is just 1.2 billion parameters, small enough to run on almost any modern computer, even without a dedicated GPU, yet it still supports tool use and step-by-step thinking.
Who it's for: Anyone interested in running AI locally on modest hardware. This represents the future of edge AI — capable models that fit in your pocket.
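Why can a 1.2B model run without a dedicated GPU? A back-of-envelope estimate of the weight memory makes it concrete. This ignores activations and KV cache, which add real overhead, so treat the numbers as a floor:

```python
def approx_model_ram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-memory footprint: parameter count times bytes per weight."""
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total / 1e9

print(approx_model_ram_gb(1.2, 4))   # 0.6 GB at 4-bit quantisation
print(approx_model_ram_gb(1.2, 16))  # 2.4 GB at fp16
```

At a typical 4-bit quantisation, the weights fit in well under a gigabyte, which is why ordinary laptop RAM is enough.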
📄 GLM-OCR — AI That Reads Documents
Trending | Vision + Tools | From Z.ai
GLM-OCR is a multimodal model built specifically for complex document understanding. It can read scanned documents, handwritten notes, tables, receipts, and forms — extracting structured information from messy real-world documents. It supports vision input and tool use for processing pipelines.
Who it's for: Anyone who works with documents — accountants, researchers, administrators, legal professionals. Paired with local dictation, you could process documents entirely offline.
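A document pipeline like that boils down to one multimodal chat call: attach the page image as base64 and ask for structured output. Here's a minimal sketch of the request body against Ollama's local API — the model tag and the fields in the extraction prompt are assumptions for illustration:

```python
import base64

MODEL = "glm-ocr"  # hypothetical tag -- check the Ollama library

def build_ocr_request(image_bytes: bytes) -> dict:
    """Build a /api/chat body asking the model to extract receipt fields as JSON."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": "Extract vendor, date, and total from this receipt as JSON.",
            # Ollama's chat API takes images as base64 strings alongside the text.
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "stream": False,
    }

req = build_ocr_request(b"\x89PNG...fake bytes for illustration")
print(list(req.keys()))
```

Because everything goes to `localhost:11434`, the receipt image never leaves your machine.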
What This Tells Us
Three trends stand out from this month's Ollama charts:
- Agentic models are the new default — nearly every top model supports tool use and multi-step reasoning, not just chat
- The cloud-to-local pipeline is accelerating — models launch as cloud-only but quickly get quantised and optimised for consumer GPUs
- On-device AI is getting real — models like LFM2.5 prove that useful AI can run on minimal hardware
For privacy-conscious users, the message is clear: the models are getting smaller, smarter, and more accessible. What required a data centre two years ago is heading for your laptop. And tools like Vox Bar are already there — running frontier-quality transcription entirely on your GPU, no cloud required.
Join the local AI movement
Vox Bar: private transcription that runs on your hardware. No cloud. No subscription.