Each card shows agent capability, Model Context Protocol (MCP) support, minimum RAM, and supported platforms.
S
LLM Runner
Local LLM runner (CLI + API)
The standard for running open models on your own machine.
AgenticLow
MCPPlugin
RAM8GB (small) · 32GB+ for 70B class
PlatformMac · Linux · Windows
Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.
Free (open source)
Redundancy check — Overlaps LM Studio (CLI vs GUI). Most watchmen pick one.
Best for: technical watchmen comfortable with the terminal; the foundation other local tools build on.
Start here. Most-used local runner for a reason — simple, stable, fast.
S
LLM Runner
Local LLM runner (GUI app)
The watchman-friendly way to run local models.
AgenticLow
MCPYes
RAM8GB · 32GB+ for 70B class
PlatformMac · Linux · Windows
Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.
Free
Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.
Best for: watchmen who want point-and-click local AI without learning a terminal.
Best entry point if you don't already love the terminal. Free, fast, works.
A
LLM Runner
Apple Silicon ML framework
The fastest way to run local models on Mac.
AgenticN/A
MCPN/A
RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4
PlatformApple Silicon only (M1/M2/M3/M4)
Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.
Free (open source)
Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.
Best for: Mac watchmen who want maximum performance from their unified memory.
If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.
A
LLM Runner
Local LLM inference engine
The C++ engine under most local LLM tools.
AgenticN/A
MCPN/A
RAMPer model
PlatformMac · Linux · Windows
Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most watchmen use it via Ollama or LM Studio. The GGUF model format originated here.
Free (open source)
Redundancy check — Indirectly used by Ollama / LM Studio.
Best for: watchmen who want fine-grained control over inference parameters and model formats.
You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.
A
LLM Runner
Local web UI for any LLM backend
ChatGPT-style UI for your local models.
AgenticMedium
MCPYes
RAMServer: 4GB · plus per-model RAM
PlatformDocker (Mac · Linux · Windows)
Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.
Free (open source)
Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.
Best for: watchmen who want a polished web interface for their local models.
Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.
S
Frontier Model
Open-weight frontier model family
Meta's open frontier — the new flagship.
AgenticHigh
MCPN/A
RAM8B: 6GB · 70B Q4: 48GB · 405B: 256GB+ (cluster)
PlatformAny platform via Ollama / MLX
Released early-mid 2025. Sizes 8B / 70B / 405B (with multimodal variants). 70B is the sweet spot for 128GB+ Macs. Quality competitive with GPT-4 / Claude 3.5 Sonnet on many benchmarks.
Free (Meta license; not pure OSS but generous)
Redundancy check — Overlaps Qwen 3 / DeepSeek for general reasoning.
Best for: watchmen who want the strongest open-weight English-language model.
Default model for the local stack. 70B Q4 on a 128GB Mac and you're set.
A
Frontier Model
Open-weight Chinese frontier model (multilingual, MoE)
Alibaba's open frontier. Strong reasoning. Strong multilingual.
AgenticVery High
MCPYes
RAM0.6B: 2GB · 32B: 24GB · 235B-A22B Q4: 96GB+
PlatformAny via Ollama / MLX / vLLM
Family includes 0.6B → 235B-A22B (Mixture-of-Experts). The 235B-A22B MoE activates ~22B parameters at inference, so it runs at 22B speed on 128GB+ unified memory. State-of-the-art reasoning in mid-2025.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / DeepSeek for English; better for Chinese / multilingual.
Best for: watchmen doing multilingual ministry, technical reasoning, or who want the most capable open MoE.
Worth pulling alongside Llama. Try both on a real task; one will click for your style.
A
Frontier Model
Open-weight reasoning model (671B MoE)
The free DeepSeek you saw on the cloud — running on your machine.
AgenticHigh
MCPN/A
RAMQ4: ~96GB · Q2: ~64GB
PlatformMac (slow) · Linux + GPU (fast)
DeepSeek released V3 (general) and R1 (reasoning) as open weights in late 2024 / early 2025. 671B MoE activates ~37B at inference. Quantized to int4, fits on a 128GB+ Mac with patience.
Free (MIT-style license)
Redundancy check — Overlaps Qwen 3 / Llama 4 for top-tier reasoning.
Best for: watchmen who want frontier reasoning on their own hardware, no API key required.
Patience required (slow on Mac without dedicated inference). But it's frontier-class for free.
B
Frontier Model
European open-weight frontier model
France's open-weight contender.
AgenticHigh
MCPN/A
RAMQ4: 64GB+
PlatformAny via Ollama
123B dense model. Strong code + multilingual. Mistral's flagship open release. Runs comfortably on 96GB+ Macs at int4.
Free for research; commercial license required for business use
Redundancy check — Overlaps Llama 4 70B. Mistral often wins on European languages.
Best for: watchmen in European-language ministries or doing technical work where Mistral's tuning shines.
Worth a side-by-side test against Llama if your work touches French / Spanish / German.
B
Frontier Model
Mixture-of-Experts open-weight model
176B parameters, 39B active. Speed of small, quality of large.
AgenticHigh
MCPN/A
RAMQ4: 80GB · Q5: 96GB
PlatformAny via Ollama
Sparse MoE — 8 experts of 22B each, 2 activate per token. Effective speed of ~39B parameters but quality closer to dense 100B+ models.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / Qwen 3 MoE for general-purpose reasoning.
Best for: watchmen who want quality output at faster speeds than dense models.
Solid second model behind Llama / Qwen. Pull it if you want to A/B test.
B
Frontier Model
Open-weight Google model family
Google's open-weight family — small, fast, capable.
AgenticMedium
MCPN/A
RAM1B: 4GB · 27B Q4: 18GB
PlatformAny via Ollama / MLX
Sizes 1B / 4B / 9B / 27B. Trained on Gemini-2-class data. The 27B variant runs comfortably on 32GB Macs and rivals much larger models for most tasks.
Free (Gemma license)
Redundancy check — Overlaps Phi-4 for the 'small but mighty' slot.
Best for: watchmen with mid-tier hardware (32-64GB) who want capable local AI.
Best balance of capability vs. RAM footprint for the 32-64GB Mac.
B
Frontier Model
Open-weight enterprise / RAG-tuned model
Cohere's RAG-tuned open release.
AgenticVery High
MCPN/A
RAMQ4: 56GB
PlatformAny via Ollama
104B dense model from Cohere. Particularly strong at retrieval-augmented generation and tool use. Less popular than Llama but quietly excellent for business workflows.
Free for non-commercial; commercial license required
Redundancy check — Overlaps Llama 4 / Qwen for general use; differentiates on RAG.
Best for: watchmen running local RAG over their own documents (sermons, books, business records).
Underrated for the watchman who wants strong RAG without sending docs to the cloud.
B
Frontier Model
Small but capable Microsoft model
14B that punches above its weight.
AgenticMedium
MCPN/A
RAMQ4: 9GB
PlatformAny via Ollama / MLX
Microsoft's Phi-4 is dense at 14B. Trained on synthetic + curated data, it scores higher than expected on reasoning benchmarks. Fast on any modern Mac.
Free (MIT)
Redundancy check — Overlaps Gemma 3 27B for the 'capable small model' slot.
Best for: watchmen running on lighter hardware (16-32GB) who still want frontier-tier reasoning.
If your Mac is 16GB or 32GB, this is your workhorse. Tiny, fast, smart.
B
Image
Open-weight image generation
The current open-weight image champion.
AgenticLow
MCPN/A
RAM24GB+ unified or VRAM (full) · 12GB+ quantized
PlatformMac · Linux + GPU · Windows + GPU
Released by Black Forest Labs (the team behind Stable Diffusion). FLUX.1 [dev] is the open-weight version that runs locally with 24GB+ VRAM (or quantized on Apple Silicon). Quality rivals Midjourney for many use cases.
Free (non-commercial license; FLUX [pro] is commercial via API)
Redundancy check — Overlaps Stable Diffusion 3.5 (FLUX has better quality in 2026).
Best for: watchmen who want Midjourney-quality output running on their own hardware.
Run via ComfyUI for full control. Quantized FLUX runs on 24GB+ Mac unified memory.
B
Image
Open-weight image generation
Stability AI's flagship open model.
AgenticLow
MCPN/A
RAM16GB+ unified or VRAM
PlatformMac · Linux + GPU · Windows + GPU
Stable Diffusion 3.5 Large (8B params) is the latest. Free for non-commercial, commercial license available. Less SOTA than FLUX in 2026 but huge ecosystem of fine-tunes and LoRAs.
Free (Stability community license)
Redundancy check — Overlaps FLUX.1 — SD has more community fine-tunes; FLUX has better default quality.
Best for: watchmen who want a vast library of fine-tuned styles and LoRAs.
Use FLUX as your default; pull SD 3.5 for community fine-tunes and specific styles.
B
Image
Visual workflow editor for image / video models
Node-based UI for FLUX / SD / video models.
AgenticLow
MCPPlugin
RAMPer loaded model
PlatformMac · Linux · Windows
Web app that runs locally. Build node graphs to chain image gen, upscale, ControlNet, etc. The standard for serious local image work. Steeper learning curve than DrawThings or Fooocus.
Free (open source)
Redundancy check — Different layer — ComfyUI runs models like FLUX / SD via workflows.
Best for: watchmen making serious volume of visual content who want a reproducible pipeline.
Pair with FLUX. Steeper curve, but once you have a workflow saved, it's repeatable.
A
Voice
Local speech-to-text (Whisper port)
Whisper running locally — no API calls.
AgenticN/A
MCPN/A
RAM2-8GB depending on model size
PlatformMac · Linux · Windows
C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.
Free (MIT)
Redundancy check — Different from cloud Whisper API (this runs locally, no upload).
Best for: watchmen transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.
Use this over the API when content is sensitive. Free and fast.
B
Voice
Whisper + speaker diarization
Whisper with speaker labels and word-level timestamps.
AgenticN/A
MCPN/A
RAM8-16GB
PlatformMac · Linux · Windows
Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.
Free (BSD)
Redundancy check — Adds to Whisper.cpp; not a replacement.
Best for: watchmen transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).
Use when you need to know who said what. Free.
C
Voice
Local text-to-speech / voice cloning
Local TTS — your voice, on your machine.
AgenticN/A
MCPN/A
RAM8-16GB
PlatformMac · Linux + GPU · Windows + GPU
OpenVoice and Bark are leading open-source TTS engines. Bark generates expressive natural speech with non-verbal vocals (laughs, sighs). OpenVoice clones a voice from short samples.
Free (open source)
Redundancy check — Different from ElevenLabs (cloud, polished). Local TTS is rougher but free + private.
Best for: watchmen experimenting with voice cloning on private data without uploading samples to a service.
Local TTS quality lags ElevenLabs. Use these for privacy-sensitive experiments only.
A
Coding
VS Code extension w/ local model support
Cursor-like AI coding with your local LLM.
AgenticHigh
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.
Free (open source)
Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.
Best for: watchmen who want Cursor-style coding without the subscription, using their local models.
Best free Cursor alternative for the local-first watchman.
A
Coding
Open-source agentic VS Code extension
Open-source autonomous coding agent.
AgenticVery High
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.
Free (open source); pay for whichever model API you use
Redundancy check — Closest local-friendly alternative to Claude Code.
Best for: watchmen who want an autonomous coding agent that can run on local models.
Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.
C
Coding
Terminal AI pair programmer (local-friendly)
Same Aider from the Compass — running on your local LLM.
AgenticHigh
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
Aider supports any OpenAI-API-compatible backend. Point it at Ollama or LM Studio and you have a fully local Aider session. No API charges.
Free
Redundancy check — Same Aider as in the cloud; this is the same tool with a different model.
Best for: terminal-loving watchmen who want Claude Code's shape with their own local model.
If you already use Aider with Claude API, switching it to a local backend takes 30 seconds.
C
Coding
Self-hosted code autocomplete
GitHub Copilot-style autocomplete, on your own server.
AgenticMedium
MCPNo
RAM8-24GB depending on model
PlatformDocker (Mac · Linux · Windows)
Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.
Free (open source); paid Tabby Pro tier exists
Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.
Best for: watchmen in regulated industries who can't send code to a cloud autocomplete service.
Niche. Use only if compliance forbids cloud Copilot.
A
Knowledge / RAG
Local RAG over your documents
ChatGPT-style chat over your own files. 100% local.
AgenticMedium
MCPPlugin
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.
Free (open source); paid cloud tier exists
Redundancy check — Overlaps Khoj / Open WebUI for RAG.
Best for: watchmen who want NotebookLM but local — chat over your own books, sermons, family records.
Best NotebookLM alternative that runs entirely on your machine.
B
Knowledge / RAG
Personal AI search engine
AI search across your notes, emails, files.
AgenticMedium
MCPPlugin
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.
Free (self-hosted)
Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.
Best for: watchmen with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.
Killer pairing with Obsidian. Free, fast, private.
C
Knowledge / RAG
Open-source Perplexity clone
Perplexity, but local + uses your own models.
AgenticMedium
MCPNo
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Open-source Perplexity-like interface for AI search. Web search + LLM synthesis with citations. Runs locally; uses any LLM (local or API).
Free (open source)
Redundancy check — Overlaps Perplexity Pro for research.
Best for: watchmen who do research-heavy work and want Perplexity's UX without a subscription.
Solid Perplexity alternative for the local-first watchman.
C
Knowledge / RAG
Local-first knowledge graph + AI search plugin
Obsidian + AI plugins, fully local.
AgenticMedium
MCPPlugin
RAMPer model used
PlatformMac · Linux · Windows
Same Obsidian as the cloud Compass entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.
Obsidian free; Smart Connections plugin free; some plugins paid
Redundancy check — Same Obsidian as the cloud entry, just configured locally.
Best for: watchmen already on Obsidian who want a privacy-first AI layer over their notes.
If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.
C
Automation
Self-hosted automation platform
Same n8n from the Compass — running on your machine.
AgenticHigh
MCPNative
RAM4-8GB for n8n + per model
PlatformDocker (Mac · Linux · Windows)
Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The watchman's choice for full data sovereignty + automation. MCP-native.
Free (self-hosted; Docker)
Redundancy check — Same n8n as the cloud entry.
Best for: watchmen who want enterprise-grade automation with zero data leaving their network.
If you have a homelab, n8n + Ollama is the local automation stack.
B
Automation
Local Model Context Protocol servers
Run MCP servers on your machine to wire local tools into Cowork.
AgenticHigh
MCPNative
RAMMinimal (per server)
PlatformMac · Linux · Windows
Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.
Free (open source)
Redundancy check — Different layer — these are the building blocks for agentic systems.
Best for: watchmen extending Cowork with local capabilities (custom databases, internal APIs, file systems).
Future-proof. As MCP grows, more watchman-relevant servers will exist.
B
Automation
Open-source agentic desktop tool
Same Goose from the Frontier — running purely local.
AgenticVery High
MCPNative
RAMPer model + 1-2GB for Goose
PlatformMac · Linux · Windows
Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.
Free, open source
Redundancy check — Same Goose as the Frontier listing; this is the local-first config.
Best for: watchmen who want Cowork-class agentic capability without the Anthropic subscription.
Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.
No tools in this filter.