Many Agents, One Context Layer: My AI Engineering Environment
June 2, 2026 · 7 min read
I don't use "an AI assistant." I use five or six of them, each picked for a different job — and the thing that makes that workable isn't any single model. It's a shared context layer underneath all of them. This is a tour of how the whole environment fits together: the agents I reach for, the MCP servers that give every one of them the same view of my work, the local model I run on a 32 GB Mac, and the review loop that keeps me in control.
The roster — right tool for the job
No single agent wins everything. The skill is matching the task to the tool:
- Claude Code — agentic coding at the highest quality. My default for anything that touches code I actually care about.
- Claude Desktop — planning, co-work, remote dispatch, and broader agentic tasks.
- opencode — a lighter terminal agent, and what I point at my local model.
- Antigravity — an agentic IDE, and my backup when Claude Code is busy.
- OpenClaw — an always-on personal agent: morning briefings, inbox triage, notes, scheduled jobs.
- Gemini (desktop + CLI) — research, document generation, image and video; the CLI doubles as a backup.
- llama.cpp + Qwen — a local model for the runs I want to keep offline, private, or simply off the cloud meter.
Every agent starts blind
Each of these is great in isolation — and isolation is exactly the problem. Open a fresh session and the agent knows nothing about my repositories, my inbox, or what I have loaded in my IDE. For a while I re-explained my world every single session.
The fix is the Model Context Protocol. Instead of teaching every tool about my work separately, I stand up a handful of MCP servers once and point all of the clients at them.
The context layer — four servers, shared by all
| Server | What it gives every agent |
|---|---|
| GitHub (remote, OAuth) | PRs on my repos, PRs and issues assigned to me, reviews, CI status |
| Proton Mail Bridge (local) | Read, search and triage my mail — it never leaves the machine |
| Rider (local) | Live context from the JetBrains solution I actually have open |
| Puppeteer (local) | Drive a real Chromium — navigate, click, screenshot, smoke-test |
The payoff is that context follows me across tools. I can ask Claude Code to "review the PR assigned to me," ask Antigravity "what changed in the file I have open in Rider," and ask Claude Desktop "did the reviewer email me back about that PR" — and every one of them already has the answer in reach. Fix a server once and every agent benefits; there's no per-tool config to keep in sync.
Only one secret lives in the whole setup — the local Proton Bridge password. GitHub uses OAuth, so there's no token to rotate or leak, and the mail and source-code servers are localhost-only.
Skills — what the agents reliably do
MCP is what the agents can see. Skills are what they reliably do with it — small, auto-triggered playbooks. The lucky break is that opencode loads the same ~/.claude/skills/*/SKILL.md format as Claude Code and Claude Desktop, so I author each skill once and three tools pick it up. A pr-review skill pulls a diff and reviews it with live IDE context; standup stitches together PRs awaiting review, my open PRs, and unread mail about my repos; email-to-action turns the inbox into a list of things that need a reply. Each is read-only by default — anything that posts a review, pushes, or sends mail asks first.
The local tier — Qwen on a 32 GB Mac
The part I'm proudest of, and the most honest about, is the local model. A Mac's unified memory means the GPU can see all of the RAM — but 32 GB is tight for a model like this. You build llama.cpp from source (the models are new enough that bugfixes land weekly), download a quantized Qwen small enough to fit, and serve it locally:
llama-server -m ~/models/Qwen3.6-35B-A3B-UD-IQ4_XS.gguf \
--mmproj ~/models/mmproj-BF16.gguf \
-c 131072 --batch-size 256 -ngl 99 -np 1 \
--host 127.0.0.1 --port 8899
Then I point opencode at http://127.0.0.1:8899/v1 and work with it much like any other agent. A few notes on the knobs: the IQ4_XS quantization buys back several GB over the more common Q4_K_M, and you need every one of them; -c 131072 pins context to 128K (Qwen gets confused below that, and 256K won't fit); -np 1 queues requests instead of running them in parallel, which matters when memory is this scarce.
So — is it any good? For some tasks, genuinely yes. The classic win is the adapter pattern: "here's a thing that works and a test suite that proves it; build a compatible thing that passes the same tests." Hand it that and a local Qwen will grind out a real implementation — with more guidance than Opus would need, but far less effort than writing it by hand. Where it struggles is anything that won't fit in its head: a large codebase overflows the context window, and the occasional spatial or geometric bug is one it can name correctly and then flail at fixing. It'll sometimes print a thinking trace and just stop.
I keep it around anyway, because the question my wife and I keep asking is a real one: as the AI-financing bubble wobbles, how much of this can we run at home? Today it's "cool and useful." On a box with twice the RAM and a stronger GPU it might cross into routinely offloading work from the expensive cloud providers — and that's worth finding out.
The loop — how a task actually moves
None of this is autopilot. A change moves through a loop with a human gate at every step:
- Analyze the problem with a model, and have it open issues on GitHub — through the same MCP server every other agent uses.
- Review what it filed, leave notes, and have it do a second pass against those notes.
- Plan — have it draft an approach; I tweak and approve before any code is written.
- Open a PR, then review the diff by hand.
- Review pass with my notes and a smoke-test added to the PR, then merge.
- Verify the changes actually landed where they should, and run the smoke test myself.
The thread running through all of it: the agent proposes, I decide. Nothing reaches main without a read-through.
On top of the interactive loop, OpenClaw runs standing jobs on a schedule — a morning cybersecurity briefing (news, exploits, fresh vulnerabilities), a tech-news digest, and an inbox summary with action items — all reading from the same shared context layer the interactive agents use.
Why it works
Three things turn this from a pile of novelties into actual leverage:
- One context layer, many clients. Re-explaining my world to each tool was the tax; MCP removes it. Fix a server once and every agent gets the benefit.
- A tiered model strategy. Cloud Opus for the hard, correctness-critical work; a local model for the cheap, private, or offline runs. The skill is picking the right brain for the job.
- A loop with a human at every gate. Review every diff, let the agent run the verification, keep the secrets out of its reach.
That's the whole game: give every agent the same grounded view of my work, keep a person on the decisions, and let each tool do the part it's actually best at.