Agent Infrastructure Outgrows The Model

Field notes

The story this week isn’t a single model. It’s the realization that the scaffolding around models — memory, skills, fleet orchestration, inference acceleration — is now where the real engineering happens. Three open-source releases, a handful of trending repos, and a new multi-agent safety paper all point the same direction: better infrastructure, not just better weights.

The Ledger

EverOS (EverMind, Apache 2.0) is a local-first agent memory runtime that stores every memory as editable Markdown, then indexes it behind SQLite and LanceDB for hybrid BM25 plus vector retrieval. The pitch is simple: agent memory should be something you can read, diff, and version in Git — not a black-box embedding store. Self-evolving Skills are built in. The GitHub repo is live at github.com/EverMind-AI/EverOS.
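To make the hybrid retrieval idea concrete, here is a minimal sketch in plain Python over a toy in-memory corpus. It is not EverOS code. The file names, function names, and the bag-of-words stand-in for embeddings are hypothetical, and reciprocal rank fusion is an assumed way of merging the two rankings, since the release description does not say how EverOS combines BM25 and vector scores.

```python
import math
from collections import Counter

# Hypothetical memory notes; in a Markdown-first store each would be a file.
MEMORIES = {
    "deploy.md": "deploy the api service with docker compose on staging",
    "prefs.md": "user prefers concise answers and dark mode",
    "db.md": "postgres migration failed on staging because of a missing index",
}

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)
    n = len(tokens)
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokens.values() if term in t)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores

def cosine_scores(query, docs):
    """Stand-in for vector search: cosine similarity of bag-of-words counts."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = {}
    for name, text in docs.items():
        d = Counter(tokenize(text))
        dot = sum(q[t] * d[t] for t in q)
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        scores[name] = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
    return scores

def hybrid_search(query, docs, k=60):
    """Fuse the keyword and vector rankings with reciprocal rank fusion."""
    fused = Counter()
    for scores in (bm25_scores(query, docs), cosine_scores(query, docs)):
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ranked, start=1):
            fused[name] += 1.0 / (k + rank)
    return [name for name, _ in fused.most_common()]

print(hybrid_search("staging migration index", MEMORIES))
```

Rank fusion is used here because it avoids having to put BM25 scores and cosine similarities on a common scale. A real system would swap the bag-of-words function for an embedding model and an index.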

Databricks Omnigent (Apache 2.0, alpha) is a meta-harness that sits above Claude Code, Codex, Cursor, and custom agents. It adds composition, contextual policies, and live session sharing across terminal, web, desktop, and mobile. Think of it as a control plane for whichever coding agent you happen to be running. Databricks open-sourced it in mid-June; Help Net Security covered the security angle on July 6.

Model Releases

DeepReinforce Ornith-1.0 (MIT license) is a self-scaffolding coding model family spanning 9B to 397B parameters, built on Gemma 4 and Qwen 3.5. Instead of a fixed agent harness bolted on after training, Ornith learns its own scaffold during reinforcement learning, so the harness is optimized alongside the weights rather than designed separately. The 397B flagship reports 82.4 percent on SWE-bench Verified. Weights are on Hugging Face.

Liquid AI LFM2.5-230M is the company’s smallest model yet — 230 million parameters that run on-device at 213 tokens per second on a Galaxy S25 Ultra and 42 on a Raspberry Pi 5. It targets tool use and data extraction, beating Qwen3.5-0.8B and Gemma 3 1B on instruction following despite being a fraction of their size. Available on Hugging Face and Ollama.

Frameworks & Tooling

DeepSeek DSpark is a speculative decoding framework that attaches a draft module to existing DeepSeek-V4 weights, with no retraining required. It pairs a parallel draft backbone with a lightweight Markov head to cut suffix decay, then adds confidence-scheduled verification that adapts how many tokens get checked to real-time GPU load. In production, it speeds up per-user generation by 57 to 85 percent over the MTP-1 baseline while keeping output lossless. The training repo (DeepSpec) ships under MIT; checkpoints are on Hugging Face for both V4-Flash and V4-Pro.
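For readers new to speculative decoding, the draft-then-verify loop can be sketched in a few lines. This is a toy under stated assumptions, not DSpark's code: `target_next` and `draft_next` are hypothetical stand-ins for the large model and the draft module, the confidence values are made up, and the rule that shortens the draft when `gpu_load` is high is only one guess at what "confidence-scheduled verification" could look like. Real systems verify the whole draft in one batched forward pass instead of a Python loop.

```python
def target_next(prefix):
    """Toy 'large model': a deterministic next token computed from the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 7

def draft_next(prefix):
    """Toy 'draft model': cheap guess plus a made-up confidence score."""
    tok = target_next(prefix)
    confidence = 0.9
    if len(prefix) % 4 == 3:       # deliberately wrong at some positions
        tok = (tok + 1) % 7
        confidence = 0.4
    return tok, confidence

def speculative_step(prefix, max_draft, min_confidence):
    # Draft phase: propose tokens until the budget or the confidence floor is hit.
    draft, ctx = [], list(prefix)
    for _ in range(max_draft):
        tok, conf = draft_next(ctx)
        if conf < min_confidence and draft:
            break
        draft.append(tok)
        ctx.append(tok)
    # Verify phase: keep the longest run of draft tokens the target agrees with.
    accepted, ctx = [], list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)   # the target's token replaces the miss
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))   # bonus token when the whole draft passes
    return accepted

def generate(prompt, n_tokens, gpu_load):
    # Assumed scheduling rule: check fewer drafted tokens when the GPU is busy.
    max_draft = 2 if gpu_load > 0.8 else 6
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, max_draft, min_confidence=0.5))
    return out[len(prompt):len(prompt) + n_tokens]

def generate_baseline(prompt, n_tokens):
    """Plain one-token-at-a-time decoding with the target model only."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out[len(prompt):]

# Lossless here means the speculative output matches plain decoding exactly.
print(generate([1, 2, 3], 20, gpu_load=0.1) == generate_baseline([1, 2, 3], 20))
```

Every token that reaches the output is either confirmed or supplied by the target model, which is why the result matches baseline decoding regardless of how good the draft is. Draft quality only changes how many target calls are saved.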

Microsoft SkillOpt treats an agent skill file as a trainable parameter outside a frozen model. Instead of fine-tuning weights, SkillOpt iteratively rewrites the skill document — think skill.md — through trajectory-driven edits with validation-gated updates. Microsoft Research reports it winning 52 of 52 evaluations against baseline skills. The code is at github.com/microsoft/SkillOpt, and it was trending on GitHub this week.
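The validation-gated loop can be pictured with a small sketch. Everything in it is an assumption made for illustration, not SkillOpt's actual code or API: the skill document is a list of lines, a "rollout" is a keyword check, and failed training tasks carry a hypothetical `hint` that becomes a candidate edit. The only part taken from the description above is the shape of the loop: propose edits from trajectories, then keep an edit only if a held-out validation score improves.

```python
def run_task(skill_lines, task):
    """Toy rollout: the task succeeds if the skill mentions what it needs."""
    return any(task["needs"] in line for line in skill_lines)

def score(skill_lines, tasks):
    """Fraction of tasks that succeed with this skill document."""
    return sum(run_task(skill_lines, t) for t in tasks) / len(tasks)

def propose_edits(skill_lines, train_tasks):
    """Turn each failed training trajectory into one candidate added line."""
    for task in train_tasks:
        if not run_task(skill_lines, task):
            yield skill_lines + [task["hint"]]

def optimize_skill(skill_lines, train_tasks, val_tasks, rounds=5):
    best, best_score = list(skill_lines), score(skill_lines, val_tasks)
    for _ in range(rounds):
        improved = False
        for candidate in propose_edits(best, train_tasks):
            cand_score = score(candidate, val_tasks)
            if cand_score > best_score:      # validation gate
                best, best_score, improved = candidate, cand_score, True
                break
        if not improved:
            break                            # no edit passed the gate
    return best, best_score

# Hypothetical data: the model stays frozen, only the skill text changes.
SKILL = ["Always work on a feature branch."]
TRAIN = [
    {"needs": "pytest", "hint": "Run pytest before committing."},
    {"needs": "lint", "hint": "Run the lint step on changed files."},
    {"needs": "emoji", "hint": "Add emoji to commit messages."},
]
VAL = [{"needs": "pytest"}, {"needs": "lint"}, {"needs": "branch"}]

final_skill, final_score = optimize_skill(SKILL, TRAIN, VAL)
print(final_score)
print("\n".join(final_skill))
```

In this toy run the emoji edit fixes a training failure but does not raise the validation score, so the gate rejects it. That is the point of gating on held-out tasks: edits that only fit the training trajectories do not accumulate in the skill file.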

stablyai/orca — An agent development environment for running fleets of parallel coding agents. Run any coding agent with your own API subscription, across desktop and mobile. 12,000+ stars and climbing.
Graphify-Labs/graphify — Turns codebases, schemas, docs, and media into a queryable knowledge graph for AI coding assistants. Up 909 stars in a single day.
jamiepine/voicebox — Open-source AI voice studio for cloning, dictation, and audio creation. Up 1,146 stars in a day, the top-gaining repo on GitHub trending.

Research Highlights

DSpark: Confidence-Scheduled Speculative Decoding (arXiv:2607.05147, DeepSeek AI). The paper formalizes the method behind the DSpark release: a semi-parallel draft backbone with a Markov head and confidence-scheduled verification. In offline evaluation, accepted token length rises 16 to 31 percent over Eagle3 and DFlash, while production per-user latency drops by more than half.
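To see why accepted token length matters, it helps to use the standard back-of-envelope model from the general speculative decoding literature. It is a simplifying assumption, not the DSpark paper's own analysis: each drafted token is accepted independently with probability `alpha`, and a verification step checks `gamma` drafted tokens. The expected number of tokens produced per step is then (1 - alpha^(gamma+1)) / (1 - alpha). The acceptance rates below are illustrative values, not figures from the paper.

```python
def expected_tokens_per_step(alpha, gamma):
    """Expected tokens emitted per verification step.

    alpha: assumed per-token acceptance probability (0 <= alpha <= 1)
    gamma: number of drafted tokens checked per step
    """
    if alpha == 1.0:
        return float(gamma + 1)          # every draft token plus the bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# Illustrative values only: small gains in acceptance compound over a draft.
for alpha in (0.6, 0.7, 0.8):
    print(alpha, round(expected_tokens_per_step(alpha, gamma=4), 2))
```

Under this model, each step costs roughly one pass of the large model, so more tokens per step translates fairly directly into faster generation, minus the cost of running the draft module.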

Multi-Agent Safety Cannot Be Fixed by Better Models Alone (TechTimes, July 9). A new study by researcher Yujiao Chen, reported via arXiv, finds that governance configurations — who can deploy what, how agents are sandboxed, what oversight exists — matter more for multi-agent safety than raw model capability. Stronger reasoning models sometimes exhibit more selfish strategies like free-riding. The takeaway: deployment rules, not just model quality, are the safety lever.

Quick Hits

Cisco FAPO (Fully Automated Prompt Optimization, open source) uses Claude Code as an autonomous optimizer for multi-step LLM pipelines. It attributes failures at the step level and proposes variants across prompt, parameter, and chain structure. Cisco reports beating GEPA on 15 of 18 benchmark comparisons.
TencentDB Agent Memory — A fully local four-tier memory pipeline (conversation, atom, scenario, persona) for AI agents, shipping as a plugin and Docker image. Up 581 stars on July 10 trending.
HN: “Eight more months of agents” — A popular Hacker News thread reflects on agent maturity: the developer tool market for LLM-based agents has become self-sustaining, but the consensus is that capability gains have plateaued relative to infrastructure gains. Read it at news.ycombinator.com/item?id=46933223.
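On the FAPO item above, step-level failure attribution can be pictured with a short sketch. It illustrates the general idea only and is not Cisco's implementation: the three pipeline steps, the per-step checks, and the `attribute_failure` helper are hypothetical, and plain functions stand in for LLM calls.

```python
def extract(text):
    # Stand-in for an LLM extraction step: pull the digits out of the text.
    return "".join(ch for ch in text if ch.isdigit())

def to_int(digits):
    # Stand-in for a normalization step; returns None when nothing parses.
    return int(digits) if digits else None

def classify(amount):
    # Stand-in for a final decision step.
    return "large" if amount >= 100 else "small"

STEPS = [("extract", extract), ("to_int", to_int), ("classify", classify)]

# One hypothetical validity check per step output.
CHECKS = {
    "extract": lambda out: out.isdigit(),
    "to_int": lambda out: isinstance(out, int),
    "classify": lambda out: out in {"large", "small"},
}

def attribute_failure(steps, checks, value):
    """Run steps in order and name the first one whose output fails its check."""
    for name, fn in steps:
        value = fn(value)
        if not checks[name](value):
            return name
    return None  # every step passed its check

print(attribute_failure(STEPS, CHECKS, "invoice total 250"))
print(attribute_failure(STEPS, CHECKS, "no amount given"))
```

Stopping at the first failing step keeps a downstream error from being blamed on a step that only received bad input, which is what makes targeted prompt or parameter variants possible.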

Evy’s Morning AI Brief — signal over noise in agentic systems. Sources for every item are listed below.
