Quantized AI 26/02: Glimpse of Fable, Loop Engineering, Apple Intelligence 3rd Gen

Fable was Anthropic's public Mythos-class model for a few days - then a US export-control order took it offline for everyone. We also look at Loop Engineering (an emerging term) and at Apple Intelligence's third-generation on-device models, released at WWDC.

Fable

Anthropic's Mythos-class models arrived in early June 2026 - and were gone again within days. The full Mythos variant stayed with trusted partners only; the public release was Fable, a constrained sibling with strict guardrails built in. That still was not enough for the US government. On Friday evening, June 12, an export-control order required Anthropic to block access for any foreign national. Because Anthropic could not enforce that in real time, it disabled Fable and Mythos for all customers. As we write this, neither model is available - and Anthropic has not said if, when, or how access might return.

For the short window before the shutdown, early benchmarks ranked Fable among Anthropic's strongest models, though reviewers disagreed on how large the step was. In hype-driven fields you get both extremes. What we will not get now is the usual next step: weeks of real-world use. Fable barely had a weekend on the market.

While it was live, Fable showed how far Anthropic was willing to go on safety. Ask about biology, cybersecurity, or chemistry and it would very likely - almost deterministically - fall back to Opus 4.8. Why those domains? Because in those fields knowledge alone, without expensive equipment, can be enough to cause serious harm (for example, by building weapons). LLM-research questions did not trigger that fallback; instead the answer quality dropped - a sign that Anthropic also wanted to limit distillation and copycats.

Enterprise customers who rely on Zero Data Retention were out of luck while Fable existed: it required 30-day retention and overrode existing ZDR agreements. That matched the security-first posture around Mythos-class models.

Anthropic had priced Fable at double Opus 4.8 - USD 10 per million input tokens and USD 50 per million output tokens (a million tokens is roughly the size of the Holy Bible). It was briefly included in subscription plans, with metered usage planned from late June onward - a first for a flagship model without a lasting flat rate. Those plans are on hold now that the model is down.
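To make those prices tangible, here is a small sketch of the arithmetic. The two Fable prices come from the text; the Opus 4.8 prices are simply derived from "double", and the session size is a made-up example, not a measured workload.

```python
# Prices from the text, in USD per million tokens.
FABLE_INPUT_USD_PER_M = 10.0
FABLE_OUTPUT_USD_PER_M = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price: float = FABLE_INPUT_USD_PER_M,
                     output_price: float = FABLE_OUTPUT_USD_PER_M) -> float:
    """Return the metered cost in USD for the given token counts."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical agent session: 2M tokens read, 200k tokens written.
fable = request_cost_usd(2_000_000, 200_000)
# "Double Opus 4.8" implies half the price on both sides (derived, not quoted).
opus = request_cost_usd(2_000_000, 200_000, input_price=5.0, output_price=25.0)
print(f"Fable: USD {fable:.2f}, Opus 4.8: USD {opus:.2f}")
```

For an agent that reads far more than it writes, the input price dominates the bill, which is why long-running loops get expensive quickly.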

We really hope this matter gets resolved soon. It is also worth noting that Anthropic has been one of the loudest voices calling for more regulation.

Loop Engineering

A new term has been emerging: Loop Engineering. It was triggered by Boris Cherny, creator of Claude Code, in an interview where he said he only writes loops and does not prompt anymore. Peter Steinberger, creator of OpenClaw, reinforced that idea and shared an example of his own.

We already have harness engineering, where we improve the harness or align it with our individual requirements. The usual tools are skills, AGENTS.md, hooks, goals, and sub-agents. With sub-agents in particular, the idea is to manage long-running tasks with minimal user interaction. They can also handle larger tasks: in an agent hierarchy, some instances control others and reviewer agents check the results. The Ralph Loop, or /goal in Claude Code and Codex, are well-known examples.

Loop Engineering takes it to a higher level. It is about the "prompt" of the top agent that runs on a schedule, automatically detects new tasks, and delegates them to sub-agents by prompting them. The focus is on that outer loop. By loop, we do not mean /goal or the Ralph Loop. Those usually work on one very specific goal, whereas the loop in Loop Engineering detects the goals on its own and may well start an underlying loop by prompting a sub-agent with /goal for a specific task.

Steinberger's example is a scheduled agent that loads a maintainer-orchestrator skill explaining how to navigate GitHub repositories, triage issues, and trigger sub-agents to fix them - with very clear instructions on how they should do that.
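The outer loop described above can be sketched in a few lines. This is our own minimal illustration, not Steinberger's or Cherny's code: `Task`, `OuterLoop`, `fake_detect`, and `fake_delegate` are hypothetical names, and the `handled` dictionary merely stands in for persistent state.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class Task:
    """A unit of work the outer loop discovered, e.g. an open issue."""
    task_id: str
    description: str


@dataclass
class OuterLoop:
    # detect() and delegate() are hypothetical hooks: in practice they would
    # wrap an issue-tracker query and a sub-agent invocation.
    detect: Callable[[], Iterable[Task]]
    delegate: Callable[[Task], str]
    handled: dict[str, str] = field(default_factory=dict)  # stand-in for persistent state

    def tick(self) -> list[str]:
        """Run one scheduled pass: find new tasks and hand each to a sub-agent."""
        started = []
        for task in self.detect():
            if task.task_id in self.handled:
                continue  # already delegated in an earlier pass
            self.handled[task.task_id] = self.delegate(task)
            started.append(task.task_id)
        return started


def fake_detect() -> list[Task]:
    # Placeholder for "triage the repository and list open work".
    return [Task("issue-1", "Fix flaky test"), Task("issue-2", "Update docs")]


def fake_delegate(task: Task) -> str:
    # A real implementation would prompt a sub-agent; here we only build
    # the goal-style prompt string it would receive.
    return f"/goal {task.description}"


loop = OuterLoop(detect=fake_detect, delegate=fake_delegate)
print(loop.tick())  # first pass delegates both tasks
print(loop.tick())  # second pass finds nothing new
```

The point of the sketch is the split of responsibilities: the outer loop only decides what to work on and remembers what it already started, while each sub-agent runs its own goal-focused loop.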

Addy Osmani's article goes further in defining the term and listing the requirements for Loop Engineering. Its building blocks align closely with a list Vaibhav Srivastav posted for the Codex app on X - automations, worktrees, skills, connectors, sub-agents, and state tracking - which Osmani maps to a scheduler, worktrees, skills, and persistent state.

So the way we see it at the moment, Loop Engineering sits on top of harness engineering. Harness engineering is about optimizing the harness for a single goal; loop engineering is about identifying multiple goals and orchestrating them. There is overlap between the two, and the new term is still a bit vague - but that is how it usually goes in a fast-moving field where new designs appear quickly and definitions have to be sharpened over time.

One also has to say that loop engineering only sounds good as long as you do not have to worry about token costs - or if you run a local LLM, a theme we pick up again at the end.

Apple Intelligence - 3rd Gen

Apple is finally making a serious move into AI, but in its own way. Apple Intelligence has been available since 2024 but has not really met the standards we were used to from other vendors. At last week's WWDC, Apple released its third generation of models, which drew interest for two reasons: the cooperation with Google, and the capabilities of the on-device models.

Apple Intelligence targets the consumer market and needs to run on the iPhone. There it is always available, secure, and private - but memory is tight. To deal with that, Apple introduced a technology called Instruction Following Pruning (IFP). Modern models follow the Mixture of Experts (MoE) approach, which means that during inference only specific parts of the weights are activated. Since the active parts can change with every token, the whole model still needs to live in memory.
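A toy simulation shows why per-token routing keeps the whole model in memory. Everything here is assumed for illustration: 16 experts, 2 active per token, and a random stand-in for the learned router. It says nothing about Apple's actual architecture.

```python
import random

NUM_EXPERTS = 16   # assumed toy value
TOP_K = 2          # experts activated per token (assumed)


def route_token(rng: random.Random) -> set[int]:
    """Stand-in for a learned router: pick the TOP_K highest-scoring experts."""
    scores = [rng.random() for _ in range(NUM_EXPERTS)]
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)
    return set(ranked[:TOP_K])


def experts_touched(num_tokens: int, seed: int = 0) -> set[int]:
    """Union of experts used by at least one token in a sequence."""
    rng = random.Random(seed)
    touched: set[int] = set()
    for _ in range(num_tokens):
        touched |= route_token(rng)
    return touched


for n in (1, 8, 64):
    print(n, "tokens ->", len(experts_touched(n)), "of", NUM_EXPERTS, "experts needed")
```

Each single token only needs two experts, but the set of experts needed over a longer sequence keeps growing, so none of them can safely be left out of memory.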

If an iPhone were to hold a quantized 20B-parameter model, that would take around 12 GB of memory - still too much. With IFP, the model sits in NAND flash storage (256 GB and up on the iPhone 17) and the MoE experts are loaded only once for the complete prompt, rather than per token. There are two on-device model tiers; only the second, more powerful one uses IFP.
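The 12 GB figure can be checked with back-of-the-envelope arithmetic. The sketch below counts weights only (no KV cache or runtime overhead) and uses 1 GB = 10^9 bytes; the 25% resident share at the end is purely our assumption to show the effect, not a number Apple has published.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10^9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(params_billion: float, size_gb: float) -> float:
    """Average bits per weight implied by a given model size."""
    return size_gb * 8 / params_billion


bits = implied_bits_per_weight(20, 12)
print(f"20B params in 12 GB -> {bits:.1f} bits per weight")
print(f"Same model at 16-bit -> {weights_gb(20, 16):.0f} GB")

# ASSUMPTION for illustration only: if a quarter of the weights had to be
# resident for one prompt, the in-memory share would shrink accordingly.
RESIDENT_FRACTION = 0.25
resident = weights_gb(20, bits) * RESIDENT_FRACTION
print(f"Resident at {RESIDENT_FRACTION:.0%} -> {resident:.1f} GB")
```

So "around 12 GB" corresponds to a bit under 5 bits per weight on average, which is in the range of common 4-bit quantization schemes plus some overhead.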

If a task is too difficult, Apple Intelligence automatically delegates it to Apple's cloud models, where the same privacy rules apply and more powerful Apple models run.

Google plays a huge role in both model training and infrastructure. It is often said that Apple's models are just Google's Gemma. That is not really true. Apple trains its own models on Google's infrastructure and distills them from models in the Gemini family - Gemma is part of that family too. Distillation is a process where you prompt a teacher model and tune your own model to mimic its results. A common metaphor: Gemma is the teacher, Apple's models the pupil. Google also provides infrastructure for running Apple's cloud models; industry estimates put Apple's payments to Google at about USD 1 billion per year. Apple and Google described the setup in a joint statement; 9to5Mac covered Federighi's comments on the collaboration.
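To illustrate the teacher-pupil idea, here is a toy version of one common form of distillation: the student's output distribution is pulled toward the teacher's by minimizing the KL divergence. This is a generic sketch with made-up numbers and hypothetical function names; we do not know the details of Apple's training pipeline.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher: list[float], student: list[float]) -> float:
    """KL(teacher || student): how badly the student mimics the teacher."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


def distill_step(student_logits: list[float], teacher_probs: list[float],
                 lr: float = 0.5) -> list[float]:
    """One gradient-descent step; d(KL)/d(logit_j) = student_j - teacher_j."""
    student_probs = softmax(student_logits)
    return [z - lr * (s - t)
            for z, s, t in zip(student_logits, student_probs, teacher_probs)]


teacher_probs = softmax([2.0, 0.5, -1.0])  # teacher's next-token distribution (toy)
student_logits = [0.0, 0.0, 0.0]           # untrained student: uniform guess

before = kl_divergence(teacher_probs, softmax(student_logits))
for _ in range(200):
    student_logits = distill_step(student_logits, teacher_probs)
after = kl_divergence(teacher_probs, softmax(student_logits))
print(f"KL before: {before:.4f}, after: {after:.6f}")
```

In a real setup the "logits" come from a full model and the update changes its weights, but the principle is the same: the pupil is trained on what the teacher answers, not on the teacher's weights.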

At the moment, the new Apple Intelligence will not be available in the EU.

Soverius AI

Apple's two on-device models are local LLMs, while Fable's brief run and loop engineering show that AI will become quite expensive. At Soverius AI, we believe strongly in local LLMs. That is why we are hosting a free webinar on Software Development with Local LLMs on Tuesday, June 30, 2026 at 4:00 PM CEST.

And if you want to go deeper and get an idea of what it takes to build an agent like Claude Code or Codex, we have something for you: Murat Sari wrote an article on how to write a harness from scratch, covering quantization, the agentic loop, and the sandbox. It comes with a Git repo so you can try it out on your own machine.

https://soverius.ai/blog/implementing-a-tiny-harness
