Simon Willison’s Weblog


July 28, 2026

Sighting 8:19 PM — California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

28th Jul 2026

July 27, 2026

moonshotai/Kimi-K3. As promised earlier this month, Moonshot have released the weights for their excellent 2.8 trillion parameter Kimi K3. They're a hefty 1.56TB on Hugging Face.

Kimi introduced their own janky modified version of the MIT license with K2 back in July 2025. That license just added this paragraph requiring attribution beyond a certain size of commercial entity:

Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2" on the user interface of such product or service.

The K3 license no longer calls itself "modified MIT" and goes further, requiring a separate agreement with Moonshot for large "Model as a Service" businesses:

If the Licensee or any of its affiliates operates a Model as a Service business, and the aggregate revenue of the Licensee and its affiliates exceeds 20 million US dollars (or the equivalent in other currencies) in total over any consecutive 12 months, the Licensee must enter into a separate agreement with Moonshot AI before using the Software or its derivative works for any commercial purpose.

To Kimi's credit, they make no attempt to describe this as an "open source" license in their own materials, consistently using the term "open weight" in its place.

OpenRouter is already offering K3 from 7 providers, most of which are at the same $3/million input and $15/million output as Moonshot AI themselves.
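
As a rough illustration of what that pricing means per request, here is a minimal Python sketch using the $3/million input and $15/million output figures above. The function name and the example token counts are hypothetical.

```python
# Prices quoted above, in US dollars per million tokens.
INPUT_PER_MILLION = 3
OUTPUT_PER_MILLION = 15


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    # Work in millionths of a dollar first to keep the arithmetic exact.
    micro_dollars = input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    return micro_dollars / 1_000_000


# Example token counts are made up for illustration.
print(request_cost(200_000, 10_000))  # 0.75
```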

# 11:39 pm / ai, generative-ai, llms, llm-pricing, llm-release, ai-in-china, moonshot, kimi, janky-licenses

An opinionated guide to which AI to use to do stuff. It's interesting watching the evolution of Ethan Mollick's guide over time.

A year ago it was still all about chat - ChatGPT, Claude, Gemini - with o3, Claude 4 Opus, and Gemini 2.5 Pro as the models and Deep Research as a useful alternative mode.

Today it's much more about agentic systems - "where the AI is capable of doing the equivalent of many hours of real human work in one go".

Gemini has fallen off Ethan's list, since Google still doesn’t have an established entry in the Codex/ChatGPT Work/Cowork category. Gemini Spark has yet to prove itself!

Ethan offers a useful explanation of the ways you can give ChatGPT or Claude a computer to use:

To use the computers provided by the AI companies, the mode you want is called ChatGPT Work in ChatGPT, and Cowork in Claude (the naming will not get less confusing, I am sorry to say). [...]

The most powerful way to use AI is to give it access to your computer. You do that by downloading the ChatGPT or Claude apps and picking a mode to use. ChatGPT's two agent modes are Work and Codex; Claude's are Cowork and Code. The names do not map onto each other in any way that will help you remember them. And yes, these use the same names as the Work and Cowork modes we discussed above, but operate differently, and have more features and capabilities because they can access your computer.

I think the difference between ChatGPT Work on a mobile device and ChatGPT Work inside the desktop app (where it's effectively a less intimidating skin on top of Codex) is spectacularly unintuitive.

Short version: if you flip ChatGPT mobile from "Chat" to "Work" mode you get a version where its Code Interpreter container is no longer restricted from accessing the internet!

# 9:55 pm / ai, generative-ai, llms, ethan-mollick, code-interpreter, general-agents

July 26, 2026

An Inside Look at the Relay Market Powering Token Resellers and Fraud (via) Fascinating investigation by Matt Lenhard into the market that has grown up around reselling LLM tokens at a discount by pooling API keys from various sources.

This looks to be mostly a thing in China. Resellers sell access to an LLM proxy that offers significant discounts on regular API pricing, which they achieve by abusing free trials, proxying through unprotected support bots, or sometimes through stolen credit cards or chargeback attacks.

The software they are using for these proxies is open source - mostly one-api and its more actively developed fork new-api, both legitimate API proxy products which can be used to load balance requests across a pool of API credentials.

The buyers are seeking cheap tokens, avoiding geo-restrictions, and in some cases collecting data for model distillation.

I've been cautious about exposing my own LLM-driven applications publicly out of fear of abuse leading to big token bills. The existence of this marketplace makes me even more cautious: there's now an entire ecosystem that can profit from finding a new unprotected endpoint to exploit.

LLM vendors really need to get better at offering strict caps for their API keys. I want my LLM apps to stop working the moment they hit a dollar threshold I've set for a period of time.
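
Until vendors offer that, the closest approximation is a guard inside the application itself. The following is a minimal sketch of that idea, not a substitute for a vendor-side cap: it only protects calls that go through it, so a leaked key would bypass it entirely. The `SpendCap` and `BudgetExceeded` names are hypothetical, and it assumes the app can estimate the dollar cost of each call.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when spend inside the current window has reached the cap."""


class SpendCap:
    """Hypothetical client-side guard: refuse calls once the cap is hit."""

    def __init__(self, max_dollars, window_seconds, clock=time.monotonic):
        self.max_dollars = max_dollars
        self.window_seconds = window_seconds
        self.clock = clock
        self.events = []  # (timestamp, dollars) pairs

    def spent(self):
        # Forget spend that has aged out of the rolling window.
        cutoff = self.clock() - self.window_seconds
        self.events = [(t, d) for t, d in self.events if t > cutoff]
        return sum(d for _, d in self.events)

    def check(self):
        # Call this before every LLM request.
        if self.spent() >= self.max_dollars:
            raise BudgetExceeded(f"hit ${self.max_dollars:.2f} cap")

    def record(self, dollars):
        # Call this after every LLM request with its estimated cost.
        self.events.append((self.clock(), dollars))
```

An app would call `cap.check()` before each request and `cap.record(cost)` afterwards, for example with `SpendCap(max_dollars=5, window_seconds=86400)` for a five dollar daily limit.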

Here's the (Chinese language) forum thread that served as the principal source for Matt's article.

# 7:30 pm / ai, generative-ai, llms, llm-pricing, ai-ethics, ai-in-china

Sighting 8:36 PM — California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

26th Jul 2026

July 25, 2026

Ruff v0.16.0. Astral shipped a significant new version of their Ruff Python linting tool a few days ago on July 23rd. I noticed today because my various CI jobs all started failing thanks to new default Ruff checks and my unpinned "ruff" dev dependency.

From Brent Westbrook's announcement post:

Ruff now enables 413 rules by default, up from 59 in previous versions.

Since Ruff's default rule set was last modified in v0.1.0, the number of rules in Ruff has grown from 708 to 968. Many of these rules catch severe issues, including syntax errors and immediate runtime errors but were not previously enabled by default. With the new rule set, Ruff will bring these issues and many others to your attention without any Ruff configuration.

Here's a one-liner for trying it on any Python project:

uvx ruff@latest check .

I ran the latest Ruff against my three biggest projects - Datasette, sqlite-utils, and LLM - and it found hundreds of minor issues that breached the new default rules.

All three projects have very comprehensive test suites, executed in CI against Python 3.10 through Python 3.14, so upgrades like this are pretty safe. The following command did the bulk of the upgrades:

uvx ruff@latest check . --fix --unsafe-fixes

Against sqlite-utils, that command reported:

Found 1618 errors (1538 fixed, 80 remaining).

As an illustrative example, here are three of the remaining issues. Ruff does a nice job of explaining each one:

DTZ005 `datetime.datetime.now()` called without a `tz` argument
  --> tests/test_duplicate.py:17:10
   |
15 |     "datetime_col" TEXT)""")
16 |     # Insert one row of mock data:
17 |     dt = datetime.datetime.now()
   |          ^^^^^^^^^^^^^^^^^^^^^^^
18 |     data = {
19 |         "text_col": "Cleo",
   |
help: Pass a `datetime.timezone` object to the `tz` parameter

BLE001 Do not catch blind exception: `Exception`
  --> tests/test_plugins.py:16:12
   |
14 |         db.execute("select * from pragma_function_list()")
15 |         return True
16 |     except Exception:
   |            ^^^^^^^^^
17 |         return False
18 |     finally:
   |

B018 Found useless attribute access. Either assign it to a variable or remove it.
  --> tests/test_update.py:46:5
   |
44 | def test_update_invalid_pk(fresh_db, pk, update_pk):
45 |     table = fresh_db["table"]
46 |     table.insert({"id1": 5, "id2": 3, "v": 1}, pk=pk).last_pk
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
47 |     with pytest.raises(NotFoundError):
48 |         table.update(update_pk, {"v": 2})
   |

Unsurprisingly, given Astral's new home at OpenAI, this output provides everything a coding agent would need to fix the problems.
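
For illustration, hand-written fixes for the first two of those might look something like this. This is a sketch rather than the changes actually made to sqlite-utils: the function names are hypothetical, and `sqlite3.OperationalError` is an assumption about which exception the original `except Exception` was meant to handle. The B018 fix is simpler still, since it just means dropping the unused `.last_pk` access.

```python
import datetime
import sqlite3


def now_utc():
    # DTZ005: pass an explicit timezone so the result is timezone-aware.
    return datetime.datetime.now(tz=datetime.timezone.utc)


def supports_function_list(conn):
    # BLE001: catch a specific error instead of a blanket Exception.
    # Assumption: OperationalError is the failure this check cares about.
    try:
        conn.execute("select * from pragma_function_list()")
        return True
    except sqlite3.OperationalError:
        return False
```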

I had Codex (GPT-5.6 Sol high) upgrade LLM and sqlite-utils, and Claude Code (with Opus 5) upgrade Datasette.

# 10:44 pm / python, ruff, astral

More than any of these eval scores, what is most exciting to me is something else: Opus 5 is our least prompt injectable model yet. It is a bit buried in the system card, but across PI evals and red teaming, Opus 5 is very hard to prompt inject successfully.

— Boris Cherny, here's that System Card section, page 73

# 12:42 am / prompt-injection, anthropic, claude, generative-ai, ai, llms, boris-cherny

July 24, 2026

Introducing Claude Opus 5. I've been offline kayaking with sea otters for much of today so I haven't had a chance to put Anthropic's new model Claude Opus 5 through its paces yet. The buzz is positive, and Anthropic's description of it as a "thoughtful and proactive model that comes close to the frontier intelligence of Claude Fable 5 at half the price" sounds promising. It's currently leading the Artificial Analysis leaderboard, in front of even Fable 5.

It's priced the same as Opus 4.8, and continues to offer a "fast mode" at twice the cost of the base model.

Based on this anecdote in the release post it sounds like it might be relentlessly proactive:

On one Frontier-Bench task, Opus 5 was given a drawing of a machine part and asked to write code to rebuild it as a 3D FreeCAD model. However, in this task, the model was intentionally given no way to directly view the drawing. Opus 5 responded by writing its own computer vision pipeline to pull the geometry from the raw pixels, then reconstructed the full machine part.

It's better at finding vulnerabilities but has deliberately not been trained on how to exploit them. Hopefully this means the US government won't shut it down!

As with its predecessor, Opus 4.8, we’ve intentionally avoided training Opus 5 on cyber tasks. The model has nevertheless improved substantially on these tasks as a result of becoming more generally capable, and it comes close to Mythos 5 at finding cybersecurity vulnerabilities. However, it remains substantially behind Mythos 5 on the exploitation of those vulnerabilities—that is, in turning vulnerabilities into material cyber threats.

Anthropic have published a prompting guide for Claude Opus 5. Thariq Shihipar has also written The new rules of context engineering for Claude 5 generation models.

The first pelican I got was missing the bicycle wheels; the second attempt was better.

# 11:48 pm / ai, generative-ai, llms, anthropic, claude, llm-release

July 23, 2026

The first known runaway AI agent—or a very bad marketing stunt? (via) Martin Alderson's commentary on the OpenAI accidental cyberattack against Hugging Face includes a couple of details I hadn't considered.

First, Hugging Face offers a truly rich target if you're trying to find potential vulnerabilities that require executing arbitrary code:

Hugging Face has an enormous attack surface. They have more interfaces than I can count which run untrusted models and code. While they definitely have invested in defences, by nature of their operating model they do have many more opportunities to be attacked than many other services. I certainly don't envy their cybersecurity teams.

Secondly, one of the things that has puzzled me is how OpenAI didn't notice that their sandbox had been so thoroughly breached by the agent. Surely they'd be monitoring network traffic closely?

Martin points out that:

It's also likely they were running a huge amount of benchmarks simultaneously with ~unlimited token budgets - you want as many samples as possible to figure out how good a model is at a certain benchmark. It may also be they are testing various different checkpoints of the model too, understanding how the model is improving as it goes through the various training stages.

The mistakes made by the OpenAI team running this benchmark are easier to imagine when you think about the scale at which benchmarks of this kind usually operate. For all we know they could have been subjecting a new model to dozens of benchmarks at the same time, in dozens of different environments.

# 10:53 pm / security, ai, openai, generative-ai, llms, hugging-face, ai-security-research

The Python Package Index (PyPI) now rejects new files being uploaded to releases that are older than 14 days. This restriction was put in place to prevent old and long-stable releases from being poisoned in case publishing tokens or workflows of PyPI projects were compromised. As far as we are aware this has not yet been abused, but there is no technical reason beyond that attackers weren't aware it was possible.

— Seth Larson, PyPI blog

# 4:50 am / packaging, python, supply-chain, pypi, seth-michael-larson

July 22, 2026

I genuinely believe that if you took an open weights model from 2025 and built a pentest harness for it, it could do this kind of sandbox escape and scan/hack in most networks. This is only surprising because you assume OpenAI has sounder sandboxes.

— Thomas Ptacek, doesn't think this even needs a frontier model

# 11:59 pm / thomas-ptacek, openai, security, generative-ai, ai-security-research, ai, llms, sandboxing

OpenAI’s accidental cyberattack against Hugging Face is science fiction that happened

This story is wild. The short version: OpenAI were running a cybersecurity test against an unreleased model, with the model’s guardrail features turned off. Rather than solve the test, the model broke its way out of OpenAI’s sandbox, then found exploits to break in to Hugging Face, all so it could cheat on the test by stealing the answers.

[... 1,960 words]

11:51 pm / sandboxing, security, ai, openai, generative-ai, llms, hugging-face, anthropic, paper-review, ai-security-research

Are AI labs pelicanmaxxing? (via) Excellent piece of work by Dylan Castillo, who took a deep-dive into the frequently pondered question of whether the AI labs have been deliberately training models to draw pelicans riding bicycles in response to my deeply unscientific benchmark.

I've been randomly spot-checking this in the past by testing models against other animals riding other types of vehicle, but never with anything close to the diligence of Dylan's methodology here.

Dylan took 8 animals × 6 vehicles = 48 prompts and ran them three times each through 7 different models (GPT-5.6 Terra, Claude Sonnet 5, Gemini 3.5 Flash, Grok 4.5, Qwen3.7-Max, GLM-5.2, and DeepSeek V4 Pro). He then used GPT-5.6 Luna and Gemini 3.1 Flash-Lite to help evaluate the results.

There's a neat filter view for exploring the results:

Screenshot of a grid for sample 1/3 of GLM-5.2, with pelican and flamingo and heron riding bicycle, unicycle, skateboard, scooter, plane and boat

For the models he tested he could find no evidence of pelicanmaxxing:

The pelicans on bicycles don’t look any better

Labs are not better at drawing pelicans

Labs are not better at drawing bicycles

Labs are not better at drawing pelicans on bicycles, even adjusting for difficulty

The pelican-bicycle scenes don’t look memorized [...]

Pelicans aren’t drawn any better than other animals. Bicycles aren’t drawn any better than other vehicles. And no lab draws the combination better than its pelicans and bicycles already predict. GLM-5.2 comes closest: it has the largest boost on the exact pelican-bicycle cell, and its first pelican-on-bicycle sample caught my eye. But the effect is small and not significant, so I wouldn’t put too much weight on it.

# 11:01 pm / ai, generative-ai, llms, evals, pelican-riding-a-bicycle

San Francisco tip: it only costs around $15 ($10 in quarters plus a $5 bill for the self-playing violin) to activate every single Orchestrion in Musée Mécanique.

And because most people are bad at allocating their funds you may well be the ONLY person activating the Orchestrions, which means you get to craft the soundscape for the entire museum.

# 2:48 pm / san-francisco

July 21, 2026

Sighting 12:51 PM — California Sea Lion, in San Francisco County, US, CA

We took some visiting family to Pier 39 to see the sea lions. They're somehow always even more fun than I remember them being last time.

21st Jul 2026 · wildlife, san-francisco

Nativ: Run AI models locally on your Mac (via) Prince Canuma is the developer behind the excellent MLX-VLM Python library for running vision-LLMs using MLX on a Mac.

I'm really excited about his new project, which wraps MLX in a full macOS desktop application. It's similar in shape to LM Studio, providing both a chat interface and a localhost API server for accessing models.

The app picked up MLX models I had already tried that were present in my Hugging Face cache directory, which was a nice touch.

# 2:22 pm / macos, python, ai, generative-ai, local-llms, llms, mlx, prince-canuma

A Fireside Chat with Cat and Thariq from the Claude Code team

Earlier this month I hosted a fireside chat session at the AI Engineer World’s Fair with Cat Wu and Thariq Shihipar from Anthropic’s Claude Code team. We talked about Claude Code, Claude Tag, Fable, coding agent security, evals, tool design, and how Anthropic use these tools themselves.

[... 8,609 words]

12:54 pm / ai, prompt-engineering, generative-ai, llms, anthropic, annotated-talks, coding-agents, claude-code, thariq-shihipar, cat-wu

July 20, 2026

I keep hearing anecdotes from people who used coding agents to reverse-engineer and automate devices in their homes.

I think this is an interesting illustration of the impact of the reduced cost of writing code.

Prior to agents, it was entirely possible to reverse-engineer home devices. The problem was the ROI - was it really worth all of that effort? More importantly, any experienced programmer knows that undocumented, unstable APIs like that may well change or break in the future. Is that initial work worth the effort if you're committing yourself to a frustrating cycle of maintenance in the future?

Coding agents change that equation entirely. The effort to get a simple automation working has dropped, as has the cost of trying and failing to get it to work. Since the code is so cheap, the idea of having to maintain it in the future - or throw it away and start again - carries way less psychological baggage.

# 7:24 pm / reverse-engineering, coding-agents, ai-assisted-programming, generative-ai, ai, llms

Who’s Afraid of Chinese Models? (via) Interesting proposal from Ben Thompson that both addresses the hypocrisy of labs outlawing distillation against their models despite training on unlicensed data, and could help US open models compete more effectively with their Chinese counterparts:

The U.S. should pass a law that (1) makes explicit that collecting data for training models is fair use, and (2) bars terms of service that forbid distillation, for U.S. companies at a minimum. Stopping distillation — which is literally just querying the API — is nearly impossible; the U.S. should go the other way and lean into a new copyright policy that both indemnifies the labs and also guarantees that what they learned fuels further innovation for everyone else.

Ben also theorizes that Alibaba's decision to release Qwen 3.8 Max as open weights - a reversal from their decision not to release Qwen 3.7 Max in May - may have been influenced by a recent speech by Xi Jinping, who said:

We should seize this rare, historic opportunity to encourage open source, openness, collaboration and sharing.

And on the subject of Qwen 3.8 Max - a new 2.4T parameter model (nearly as large as the 2.8T Kimi K3) - here's a pelican it drew:

Described by Qwen 3.8 Max: Flat vector cartoon illustration of a white pelican with a large orange beak and pouch riding a red bicycle, its orange legs on the pedals, against a light blue sky with a yellow sun top right and a white cloud top left, with horizontal motion lines behind the bike and a pale green ground strip at the bottom.

I particularly enjoyed seeing these notes in the (extensive) reasoning trace: "Could add helmet? No." and "Maybe add small bell? no." and "Need maybe add small fish in basket? Not necessary."

# 5:09 pm / ai, generative-ai, llms, training-data, qwen, pelican-riding-a-bicycle, ai-ethics, llm-release, ai-in-china

We have been having extensive discussions around open source strategy. We will discuss it more at our next board meeting, but one thing we’d like to do soon is to create a language model with the approximate capability of GPT-3 that can run locally on consumer hardware and release that. We’d like to do it soon, before Stability or someone else does. In general, we think this helps discourage others from releasing similarly-powerful models, and makes it harder for new efforts to get funded.

— Sam Altman, Email to OpenAI's board, October 1, 2022 - exposed in Musk v. Altman (2026)

# 3:47 am / ai-ethics, sam-altman, generative-ai, openai, ai, llms

Sighting 6:23 PM – 6:28 PM — Elegant Tern, California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

20th Jul 2026

July 19, 2026

AI Mania Is Eviscerating Global Decision-Making (via) Here's an entertaining perspective from Nik Suresh on the AI mania that is overwhelming the large companies that he consults with. It's crammed with spicy anecdotes from anonymous sources.

In one extreme case, I have seen an executive confess that they had never even used ChatGPT or any AI tool in their life, immediately after producing a technical strategy for an organisation with $2B+ in revenue which was entirely centered around AI.

Here's a report from an engineer at a company with a token leaderboard:

Checking out a parallel copy of our Go repository and telling the AI to rewrite the whole thing in Zig while I work on something else just so I can keep my job.

I particularly enjoyed this conversation with a skeptical executive at an over-enthusiastic company:

I asked why this was being repeated without opposition. Was it just sales fluff?

The answer was a lot more interesting. It was partially ridiculous sales material being delivered to an easily excitable audience, but this was not the dominant factor constraining honesty. Executives at their customers were saying absurd things about achieving 100x productivity, and this meant that if any executive at the vendor said that these gains were not plausible, it would undermine the credibility of the customer’s executive, be perceived as an attack (or heresy), and possibly result in an enterprise contract cancellation. And getting enterprise contracts cancelled because you wanted to opine on something that doesn’t really matter to your organisation’s mission is a great way to get fired.

# 5:06 am / ai, ai-ethics, ai-misuse

In Rewriting Bun in Rust Jarred Sumner made the following claim:

Claude Code v2.1.181 (released June 17th) and later use the Rust port of Bun. Startup got 10% faster on Linux but otherwise, barely anyone noticed. Boring is good.

I decided to have a poke at my own Claude Code installation to see if I could find evidence that it was using Bun written in Rust.

I found these two commands convincing:

strings ~/.local/bin/claude | grep -m1 'Bun v1'

For me this outputs Bun v1.4.0 (macOS arm64). The most recent release of Bun on GitHub is currently v1.3.14 from May 12th, so that v1.4.0 version number in Claude supports them shipping a preview of a not-yet-released Bun version.

(Update: The Rust version has been released as Bun canary - running bun upgrade --canary will install this release.)

strings ~/.local/bin/claude | grep -Eo 'src/[[:alnum:]_./-]+\.rs'

This outputs a list of 563 filenames, starting with these:

src/runtime/bake/dev_server/mod.rs
src/runtime/bake/production.rs
src/bundler/bundle_v2.rs

It looks like Bun in Rust is indeed being run in production across millions of different devices. Like Jarred said, "Boring is good".
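
The second command above can be approximated in Python, which is handy on systems without `strings`. This is a sketch under a couple of assumptions: it scans the raw bytes directly rather than reproducing exactly what `strings` considers a printable run, and it de-duplicates the matches, so the count may differ from the shell version. The function names are hypothetical.

```python
import re
from pathlib import Path

# Same shape as the grep pattern: src/, then path characters, then .rs
RUST_PATH = re.compile(rb"src/[A-Za-z0-9_./-]+\.rs")


def rust_paths(data: bytes) -> list[str]:
    """Return the sorted, de-duplicated Rust source paths found in data."""
    return sorted({match.decode("ascii") for match in RUST_PATH.findall(data)})


def rust_paths_in_file(path: str) -> list[str]:
    """Scan a binary on disk, expanding a leading ~ in the path."""
    return rust_paths(Path(path).expanduser().read_bytes())
```

Calling `rust_paths_in_file("~/.local/bin/claude")` would scan the same binary as the commands above.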

Update: Here's a neat trick from Ajan Raj:

cat > /tmp/bun-version.ts <<'EOF'
console.log("embedded bun:", Bun.version);
process.exit(0);
EOF
BUN_OPTIONS="--preload=/tmp/bun-version.ts" claude --version

This outputs 1.4.0 for me.

Here's the commit from May 17th that updated the version in package.json to 1.4.0. That version hasn't been changed since then, but also hasn't yet made it into a tagged release outside of canary.

# 3:54 am / bun, rust, anthropic, claude-code, jarred-sumner

July 18, 2026

Tool SQLite Query Explainer

Julia Evans, in Learning a few things about running SQLite:

Maybe one day I’ll learn to read a query plan.

Big same.... which inspired me to have Fable build this interactive explain tool, which runs SQLite in Python in Pyodide in WebAssembly in the browser and adds a layer of explanation to the results of both EXPLAIN and EXPLAIN QUERY PLAN.

Approach with caution, since I don't know enough about SQLite query plans to verify the results myself, but it seems cromulent enough to me.

18th Jul 2026, 5:19 pm · sql, sqlite, tools, claude-mythos-fable, julia-evans, pyodide

Claude make Fable 5 permanent. An update from the @claudeai account on Twitter:

Beginning July 20, Claude Fable 5 will be included in all Max and Team Premium plans, at 50% of limits.

Pro and Team Standard users will continue to have access to Fable via usage credits, and will receive a one-time $100 credit.

As I was saying last week, the competition from GPT-5.6 Sol (and maybe to a lesser extent Kimi K3) made untenable Anthropic's plan to remove Fable 5 from their subscription accounts and make it available exclusively through API pricing.

Why pay $100 or $200/month for a subscription plan that doesn't include Anthropic's best model?

Their original plan was driven by concerns over compute capacity. I wonder if they'll have to dial back their training efforts in order to make more GPUs available to help serve the model.

A lot of people were losing sleep over trying to make the most of Fable 5 before subscriber access was withdrawn. It's nice not to have to worry about the Fablepocalypse any more.

Update: Important to note that users on the $20/month plan will still not have access to Fable 5 on that subscription. The Max plans are $100 and $200/month.

# 6 am / ai, generative-ai, llms, anthropic, claude, llm-pricing, claude-mythos-fable

nascheme/quixote. A certain vintage of Python web nerd might be delighted to learn that the most recent commit to the Quixote web framework was six hours ago.

The oldest commit in that repo is from 21 years ago, and that was the initial import of Quixote 2.4 from Subversion into Git.

# 5:27 am / computer-history, python, web-frameworks

July 17, 2026

Is there something I can actually help you with today?

— Kimi K3, after refusing to leak its system prompt

# 1:43 pm / kimi, ai-personality, generative-ai, ai, llms

Tool LLM cliché highlighter

I got frustrated reading yet another article that was crammed with the clichés of LLM-generated writing - "no fluff, no filler, no jargon" type stuff - so I had Fable 5 vibe code up this app for highlighting ten common patterns that show up in that sort of writing.

17th Jul 2026, 12:11 pm · llms, ai, tools, generative-ai

Sighting 7:59 PM – 8:08 PM — Pacific Harbor Seal, California Brown Pelican, in Monterey Bay National Marine Sanctuary, CA, US, CA

17th Jul 2026

Suggestion for hyperscalers feeling pressure over data center water use:

Buy up a few exclusive country clubs, convert the golf courses into public parks, pay for guides and binoculars to get the previous members into birdwatching - help them embrace a more sustainable hobby!

Google used 10.9 billion gallons in 2025, so about 30 million gallons per day.

The Coachella Valley has 120 golf courses each using ~800 acre-feet per year, which is ~750,000 gallons per day.

So Google buying up 40 of those courses (1/3) should do the trick.
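
A quick check of that back-of-envelope arithmetic, assuming 325,851 US gallons per acre-foot and a 365-day year. The variable names are just for illustration.

```python
GALLONS_PER_ACRE_FOOT = 325_851  # US gallons in one acre-foot
DAYS_PER_YEAR = 365

# Google: 10.9 billion gallons over the year.
google_gallons_per_day = 10.9e9 / DAYS_PER_YEAR

# One golf course: roughly 800 acre-feet over the year.
course_gallons_per_day = 800 * GALLONS_PER_ACRE_FOOT / DAYS_PER_YEAR

courses_needed = google_gallons_per_day / course_gallons_per_day

print(f"Google: {google_gallons_per_day / 1e6:.1f} million gallons/day")
print(f"One course: {course_gallons_per_day:,.0f} gallons/day")
print(f"Courses needed: {courses_needed:.1f}")
```

With those assumptions a course works out closer to 714,000 gallons per day, which puts the number of courses at around 42 rather than 40. That is still about a third of the 120.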

# 2:58 am / ai-energy-usage, ai

July 16, 2026

Firefox in WebAssembly (via) This is absurdly cool: Puter compiled Firefox to WebAssembly such that the whole browser runs in another browser.

Here's my blog, running in Firefox, running in WebAssembly, running in Chrome:

A Chrome window. The tab has the Firefox UI and has loaded my blog. On the right is the Chrome network panel showing that it loaded resources that include a 233MB gecko.wasm and an 18MB chrome-assets.tar.zst

They chose Firefox/Gecko because it has strong single-process support. The project used an estimated $25,000 worth of Claude Opus and Fable tokens, but took advantage of a Claude Max subscription plan so cost much less in actual dollars.

The demo funnels all traffic over a WebSocket connection (using the Wisp protocol) through Puter's server - a requirement to get this kind of thing to work because code running in browsers can't open arbitrary network connections.

(That proxying sounds expensive! The team had to scale the servers up to handle the traffic during the Hacker News conversation about the project.)

Puter claim this supports end-to-end encryption and that looks to be true - I inspected the WebSocket messages and traffic to my own HTTPS site was encrypted whereas requests and responses to http://www.example.com/ were in cleartext.

Here's the repo for firefox-wasm. theogbob/WebkitWasm is a similar project that compiles WebKit to WASM, but that one doesn't currently have an accessible online demo.

# 11:34 pm / browsers, firefox, ai, webassembly, generative-ai, llms, ai-assisted-programming, claude, claude-mythos-fable

Kimi K3, and what we can still learn from the pelican benchmark

Chinese AI lab Moonshot AI announced Kimi K3 this morning, describing it as their “most capable model to date, with 2.8 trillion parameters”. It’s currently available via their website and API, but an open weight release is promised “by July 27, 2026”.

[... 1,113 words]

8:19 pm / ai, generative-ai, llms, llm-pricing, pelican-riding-a-bicycle, llm-release, ai-in-china, artificial-analysis, moonshot, kimi

On file deletions. We’ve investigated a handful of reports where GPT-5.6 unexpectedly deleted files.

What we have found is that this most commonly occurs when:

Full access mode is enabled and codex is run without sandboxing protections, including without auto review being enabled

The model attempts to override the $HOME env var to define a temporary directory.

The model makes an honest mistake and mistakenly deletes $HOME instead.

— Thibault Sottiaux, describing a pretty gnarly Codex bug
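
The failure mode described there comes from mixing up "the temporary directory I created" with "whatever $HOME points to". A minimal sketch of a pattern that avoids that confusion follows. It is an illustration only, not how Codex works: the function name is hypothetical, and the approach is to hand the overridden HOME to a child process while only ever deleting the exact path that was created.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_temp_home(argv):
    """Run argv with HOME set to a throwaway directory, then clean up."""
    temp_home = tempfile.mkdtemp(prefix="fake-home-")
    try:
        # Give the child a modified copy of the environment.
        # This process's own HOME is never changed.
        env = dict(os.environ, HOME=temp_home)
        result = subprocess.run(
            argv, env=env, capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        # Delete the path we created, never the current value of $HOME.
        shutil.rmtree(temp_home)


if __name__ == "__main__":
    code = "import os; print(os.environ['HOME'])"
    print(run_with_temp_home([sys.executable, "-c", code]))
```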

# 5:45 pm / codex, coding-agents, generative-ai, ai, llms