Simon Willison’s Weblog

Int-4 LLaMa is not enough - Int-3 and beyond. The Nolano team are experimenting with reducing the size of the LLaMA models even further than the 4-bit quantization popularized by llama.cpp.

Posted 13th March 2023 at 11:55 pm
