Show HN: Running an LLM Inside Scratch

github.com
broyojo
2 hours ago
1 point
This runs the smallest llama2.c checkpoint (stories260K) inside Scratch/TurboWarp by compiling C inference code into Scratch blocks using llvm2scratch. The model is quantized to Q8_0 and packed into Scratch lists. If everything works, the sprite streams "Once upon a time..." token by token into its speech bubble.
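
For reference, Q8_0-style quantization stores each group of weights as int8 values plus a single float scale, which is what makes packing the model into Scratch lists feasible. A minimal sketch in C, in the spirit of llama2.c's quantized runner (the group size and struct layout here are illustrative assumptions, not taken from this repo):

    #include <math.h>
    #include <stdint.h>

    /* Illustrative group size; the real value is a tunable choice. */
    #define GS 64

    /* One quantized group: int8 values plus one float scale.
       Dequantized value = q[i] * s. */
    typedef struct {
        int8_t q[GS];
        float  s;
    } QBlock;

    /* Quantize n floats (n a multiple of GS) into n/GS groups. */
    void quantize_q8_0(const float *x, QBlock *out, int n) {
        for (int g = 0; g < n / GS; g++) {
            /* Find the max magnitude in this group. */
            float wmax = 0.0f;
            for (int i = 0; i < GS; i++) {
                float v = fabsf(x[g * GS + i]);
                if (v > wmax) wmax = v;
            }
            /* Scale so the largest value maps to +/-127. */
            float scale = wmax / 127.0f;
            out[g].s = scale;
            for (int i = 0; i < GS; i++) {
                float q = (scale != 0.0f) ? x[g * GS + i] / scale : 0.0f;
                out[g].q[i] = (int8_t)roundf(q);
            }
        }
    }
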

I started this as an experiment in how far Scratch's VM could be pushed, and because the idea of running an LLM inside Scratch felt absurd and fun. The main challenges were fitting quantized weights into list memory, working around JS call stack limits, and patching llvm2scratch to support additional IR patterns emitted by clang -O2.
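
On the memory challenge: Scratch lists hold numbers, not raw bytes, so a C-to-Scratch compiler has to lower every load and store into list reads plus shift-and-mask arithmetic. A plausible sketch of that lowering, written here as plain C (the flat MEM array and helper names are hypothetical; llvm2scratch's actual scheme may differ):

    #include <stdint.h>

    /* Hypothetical model of the Scratch-side heap as one flat list
       of 32-bit words; MEM and the helpers below are illustrative,
       not taken from llvm2scratch. */
    #define MEM_WORDS (1 << 20)
    static int32_t MEM[MEM_WORDS];

    /* An aligned 32-bit load becomes a single list read
       ("item (addr/4 + 1) of MEM" in Scratch terms). */
    static int32_t mem_load32(uint32_t addr) {
        return MEM[addr >> 2];
    }

    static void mem_store32(uint32_t addr, int32_t val) {
        MEM[addr >> 2] = val;
    }

    /* A byte load needs shift-and-mask arithmetic on the containing
       word, since list items are whole numbers, not raw bytes. */
    static int8_t mem_load8(uint32_t addr) {
        int32_t word = MEM[addr >> 2];
        return (int8_t)((word >> ((addr & 3u) * 8)) & 0xFF);
    }
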

It generates roughly one token every 10 seconds.

Live demo: https://scratch.mit.edu/projects/1277883263

Source: https://github.com/broyojo/llm_from_scratch

Comments

No comments yet.