Gemma 4 26B on a 16 GB AMD GPU!
Research and validation by dd.ie for the Logical Zombies Ltd AI Infrastructure Series.
It's hard to be retired: the days are long, the hardware is slow, the internet is choked...
lol, it's not like that at all. I have fibre at 500 Mb/s, and I just built out Gemma 4 26B as my local LLM, running at almost silly tok/s figures:
Today (verified on this build: 26B MoE on 16 GB)
- 64K context @ 40 tok/s, parallel=4
- 131K context @ 37 tok/s, parallel=4
- 262K context @ 26.74 tok/s, parallel=1
Speculative (31B Dense on 24 GB)
- 64K context @ 60–100 tok/s with MTP
- 131K context @ 55–92 tok/s with MTP
- 262K context @ 40–67 tok/s, parallel=4, with MTP + TurboQuant
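For readers who want to try the tiers themselves, here is a minimal Python sketch that turns the three verified configurations into llama-server argument lists. Only the context sizes and slot counts come from the figures above. Reading "64K" and "131K" as 65,536 and 131,072 tokens is an assumption, the model file name is hypothetical, and GPU-offload, attention and other tuning flags are deliberately left out because the author's exact launch commands live in the linked write-up.

```python
# Minimal sketch: map the three verified tiers to llama-server argv lists.
# ASSUMPTIONS: "64K"/"131K" read as 65,536/131,072 tokens; the model path is
# hypothetical; GPU-offload and other tuning flags are intentionally omitted.
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    ctx_size: int  # total context window, passed as -c / --ctx-size
    parallel: int  # number of server slots, passed as -np / --parallel


TIERS = [
    Tier("daily", 65_536, 4),
    Tier("long-context", 131_072, 4),
    Tier("full-native", 262_144, 1),
]

MODEL_PATH = "models/gemma-4-26b-a4b.gguf"  # hypothetical file name


def server_argv(tier: Tier, model: str = MODEL_PATH) -> list[str]:
    """Build an argument list for llama-server covering only context and slots."""
    return [
        "llama-server",
        "-m", model,
        "-c", str(tier.ctx_size),
        "-np", str(tier.parallel),
    ]


if __name__ == "__main__":
    for tier in TIERS:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(f"{tier.name:>12}: {shlex.join(server_argv(tier))}")
```

Running it prints one command per tier. Treat each one as a starting point to compare against the full commands in the write-up, not as a copy of them.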
So life is good... even better.
In my retirement I don't garden (I will if the weather ever gets better); for now, I build. This week I validated 40 tok/s on a local Gemma 4 26B build, outperforming most cloud-latency tiers. The full technical verification is hosted at my firm, Logical Zombies Ltd.
Here, if you can't wait:
https://zombies.ie/blog-2026-q2-gemma-amd.html
"I spent the morning debating the ethics of silicon-based reasoning with Claude on the cloud, but I spent the afternoon actually running it on my own metal with Gemma. Is there difference between talking and doing, alas a 26B cannot compete with a 70B or wharever the number is in the stratosphere - it is like having a runabout easy to park versus the Ferrari waiting pensively for your return..
- FYI, Ferrari is a make of car, not a girl's name.
Abstract
The article documents the local-LLM build of Gemma 4 26B A4B (Apache 2.0) running on a Sapphire Nitro+ AMD Radeon RX 9060 XT 16 GB consumer GPU under Ubuntu 24.04, with ROCm 7.2.1 and a hand-built llama.cpp HIP backend. Three configurations are verified from live server logs: 64K context at ~40 tok/s for daily multi-slot use, 131K context at ~37 tok/s for long-context work, and the model’s full native 262,144-token context at 26.74 tok/s in single-slot mode. Each tier is documented with its complete launch command, memory breakdown, and tok/s measurements taken from llama.cpp’s own timing instrumentation.
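The tok/s figures come from llama.cpp's timing output, which reports elapsed milliseconds and a token count for each request. Below is a minimal Python sketch that recomputes the generation rate from one such line. The log-line format is an assumption based on typical llama.cpp output (check it against your own server log), and the sample numbers are made up for illustration, not taken from the author's logs.

```python
# Minimal sketch: recompute generation speed from a llama.cpp-style timing line.
# ASSUMPTION: lines look like "eval time = <ms> ms / <n> tokens (...)"; verify
# against your own server log, as the wording can differ between builds.
import re

# The negative lookbehind skips "prompt eval time" lines (prompt processing),
# so only token-generation timings are matched.
EVAL_RE = re.compile(
    r"(?<!prompt )eval time\s*=\s*(?P<ms>\d+(?:\.\d+)?)\s*ms\s*/\s*"
    r"(?P<tokens>\d+)\s*(?:tokens|runs)"
)


def tokens_per_second(line: str) -> float:
    """Return generated tokens divided by elapsed seconds for one timing line."""
    match = EVAL_RE.search(line)
    if match is None:
        raise ValueError(f"no generation timing found in: {line!r}")
    elapsed_s = float(match.group("ms")) / 1000.0
    return int(match.group("tokens")) / elapsed_s


if __name__ == "__main__":
    # Made-up sample line, not taken from the author's logs.
    sample = "       eval time =    1870.00 ms /    50 tokens (   37.40 ms per token,    26.74 tokens per second)"
    print(f"{tokens_per_second(sample):.2f} tok/s")
```

With that made-up line reporting 50 tokens in 1870 ms, the function returns about 26.74 tok/s (50 / 1.87 s). This is the same arithmetic behind any tok/s figure read off the timing output.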
Enjoy!
https://zombies.ie/blog-2026-q2-gemma-amd.html