I built the world's first Chrome extension that runs LLMs entirely in-browser: WebGPU, Transformers.js, and Chrome's Prompt API
Here's something that caught my attention: /u/psgganesh built what they describe as the world's first Chrome extension that runs large language models directly in your browser. No servers, no subscriptions, just local inference. It runs models like Llama 3.2, Qwen3, and Mistral right in Chrome using WebGPU for acceleration, Transformers.js for inference, and Chrome's built-in Prompt API (rough sketches of all three below).

The best part? Models are cached offline in IndexedDB, so the extension keeps working without an internet connection. That makes it handy for quick drafts, summaries, or code help without worrying about API costs or data leaving your machine.

To be clear, this isn't about replacing GPT-4. For everyday tasks, though, a 3-billion-parameter model running locally is more than enough, and as /u/psgganesh points out, it's a good fit for organizations with strict data restrictions or anyone who wants complete privacy. The upshot: fast, private AI, right in your browser, anytime.
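For a sense of how the Transformers.js piece works, here's a minimal sketch of WebGPU inference using the `@huggingface/transformers` pipeline API. The model ID, prompt, and generation options are illustrative assumptions on my part, not the extension's actual code.

```ts
// Minimal in-browser text generation with Transformers.js on WebGPU.
import { pipeline } from "@huggingface/transformers";

// Illustrative model ID: any ONNX-converted chat model from the Hub works.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-1B-Instruct",
  { device: "webgpu", dtype: "q4f16" } // quantized weights keep the download small
);

const messages = [
  { role: "user", content: "Summarize WebGPU in one sentence." },
];

// With chat input, the pipeline returns the full transcript;
// the last message is the model's reply.
const output: any = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```

The first call downloads and caches the weights; after that, generation runs entirely on your GPU.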
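The Prompt API path is different: instead of downloading weights yourself, you prompt Gemini Nano, the model Chrome ships on-device. The API is experimental and its surface has changed across Chrome releases, so treat this as a sketch of the `LanguageModel` global available behind a flag or origin trial in recent builds; the loose type declaration is mine, since no official typings exist yet.

```ts
// Chrome's built-in Prompt API (experimental; shape varies by release).
// Declared loosely here because there are no official type definitions.
declare const LanguageModel: {
  availability(): Promise<"unavailable" | "downloadable" | "downloading" | "available">;
  create(options?: object): Promise<{
    prompt(input: string): Promise<string>;
    destroy(): void;
  }>;
};

async function draftLocally(task: string): Promise<string | null> {
  if ((await LanguageModel.availability()) === "unavailable") return null;
  const session = await LanguageModel.create(); // may trigger a one-time model download
  try {
    return await session.prompt(task); // runs Gemini Nano on-device
  } finally {
    session.destroy(); // free the session's resources
  }
}

// Usage: console.log(await draftLocally("Draft a two-line status update."));
```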
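On the offline side, the post says models are cached in IndexedDB, though it doesn't show how. Here's a generic fetch-once-then-cache pattern to illustrate the idea; the names (`openDB`, `getWeights`, the `model-cache` store) are hypothetical and not from the extension.

```ts
// Illustrative only: caching model weights in IndexedDB so they survive offline.
const DB_NAME = "model-cache"; // hypothetical database name
const STORE = "weights";       // hypothetical object store

function openDB(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => req.result.createObjectStore(STORE);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Return cached bytes if present; otherwise download once and store them.
async function getWeights(url: string): Promise<Blob> {
  const db = await openDB();
  const cached = await new Promise<Blob | undefined>((resolve, reject) => {
    const req = db.transaction(STORE).objectStore(STORE).get(url);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
  if (cached) return cached; // offline-safe: no network needed

  const blob = await (await fetch(url)).blob();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction(STORE, "readwrite");
    tx.objectStore(STORE).put(blob, url);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
  return blob;
}
```

Once the blob is in IndexedDB, later loads never touch the network, which is what makes a fully offline mode possible.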