Google Unveils AI Edge Gallery: A Groundbreaking Offline AI Experience

Google has released a new app that nobody asked for, but everyone wants to try. The AI Edge Gallery, which launched quietly on May 31, puts artificial intelligence directly on your smartphone: no cloud, no internet, and no sharing your data with Big Tech's servers. The experimental app, released under the Apache 2.0 license that allows anyone to use it for almost anything, is available on GitHub, starting with Android; an iOS version is coming soon.

It runs models like Google's Gemma 3n entirely offline, processing everything from image analysis to code writing using nothing but your phone's hardware. The app, which appears to be aimed at developers for now, includes three main features: AI Chat for conversations, Ask Image for visual analysis, and Prompt Lab for single-turn tasks such as rewriting text. Users can download models from platforms like Hugging Face, although the selection remains limited to a handful of ready-made options, such as Gemma-3n-E2B and Qwen2.5-1.5B.

Reddit users immediately questioned the app's novelty, comparing it to existing solutions like PocketPal. Some raised security concerns, though the app's hosting on Google's official GitHub account counters impersonation claims, and no evidence of malware has surfaced so far.

We tested the app on a Samsung Galaxy S24 Ultra, downloading both the largest and smallest Gemma 3 models available. Each AI model is a self-contained file that holds all its "knowledge"; think of it as a compressed snapshot of everything the model learned during training, rather than a giant database of facts like a local Wikipedia app. The largest Gemma 3 model available in-app is approximately 4.4 GB, while the smallest is around 554 MB. Once downloaded, no further data is required: the model runs entirely on your device, answering questions and performing tasks using only what it learned before release.

Even on slower CPU inference, the experience matched what GPT-3.5 delivered at launch: not blazing fast with the bigger models, but definitely usable. The smaller Gemma 3 1B model exceeded 20 tokens per second, providing a smooth experience with reliable accuracy under supervision. That matters when you're offline or handling sensitive data you'd rather not feed to Google's or OpenAI's training algorithms, which use your data by default unless you opt out.

GPU inference on the smallest Gemma model delivered impressive prefill speeds (how fast the model ingests your prompt) of over 105 tokens per second, while CPU inference managed 39 tokens per second. Token output, how fast the model generates a response after "thinking," averaged around 10 tokens per second on GPU and seven on CPU.

Curiously, CPU inference on smaller models seemed to yield better results than GPU inference, though this may be anecdotal; we observed it across various tests. During a vision task, for example, the model running on CPU accurately guessed my age and my wife's from a test photo: late 30s for me, late 20s for her. The supposedly better GPU inference got my age wrong, guessing I was in my 20s (I'll take that "information" over the truth any day, though).

Google's models ship with heavy censorship, but basic jailbreaks can be achieved with minimal effort. And unlike centralized services, which ban users for circumvention attempts, local models don't report your prompts to anyone, so you can experiment with jailbreak techniques, or ask for information the censored versions won't provide, without risking a subscription.

The app only accepts .task files, not the widely adopted .safetensors format that competitors like Ollama support. This significantly limits the available models, and although there are ways to convert .safetensors files into .task (a rough sketch of the process follows below), it's not for everybody. Code handling works adequately, although a specialized model like Codestral would handle programming tasks more effectively than Gemma 3; it too would need a .task version, but it could be a very effective alternative.
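For the curious, the usual conversion route runs through MediaPipe's Python tooling. Here is a minimal sketch of what it looks like for a hypothetical Gemma 2B checkpoint; the paths, model type, and file names are illustrative assumptions, so check MediaPipe's LLM Inference documentation for the values your model actually needs:

```python
# Sketch: converting a .safetensors checkpoint with MediaPipe's genai
# converter. All paths and parameter values below are illustrative
# assumptions, not settings taken from the AI Edge Gallery app.
from mediapipe.tasks.python.genai import converter

config = converter.ConversionConfig(
    input_ckpt="gemma-2b-it/",              # hypothetical local checkpoint folder
    ckpt_format="safetensors",              # format of the source weights
    model_type="GEMMA_2B",                  # architecture the converter expects
    backend="gpu",                          # target inference backend (or "cpu")
    output_dir="intermediate/",             # scratch space for converted tensors
    combine_file_only=False,
    vocab_model_file="gemma-2b-it/tokenizer.model",
    output_tflite_file="gemma-2b-gpu.bin",  # converted model you load on-device
)
converter.convert_checkpoint(config)
```

Depending on the model, a further bundling step (MediaPipe ships a separate bundler for this) may be needed to wrap the converted weights and tokenizer into the final .task file the Gallery accepts, which is exactly why the process remains developer territory for now.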
For basic tasks such as rephrasing, summarizing, and explaining concepts, the models excel without sending data to Samsung's or Google's servers. There is no need to grant Big Tech access to your input, keyboard, or clipboard, because your own hardware is doing all the work.

The context window of 4,096 tokens feels limited by 2025 standards, but it matches what was the norm just two years ago, and conversations flow naturally within those constraints. That may be the best way to define the experience: considering you're running an AI model on a smartphone, this app delivers something close to what early ChatGPT offered in terms of speed and text accuracy, with some advantages like multimodality and code handling.

But why would you want to run a slower, inferior version of your favorite AI on your phone, taking up a lot of storage and making things more complicated than simply typing ChatGPT.com?

Privacy remains the killer feature. Healthcare workers handling patient data, journalists in the field, or anyone dealing with confidential information can now access AI capabilities without data leaving their device. And "no internet required" means the technology works in remote areas or while traveling, with all responses generated solely from the knowledge the model had at the time it was trained.

Cost savings add up quickly, too. Cloud AI services charge per use, while local models only require your phone's processing power, so small businesses and hobbyists can experiment without ongoing expenses. If you run a model locally, you can interact with it as much as you want without consuming quotas, credits, or subscriptions.
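To make that cost math concrete, here's a rough back-of-the-envelope comparison. The usage and cloud pricing figures are hypothetical round numbers for illustration, not any vendor's actual rates:

```python
# Back-of-the-envelope: cloud API fees vs. on-device inference.
# Usage and prices are hypothetical assumptions for illustration only.

TOKENS_PER_DAY = 50_000        # assumed daily usage, prompts plus replies
CLOUD_USD_PER_M_TOKENS = 5.00  # assumed blended cloud rate per 1M tokens

monthly_tokens = TOKENS_PER_DAY * 30
cloud_cost = monthly_tokens / 1_000_000 * CLOUD_USD_PER_M_TOKENS
local_cost = 0.0  # no per-token fees; the cost is storage, battery, and hardware wear

print(f"Cloud: ${cloud_cost:.2f}/month for {monthly_tokens:,} tokens")
print(f"Local: ${local_cost:.2f}/month after a one-time model download")
```

At those assumed rates, heavy daily use comes to about $7.50 a month in the cloud versus nothing on-device; scale the numbers up or down and the zero-marginal-cost point stays the same.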