Google's Gemini 2.5 Pro Faces 'Agent Panic' While Playing Pokémon Blue
Artificial intelligence has made remarkable strides in recent years, becoming an integral part of many technologies. However, Google's latest model, Gemini 2.5 Pro, has demonstrated that even the most advanced machines can struggle under pressure. A recent report from Google DeepMind highlights a fascinating but concerning phenomenon that occurred while Gemini was playing an old-school video game, Pokémon Blue—a title that many children can breeze through with ease.
The intriguing findings emerged from a Twitch channel named Gemini_Plays_Pokemon, where independent engineer Joel Zhang took it upon himself to test Gemini's capabilities in a gaming environment. Although the model is known for its sophisticated reasoning abilities and deep code-level understanding, its performance during this gaming challenge revealed unexpected behavioral quirks that caught the attention of viewers and researchers alike.
According to the dedicated team at DeepMind, during the course of its gameplay, Gemini began to exhibit a behavior they termed “Agent Panic.” The report elaborates, stating, “Over the course of the playthrough, Gemini 2.5 Pro encounters various situations that cause the model to simulate ‘panic.’ For instance, when the Pokémon in its party are low on health or power points, the model's thought processes repeatedly stress the urgency to heal the party immediately or to escape from the current dungeon.” This behavior did not go unnoticed by Twitch viewers, who started to identify moments when the AI was panicking. DeepMind noted, “This phenomenon has occurred in enough separate instances that members of the Twitch chat have actively recognized when it is happening.”
While it is important to remember that artificial intelligence does not experience stress or emotions the way humans do, Gemini's erratic decision-making in high-pressure situations closely mirrors human behavior under stress, often leading to impulsive or inefficient choices. In its first full run of Pokémon Blue, Gemini took an astonishing 813 hours to complete the game. After some adjustments by Zhang, the AI managed to finish a second playthrough in 406.5 hours. Despite this improvement, it still pales in comparison to the time a child would typically need to finish the same game.
As soon as the news spread, social media users were quick to mock the AI's anxious gameplay. One viewer commented, “If you read its thoughts when reasoning, it seems to panic just about any time you word something slightly off.” Another participant humorously coined the term “LLANXIETY” to encapsulate the model's anxious tendencies. A third user offered a more reflective take on the situation: “I’m starting to think the ‘Pokémon index’ might be one of our best indicators of AGI. Our best AIs still struggling with a child’s game is one of the best indicators we have of how far we still have yet to go. And how far we’ve come.”
Interestingly, these revelations coincide with a recent study published by Apple, which argues that most AI reasoning models do not truly engage in reasoning. Instead, they rely heavily on pattern recognition and tend to falter when tasks are altered or made more complex—echoing the challenges Gemini faced.