Relax, You’re Still Better at Playing ‘Doom’ Than AI

by shayaan

Despite the buzz around artificial intelligence, even the most advanced vision language models, including GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro, struggle with a decades-old challenge: playing the classic first-person shooter Doom.

A new research project introduced on Thursday, VideoGameBench, is an AI benchmark designed to test whether state-of-the-art vision language models can play and beat a series of 20 popular video games using only what they see on the screen.

“Our experience is that current state-of-the-art VLMs struggle to play video games because of high inference latency,” the researchers said. “When an agent takes a screenshot and asks the VLM which action to take, by the time the answer comes back, the game state has changed considerably and the action is no longer relevant.”
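The loop the researchers describe is simple to sketch. Below is a minimal, hypothetical Python illustration of how inference latency makes an action stale in a real-time game; `capture_screenshot` and `query_vlm` are invented stand-ins rather than part of VideoGameBench's actual code, and the two-second delay and enemy movement are made-up numbers.

```python
import time

# Hypothetical stand-ins for illustration only; these names and numbers
# are not part of VideoGameBench's published code.
def capture_screenshot(game_state: dict) -> dict:
    """Freeze the current game state as the model's observation."""
    return dict(game_state)

def query_vlm(screenshot: dict) -> dict:
    """Simulate a slow VLM call: roughly two seconds of inference latency."""
    time.sleep(2.0)
    # The model aims at the enemy position it saw in the screenshot.
    return {"action": "shoot", "target_x": screenshot["enemy_x"]}

# One tick of the agent loop in a real-time game.
game_state = {"enemy_x": 10}

observation = capture_screenshot(game_state)
action = query_vlm(observation)   # the game does not pause during inference
game_state["enemy_x"] += 8        # the enemy kept moving while the model "thought"

print(f"action aimed at x={action['target_x']}, "
      f"but the enemy is now at x={game_state['enemy_x']}")
# The chosen action targets a position that no longer exists in the game,
# which is the staleness problem the researchers describe.
```

Because the game world keeps advancing during inference, every second of model latency directly degrades how relevant the returned action is, which is why fast-paced games punish slow models hardest.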

The researchers said they chose classic Game Boy and MS-DOS games because their simpler visuals and varied input styles, such as mouse and keyboard or a game controller, test the spatial reasoning abilities of a vision language model better than text-based games do.

VideoGameBench was developed by computer scientist and AI researcher Alex Zhang. The game lineup includes classics such as Warcraft II, Age of Empires, and Prince of Persia.

According to the researchers, delayed responses are most problematic in first-person shooters such as Doom. In these fast-paced environments, an enemy visible in a screenshot may have already moved, or even reached the player, by the time the model responds.


For software developers, Doom has long served as a litmus test of what hardware and software can do. Lawnmowers, Bitcoin, and even human gut bacteria have confronted the demons of hell with varying degrees of success. Now it's AI's turn.

“What brought Doom out of the shadow of the '90s and into the modern light is not its compelling gameplay, but rather its appealing computational design,” MIT biotech researcher Lauren Ramlan previously told Decrypt. “Built on the id Tech 1 engine, the game is designed to require only the most modest setups to be played.”

In addition to struggling to understand game environments, the models often failed to perform simple in-game actions.

“We have observed frequent cases in which the agent had difficulty understanding how its actions, such as moving to the right, would translate on screen,” the researchers said. “The most consistent failure across all frontier models we tested was an inability to reliably control the mouse in games such as Civilization and Warcraft II, where precise and frequent mouse movements are essential.”

To better understand the limitations of current AI systems, VideoGameBench emphasizes the importance of evaluating their reasoning skills in environments that are both dynamic and complex.

“In contrast to extremely complicated domains such as unsolved mathematical proofs and Olympiad-level math problems, playing video games is not a superhuman reasoning task, yet models still struggle to solve them,” they said.

Edited by Andrew Hayward


