Facebook today proposed NetHack as a grand challenge for AI research, for which the company is launching a competition at the NeurIPS 2021 AI conference in Sydney, Australia. It’s Facebook’s assertion that NetHack, an ’80s video game with simple visuals that’s considered among the hardest in the world, can enable data scientists to benchmark state-of-the-art AI methods in a complex environment without the need to run experiments on a powerful computer.
Games have served as AI benchmarks for decades, but things really kicked into gear in 2013, the year Google’s DeepMind demonstrated a system that could play Pong, Breakout, Space Invaders, Seaquest, Beamrider, Enduro, and Q*bert at superhuman levels. The advancements aren’t merely improving game design, according to experts like DeepMind cofounder Demis Hassabis. Rather, they’re informing the development of systems that might one day diagnose illnesses, predict complicated protein structures, and segment CT scans.
In particular, reinforcement learning — a type of AI that can learn strategies to orchestrate large systems like manufacturing plants, traffic control systems, financial portfolios, and robots — is transitioning from research labs to highly impactful, real-world applications. For example, self-driving car companies like Wayve and Waymo are using reinforcement learning to develop the control systems for their cars. And via Microsoft’s Bonsai, Siemens is employing reinforcement learning to calibrate its CNC machines.
“Recent advances in reinforcement learning have been fueled by simulation environments such as games like StarCraft II, Dota 2, or Minecraft. However, this progress came at substantial computational costs, often requiring running thousands of GPUs in parallel for a single experiment, while also falling short of leading to … methods that can be transferred to more real-world problems outside of these games,” Facebook AI researchers Edward Grefenstette, Tim Rocktäschel, and Eric Hambro wrote in a blog post. “We need environments that are complex, highlighting shortcomings of RL, while also allowing extremely fast simulation at low computation costs.”
Facebook’s proposal follows the release of the company’s NetHack Learning Environment (NHLE), a research tool based on the original NetHack. (The NetHack Challenge is in turn based on the NHLE.) NetHack, which was first released in 1987, tasks players with descending more than 50 dungeon levels to retrieve a magical amulet, during which they must fight monsters and use wands, weapons, armor, potions, spellbooks, and other items. Levels in NetHack are procedurally generated and every game is different, which the Facebook researchers note tests the generalization limits of leading AI.
“Winning a game of NetHack requires long term planning in an incredibly unforgiving environment. Once a player’s character dies … the game starts from scratch in an entirely new dungeon,” Grefenstette, Rocktäschel, and Hambro continued. “Successfully completing the game as an expert player takes on average 25 to 50 times more steps than an average StarCraft II game, and players’ interactions with objects and the environment are extremely complex, so success often hinges on calling upon imagination to solve problems in creative or surprising ways as well as consulting external knowledge sources [such as] the official NetHack Guidebook, the NetHack Wiki, and online videos and forum discussions.”
Partial observation makes exploration in NetHack essential, and procedural generation and “permadeath” make the cost of failure significant. AI also can’t reset or interfere with the environment, which rules out the methods that underpin systems like DeepMind’s AlphaZero for chess, shogi, and Go or Uber’s Go-Explore for Montezuma’s Revenge.
“[The challenges in NetHack] range from randomized mazes to more structured challenges, like large rooms full of monsters and traps, towns and forts, and hazards such as kraken-infested waters,” Grefenstette, Rocktäschel, and Hambro said. “New ways of dealing with the ever changing observations in a stochastic and rich game world calls for the development of techniques that have a better chance of scaling to real-world settings with high degrees of variability.”
NetHack has another advantage in its lightweight architecture. A turn-based, ASCII-art world and a game engine written primarily in C capture its complexity. Importantly, NetHack forgoes all but the simplest physics and renders symbols instead of pixels, allowing AI to learn quickly without wasting computational resources on simulating dynamics or rendering observations.
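To give a rough sense of why symbols are cheaper than pixels, the sketch below compares the number of values in one text-grid observation with the number in one pixel frame. The dimensions are assumptions chosen for illustration (a classic 24-by-80 terminal and a 210-by-160 RGB Atari-style frame), not figures from Facebook's post, and the function names are hypothetical.

```python
# Rough size comparison: one text-grid observation vs. one pixel frame.
# ASSUMPTION: 24x80 terminal and 210x160 RGB frame are illustrative
# dimensions, not numbers taken from the article.

def cells_in_grid(rows: int, cols: int) -> int:
    """Number of symbol cells in a text grid."""
    return rows * cols


def values_in_frame(height: int, width: int, channels: int) -> int:
    """Number of color values in a pixel frame."""
    return height * width * channels


terminal_cells = cells_in_grid(24, 80)
frame_values = values_in_frame(210, 160, 3)
ratio = frame_values / terminal_cells

print(f"terminal cells per observation: {terminal_cells:,}")
print(f"pixel values per frame:         {frame_values:,}")
print(f"pixel frame is about {ratio:.1f}x larger")
```

Under these assumed dimensions, the pixel frame carries roughly 50 times more values per observation than the symbol grid, before counting any cost of rendering it.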
Indeed, training sophisticated machine learning models in the cloud remains prohibitively expensive. According to a recent Synced report, the University of Washington’s Grover, which is tailored for both the generation and detection of fake news, cost $25,000 to train over the course of two weeks. OpenAI racked up $256 per hour to train its GPT-2 language model, and Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.
By contrast, a single high-end graphics card is sufficient to train AI-driven NetHack agents hundreds of millions of steps a day using the TorchBeast framework, which supports further scaling by adding more graphics cards or machines. Agents can experience billions of steps in the environment in a reasonable time frame while still challenging the limits of what current techniques can achieve.
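As a back-of-the-envelope check on what "hundreds of millions of steps a day" implies, the sketch below converts a daily step count into steps per second and estimates how long a billion steps would take. The two daily rates are assumed example values bracketing that phrase, not measured throughput, and the helper names are hypothetical.

```python
# Back-of-the-envelope throughput arithmetic.
# ASSUMPTION: the daily step counts below are example values only;
# the article does not give an exact figure.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def steps_per_second(steps_per_day: float) -> float:
    """Average environment steps per second for a given daily total."""
    return steps_per_day / SECONDS_PER_DAY


def days_to_reach(target_steps: float, steps_per_day: float) -> float:
    """Days needed to accumulate target_steps at a constant daily rate."""
    return target_steps / steps_per_day


for daily in (100_000_000, 500_000_000):
    rate = steps_per_second(daily)
    days = days_to_reach(1_000_000_000, daily)
    print(f"{daily:,} steps/day = {rate:,.0f} steps/s; "
          f"1 billion steps in {days:.0f} days")
```

At an assumed 100 million steps a day, a single card would average a little over 1,100 steps per second and reach a billion steps in 10 days, which is the scale the "reasonable time frame" claim depends on.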
“[The NHLE] can train reinforcement learning agents … 15 times faster than even decade-old Atari benchmark[s]. Furthermore, NetHack can be used to test the limits of even more recent state-of-the-art deep reinforcement learning methods while running 50 to 100 times faster than challenges of comparable difficulty while providing a higher degree of complexity.”
The NHLE consists of three components: a Python interface to NetHack using the popular OpenAI Gym API, a suite of benchmark tasks, and a baseline machine learning agent. To beat the NetHack Challenge, entrants must develop AI that can reliably either win at NetHack or achieve as high a score as possible. The competition aims to yield a head-to-head comparison of different methods and new benchmarks for future research while showcasing the suitability of the NHLE as a setting for research.
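For readers unfamiliar with the Gym convention, the sketch below shows the general shape of the interaction loop such an interface implies: reset the environment, then repeatedly choose an action and receive an observation, a reward, and a done flag. It deliberately does not import the NHLE. `ToyDungeonEnv`, its actions, and its reward are hypothetical stand-ins invented for illustration, and the four-value `step` return is assumed to follow the classic Gym signature.

```python
import random


class ToyDungeonEnv:
    """Hypothetical Gym-style stand-in; NOT the real NetHack environment."""

    ACTIONS = ("descend", "wait")

    def __init__(self, depth_goal=5, death_chance=0.1, seed=0):
        self.depth_goal = depth_goal
        self.death_chance = death_chance
        self.rng = random.Random(seed)  # seeded for repeatable runs
        self.depth = 0

    def reset(self):
        # Permadeath: every new episode starts again from the top.
        self.depth = 0
        return {"depth": self.depth}

    def step(self, action):
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        # The character may die on any turn, which ends the episode.
        if self.rng.random() < self.death_chance:
            return {"depth": self.depth}, 0.0, True, {"died": True}
        reward = 0.0
        if action == "descend":
            self.depth += 1
            reward = 1.0  # toy score: one point per level descended
        done = self.depth >= self.depth_goal
        return {"depth": self.depth}, reward, done, {"died": False}


def run_episode(env, policy, max_steps=100):
    """Play one episode and return (total_reward, last_info)."""
    obs = env.reset()
    total, info = 0.0, {}
    for _ in range(max_steps):
        obs, reward, done, info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total, info


score, info = run_episode(ToyDungeonEnv(), lambda obs: "descend")
print(f"score={score}, died={info.get('died')}")
```

An entrant's agent would replace the one-line policy here. The loop itself stays the same whether the policy is a neural network or a hand-written rule set, which is in keeping with a competition that places no restrictions on how systems are built.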
There won’t be restrictions on how the systems can be trained for the NetHack Challenge, Facebook says — participants are welcome to use techniques besides machine learning if they choose. Awards will be given for (1) the best overall AI system, (2) the best AI system not using a neural network, and (3) the best AI system from an academic or independent team.
Grefenstette, Rocktäschel, and Hambro say that achieving these objectives will lay the groundwork for follow-up competitions focused on specific aspects of AI. Moreover, the NetHack Challenge might help shed light on classes of training methods and modeling approaches capable of dealing with highly varied environments and a high cost of errors, like having to restart from scratch if a character is killed by a creature.
“Many real-world and industrial problems — navigation, for example — share these characteristics. Consequently, making progress in NetHack is making progress toward reinforcement learning in a wider range of applications,” Grefenstette, Rocktäschel, and Hambro said.
Facebook’s NeurIPS 2021 NetHack Challenge will be conducted in partnership with co-organizer AIcrowd, and it’ll run from early June through October. The winners will be announced at NeurIPS in December.