Artificially intelligent software learns to defeat human gamers

–Billy Wright

FEBRUARY: Artificially intelligent software, the Deep Q-network agent, has taught itself to play 49 different video games; on many of them it then proceeded to outscore its makers, as well as a professional human games tester. The A.I. agent even developed its own game-specific strategies, using novel tricks the programmers didn’t even know existed, a research team from Google-owned company DeepMind reports in the journal Nature.

Eyes are on the Deep Q-network software as a major piece of research in the field of artificial intelligence, or A.I.

“This… is the first time that anyone has built a single general learning system that can learn directly from experience to master a wide range of challenging tasks,” says co-developer Demis Hassabis, of Google DeepMind, as reported by New Scientist.

The British artificial intelligence company, Google DeepMind, formerly DeepMind Technologies, has been operating, in the shade, since 2011, behind an unmarked London door. Their website gives little away: their goal is to “SOLVE INTELLIGENCE”, and they’re currently hiring.

Last year, Google acquired the company for a ballpark US$500 million. Reportedly, one of DeepMind’s conditions for Google was that they establish an A.I. ethics committee.

The muted whirr and murmur broke in February, with an article published in Nature, detailing the artificial neurology of a childlike A.I. agent, and its brave new experiences with videogames from a 1980s Atari console.

The agent runs on the Deep Q-network: a reinforcement-learning algorithm known as Q-learning, paired with a ‘deep’ neural network of the kind that has seen rapid recent advances. This ultimately amounts to the agent’s ability to learn and adapt, adopting appropriate and effective behaviours in completely unfamiliar environments.

This is what differentiates it from the ‘Expert’ mode in a game of Pong, or any other computer-controlled adversary in a videogame. These computerised ‘players’ are preprogrammed with sets of actions and responses, particular to the situation and mediated by a set ‘skill level’. Their design often works to simulate human-like intelligence (they make mistakes, miss the ball, get angry, or even panic and forfeit), but they are no more ‘intelligent’, or at least volitional, than a wound-up drumming monkey.

The neurology of the Deep Q-network is not an extension of the rulebook. When introduced to Space Invaders, Breakout, Ms. Pacman, and so on, the agent of the Deep Q-network is not given any instructions on how to play each game. It presses random buttons on the keyboard until something happens.

“The only information they (the system) get is the pixels (on the screen) and the game score and the goal they’ve been told is to maximise the score,” Hassabis says.

“Apart from that, they have no idea about what kind of game they are playing, what their controls do, what they’re even controlling in the game.”

“It’s a little bit like a baby opening their eyes and seeing the world for the first time.”
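
As a rough illustration of the setup Hassabis describes, here is a minimal Python sketch of an agent that sees only pixels and a score while pressing buttons at random. The `ToyGame` class, its five-pixel screen, its three buttons and its hidden rules are all invented for this example; none of it is DeepMind’s code or a real Atari game.

```python
import random

class ToyGame:
    """Hypothetical stand-in for a game: a 1x5 'screen' with one lit pixel."""
    ACTIONS = (0, 1, 2)  # button ids; the agent is not told what they do

    def __init__(self):
        self.position = 2
        self.score = 0

    def pixels(self):
        # The only observation on offer: a row of 0/1 pixel values.
        return [1 if i == self.position else 0 for i in range(5)]

    def press(self, button):
        # Hidden rules: button 0 moves left, 1 does nothing, 2 moves right.
        self.position = max(0, min(4, self.position + (button - 1)))
        if self.position == 4:  # touching the right edge scores a point
            self.score += 1
            self.position = 2   # and the lit pixel returns to the middle
        return self.pixels(), self.score

def random_play(game, steps, seed=0):
    """Press random buttons and record (button, pixels, score) each step."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        button = rng.choice(game.ACTIONS)
        pixels, score = game.press(button)
        history.append((button, pixels, score))
    return history
```

Calling `random_play(ToyGame(), 50)` returns fifty such records. A learning agent would have to work out the hidden rules from records like these alone.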


Layers of interconnected artificial neurons—much like the cellular innards of the human brain—mathematically organise what begins as sensory input. For a human, this would be the taste, light, sound, etcetera, we receive through tongues, eyeballs and the auditory meatus. As Hassabis explains, the Deep Q-network agent receives a visual-like input of a game’s onscreen pixels, as well as the game score.
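
To make the idea of layered artificial neurons a little more concrete, the sketch below passes a row of pixel values through two small layers and produces one number per button. It is a simplification under stated assumptions: the layer sizes, the random untrained weights and the helper names (`dense`, `relu`, `make_layer`, `action_values`) are invented here, and the layers are plain fully connected ones, whereas the network described in Nature uses convolutional layers on real game frames.

```python
import random

def relu(values):
    # A neuron 'fires' only when its summed input is positive.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # One layer: each neuron takes a weighted sum of every input, plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    # Random starting weights; training would adjust these over time.
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def action_values(pixels, layers):
    # Feed the pixels through each hidden layer in turn.
    activations = pixels
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    # The final layer outputs one estimated value per available button.
    weights, biases = layers[-1]
    return dense(activations, weights, biases)

rng = random.Random(0)
layers = [make_layer(5, 8, rng), make_layer(8, 3, rng)]  # 5 pixels -> 8 -> 3 buttons
values = action_values([0, 0, 1, 0, 0], layers)
best_button = max(range(len(values)), key=values.__getitem__)
```

With untrained weights, `best_button` is arbitrary; learning consists of nudging the weights until the largest output tends to belong to the button that leads to a higher score.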

Over time, the agent begins to associate certain actions at specific game states with certain consequences, ultimately relative to its score: ‘doing this is good because it means that, which means this, which lets me do that—but I should do this sparingly, or otherwise that might happen again.’

It learns through association and positive and negative reinforcement. Hours or days later, depending on the game, what started as a limp flounder against space invasion becomes thoughtful, dextrous, expert-grade defence.
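
The reinforcement described above can be sketched with tabular Q-learning, the simpler ancestor of the Deep Q-network, which stores its estimates in a table rather than in a neural network. Everything below is an assumption made for illustration: the five-position corridor, the single reward at the right-hand end, and the values chosen for `alpha` (learning rate), `gamma` (discount) and `epsilon` (how often to act at random).

```python
import random

N_STATES = 5        # positions 0..4 in a corridor; the reward sits at position 4
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    # Move within the corridor; reaching the right end scores and ends the episode.
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # occasionally press a random button
            else:
                best = max(q[state])  # otherwise take the best-looking action,
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: shift the estimate toward the reward received
            # plus the discounted value of the best action available next.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

def greedy_policy(q):
    # The preferred step in each non-terminal position.
    return [ACTIONS[max((0, 1), key=lambda i: row[i])] for row in q[:-1]]
```

After training, `greedy_policy(train())` should favour stepping right from every position, the toy equivalent of flailing turning into purposeful play.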

However, the agent performed better on some games than others. It achieved mastery at Space Invaders; but Ms. Pacman, a maze game where scoring your first few points is more complicated, was a hurdle. Quite impressively, at Breakout, which involves bouncing a ball to clear rows of blocks, it learnt to tunnel through one column of blocks and bounce the ball off the back wall, a trick which its makers hadn’t thought of.

Screenshot from Atari 2600 home version of Breakout. Source: Wikimedia.

Could this be thought of as an act of creativity? Perhaps the significance here is that it was at least an act of innovation. The agent came up with a trick that no one had thought of, which turned out to be the best way to get things done.

Ultimately, DeepMind describe this as the purpose of research like the Deep Q-network. Its neurology is a step towards eventually creating an intelligent agent that can be placed in any context, whether familiar or not, and find the fastest and most effective way to solve a posed problem.

Experts today are pointing to the sheer immensity of the amount of information in our world. Telecommunications data, the stock market, weather patterns and climate change, medical research: such bodies of data are growing at an exponential rate and have already begun to overwhelm their human operators. Artificial intelligence, with the ability to process massive quantities of new information, adapting and responding accordingly, has been put forward as the solution to this problem.

The agent of the Deep Q-network has a sole purpose: reach a higher score. It lives and performs with existential purity, and into eternity, if it has to. It will not rest, nor deviate. It learns, remembers, anticipates and moves forward constantly and it absolutely will not stop. On some games, it approaches perfection.

With that kind of dedication, maybe Deep Q and pals can help us eradicate a few diseases before one of them becomes the Singularity: a self-improving artificial genius, existing through intangible ubiquity, ghostlike and electrical, eventually achieving omnipotence, and tolerating humans until it figures out that they’re no good for it and no good for themselves.

Prometheus stole fire from the gods and gave it as a gift to man and woman. For his crime, he was chained to the mountain. And from the smithy of Hephaestus, Greek god of fire and forge, flew some winged machine. It screamed with black oil and roosted under bronze feathers like razors. Each day, the bird would fly to Prometheus and peck out his liver, only for it to grow back before next morning. Such was his perpetual torment.

But, of course, they used to say television rotted your brain, when it first became a thing.

Stay in touch: find Lapsus on Facebook.
