A trio of Stanford computer scientists have developed a deep learning model to geolocate Google Street View images, meaning it can figure out generally where a picture was taken just by looking at it.
The software is said to work well enough to beat top players in GeoGuessr, a popular online location-guessing game.
That’s not to say the academics’ model can pinpoint exactly where a street-level photo was taken. It can reliably identify the country and, a lot of the time, make a guess that lands within 15 miles of the correct location – though more often than not, its guess is further out than that.
In a preprint paper titled, “PIGEON: Predicting Image Geolocations,” Lukas Haas, Michal Skreta, and Silas Alberti describe how they developed PIGEON.
It’s an image geolocation model derived from their own pre-trained CLIP model called StreetCLIP. Technically speaking, the model is augmented with a set of semantic geocells – bounded areas of land, similar to counties or provinces, that consider region-specific details like road markings, infrastructure quality, and street signs – and ProtoNets – a technique for classification using only a few examples.
PIGEON recently competed against Trevor Rainbolt, a top ranked player of GeoGuessr known simply as Rainbolt on YouTube, and won.
The boffins in their paper claim PIGEON is the “first AI model which consistently beats human players in GeoGuessr, ranking in the top 0.01 percent of players.” Some 50 million or more people have played GeoGuessr, we’re told.
Alberti, a doctoral candidate at Stanford, told The Register, “It was kind of like our small Deep Mind competition,” a reference to Google’s claim that its DeepMind AlphaCode system can write code comparable to human programmers.
“I think that this was the first time AI beat the world’s best human at GeoGuessr,” he said, noting that Rainbolt prevailed in two previous matches with AI systems.
Geolocating images has become something of an art among open source investigators, thanks to the work of journalistic research organizations like Bellingcat. The success of PIGEON shows that it’s also a science, one that has significant privacy implications.
While PIGEON was trained to geolocate Street View images, Alberti believes this technique may make it easier to geolocate almost any image, at least outdoors. He said he and his colleagues had tried the system with image datasets that don’t include Street View images and it worked very well.
The other kind of intelligence
Alberti recounted a discussion with a representative of an open source intelligence platform who expressed interest in their geolocation technology. “We think it’s likely that our method can be applied to these scenarios too,” he said.
Asked whether this technology will make it even harder to conceal where images were captured, Alberti said that if you’re on any street, geolocation will become quite likely because there are so many telltale signs about where you are.
“I was asked the other day ‘what about if you are off the streets, somewhere in the middle of nature?'” he said. “Even there, you have a lot of signs of where you could be, like the way the leaves are, the sky, the color of the soil. These can certainly tell you what country or what region of a country you’re in, but you can probably not locate the particular town. I think interior pictures will probably remain very hard to locate.”
Alberti said one of the key reasons PIGEON works well is that it relies on OpenAI’s CLIP as a foundation model.
“Many other geolocation models previously, they just train the model from scratch or use an ImageNet-based model. But we noticed that using CLIP as a foundation model, it has just seen a lot more images, has seen a lot more small details, and is therefore much better suited to the task.”
Alberti said the use of semantic geocells proved very important because if you just predict coordinates, you tend to get poor results. “Even with CLIP as a foundation model, you’ll land in the ocean most of the time,” he said.
“We spent a lot of time optimizing these geocells, for example, making them proportionate to the density of the population in certain regions, and making them respect different administrative boundaries on multiple levels.”
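To illustrate why predicting raw coordinates can put a guess in the sea, here is a minimal Python sketch. It is not taken from the PIGEON paper: the two cities, the probabilities, and the function names are all hypothetical, and it assumes a coordinate regressor that ends up averaging its plausible answers, as models trained with squared error tend to do when uncertain.

```python
# Hypothetical example: a model is torn between two plausible cities.
candidates = {
    "Lisbon": (38.72, -9.14),      # (latitude, longitude) in degrees
    "New York": (40.71, -74.01),
}


def mean_coordinate(points):
    """Naively average latitudes and longitudes.

    Assumption: this stands in for a coordinate regressor that hedges
    between several plausible locations.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def pick_cell(probabilities):
    """A classifier over cells commits to the single most likely cell."""
    return max(probabilities, key=probabilities.get)


# Regression-style answer: a point roughly halfway across the Atlantic.
regressed = mean_coordinate(list(candidates.values()))

# Classification-style answer: one of the two real places.
chosen = pick_cell({"Lisbon": 0.55, "New York": 0.45})

print("averaged guess:", regressed)
print("cell guess:", chosen, candidates[chosen])
```

The averaged guess falls in the mid-Atlantic, far from either city, while the cell-based guess is at least a real place on land.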
Haas, Skreta, and Alberti also devised a loss function – which computes the distance between the algorithm’s output and the expected output – that reduces the prediction penalty if the predicted geocell is near the actual geocell. And they apply a meta-learning algorithm that refines location predictions within a given geocell to improve accuracy.
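One way to build a loss that goes easier on near misses is to spread the training target across nearby geocells according to great-circle distance. The Python sketch below is an illustration under assumptions, not the paper’s exact formulation: the cell centers, the `tau_km` smoothing scale, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def smoothed_targets(true_location, cell_centers, tau_km=200.0):
    """Soft labels: cells closer to the true location get more weight.

    tau_km is a hypothetical smoothing scale chosen for this example.
    """
    distances = [haversine_km(true_location, c) for c in cell_centers]
    nearest = min(distances)
    weights = [math.exp(-(d - nearest) / tau_km) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(predicted, targets):
    """Cross-entropy between predicted cell probabilities and soft labels."""
    return -sum(t * math.log(max(p, 1e-12)) for p, t in zip(predicted, targets))


# Hypothetical cell centers: Paris, Brussels, Sydney.
cells = [(48.86, 2.35), (50.85, 4.35), (-33.87, 151.21)]
targets = smoothed_targets((48.80, 2.30), cells)  # photo taken near Paris

loss_near = cross_entropy([0.1, 0.8, 0.1], targets)  # model bets on Brussels
loss_far = cross_entropy([0.1, 0.1, 0.8], targets)   # model bets on Sydney
print(targets, loss_near, loss_far)
```

With these made-up numbers, wrongly betting on Brussels costs less than wrongly betting on Sydney, which is the behavior the loss is meant to encourage.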
“That way we can sometimes match images up to like a kilometer,” said Alberti.
As Skreta noted in the Rainbolt video, PIGEON currently guesses 92 percent of countries correctly and has a median distance error of 44 km, which translates into a GeoGuessr score of 4,525. According to the research paper, the bird-themed model places about 40 percent of guesses within 25 km of the target.
Game on. ®