Google's Pixel 3 and 3 XL have some of the best cameras of the year, and paired with their impressive performance is a handful of nifty software features. One of them, called Top Shot, makes those great photos even better by helping out when you make a mistake and snap a picture just a bit too early or too late. And while we knew a little about how Top Shot worked, there were quite a few gaps in our knowledge. Fortunately for the curious, Google has just published a more technical explanation of the technology behind it.
The full details are over on Google's AI blog for your long-form technical reading pleasure, though the official explanation jumps around a bit, which can make it harder to follow than Google's other AI blog posts. If you'd prefer the simpler version, read on:
A Google Clips heritage
Top Shot's functionality is based on the same tools Google created for Google Clips. While Clips may not have been successful as a product itself, its constraints required some advanced problem solving: how do you build an autonomous camera that independently recognizes and saves only the best short video moments it sees?
For the skinny on how Google pulled off that magic, read the detailed explanation it published earlier this year, but the (exceedingly) short version is that machine learning did it, the in-vogue mechanism for solving all manner of abstract problems these days.
Photographers and video editors provided relative ratings between pairs of clips for Google Clips' training data.
A model was trained on thousands of preselected source videos with the help of professional photographers and video editors, who manually chose the better of pairs of clips to teach the model what to look for. In fact, over 50 million binary comparisons were collected for the model. Combined with the image recognition already developed for Google Photos, this let the Google Clips developers create a model that predicted "interesting" content, rated with what they called a Moment Score based on recognized subjects together with the qualities of a good clip. But that model could only run on power-hungry server hardware. The really genius part: they then trained a simpler model to imitate the performance of its server-based sibling (i.e., using one model to train another).
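To make the pairwise-comparison idea concrete, here's a minimal sketch of how a quality score can be learned from "clip A beat clip B" judgments. Everything here is illustrative: the linear model, the feature vectors, and the training loop are stand-ins for Google's far more sophisticated setup, not its actual method.

```python
import math
import random

random.seed(0)

def score(weights, features):
    """Linear 'quality score' for a clip's feature vector (illustrative)."""
    return sum(w * f for w, f in zip(weights, features))

def train_pairwise(comparisons, num_features, lr=0.1, epochs=50):
    """Each comparison is (winner_features, loser_features).
    A logistic pairwise loss pushes the winner's score above the loser's."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for winner, loser in comparisons:
            margin = score(w, winner) - score(w, loser)
            grad = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
            for i in range(num_features):
                w[i] -= lr * grad * (winner[i] - loser[i])
    return w

# Toy data: the first feature correlates with "good clip", the second is noise.
comparisons = []
for _ in range(200):
    good = [random.uniform(0.5, 1.0), random.random()]
    bad = [random.uniform(0.0, 0.5), random.random()]
    comparisons.append((good, bad))

w = train_pairwise(comparisons, num_features=2)
```

After training on enough comparisons, the learned score ranks a sharp, well-composed clip above a weak one without any rater ever assigning an absolute number, which is the whole appeal of collecting binary judgments.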
There's much more to it, but with all this information combined (plus some ongoing on-device training that recognizes more "familiar" faces and pets over time), Google Clips can assign a relative score to what it sees and decide for itself when and how to capture content.
With the little Google Clips able to tell good footage from not-so-good, it was a relatively small mental hop, skip, and a jump to adapting the overall concept to the Pixel 3, albeit in a slightly different way.
Brought to the Pixel 3
Even before you press the metaphorical shutter button on the Pixel 3, Top Shot is already at work in the background via Motion Photos. If you don't remember, that's the Pixel feature that records a short video just before and after a picture is taken. It might seem like a simple step from capturing before-and-after video to before-and-after photos, but there's much more to it.
Google says Top Shot captures up to 90 images from the 1.5 seconds before and after the shutter is pressed for comparison. When we spoke to a Google representative at the Pixel 3 launch event, we were told most of these alternate photos are still frames pulled from the Motion Photo, but a few Top Shot candidates are also saved before the video encoding process, at a higher quality and resolution than the subsequent Motion Photo video.
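The before-and-after window described above is essentially a rolling buffer: frames from before the shutter press are kept only briefly, with the oldest continuously discarded. Here's a tiny sketch of that idea; the class, method names, and frame counts (45 frames on each side of the press, 90 total, per the numbers above) are our own illustration, not Google's implementation.

```python
from collections import deque

FRAME_WINDOW = 45  # frames kept from before the shutter press (illustrative)

class ShutterBuffer:
    """Rolling capture window: recent pre-press frames plus a short
    post-press tail together form the candidate set."""

    def __init__(self, window=FRAME_WINDOW):
        self.before = deque(maxlen=window)  # oldest frames drop off automatically
        self.after = []
        self.window = window
        self.pressed = False

    def on_frame(self, frame):
        if not self.pressed:
            self.before.append(frame)
        elif len(self.after) < self.window:
            self.after.append(frame)

    def on_shutter(self):
        self.pressed = True

    def candidates(self):
        return list(self.before) + self.after

buf = ShutterBuffer()
for i in range(100):       # frames streaming in before the press
    buf.on_frame(i)
buf.on_shutter()
for i in range(100, 160):  # frames streaming in after the press
    buf.on_frame(i)
```

The `deque(maxlen=...)` does the heavy lifting: memory use stays constant no matter how long the camera runs, because only the most recent pre-press frames ever survive.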
Abstract diagram of the Top Shot capture process.
But long before it can save those pictures, Top Shot has to decide, very quickly, which ones are worth saving, and that's where the previously mentioned Google Clips work comes in. Top Shot's custom, efficient on-device model was trained on sample photos in sets of up to 90, sorting through everything captured to save the best. It discards shots that may be blurry or badly exposed, or where a subject's eyes are closed, and it tries to recognize things like smiles and other visible emotional expressions. It even factors in other data, such as information from the phone's gyroscope (already captured for other uses), to more quickly judge an alternate photo's quality.
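The filter-then-rank step might look something like the sketch below: unusable frames are thrown out entirely, the rest get a score, and the top few survive. The attribute names, weights, and two-stage structure here are our own guesses standing in for the real model's outputs, not Google's actual scoring.

```python
def frame_score(frame):
    """Combine per-frame signals into one quality score (illustrative)."""
    if frame["blurry"] or frame["eyes_closed"] or frame["badly_exposed"]:
        return None  # discard unusable frames outright
    score = frame["sharpness"]
    if frame["smiling"]:
        score += 0.5  # favor visible positive expressions
    return score

def best_alternates(frames, limit=2):
    """Filter out discarded frames, then return the ids of the top scorers."""
    scored = [(frame_score(f), f["id"]) for f in frames]
    usable = [(s, i) for s, i in scored if s is not None]
    usable.sort(reverse=True)
    return [i for _, i in usable[:limit]]

frames = [
    {"id": 0, "blurry": True,  "eyes_closed": False, "badly_exposed": False, "sharpness": 0.9, "smiling": True},
    {"id": 1, "blurry": False, "eyes_closed": False, "badly_exposed": False, "sharpness": 0.7, "smiling": True},
    {"id": 2, "blurry": False, "eyes_closed": True,  "badly_exposed": False, "sharpness": 0.8, "smiling": False},
    {"id": 3, "blurry": False, "eyes_closed": False, "badly_exposed": False, "sharpness": 0.6, "smiling": False},
]
picks = best_alternates(frames)
```

Note how the hard filters run before any ranking: a sharp photo of someone mid-blink never competes at all, which matches the behavior described above.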
Once it has identified up to two images it thinks could be better than the one you intended to capture, they're saved in HDR+ quality and tucked into the same file as the Motion Photo. Later, when you go to review those photos, the option to switch to one of the intelligently captured alternates presents itself. If you'd like, you can even manually select one of the lower-resolution, non-HDR frames, if you think it's better still.
Two quick and easy Top Shot recommendations. Who knew such simplicity was built on such complexity?
Like so many of the new features on our phones, Top Shot's photo magic is courtesy of a whole lot of advanced machine learning-driven software engineering. For as easy and practical as it is to use, there's plenty of complex machinery at work behind the scenes, and now it's a bit less mysterious to you.