Why it matters: Researchers continue to find new ways to leverage artificial intelligence and machine learning capabilities as the technologies evolve. Earlier this week, Google scientists announced the creation of Transframer, a new framework that can generate short videos from a single image input. The new technology could someday augment traditional rendering solutions, allowing developers to create virtual environments using machine learning capabilities.
The new framework’s name (and, in some ways, its concept) is a nod to another AI-based model known as Transformer. Originally introduced in 2017, Transformer is a neural network architecture that processes text by modeling the relationships between all the words in a sentence, a mechanism known as self-attention. The architecture has since been included in standard deep learning frameworks such as TensorFlow and PyTorch.
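The core idea behind Transformer, relating every word to every other word, can be illustrated with a stripped-down self-attention step. This is a toy sketch, not the full architecture: real Transformers use learned query, key, and value projections, multiple heads, and many stacked layers, all of which are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Simplified single-head self-attention: every token attends to
    every other token, weighted by similarity (illustrative only)."""
    scores = X @ X.T / np.sqrt(X.shape[-1])  # pairwise similarity scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ X                       # context-mixed representations

# Four toy "word" embeddings of dimension 3.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
out = self_attention(X)
print(out.shape)  # one context-aware vector per word
```

Each output row blends the input vectors according to how strongly that word "attends" to the others, which is what lets the model weigh a word's meaning in context.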
Just as Transformer uses language context to predict likely outputs, Transframer uses context images with similar attributes, in conjunction with a query annotation, to create short videos. The resulting videos move around the target image and render accurate perspectives despite receiving no geometric data in the original image inputs.
Transframer is a general-purpose generative framework that can handle many image and video tasks in a probabilistic setting. New work shows it excels in video prediction and view synthesis, and can generate 30s videos from a single image: https://t.co/wX3nrrYEEa 1/ pic.twitter.com/gQk6f9nZyg
— DeepMind (@DeepMind) August 15, 2022
The new technology, demonstrated using Google’s DeepMind AI platform, functions by analyzing a single context image to extract key pieces of image data and generate additional images. During this analysis, the system identifies the picture’s framing, which in turn helps it predict the picture’s surroundings.
The context images are then used to predict how the scene would appear from different angles. The model estimates the probability of additional image frames based on the data, annotations, and any other information available from the context frames.
The framework marks a notable step in video technology, generating reasonably accurate video from a very limited set of data. Transframer has also shown promising results on related benchmarks such as semantic segmentation, image classification, and optical flow prediction.
The implications for video-based industries, such as game development, are potentially huge. Current game development environments rely on core rendering techniques such as shading, texture mapping, depth of field, and ray tracing. Technologies such as Transframer could offer developers a completely new development path, using AI and machine learning to build environments while reducing the time, resources, and effort needed to create them.
Image credit: DeepMind