Self-driving-car developers initially held a similar philosophy of data maximization. They generate video from arrays of cameras inside and outside the vehicles, audio recordings from microphones, point clouds mapping objects in space from lidar and radar, diagnostic readings from vehicle parts, GPS readings, and much more.
Some assumed that the more data collected, the smarter the self-driving system could get, says Brady Wang, who studies automotive technologies at market researcher Counterpoint. But the approach didn’t always work because the volume and complexity of the data made them difficult to organize and understand, Wang says.
In more recent years, companies have started keeping only the data they believe will be specifically useful, and have also focused on organizing them well. In practical terms, an hour of data from driving through the desert on a sunny day quickly starts to look repetitive, so the value of keeping them all has come into question.
Limits aren’t entirely new. Chatham, the distinguished software engineer at Waymo, says getting access to more digital storage wasn’t simple when the company was a tiny project inside Google over a decade ago and he was a one-person team. Data that had no clear use was deleted, like recordings of failed driverless maneuvers. “If we treated storage as infinite, the costs would be astronomical,” Chatham says.
After Waymo became an independent company with significant outside investment, the project gobbled data storage more freely. For instance, when Waymo started testing the Jaguar I-Pace in late 2019, the crossover SUV came with more powerful sensors that generated a bigger stream of information—to the point that full logs for an hour’s driving equated to more than 1,100 gigabytes, enough to fill 240 DVDs. Waymo increased its storage capacity significantly at the time, and teams got less picky about what they kept, Chatham says.
More recently, Chatham’s team began setting strict quotas and asking people across the company to be more judicious. Waymo now keeps only some of its newly generated data and has begun deleting saved data as it becomes outdated compared to current technology, conditions, and priorities. Chatham says that strategy is working well. “We have to start discarding data fast as our service grows,” he says.
Waymo carried paying passengers more than 23,000 miles in California between September and November of last year, up from about 13,000 miles over a similar timeframe just six months earlier, according to disclosures to state regulators.
In some cases, data caps reflect the priorities of autonomous vehicle companies. Chatham’s team allots quarterly storage allowances, with some room for negotiation, to groups of engineers working on different tasks, such as developing AI to identify what’s around a vehicle (perception) or testing planned software updates against past rides (evaluation). Those teams decide what’s worth keeping, such as data on the actions of emergency vehicles, and an automated system filters out everything else. “That becomes a business decision,” Chatham says. “Is snow or rain data more important to the business?”
Snow has won out for now, because Waymo so far has only limited data from driving in it. “We’re keeping every piece,” Chatham says. Rain has gotten less interesting. “We’ve gotten better at rain, so we don’t need to go to infinity.” Being data-thrifty can sometimes prompt creativity or valuable discoveries, he says. Waymo learned at one point that its rain data needlessly included all the sensor readings its cars had collected while parked.
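For readers curious what a quota-plus-filter arrangement like the one Chatham describes might look like in practice, here is a minimal sketch in Python. It is not Waymo's system: the class names, tags, priority weights, and the 2,500-gigabyte allowance are all hypothetical, and the 1,100-gigabyte segment size simply echoes the hourly log figure cited above. It assumes each team ranks the kinds of data it cares about, drops recordings made while parked, and keeps the highest-priority segments that fit within its allowance.

```python
from dataclasses import dataclass

# Every name here (Segment, TeamQuota, select_segments, the tags and
# weights) is a hypothetical illustration, not Waymo's actual design.


@dataclass(frozen=True)
class Segment:
    segment_id: str
    size_gb: float
    tags: frozenset          # e.g. {"snow"}, {"rain", "emergency_vehicle"}
    parked: bool = False     # True if the car was parked while recording


@dataclass
class TeamQuota:
    team: str
    allowance_gb: float      # storage allowance for the quarter
    priorities: dict         # tag -> weight; higher weight is kept first


def select_segments(segments, quota):
    """Return (kept_ids, used_gb) for one team's quarterly allowance."""
    candidates = []
    for seg in segments:
        if seg.parked:
            continue  # parked recordings are assumed to add nothing
        # A segment scores as high as its most valued tag.
        score = max((quota.priorities.get(t, 0) for t in seg.tags), default=0)
        if score > 0:
            candidates.append((score, seg))

    # Highest priority first; among equals, smaller segments first.
    candidates.sort(key=lambda pair: (-pair[0], pair[1].size_gb))

    kept, used = [], 0.0
    for _, seg in candidates:
        if used + seg.size_gb <= quota.allowance_gb:
            kept.append(seg.segment_id)
            used += seg.size_gb
    return kept, used


# Hypothetical example: snow outranks emergency vehicles, which outrank rain.
perception = TeamQuota(
    team="perception",
    allowance_gb=2500.0,
    priorities={"snow": 3, "emergency_vehicle": 2, "rain": 1},
)
logs = [
    Segment("a", 1100.0, frozenset({"rain"})),
    Segment("b", 1100.0, frozenset({"snow"})),
    Segment("c", 1100.0, frozenset({"rain"}), parked=True),
    Segment("d", 1100.0, frozenset({"sunny_desert"})),
    Segment("e", 1100.0, frozenset({"emergency_vehicle", "rain"})),
]
kept, used = select_segments(logs, perception)
print(kept, used)
```

In this toy run the snow segment and the emergency-vehicle segment fit within the allowance, while the plain rain segment, the parked recording, and the sunny desert drive are all filtered out. A real system would presumably weigh many more signals than a handful of tags.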