Tesla CEO Elon Musk is bored of the auto industry. We’ve known that for a while, but Tesla’s recent earnings call really solidified the CEO’s wandering priorities as the executive team attempted to refocus the call on the future of Tesla being “more than a car company.”
One interesting point that Musk brought up during the call was that he believes Tesla could be the Amazon Web Services (AWS) of distributed inference cloud computing—or a fancy way of saying he wants to sell the computing resources in every one of Tesla’s cars while they’re not being driven.
Tesla’s big AI and autonomy push
Elon Musk says the future of Tesla is in robotaxis, robotics and AI, not so much cars as we know them. Networking his cars to train AI models to autonomously “drive” is seen as key to this. But Tesla has to do a lot of work to get there first.
This would allow Tesla to sell idle compute power in all the cars across its fleet to the highest bidder, making Tesla some cold hard cash with virtually no extra effort. And such networking would also feed into Musk’s dreams for a network of fully self-driving robotaxis, trained and powered by artificial intelligence.
If achievable—and as with many things Tesla, that’s a big if—it could represent a radical reinvention of the way we understand cars to exist and operate today.
Training vs Inference
Before we get into Musk’s big idea, it’s important to understand the difference between model training and inference as they apply to the nascent field of artificial intelligence—because the hardware needed is vastly different.
“Training” an AI model means feeding an algorithm curated data until it produces accurate results. The model learns the desired outcome from the data it’s fed, and that learned behavior can then be used to teach other applications how to act. For example, video footage of cars making safe unprotected left turns could help the model understand when it is, or isn’t, safe to execute that turn.
Then there’s inference, which is the term Musk used continually during the presentation to describe his big plan for an AWS-like distributed computing platform.
Hosting provider Cloudflare defines it well:
Inference is the process that a trained machine learning model uses to draw conclusions from brand-new data. An AI model capable of making inferences can do so without examples of the desired result. In other words, inference is an AI model in action.
An example of AI inference would be a self-driving car that is capable of recognizing a stop sign, even on a road it has never driven on before. The process of identifying this stop sign in a new context is inference.
To put it in simpler terms: Tesla is writing the dictionary (training the model) in its data center, and then Tesla’s cars would be looking up a word (running an inference task).
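To make that split concrete, here’s a minimal, hypothetical sketch (a toy model, not Tesla’s actual stack): the expensive loop that fits the parameters is training, and the single cheap function call at the end is inference on data the model has never seen.

```python
import numpy as np

# --- Training: the expensive part, done once in a data center ---
# Toy dataset: learn y = 2x + 1 from noisy examples.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 1))
y = 2 * x + 1 + rng.normal(scale=0.05, size=(1000, 1))

w, b = 0.0, 0.0                        # model parameters
for _ in range(2000):                  # gradient-descent loop
    pred = w * x + b
    grad_w = np.mean(2 * (pred - y) * x)
    grad_b = np.mean(2 * (pred - y))
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

# --- Inference: the cheap part, done on-device for new inputs ---
def infer(new_x: float) -> float:
    """Apply the already-trained parameters to an input the model hasn't seen."""
    return w * new_x + b

print(infer(0.25))                     # roughly 1.5
```

Training is the compute hog; inference is comparatively lightweight, which is why Musk’s pitch is about selling the latter.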
Musk’s Big Plan
Tesla designed its FSD computer to perform inference for its self-driving AI model. That means inference is done locally on every Autopilot-equipped Tesla on the road today, enabling the cars to apply what the model learned in training and make informed decisions about how to brake, steer, and accelerate based on their surroundings, so long as those surroundings match what the model was trained to look for.
Musk’s big idea is that Teslas around the world equipped with certain hardware versions (it isn’t clear at this time whether that means HW3, HW4, or HW5) could be used as a distributed cluster of computing resources to run inference tasks, essentially using the in-car computer to run an AI model that Tesla has already trained.
Here’s what Musk had to say during that first-quarter earnings call:
“I think there’s also some potential here for an AWS element down the road. We’ve got very powerful inference because we’ve got a Hardware 3 in the cars, but now all cars are being made with Hardware 4,” said Musk. “Hardware 5 is pretty much designed and should be in cars, hopefully, toward the end of next year. And there’s a potential when the car is not moving to actually run distributed inference.”
So, kind of like AWS, but distributed inference. Like it takes a lot of computers to train an AI model, but many orders of magnitude less compute to run it. So, if you can imagine future, perhaps where there’s a fleet of 100 million Teslas, and on average, they’ve got like maybe a kilowatt of inference compute. That’s 100 gigawatts of inference compute distributed all around the world.
Musk’s idea is to use up to a kilowatt of power from the car’s battery to supply power to the car’s onboard inference computer, which is part of the car’s Full Self-Driving hardware suite.
To put that amount of power into perspective, a kilowatt is roughly the output of the oversized power supply you’d drop into a modern gaming PC running Nvidia’s top-of-the-line RTX 4090 graphics card, with some headroom to spare.
To Tesla, this is a win-win. The biggest plus is that it won’t cost the company a dime to build out or maintain the hardware.
As Musk outright said during the quarterly earnings call, “the [capital expenses are] shared by the entire world.” That means anyone who buys a Tesla has already paid for the hardware that the automaker plans to use for this purpose. Plus, Tesla doesn’t have to maintain a central data center where power and cooling will cost them money.
More from Musk on that call:
[Amazon] found that they had excess compute because the compute needs would spike to extreme levels for brief periods of the year and then they had idle compute for the rest of the year. So, then what should they do to pull that excess compute for the rest of the year?
Monetize it. It seems like kind of a no-brainer to say, “OK, if we’ve got millions and then tens of millions of vehicles out there where the computers are idle most of the time that we might well have them do something useful.”
And then I mean if you get like to the 100 million vehicle level, which I think we will, at some point, and you’ve got a kilowatt of useable compute and maybe your own Hardware 6 or 7 by that time. Then I think you could have on the order of 100 gigawatts of useful compute, which might be more than anyone, more than any company.
“Everyone owns a small chunk,” said Musk. “And they get a small profit out of it, maybe.”
All Hardware is Not Created Equal
AI, at a foundational level, is based on mathematics. Training and inference use different numerical formats, two of which are integer (Int) and floating point (FP). Essentially, an integer can only be a whole number, whereas a float can carry a fractional part. That means you can store the number 3.1415 as a float, but only 3 natively as an integer. Generally, a float is used when more precision is needed.
Both Int and FP values are stored in a fixed number of bits (for example, 8-bit, 16-bit, or 32-bit). The wider the format, the more precisely a number can be stored, with the caveat that more compute power is needed for both training and inference as precision grows. Numerical formats also have different efficiency trade-offs depending on what type of calculations are being performed; generally, integer operations can be calculated more quickly than floating-point ones.
Different inference tasks may require different levels of precision. A model where accuracy is paramount and higher latency can be tolerated might run inference with a 32-bit floating-point value (FP32 for short), accepting the bottlenecks and extra latency that come with it. Another model may need near-real-time results and trade away some precision for speed; that’s where a trained model can be quantized (its high-precision values converted to lower-precision ones) so inference runs on a smaller float, or even an integer format, to save on resources and time.
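As a rough illustration of what quantization looks like in practice (a simplified, generic sketch, not Tesla’s pipeline), here’s how FP32 weights can be mapped onto Int8 with a single scale factor, trading a small amount of precision for much cheaper math:

```python
import numpy as np

# Pretend these are trained FP32 model weights.
weights_fp32 = np.array([0.8125, -1.9, 0.0031, 2.45, -0.67], dtype=np.float32)

# Symmetric quantization: map the largest magnitude onto the Int8 range [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# At inference time the cheap Int8 codes are used; dequantize to see what was lost.
recovered = weights_int8.astype(np.float32) * scale
print(weights_int8)                      # Int8 codes, e.g. [42, -98, 0, 127, -35]
print(np.abs(recovered - weights_fp32))  # small rounding error per weight
```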
So what does all of this mean, and why is it relevant to Tesla’s AWS-like rental model?
Different hardware can support different native numerical value processing. For example, Nvidia’s H100 GPUs recently added native support for FP8, whereas its predecessor, the A100, did not have this native support.
This means that if a potential customer used an FP8 data type for inference and wanted to rent out hardware from AWS, they wouldn’t pay for an EC2 P4 instance because these clusters use a Nvidia A100 GPU. They would rent an EC2 P5 instance that uses H100 GPUs instead.
Tesla is no stranger to these GPUs. In fact, it has a giant compute cluster being built out at the Buffalo Gigafactory to train the AI model that every Tesla on the road uses when Autopilot is activated. Musk said during the quarterly earnings call that Tesla’s training cluster currently uses 35,000 of Nvidia’s $40,000 H100 GPUs, and he expects that number to swell to around 85,000 by the end of 2024, meaning roughly $3.4 billion worth of GPUs will be used in model training.
Tesla’s cars, by contrast, use a significantly cheaper and less compute-intensive solution to run inference on the model those GPUs trained.
The point here is that the relevant hardware can be expensive and that Tesla’s hardware may not be cut out for every general inference job.
Musk says that in-car inference is performed using Int8 today. It’s not clear if Tesla’s hardware is designed to efficiently compute other numerical formats, or if that may be an improvement added in its upcoming Hardware version 5 (HW5), which is expected to be installed in cars shipping at the end of 2025.
Tesla hasn’t publicly stated the capabilities of HW4. However, based on what’s known about its onboard hardware and data pulled by notorious Tesla hacker GreenTheOnly, it’s estimated that HW4 can perform 245 TOPS of integer-based operations (up from 144 TOPS in HW3) while drawing about 100 watts of power.
Comparatively, a modern Nvidia RTX 4060 GPU can perform 242 TOPS of Int8 calculations with a maximum power draw of 115 watts, and an Nvidia H100 can deliver between 3,958 and 7,916 TOPS of Int8 (depending on form factor) at about 700 watts.
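Using the figures above, a quick back-of-the-envelope comparison of efficiency (TOPS per watt) shows roughly where HW4 would sit. These are published peak numbers, not measured workloads, so treat the ratios as illustrative only:

```python
# Peak Int8 throughput vs. power draw, using the figures quoted above.
chips = {
    "Tesla HW4 (est.)":      (245, 100),    # TOPS, watts
    "Nvidia RTX 4060":       (242, 115),
    "Nvidia H100 (low end)": (3958, 700),
}

for name, (tops, watts) in chips.items():
    print(f"{name}: {tops / watts:.1f} TOPS per watt")
# Roughly 2.5, 2.1, and 5.7 TOPS/W respectively.
```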
Home Networking Just Isn’t The Same
Another thing Tesla will have to consider is how it plans to address the networking side of this service.
Unlike a data center, which can run low-latency, multi-gigabit fiber connections between its servers and the internet, Tesla’s vehicles connect to the internet wirelessly, either through a cellular connection when on the road or over WiFi when parked.
As you can imagine, cellular coverage can be hit or miss depending on a multitude of factors, as can speed. We’re going to assume that Tesla has taken this into account and won’t be performing this AWS-like service over a cellular connection, especially since Musk specified that this inference computing wouldn’t happen while the car is moving.
The Model 3 currently supports the 802.11ac WiFi standard, which has a maximum theoretical speed of about 1,300 megabits per second. Real-world speeds are significantly slower and depend on a number of environmental and technical factors; Forbes reports that the fastest recorded speeds are around 720 Mbps. Compared to AWS’s EC2 backbone of 10 gigabits per second, that feels a bit slow, not to mention the unknown of what happens past the demarcation point, where the internet service provider takes over.
If the customer uses the Musk-adjacent Starlink internet, the ISP says that its users typically get “download speeds between 25 and 220 Mbps” and upload speeds “between 5 and 20 Mbps.” Given that the FCC recently decreed that high-speed broadband should be reclassified to a minimum of 100 Mbps down and 20 Mbps up, Starlink’s internet speeds may be a bit slow for data center-level inference.
Then there’s latency, or the time it takes for information to travel from the source computer to its destination over a network (including the internet). Verizon says its Fios service can achieve as little as 6.95 milliseconds of latency, whereas Starlink quotes between 25 and 60 ms for fixed connections, with remote locations experiencing even more. Comparatively, Verizon says its 5G cellular service has about 10 ms of latency.
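For a sense of how much those link speeds matter, here’s a rough, hypothetical calculation of how long it would take to move a 1 GB inference payload over each connection. The payload size is an assumption for illustration, and protocol overhead and latency are ignored, so real-world times would be worse:

```python
# Time to move 1 GB (8,000 megabits) at the speeds cited above.
payload_megabits = 1_000 * 8          # 1 GB expressed in megabits

links_mbps = {
    "AWS EC2 backbone (10 Gbps)":       10_000,
    "802.11ac WiFi (fast real-world)":  720,
    "Starlink upload (upper bound)":    20,
}

for name, mbps in links_mbps.items():
    print(f"{name}: {payload_megabits / mbps:.1f} seconds")
# Roughly 0.8 s, 11 s, and 400 s respectively.
```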
Moving large amounts of data back and forth could be problematic, but it sounds like Tesla isn’t aiming for intensive workloads that require tons of data throughput per job.
Ashok Elluswamy, Tesla’s director of Autopilot software, says Tesla could focus on batch document workloads sent its way by large language model companies (think ChatGPT) that rent the distributed computing service to “chunk through” the work, which would ideally require minimal throughput.
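There’s no public API for any of this, but conceptually the kind of batch job Elluswamy describes might look something like the hypothetical sketch below: a large pile of documents gets split into chunks, each chunk is handed to whichever parked cars are idle, and only the small per-chunk results travel back over the car’s limited uplink. Every name here (the nodes, the `summarize` stand-in) is made up for illustration.

```python
from typing import Callable, List

def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a batch of documents into fixed-size chunks for idle nodes."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_batch(documents: List[str], idle_nodes: List[str],
              summarize: Callable[[str], str]) -> dict:
    """Round-robin the chunks across whatever nodes are idle right now.

    `summarize` stands in for whatever trained model the renter wants run;
    in reality each node would run it locally and upload only the results.
    """
    results: dict = {}
    for i, batch in enumerate(chunk(documents, size=4)):
        node = idle_nodes[i % len(idle_nodes)]          # pick an idle car
        results[node] = results.get(node, []) + [summarize(doc) for doc in batch]
    return results

# Toy usage: 10 "documents" spread across 3 parked cars.
docs = [f"document {n}" for n in range(10)]
out = run_batch(docs, idle_nodes=["car-A", "car-B", "car-C"],
                summarize=lambda d: d.upper())
print(out)
```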
What’s Next?
All of Tesla’s ambitious dreams seem plausible, possible, and (frankly) groundbreaking for the auto industry. This could bring in additional revenue streams for Tesla, bolstering its earnings while new car sales are hurting, and help solidify Musk’s claims that Tesla isn’t a car company but a tech company that just so happens to build cars. However, limitations with the current hardware and networking in Tesla’s vehicles make things less cut-and-dried than Musk and his team made them seem on the earnings call.
It’s clear that Tesla’s earnings call was attempting to lift the brand by showing that Tesla is more than a car company, and Musk is trying to show his value to the automaker amid a shareholder vote for his $56 billion payday.
Tesla’s value is intrinsically linked to being a disruptor of the auto industry. First it was EVs and charging; now it’s AI. But Tesla still hasn’t solved self-driving, which has analysts a bit wary of the company’s valuation (especially as Tesla attempted to place its cheap $25,000 EV on the back burner). So is distributed computing another way for Tesla to generate revenue?
Tesla isn’t the first company to think of this distributed computing method for AI purposes. Uber released Petastorm in 2018, and there’s Petals and other similar projects. From a non-AI perspective, distributed computing has existed for ages (think Folding@Home or any form of cryptocurrency mining). That’s also not to say it’s a bad idea. It might not hurt to give it a shot if Tesla has access to the extra computing resources, the service works well, and it earns owners money.
Or maybe, just maybe, Tesla could use this excess computing to figure out how to fix its auto wipers.