Brains vs. computers: Efficiency comparison
The bitter lesson of AI implies that progress in artificial intelligence can be attributed almost entirely to increases in compute power. In light of this observation, it would be interesting to make an up-to-date comparison of the raw compute power of the human brain and of commercially available computers.
These estimates are very rough. I didn't bother narrowing the margin of error below one order of magnitude, and I guess that's not even possible given our limited understanding of the human brain. The numbers presented here are therefore good enough only for measuring differences on a logarithmic scale, i.e. in orders of magnitude.
Human brain
Memory
The human brain contains about 86G neurons (I will use the standard K/M/G/T/P prefixes to avoid confusion between the short and long scales). A typical neuron has thousands of synapses. The total number of synapses in the brain is estimated at between 100T and 500T. Every synapse has several kinds of short-term and long-term memory, so it holds at least several bytes. The total raw memory capacity of the brain is therefore likely to be at least 1 PB.
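The arithmetic behind this estimate fits in a few lines of Python. This is only a sketch: the synapse counts are the ones quoted above, while the figure of 4 bytes per synapse is my stand-in for "several bytes" and is an assumption, not a measurement.

```python
# Back-of-envelope brain memory estimate (order of magnitude only).
SYNAPSES_LOW = 100e12    # 100T synapses, lower estimate from the text
SYNAPSES_HIGH = 500e12   # 500T synapses, upper estimate from the text
BYTES_PER_SYNAPSE = 4    # ASSUMPTION: "several bytes" taken as 4


def brain_memory_pb(synapses, bytes_per_synapse=BYTES_PER_SYNAPSE):
    """Return raw memory capacity in petabytes (1 PB = 1e15 bytes)."""
    return synapses * bytes_per_synapse / 1e15


memory_low_pb = brain_memory_pb(SYNAPSES_LOW)
memory_high_pb = brain_memory_pb(SYNAPSES_HIGH)
print(f"{memory_low_pb:.1f} PB to {memory_high_pb:.1f} PB")
```

With these inputs the range brackets 1 PB, which is all the precision an order-of-magnitude comparison needs.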
Compute
Computation in the brain is mostly tied to synapses in one way or another. Synapses aren't just dumb plugs connecting neurons. They are tiny molecular circuits that perform some amount of computation, for example desensitization and plasticity. It is not known how much computation synapses perform, but it's certainly at least several floating-point operations per spike. I have heard wildly higher estimates for molecular computation, but let's use this conservative one. Spiking patterns vary, but it is fairly safe to assume that the average firing rate is at least 10Hz. Putting this together with the estimate of 100-500T synapses in the brain, we can roughly estimate the raw compute power of the human brain to be at least 10 PFLOPS.
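The same estimate as a quick Python sketch. The synapse counts and the 10Hz firing rate come from the text; the value of 5 floating-point operations per spike is an assumed stand-in for "several" and could easily be off by a factor of a few.

```python
# Back-of-envelope brain compute estimate (order of magnitude only).
SYNAPSES_LOW = 100e12     # 100T synapses, lower estimate from the text
SYNAPSES_HIGH = 500e12    # 500T synapses, upper estimate from the text
FIRING_RATE_HZ = 10       # average firing rate, lower bound from the text
FLOPS_PER_SPIKE = 5       # ASSUMPTION: "several" operations taken as 5


def brain_compute_pflops(synapses,
                         rate_hz=FIRING_RATE_HZ,
                         flops_per_spike=FLOPS_PER_SPIKE):
    """Return synaptic compute in PFLOPS (1 PFLOPS = 1e15 FLOP/s)."""
    return synapses * rate_hz * flops_per_spike / 1e15


compute_low_pflops = brain_compute_pflops(SYNAPSES_LOW)
compute_high_pflops = brain_compute_pflops(SYNAPSES_HIGH)
print(f"{compute_low_pflops:.0f} to {compute_high_pflops:.0f} PFLOPS")
```

The resulting range sits around 10 PFLOPS, so the figure used in the rest of the comparison is the right order of magnitude under these assumptions.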
Bandwidth
Since synapses have to be in the vicinity of the axon supplying them, the brain effectively has a kind of locality of reference. It is similar to a cache line or disk block in computers, but synapses afford a degree of freedom in local addressing. We will measure bandwidth in firing synapses rather than firing axons, for the same reason bandwidth in computers is measured in B/s rather than in blocks per second. Every spike is worth several bits, assuming the timing of spikes is somewhat important. Global (axon-mediated) bandwidth is therefore at least 1 PB/s. This does not include synapse-local and intra-neuron bandwidth.
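A rough sketch of the bandwidth arithmetic, reusing the synapse counts and the 10Hz firing rate from the previous sections. The value of 4 bits per spike is an assumption standing in for "several bits".

```python
# Back-of-envelope brain bandwidth estimate (order of magnitude only).
SYNAPSES_LOW = 100e12     # 100T synapses, lower estimate from the text
SYNAPSES_HIGH = 500e12    # 500T synapses, upper estimate from the text
FIRING_RATE_HZ = 10       # average firing rate, lower bound from the text
BITS_PER_SPIKE = 4        # ASSUMPTION: "several bits" taken as 4


def brain_bandwidth_pb_per_s(synapses,
                             rate_hz=FIRING_RATE_HZ,
                             bits_per_spike=BITS_PER_SPIKE):
    """Return axon-mediated bandwidth in PB/s, counted per firing synapse."""
    bits_per_second = synapses * rate_hz * bits_per_spike
    return bits_per_second / 8 / 1e15


bandwidth_low = brain_bandwidth_pb_per_s(SYNAPSES_LOW)
bandwidth_high = brain_bandwidth_pb_per_s(SYNAPSES_HIGH)
print(f"{bandwidth_low:.1f} to {bandwidth_high:.1f} PB/s")
```

As with memory and compute, the range lands on either side of 1 PB/s.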
Computers
Computers come in many sizes. To make the comparison fair, let's consider only the performance that fits in the brain's 20W power envelope.
Memory
DRAM specs rarely mention power draw. Crucial's site caps it at 3W per 8GB. We could probably cram 64GB into 10W by picking from the latest modules. SSDs consume negligible power at rest, so the only limit is cost. The largest reasonably priced SSDs can store 10 TB.
Compute
Tensor cores on Nvidia H100 are spec'd for about 10 TFLOPS per watt (FP8). If we spend 10W of our power budget on compute, that's 100 TFLOPS.
Bandwidth
Memory bandwidth varies by type of memory (L1-L3 cache, HBM, SSD). For high-locality models, L2 cache throughput is the relevant figure. Assuming the above-mentioned H100 has cache fast enough to keep its tensor cores busy, we can estimate 100 TB/s from L2 cache within a 10W power budget.
The problem with computers, however, is that they have relatively few processing units operating at high frequency, so they end up being limited by the speed of fetching weights from memory. They cheat by reusing weights, which reduces neural networks to matrix-matrix multiplications, which cost N³ in computations but only N² in RAM accesses. If weight reuse is not possible (as in the brain), bandwidth is limited by HBM or even SSD, which are unlikely to provide more than 1 TB/s or 1 GB/s respectively within a 10W budget.
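To see why weight reuse matters so much, here is a small Python sketch comparing the operations performed per memory access with and without reuse, and the compute ceiling that memory bandwidth imposes when every weight is used only once. The layer width of 4096, the 1 byte per weight (FP8) and the 2 operations per weight (one multiply-add) are illustrative assumptions; the 1 TB/s and 1 GB/s figures are the ones from the text.

```python
def matmul_intensity(n):
    """FLOPs per element accessed for an N x N by N x N matrix product."""
    flops = 2 * n**3          # N^3 multiply-adds, counted as 2 FLOPs each
    accesses = 3 * n**2       # read two N x N inputs, write one N x N output
    return flops / accesses


def matvec_intensity(n):
    """FLOPs per element accessed for an N x N matrix times a vector,
    i.e. every weight is fetched and used exactly once."""
    flops = 2 * n**2
    accesses = n**2 + 2 * n   # read matrix and input vector, write output
    return flops / accesses


BYTES_PER_WEIGHT = 1          # ASSUMPTION: FP8 weights
FLOPS_PER_WEIGHT = 2          # ASSUMPTION: one multiply-add per fetched weight


def bandwidth_bound_tflops(bytes_per_second):
    """Compute ceiling in TFLOPS when weights cannot be reused."""
    weights_per_second = bytes_per_second / BYTES_PER_WEIGHT
    return weights_per_second * FLOPS_PER_WEIGHT / 1e12


n = 4096                      # hypothetical layer width
print(f"matrix-matrix: {matmul_intensity(n):.0f} FLOPs per access")
print(f"matrix-vector: {matvec_intensity(n):.2f} FLOPs per access")

hbm_tflops = bandwidth_bound_tflops(1e12)   # 1 TB/s HBM
ssd_tflops = bandwidth_bound_tflops(1e9)    # 1 GB/s SSD
print(f"HBM-bound: {hbm_tflops} TFLOPS, SSD-bound: {ssd_tflops} TFLOPS")
```

Under these assumptions, a workload without weight reuse tops out at a couple of TFLOPS from HBM, far below the 100 TFLOPS the tensor cores could deliver in the same power budget.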
Comparison
The brain wins in every category, by one to several orders of magnitude:
- Memory: 100x (SSD) to 15,000x (RAM)
- Compute: 100x
- Bandwidth: 10x (L2 cache) through 1,000x (HBM) to 1,000,000x (SSD)
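These ratios can be recomputed directly from the estimates in the sections above. The sketch below takes the lower-bound brain figures (1 PB, 10 PFLOPS, 1 PB/s) and the computer figures quoted earlier; the dictionary and function names are only for illustration.

```python
# Brain estimates from the sections above (lower bounds).
BRAIN_MEMORY_BYTES = 1e15        # 1 PB
BRAIN_COMPUTE_FLOPS = 10e15      # 10 PFLOPS
BRAIN_BANDWIDTH_BPS = 1e15       # 1 PB/s

# Computer estimates within the power budget, as quoted in the text.
COMPUTER_MEMORY_BYTES = {"SSD": 10e12, "RAM": 64e9}
COMPUTER_COMPUTE_FLOPS = {"tensor cores": 100e12}
COMPUTER_BANDWIDTH_BPS = {"L2 cache": 100e12, "HBM": 1e12, "SSD": 1e9}


def ratios(brain_value, computer_values):
    """Return how many times the brain figure exceeds each computer figure."""
    return {name: brain_value / value for name, value in computer_values.items()}


memory_ratios = ratios(BRAIN_MEMORY_BYTES, COMPUTER_MEMORY_BYTES)
compute_ratios = ratios(BRAIN_COMPUTE_FLOPS, COMPUTER_COMPUTE_FLOPS)
bandwidth_ratios = ratios(BRAIN_BANDWIDTH_BPS, COMPUTER_BANDWIDTH_BPS)

for category, result in [("Memory", memory_ratios),
                         ("Compute", compute_ratios),
                         ("Bandwidth", bandwidth_ratios)]:
    for name, ratio in result.items():
        print(f"{category} vs {name}: {ratio:,.0f}x")
```

Since the inputs are only good to an order of magnitude, the printed ratios should be read the same way.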
Since increases in hardware performance are slow, and getting slower as we approach physical limits, it will probably take several decades for computers to catch up with the human brain. In the meantime, it is possible to approximate the raw performance of the human brain by wasting about 20kW on a GPU cluster.