AI projects are on pace to take over most of the world’s computing
OpenAI has determined that the computing used in the largest Artificial Intelligence projects has doubled about every 3.4 months since 2012. That is an increase of 300,000 times since 2012, versus 12 times had it followed a Moore’s law doubling every 18 months.
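The gap between the two growth rates can be checked with a few lines of arithmetic. This is a minimal sketch, not OpenAI's own calculation: the 62-month span is an assumption chosen to cover 2012 to the time of the analysis, and the 3.43-month figure is the best-fit doubling time quoted later in this article.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Total multiplicative growth after a period of steady doubling."""
    return 2.0 ** (elapsed_months / doubling_months)

# Assumption: about 62 months between the earliest and latest results.
ELAPSED_MONTHS = 62.0

ai_growth = growth_factor(ELAPSED_MONTHS, 3.43)      # best-fit AI doubling time
moore_growth = growth_factor(ELAPSED_MONTHS, 18.0)   # Moore's law doubling time

print(f"AI compute growth:  ~{ai_growth:,.0f}x")
print(f"Moore's law growth: ~{moore_growth:.0f}x")
```

Under these assumptions the two results land close to the 300,000 times and 12 times figures above. The exact values shift with the span chosen.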
The largest AI projects now use computing resources that cost in the single-digit millions of dollars. The world has a total computer hardware budget of about $1 trillion per year. In terms of time, AI is about halfway to taking over a majority of the world’s hardware. Another 300,000-fold increase would take the largest AI projects from single-digit millions to hundreds of billions of dollars, which is a large share of the world’s computing resources. If dedicated artificial intelligence clouds grew to about 30% of overall world hardware, AI projects would need those resources around 2025. At the current doubling rate, AI would then need only about another 3 months to require over 50% of the world’s hardware.
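The projection above is a simple extrapolation, and it can be sketched as follows. The starting cost of $1 million (the low end of "single-digit millions") and the constant 3.43-month doubling time are assumptions for illustration. The $1 trillion budget and the 300,000 times factor come from the text.

```python
import math

def months_to_grow(factor: float, doubling_months: float) -> float:
    """Months of steady doubling needed to grow by the given factor."""
    return math.log2(factor) * doubling_months

DOUBLING_MONTHS = 3.43          # assumed to hold constant
WORLD_HARDWARE_BUDGET = 1e12    # dollars per year, from the text
START_COST = 1e6                # assumption: low end of single-digit millions
FACTOR = 300_000

months_needed = months_to_grow(FACTOR, DOUBLING_MONTHS)
projected_cost = START_COST * FACTOR
share = projected_cost / WORLD_HARDWARE_BUDGET

# Extra time to go from a 30% share to a 50% share of world hardware.
extra_months = months_to_grow(0.5 / 0.3, DOUBLING_MONTHS)

print(f"Months for another {FACTOR:,}x: {months_needed:.1f}")
print(f"Projected cost: ${projected_cost:,.0f} ({share:.0%} of world budget)")
print(f"Months from 30% to 50%: {extra_months:.1f}")
```

With these inputs the second 300,000 times increase takes a little over five years, and the step from 30% to 50% takes under three months, which matches the "another 3 months" estimate. A higher starting cost would bring the crossover forward.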
AI was using GPU chips, which can improve faster than Moore’s law, and has shifted to custom ASIC-like chips, which can be even more energy efficient. AI has also used 16-bit precision chips, which can be made faster than higher-precision hardware. Hyper-efficient AI chips, hardware and systems, along with quantum computer acceleration, could delay the world hardware takeover by a few years. The hardware takeover would not happen if the value produced by AI systems did not justify the additional resources.
Above: The OpenAI chart shows the total amount of compute, in petaflop/s-days, that was used to train selected results that are relatively well known, used a lot of compute for their time, and gave enough information to estimate the compute used. A petaflop/s-day (pfs-day) consists of performing 10^15 neural net operations per second for one day, or a total of about 10^20 operations. The compute-time product serves as a mental convenience, similar to kW-hr for energy. OpenAI does not measure peak theoretical FLOPS of the hardware but instead tries to estimate the number of actual operations performed.
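The unit definition can be made concrete with a short conversion. This is an illustrative sketch; the function name `pfs_days` is hypothetical and not from OpenAI.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds
OPS_PER_SECOND_AT_1_PFLOPS = 10**15   # one petaflop/s

# One pfs-day: 10^15 operations per second sustained for a full day.
OPS_PER_PFS_DAY = OPS_PER_SECOND_AT_1_PFLOPS * SECONDS_PER_DAY

def pfs_days(total_operations: float) -> float:
    """Convert a raw operation count into petaflop/s-days."""
    return total_operations / OPS_PER_PFS_DAY

print(f"Operations in one pfs-day: {OPS_PER_PFS_DAY:.3e}")
print(f"1e21 operations = {pfs_days(1e21):.1f} pfs-days")
```

The exact product is 8.64 × 10^19 operations, which is where the rounded figure of about 10^20 comes from.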
OpenAI calculates that the doubling time for the line of best fit shown is 3.43 months.
OpenAI identified four eras of AI projects
Looking at the graph, we can roughly see four distinct eras:
* Before 2012: It was uncommon to use GPUs for ML, making any of the results in the graph difficult to achieve.
* 2012 to 2014: Infrastructure to train on many GPUs was uncommon, so most results used 1-8 GPUs rated at 1-2 TFLOPS for a total of 0.001-0.1 pfs-days.
* 2014 to 2016: Large-scale results used 10-100 GPUs rated at 5-10 TFLOPS, resulting in 0.1-10 pfs-days. Diminishing returns on data parallelism meant that larger training runs had limited value.
* 2016 to 2017: Approaches that allow greater algorithmic parallelism, such as huge batch sizes, architecture search, and expert iteration, along with specialized hardware such as TPUs and faster interconnects, have greatly increased these limits, at least for some applications.
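The pfs-day ranges quoted for each era follow from GPU count, per-GPU throughput and training time. A rough sketch of that estimate is below. The training durations and the `utilization` parameter are hypothetical inputs chosen for illustration, since the text gives only GPU counts and TFLOPS ratings.

```python
def training_pfs_days(num_gpus: int, tflops_per_gpu: float,
                      days: float, utilization: float = 1.0) -> float:
    """Estimate training compute in petaflop/s-days.

    1 pfs-day equals 1000 TFLOPS sustained for one day. `utilization`
    is the assumed fraction of rated throughput actually achieved.
    """
    return num_gpus * tflops_per_gpu * utilization * days / 1000.0

# Hypothetical runs at the edges of the ranges quoted above.
small_2012 = training_pfs_days(num_gpus=1, tflops_per_gpu=1.0, days=1.0)
large_2014 = training_pfs_days(num_gpus=8, tflops_per_gpu=2.0, days=6.0)
large_2016 = training_pfs_days(num_gpus=100, tflops_per_gpu=10.0, days=10.0)

for label, value in [("1 GPU, 1 TFLOPS, 1 day", small_2012),
                     ("8 GPUs, 2 TFLOPS, 6 days", large_2014),
                     ("100 GPUs, 10 TFLOPS, 10 days", large_2016)]:
    print(f"{label}: {value:.3f} pfs-days")
```

At full utilization these three runs span roughly 0.001 to 10 pfs-days, consistent with the ranges listed for the 2012 to 2016 eras. Real runs achieve less than rated throughput, so lowering `utilization` stretches the training time needed to reach the same totals.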
OpenAI sees many reasons why the trend in the graph could continue. Many hardware startups are developing AI-specific chips, some of which claim they will achieve a substantial increase in FLOPS/Watt (which is correlated to FLOPS/$) over the next 1-2 years. There may also be gains from simply reconfiguring hardware to do the same number of operations for less economic cost.
Many of the recent algorithmic innovations described above could be combined multiplicatively. The architecture search and massively parallel SGD could be scaled to larger sizes.
If the AI performance gains are worth it, then well-funded global AI teams will be able to command more resources and will build larger teams to make full use of them.
Billion-dollar projects will not be an issue, since that is already the scale of exaFLOP supercomputing projects. Billion-dollar AI supercomputer projects will reach several hundred times the performance of regular supercomputers because they need only lower precision.
China and others are setting up tens of billions of dollars in funding for AI projects and resources in the early 2020s.
Around 2021-2023, there will be giant multi-exaFLOP AI supercomputers and some very large AI-dedicated cloud networks.