AI Scaling Laws Guide the Building of Superhuman-Level AI
Scaling laws are as important to artificial intelligence (AI) as the law of gravity is to the world around us. Cerebras makes wafer-scale chips optimized for AI, and these chips can host large language models (LLMs). Cerebras trains on open-source data that developers across the world can reproduce.
James Wang, a former ARK Invest analyst, is now a product marketing specialist at Cerebras.
In this interview, James discusses LLM development and why the generative pre-trained transformer (GPT) innovation taking place in this field is unlike anything that has come before it, with seemingly limitless possibilities. He also explains the motivation behind Cerebras' unique approach and the benefits that its architecture and models provide to developers.
What James believes is the most significant natural law discovered this century.
OpenAI found that large language model performance scales smoothly across seven orders of magnitude: as the models were made 10 million times bigger, performance kept improving predictably.
The counterpoint to this was DeepMind's Chinchilla paper (March 2022), which found that LLMs are compute-optimal at a ratio of roughly 20 training tokens per parameter. This was a hugely influential paper. Instead of racing to more and more parameters, the field began racing to more and more tokens.
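The Chinchilla rule of thumb can be sketched numerically. The helper below is illustrative rather than anything from the interview; it uses the commonly cited approximation that training a dense transformer with N parameters on D tokens costs about 6·N·D FLOPs.

```python
def chinchilla_optimal_tokens(n_params: float, ratio: float = 20.0) -> float:
    """Compute-optimal training tokens per the Chinchilla ~20:1 rule of thumb."""
    return ratio * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard approximation: ~6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens

# A 70B-parameter model is compute-optimal at roughly 1.4 trillion tokens,
# which costs on the order of 6e23 FLOPs under this approximation.
n = 70e9
d = chinchilla_optimal_tokens(n)   # 1.4e12 tokens
c = training_flops(n, d)           # ~5.9e23 FLOPs
print(f"{d:.2e} tokens, {c:.2e} FLOPs")
```

The shift the paper caused follows directly from this ratio: doubling parameters without doubling tokens wastes compute, so data became the binding constraint.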
Why Cerebras wants to get state-of-the-art LLM data into the hands of as many people as possible.
Cerebras has made all of its state-of-the-art LLM work open source.
Cerebras has confirmed that scaling laws transfer to downstream tasks. This makes it possible to determine how much compute and training is needed to achieve human- or superhuman-level performance, and to design and load adequately performing AI models onto an iPhone, a laptop, or an edge computing device.
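As a rough illustration of what loading a model onto a device implies, the sketch below (my own back-of-the-envelope arithmetic, not Cerebras numbers) estimates how many parameters fit in a device's memory at a given weight precision.

```python
def max_params_for_device(memory_bytes: float, bytes_per_param: float) -> float:
    """Largest model (by parameter count) whose weights alone fit in memory.

    Ignores activations, KV cache, and runtime overhead, so real limits
    are meaningfully lower.
    """
    return memory_bytes / bytes_per_param

GiB = 2**30
# A phone with ~8 GiB of RAM could hold roughly 8.6 billion int8 weights,
# or about 4.3 billion fp16 weights, before any runtime overhead.
print(f"int8: {max_params_for_device(8 * GiB, 1):.2e} params")
print(f"fp16: {max_params_for_device(8 * GiB, 2):.2e} params")
```

This is why knowing the compute needed for a target quality level matters: it tells you whether that quality is reachable within a device's memory and precision budget.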
The Cerebras CS-2 is optimized for the training problem. Cerebras chips let people train trillion-parameter models without the usual problems and delays, which simplifies training. Cerebras re-architected its wafer-scale chips so that compute is independent of memory size: they can train arbitrarily large language models without blowing up the chip, pairing large compute with petabytes of memory.
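To see why decoupling compute from memory matters, consider the memory a trillion-parameter model needs just for its training state. The figures below are my own rough estimate using the commonly cited ~16 bytes per parameter for mixed-precision Adam training (fp16 weights and gradients plus fp32 optimizer state), not Cerebras-published numbers.

```python
def training_state_bytes(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Approximate memory for weights, gradients, and Adam optimizer state."""
    return n_params * bytes_per_param

TB = 1e12
n = 1e12  # a trillion-parameter model
state = training_state_bytes(n)
print(f"{state / TB:.0f} TB of training state")  # ~16 TB

# An 80 GB accelerator holds 0.08 TB, so a conventional cluster must shard
# this state across hundreds of devices; an architecture with compute
# decoupled from memory sidesteps that partitioning problem.
```

This is the "bunch of problems and delays" a researcher otherwise faces: model parallelism, sharded optimizer states, and the engineering overhead that comes with both.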
– How the Cerebras approach differs from the approach taken by other companies (NVIDIA and Dojo, for example).
– Cerebras offerings that are available to the public.
– The current GPU cloud shortage (and why this adds to the appeal of Cerebras software).
– Why the progress being made in the GPT space is incomparable to developments that have come before it.
– Potential directions that the world could be heading in as a result of AI developments (and why James is optimistic about it all).
– The AI use case that is keeping James up at night.
Brian Wang is a futurist thought leader and a popular science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked the #1 science news blog. It covers many disruptive technologies and trends, including space, robotics, artificial intelligence, medicine, anti-aging biotechnology, and nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.
Source: Next Big Future