Cerebras unveils world’s largest AI training supercomputer with 54M cores



Cerebras Systems, the AI accelerator pioneer, and UAE-based technology holding group G42 have unveiled the world’s largest supercomputer for AI training, named Condor Galaxy.

The network of nine interconnected supercomputers will have a total capacity of 36 exaFLOPs and promises to significantly cut AI model training time. The first AI supercomputer on the network, Condor Galaxy 1 (CG-1), delivers 4 exaFLOPs from 54 million cores, said Andrew Feldman, CEO of Cerebras, in an interview with VentureBeat.

Rather than dice silicon wafers into individual chips, as conventional processor makers do, Cerebras prints its cores across the entire wafer, which is about the size of a pizza. Each wafer carries the equivalent of hundreds of chips, with hundreds of thousands of cores, and that’s how the system gets to 54 million cores in a single supercomputer.
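A rough sanity check on that core count: CG-1 links 64 CS-2 systems (per this article), and each CS-2 is built around a single wafer. The 850,000 cores-per-wafer figure below is Cerebras’ published WSE-2 spec, assumed here rather than stated in the article.

```python
# Sanity check on CG-1's 54-million-core figure (illustrative).
CORES_PER_WAFER = 850_000   # assumed: Cerebras' published WSE-2 spec
CS2_SYSTEMS = 64            # CS-2 systems in CG-1, per this article

print(f"{CORES_PER_WAFER * CS2_SYSTEMS:,} cores")  # 54,400,000 -> ~54 million
```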

In our interview, Feldman said, “AI is not just eating the U.S. AI is eating the world. There’s an insatiable demand for compute. Models are proliferating. And data is the new gold. This is the foundation.”


With this supercomputer, customers get results twice as fast while using half the energy, Feldman said.

“We’re the largest in the world. We’ve sold it to a company called G42, based in Abu Dhabi. We deployed it in Santa Clara, California and are currently running AI work,” Feldman said. “We manage and operate it through our cloud. It’s used by G42 for internal work and any excess capacity is resold by them or by us. This is the first of three U.S.-based supercomputers we intend to build for them in the next year. And the first nine, we intend to build for them in the next 18 months. And when these nine are connected, that will be a 36 exaflop constellation of supercomputers.”

Cerebras CEO Andrew Feldman with packaged Condor Galaxy

Condor Galaxy is the name of the supercomputer, which scales from one to 32 CS-2 computers, enabled by the company’s MemoryX and SwarmX technology. The machine was stood up in Santa Clara in just 10 days, and it’s already one of the largest supercomputers in the world, Feldman said.

The second machine will be in Austin, Texas, and the third in Asheville, North Carolina. The deal value for phase two is in excess of $100 million.

“It’s pretty crazy. When we’re done, we will have nine supercomputers, each of 4 exaFLOPs, interconnected to create a distributed 36-exaFLOP AI constellation. That’s nearly 500 million cores across 576 CS-2s with 3,490 terabits per second of internal bandwidth. And we will need more than half a million AMD EPYC cores just to feed us data,” Feldman said.
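Those figures are internally consistent. A quick check, again assuming Cerebras’ published 850,000-core WSE-2 spec per CS-2 (a number not stated in this article):

```python
# Quick check of the constellation math in Feldman's quote (illustrative).
SYSTEMS = 9                # planned Condor Galaxy supercomputers
EXAFLOPS_EACH = 4          # AI compute per system
CS2_PER_SYSTEM = 64        # CS-2 computers per system, as in CG-1
CORES_PER_CS2 = 850_000    # assumed: Cerebras' published WSE-2 core count

print(SYSTEMS * EXAFLOPS_EACH)       # 36 exaFLOPs in total
print(SYSTEMS * CS2_PER_SYSTEM)      # 576 CS-2s
print(f"{SYSTEMS * CS2_PER_SYSTEM * CORES_PER_CS2:,}")  # 489,600,000 -> "nearly 500 million"
```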

Condor Galaxy 1 at Colovore, side view.

Cerebras and G42 will deploy two more such supercomputers, CG-2 and CG-3, in the U.S. in early 2024. With this supercomputing network, they aim to accelerate AI development globally.

Located in Santa Clara, California, CG-1 links 64 Cerebras CS-2 systems together into an easy-to-use AI supercomputer with a training capacity of 4 exaFLOPs, which is offered as a cloud service. CG-1 is designed to enable G42 and its cloud customers to train large, ground-breaking models quickly and easily, thereby accelerating innovation.

The Cerebras-G42 strategic partnership has already advanced state-of-the-art AI models in Arabic bilingual chat, healthcare, and climate studies. CG-1 offers native support for training with long sequence lengths, up to 50,000 tokens out of the box, without any special software libraries. Feldman said that programming CG-1 is done entirely without complex distributed programming languages, and even the largest models can be run without weeks or months spent distributing work over thousands of GPUs.

The partnership between G42 and Cerebras delivers on all three elements required for training large models: huge amounts of compute, vast datasets, and specialized AI expertise. The companies say they are democratizing AI by providing easy access to industry-leading AI compute, and G42’s work with diverse datasets across healthcare, energy and climate studies will enable users of the systems to train new cutting-edge foundation models.

Cerebras and G42 bring together a team of hardware engineers, data engineers, AI scientists, and industry specialists to deliver a full-service AI offering to solve customers’ problems. This combination will produce groundbreaking results and turbocharge hundreds of AI projects globally.

G42 is a conglomerate in Abu Dhabi with 22,000 employees across nine companies in 25 countries.

“Now, if you want to run the same model with 40 billion parameters on 1,000 GPUs, you have to write an additional 27,215 lines of code. Obviously, that’s not easy,” Feldman said. “Now, Cerebras with a 1 billion parameter model takes about 1,200 lines of code to put it on one CS-2. But if you want to run a 40 billion parameter model, or a 100 billion parameter model, you use the same 1,200 lines of code. That’s it. And so you don’t have to write 27,215 lines of code.”
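For readers unfamiliar with what those extra lines do, below is a minimal, hypothetical sketch of the generic PyTorch DistributedDataParallel wiring that GPU scale-out typically adds on top of single-device code. It is illustrative only; neither the model nor the setup comes from Cerebras or this article, and real 40-billion-parameter runs layer far more machinery on top.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # On a single device, defining the model is essentially all the code:
    model = torch.nn.Linear(4096, 4096)  # placeholder model

    # Scale-out across GPUs adds wiring like this (launched via torchrun):
    dist.init_process_group("nccl")        # join the multi-GPU process group
    rank = int(os.environ["LOCAL_RANK"])   # GPU index assigned by the launcher
    torch.cuda.set_device(rank)
    model = DDP(model.cuda(rank), device_ids=[rank])  # synchronize gradients
    # ...and large-model runs further add sharded optimizers, tensor/pipeline
    # parallelism, checkpoint resharding, and cluster launch configs.

if __name__ == "__main__":
    main()
```

Run with `torchrun --nproc_per_node=<gpus> script.py`. Feldman’s claim is that on a single wafer-scale system, none of this distribution wiring is needed and the same code scales with model size.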

“Now this takes our cloud to a new level, where we’re operating and running these machines and making them available through our cloud. We’re offering AI supercomputers as a service. If you want normal AI clusters, we have those too. This really takes our cloud to a new level,” he said.

The machine is named after the Condor Galaxy, which is about five times larger than our own Milky Way.

Cerebras now has about 335 people, and it’s “hiring like crazy,” Feldman said.

