Cerebras launches new AI supercomputing processor with 2.6 trillion transistors

Cerebras Systems has unveiled its new Wafer Scale Engine 2 processor with a record-setting 2.6 trillion transistors and 850,000 AI-optimized cores. It’s built for supercomputing tasks, and it’s the second time since 2019 that Los Altos, California-based Cerebras has unveiled a chip that is basically an entire wafer.

Chipmakers normally slice a wafer from a 12-inch-diameter ingot of silicon to process in a chip factory. Once processed, the wafer is sliced into hundreds of separate chips that can be used in electronic hardware.

But Cerebras, started by SeaMicro founder Andrew Feldman, takes that wafer and makes a single, massive chip out of it. Each piece of the chip, dubbed a core, is interconnected in a sophisticated way to other cores. The interconnections are designed to keep all the cores functioning at high speeds so the transistors can work together as one.

Twice as good as the CS-1

Above: Comparing the CS-1 to the biggest GPU.

Image Credit: Cerebras

In 2019, Cerebras fit 400,000 cores and 1.2 trillion transistors on its first wafer-scale chip, the WSE, which powers the CS-1 system. That chip was built with a 16-nanometer manufacturing process. The new chip is built with a high-end 7-nanometer process, meaning the width between circuits is seven billionths of a meter. With such miniaturization, Cerebras can cram a lot more transistors onto the same 12-inch wafer, Feldman said. The company cuts that circular wafer into a square that is eight inches by eight inches and ships the device in that form.

“We have 123 times more cores and 1,000 times more memory on chip and 12,000 times more memory bandwidth and 45,000 times more fabric bandwidth,” Feldman said in an interview with VentureBeat. “We were aggressive on scaling geometry, and we made a set of microarchitecture improvements.”

Cerebras’ WSE-2 chip now has more than twice as many cores and transistors as its predecessor. By comparison, the largest graphics processing unit (GPU) has only 54 billion transistors, 2.55 trillion fewer than the WSE-2. The WSE-2 also has 123 times more cores and 1,000 times more high-performance on-chip memory than GPU competitors. Many of the Cerebras cores are redundant in case one part fails.
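The size comparisons above reduce to simple arithmetic. The following is a minimal Python sketch that checks them using only figures quoted in this article. The variable names and the `ratio` helper are illustrative, and the first-generation transistor count is assumed to be 1.2 trillion.

```python
# Figures quoted in the article; all names here are illustrative only.
WSE2_TRANSISTORS = 2_600_000_000_000  # Wafer Scale Engine 2
WSE1_TRANSISTORS = 1_200_000_000_000  # first-generation WSE (assumed: 1.2 trillion)
GPU_TRANSISTORS = 54_000_000_000      # largest GPU cited for comparison
WSE2_CORES = 850_000
WSE1_CORES = 400_000


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


# Absolute gap between the WSE-2 and the largest GPU, in transistors.
transistor_gap = WSE2_TRANSISTORS - GPU_TRANSISTORS

print(f"Gap to largest GPU: {transistor_gap / 1e12:.3f} trillion transistors")
print(f"Transistors vs. first generation: {ratio(WSE2_TRANSISTORS, WSE1_TRANSISTORS):.3f}x")
print(f"Cores vs. first generation: {ratio(WSE2_CORES, WSE1_CORES):.3f}x")
print(f"Transistors vs. largest GPU: {ratio(WSE2_TRANSISTORS, GPU_TRANSISTORS):.1f}x")
```

Both generation-over-generation ratios come out slightly above 2, consistent with the "more than twice" claim, and the gap to the 54-billion-transistor GPU rounds to the 2.55 trillion figure cited.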

“This is a great achievement, especially when considering that the world’s third largest chip is 2.55 trillion transistors smaller than the WSE-2,” said Linley Gwennap, principal analyst at The Linley Group, in a statement.

Feldman half-joked that this should prove that Cerebras is not a one-trick pony.

“What this avoids is all the complexity of trying to tie together lots of little things,” Feldman said. “When you have to build a cluster of GPUs, you have to spread your model across multiple nodes. You have to deal with device memory sizes and memory bandwidth constraints and communication and synchronization overheads.”

The CS-2’s specs

Above: TSMC put the CS-1 in a chip museum.

Image Credit: Cerebras

The WSE-2 will power the Cerebras CS-2, which the company bills as the industry’s fastest AI computer, designed and optimized for 7 nanometers and beyond. Manufactured by contract chipmaker TSMC, the WSE-2 more than doubles every on-chip performance characteristic (transistor count, core count, memory, memory bandwidth, and fabric bandwidth) over the first-generation WSE. As a result, the WSE-2 is orders of magnitude larger and more performant than any competing GPU on the market on every performance metric, Feldman said.

TSMC put the first WSE-1 chip in a museum of innovation for chip technology in Taiwan.

“Cerebras does deliver the cores promised,” said Patrick Moorhead, an analyst at Moor Insights & Strategy. “What the company is delivering is more along the lines of multiple clusters on a chip. It does appear to give Nvidia a run for its money but doesn’t run raw CUDA. That has become somewhat of a de facto standard. Nvidia solutions are more flexible as well as they can fit into nearly any server chassis.”

With every component optimized for AI work, the CS-2 delivers more compute performance in less space and with less power than any other system, Feldman said. Depending on the workload, from AI to high-performance computing, the CS-2 delivers hundreds or thousands of times more performance than legacy alternatives, and it does so at a fraction of the power draw and space.

A single CS-2 replaces clusters of hundreds or thousands of graphics processing units (GPUs) that consume dozens of racks, use hundreds of kilowatts of power, and take months to configure and program. At only 26 inches tall, the CS-2 fits in one-third of a standard datacenter rack.
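The "one-third of a rack" figure can be sanity-checked with a short calculation. This sketch assumes a 42U full-height rack, which the article does not specify, and uses the standard rack unit height of 1.75 inches. The variable names are illustrative.

```python
import math

RACK_UNIT_INCHES = 1.75   # height of one rack unit (U)
RACK_HEIGHT_UNITS = 42    # assumed full-height rack; not stated in the article
CS2_HEIGHT_INCHES = 26    # CS-2 height quoted in the article

# Round up, since equipment occupies whole rack units.
units_needed = math.ceil(CS2_HEIGHT_INCHES / RACK_UNIT_INCHES)
fraction_of_rack = units_needed / RACK_HEIGHT_UNITS

print(f"{units_needed}U of {RACK_HEIGHT_UNITS}U, fraction of rack: {fraction_of_rack:.3f}")
```

Under these assumptions the system occupies 15 of 42 rack units, a little over one-third of the rack.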

“Obviously, there are companies and entities interested in Cerebras’ wafer-scale solution for large data sets,” said Jim McGregor, principal analyst at Tirias Research, in an email. “But, there are many more opportunities at the enterprise level for the millions of other AI applications and still opportunities beyond what Cerebras could handle, which is why Nvidia has the SuperPod and Selene supercomputers.”

He added, “You also have to remember that Nvidia is targeting everything from AI robotics with Jetson to supercomputers. Cerebras is more of a niche platform. It will take some opportunities but will not match the breadth of what Nvidia is targeting. Besides, Nvidia is selling everything they can build.”

Lots of customers

Above: Comparing the new Cerebras chip to its rival, the Nvidia A100.

Image Credit: Cerebras

The company has proven itself by shipping the first generation to customers. Over the past year, organizations deploying the Cerebras WSE and CS-1 have included Argonne National Laboratory; Lawrence Livermore National Laboratory; Pittsburgh Supercomputing Center (PSC) for its Neocortex AI supercomputer; EPCC, the supercomputing center at the University of Edinburgh; pharmaceutical leader GlaxoSmithKline; Tokyo Electron Device; and more. Customers praising the chip include those at GlaxoSmithKline and Argonne National Laboratory.

Kim Branson, senior vice president at GlaxoSmithKline, said in a statement that the company has increased the complexity of the encoder models it generates while cutting training time by a factor of 80. At Argonne, the chip is being used for cancer research and has reduced the experiment turnaround time on cancer models by a factor of more than 300.

“For drug discovery, we have other wins that we’ll be announcing over the next year in heavy manufacturing and pharma and biotech and military,” Feldman said.

The new chips will ship in the third quarter. Feldman said the company now has more than 300 engineers, with offices in Silicon Valley, Toronto, San Diego, and Tokyo.
