At its re:Invent 2021 conference today, Amazon announced Graviton3, the next generation of its custom Arm-based server processor. Soon to be available in Amazon Web Services’ C7g instances, the processors are optimized for workloads including high-performance computing, batch processing, media encoding, scientific modeling, ad serving, and distributed analytics, the company says.
Alongside Graviton3, Amazon unveiled Trn1, a new instance for training deep learning models in the cloud, including models for apps like image recognition, natural language processing, fraud detection, and forecasting. It’s powered by Trainium, which the company last year claimed would offer the most teraflops of any machine learning instance in the cloud. (A teraflop translates to a chip being able to process 1 trillion calculations a second.)
AWS CEO Adam Selipsky says that Graviton3 is up to 25% faster for general-compute workloads and delivers twice the floating-point performance for scientific workloads, twice the performance for cryptographic workloads, and three times the performance for machine learning workloads. Moreover, the processors use up to 60% less energy at the same performance level compared with the previous generation, he said.
Graviton3 also includes a new pointer authentication feature designed to improve security. Before return addresses are pushed onto the stack, they’re first signed with a secret key and additional context information, including the current value of the stack pointer. When the signed addresses are popped off the stack, they’re validated before being used. An exception is raised if the address isn’t valid, thereby blocking attacks that work by overwriting the stack contents with the address of harmful code.
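Conceptually, this signing scheme behaves like a keyed MAC computed over the return address plus context. The following is a minimal Python sketch of the idea only, not Arm's actual pointer-authentication instructions; the key, tag size, and function names are illustrative assumptions.

```python
# Illustrative sketch of return-address signing (not Arm's real PAC design).
import hashlib
import hmac

SECRET_KEY = b"per-process-secret"  # assumed: held in hardware, invisible to software


def sign(return_addr: int, stack_ptr: int) -> bytes:
    """Compute a short authentication tag over the address and its context."""
    ctx = return_addr.to_bytes(8, "little") + stack_ptr.to_bytes(8, "little")
    return hmac.new(SECRET_KEY, ctx, hashlib.sha256).digest()[:2]  # truncated tag


def push_return(stack: list, return_addr: int, stack_ptr: int) -> None:
    """Push a signed return address: the tag binds it to this stack frame."""
    stack.append((return_addr, sign(return_addr, stack_ptr)))


def pop_return(stack: list, stack_ptr: int) -> int:
    """Validate the tag before using the address; raise on tampering."""
    addr, tag = stack.pop()
    if not hmac.compare_digest(tag, sign(addr, stack_ptr)):
        raise RuntimeError("pointer authentication failed")
    return addr
```

An attacker who overwrites the stored address with the location of malicious code cannot produce a valid tag without the secret key, so the validation step fails before control transfers.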
As before, all Graviton processors include dedicated cores and caches for each virtual CPU along with additional security features. C7g instances will be available in multiple sizes, including bare metal, and Amazon claims that they’re the first in the cloud industry to be equipped with DDR5 memory.
On the network side, C7g instances will offer up to 30 Gbps of network bandwidth and Elastic Fabric Adapter support.
According to Selipsky, Trn1 delivers up to 800 Gbps of networking bandwidth, making it suited to large-scale, multi-node distributed training. Customers can deploy clusters of tens of thousands of Trn1 instances to train models containing trillions of parameters.
Amazon is quoting 30% higher throughput and 45% lower cost-per-inference compared with standard AWS GPU instances. Trn1 supports popular frameworks including Google’s TensorFlow, Facebook’s PyTorch, and Apache MXNet, and uses the same Neuron SDK as Inferentia, the company’s cloud-hosted chip for machine learning inference.
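To make those two multipliers concrete, here is a back-of-envelope comparison. Only the 30% and 45% figures come from Amazon's claim; the baseline throughput and cost numbers below are made-up placeholders.

```python
# Assumed baseline for a GPU instance (placeholder values, not real pricing).
gpu_throughput = 1000.0    # inferences per second (assumed)
gpu_cost_per_inf = 0.010   # dollars per inference (assumed)

# Applying Amazon's quoted multipliers.
trn1_throughput = gpu_throughput * 1.30       # 30% higher throughput
trn1_cost_per_inf = gpu_cost_per_inf * 0.55   # 45% lower cost-per-inference

print(f"Trn1 throughput: {trn1_throughput:.0f} inf/s")
print(f"Trn1 cost: ${trn1_cost_per_inf:.4f} per inference")
```

Note that a 45% lower cost-per-inference compounds with the throughput gain: the same dollar budget buys both more and cheaper work.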