Alibaba’s New Aegaeon Pooling System Cuts GPU Needs by 82%: What It Means for Cloud AI

Imagine serving the same AI workload with 82% fewer GPUs! Alibaba Group is making waves with a compute pooling system that promises just that. Known as Aegaeon, it sharply reduces the number of Nvidia graphics processing units (GPUs) required to serve the company's artificial intelligence models, and it’s turning heads in the tech world.
After more than three months of beta testing in Alibaba Cloud’s model marketplace, researchers reported that the number of Nvidia H20 GPUs needed to serve dozens of AI models, some with up to 72 billion parameters, fell from 1,192 to just 213. The result appeared in a research paper presented at the 31st Symposium on Operating Systems Principles (SOSP), held in Seoul, South Korea.
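The headline 82% figure follows directly from the two GPU counts reported above. A minimal sketch of the arithmetic, where the helper name `gpu_reduction` is purely illustrative and not something from the paper:

```python
def gpu_reduction(before: int, after: int) -> float:
    """Return the fractional drop in GPU count (0.0 to 1.0)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# GPU counts as reported for the Alibaba Cloud marketplace beta test.
reduction = gpu_reduction(before=1192, after=213)
print(f"{reduction:.1%}")  # 82.1%
```

Note that this is a reduction in the number of GPUs, not necessarily an identical reduction in total cost.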
According to the research team from Peking University and Alibaba Cloud, “Aegaeon is the first work to reveal the excessive costs associated with serving concurrent LLM workloads on the market.” This is not just about numbers; it's a wake-up call for the industry, showcasing how inefficiencies in resource allocation have persisted.
Alibaba Cloud, the AI and cloud services arm of Alibaba, is behind the system, and its Chief Technology Officer, Zhou Jingren, also contributed to the research paper. Cloud service providers like Alibaba Cloud and ByteDance's Volcano Engine currently host thousands of AI models that serve many users at once. In practice, however, only a handful of models, such as Alibaba’s Qwen and DeepSeek, account for the bulk of inference requests, while the rest are called only occasionally.
The researchers found a striking inefficiency: 17.7% of the GPUs in Alibaba Cloud’s marketplace were tied up serving a mere 1.35% of requests. The new pooling strategy takes a more streamlined approach, allowing a single GPU to serve multiple models, which could change the way cloud services operate.
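To put that mismatch in perspective, one can compare the share of GPUs with the share of requests they served. A rough sketch, assuming the two percentages describe the same set of rarely used models (the function name `overprovision_ratio` is hypothetical):

```python
def overprovision_ratio(gpu_share_pct: float, request_share_pct: float) -> float:
    """Return how many times larger the GPU share is than the request share."""
    if request_share_pct <= 0:
        raise ValueError("request_share_pct must be positive")
    return gpu_share_pct / request_share_pct


# Shares reported for Alibaba Cloud's marketplace.
ratio = overprovision_ratio(gpu_share_pct=17.7, request_share_pct=1.35)
print(f"{ratio:.1f}x")  # 13.1x
```

In other words, those models held roughly 13 times more GPU capacity than their share of traffic would suggest, which is the gap that pooling several models onto one GPU aims to close.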