Keysight introduces AI Data Centre Builder

Validates the performance of AI infrastructure by emulating real-world workloads.

Keysight Technologies has introduced Keysight AI (KAI) Data Centre Builder, an advanced software suite that emulates real-world workloads to evaluate how new algorithms, components, and protocols impact the performance of AI training. KAI Data Centre Builder’s workload emulation capability integrates large language model (LLM) and other artificial intelligence (AI) model training workloads into the design and validation of AI infrastructure components – networks, hosts, and accelerators. This lets hardware design, protocols, architectures, and AI training algorithms be tuned together rather than in isolation, with the aim of boosting system performance.

AI operators use various parallel processing strategies, also known as model partitioning, to accelerate AI model training. Aligning model partitioning with AI cluster topology and configuration enhances training performance. During the AI cluster design phase, critical questions are best answered through experimentation. Many of the questions focus on data movement efficiency between the graphics processing units (GPUs). Key considerations include:

Scale-up design of GPU interconnects inside an AI host or rack

Scale-out network design, including bandwidth per GPU and topology

Configuration of network load balancing and congestion control

Tuning of the training framework parameters
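
To make the bandwidth-per-GPU question concrete, the following is a rough back-of-envelope sketch in Python. It is not part of any Keysight tooling. It uses the standard bandwidth-only cost model for a ring all-reduce, in which each GPU sends and receives 2 × (N − 1) / N times the message size. The function name, the 1 GB gradient size, and the link speeds are illustrative assumptions, and latency, congestion, and protocol overhead are ignored.

```python
def ring_allreduce_seconds(message_bytes: float, num_gpus: int, link_gbps: float) -> float:
    """Bandwidth-only estimate of a ring all-reduce across num_gpus GPUs.

    Each GPU sends and receives 2 * (N - 1) / N times the message size.
    Latency, congestion and protocol overhead are ignored (assumption).
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits/s -> bytes/s
    traffic_factor = 2 * (num_gpus - 1) / num_gpus
    return traffic_factor * message_bytes / bytes_per_second


if __name__ == "__main__":
    gradient_bytes = 1e9  # hypothetical 1 GB of gradients per training step
    for gbps in (100, 200, 400):
        t = ring_allreduce_seconds(gradient_bytes, num_gpus=8, link_gbps=gbps)
        print(f"{gbps} Gb/s per GPU: {t * 1000:.1f} ms per all-reduce")
```

An estimate like this only gives a lower bound on transfer time. Real clusters add congestion and load-balancing effects, which is why the article argues for experimentation.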

The KAI Data Centre Builder workload emulation solution reproduces network communication patterns of real-world AI training jobs to accelerate experimentation, reduce the learning curve necessary for proficiency, and provide deeper insights into the cause of performance degradation, which is challenging to achieve through real AI training jobs alone. Keysight customers can access a library of LLM workloads like GPT and Llama, with a selection of popular model partitioning schemas like Data Parallel (DP), Fully Sharded Data Parallel (FSDP), and three-dimensional (3D) parallelism.
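
As an illustration of what a partitioning schema decides, the sketch below lays out GPU ranks for 3D parallelism, commonly understood as a combination of data, pipeline, and tensor parallelism. It is a hypothetical example, not the product's interface. The rank ordering (tensor index varies fastest) and the function names are assumptions, and real training frameworks may order ranks differently.

```python
from itertools import product


def rank_to_coords(rank: int, tp: int, pp: int, dp: int) -> tuple:
    """Map a global GPU rank to (dp_idx, pp_idx, tp_idx).

    Assumed layout: the tensor-parallel index varies fastest, then the
    pipeline index, then the data-parallel index.
    """
    if not 0 <= rank < tp * pp * dp:
        raise ValueError("rank out of range for this partitioning")
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


def data_parallel_groups(tp: int, pp: int, dp: int) -> list:
    """Group ranks that hold the same model shard.

    The ranks in each group exchange gradients with one another,
    so each group generates its own collective traffic.
    """
    groups = []
    for pp_idx, tp_idx in product(range(pp), range(tp)):
        groups.append([d * tp * pp + pp_idx * tp + tp_idx for d in range(dp)])
    return groups


if __name__ == "__main__":
    # Hypothetical 8-GPU job split 2 x 2 x 2.
    for group in data_parallel_groups(tp=2, pp=2, dp=2):
        print("gradient all-reduce group:", group)
```

Which ranks end up in the same group, and where those ranks sit in the cluster, determines which links carry the heaviest traffic.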

Using the workload emulation application in the KAI Data Centre Builder enables AI operators to:

Experiment with parallelism parameters, including partition sizes and their distribution over the available AI infrastructure (scheduling)

Understand the impact of communications within and among partitions on overall job completion time (JCT)

Identify low-performing collective operations and drill down to identify bottlenecks

Analyse network utilisation, tail latency, and congestion to understand the impact they have on JCT
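
A simplified model shows why tail latency matters for JCT: a collective operation finishes only when its slowest flow finishes, so one congested path can delay every GPU taking part. The Python sketch below is illustrative only. The function names, timings, and the overlap parameter are assumptions, not measurements or product behaviour.

```python
def collective_seconds(flow_seconds):
    """A collective completes only when its slowest flow completes."""
    return max(flow_seconds)


def job_completion_seconds(iterations, compute_seconds, comm_seconds, overlap=0.0):
    """Estimate JCT as iterations * (compute + exposed communication).

    overlap is the assumed fraction of communication hidden behind compute.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    exposed = comm_seconds * (1.0 - overlap)
    return iterations * (compute_seconds + exposed)


if __name__ == "__main__":
    healthy = [0.040] * 8              # hypothetical per-flow times in seconds
    congested = [0.040] * 7 + [0.120]  # one flow hits a congested link
    for label, flows in (("healthy", healthy), ("congested", congested)):
        comm = collective_seconds(flows)
        jct = job_completion_seconds(1000, compute_seconds=0.2, comm_seconds=comm)
        print(f"{label}: collective {comm * 1000:.0f} ms, JCT {jct:.0f} s")
```

In this model, average utilisation barely changes between the two cases, yet the single slow flow sets the pace for the whole job.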

The KAI Data Centre Builder's new workload emulation capabilities enable AI operators, GPU cloud providers, and infrastructure vendors to bring realistic AI workloads into their lab setups to validate the evolving designs of AI clusters and new components. They can also experiment to fine-tune model partitioning schemas, parameters, and algorithms to optimise the infrastructure and improve AI workload performance.

Ram Periakaruppan, Vice President and General Manager, Network Test & Security Solutions, Keysight, said: “As AI infrastructure grows in scale and complexity, the need for full-stack validation and optimisation becomes crucial. To avoid costly delays and rework, it’s essential to shift validation to earlier phases of the design and manufacturing cycle. KAI Data Centre Builder’s workload emulation brings a new level of realism to AI component and system design, optimising workloads for peak performance.”

KAI Data Centre Builder is the foundation of the Keysight Artificial Intelligence (KAI) architecture, a portfolio of end-to-end solutions designed to help customers scale artificial intelligence processing capacity in data centres by validating AI cluster components using real-world AI workload emulation.
