Papers and reports

What we’ve published.

2026

Talks

Toward HPC on AI Hardware: SpMV and Sparse Kernels on the Cerebras Wafer-Scale Engine
Johannes Gebert, Jonathan Schäfer, Daniel Renschler — HLRS, Germany
2026 SIAM Conference on Parallel Processing for Scientific Computing (PP26) · Programme entry

Abstract

Wafer-scale architectures, such as the Cerebras Wafer-Scale Engine (WSE), offer tight integration of compute and memory, making them compelling candidates for converged HPC–AI systems. However, realizing their potential for traditional HPC workloads requires rethinking how core numerical kernels are implemented, and is constrained by limited inter-node bandwidth. We present initial efforts to map sparse linear algebra operations onto the WSE, focusing on sparse matrix–vector multiplication (SpMV), and describe our parallelization strategy and on-wafer communication mechanisms. Performance results on current Cerebras hardware characterize how well wafer-scale architectures suit sparse workloads and identify optimization opportunities. They represent a first step toward deploying full HPC solvers on AI-optimized hardware.

Posters

SpMV for the Cerebras Wafer-Scale Engine
Daniel Renschler, Jonathan Schäfer, Johannes Gebert, Mark Parsons — HLRS Stuttgart · EPCC Edinburgh
ISC High Performance 2026, Hamburg · Poster (PDF)

Abstract

Sparse Basic Linear Algebra Subprograms (BLAS) routines are a cornerstone of high-performance scientific computing, providing highly optimized, standardized operations on sparse matrices and vectors. One particularly important Sparse BLAS routine is the sparse matrix-vector product (SpMV).
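For readers new to the operation: SpMV computes y = A·x for a sparse matrix A, touching only the stored nonzeros. The sketch below is a generic reference version in Python and assumes the widely used compressed sparse row (CSR) storage format. The format, the function name `spmv_csr`, and the example matrix are our own illustrative choices; the abstract does not state which storage format the poster's implementation uses.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int], values: List[float],
             x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format (illustrative sketch).

    The nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]], and their
    column indices are col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of row i contribute to y[i].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix A = [[4, 0, 1],
#                     [0, 3, 0],
#                     [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, values, x))  # [7.0, 6.0, 17.0]
```

The cost is proportional to the number of nonzeros rather than to n², which is why SpMV is typically limited by memory bandwidth rather than by arithmetic.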

Sparse BLAS routines, and SpMV in particular, are essential in traditional applications such as finite element analysis (FEA) and computational fluid dynamics (CFD). More generally, they matter wherever large, sparse systems of linear equations dominate the computational workload, which includes a growing number of specialized and emerging algorithms.

On the hardware side, current and upcoming high-performance computing architectures are increasingly geared towards artificial intelligence and machine learning workloads. Examples include systems built on domain-specific accelerators from Cerebras and other AI-focused vendors. Although such systems are often marketed for AI applications, computing centers also use them beyond deep learning: to accelerate traditional simulation-driven workloads, hybrid workflows, and data-intensive tasks that benefit from the parallelism and memory hierarchies optimized for AI.

Our poster presents an SpMV implementation for AI-specific Cerebras hardware that supports arbitrary sparsity patterns without requiring matrix reordering. It also reports initial scaling results and discusses possible improvements.
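The abstract does not give implementation details. As a purely illustrative point of reference, one generic way to distribute an SpMV with an arbitrary sparsity pattern across a 2D grid of processing elements, without reordering, is a plain 2D block decomposition. Each simulated element keeps the nonzeros that fall into its row and column block in their original positions, computes partial sums from its slice of x, and the partial sums are then reduced along each grid row. The serial Python simulation below is our own sketch. The grid shape, the helpers `block_of` and `tiled_spmv`, and the COO input format are hypothetical, and none of it represents the poster's method or any Cerebras SDK API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical COO input: (row, col, value) triplets with any sparsity pattern.
Triplet = Tuple[int, int, float]


def block_of(index: int, n: int, parts: int) -> int:
    """Map a global index in [0, n) to one of `parts` contiguous blocks."""
    return index * parts // n


def tiled_spmv(n_rows: int, n_cols: int, triplets: List[Triplet],
               x: List[float], grid_rows: int, grid_cols: int) -> List[float]:
    # Step 1: scatter nonzeros to tiles (simulated processing elements) by
    # their original (row, col) position; rows and columns are not reordered.
    tiles: Dict[Tuple[int, int], List[Triplet]] = defaultdict(list)
    for r, c, v in triplets:
        key = (block_of(r, n_rows, grid_rows), block_of(c, n_cols, grid_cols))
        tiles[key].append((r, c, v))

    # Step 2: each tile computes partial sums for the rows it touches, reading
    # only x entries from its own column block.
    partials: List[Dict[int, float]] = []
    for entries in tiles.values():
        local: Dict[int, float] = defaultdict(float)
        for r, c, v in entries:
            local[r] += v * x[c]
        partials.append(local)

    # Step 3: reduction. Each row r lies in exactly one grid row of tiles, so
    # summing all partial sums for r is the reduction along that grid row.
    # On real hardware this would be a communication step; here it is a sum.
    y = [0.0] * n_rows
    for local in partials:
        for r, s in local.items():
            y[r] += s
    return y


# Same 3x3 example matrix as a CSR reference would use, on a 2x2 grid.
triplets = [(0, 0, 4.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 2.0), (2, 2, 5.0)]
print(tiled_spmv(3, 3, triplets, [1.0, 2.0, 3.0], grid_rows=2, grid_cols=2))
# [7.0, 6.0, 17.0]
```

The reduction step is where a real wafer-scale mapping would spend its communication budget; this serial simulation deliberately says nothing about how that exchange is routed.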