tnzr.org

Projects

An online textbook guiding readers from AArch64 assembly all the way to a primitive-based tensor compiler. Covers Arm Neon and the Scalable Matrix Extension, just-in-time code generation, and real hardware.

tensors, compiler, JIT, AArch64, textbook

xdna

Hello XDNA

High-performance tensor kernels on AMD's XDNA NPU

Explores AMD XDNA1 and XDNA2 NPU microarchitectures and demonstrates writing tensor contraction kernels for them. Covers the VLIW ISA, BF16/BFP16 matrix instructions, and kernels reaching up to 1760 BFP16 GFLOPS.

XDNA, NPU, VLIW, HPC, AI

sme

Hello SME

Benchmarking Apple M4 using the Scalable Matrix Extension

Documents SME microbenchmarks on Apple M4, the first publicly available silicon with SME support. Achieves 1833 GFLOPS for an FP32 GEMM, and covers JIT primitive generation and upstreaming into LIBXSMM. Companion to an SC'24 paper.

SME, GEMM, JIT, Apple M4, HPC

Apps

etops

Python API exposing the Tiled Execution IR (TEIR)

A Python package built on top of the einsum_ir C++ backend. Provides a Pythonic API to define, configure, optimize, and execute complex tensor contractions and elementwise operations. Supports dimension fusion, splitting, multiple backends, and a built-in contraction optimizer.

Python, tensors, TEIR, HPC, library
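Dimension fusion, in its usual sense, merges two adjacent loops of the same dimension type into one larger loop when their strides line up in every tensor involved. The following is a minimal Python sketch of that legality check, assuming each dimension is described by a type, a size, and one stride per tensor (0 where a tensor does not carry the dimension). `Dim`, `can_fuse`, and `fuse` are hypothetical names for illustration, not the etops API.

```python
from dataclasses import dataclass


@dataclass
class Dim:
    kind: str                 # dimension type, e.g. "C", "M", "N", "K"
    size: int                 # number of iterations
    strides: tuple[int, ...]  # one stride per tensor (0 if absent); hypothetical layout


def can_fuse(outer: Dim, inner: Dim) -> bool:
    # Fusable if both dims have the same type and, in every tensor, one step of
    # the outer index equals a full sweep of the inner index.
    if outer.kind != inner.kind or len(outer.strides) != len(inner.strides):
        return False
    return all(so == inner.size * si for so, si in zip(outer.strides, inner.strides))


def fuse(outer: Dim, inner: Dim) -> Dim:
    # The fused loop covers outer.size * inner.size iterations with the inner strides.
    if not can_fuse(outer, inner):
        raise ValueError("dimensions are not contiguous in every tensor")
    return Dim(outer.kind, outer.size * inner.size, inner.strides)


# Two M dimensions (sizes 4 and 8), strides given for (in0, in1, out).
outer = Dim("M", 4, (8, 0, 8))
inner = Dim("M", 8, (1, 0, 1))
print(fuse(outer, inner))
```

Checking every tensor's strides, rather than just one, is what keeps the fused loop valid for the inputs and the output alike.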

etops-cfg

etops Config Generator

GUI for building tensor operation configurations

A self-contained single-page app that generates etops.TensorOperationConfig objects for the etops Python package. Specify backends, data types, primitive types, dimension types, execution modes, and strides, then export the encoded config string.

tensors, compiler, React, tools

einsum-vis

Einsum Visualizer

Interactive GUI for tensor contraction trees

A browser-based tool for constructing and inspecting Einstein summation contraction trees. Supports drag-and-drop reordering of tensor indices, dimension type annotation (C/M/N/K), permutation node insertion, and responsive layouts for mobile and desktop.

einsum, visualization, tensors, React
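For a binary contraction, the C/M/N/K annotation can be derived from where each index appears. The following is a minimal Python sketch, assuming the common convention: C appears in both inputs and the output, M only in the left input and the output, N only in the right input and the output, and K in both inputs but not the output. `classify_dims` is a hypothetical helper, not part of einsum-vis or etops.

```python
def classify_dims(expr: str) -> dict[str, str]:
    """Classify each index of a binary einsum expression 'lhs,rhs->out'."""
    inputs, out = expr.replace(" ", "").split("->")
    left, right = inputs.split(",")
    types: dict[str, str] = {}
    # dict.fromkeys keeps first-seen order while removing duplicate indices.
    for idx in dict.fromkeys(left + right + out):
        in_l, in_r, in_o = idx in left, idx in right, idx in out
        if in_l and in_r and in_o:
            types[idx] = "C"  # batch-like: carried by every tensor
        elif in_l and in_o:
            types[idx] = "M"  # left input and output only
        elif in_r and in_o:
            types[idx] = "N"  # right input and output only
        elif in_l and in_r:
            types[idx] = "K"  # contracted (summed over)
        else:
            raise ValueError(f"index {idx!r} does not fit the C/M/N/K scheme")
    return types


print(classify_dims("bmk,bkn->bmn"))
```

For the batched matrix multiplication above, `b` is classified as C, `m` as M, `k` as K, and `n` as N.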

gemm

GEMM Benchmark Viewer

Interactive GEMM performance analysis from CSV data

A browser-based tool for visualizing GEMM benchmark files. Renders interactive D3.js line charts of GFLOPS vs. matrix dimensions, supports dimension filtering, overlays user-defined performance models, and exports charts to SVG or PDF.

GEMM, benchmarking, visualization, D3.js
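A GEMM of an m×k matrix with a k×n matrix performs m·n·k multiply-adds, or 2·m·n·k floating-point operations. When a benchmark file stores runtimes rather than rates, this gives the GFLOPS value to plot. The following is a minimal Python sketch, assuming a hypothetical CSV layout with columns `m`, `n`, `k`, and `seconds`. The viewer's actual input format may differ.

```python
import csv
import io


def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    # 2*m*n*k flops (one multiply and one add per m-n-k triple), scaled to 1e9.
    return 2.0 * m * n * k / seconds / 1e9


def gflops_from_csv(text: str) -> list[tuple[int, int, int, float]]:
    # Hypothetical columns: m, n, k, seconds (runtime of one GEMM call).
    result = []
    for row in csv.DictReader(io.StringIO(text)):
        m, n, k = int(row["m"]), int(row["n"]), int(row["k"])
        result.append((m, n, k, gemm_gflops(m, n, k, float(row["seconds"]))))
    return result


data = "m,n,k,seconds\n1000,1000,1000,2.0\n512,512,512,0.001\n"
for m, n, k, rate in gflops_from_csv(data):
    print(f"{m}x{n}x{k}: {rate:.1f} GFLOPS")
```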