Abstract: The Transformer architecture, despite its scaling law, incurs steep computational costs as the number of parameters increases. Quantization methods like Ternary-BERT and BitNet ...
Abstract: Exploiting matrix symmetry to halve the memory footprint offers an opportunity to accelerate memory-bound computations like Sparse Matrix-Vector Multiplication (SpMV). However, symmetric SpMV ...
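
To make the symmetry idea in the SpMV abstract concrete, here is a minimal sketch of a symmetric SpMV that stores only the upper triangle (including the diagonal) of the matrix. The CSR layout, the function name `symmetric_spmv`, and the 3x3 example matrix are assumptions chosen for illustration; they are not taken from the paper, and the sketch is sequential rather than a tuned kernel.

```python
def symmetric_spmv(n, row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a symmetric n x n matrix A.

    Only the upper triangle of A (diagonal included) is stored, in CSR form:
    row_ptr has n + 1 entries, col_idx and vals hold the stored nonzeros.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = vals[k]
            # Contribution of the stored entry A[i][j].
            y[i] += a * x[j]
            # Contribution of the mirrored entry A[j][i], which is not stored.
            # This is a scattered write to y[j] rather than to the current row.
            if j != i:
                y[j] += a * x[i]
    return y


# Hypothetical example matrix (symmetric):
#   [[4, 1, 0],
#    [1, 3, 2],
#    [0, 2, 5]]
# Stored upper triangle: 5 values instead of the 7 nonzeros of the full matrix.
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv(3, row_ptr, col_idx, vals, x)
print(y)  # [6.0, 13.0, 19.0]
```

Each off-diagonal value is read once and used twice, which is where the memory saving comes from. The price is the extra update to `y[j]`, which lands outside the row currently being processed.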