Improving the Efficiency of Interpolation-Based Scientific Data Compressors with Adaptive Quantization Index Prediction

Published in IPDPS, 2025

Abstract

Large-scale scientific simulations running on high-performance computing systems produce unprecedented amounts of data, leading to severe problems in data storage, I/O, and communication. To address the data movement challenge, error-controlled lossy compression has been proposed to significantly reduce the data size while retaining the data quality. Recently, interpolation-based compressors, including MGARD, SZ3, QoZ, and HPEZ, have stood out because they obtain relatively high compression ratios with decent compression and decompression throughput. Nevertheless, these methods focus on the decorrelation stage of the compression pipeline and overlook the correlation that remains among the quantization indices generated after decorrelation. In this paper, we develop a generic framework that uses the correlation of quantization indices to significantly improve the compression ratios of state-of-the-art interpolation-based error-bounded lossy compressors. Our contributions are threefold: (1) We carefully characterize the quantization index array produced by interpolation-based compressors and identify the correlation it leaves unexploited; (2) We design a generic quantization index prediction method that exploits this correlation, improving compression ratios with only minor degradation in throughput; (3) We integrate our method into 4 state-of-the-art interpolation-based compressors and evaluate them on 5 real-world datasets. Experimental results demonstrate that the proposed method improves the compression ratios of the base compressors by up to 95% while maintaining the same data quality. It also yields a 16% improvement in end-to-end data transfer performance in a parallel setting.
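To make the idea of "correlation left in the quantization indices" concrete, here is a minimal toy sketch. It is not the paper's method. It assumes a 1D signal and a deliberately simple base predictor (the previous reconstructed value) in place of real multilevel interpolation. It also assumes a hypothetical index predictor that predicts each index from its left neighbor. The functions `quantize`, `predict_indices`, `restore_indices`, and `entropy_bits` are illustrative names, not APIs of MGARD, SZ3, QoZ, or HPEZ. Shannon entropy serves only as a rough proxy for the size after entropy coding.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy (bits/symbol) of an integer sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def quantize(data, eb):
    """Error-bounded linear quantization of prediction errors.

    Toy assumption: the base predictor is the previous reconstructed value
    (not real interpolation). Guarantees |x - x_hat| <= eb up to float rounding.
    """
    indices, recon = [], []
    prev = 0.0
    for x in data:
        q = round((x - prev) / (2 * eb))  # quantization index
        x_hat = prev + 2 * eb * q         # decompressor-side reconstruction
        indices.append(q)
        recon.append(x_hat)
        prev = x_hat
    return indices, recon


def predict_indices(indices):
    """Hypothetical index predictor: subtract the left neighbor's index."""
    return [q - (indices[i - 1] if i > 0 else 0) for i, q in enumerate(indices)]


def restore_indices(residuals):
    """Lossless inverse of predict_indices."""
    out = []
    for i, r in enumerate(residuals):
        out.append(r + (out[i - 1] if i > 0 else 0))
    return out


data = [0.001 * i * i for i in range(200)]  # smooth toy signal
eb = 0.01                                   # absolute error bound
idx, recon = quantize(data, eb)
res = predict_indices(idx)
print(f"index entropy:    {entropy_bits(idx):.2f} bits/symbol")
print(f"residual entropy: {entropy_bits(res):.2f} bits/symbol")
```

In this toy, the first-order base predictor cannot capture the signal's curvature, so consecutive indices drift together. The index residuals then collapse onto a few small values with lower entropy. Index prediction is lossless, so it leaves the error bound untouched.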