Bloscpack: a compressed lightweight serialization format for numerical data

Valentin Haenel

This paper introduces the Bloscpack file format and the accompanying Python reference implementation. Bloscpack is a lightweight, compressed binary file format based on the Blosc codec and is designed for fast serialization of numerical data. This article presents the features of the file format and some API aspects of the reference implementation, in particular the ability to handle NumPy ndarrays. Furthermore, in order to demonstrate its utility, the format is compared both feature- and performance-wise to a few alternative lightweight serialization solutions for NumPy ndarrays. The performance comparisons take the form of comprehensive benchmarks over a range of artificial datasets of varying size and complexity, the results of which are presented in the last section of this article.

