Compressors#

Altay Sansal

Apr 29, 2024

Dataset Compression#

MDIO relies on numcodecs for data compression. We provide defaults for each compressor, based on opinionated and limited heuristics, for various energy datasets. The compression can be customized with the data models described below.

Numcodecs is a project that provides a convenient interface to different compression libraries. We selected the Blosc and ZFP compressors for lossless and lossy compression of energy data.

Blosc#

Blosc is a high-performance compressor optimized for binary data. It combines fast compression with a byte-shuffle filter for improved efficiency and is particularly effective on numerical arrays in multi-threaded environments.
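To make the byte-shuffle idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Blosc itself: `zlib` stands in for the compressor, and `byte_shuffle`, `byte_unshuffle`, and the ramp signal are hypothetical names and data chosen for illustration. The shuffle groups the first byte of every element together, then the second byte, and so on, which tends to place similar bytes next to each other before compression.

```python
import struct
import zlib


def byte_shuffle(raw: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[k::itemsize] for k in range(itemsize))


def byte_unshuffle(shuffled: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle by interleaving the byte planes again."""
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for k in range(itemsize):
        out[k::itemsize] = shuffled[k * count:(k + 1) * count]
    return bytes(out)


# Hypothetical smooth signal: 4096 little-endian float32 samples on a ramp.
samples = [1024.0 + 0.5 * i for i in range(4096)]
raw = struct.pack(f"<{len(samples)}f", *samples)

# Compress the same bytes without and with the shuffle (zlib as a stand-in).
plain = zlib.compress(raw, 5)
shuffled = zlib.compress(byte_shuffle(raw, 4), 5)

# The shuffle is lossless: undoing it restores the original bytes.
restored = byte_unshuffle(zlib.decompress(shuffled), 4)

print("raw bytes:", len(raw))
print("compressed without shuffle:", len(plain))
print("compressed with shuffle:", len(shuffled))
print("round trip ok:", restored == raw)
```

On smooth numerical data like this ramp, the shuffled stream should compress noticeably better. The gain depends on the data, so treat the printed sizes as an illustration rather than a benchmark.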

For more details about compression modes, see Blosc Documentation.

Blosc

Data Model for Blosc options.

ZFP#

ZFP is a compression algorithm tailored for floating-point and integer arrays. It offers lossy and lossless compression with customizable precision and is well suited to large scientific datasets where data fidelity must be balanced against compression ratio.
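The trade-off between fidelity and size can be illustrated with a toy, standard-library-only sketch of an absolute error bound, which is the kind of guarantee a tolerance-based lossy mode offers. This is plain uniform quantization, not ZFP's algorithm, and `quantize`, `dequantize`, and the sine signal are hypothetical names and data used only for illustration.

```python
import math


def quantize(values, tolerance):
    """Map each value to an integer bin of width 2 * tolerance."""
    step = 2.0 * tolerance
    return [round(v / step) for v in values]


def dequantize(codes, tolerance):
    """Map integer bins back to the centre value of each bin."""
    step = 2.0 * tolerance
    return [c * step for c in codes]


tolerance = 0.01  # hypothetical absolute error budget
signal = [math.sin(0.05 * i) for i in range(200)]

codes = quantize(signal, tolerance)
restored = dequantize(codes, tolerance)

# The worst-case error should stay at or below the tolerance,
# up to floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print("distinct codes:", len(set(codes)))
print("max abs error:", max_error)
```

A larger tolerance yields fewer distinct codes, which compress better, at the cost of a larger reconstruction error.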

For more details about compression modes, see ZFP Documentation.

ZFP

Data Model for ZFP options.

Model Reference#

Blosc
pydantic model mdio.schemas.compressors.Blosc#

Data Model for Blosc options.

JSON schema:
{
   "title": "Blosc",
   "description": "Data Model for Blosc options.",
   "type": "object",
   "properties": {
      "name": {
         "default": "blosc",
         "description": "Name of the compressor.",
         "title": "Name",
         "type": "string"
      },
      "algorithm": {
         "allOf": [
            {
               "$ref": "#/$defs/BloscAlgorithm"
            }
         ],
         "default": "lz4",
         "description": "The Blosc compression algorithm to be used."
      },
      "level": {
         "default": 5,
         "description": "The compression level.",
         "maximum": 9,
         "minimum": 0,
         "title": "Level",
         "type": "integer"
      },
      "shuffle": {
         "allOf": [
            {
               "$ref": "#/$defs/BloscShuffle"
            }
         ],
         "default": 1,
         "description": "The shuffle strategy to be applied before compression."
      },
      "blocksize": {
         "default": 0,
         "description": "The size of the block to be used for compression.",
         "title": "Blocksize",
         "type": "integer"
      }
   },
   "$defs": {
      "BloscAlgorithm": {
         "description": "Enum for Blosc algorithm options.",
         "enum": [
            "blosclz",
            "lz4",
            "lz4hc",
            "zlib",
            "zstd"
         ],
         "title": "BloscAlgorithm",
         "type": "string"
      },
      "BloscShuffle": {
         "description": "Enum for Blosc shuffle options.",
         "enum": [
            0,
            1,
            2,
            -1
         ],
         "title": "BloscShuffle",
         "type": "integer"
      }
   },
   "additionalProperties": false
}

field algorithm: BloscAlgorithm = BloscAlgorithm.LZ4#

The Blosc compression algorithm to be used.

field blocksize: int = 0#

The size of the block to be used for compression.

field level: int = 5#

The compression level.

Constraints:
  • ge = 0
  • le = 9

field name: str = 'blosc'#

Name of the compressor.

field shuffle: BloscShuffle = BloscShuffle.SHUFFLE#

The shuffle strategy to be applied before compression.

make_instance()#

Translate parameters to compressor kwargs.
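As a rough illustration of what the schema above accepts, the following standard-library sketch applies the documented defaults and checks the documented constraints on a JSON fragment. It does not use the MDIO model itself: `check_blosc_options` is a hypothetical helper, and the allowed values are copied from the JSON schema shown above.

```python
import json

# Allowed values and defaults copied from the JSON schema of the Blosc model.
BLOSC_ALGORITHMS = {"blosclz", "lz4", "lz4hc", "zlib", "zstd"}
BLOSC_SHUFFLES = {0, 1, 2, -1}
BLOSC_DEFAULTS = {
    "name": "blosc",
    "algorithm": "lz4",
    "level": 5,
    "shuffle": 1,
    "blocksize": 0,
}


def check_blosc_options(options: dict) -> dict:
    """Hypothetical helper: fill in defaults and enforce the schema's limits."""
    unknown = set(options) - set(BLOSC_DEFAULTS)
    if unknown:
        # Mirrors "additionalProperties": false in the schema.
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    merged = {**BLOSC_DEFAULTS, **options}
    if merged["algorithm"] not in BLOSC_ALGORITHMS:
        raise ValueError(f"unknown algorithm: {merged['algorithm']!r}")
    if merged["shuffle"] not in BLOSC_SHUFFLES:
        raise ValueError(f"unknown shuffle: {merged['shuffle']!r}")
    if not 0 <= merged["level"] <= 9:
        raise ValueError("level must be between 0 and 9")
    return merged


# Override only what differs from the defaults.
config = check_blosc_options(
    json.loads('{"algorithm": "zstd", "level": 7, "shuffle": 2}')
)
print(config)
```

Fields that are omitted fall back to the defaults, so a fragment only needs to list the values that differ.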


class mdio.schemas.compressors.BloscAlgorithm#

Enum for Blosc algorithm options.

BLOSCLZ = 'blosclz'#
LZ4 = 'lz4'#
LZ4HC = 'lz4hc'#
ZLIB = 'zlib'#
ZSTD = 'zstd'#

class mdio.schemas.compressors.BloscShuffle#

Enum for Blosc shuffle options.

NOSHUFFLE = 0#
SHUFFLE = 1#
BITSHUFFLE = 2#
AUTOSHUFFLE = -1#

ZFP
pydantic model mdio.schemas.compressors.ZFP#

Data Model for ZFP options.

JSON schema:
{
   "title": "ZFP",
   "description": "Data Model for ZFP options.",
   "type": "object",
   "properties": {
      "name": {
         "default": "zfp",
         "description": "Name of the compressor.",
         "title": "Name",
         "type": "string"
      },
      "mode": {
         "$ref": "#/$defs/ZFPMode"
      },
      "tolerance": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Fixed accuracy in terms of absolute error tolerance.",
         "title": "Tolerance"
      },
      "rate": {
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Fixed rate in terms of number of compressed bits per value.",
         "title": "Rate"
      },
      "precision": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Fixed precision in terms of number of uncompressed bits per value.",
         "title": "Precision"
      },
      "writeHeader": {
         "default": true,
         "description": "Encode array shape, scalar type, and compression parameters.",
         "title": "Writeheader",
         "type": "boolean"
      }
   },
   "$defs": {
      "ZFPMode": {
         "description": "Enum for ZFP algorithm modes.",
         "enum": [
            "fixed_rate",
            "fixed_precision",
            "fixed_accuracy",
            "reversible"
         ],
         "title": "ZFPMode",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "mode"
   ]
}

field mode: ZFPMode [Required]#

field name: str = 'zfp'#

Name of the compressor.

field precision: int | None = None#

Fixed precision in terms of number of uncompressed bits per value.

field rate: float | None = None#

Fixed rate in terms of number of compressed bits per value.

field tolerance: float | None = None#

Fixed accuracy in terms of absolute error tolerance.

field writeHeader: bool = True#

Encode array shape, scalar type, and compression parameters.

make_instance()#

Translate parameters to compressor kwargs.
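The field descriptions suggest that each mode pairs with one tuning parameter: `rate` for fixed rate, `precision` for fixed precision, and `tolerance` for fixed accuracy. The sketch below encodes that pairing with the standard library only. Whether the MDIO model enforces the pairing is an assumption here, and `check_zfp_options` is a hypothetical helper, not part of MDIO.

```python
# Parameter each mode is expected to carry, inferred from the field
# descriptions above (an assumption, not confirmed MDIO behavior).
MODE_PARAMETER = {
    "fixed_rate": "rate",
    "fixed_precision": "precision",
    "fixed_accuracy": "tolerance",
    "reversible": None,  # lossless mode, no tuning parameter
}

# Defaults copied from the JSON schema of the ZFP model ("mode" is required).
ZFP_DEFAULTS = {
    "name": "zfp",
    "tolerance": None,
    "rate": None,
    "precision": None,
    "writeHeader": True,
}


def check_zfp_options(options: dict) -> dict:
    """Hypothetical helper: require the parameter that matches the mode."""
    mode = options.get("mode")
    if mode not in MODE_PARAMETER:
        raise ValueError(f"unknown or missing mode: {mode!r}")
    needed = MODE_PARAMETER[mode]
    if needed is not None and options.get(needed) is None:
        raise ValueError(f"mode {mode!r} needs {needed!r} to be set")
    return {**ZFP_DEFAULTS, **options}


lossy = check_zfp_options({"mode": "fixed_accuracy", "tolerance": 0.05})
lossless = check_zfp_options({"mode": "reversible"})
print(lossy)
print(lossless)
```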


class mdio.schemas.compressors.ZFPMode#

Enum for ZFP algorithm modes.

FIXED_RATE = 'fixed_rate'#
FIXED_PRECISION = 'fixed_precision'#
FIXED_ACCURACY = 'fixed_accuracy'#
REVERSIBLE = 'reversible'#

property int_code: int#

Return the integer code of ZFP mode.