API Reference¶

Data Converters¶

Seismic Data¶

Note

By default, the SEG-Y ingestion tool uses Python’s multiprocessing to speed up parsing the data. Because of this, any Python script that is executed directly (for example, python file.py) almost always needs a __main__ guard around the conversion call. The guard is NOT needed when running inside Jupyter.

if __name__ == "__main__":
    segy_to_mdio(...)

When the CLI is invoked, this is already handled.

See the official Python multiprocessing documentation for details.
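The effect of the guard can be sketched with the standard library alone. This is a minimal illustration, not MDIO code: parse_chunk and run_parallel are hypothetical stand-ins for the ingestion work. With start methods that re-import the main module in each worker process, unguarded top-level code would run again in every worker; the guard limits it to the parent process.

```python
import multiprocessing as mp


def parse_chunk(chunk_id: int) -> int:
    """Hypothetical stand-in for per-chunk parsing work; squares the chunk id."""
    return chunk_id * chunk_id


def run_parallel(n_chunks: int, n_procs: int = 2) -> list[int]:
    """Distribute the chunk ids over a pool of worker processes."""
    with mp.Pool(processes=n_procs) as pool:
        return pool.map(parse_chunk, range(n_chunks))


if __name__ == "__main__":
    # Only the parent process reaches this line; workers that re-import
    # this module skip it, so they do not start pools of their own.
    print(run_parallel(4))
```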

Conversion from SEG-Y to MDIO v1 format.

mdio.converters.segy.segy_to_mdio(segy_spec, mdio_template, input_path, output_path, overwrite=False, grid_overrides=None, segy_header_overrides=None)¶

A function that converts a SEG-Y file to an MDIO v1 file.

Ingest a SEG-Y file according to the segy_spec. The spec can come from the registry or be a custom one.

Parameters:

segy_spec (SegySpec) – The SEG-Y specification to use for the conversion.
mdio_template (AbstractDatasetTemplate) – The MDIO template to use for the conversion.
input_path (UPath | Path | str) – The universal path of the input SEG-Y file.
output_path (UPath | Path | str) – The universal path for the output MDIO v1 file.
overwrite (bool) – Whether to overwrite the output file if it already exists. Defaults to False.
grid_overrides (GridOverrides | dict[str, Any] | None) – Option to add grid overrides. Prefer a mdio.GridOverrides instance; dict is still accepted (deprecated as of 1.2, planned for removal in a future release) but logs a deprecation warning.
segy_header_overrides (SegyHeaderOverrides | None) – Option to override specific SEG-Y headers during ingestion.

Return type:

None

Conversion from MDIO to various other formats.

mdio.converters.mdio.mdio_to_segy(segy_spec, input_path, output_path, selection_mask=None, client=None)¶

Convert MDIO file to SEG-Y format.

We export N-D seismic data to the flattened SEG-Y format used in data transmission.

The input headers are preserved as is, and will be transferred to the output file.

Input MDIO can be local or cloud based. However, the output SEG-Y will be generated locally.

A selection_mask can be provided (same shape as spatial grid) to export a subset.

Parameters:

segy_spec (SegySpec) – The SEG-Y specification to use for the conversion.
input_path (UPath | Path | str) – Store or URL (and cloud options) for MDIO file.
output_path (UPath | Path | str) – Path to the output SEG-Y file.
selection_mask (np.ndarray) – Array that selects the subset of traces to export. It must have the same shape as the spatial grid. Defaults to None.
client (distributed.Client) – Dask client. If None, the local threaded scheduler is used. If auto is used, multiple processes are created (with 8 threads each).

Raises:

ImportError – if distributed package isn’t installed but requested.
ValueError – if cut mask is empty, i.e. no traces will be written.

Return type:

None

Examples

To export an existing local MDIO file to SEG-Y we use the code snippet below. This will export the full MDIO (without padding) to SEG-Y format.

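>>> from segy.standards import get_segy_standard
>>> from upath import UPath
>>> from mdio import mdio_to_segy
>>>
>>> # segy_spec is required by the signature above. The SEG-Y rev 1.0 spec from
>>> # the segy package is assumed here; use the spec that matches your data.
>>> segy_spec = get_segy_standard(1.0)
>>> input_path = UPath("prefix2/file.mdio")
>>> output_path = UPath("prefix/file.segy")
>>> mdio_to_segy(segy_spec, input_path, output_path)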
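To export only part of the data, a selection_mask with the same shape as the spatial grid is needed. The sketch below builds one with NumPy. The grid shape is a hypothetical placeholder, and a boolean dtype is an assumption, since the reference only says the array marks the subset of traces.

```python
import numpy as np

# Hypothetical spatial grid shape (inline, crossline); read it from your dataset.
grid_shape = (100, 200)

# Start with no traces selected, then mark the region to export.
selection_mask = np.zeros(grid_shape, dtype=bool)
selection_mask[10:20, :] = True  # grid indices 10..19 on the first axis, all of the second

# An all-False mask selects nothing; the reference lists ValueError for that case,
# so it is cheap to check before starting an export.
if not selection_mask.any():
    raise ValueError("selection_mask selects no traces")

print(selection_mask.shape, int(selection_mask.sum()))
```

The mask would then be passed as the selection_mask keyword argument of mdio_to_segy.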

Grid Overrides¶

pydantic model mdio.GridOverrides¶

Type-safe configuration for grid override operations during SEG-Y ingestion.

Show JSON schema

{
   "title": "GridOverrides",
   "description": "Type-safe configuration for grid override operations during SEG-Y ingestion.",
   "type": "object",
   "properties": {
      "AutoChannelWrap": {
         "default": false,
         "description": "Streamer: auto-detect channel-wrap geometry (Type A vs B).",
         "title": "Autochannelwrap",
         "type": "boolean"
      },
      "AutoShotWrap": {
         "default": false,
         "description": "Streamer: derive dense shot_index from interleaved shot_point values.",
         "title": "Autoshotwrap",
         "type": "boolean"
      },
      "CalculateShotIndex": {
         "default": false,
         "description": "OBN: derive dense shot_index from sparse shot_point values per shot_line.",
         "title": "Calculateshotindex",
         "type": "boolean"
      },
      "NonBinned": {
         "default": false,
         "description": "Collapse selected dims into a single trace dimension without spatial binning.",
         "title": "Nonbinned",
         "type": "boolean"
      },
      "HasDuplicates": {
         "default": false,
         "description": "Add a trace dimension (chunksize 1) to disambiguate duplicate trace indices.",
         "title": "Hasduplicates",
         "type": "boolean"
      },
      "chunksize": {
         "anyOf": [
            {
               "exclusiveMinimum": 0,
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Chunk size for the trace dimension when `non_binned` is True.",
         "title": "Chunksize"
      },
      "non_binned_dims": {
         "anyOf": [
            {
               "items": {
                  "type": "string"
               },
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Dimension names to collapse into the trace dimension when `non_binned` is True.",
         "title": "Non Binned Dims"
      }
   },
   "additionalProperties": false
}

field AutoChannelWrap: bool = False¶

Streamer: auto-detect channel-wrap geometry (Type A vs B).

field AutoShotWrap: bool = False¶

Streamer: derive dense shot_index from interleaved shot_point values.

field CalculateShotIndex: bool = False¶

OBN: derive dense shot_index from sparse shot_point values per shot_line.

field chunksize: int | None = None¶

Chunk size for the trace dimension when non_binned is True.

Constraints:

gt = 0

field HasDuplicates: bool = False¶

Add a trace dimension (chunksize 1) to disambiguate duplicate trace indices.

field NonBinned: bool = False¶

Collapse selected dims into a single trace dimension without spatial binning.

field non_binned_dims: list[str] | None = None¶

Dimension names to collapse into the trace dimension when non_binned is True.

to_legacy_dict()¶

Dump to the legacy CamelCase dict shape stored in dataset metadata.

Return type: dict[str, Any]
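The constraints in the JSON schema above can be mirrored in a few lines of plain Python, which is useful for checking a legacy dict before passing it on. This is a sketch only: check_overrides is a hypothetical helper, not part of MDIO, and the dimension name "channel" is an illustrative assumption. The real validation is done by the pydantic model.

```python
from typing import Any

# Property names copied from the GridOverrides JSON schema.
BOOL_FIELDS = {
    "AutoChannelWrap",
    "AutoShotWrap",
    "CalculateShotIndex",
    "NonBinned",
    "HasDuplicates",
}
ALLOWED_FIELDS = BOOL_FIELDS | {"chunksize", "non_binned_dims"}


def check_overrides(overrides: dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if the dict violates the schema."""
    unknown = set(overrides) - ALLOWED_FIELDS
    if unknown:
        # Mirrors "additionalProperties": false.
        raise ValueError(f"unknown override keys: {sorted(unknown)}")

    for key in BOOL_FIELDS & set(overrides):
        if not isinstance(overrides[key], bool):
            raise ValueError(f"{key} must be a boolean")

    chunksize = overrides.get("chunksize")
    if chunksize is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(chunksize, int) and not isinstance(chunksize, bool)
        if not is_int or chunksize <= 0:
            # Mirrors "exclusiveMinimum": 0.
            raise ValueError("chunksize must be a positive integer or None")

    dims = overrides.get("non_binned_dims")
    if dims is not None:
        if not isinstance(dims, list) or not all(isinstance(d, str) for d in dims):
            raise ValueError("non_binned_dims must be a list of strings or None")


# Illustrative override dict; "channel" is an assumed dimension name.
non_binned = {"NonBinned": True, "chunksize": 64, "non_binned_dims": ["channel"]}
check_overrides(non_binned)  # passes silently
```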

Core Functionality¶

Dimensions¶

Dimension (grid) abstraction and serializers.

class mdio.core.dimension.Dimension(coords, name)¶

Dimension class.

Dimension has a name and coordinates associated with it. The Dimension coordinates can only be a vector.

Parameters:

coords (list | tuple | NDArray | range) – Vector of coordinates.
name (str) – Name of the dimension.

coords¶

Vector of coordinates.

Type: list | tuple | NDArray | range

name¶

Name of the dimension.

Type: str

max()¶

Get maximum value of dimension.

Return type: NDArray[float]

min()¶

Get minimum value of dimension.

Return type: NDArray[float]

property size: int¶

Size of the dimension.

Optimization¶

Optimize MDIO seismic datasets for fast access patterns using ZFP compression and Dask.

This module provides tools to create compressed, rechunked transpose views of seismic data for efficient access along dataset dimensions. It uses configurable ZFP compression based on data statistics and supports parallel processing with Dask Distributed.

pydantic model mdio.optimize.access_pattern.OptimizedAccessPatternConfig¶

Configuration for fast access pattern optimization.

Show JSON schema

{
   "title": "OptimizedAccessPatternConfig",
   "description": "Configuration for fast access pattern optimization.",
   "type": "object",
   "properties": {
      "optimize_dimensions": {
         "additionalProperties": {
            "items": {
               "type": "integer"
            },
            "type": "array"
         },
         "description": "Optimize dims and desired chunks.",
         "title": "Optimize Dimensions",
         "type": "object"
      },
      "processing_chunks": {
         "additionalProperties": {
            "type": "integer"
         },
         "description": "Chunk sizes for processing the original variable.",
         "title": "Processing Chunks",
         "type": "object"
      },
      "compressor": {
         "anyOf": [
            {
               "$ref": "#/$defs/Blosc"
            },
            {
               "$ref": "#/$defs/ZFP"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Compressor to use for access patterns.",
         "title": "Compressor"
      }
   },
   "$defs": {
      "Blosc": {
         "additionalProperties": false,
         "description": "Data Model for Blosc options.",
         "properties": {
            "name": {
               "default": "blosc",
               "description": "Name of the compressor.",
               "title": "Name",
               "type": "string"
            },
            "cname": {
               "$ref": "#/$defs/BloscCname",
               "default": "zstd",
               "description": "Compression algorithm name."
            },
            "clevel": {
               "default": 5,
               "description": "Compression level (integer 0\u20139)",
               "maximum": 9,
               "minimum": 0,
               "title": "Clevel",
               "type": "integer"
            },
            "shuffle": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/BloscShuffle"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Shuffling mode before compression."
            },
            "typesize": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "The size in bytes that the shuffle is performed over.",
               "title": "Typesize"
            },
            "blocksize": {
               "default": 0,
               "description": "The size (in bytes) of blocks to divide data before compression.",
               "title": "Blocksize",
               "type": "integer"
            }
         },
         "title": "Blosc",
         "type": "object"
      },
      "BloscCname": {
         "description": "Enum for compression library used by blosc.",
         "enum": [
            "lz4",
            "lz4hc",
            "blosclz",
            "zstd",
            "snappy",
            "zlib"
         ],
         "title": "BloscCname",
         "type": "string"
      },
      "BloscShuffle": {
         "description": "Enum for shuffle filter used by blosc.",
         "enum": [
            "noshuffle",
            "shuffle",
            "bitshuffle"
         ],
         "title": "BloscShuffle",
         "type": "string"
      },
      "ZFP": {
         "additionalProperties": false,
         "description": "Data Model for ZFP options.",
         "properties": {
            "name": {
               "default": "zfp",
               "description": "Name of the compressor.",
               "title": "Name",
               "type": "string"
            },
            "mode": {
               "$ref": "#/$defs/ZFPMode"
            },
            "tolerance": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Fixed accuracy in terms of absolute error tolerance.",
               "title": "Tolerance"
            },
            "rate": {
               "anyOf": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Fixed rate in terms of number of compressed bits per value.",
               "title": "Rate"
            },
            "precision": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Fixed precision in terms of number of uncompressed bits per value.",
               "title": "Precision"
            }
         },
         "required": [
            "mode"
         ],
         "title": "ZFP",
         "type": "object"
      },
      "ZFPMode": {
         "description": "Enum for ZFP algorithm modes.",
         "enum": [
            "fixed_rate",
            "fixed_precision",
            "fixed_accuracy",
            "reversible"
         ],
         "title": "ZFPMode",
         "type": "string"
      }
   },
   "required": [
      "optimize_dimensions",
      "processing_chunks"
   ]
}

field compressor: Blosc | ZFP | None = None¶

Compressor to use for access patterns.

field optimize_dimensions: dict[str, tuple[int, ...]] [Required]¶

Optimize dims and desired chunks.

field processing_chunks: dict[str, int] [Required]¶

Chunk sizes for processing the original variable.

mdio.optimize.access_pattern.optimize_access_patterns(dataset, config, n_workers=1, threads_per_worker=1)¶

Optimize MDIO dataset for fast access along dimensions.

Optimize an MDIO dataset by creating compressed, rechunked views for fast access along configurable dimensions, then append them to the existing MDIO file.

This uses ZFP compression with a tolerance based on the data standard deviation and the provided quality level. It requires Dask Distributed for parallel execution and reuses an existing distributed.Client if one is available; otherwise it creates its own. An existing Client is kept running after optimization.

Parameters:

dataset (Dataset) – MDIO Dataset containing the seismic data.
config (OptimizedAccessPatternConfig) – Configuration object with the dimensions to optimize, the processing chunks, and an optional compressor.
n_workers (int) – Number of Dask workers. Default is 1.
threads_per_worker (int) – Threads per Dask worker. Default is 1.

Raises:

ValueError – If required attrs/stats are missing or the dataset is invalid.

Return type:

None

Examples

For Post-Stack 3D seismic data, we can optimize the inline, crossline, and time dimensions.

>>> from mdio import optimize_access_patterns, OptimizedAccessPatternConfig
>>> from mdio import open_mdio
>>>
>>> conf = OptimizedAccessPatternConfig(
...     optimize_dimensions={
...         "inline": (4, 512, 512),
...         "crossline": (512, 4, 512),
...         "time": (512, 512, 4),
...     },
...     processing_chunks={"inline": 512, "crossline": 512, "time": 512},
... )
>>>
>>> ds = open_mdio("/path/to/seismic.mdio")
>>> optimize_access_patterns(ds, conf, n_workers=4)
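The chunk shapes in the example are thin along one axis so that a slice along that axis touches few chunks. The arithmetic can be sketched as below. The volume shape of 1024 samples per axis and the 4-byte (float32) sample size are assumptions for illustration, and chunks_touched and chunk_nbytes are hypothetical helpers, not MDIO functions.

```python
import math


def chunks_touched(volume_shape, chunk_shape, slice_axis):
    """Count the chunks intersected by one slice at a fixed index along slice_axis."""
    count = 1
    for axis, (n, c) in enumerate(zip(volume_shape, chunk_shape)):
        if axis != slice_axis:
            # The slice spans this whole axis, so it crosses every chunk along it.
            count *= math.ceil(n / c)
    return count


def chunk_nbytes(chunk_shape, itemsize=4):
    """Uncompressed chunk size in bytes; itemsize=4 assumes float32 samples."""
    return math.prod(chunk_shape) * itemsize


# Assumed volume shape in (inline, crossline, time) order.
volume_shape = (1024, 1024, 1024)
patterns = {
    "inline": (4, 512, 512),
    "crossline": (512, 4, 512),
    "time": (512, 512, 4),
}

# Cost of reading a single inline slice (axis 0) from each access pattern.
for name, chunks in patterns.items():
    n_chunks = chunks_touched(volume_shape, chunks, slice_axis=0)
    mib = n_chunks * chunk_nbytes(chunks) / 2**20
    print(f"{name:>9}: {n_chunks} chunks, {mib:.0f} MiB read uncompressed")
```

Under these assumptions, an inline slice read from the inline-optimized view touches 4 chunks of 4 MiB each, while the same slice read from the crossline or time view touches 512 chunks. That difference is the motivation for storing one view per access direction.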