Data Types#

Altay Sansal

Apr 29, 2024

Scalar Type#

Scalar types are used to represent numbers and boolean values in MDIO arrays.

ScalarType

Scalar array data type.

These numbers can be integers (whole numbers without a decimal point, like 1, -15, 204) or floating-point numbers (numbers with a fractional part, like 3.14, -0.001, 2.71828) in 16- to 64-bit formats such as float32.

It is important to choose the right type for the content of the data for type safety, memory efficiency, performance, and accuracy of the numbers represented. Most scientific datasets are float16, float32, or float64 values. However, there are many good use cases for integer and complex values as well.
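To make the memory trade-off concrete, the sketch below estimates the uncompressed size of an array for the three floating-point types. It uses only the Python standard library; the helper `array_nbytes`, the `FORMAT_CODES` mapping, and the 1000 x 1000 shape are illustrative assumptions and not part of MDIO.

```python
import math
import struct

# Standard-size struct format codes for the floating-point type names.
# This mapping is an assumption made for illustration; MDIO does not define it.
FORMAT_CODES = {"float16": "e", "float32": "f", "float64": "d"}


def array_nbytes(data_type, shape):
    """Return the uncompressed size in bytes of an array with this type and shape."""
    item_size = struct.calcsize("<" + FORMAT_CODES[data_type])
    return item_size * math.prod(shape)


shape = (1000, 1000)  # hypothetical grid of one million samples
for name in FORMAT_CODES:
    print(f"{name}: {array_nbytes(name, shape):,} bytes")
```

Each halving of the bit width halves the footprint, at the cost of range and precision.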

The ScalarTypes that MDIO supports are listed in the tables below.

Data Type   Options       Example Value
---------   -----------   -------------
bool        False, True   True

Data Type   Range                                                      Example Value
---------   --------------------------------------------------------   -------------
int8        -128 to 127                                                45
int16       -32,768 to 32,767                                          1,234
int32       -2,147,483,648 to 2,147,483,647                            2,024
int64       -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807    987,654,321

Data Type   Range                             Example Value
---------   -------------------------------   -------------
uint8       0 to 255                          200
uint16      0 to 65,535                       50,000
uint32      0 to 4,294,967,295                3,000,000
uint64      0 to 18,446,744,073,709,551,615   5,000,000,000

Data Type   Range                                                 Example Value
---------   ---------------------------------------------------   ------------------
float16     -65,504 to 65,504                                     10.10
float32     -3.4028235e+38 to 3.4028235e+38                       0.1234567
float64     -1.7976931348623157e+308 to 1.7976931348623157e+308   3.1415926535897932

Precision

  • float16: 2 decimal places

  • float32: 7 decimal places

  • float64: 16 decimal places

Data Type    Range                                                 Example Value
----------   ---------------------------------------------------   ----------------
complex64    -3.4028235e+38 to 3.4028235e+38                       3.14+2.71j
complex128   -1.7976931348623157e+308 to 1.7976931348623157e+308   2.71828+3.14159j

Ranges are for both real and imaginary parts.
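The limits in these tables follow directly from the bit layouts. As a minimal sketch using only the Python standard library, the code below derives the integer ranges from the bit width and decodes the largest finite floating-point values from their IEEE 754 bit patterns. The helper `int_range` is a hypothetical name, and `sys.float_info.max` is assumed to describe a 64-bit float, which holds on common platforms.

```python
import struct
import sys


def int_range(bits, signed=True):
    """Return (min, max) for a two's-complement integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Largest finite values, decoded from their IEEE 754 bit patterns.
float16_max = struct.unpack("<e", struct.pack("<H", 0x7BFF))[0]
float32_max = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
float64_max = sys.float_info.max  # assumes the platform float is 64-bit

print(int_range(8))                 # (-128, 127)
print(int_range(16, signed=False))  # (0, 65535)
print(float16_max)                  # 65504.0
print(float32_max, float64_max)
```

A complex64 value stores two float32 components and a complex128 value stores two float64 components, which is why the complex ranges match the corresponding float ranges.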

Structured Type#

A structured data type organizes and stores data in a fixed arrangement, allowing memory-efficient access and manipulation.

StructuredType

Structured array type with packed fields.

StructuredField

Structured array field with name, format.

Structured data types are an essential component in handling complex data structures, particularly in specialized domains like seismic data processing for subsurface imaging applications. These data types allow for the organization of heterogeneous data into a single, structured format.

They are designed to be memory-efficient, which is vital for handling large seismic datasets. Structured data types are adaptable, allowing for the addition or modification of fields.

A StructuredType consists of StructuredFields. Fields can have different numeric types, and each represents a specific attribute of the seismic data, such as coordinates, line numbers, and time stamps.

Each StructuredField must specify a name and a data format (format).

All the structured fields will be packed and there will be no gaps between them.
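What "packed" means can be sketched in a few lines: each field starts exactly where the previous one ends. The example below uses the standard library `struct` module to compute byte offsets. The helper `packed_layout`, the `CODES` mapping, and the two sample fields are hypothetical and chosen only to make the absence of padding visible.

```python
import struct

# Assumed mapping from scalar type names to struct format codes (illustrative subset).
CODES = {"int8": "b", "int16": "h", "int32": "i", "uint32": "I",
         "float32": "f", "float64": "d"}


def packed_layout(fields):
    """Return ({name: byte offset}, total size) for fields laid out with no gaps."""
    offsets, position = {}, 0
    for field in fields:
        offsets[field["name"]] = position
        # "<" requests standard sizes without alignment padding.
        position += struct.calcsize("<" + CODES[field["format"]])
    return offsets, position


# Hypothetical fields: a 1-byte flag followed by an 8-byte float.
fields = [{"name": "flag", "format": "int8"}, {"name": "value", "format": "float64"}]
offsets, size = packed_layout(fields)
print(offsets, size)  # {'flag': 0, 'value': 1} 9
```

A compiler-aligned C struct with the same two members would typically insert padding after the first byte; the packed layout stores the element in 9 bytes.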

Examples#

The examples below show how to declare variables with a ScalarType.

Variable foo with type float32.

{
  "name": "foo",
  "dataType": "float32",
  "dimensions": ["x", "y"]
}

Variable bar with type uint8.

{
  "name": "bar",
  "dataType": "uint8",
  "dimensions": ["x", "y"]
}
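As a rough illustration of how such a definition could be checked before use, the sketch below parses the `bar` variable and tests whether its `dataType` is one of the scalar type names listed in the Model Reference on this page. The function `is_scalar_variable` is a hypothetical helper, not an MDIO API.

```python
import json

# Scalar type names as listed in the Model Reference on this page.
SCALAR_TYPES = {
    "bool", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "longdouble",
    "complex64", "complex128", "clongdouble",
}

spec = json.loads('{"name": "bar", "dataType": "uint8", "dimensions": ["x", "y"]}')


def is_scalar_variable(variable):
    """Return True if the variable's dataType is a known scalar type name."""
    data_type = variable["dataType"]
    return isinstance(data_type, str) and data_type in SCALAR_TYPES


print(spec["name"], is_scalar_variable(spec))  # bar True
```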

Below are a couple of examples of StructuredType with varying lengths.

We can specify a variable named headers that holds a 16-byte struct with four int32 values.

{
  "name": "headers",
  "dataType": {
    "fields": [
      { "name": "cdp-x", "format": "int32" },
      { "name": "cdp-y", "format": "int32" },
      { "name": "inline", "format": "int32" },
      { "name": "crossline", "format": "int32" }
    ]
  },
  "dimensions": ["inline", "crossline"]
}

This will yield an in-memory or on-disk struct that looks like this (for each element):

 ←─ 4 ─→ ←─ 4 ─→ ←─ 4 ─→ ←─ 4 ─→  = 16-bytes
┌───────┬───────┬───────┬───────┐
│ int32 ╎ int32 ╎ int32 ╎ int32 │ (next sample)
└───────┴───────┴───────┴───────┘
  └→ cdp-x └→ cdp-y └→ inline └→ crossline

The below example shows mixing different data types.

{
  "name": "headers",
  "dataType": {
    "fields": [
      { "name": "cdp", "format": "uint32" },
      { "name": "offset", "format": "int16" },
      { "name": "cdp-x", "format": "float64" },
      { "name": "cdp-y", "format": "float64" }
    ]
  },
  "dimensions": ["inline", "crossline"]
}

This will yield an in-memory or on-disk struct that looks like this (for each element):

 ←── 4 ──→   2   ←─── 8 ───→ ←─── 8 ───→  = 22-bytes
┌─────────┬─────┬───────────┬───────────┐
│ uint32  ╎int16╎  float64  ╎  float64  │ (next sample)
└─────────┴─────┴───────────┴───────────┘
    └→ cdp  └→ offset └→ cdp-x    └→ cdp-y
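The packed size of this mixed struct can be checked with the standard library `struct` module: 4 + 2 + 8 + 8 = 22 bytes when no padding is inserted. The sketch below builds a record format from the field list, packs one made-up header, and unpacks it again. Little-endian byte order, the `CODES` mapping, and the sample values are assumptions for illustration only.

```python
import struct

# Assumed mapping from scalar type names to struct format codes.
CODES = {"uint32": "I", "int16": "h", "float64": "d"}

fields = [
    {"name": "cdp", "format": "uint32"},
    {"name": "offset", "format": "int16"},
    {"name": "cdp-x", "format": "float64"},
    {"name": "cdp-y", "format": "float64"},
]

# "<" selects little-endian, standard sizes, and no alignment padding.
record = struct.Struct("<" + "".join(CODES[f["format"]] for f in fields))

values = (1001, -250, 512345.5, 6123456.25)  # made-up sample header
raw = record.pack(*values)

print(record.size)         # 22
print(record.unpack(raw))  # (1001, -250, 512345.5, 6123456.25)
```

With native alignment (the `@` prefix in `struct`), the same fields would usually take more space because padding is added before the float64 members.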

Model Reference#

Scalar Types
class mdio.schemas.dtype.ScalarType#

Scalar array data type.

BOOL = 'bool'#
INT8 = 'int8'#
INT16 = 'int16'#
INT32 = 'int32'#
INT64 = 'int64'#
UINT8 = 'uint8'#
UINT16 = 'uint16'#
UINT32 = 'uint32'#
UINT64 = 'uint64'#
FLOAT16 = 'float16'#
FLOAT32 = 'float32'#
FLOAT64 = 'float64'#
LONGDOUBLE = 'longdouble'#
COMPLEX64 = 'complex64'#
COMPLEX128 = 'complex128'#
CLONGDOUBLE = 'clongdouble'#
Structured Type
pydantic model mdio.schemas.dtype.StructuredType#

Structured array type with packed fields.

JSON schema:
{
   "title": "StructuredType",
   "description": "Structured array type with packed fields.",
   "type": "object",
   "properties": {
      "fields": {
         "items": {
            "$ref": "#/$defs/StructuredField"
         },
         "title": "Fields",
         "type": "array"
      }
   },
   "$defs": {
      "ScalarType": {
         "description": "Scalar array data type.",
         "enum": [
            "bool",
            "int8",
            "int16",
            "int32",
            "int64",
            "uint8",
            "uint16",
            "uint32",
            "uint64",
            "float16",
            "float32",
            "float64",
            "longdouble",
            "complex64",
            "complex128",
            "clongdouble"
         ],
         "title": "ScalarType",
         "type": "string"
      },
      "StructuredField": {
         "additionalProperties": false,
         "description": "Structured array field with name, format.",
         "properties": {
            "format": {
               "$ref": "#/$defs/ScalarType"
            },
            "name": {
               "title": "Name",
               "type": "string"
            }
         },
         "required": [
            "format",
            "name"
         ],
         "title": "StructuredField",
         "type": "object"
      }
   },
   "additionalProperties": false,
   "required": [
      "fields"
   ]
}

field fields: list[StructuredField] [Required]#

pydantic model mdio.schemas.dtype.StructuredField#

Structured array field with name, format.

JSON schema:
{
   "title": "StructuredField",
   "description": "Structured array field with name, format.",
   "type": "object",
   "properties": {
      "format": {
         "$ref": "#/$defs/ScalarType"
      },
      "name": {
         "title": "Name",
         "type": "string"
      }
   },
   "$defs": {
      "ScalarType": {
         "description": "Scalar array data type.",
         "enum": [
            "bool",
            "int8",
            "int16",
            "int32",
            "int64",
            "uint8",
            "uint16",
            "uint32",
            "uint64",
            "float16",
            "float32",
            "float64",
            "longdouble",
            "complex64",
            "complex128",
            "clongdouble"
         ],
         "title": "ScalarType",
         "type": "string"
      }
   },
   "additionalProperties": false,
   "required": [
      "format",
      "name"
   ]
}

field format: ScalarType [Required]#
field name: str [Required]#
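The constraints visible in the JSON schemas above (a required `fields` array, required `name` and `format` on every field, and no additional properties) can be mirrored in plain Python. The real models are pydantic classes; the function below is only a hypothetical standard-library sketch of the same rules, not MDIO's validation code.

```python
# Scalar type names copied from the ScalarType enum on this page.
SCALAR_TYPES = {
    "bool", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "longdouble",
    "complex64", "complex128", "clongdouble",
}


def validate_structured_type(data_type):
    """Return a list of problems; an empty list means the dict fits the schema."""
    if not isinstance(data_type, dict) or set(data_type) != {"fields"}:
        return ["expected an object with exactly one key: 'fields'"]
    if not isinstance(data_type["fields"], list):
        return ["'fields' must be an array"]
    problems = []
    for index, field in enumerate(data_type["fields"]):
        if not isinstance(field, dict) or set(field) != {"name", "format"}:
            problems.append(f"field {index}: needs exactly 'name' and 'format'")
        elif not isinstance(field["name"], str):
            problems.append(f"field {index}: 'name' must be a string")
        elif not isinstance(field["format"], str) or field["format"] not in SCALAR_TYPES:
            problems.append(f"field {index}: unknown format {field['format']!r}")
    return problems


headers = {"fields": [{"name": "cdp", "format": "uint32"},
                      {"name": "offset", "format": "int16"}]}
print(validate_structured_type(headers))  # []
```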