
Dataset Structure

ITKIT assumes and supports a specific dataset structure for all operations. This standardized format ensures consistency and simplifies preprocessing workflows.

Standard Dataset Format

A dataset is laid out as follows:

dataset_root/
├── image/
│   ├── case001.mha
│   ├── case002.mha
│   ├── case003.mha
│   └── ...
├── label/
│   ├── case001.mha
│   ├── case002.mha
│   ├── case003.mha
│   └── ...
└── meta.json

Key Requirements

Folder Structure

  • image/: Contains all image files
  • label/: Contains all corresponding label/segmentation files
  • meta.json: (Optional) Metadata file with dataset information

File Naming

  • Image and label files must have matching names (e.g., case001.mha in both image/ and label/ folders)
  • File extensions can be: .mha, .mhd, .nii, .nii.gz, or .nrrd

File Pairing

ITKIT operations expect paired image-label samples. Each image file in the image/ folder should have a corresponding label file with the same name in the label/ folder.
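
The pairing rule can be checked with a few lines of standard-library Python. This is a minimal sketch and not part of ITKIT: the function name `find_unpaired` is hypothetical, and it compares full file names, extension included, as the naming rule above describes.

```python
from pathlib import Path


def find_unpaired(dataset_root):
    """Return (images without a label, labels without an image) as sorted name lists."""
    root = Path(dataset_root)
    # Compare full file names, extension included, as the naming rule requires.
    images = {p.name for p in (root / "image").iterdir() if p.is_file()}
    labels = {p.name for p in (root / "label").iterdir() if p.is_file()}
    return sorted(images - labels), sorted(labels - images)
```

Calling `find_unpaired("/path/to/dataset")` on a well-formed dataset returns two empty lists.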

Supported File Formats

ITKIT supports the following medical image formats:

  • MetaImage (.mha): Single file format (recommended)
  • MetaImage Header (.mhd): Header + separate .raw data file
  • NIfTI (.nii): Uncompressed NIfTI format
  • NIfTI Compressed (.nii.gz): Compressed NIfTI format (common)
  • NRRD (.nrrd): Nearly Raw Raster Data format
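
When deriving a case name from a file name, the two-part `.nii.gz` extension needs care, since `os.path.splitext` would strip only `.gz`. The helper below is an illustrative sketch, not an ITKIT function; the name `split_case_name` is hypothetical, and the match is case-sensitive by assumption.

```python
SUPPORTED_EXTENSIONS = (".mha", ".mhd", ".nii", ".nii.gz", ".nrrd")


def split_case_name(filename):
    """Split 'case001.nii.gz' into ('case001', '.nii.gz'); raise ValueError if unsupported."""
    # Match on the full suffix so that '.nii.gz' is treated as one extension.
    for ext in SUPPORTED_EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)], ext
    raise ValueError(f"unsupported extension: {filename}")
```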

Metadata File (meta.json)

The optional meta.json file can contain dataset-level metadata:

{
  "dataset_name": "MyDataset",
  "modality": "CT",
  "num_samples": 100,
  "spacing": [1.0, 1.0, 1.0],
  "labels": {
    "0": "background",
    "1": "liver",
    "2": "tumor"
  }
}

This file is automatically generated by some ITKIT commands (e.g., itk_resample, itk_convert).
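
Because meta.json is optional, code that reads it should tolerate its absence. The following sketch loads the label map using the field names from the example above; whether every ITKIT command writes exactly these keys is an assumption, and `load_label_names` is a hypothetical helper, not an ITKIT API.

```python
import json
from pathlib import Path


def load_label_names(dataset_root):
    """Return {class_index: name}, or {} when meta.json or its "labels" field is absent."""
    path = Path(dataset_root) / "meta.json"
    if not path.exists():
        return {}
    meta = json.loads(path.read_text(encoding="utf-8"))
    # JSON object keys are always strings, so convert them back to integer class indices.
    return {int(index): name for index, name in meta.get("labels", {}).items()}
```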

Image and Label Requirements

Image Files

  • Can be 2D or 3D medical images
  • Supported data types: int16, uint16, float32, float64
  • Must have spatial metadata (spacing, origin, direction)

Label Files

  • Must have the same dimensions as the corresponding image
  • Typically integer types (uint8, int16, uint16)
  • Values represent class indices (0 = background, 1+ = structures)
  • Must have matching spatial metadata (spacing, origin, direction)
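
The label requirements above amount to a geometry comparison between each image and its label. Below is a minimal sketch of that comparison, assuming both objects expose SimpleITK's `GetSize`, `GetSpacing`, `GetOrigin`, and `GetDirection` accessors. The function name `geometry_mismatches` and the tolerance of 1e-6 are illustrative choices, not ITKIT behavior.

```python
import math


def geometry_mismatches(image, label, tol=1e-6):
    """Return the names of the geometry fields that differ between image and label."""
    mismatches = []
    if tuple(image.GetSize()) != tuple(label.GetSize()):
        mismatches.append("size")
    for field in ("Spacing", "Origin", "Direction"):
        a = getattr(image, f"Get{field}")()
        b = getattr(label, f"Get{field}")()
        # Floating-point metadata is compared with a tolerance, not exact equality.
        same = len(a) == len(b) and all(
            math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
        )
        if not same:
            mismatches.append(field.lower())
    return mismatches
```

With SimpleITK installed, `geometry_mismatches(sitk.ReadImage(image_path), sitk.ReadImage(label_path))` returns an empty list for a well-formed pair.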

Examples

Valid Dataset Example

my_ct_dataset/
├── image/
│   ├── patient001.mha
│   ├── patient002.mha
│   └── patient003.mha
├── label/
│   ├── patient001.mha
│   ├── patient002.mha
│   └── patient003.mha
└── meta.json

Multi-Format Dataset

ITKIT can work with mixed formats within the same dataset, as long as each image and its label share the same file name:

dataset/
├── image/
│   ├── case001.nii.gz
│   ├── case002.mha
│   └── case003.nrrd
└── label/
    ├── case001.nii.gz
    ├── case002.mha
    └── case003.nrrd

Patched Dataset Structure

itk_patch writes one folder per case, each holding numbered image and label patches, plus a crop_meta.json file at the output root:

patched_dataset/
├── case001/
│   ├── image_patch_000.mha
│   ├── image_patch_001.mha
│   ├── label_patch_000.mha
│   ├── label_patch_001.mha
│   └── ...
├── case002/
│   └── ...
└── crop_meta.json
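
Downstream code usually needs the patches as image-label pairs. The sketch below walks a patched dataset and yields them, assuming the `image_patch_NNN` / `label_patch_NNN` naming shown in the tree above. `iter_patch_pairs` is a hypothetical helper, not an ITKIT function, and it silently skips image patches that have no matching label patch.

```python
from pathlib import Path


def iter_patch_pairs(patched_root):
    """Yield (case_name, image_patch_path, label_patch_path) for every complete pair."""
    case_dirs = sorted(p for p in Path(patched_root).iterdir() if p.is_dir())
    for case_dir in case_dirs:
        for image_path in sorted(case_dir.glob("image_patch_*")):
            # image_patch_000.mha -> label_patch_000.mha in the same case folder
            label_name = image_path.name.replace("image_patch_", "label_patch_", 1)
            label_path = case_dir / label_name
            if label_path.exists():
                yield case_dir.name, image_path, label_path
```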

Converting from Other Formats

ITKIT provides dataset conversion tools to transform datasets from popular frameworks into ITKIT format:

From MONAI Format

# Convert MONAI Decathlon format to ITKIT format
itk_convert monai /path/to/monai_dataset /path/to/itkit_dataset

From TorchIO Format

# Convert TorchIO format to ITKIT format
itk_convert torchio /path/to/torchio_dataset /path/to/itkit_dataset

Creating a Dataset from Scratch

From DICOM Series

If you have DICOM files, you can use ITKIT's IO tools:

from itkit.io import dcm_toolkit
import SimpleITK as sitk

# Read DICOM series
image = dcm_toolkit.read_dicom_series("/path/to/dicom/folder")

# Write a single-file MetaImage (.mha) into the dataset's image/ folder
sitk.WriteImage(image, "/dataset/image/case001.mha")

From Individual Files

If you have individual image files in various formats:

# Use itk_convert to standardize format
itk_convert format mha /path/to/mixed_format /path/to/itkit_dataset

Validating Dataset Structure

Use itk_check to validate your dataset:

# Check if dataset meets requirements
itk_check check /path/to/dataset

# Check with specific constraints
itk_check check /path/to/dataset \
    --min-size 32 32 32 \
    --max-size 512 512 512 \
    --min-spacing 0.5 0.5 0.5 \
    --max-spacing 2.0 2.0 2.0
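
The constraint flags can be read as per-axis lower and upper bounds on image size and voxel spacing. The sketch below illustrates that reading only. It is not itk_check's implementation; the inclusive bounds, the axis order, and the helper name `within_bounds` are all assumptions.

```python
def within_bounds(values, minimum=None, maximum=None):
    """True if every axis value lies in [minimum, maximum]; a bound of None is not checked."""
    if minimum is not None and any(v < lo for v, lo in zip(values, minimum)):
        return False
    if maximum is not None and any(v > hi for v, hi in zip(values, maximum)):
        return False
    return True


# A 64 x 64 x 16 volume fails a 32-voxel minimum on its third axis.
size_ok = within_bounds((64, 64, 16), minimum=(32, 32, 32), maximum=(512, 512, 512))
# A spacing of 1.0 x 1.0 x 2.0 sits inside [0.5, 2.0] on every axis.
spacing_ok = within_bounds((1.0, 1.0, 2.0), minimum=(0.5, 0.5, 0.5), maximum=(2.0, 2.0, 2.0))
```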

Best Practices

  1. Use consistent naming: Keep file names simple and consistent (e.g., case001, case002, ...)
  2. Match image-label pairs: Ensure every image has a corresponding label with the same name
  3. Use .mha format: For simplicity, use the single-file .mha format
  4. Document metadata: Include a meta.json file with dataset information
  5. Validate before processing: Run itk_check before starting preprocessing workflows
  6. Keep originals: Always work on copies, never modify your original data