Dataset Structure
ITKIT assumes a specific dataset structure for all operations. This standardized format ensures consistency and simplifies preprocessing workflows.
Standard Dataset Format
All ITKIT operations expect the following layout:
dataset_root/
├── image/
│ ├── case001.mha
│ ├── case002.mha
│ ├── case003.mha
│ └── ...
├── label/
│ ├── case001.mha
│ ├── case002.mha
│ ├── case003.mha
│ └── ...
└── meta.json
Key Requirements
Folder Structure
- image/: Contains all image files
- label/: Contains all corresponding label/segmentation files
- meta.json: (Optional) Metadata file with dataset information
File Naming
- Image and label files must have matching names (e.g., case001.mha in both image/ and label/ folders)
- File extensions can be: .mha, .mhd, .nii, .nii.gz, or .nrrd
File Pairing
ITKIT operations expect paired image-label samples. Each image file in the image/ folder should have a corresponding label file with the same name in the label/ folder.
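The pairing rule can be checked before any ITKIT command is run. The following is a minimal sketch, not part of ITKIT: the helper name find_unpaired is hypothetical, and it assumes that "same name" means the full file name including the extension.

```python
from pathlib import Path


def find_unpaired(dataset_root):
    """Return (images_without_label, labels_without_image) as sorted lists of file names.

    Hypothetical helper, not an ITKIT function. Files are paired by their full
    file name, extension included.
    """
    root = Path(dataset_root)
    images = {p.name for p in (root / "image").iterdir() if p.is_file()}
    labels = {p.name for p in (root / "label").iterdir() if p.is_file()}
    return sorted(images - labels), sorted(labels - images)
```

Calling find_unpaired on a dataset root returns two empty lists when every image has a label and every label has an image.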
Supported File Formats
ITKIT supports the following medical image formats:
- MetaImage (.mha): Single file format (recommended)
- MetaImage Header (.mhd): Header + separate .raw data file
- NIfTI (.nii): Uncompressed NIfTI format
- NIfTI Compressed (.nii.gz): Compressed NIfTI format (common)
- NRRD (.nrrd): Nearly Raw Raster Data format
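Because .nii.gz is a two-part extension, generic stem helpers such as pathlib's Path.stem would return case001.nii for case001.nii.gz. The sketch below shows one way to split a file name into case name and extension for the formats listed above. The function name split_case_name and the case-insensitive matching are assumptions made for illustration, not ITKIT behavior.

```python
from pathlib import Path

# Extensions listed in this document.
SUPPORTED_EXTENSIONS = (".nii.gz", ".mha", ".mhd", ".nii", ".nrrd")


def split_case_name(filename: str) -> tuple[str, str]:
    """Return (case_name, extension) for a supported file name.

    Hypothetical helper. Matching is case-insensitive (an assumption);
    unsupported extensions raise ValueError.
    """
    name = Path(filename).name
    lower = name.lower()
    for ext in SUPPORTED_EXTENSIONS:
        if lower.endswith(ext):
            return name[: -len(ext)], ext
    raise ValueError(f"Unsupported file format: {filename}")


print(split_case_name("case001.nii.gz"))  # ('case001', '.nii.gz')
```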
Metadata File (meta.json)
The optional meta.json file can contain dataset-level metadata:
{
"dataset_name": "MyDataset",
"modality": "CT",
"num_samples": 100,
"spacing": [1.0, 1.0, 1.0],
"labels": {
"0": "background",
"1": "liver",
"2": "tumor"
}
}
This file is automatically generated by some ITKIT commands (e.g., itk_resample, itk_convert).
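Since the file is optional, code that reads it should tolerate its absence. Here is a minimal sketch using only the standard library. The field names come from the example above and the files ITKIT commands actually write may use a different schema. Both function names are hypothetical.

```python
import json
from pathlib import Path


def load_meta(dataset_root):
    """Return the parsed meta.json as a dict, or {} when the file is absent."""
    meta_path = Path(dataset_root) / "meta.json"
    if not meta_path.is_file():
        return {}
    with meta_path.open(encoding="utf-8") as f:
        return json.load(f)


def label_names(meta):
    """Map integer class index to name.

    JSON object keys are always strings, so "0", "1", ... are converted to int.
    Assumes the "labels" field shown in the example above.
    """
    return {int(index): name for index, name in meta.get("labels", {}).items()}
```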
Image and Label Requirements
Image Files
- Can be 2D or 3D medical images
- Supported data types: int16, uint16, float32, float64
- Must have spatial metadata (spacing, origin, direction)
Label Files
- Must have the same dimensions as the corresponding image
- Typically integer types (uint8, int16, uint16)
- Values represent class indices (0 = background, 1+ = structures)
- Must have matching spatial metadata (spacing, origin, direction)
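The label requirements above amount to a field-by-field comparison of image and label geometry. The sketch below illustrates that comparison on plain dictionaries so it runs without any imaging library. With SimpleITK, the four fields could be filled from GetSize(), GetSpacing(), GetOrigin(), and GetDirection(). The function name same_geometry and the tolerance of 1e-5 are assumptions, and ITKIT's own checks may use different criteria.

```python
import math


def same_geometry(image_meta, label_meta, tol=1e-5):
    """Return the names of geometry fields that differ (empty list means a match).

    Hypothetical helper. Size must match exactly; spacing, origin, and
    direction are compared element-wise with an absolute tolerance.
    """
    problems = []
    if tuple(image_meta["size"]) != tuple(label_meta["size"]):
        problems.append("size")
    for key in ("spacing", "origin", "direction"):
        a, b = image_meta[key], label_meta[key]
        close = len(a) == len(b) and all(
            math.isclose(x, y, rel_tol=0.0, abs_tol=tol) for x, y in zip(a, b)
        )
        if not close:
            problems.append(key)
    return problems


image = {
    "size": (512, 512, 120),
    "spacing": (0.8, 0.8, 2.5),
    "origin": (0.0, 0.0, 0.0),
    "direction": (1, 0, 0, 0, 1, 0, 0, 0, 1),
}
label = dict(image, spacing=(0.8, 0.8, 2.0))  # same geometry except z spacing
print(same_geometry(image, label))  # ['spacing']
```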
Examples
Valid Dataset Example
my_ct_dataset/
├── image/
│ ├── patient001.mha
│ ├── patient002.mha
│ └── patient003.mha
├── label/
│ ├── patient001.mha
│ ├── patient002.mha
│ └── patient003.mha
└── meta.json
Multi-Format Dataset
ITKIT can work with mixed formats within the same dataset, provided each image and its label keep the same file name:
dataset/
├── image/
│ ├── case001.nii.gz
│ ├── case002.mha
│ └── case003.nrrd
└── label/
  ├── case001.nii.gz
  ├── case002.mha
  └── case003.nrrd
Patched Dataset Structure
After running itk_patch, the output follows a nested structure:
patched_dataset/
├── case001/
│ ├── image_patch_000.mha
│ ├── image_patch_001.mha
│ ├── label_patch_000.mha
│ ├── label_patch_001.mha
│ └── ...
├── case002/
│ └── ...
└── crop_meta.json
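A loader for this nested layout has to walk the case folders and match each image patch to its label patch. The following sketch assumes the image_patch_NNN / label_patch_NNN naming shown in the tree above. The function iter_patch_pairs is hypothetical, and silently skipping image patches that have no label is a choice made for this example.

```python
from pathlib import Path


def iter_patch_pairs(patched_root):
    """Yield (case_name, image_patch_path, label_patch_path) for each paired patch.

    Hypothetical helper. Only sub-directories are treated as cases, so
    top-level files such as crop_meta.json are ignored.
    """
    case_dirs = sorted(p for p in Path(patched_root).iterdir() if p.is_dir())
    for case_dir in case_dirs:
        for image_path in sorted(case_dir.glob("image_patch_*")):
            label_name = image_path.name.replace("image_patch_", "label_patch_", 1)
            label_path = image_path.with_name(label_name)
            if label_path.is_file():
                yield case_dir.name, image_path, label_path
```

Iterating over iter_patch_pairs on the patched dataset root yields the pairs in sorted order, case by case.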
Converting from Other Formats
ITKIT provides dataset conversion tools to transform datasets from popular frameworks into ITKIT format:
From MONAI Format
# Convert MONAI Decathlon format to ITKIT format
itk_convert monai /path/to/monai_dataset /path/to/itkit_dataset
From TorchIO Format
# Convert TorchIO format to ITKIT format
itk_convert torchio /path/to/torchio_dataset /path/to/itkit_dataset
Creating a Dataset from Scratch
From DICOM Series
If you have DICOM files, you can use ITKIT's IO tools:
from itkit.io import dcm_toolkit
import SimpleITK as sitk
# Read DICOM series
image = dcm_toolkit.read_dicom_series("/path/to/dicom/folder")
# Save as ITKIT format
sitk.WriteImage(image, "/dataset/image/case001.mha")
From Individual Files
If you have individual image files in various formats:
# Use itk_convert to standardize format
itk_convert format mha /path/to/mixed_format /path/to/itkit_dataset
Validating Dataset Structure
Use itk_check to validate your dataset:
# Check if dataset meets requirements
itk_check check /path/to/dataset
# Check with specific constraints
itk_check check /path/to/dataset \
--min-size 32 32 32 \
--max-size 512 512 512 \
--min-spacing 0.5 0.5 0.5 \
--max-spacing 2.0 2.0 2.0
Best Practices
- Use consistent naming: Keep file names simple and consistent (e.g., case001, case002, ...)
- Match image-label pairs: Ensure every image has a corresponding label with the same name
- Use .mha format: The single-file .mha format avoids the separate header and .raw data file that .mhd requires
- Document metadata: Include a meta.json file with dataset information
- Validate before processing: Run itk_check before starting preprocessing workflows
- Keep originals: Always work on copies, never modify your original data