itk_evaluate
Evaluate segmentation predictions against ground truth by computing per-class overlap metrics and volume statistics.
Usage
itk_evaluate <gt_folder> <pred_folder> <save_folder> [options]
Parameters
- gt_folder: Folder containing ground truth masks (.mha, .nii, .nii.gz, .nrrd files)
- pred_folder: Folder containing prediction masks (same formats as gt_folder)
- save_folder: Folder to save evaluation results (created if it does not exist)
- --format {csv,excel}: Output format (default: excel)
  - csv: Saves 5 separate CSV files
  - excel: Saves 1 Excel file with 5 sheets
- --mp: Enable multiprocessing for faster evaluation
- --workers N: Number of worker processes (default: half of CPU cores)
Features
- Automatic Resampling: Predictions are automatically resampled to match ground truth spacing/size if they differ
- Consistent Orientation: All samples are oriented to LPI for consistent evaluation
- Multi-class Support: Handles multi-class segmentation with per-class metrics
- Volume Statistics: Calculates volumes in cubic millimeters for both GT and predictions
- Multiple Aggregation Views: Provides metrics in 5 different aggregation formats
Metrics Calculated
For each class, the following metrics are computed:
- Dice Coefficient: Measures overlap between GT and prediction (F1-score)
- IoU (Jaccard): Intersection over Union
- F-score: Same as Dice for segmentation tasks
- Recall (Sensitivity): True positive rate
- Precision: Positive predictive value
- Accuracy: Overall voxel-wise accuracy (a global metric, reported once per sample rather than per class)
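To make the definitions above concrete, here is a minimal NumPy sketch of how these metrics can be derived from true-positive, false-positive, and false-negative counts per class. It is an illustration under stated assumptions, not the tool's actual implementation: the function name `per_class_metrics` is hypothetical, and reporting NaN for undefined ratios (for example, a class absent from both masks) is an assumption about edge-case handling.

```python
import numpy as np


def _ratio(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is reported as NaN.
    return numerator / denominator if denominator > 0 else float("nan")


def per_class_metrics(gt, pred, num_classes):
    """Return ({class_index: {metric: value}}, global_accuracy) for two label arrays."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    results = {}
    for c in range(num_classes):
        gt_c = gt == c
        pred_c = pred == c
        tp = np.count_nonzero(gt_c & pred_c)    # labeled c in both
        fp = np.count_nonzero(~gt_c & pred_c)   # predicted c, but not c in GT
        fn = np.count_nonzero(gt_c & ~pred_c)   # c in GT, but not predicted
        dice = _ratio(2 * tp, 2 * tp + fp + fn)
        results[c] = {
            "dice": dice,
            "iou": _ratio(tp, tp + fp + fn),
            "fscore": dice,  # F1 coincides with Dice for binary masks
            "recall": _ratio(tp, tp + fn),
            "precision": _ratio(tp, tp + fp),
        }
    # Accuracy is global: the fraction of elements whose labels agree.
    accuracy = float(np.mean(gt == pred))
    return results, accuracy


# Tiny hypothetical example with three classes (0, 1, 2).
gt = np.array([[0, 0, 1, 1], [0, 2, 2, 1]])
pred = np.array([[0, 0, 1, 1], [0, 2, 1, 1]])
metrics, accuracy = per_class_metrics(gt, pred, num_classes=3)
print(metrics[1], accuracy)
```

In this example one class-2 voxel is mislabeled as class 1, which lowers precision for class 1 and recall for class 2 while leaving class 0 untouched.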
Volume Statistics
For each class in each sample:
- Volume_GT: Volume in cubic millimeters calculated from ground truth
- Volume_Pred: Volume in cubic millimeters calculated from prediction
Volumes are computed as: voxel_count × (spacing_x × spacing_y × spacing_z)
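The formula above can be sketched in a few lines of NumPy. The function name `class_volumes_mm3` and the example spacing are hypothetical, and the sketch assumes the mask is a non-negative integer label array with spacing given in millimeters.

```python
import numpy as np


def class_volumes_mm3(mask, spacing, num_classes):
    """Return {class_index: volume in mm^3} for an integer label array."""
    # Volume of a single voxel: spacing_x * spacing_y * spacing_z.
    voxel_volume = float(np.prod(spacing))
    # Count voxels per label; minlength guarantees an entry for absent classes.
    counts = np.bincount(np.asarray(mask).ravel(), minlength=num_classes)
    return {c: int(counts[c]) * voxel_volume for c in range(num_classes)}


# Hypothetical 2x2x2 mask with three voxels of class 1 and spacing in mm.
mask = np.zeros((2, 2, 2), dtype=np.int64)
mask[0, 0, 0] = mask[0, 0, 1] = mask[1, 1, 1] = 1
volumes = class_volumes_mm3(mask, spacing=(0.5, 0.5, 2.0), num_classes=2)
print(volumes)
```

Here each voxel occupies 0.5 mm³, so three voxels of class 1 give 1.5 mm³ and the remaining five voxels of class 0 give 2.5 mm³.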
Output Tables
The tool generates 5 different views of the evaluation results:
1. Per-Class Sample-Averaged (PerClass_SampleAvg)
Mean metrics for each class across all samples.
| metric | class_0 | class_1 | class_2 | ... |
|---|---|---|---|---|
| dice | 0.95 | 0.92 | 0.88 | ... |
| iou | 0.91 | 0.85 | 0.79 | ... |
| recall | 0.94 | 0.91 | 0.87 | ... |
2. Per-Sample Per-Class (PerSample_PerClass)
Detailed metrics for each sample and each class (most granular view).
| sample | class_0_dice | class_0_iou | class_1_dice | ... | accuracy |
|---|---|---|---|---|---|
| case001 | 0.96 | 0.92 | 0.94 | ... | 0.95 |
| case002 | 0.94 | 0.89 | 0.90 | ... | 0.93 |
3. Per-Sample Class-Averaged (PerSample_ClassAvg)
Mean metric across all classes for each sample.
| sample | dice | iou | fscore | recall | precision | accuracy |
|---|---|---|---|---|---|---|
| case001 | 0.95 | 0.91 | 0.95 | 0.94 | 0.96 | 0.95 |
| case002 | 0.92 | 0.85 | 0.92 | 0.91 | 0.93 | 0.93 |
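The two averaged views, PerClass_SampleAvg and PerSample_ClassAvg, can be thought of as means of the same samples × classes array taken along different axes. The sketch below uses hypothetical Dice scores and assumes a plain unweighted mean; how the tool treats undefined values or the background class when averaging is not stated here.

```python
import numpy as np

# Hypothetical Dice scores: rows are samples, columns are classes 0..2.
samples = ["case001", "case002"]
dice = np.array([
    [0.96, 0.94, 0.90],
    [0.94, 0.90, 0.86],
])

# PerClass_SampleAvg: average over samples, one value per class.
per_class_sample_avg = dice.mean(axis=0)

# PerSample_ClassAvg: average over classes, one value per sample.
per_sample_class_avg = dice.mean(axis=1)

print(per_class_sample_avg)
print(dict(zip(samples, per_sample_class_avg)))
```

The per-sample per-class table keeps the full array, so either averaged view can be recomputed from it.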
4. Volume_GT
Volume in cubic millimeters for each class in each sample (from ground truth).
| sample | class_0 | class_1 | class_2 | ... |
|---|---|---|---|---|
| case001 | 1250.5 | 3420.8 | 890.2 | ... |
| case002 | 1180.3 | 3350.1 | 910.5 | ... |
5. Volume_Pred
Volume in cubic millimeters for each class in each sample (from prediction).
| sample | class_0 | class_1 | class_2 | ... |
|---|---|---|---|---|
| case001 | 1245.2 | 3380.5 | 885.1 | ... |
| case002 | 1175.8 | 3320.3 | 915.2 | ... |