BIAFLOWS NEUBIAS-WG5: user guide documentation

Introduction

To perform benchmarking, ground truth annotations should be encoded in a format that is specific to the associated problem class. BIA workflows are also expected to output results in the same format.

Currently, 9 problem classes are supported in BIAFLOWS. Their annotation formats and computed benchmark metrics are described below.

Note: each problem class has a long, explicit name and a short name (e.g. Object Segmentation / ObjSeg). The same holds for metrics (e.g. DICE / DC).

A description of each benchmark is available on the workflow runs result table by clicking on the symbol.

Problem Class

Problem Class	Tasks	Shortname	Annotation	Example	Metrics	Tools
Object Segmentation	Delineate objects or isolated regions	ObjSeg	Label masks	sample	Mean Average Precision, computed by Data Science Bowl 2018 Python code; DICE (DC), computed by VISCERAL executable (archived here); AVERAGE_HAUSDORFF_DISTANCE (AHD), computed by VISCERAL executable (archived here); Fraction overlap (FOVL), computed by custom Python code	Paintera (3D); ITK-snap (3D); Cytomine (2D); QuPath (2D); 3D Cell annotator; Ilastik (3D)
Pixel/Voxel classification	Estimate the class of each pixel	PixCla	Label masks	sample	F1_SCORE (F1); RECALL (RE); ACCURACY (ACC); PRECISION (PR). All computed by custom Python code.	Paintera (3D); ITK-snap (3D); Cytomine (2D); QuPath (2D); 3D Cell annotator; Ilastik (3D)
Spot/Object Counting	Estimate the number of objects	SptCnt	Binary masks	sample	RELATIVE_ERROR_COUNT (REC), computed by custom Python code.	Paintera (3D); ITK-snap (3D); Cytomine (2D); QuPath (2D); Ilastik (3D); ImageJ with Multi-point Tool
Spot/Object Detection	Detect objects in an image (e.g. nucleus)	ObjDet	Binary masks	sample	F1_SCORE (F1); CONFUSION_MATRIX (TP, FN, FP); PRECISION (PR); RECALL (RE); Distance RMSE (RMSE). All computed by Particle Tracking Challenge metric Java code (particle matching only, archived here in `bin/DetectionPerformance.jar`).	Paintera (3D); ITK-snap (3D); Cytomine (2D); QuPath (2D); Ilastik (3D); ImageJ with Multi-point Tool
Filament Tree Tracing	Estimate the medial axis of a connected filament tree network (one per image)	TreTrc	SWC	sample; SWC format	NetMets metrics: Geometric False Negative rate (FNR) and Geometric False Positive rate (FPR), computed by NetMets Python code; UNMATCHED_VOXEL_RATE (UVR), computed by custom Python code. Parameters: GATING_DIST (UVR): maximum distance between skeleton voxels in the reference and prediction skeletons for them to be considered matched (default: 5 pix); Sigma (NetMets): tolerance in centerline position (default: 5 pix).	ImageJ / SNT; Vaa3D; Neutube (2D); Neurolucida
Filament Networks Tracing	Estimate the medial axis of one or several connected filament network(s)	LooTrc	Skeleton binary masks	sample	NetMets metrics: Geometric False Negative rate (FNR) and Geometric False Positive rate (FPR), computed by NetMets Python code; UNMATCHED_VOXEL_RATE (UVR), computed by custom Python code. Parameters: GATING_DIST (UVR): maximum distance between skeleton voxels in the reference and prediction skeletons for them to be considered matched (default: 5 pix); Sigma (NetMets): tolerance in centerline position (default: 5 pix); Skeleton sampling distance (NetMets): skeletons are sampled to be converted to OBJ models (default: 3 voxels, default Z Ratio: 1).
Landmark Detection	Estimate the position of specific feature points	LndDet	Label masks	sample	Mean distance from predicted landmarks to the closest reference landmarks of the same class (MRE); number of reference / predicted landmarks (NREF, NPRED). All metrics computed by custom Python code.
Particle Tracking	Estimate the tracks followed by particles (no division)	PrtTrk	Label masks	sample	Full normalized pairing score beta (FNPSB); Normalized pairing score alpha (NPSA); Number of reference tracks (NRT); Number of candidate tracks (NCT); Jaccard Similarity Tracks (JST); Number of paired tracks (NPT); Number of missed tracks (NMT); Number of spurious tracks (NST); Number of reference detections (NRD); Number of candidate detections (NCD); Jaccard similarity detections (JSD); Number of paired detections (NPD); Number of missed detections (NMD); Number of spurious detections (NSD). All metrics computed by Particle Tracking Challenge Java code (archived here). Parameter: GATING_DIST: maximum distance between particle detections in reference / prediction tracks for them to be considered matching (default: 5).	ImageJ Manual Tracking; ImageJ TrackMate
Object Tracking	Estimate object tracks and segmentation masks (with possible divisions)	ObjTrk	Label masks + Division text file	sample	Segmentation measure (SEG), implementation archived here; Tracking measure (TRA), implementation archived here. All computed by Cell Tracking Challenge metric command-line executables.	ImageJ Manual Tracking; ImageJ TrackMate
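
To give a feel for what the simpler mask-based metrics in the table measure, here is a minimal Python sketch of DICE (DC), the pixel classification scores (PR, RE, F1, ACC) and RELATIVE_ERROR_COUNT (REC). It is not the code used by BIAFLOWS (which relies on the VISCERAL executable and its own Python code). The function names, the restriction to binary 2D masks stored as nested lists, the conventions for empty masks and the exact REC formula are all assumptions made for illustration.

```python
def _flatten(mask):
    """Flatten a 2D mask (nested lists) into booleans; non-zero means foreground."""
    return [bool(value) for row in mask for value in row]


def dice_coefficient(reference, prediction):
    """Dice overlap, 2*|A and B| / (|A| + |B|), between two binary masks."""
    ref, pred = _flatten(reference), _flatten(prediction)
    if len(ref) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for r, p in zip(ref, pred) if r and p)
    total = sum(ref) + sum(pred)
    # Assumed convention: two empty masks are in perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def pixel_scores(reference, prediction):
    """Precision, recall, F1 and accuracy for a binary pixel classification."""
    ref, pred = _flatten(reference), _flatten(prediction)
    if len(ref) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for r, p in zip(ref, pred) if r and p)
    fp = sum(1 for r, p in zip(ref, pred) if not r and p)
    fn = sum(1 for r, p in zip(ref, pred) if r and not p)
    tn = len(ref) - tp - fp - fn
    # Assumed convention: undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(ref) if ref else 0.0
    return {"PR": precision, "RE": recall, "F1": f1, "ACC": accuracy}


def relative_error_count(n_reference, n_predicted):
    """Assumed definition: |predicted count - reference count| / reference count."""
    if n_reference == 0:
        raise ValueError("reference count must be non-zero")
    return abs(n_predicted - n_reference) / n_reference


if __name__ == "__main__":
    reference = [[1, 1, 0], [0, 1, 0]]
    prediction = [[1, 0, 0], [0, 1, 1]]
    print(dice_coefficient(reference, prediction))
    print(pixel_scores(reference, prediction))
    print(relative_error_count(10, 8))
```

Label masks with several objects or classes would need a per-label variant of these functions; the sketch only covers the binary case.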

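The detection metrics listed in the table (CONFUSION_MATRIX, PRECISION, RECALL, F1_SCORE, Distance RMSE) all depend on first pairing predicted positions with reference positions. The sketch below illustrates that idea in Python with a simple greedy closest-pair matching and a gating distance. This is a simplification for illustration only: BIAFLOWS uses the Particle Tracking Challenge Java code, whose matching procedure may differ. The function name `match_detections`, the `gating_dist` parameter (borrowed from the GATING_DIST parameter listed for tracking and tracing) and the handling of empty inputs are assumptions.

```python
import math


def match_detections(reference, predicted, gating_dist=5.0):
    """Greedily pair reference and predicted points that lie within gating_dist.

    Points are coordinate tuples (2D or 3D). Each point is used at most once;
    the closest candidate pairs are matched first (assumed strategy).
    """
    candidates = sorted(
        (math.dist(r, p), i, j)
        for i, r in enumerate(reference)
        for j, p in enumerate(predicted)
        if math.dist(r, p) <= gating_dist
    )
    used_ref, used_pred, distances = set(), set(), []
    for dist, i, j in candidates:
        if i not in used_ref and j not in used_pred:
            used_ref.add(i)
            used_pred.add(j)
            distances.append(dist)

    tp = len(distances)          # matched pairs
    fn = len(reference) - tp     # reference points left unmatched
    fp = len(predicted) - tp     # predicted points left unmatched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # RMSE of the distances between matched pairs; undefined without matches.
    rmse = math.sqrt(sum(d * d for d in distances) / tp) if tp else float("nan")
    return {"TP": tp, "FN": fn, "FP": fp, "PR": precision, "RE": recall,
            "F1": f1, "RMSE": rmse}


if __name__ == "__main__":
    reference = [(0, 0), (10, 10), (20, 20)]
    predicted = [(0, 3), (10, 14), (50, 50)]
    print(match_detections(reference, predicted, gating_dist=5.0))
```

A greedy pairing is easy to read but is not guaranteed to maximize the number of matches in crowded images; an optimal assignment method would be the natural replacement in that case.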