Segmentation Results

When compared against other similarity and continuity variants that use different color spaces and distance metrics, the proposed approach yields more robust segmentations, as shown in the figure below:

Figure 1 - Comparison among other Mumford-Shah variants. From left to right: the original image, the Mumford-Shah model based on the HSV cylindrical measure, the CIELab 2000 metric, the CIELab CMC metric, the Mahalanobis distance, and the proposed supervised approach.

The images used in our experiments were selected from the Berkeley Image Dataset, and the training set was defined over the object of interest (except in (a), where the background was used). For each input image a fixed number of desired regions was used: 2, 4, 10, 4, 5, and 8 regions for rows (a) to (f), respectively. The q-orders used to produce these results were 2, 2, 4, 1, 4, and 2 for images (a) to (f).

In general, our results show a better delimitation of the region of interest against the background, whereas the other methods frequently lose part of a region to the background. The execution parameters of our approach are reduced to the number of desired regions and the q-order of the polynomial map, using RGB as the color space. In the variants based on the CIELab 2000 and CIELab CMC metrics, the parameter combination may vary depending on lightness and chromaticity.
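The text above leaves the q-order polynomial map unspecified; purely for illustration, and assuming the common monomial-basis reading, such a map would expand each RGB triple into all monomials up to total degree q (the function polynomial_map and its signature are ours, not the paper's):

    from itertools import combinations_with_replacement
    import numpy as np

    def polynomial_map(rgb, q):
        """Monomials of the R, G, B components up to total degree q.

        Illustrative assumption only; for q = 2 this yields
        [1, r, g, b, r^2, rg, rb, g^2, gb, b^2].
        """
        feats = [1.0]
        for degree in range(1, q + 1):
            for idx in combinations_with_replacement(range(3), degree):
                feats.append(float(np.prod([rgb[i] for i in idx])))
        return np.asarray(feats)

Under this reading, the q-order simply controls how nonlinear the color feature space is in which region similarity is measured.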

Methodology adopted in the experiments
We performed an empirical evaluation using images from the Berkeley Image Dataset, which provides 5 to 8 reference ground-truth images for each image. Based on these ground truths, the following evaluation methods were used in our experiment: Rand index, Fowlkes-Mallows, Jaccard, and van Dongen.

These evaluation methods form a category of image segmentation evaluation techniques based on clustering comparison. They produce dissimilarity indexes ranging from 0, for complete agreement between the segmented image and the ground truth, to 1 for complete dissimilarity.
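As an illustration, a minimal NumPy sketch of the Rand-index dissimilarity (the function name rand_dissimilarity is ours; the two inputs are assumed to be segmentations labeled with non-negative integers):

    import numpy as np

    def rand_dissimilarity(seg, gt):
        """1 - Rand index between two integer-labeled segmentations.

        0 means perfect pairwise agreement, 1 means total disagreement.
        """
        x = np.asarray(seg).ravel().astype(np.intp)
        y = np.asarray(gt).ravel().astype(np.intp)
        n = x.size

        # Contingency table: cont[i, j] = pixels labeled i in seg and j in gt.
        cont = np.zeros((x.max() + 1, y.max() + 1), dtype=np.int64)
        np.add.at(cont, (x, y), 1)

        def comb2(m):                                 # "m choose 2", elementwise
            return m * (m - 1) // 2

        together_both = comb2(cont).sum()             # pairs joined in both
        together_seg = comb2(cont.sum(axis=1)).sum()  # pairs joined in seg
        together_gt = comb2(cont.sum(axis=0)).sum()   # pairs joined in gt
        total_pairs = comb2(n)

        rand_index = (total_pairs + 2 * together_both
                      - together_seg - together_gt) / total_pairs
        return 1.0 - rand_index

The other three indexes follow the same pair-counting scheme over the contingency table, differing only in how the pair counts are combined.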

The procedure adopted to evaluate our method was the following: we selected 60 images from the Berkeley Image Dataset, prioritizing those whose regions are well defined in the ground-truth images and whose ambiguity among different observers is low (e.g., in some textured images the ambiguity is higher due to the large number of regions and gradient variations). For each input image we produced 22 segmentation results, storing the results for region counts of 250, 225, 200, 180, 150, 120, 100, 80, 60, 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, and 2, thereby reducing the input image to only 2 final regions. With this procedure we obtained 1,386 segmentation results, each of which was compared against its respective ground truths. For the 60 input images used in our experiment, the mean number of ground truths per image was 5.48, resulting in 7,601 comparisons over the whole set.
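The sweep itself can be summarized in a few lines. This is an illustrative driver, not the actual experiment code; segment() is a hypothetical stand-in for the proposed method, taking an image and a target number of regions and returning a label map:

    # The region counts swept in the experiment, from coarse to the 2-region limit.
    REGION_COUNTS = [250, 225, 200, 180, 150, 120, 100, 80, 60, 50,
                     40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2]

    def evaluate_image(image, ground_truths):
        """Compare every stored segmentation of one image against all its
        ground truths (5 to 8 per image in the Berkeley Image Dataset)."""
        scores = {}
        for k in REGION_COUNTS:
            seg = segment(image, n_regions=k)   # hypothetical segmentation call
            scores[k] = [rand_dissimilarity(seg, gt) for gt in ground_truths]
        return scores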

Each comparison resulted in a general dissimilarity index, computed as the mean of the Rand, Fowlkes-Mallows, Jaccard, and van Dongen indexes. Since each image has a set of diverging ground truths produced by different observers, the evaluation score of each individual segmentation result was calculated as the mean score over all ground truths of that image. In total, our experiment produced 30,404 evaluation indexes; all of these indexes and segmentation results are available HERE.
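A minimal sketch of that aggregation, reusing rand_dissimilarity from above; the other three functions are hypothetical placeholders for Fowlkes-Mallows, Jaccard, and van Dongen dissimilarity implementations:

    import numpy as np

    INDEX_FUNCS = (rand_dissimilarity, fowlkes_mallows_dissimilarity,
                   jaccard_dissimilarity, van_dongen_dissimilarity)

    def general_index(seg, ground_truths):
        """Mean of the four indexes per ground truth, then mean and standard
        deviation over the observers of that image."""
        per_gt = [np.mean([f(seg, gt) for f in INDEX_FUNCS])
                  for gt in ground_truths]
        return float(np.mean(per_gt)), float(np.std(per_gt))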

For each image, this experiment produced a dissimilarity graph, as illustrated in the figure below (left): (a) shows the original image with the training set defined by the white paths, and (b) the segmentation result containing 4 regions. Panels (c) to (g) show the respective ground-truth images, varying from 4 to 10 regions. The segmentation result with the 4 most significant regions bears the closest resemblance to its ground-truth images, a conclusion that can be read directly from the evolution graphs on the right (top graph): the lowest mean evaluation index is the criterion for this choice, indicating the number of regions needed to achieve it. These graphs also raise another question: if the ground truths are highly correlated among observers, and the segmentation method is able to produce a similar segmentation result, the general mean dissimilarity index of the best segmentation will be near 0. The best segmentation result shown above contains 4 regions, with a mean evaluation index of 0.032 and a standard deviation of 0.0062. Reducing the number of regions to 3 loses the small patch of background between the animals, and the mean evaluation index is slightly penalized to 0.034; with only two regions the differences penalize the mean index to 0.118.

Figure 2 - Image 207056: ground truths

Figure 3 - Image 207056: evaluation
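Selecting the best segmentation by the lowest mean index, as done above for image 207056, then amounts to an argmin over the sweep; a sketch reusing the hypothetical helpers from the previous snippets:

    def best_region_count(image, ground_truths):
        """Pick the region count whose mean general index is lowest."""
        scores = {k: general_index(segment(image, n_regions=k), ground_truths)
                  for k in REGION_COUNTS}
        best_k = min(scores, key=lambda k: scores[k][0])
        return best_k, scores[best_k]   # e.g. (4, (0.032, 0.0062)) for image 207056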

On the other hand, in the next figure, image 3096, a considerable level of ambiguity is present for one observer, who delimited the background into many regions (d). In this case, the segmentation results can achieve a good correlation index at different points along the graph: with approximately 20 or 30 regions the segmentation result matches this ambiguous ground truth, and by reducing the number of regions to 2, the index then matches the other observers' segmentations. The general evaluation index is therefore minimized at a lower number of regions, according to the consensus among the observers. This is shown by the evolution graph in (f), where the general evaluation index was penalized to 0.112, with a standard deviation of 0.2086, for the 2 most significant regions (the mean evaluation indexes were 0.0109, 0.0470, 0.4840, 0.0087, and 0.0092 for observer IDs 1105, 1107, 1121, 1123, and 1132, respectively). An individual analysis of this specific result shows the ambiguous ground truths and their observers, as illustrated at http://www.lapix.ufsc.br/sms/segresults.html.

Figure 4 - Image 3096: ground truths

Figure 5 - Image 3096: evaluation

Other results with more variability among observers are shown in the figures below, where only a few segmentation results can match these ground-truth images.

Figure 6 - Image 69015: ground truths

Figure 7 - Image 69015: evaluation

Figure 8 - Image 304034: ground truths

Figure 9 - Image 304034: evaluation

  • All these experiments were categorized into: