Joint Analysis on Visual Restoration and Object Detection

Within-Domain Performance

In this test, detectors are trained and evaluated on the same data domain. The following analysis will unveil two points: (1) domain quality has a negligible effect on detection performance; (2) restoration is a thankless approach for improving within-domain detection performance because it lowers recall efficiency. Note that low recall efficiency means low precision at the same recall rate [22].

Numerical Analysis

At first, we train and evaluate SSD with different input sizes (i.e., 320 and 512) and backbones (i.e., VGG16 [29], MobileNet [30], and ResNet101 [31]). As shown in Table 6.2, on domain-O, domain-F, and domain-G, SSD320-VGG16 achieves mAP of 69.3%, 67.8%, and 65.9%, and SSD512-VGG16 obtains mAP of 72.9%, 71.3%, and 69.5%. That is, accuracy decreases as restoration intensity rises. The backbone-variable assessments show the same trend. Note that ResNet101 performs inferiorly to VGG16 and MobileNet because its large receptive field is unfavorable to the immense number of small objects in URPC2018. Referring to Table 6.2, RetinaNet512, RefineDet512, and DRNet512 all achieve their highest mAP on domain-O and their lowest mAP on domain-G. Thus, in terms of mAP, detection accuracy is negatively correlated with domain quality. However, mAP cannot reflect accuracy details, so the following analysis continues to investigate within-domain performance.
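Since this analysis hinges on mAP, a minimal sketch of how per-class average precision is computed from ranked detections may help; function and variable names here are illustrative, not from the chapter's code:

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """VOC-style all-point average precision from ranked detections.

    scores: confidence of each detection; is_true_positive: whether each
    detection matched a ground-truth box; num_gt: number of ground-truth
    objects of this class. (Illustrative helper, not the chapter's code.)
    """
    order = np.argsort(-np.asarray(scores))          # sort by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Interpolate: at each recall, take the max precision to the right.
    interp = np.maximum.accumulate(precision[::-1])[::-1]
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, interp):                 # area under the PR curve
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

mAP is then the mean of this quantity over the four classes (trepang, echinus, shell, starfish).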

Visualization of Convolutional Representation

Humans perceive domain quality based on object saliency. As a result, compared to a low-quality domain, humans can more easily detect objects in a high-quality domain, since high-quality samples contain salient object representations. We are thus inspired to investigate object saliency in CNN-based detectors. Figure 6.4 shows multi-scale features in SSD and DRNet. These features serve as the input of the detection heads, so they are the final convolutional features for detection. Referring to Figure 6.4, despite domain diversity, there is relatively little difference in object saliency across the multi-scale feature maps. That is, unlike the human perception mechanism, convolution is able to capture salient object representations from a low-quality data domain. Hence, in terms of object saliency, domain quality has a negligible effect on convolutional representation.

Precision-Recall Analysis

As shown in Figure 6.5, precision-recall curves are employed for further analysis of detection performance. It can be seen that precision-recall

TABLE 6.2 Within-domain detection results on URPC2018: mAP and per-class AP (%)

Method               Train data   Test data   mAP    trepang   echinus   shell   starfish
SSD320-VGG16         train        test        69.3   67.8      84.9      44.7    79.7
                     train-F      test-F      67.8   68.9      82.3      42.2    78.0
                     train-G      test-G      65.9   65.4      82.3      39.0    76.9
SSD512-VGG16         train        test        72.9   70.2      87.1      50.8    83.5
                     train-F      test-F      71.3   68.9      85.8      48.5    82.1
                     train-G      test-G      69.5   67.2      84.7      45.3    80.9
SSD512-MobileNet     train        test        70.7   65.3      87.1      47.5    82.8
                     train-F      test-F      68.9   63.7      85.1      45.4    81.7
                     train-G      test-G      67.4   61.5      84.9      42.6    80.5
SSD512-ResNet101     train        test        67.0   59.8      86.3      41.7    80.3
                     train-F      test-F      65.6   61.1      84.7      37.5    79.1
                     train-G      test-G      64.6   60.1      83.7      38.6    76.2
RetinaNet512-VGG16   train        test        74.0   69.8      88.1      54.7    83.4
                     train-F      test-F      72.5   69.1      87.1      50.7    82.9
                     train-G      test-G      71.0   67.3      86.9      48.9    81.1
RefineDet512-VGG16   train        test        76.0   73.8      90.2      54.1    85.8
                     train-F      test-F      72.9   72.0      88.6      46.4    84.6
                     train-G      test-G      72.0   71.4      88.4      46.3    81.8
DRNet512-VGG16       train        test        77.1   75.6      91.1      55.1    86.7
                     train-F      test-F      75.4   73.6      89.8      52.7    85.6
                     train-G      test-G      73.8   72.0      89.8      49.9    83.5
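As a quick sanity check on the mAP trend, the sketch below (plain Python, with the mAP values copied from Table 6.2) computes each detector's drop from domain-O to domain-G:

```python
# mAP (%) of the seven detectors on domain-O and domain-G, from Table 6.2,
# in table order: SSD320-VGG16 ... DRNet512-VGG16.
map_domain_o = [69.3, 72.9, 70.7, 67.0, 74.0, 76.0, 77.1]
map_domain_g = [65.9, 69.5, 67.4, 64.6, 71.0, 72.0, 73.8]

drops = [o - g for o, g in zip(map_domain_o, map_domain_g)]
mean_drop = sum(drops) / len(drops)
print("per-detector mAP drop:", [round(d, 1) for d in drops])
print(f"mean drop O -> G: {mean_drop:.2f} points")
```

Every detector loses accuracy on domain-G, consistent with the negative correlation between detection accuracy and domain quality noted above.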


FIGURE 6.4 Visualization of convolutional representation for objects. Each row contains the input image and multi-scale features; high-level features are shown on the right. All features are processed with the L2 norm across channels and then normalized for visualization. For a fair comparison, the same normalization factor is used for scale-identical features.
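The caption's channel-wise L2-norm processing can be sketched as follows (a minimal NumPy version; the function name and the per-map fallback factor are my assumptions):

```python
import numpy as np

def saliency_map(feature, norm_factor=None):
    """Collapse a CxHxW feature map to a single-channel saliency map by
    taking the L2 norm across channels, then scale it for display.
    Passing a shared `norm_factor` lets scale-identical features be
    compared fairly, as the caption of Figure 6.4 describes.
    (Illustrative helper, not the chapter's code.)
    """
    sal = np.sqrt((feature ** 2).sum(axis=0))     # HxW: L2 norm over channels
    if norm_factor is None:
        norm_factor = sal.max() + 1e-12           # fallback: per-map maximum
    return np.clip(sal / norm_factor, 0.0, 1.0)   # values in [0, 1] for display
```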

curves have two typical appearances. On the one hand, the high-precision part contains high-confidence detection results, and here the domain-related curves largely overlap. Referring to “echinus” detected by DRNet512-VGG16, the curves of domain-O, domain-F, and domain-G cannot be separated when the recall rate is less than 0.6. That is, when detecting high-confidence objects, the domain difference is negligible for detection accuracy. On the other hand, the curves separate in the low-precision part.


FIGURE 6.5 Precision-recall curves. At high precision (e.g., >0.9), the domain difference has a negligible effect on detection performance. Overall, domain-F and domain-G reduce recall efficiency, so lower average precision is induced.

In detail, the curve of domain-F usually lies below that of domain-O, and the curve of domain-G usually lies below that of domain-F. That is, when detecting hard objects (i.e., low-confidence detection results), false positives increase with the rise of domain quality. For example, at a recall rate of 0.8 for “starfish” detected by SSD512-VGG16, the precision of domain-F is lower than that of domain-O, and the precision of domain-G is lower than that of domain-F. Therefore, recall efficiency is gradually reduced with increasing restoration intensity.
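Recall efficiency can be probed directly as the precision achieved when a ranked detection list first reaches a target recall. The toy example below (illustrative names, not the chapter's code) shows how extra false positives on the way to the same recall lower precision:

```python
import numpy as np

def precision_at_recall(is_tp_sorted, num_gt, target_recall):
    """Precision at the point where score-ranked detections first reach
    `target_recall`. Assumes the detector does reach that recall.
    (Illustrative helper for the recall-efficiency notion in [22].)
    """
    tp = np.cumsum(is_tp_sorted)
    fp = np.cumsum(1 - np.asarray(is_tp_sorted))
    recall = tp / num_gt
    idx = np.searchsorted(recall, target_recall)   # first index with recall >= target
    return tp[idx] / (tp[idx] + fp[idx])

# Both detectors reach recall 0.8 on 5 objects, but the second spends
# two false positives on the way: lower precision, lower recall efficiency.
efficient   = precision_at_recall([1, 1, 1, 1], 5, 0.8)         # 4/4 = 1.0
inefficient = precision_at_recall([1, 0, 1, 1, 0, 1], 5, 0.8)   # 4/6 ≈ 0.67
```

This is exactly the pattern of Figure 6.5: at equal recall, the restored domains sit at lower precision.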

Based on the aforementioned analysis, it can be concluded that visual restoration impairs recall efficiency and is unfavorable for improving within-domain detection. In addition, because the domain-related mAP values are relatively close and high-confidence recall is far more important than low-confidence recall in robotic perception, we conclude that domain quality has a negligible effect on within-domain object detection.
