HEp-2 Cell Image Representation in the Adaptive CoDT Feature Space

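In the previous section, we modeled the CoDT feature space of HEp-2 cell images as a GMM and learned its adaptive parameters $\lambda = \{w_t, \mu_t, \Sigma_t,\ t = 1, 2, \ldots, T\}$. The set of CoDT features $X = \{x_n,\ n = 1, 2, \ldots, N\}$ extracted from one image can be described by the following gradient vector, a.k.a. the score function:

$$G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda)$$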

The gradient describes how the parameters $\lambda$ should be adjusted to best fit the input $X$. To measure the similarity between two HEp-2 cell images $X$ and $Y$, a Fisher Kernel (FK) [15] is calculated as

$$K(X, Y) = (G_\lambda^X)^{T} F_\lambda^{-1} G_\lambda^Y$$

where $F_\lambda$ is the Fisher Information Matrix (FIM), formulated as

$$F_\lambda = E_X\left[ G_\lambda^X (G_\lambda^X)^{T} \right]$$

The superscript $T$ denotes the transpose of $G_\lambda^X$. Fisher information measures the amount of information that $X$ carries with respect to the parameters $\lambda$.

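As $F_\lambda$ is symmetric and positive semi-definite, $F_\lambda^{-1}$ can be decomposed as $F_\lambda^{-1} = L_\lambda^{T} L_\lambda$, and the FK can be rewritten as

$$K(X, Y) = (\mathcal{G}_\lambda^X)^{T} \mathcal{G}_\lambda^Y$$

where $\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X = L_\lambda \nabla_\lambda \log p(X \mid \lambda)$ is the normalized gradient.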


The normalized gradients with respect to the weights $w_t$, the means $\mu_t$ and the covariances $\Sigma_t$ correspond to 0-order, 1st-order and 2nd-order statistics, respectively.

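Let $z_n(t)$ denote the occupancy probability of the CoDT feature $x_n$ for the $t$-th Gaussian:

$$z_n(t) = \frac{w_t\, \mathcal{N}(x_n; \mu_t, \Sigma_t)}{\sum_{k=1}^{T} w_k\, \mathcal{N}(x_n; \mu_k, \Sigma_k)}$$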

It can also be regarded as the soft assignment of $x_n$ to the $t$-th Gaussian.
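The soft assignment can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes diagonal covariances stored as per-dimension standard deviations, and the function name `occupancy` and the array layout are hypothetical choices made for the example.

```python
import numpy as np

def occupancy(X, w, mu, sigma):
    """Soft assignment of each feature to each Gaussian (assumed diagonal GMM).

    X: (N, D) features; w: (T,) weights; mu, sigma: (T, D) means and
    standard deviations. Returns an (N, T) array whose rows sum to 1.
    """
    D = X.shape[1]
    # Standardized residuals for every (feature, Gaussian) pair: (N, T, D)
    u = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    # log of the diagonal Gaussian density for every pair: (N, T)
    log_gauss = (-0.5 * np.sum(u ** 2, axis=2)
                 - np.sum(np.log(sigma), axis=1)[None, :]
                 - 0.5 * D * np.log(2.0 * np.pi))
    log_joint = np.log(w)[None, :] + log_gauss
    # Subtract the row maximum before exponentiating to avoid underflow
    log_joint -= log_joint.max(axis=1, keepdims=True)
    z = np.exp(log_joint)
    return z / z.sum(axis=1, keepdims=True)

# Toy usage: two well-separated Gaussians, one feature near each mean
X = np.array([[0.0, 0.0], [10.0, 10.0]])
w = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [10.0, 10.0]])
sigma = np.ones((2, 2))
print(occupancy(X, w, mu, sigma))
```

Working in the log domain is a design choice for numerical stability; the result is the same ratio as in the definition above.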

To avoid explicitly enforcing the constraints in (7.6), we use a parameter $\varepsilon_t$ to re-parameterize the weight parameter $w_t$ following the soft-max formalism, which is defined as:

$$w_t = \frac{\exp(\varepsilon_t)}{\sum_{k=1}^{T} \exp(\varepsilon_k)}$$

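The gradients of a single CoDT feature $x_n$ w.r.t. the parameters $\varepsilon_t$, $\mu_t$ and $\sigma_t$ of the GMM can be formulated as

$$\nabla_{\varepsilon_t} \log p(x_n \mid \lambda) = z_n(t) - w_t$$

$$\nabla_{\mu_t^d} \log p(x_n \mid \lambda) = z_n(t)\, \frac{x_n^d - \mu_t^d}{(\sigma_t^d)^2}$$

$$\nabla_{\sigma_t^d} \log p(x_n \mid \lambda) = z_n(t) \left[ \frac{(x_n^d - \mu_t^d)^2}{(\sigma_t^d)^3} - \frac{1}{\sigma_t^d} \right]$$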

where the superscript d denotes the d-th dimension of the input vector.
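The per-feature gradients can be checked numerically against central finite differences of $\log p(x_n \mid \lambda)$. The sketch below assumes a diagonal-covariance GMM with the soft-max weight parameterization; the helper names (`softmax`, `log_gauss`, `log_p`, `analytic_grads`, `numeric_grad`) and the random toy parameters are illustrative assumptions, not part of the original method.

```python
import numpy as np

def softmax(eps):
    e = np.exp(eps - eps.max())
    return e / e.sum()

def log_gauss(x, mu, sigma):
    # log N(x; mu_t, diag(sigma_t^2)) for each Gaussian t; shape (T,)
    return (-0.5 * np.sum(((x - mu) / sigma) ** 2, axis=1)
            - np.sum(np.log(sigma), axis=1)
            - 0.5 * x.size * np.log(2.0 * np.pi))

def log_p(x, eps, mu, sigma):
    # Log-likelihood of one feature under the GMM
    return np.log(np.sum(softmax(eps) * np.exp(log_gauss(x, mu, sigma))))

def analytic_grads(x, eps, mu, sigma):
    w = softmax(eps)
    joint = w * np.exp(log_gauss(x, mu, sigma))
    z = joint / joint.sum()                      # occupancy probabilities
    g_eps = z - w
    g_mu = z[:, None] * (x - mu) / sigma ** 2
    g_sigma = z[:, None] * ((x - mu) ** 2 / sigma ** 3 - 1.0 / sigma)
    return g_eps, g_mu, g_sigma

def numeric_grad(f, theta, h=1e-5):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = h
        grad[idx] = (f(theta + step) - f(theta - step)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
T, D = 3, 2                                      # toy sizes
x = rng.normal(size=D)
eps = rng.normal(size=T)
mu = rng.normal(size=(T, D))
sigma = rng.uniform(0.5, 1.5, size=(T, D))

g_eps, g_mu, g_sigma = analytic_grads(x, eps, mu, sigma)
n_eps = numeric_grad(lambda e: log_p(x, e, mu, sigma), eps)
n_mu = numeric_grad(lambda m: log_p(x, eps, m, sigma), mu)
n_sigma = numeric_grad(lambda s: log_p(x, eps, mu, s), sigma)
print(np.abs(g_eps - n_eps).max(),
      np.abs(g_mu - n_mu).max(),
      np.abs(g_sigma - n_sigma).max())
```

The printed values are the largest absolute differences between the closed-form and numerical gradients; they should be close to zero if the formulas are implemented consistently.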

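Then, the normalized gradients are computed by multiplying by the inverse square root of the diagonal FIM. Let $f_{\varepsilon_t}$, $f_{\mu_t^d}$ and $f_{\sigma_t^d}$ be the entries on the diagonal of $F_\lambda$ corresponding to $\nabla_{\varepsilon_t} \log p(x_n \mid \lambda)$, $\nabla_{\mu_t^d} \log p(x_n \mid \lambda)$ and $\nabla_{\sigma_t^d} \log p(x_n \mid \lambda)$ respectively, approximated as $f_{\varepsilon_t} = w_t$, $f_{\mu_t^d} = w_t/(\sigma_t^d)^2$ and $f_{\sigma_t^d} = 2w_t/(\sigma_t^d)^2$. The corresponding normalized gradients are therefore:

$$\mathcal{G}_{\varepsilon_t}(X) = \frac{1}{\sqrt{w_t}} \sum_{n=1}^{N} \left( z_n(t) - w_t \right)$$

$$\mathcal{G}_{\mu_t^d}(X) = \frac{1}{\sqrt{w_t}} \sum_{n=1}^{N} z_n(t)\, \frac{x_n^d - \mu_t^d}{\sigma_t^d}$$

$$\mathcal{G}_{\sigma_t^d}(X) = \frac{1}{\sqrt{2 w_t}} \sum_{n=1}^{N} z_n(t) \left[ \frac{(x_n^d - \mu_t^d)^2}{(\sigma_t^d)^2} - 1 \right]$$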

The Fisher representation is the concatenation of all the gradients over the $d = 1, 2, \ldots, D$ dimensions of the CoDT feature and over the $T$ Gaussians. In our case, we only consider the gradients with respect to the mean and covariance, i.e., $\mathcal{G}_{\mu_t^d}(X)$ and $\mathcal{G}_{\sigma_t^d}(X)$, since the gradient with respect to the weights has been shown to bring little additional information [13]. Therefore, the dimension of the resulting representation is $2DT$. The CoDT features are thus embedded in a higher-dimensional feature space, which is more suitable for linear classification.
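As a rough sketch of how the $2DT$-dimensional representation could be assembled, the following NumPy function sums the normalized mean and standard-deviation gradients over all features of one image and concatenates them. It assumes a diagonal-covariance GMM; the name `fisher_vector`, the argument layout and the toy data are hypothetical, and the later normalization steps are deliberately left out.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Unnormalized Fisher representation of one image (assumed diagonal GMM).

    X: (N, D) features; w: (T,) weights; mu, sigma: (T, D) means and
    standard deviations. Returns a vector of length 2 * D * T.
    """
    # Standardized residuals (x_n^d - mu_t^d) / sigma_t^d: (N, T, D)
    u = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    # Occupancy probabilities, computed in the log domain; the constant
    # (2*pi)^(-D/2) cancels in the ratio and is omitted
    log_joint = (np.log(w)[None, :] - 0.5 * np.sum(u ** 2, axis=2)
                 - np.sum(np.log(sigma), axis=1)[None, :])
    log_joint -= log_joint.max(axis=1, keepdims=True)
    z = np.exp(log_joint)
    z /= z.sum(axis=1, keepdims=True)            # (N, T)
    # Normalized gradients w.r.t. means and standard deviations: (T, D) each
    g_mu = np.einsum('nt,ntd->td', z, u) / np.sqrt(w)[:, None]
    g_sigma = np.einsum('nt,ntd->td', z, u ** 2 - 1.0) / np.sqrt(2.0 * w)[:, None]
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Toy usage with random features and a made-up 4-component GMM
rng = np.random.default_rng(0)
N, D, T = 200, 5, 4
X = rng.normal(size=(N, D))
w = np.full(T, 1.0 / T)
mu = rng.normal(size=(T, D))
sigma = rng.uniform(0.5, 1.5, size=(T, D))
print(fisher_vector(X, w, mu, sigma).shape)
```

A quick sanity check on this sketch: with a single Gaussian whose mean and standard deviation equal the sample statistics of `X`, both gradient blocks vanish, since the model already fits the data in the first two moments.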

To avoid dependence on the sample size, we normalize the final image representation by the number of CoDT features extracted from the HEp-2 cell image, $N$, i.e., $\mathcal{G}_\lambda(X) \leftarrow \frac{1}{N} \mathcal{G}_\lambda(X)$. After that, two additional normalization steps [23] are applied to improve the results: power normalization and $\ell_2$-normalization.

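Power normalization is performed in each dimension as:

$$[\mathcal{G}_\lambda(X)]_i \leftarrow \operatorname{sign}\left([\mathcal{G}_\lambda(X)]_i\right) \left| [\mathcal{G}_\lambda(X)]_i \right|^{\tau}$$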

In this study, we choose the power coefficient $\tau = 1/2$. The motivation for power normalization is to "unsparsify" the Fisher representation, which becomes sparser as the number of Gaussian components of the GMM increases.

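$\ell_2$-normalization is defined as:

$$\mathcal{G}_\lambda(X) \leftarrow \frac{\mathcal{G}_\lambda(X)}{\left\| \mathcal{G}_\lambda(X) \right\|_2}$$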
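The three normalization steps (division by $N$, power normalization, $\ell_2$-normalization) can be sketched together as below. This is an illustrative helper, not the authors' code: the name `normalize_fisher` is hypothetical, a power coefficient of 0.5 is assumed as the default, and an all-zero vector is returned unchanged to avoid a division by zero.

```python
import numpy as np

def normalize_fisher(G, N, tau=0.5):
    """Sample-size, power and L2 normalization of a Fisher representation.

    G: 1-D array holding the raw representation; N: number of features
    extracted from the image; tau: power coefficient (assumed 0.5 here).
    """
    G = np.asarray(G, dtype=float) / N           # remove dependence on N
    G = np.sign(G) * np.abs(G) ** tau            # signed power normalization
    norm = np.linalg.norm(G)                     # Euclidean (L2) norm
    return G / norm if norm > 0 else G

# Toy usage on a small hand-made vector
print(normalize_fisher(np.array([4.0, -9.0, 0.0]), N=1))
```

Applying the power step before the $\ell_2$ step matters: the final vector always has unit norm, so a linear kernel between two images reduces to a cosine similarity between their power-normalized representations.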

Our proposed AdaCoDT method has several advantages over the BoW framework [13, 23]. Firstly, it is a generalization of the BoW framework: the resulting representation is not limited to the occurrences of each visual word but additionally includes information about the distribution of the CoDT features, which overcomes the information loss caused by the quantization procedure of the BoW framework. Secondly, it defines a kernel from a generative model of the data. Thirdly, it can be generated from a much smaller codebook and therefore reduces the computational cost compared with the BoW framework. Lastly, for the same vocabulary size, the representation is much larger than the BoW representation ($2DT$ dimensions rather than $T$), which allows it to perform well with a simple linear classifier.
