Feature Selection

Identifying the right features and extracting them plays a major role in object tracking. Feature selection picks the features that are most important and that make the object unique. It is done manually. Feature selection and object representation are strongly correlated; contour representation, for example, uses object features such as edges [13]. Some commonly used features are described below.


Edges are defined as the boundary or outline between the object and the background. The human eye locates an object boundary easily because changes in intensity are strong near boundaries. Edge detection identifies these intensity changes, and the tracking algorithm then uses the detected edges to track the object. Compared with color features, edge features are less sensitive to changes in illumination [10].

FIGURE 3.5 Location of the pixel changed but brightness remains same [20].

Optical Flow

Motion features are sometimes defined using the brightness patterns in a visual scene. We experience this phenomenon in daily life when looking out of the window of a moving car: the objects outside (streets, buildings, trees, etc.) appear to move backward. This apparent motion is determined from the movement of pixels between frames. The method relies on the brightness constancy constraint, which assumes that the brightness of a pixel stays the same in different frames even though its location changes. Figure 3.5 illustrates this.
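As an illustration of how apparent motion can be recovered under the brightness constancy constraint, the following minimal sketch estimates a single displacement for a whole patch by least squares, in the spirit of the Lucas-Kanade method. It is not the method of any cited reference; the function names `lucas_kanade` and `make_frame`, the synthetic brightness pattern, and the shift of (0.4, -0.3) pixels are assumptions made for this example.

```python
import math

def lucas_kanade(prev, curr):
    """Least-squares estimate of one (u, v) displacement for a whole patch."""
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # change between frames
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # gradients too uniform to determine the motion
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

def make_frame(dx, dy, size=20):
    """Hypothetical smooth brightness pattern shifted by (dx, dy) pixels."""
    return [[math.sin(0.3 * (x - dx)) + math.cos(0.2 * (y - dy))
             for x in range(size)] for y in range(size)]

prev_frame = make_frame(0.0, 0.0)
curr_frame = make_frame(0.4, -0.3)   # same pattern, location changed
u, v = lucas_kanade(prev_frame, curr_frame)
print(round(u, 2), round(v, 2))
```

The estimate is only approximate because the constraint is linearized; it degrades for large displacements or when the brightness actually changes between frames.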


Color is used to capture object feature information. It is generally described in the RGB color space; in computer vision it is sometimes represented in the YCbCr or HSV color space instead. The major problem with color is its dependency on changes in illumination. Object color is affected not only by the illumination but also by the reflectance properties of the object [13].
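The dependency on illumination can be seen in a small sketch that converts a color from RGB to HSV with Python's standard `colorsys` module. The pixel value and the uniform halving of brightness are assumptions chosen for illustration; real illumination changes are rarely this simple.

```python
import colorsys

# Hypothetical object color under full light (RGB components in [0, 1]).
bright = (0.8, 0.4, 0.2)
# The same surface under half the illumination (a simplifying assumption).
dim = tuple(0.5 * c for c in bright)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*dim)

# All three RGB components change, whereas in HSV only the value changes.
print("RGB:", bright, "->", dim)
print("hue:", h1, "->", h2)
print("value:", v1, "->", v2)
```

Under this simplified model the hue and saturation are unchanged, which is one reason HSV is sometimes preferred over RGB for color-based tracking.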


Texture measures the intensity variation of a surface in terms of properties such as regularity and smoothness. A pre-processing step is required to generate the object descriptors [13]. Texture descriptors, for example, require corresponding filtering stages that respond to level, edge, spot, etc. [14].

Object Tracking Techniques

Object tracking methods [21] are listed below:

i. Point

ii. Kernel

iii. Silhouette

FIGURE 3.6 Point-based tracking [10].


The point method is accurate, and the algorithm is reliable and robust [10]. The moving object is represented by its feature points (Figure 3.6) and is identified using the extracted features. Points detected in the current frame are associated with the points from the previous frame [18]. The method is able to deal with very small objects. A disadvantage is the presence of false detections. Some approaches to point tracking are listed below:

i. Kalman Filter

This filter is based on an optimal recursive data processing algorithm [10]. It consists of two steps: predict (estimate) and update. The next state is predicted from the current state, and the prediction is then updated with the new measurement. It offers the following:

  • Optimal solutions
  • Noise handling
  • Tracking of single and multiple objects
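The predict and update steps can be sketched for a one-dimensional constant-velocity target as follows. This is a minimal illustration, not the implementation of any cited work: the motion model, the noise values `q` and `r`, the initial covariance, and the synthetic measurements are all assumptions.

```python
def kalman_track(measurements, dt=1.0, q=1e-4, r=1.0):
    """Track position and velocity from noisy 1D position measurements."""
    p, v = 0.0, 0.0                      # state: position and velocity
    c00, c01, c10, c11 = 100.0, 0.0, 0.0, 100.0   # state covariance (2x2)
    history = []
    for z in measurements:
        # Predict: x = F x, C = F C F^T + Q, with F = [[1, dt], [0, 1]]
        p = p + dt * v
        n00 = c00 + dt * (c01 + c10) + dt * dt * c11 + q
        n01 = c01 + dt * c11
        n10 = c10 + dt * c11
        n11 = c11 + q
        # Update: only the position is measured, so H = [1, 0]
        innovation = z - p
        s = n00 + r                      # innovation variance
        k0, k1 = n00 / s, n10 / s        # Kalman gain
        p += k0 * innovation
        v += k1 * innovation
        c00, c01 = (1 - k0) * n00, (1 - k0) * n01
        c10, c11 = n10 - k1 * n00, n11 - k1 * n01
        history.append((p, v))
    return history

# Hypothetical target moving 2 units per frame, with alternating +/-0.5 noise.
measurements = [2.0 * k + (0.5 if k % 2 else -0.5) for k in range(1, 31)]
track = kalman_track(measurements)
final_position, final_velocity = track[-1]
print(round(final_position, 2), round(final_velocity, 2))
```

Tracking multiple objects would additionally require associating each measurement with the correct filter, which this sketch does not attempt.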

ii. Particle Filter

The particle filter is able to track under linear or non-linear conditions [19], and it is reported to produce better results than other methods [19]. The procedure is described below:

  1. Generate samples to represent the initial probability.
  2. Predict the next state using the prior equation.
  3. Compute the weights of the states predicted in Step 2 using the observation. The predicted states together with their weights represent the state distribution.
  4. Resample the distribution to obtain a uniformly weighted representation of the current state, omitting the least-significant samples.
  5. Repeat Steps 2 through 4 until all the observations are exhausted.
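The steps above can be sketched as a simple bootstrap particle filter for a one-dimensional position. The motion model (one unit per frame), the Gaussian noise levels, the number of particles, and the function name `particle_filter` are assumptions made for this example only.

```python
import math
import random

def particle_filter(observations, n=500, motion=1.0,
                    motion_std=0.5, obs_std=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1: samples representing the initial probability (assumed N(0, 1)).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for z in observations:
        # Step 2: predict the next state with the (assumed) motion model.
        particles = [p + motion + rng.gauss(0.0, motion_std) for p in particles]
        # Step 3: weight each predicted state by how well it explains z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) + 1e-300
                   for p in particles]
        total = sum(weights)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Step 4: resample in proportion to the weights, so that samples
        # with negligible weight tend to be dropped.
        particles = rng.choices(particles, weights=weights, k=n)
    # Step 5: the loop ends when the observations are exhausted.
    return estimates

observations = [float(k) for k in range(1, 21)]   # hypothetical measurements
estimates = particle_filter(observations)
print(round(estimates[-1], 2))
```

Because the weighting function can be any likelihood, the same structure applies when the motion or observation model is non-linear.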

iii. Multiple Hypothesis Tracking

Multiple hypothesis tracking (MHT) combines motion information from several frames. Better results are obtained when correspondence is established by observing several frames rather than only two successive frames. MHT maintains several correspondence hypotheses for each object, and the tracking algorithm iterates over them as time proceeds [22].

Kernel-Based Tracking Approach

This approach is based mainly on the (parametric) motion of the object and on its origin, computed across subsequent frames [19]. An advantage of this method is that the motion estimate is easy to predict. The trajectory of an object is required to determine whether the object is stationary or moving. Identifying the region that covers the object is very important [13].

Kernel refers to the appearance or shape of the object. Various primitive shapes, such as rectangles and ellipses, are available to represent the object [23]. Kernel tracking has four types, which differ in the representation and the number of objects tracked; two of them are described below.

i. Simple Template Matching

This is a fundamental procedure for examining a particular region of a video frame [19, 24].

A reference image (template) is compared with the target frame in a video sequence. A single object can be tracked within a video, and partial overlap is handled. The method can find even small components in an image by matching them in every successive frame. Its characteristics are:

  • Single object tracking.
  • Handling of partial occlusion.
  • Need for external initialization.
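A minimal sketch of template matching is an exhaustive search that scores every placement of the template by the sum of squared differences (SSD). SSD is only one possible similarity measure, and the tiny image and template below are hypothetical.

```python
def match_template(image, template):
    """Return (ssd, x, y) for the best placement of the template."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Sum of squared differences between template and image patch.
            ssd = sum((image[y + j][x + i] - template[j][i]) ** 2
                      for j in range(th) for i in range(tw))
            if best is None or ssd < best[0]:
                best = (ssd, x, y)
    return best

# Hypothetical 6x6 frame containing the 2x2 target at column 3, row 2.
template = [[9, 7],
            [5, 3]]
frame = [[0] * 6 for _ in range(6)]
for j in range(2):
    for i in range(2):
        frame[2 + j][3 + i] = template[j][i]

ssd, x, y = match_template(frame, template)
print(ssd, x, y)
```

The exhaustive search is costly for large frames, so practical trackers usually restrict the search to a neighborhood of the previous position.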

ii. Mean Shift Method

The main objective of this method [13] is to identify the region of a video frame that is most similar to a previously prepared model. A histogram representation is used for tracking. Mean shift works as a gradient ascent procedure that moves the tracking window step by step toward the location of highest similarity. The target object can be tracked using rectangular or elliptical regions. The target model is expressed as a probability density function (PDF). An asymmetric kernel has been used to regularize the target model.
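The iterative window update can be sketched on a precomputed weight map, in which each pixel holds its similarity to the target model (for example, a histogram back-projection). The synthetic weight map, the square window, and the function name `mean_shift` are assumptions for illustration; the sketch omits the kernel weighting used in full mean shift trackers.

```python
import math

def mean_shift(weights, cx, cy, half=3, max_iter=20):
    """Move a square window to the local centroid of the weights until it stops."""
    h, w = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sx = sy = 0.0
        for y in range(max(0, cy - half), min(h, cy + half + 1)):
            for x in range(max(0, cx - half), min(w, cx + half + 1)):
                total += weights[y][x]
                sx += x * weights[y][x]
                sy += y * weights[y][x]
        if total == 0.0:
            break                        # no support inside the window
        nx, ny = round(sx / total), round(sy / total)
        if (nx, ny) == (cx, cy):
            break                        # converged
        cx, cy = nx, ny
    return cx, cy

# Hypothetical similarity map with a single blob centered at (8, 7).
weight_map = [[math.exp(-((x - 8) ** 2 + (y - 7) ** 2) / 8.0)
               for x in range(12)] for y in range(12)]
center = mean_shift(weight_map, 5, 5)
print(center)
```

Each iteration moves the window toward higher weights, which is why the procedure behaves like gradient ascent on the similarity surface.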

Silhouette Approach

Objects with complex shapes cannot be characterized correctly by simple geometric shapes. The silhouette approach gives an accurate description of the shape of such objects. A tracking model is developed by finding the region of the object based on past frames. The approach is divided into two classes: (i) shape matching and (ii) contour tracking. It has the following advantages:

i. It supports many object shapes

ii. It handles object occlusion

iii. It deals with splitting and merging

i. Contour Tracking

Contour tracking is also called boundary tracking. It evolves the contour from the previous frame to its new position in the current frame. Contour tracking is similar to state space models [22]. Edge-based features are used because they are insensitive to lighting conditions, which produces a strong contour. Since the object boundary is small, tracking is fast. There are two techniques. The first uses a state space model to describe the shape and motion of the contour, and the other evolves the contour directly. A gradient-based optimization procedure is used to maximize a similarity score between the model and the current image region.

ii. Shape Matching

Shape matching searches for the object model in the current frame. It works in much the same way as template matching. Shape matching also finds matching silhouettes, and in this respect it is similar to point matching. It is capable of:

  1. Tracking single objects using edge-based features.
  2. Handling occlusion using the Hough transform.

Introduction to Deep and Machine Learning Techniques

Both machine learning (ML) and deep learning (DL) are subfields of artificial intelligence. The computer or machine performs specific tasks based on patterns and inference, without explicit instructions. For this, an algorithm and a model have to be developed carefully. The model becomes effective only when the training data set is large [25, 26]. The model must be trained iteratively on the labeled training data set so that it produces the desired output. Once training is over, the model can be tested with unlabeled data. The term ML was coined by Arthur Samuel [27]. Alan Turing posed the question “Can machines think?”

Supervised learning builds a mathematical model from a set of labeled data, whereas reinforcement learning does not rely on labeled training data and instead learns from feedback on its own actions. Supervised learning covers classification and regression. An example of classification is separating “spam” e-mails from legitimate ones. Regression produces continuous outputs, e.g., frequency, voltage, or product price. Unsupervised learning works with training data that do not have labels. If the application is the development of a robot, the learner generates its own learning experiences based on what it learned previously. Dimensionality reduction is one of the techniques used to reduce the number of features.

Deep Learning vs Machine Learning Techniques

Modern ML tools use neural networks arranged as a sequence of layers to learn from the training data set. Computational intelligence (CI) has been developed as a powerful approach for making a machine learn; it covers neural networks, fuzzy systems, and evolutionary algorithms. Recent advances in DL play an important role in dealing with huge amounts of unlabeled data. Owing to the remarkable successes of DL techniques, quality of service (QoS) can now be improved significantly. Deep neural networks (DNNs), such as deep belief networks and convolutional neural networks (CNNs), can extract more complex features and learn their representations efficiently. However, deep learning faces many implementation challenges, such as the need for large data sets (to ensure the desired results), high network complexity, and high computational cost, which must be addressed before deep learning can be applied effectively to real-world image processing problems.


Vehicle Detection

Vehicle detection uses computer vision techniques to detect and track vehicles. It plays an important role in autonomous driving applications such as forward collision detection, adaptive cruise control, and automatic lane keeping. The following example shows vehicle detection using a DL method. Deep learning is a dominant tool that extracts image features automatically. The vehicle data set is loaded and the convolutional neural network is designed. First, the data are split into training (60%) and testing (30%) sets. This example uses 295 vehicle images; more data are required to obtain accurate classification. Each image has a label. An example image from the training data set [27] is displayed in Figure 3.7.

A CNN forms the foundation of the R-CNN detector [28]. It can be created using the Neural Network Toolbox. The network has three types of layers: input, hidden (middle), and output. For detection, however, the input size must be small. Here the object size is greater than [13], so the input size is selected as. Next, the middle layers of the CNN have to be decided. They consist of a number of convolutional, rectified linear unit (ReLU), and pooling layers, which are the fundamental building blocks of a CNN. The final output layers consist of a fully connected layer and a softmax layer.
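To make the role of each layer type concrete, the following sketch runs one forward pass through a single convolution, ReLU, pooling, fully connected, and softmax stage in plain Python. It is not the toolbox network from the example: the 6x6 input, the edge-detecting kernel, and the fully connected weights are hypothetical values chosen by hand, and no training takes place.

```python
import math

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(len(image[0]) - kw + 1)]
            for y in range(len(image) - kh + 1)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    return [[max(fmap[y + j][x + i] for j in range(size) for i in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]

def fully_connected(features, weights, biases):
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 6x6 input with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1]] * 3                 # responds to vertical edges

pooled = max_pool(relu(conv2d(image, kernel)))
features = [v for row in pooled for v in row]
scores = fully_connected(features, [[0.1] * 4, [-0.1] * 4], [0.0, 0.0])
probs = softmax(scores)                   # two hypothetical classes
print(pooled, [round(p, 3) for p in probs])
```

In a trained network the kernel and the fully connected weights are learned from the labeled images rather than set by hand.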

Figure 3.8 shows the detector tested on a single image, with promising results. To measure detector performance properly, the detector should be evaluated on the entire test data set. The Computer Vision Toolbox in MATLAB provides this feature [27]. Precision is shown in Figure 3.9.
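One common way to score a detector over a data set is to match each detection to a ground-truth box by intersection over union (IoU) and then compute precision and recall. The sketch below illustrates that idea only; it is not the toolbox evaluation function, and the `[x, y, width, height]` box format, the 0.5 threshold, and the sample boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, threshold=0.5):
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        # Greedily match the detection to the best remaining ground-truth box.
        best = max(unmatched, key=lambda g: iou(det, g), default=None)
        if best is not None and iou(det, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical boxes: one good detection and one false alarm.
ground_truth = [(10, 10, 20, 20), (50, 50, 20, 20)]
detections = [(12, 12, 20, 20), (100, 100, 20, 20)]
precision, recall = precision_recall(detections, ground_truth)
print(precision, recall)
```

Sweeping the detector's confidence threshold and repeating this computation yields a precision-recall curve.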

Training of a Cascade Detector

The vision cascade object detector comes with several pre-trained classifiers for detecting frontal faces and the upper part of the body. The kinds of objects that can be detected with this category of detector are those with a constant aspect ratio, such as faces, stop signs, and cars. The cascade detector slides a window over the entire image and decides, at each window position, whether the object is present. Because the aspect ratio of most 3D objects changes with the viewing direction, the detector is sensitive to out-of-plane rotation. A single detector cannot detect an object in all orientations. The cascade is organized so that negative samples are rejected as early as possible.

FIGURE 3.8 Testing with single image.
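The sliding window and the early rejection of negatives can be sketched as follows. The stages here are hand-written toy tests on a synthetic image, standing in for the trained classifier stages of a real cascade; the function names and the window size are assumptions for illustration.

```python
def sliding_windows(width, height, win_w, win_h, step):
    """Yield the top-left corner of every window position."""
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y

def crop(image, x, y, win_w, win_h):
    return [row[x:x + win_w] for row in image[y:y + win_h]]

def detect(image, win_w, win_h, step, stages):
    hits = []
    for x, y in sliding_windows(len(image[0]), len(image), win_w, win_h, step):
        patch = crop(image, x, y, win_w, win_h)
        # all() stops at the first failing stage, so most negative windows
        # are rejected by the cheap early stages.
        if all(stage(patch) for stage in stages):
            hits.append((x, y, win_w, win_h))
    return hits

# Hypothetical 8x8 image with a bright 4x4 "object" at column 2, row 2.
image = [[0] * 8 for _ in range(8)]
for row in range(2, 6):
    for col in range(2, 6):
        image[row][col] = 10

stages = [
    lambda p: sum(map(sum, p)) > 0,              # cheap test: anything bright?
    lambda p: min(min(r) for r in p) >= 10,      # stricter test: fully bright?
]
detections = detect(image, 4, 4, 2, stages)
print(detections)
```

Because the window has a fixed aspect ratio, objects whose apparent proportions change with orientation need a separately trained detector for each orientation.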

FIGURE 3.10 Visualization of the HOG features of a bicycle.

Feature Types Available for Training

In ML and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Histogram of oriented gradients (HOG) features are used to detect objects. HOG is a feature descriptor that is often used to extract features from image data. The outline of a bicycle can be seen in its HOG features (Figure 3.10).
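The core of the HOG descriptor is a histogram of gradient orientations computed over a small cell, with each pixel voting in proportion to its gradient magnitude. The sketch below shows that step only, under simplifying assumptions: nine unsigned orientation bins, no interpolation between bins, and no block normalization. The function name and the synthetic cell are hypothetical.

```python
import math

def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = (cell[y][x + 1] - cell[y][x - 1]) / 2.0
            gy = (cell[y + 1][x] - cell[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(angle / bin_width) % bins] += magnitude
    return hist

# Hypothetical 6x6 cell whose brightness increases from left to right,
# so every gradient points along the x axis (orientation 0 degrees).
cell = [[10 * x for x in range(6)] for _ in range(6)]
hist = orientation_histogram(cell)
print(hist)
```

A full HOG descriptor concatenates such histograms over a grid of cells, which is why the outline of an object remains visible when the descriptor is visualized.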

Supply Positive Samples

The Training Image Labeler app is used to create positive samples easily (Figures 3.11 and 3.12).

FIGURE 3.12 Stop sign as positive example.

Positive samples can be supplied using the following methods:

  1. Identify rectangular regions.
  2. Crop the object of interest.

Supply Negative Images

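Negative samples are not specified explicitly for the trainCascadeObjectDetector function. Instead, the function generates them automatically from user-supplied negative images that do not contain the object of interest.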

Objects within a region of interest can be detected by correctly specifying the type of detector, the input image, and the region of interest [28-32] (Figures 3.13 and 3.14).

FIGURE 3.14 Upper portion of body detection in an image.

  1. Construct the cascade object detector.
  2. Call the cascade object detector on the input image.


This chapter provides a thorough survey of visual-tracking methods, together with a detailed study of the related methods. Different methods adopted for visual tracking are introduced, and the features of the algorithms are highlighted for researchers in the field of visual tracking. Future work will focus on object detection with a non-static background and with multiple cameras, which can be used in real-time surveillance applications.


  1. Dana H. Ballard and Christopher M. Brown. Computer Vision. Prentice Hall, 1982. ISBN 978-0-13-165316-0.
  2. T. Huang. Computer vision: Evolution and promise. In Carlo E. Vandoni (ed.), 19th CERN School of Computing. Geneva, CERN, pp. 21-25, 1996. doi:10.5170/CERN-1996-008.21. ISBN 978-9290830955.
  3. Milan Sonka, Vaclav Hlavac, and Roger Boyle. Image Processing, Analysis, and Machine Vision. Thomson, 2008. ISBN 978-0-495-08252-1.
  4. Reinhard Klette. Concise Computer Vision. Springer, 2014. ISBN 978-1-4471-6320-6.
  5. Linda G. Shapiro and George C. Stockman. Computer Vision. Prentice Hall, 2001. ISBN 978-0-13-030796-5.
  6. Tim Morris. Computer Vision and Image Processing. Palgrave Macmillan, 2004. ISBN 978-0-333-99451-1.
  7. Bernd Jahne and Horst Haubecker. Computer Vision and Applications: A Guide for Students and Practitioners. Academic Press, 2000. ISBN 978-0-13-085198-7.
  8. Jae-Yeong Lee and Wonpil Yu. Visual tracking by partition-based histogram backprojection and maximum support criteria. 2011 IEEE International Conference on Robotics and Biomimetics (ROBIO), 7-11 December, pp. 2860-2865, 2011.
  9. Grandham Sindhuja and Renuka Devi. A survey on detection and tracking of objects in video sequence. International Journal of Engineering Research and General Science, 3(2), 2015. ISSN 2091-2730.
  10. Alper Yilmaz, Omar Javed, and Mubarak Shah. Object tracking: A survey. ACM Computing Surveys (CSUR), 38(4):13, 2006.
  11. Anshul Vishwakarma and Amit Khare. Vehicle detection and tracking for traffic surveillance applications: A review paper. IJCSE, 6(7), 2008. ISSN 2347-2693.
  12. Sanna Agren. Object tracking methods and their areas of application: A meta-analysis. A thorough review and summary of commonly used object tracking methods, 2017.
  13. Sandeep Kumar Patel and Agya Mishra. Moving object tracking techniques: A critical review. Indian Journal of Computer Science and Engineering, 4(2):95-102, 2013.
  14. Pennsylvania State University. Probability density functions, https://onlinecourses.science.psu.edu/stat414/node/97, 2016. [Online; accessed 06 December 2016].
  15. Michael J. Black and Allan D. Jepson. Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, 26(1):63-84, 1998.
  16. Rupali S. Rakibe and Bharati D. Patil. Background subtraction algorithm based human motion detection. International Journal of Scientific and Research Publications, 3(5), May 2013. ISSN 2250-3153.
  17. K. Srinivasan, K. Porkumaran, and G. Sainarayanan. Improved background subtraction techniques for security in video applications. International Conference on Anticounterfeiting, Security, and Identification in Communication, 2009. ISSN 2163-5048.
  18. Sen-Ching S. Cheung and Chandrika Kamath. Robust techniques for background subtraction in urban traffic video.
  19. Joshan Athanesious J and Suresh P. Implementation and comparison of kernel and silhouette based object tracking. International Journal of Advanced Research in Computer Engineering & Technology, 1298-1303, March 2013.
  20. Min Sun and Krstic Srdjan. Optical flow, http://www.cs.princeton.edu/courses/archive/fall08/cos429/optiflow.pdf, 2016. [Online; accessed 06 December 2016].
  21. Nirav D. Modi. Moving object detection and tracking in video. International Journal of Electrical, Electronics and Data Communication, 2(3), March 2014. ISSN 2320-2084.
  22. J. Joshan Athanesious and P. Suresh. Systematic survey on object tracking methods in video. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 242-247, October 2012.
  23. R. Hemangi Patil and K. S. Bhagat. Detection and tracking of moving object: A survey. International Journal of Engineering Research and Applications, 5(11):138-142, 2015.
  24. S. Saravanakumar, A. Vadivel, and C.G. Saneem Ahmed. Multiple human object tracking using background subtraction and shadow removal techniques. 2010 International Conference on Signal and Image Processing (ICSIP), 15-17 December, pp. 79-84, 2010.
  25. John R. Koza, Forrest H. Bennett, David Andre, and Martin A. Keane. Automated design of both the topology and sizing of analog electrical circuits using genetic programming. Artificial Intelligence in Design '96. Springer, Dordrecht, pp. 151-170, 1996.
  26. C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. ISBN 978-0-387-31073-2.
  27. Mathworks.com.
  28. Shaoqing Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 2015.
  29. R. Lienhart, A. Kuranov, and V. Pisarevsky. Empirical analysis of detection cascades of boosted classifiers for rapid object detection. Proceedings of the 25th DAGM Symposium on Pattern Recognition, Magdeburg, Germany, 2003.
  30. Ojala Timo, Pietikainen Matti, and Maenpaa Topi. Multi-resolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, 2002.
  31. H. Kruppa, M. Castrillon-Santana, and B. Schiele. Fast and robust face finding via local context. Proceedings of the Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 157-164, 2003.
  32. Marco Castrillon, Oscar Deniz, Cayetano Guerra, and Mario Hernandez. ENCARA2: Real-time detection of multiple faces at different resolutions in video streams. Journal of Visual Communication and Image Representation, 18(2):130-140, 2007.