Tag Archives: opencv

Summary of the work done at Google Summer of Code, 2016, for the OpenDetection Organization

This post is a brief compilation of the details related to the work done as a part of GSoC 2016 for the open-source library OpenDetection. The library focuses on object localization, recognition, and detection in static images as well as dynamic image sequences.

For GSoC 2016, I was selected to add Convolutional Neural Network based object classification and recognition modules to the library. The proposal can be accessed at: GSoC Proposal Link.

The following is a brief summary of the work done over three and a half months, from May 2016 to mid-August 2016.

The modules added to the library are:

  • Building the library on CPU as well as GPU Platforms.
  • Integration of the caffe library.
  • Addition of an image classification module through the C++ version of caffe.
  • Addition of a CNN training module through the C++ version of caffe.
  • A customized GTKMM based GUI for creating a solver file for training.
  • A customized GTKMM based GUI for creating a network file for training/testing.
  • A customized GTKMM based GUI for annotating/cropping an image file.
  • A Segnet library based image classifier using a python wrapper.
  • An Active Appearance Model prediction for face images using the caffe library.
  • Selective Search based object localization algorithm module.



Work1: The primary task was to make sure that the library compiled on both GPU and non-GPU platforms. Earlier, the library could only be built on GPU platforms because it tried to fetch headers from cuda regardless of whether the cuda library was installed on the system. This was fixed with 45 additions and 1 deletion over 7 files.
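The fix boils down to guarding GPU-only includes behind a build flag. Here is a minimal sketch of that idea, assuming a hypothetical macro WITH_GPU that the build system defines only when cuda is found; the macro actually used in opendetection may be named differently.

```cpp
#include <string>

// WITH_GPU is a hypothetical macro, assumed to be defined by the build
// system only when a cuda toolkit is detected.
#ifdef WITH_GPU
#include <cuda_runtime.h>  // pulled in only on GPU builds
#endif

// Reports which code path this translation unit was compiled for.
std::string buildMode() {
#ifdef WITH_GPU
    return "gpu";
#else
    return "cpu";
#endif
}
```

On a machine without cuda the macro stays undefined, the cuda header is never requested, and buildMode() returns "cpu".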

Work2: The next target was to include caffe library components in the opendetection library. Opendetection is a pool of object detection elements, and without the integration of Convolutional Neural Networks it would remain incomplete. Many open-source libraries support training and classification using CNNs, such as caffe, keras, torch and theano. We selected caffe because of its simplicity of use and the availability of blogs, tutorials and detailed documentation.

Work3: Once the library was included, the next step was to add a CNN based image classifier written in C++. Researchers usually use the python wrapper provided by the caffe library to train a network, or to use a trained network and its weights to classify an image, i.e., to assign a predicted label to the test image. A python wrapper adds overhead, which introduces lag in real-time applications. Also, the transfer of memory from CPU to GPU, on GPU based systems, is quite slow when the upper level code is in python. For this reason, we accessed the C++ code of the library directly and linked it to our opendetection ODDetector class. The task was completed with around 400 lines of code over 7 files. As an example we have provided the standard Mnist digit classifier: the user just needs to point to a network file, the trained weights and a test image, and the classification result is returned.
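Assigning a predicted label comes down to picking the class with the highest score in the network output. A minimal sketch of that last step, where the vector stands in for the final-layer output; the function name is illustrative and not part of the ODDetector API.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the index of the highest score, i.e. the predicted label.
// `scores` stands in for the network's final-layer output and is
// assumed to be non-empty.
std::size_t predictedLabel(const std::vector<float>& scores) {
    return static_cast<std::size_t>(
        std::distance(scores.begin(),
                      std::max_element(scores.begin(), scores.end())));
}
```

For a ten-class Mnist network the vector would hold ten scores and the returned index would be the predicted digit.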

Work4: Adding only classification abilities would leave the library half complete. Hence, we added a module that enables users to train their own models. With around 250 changes made to 5 files, this training class was added to ODTrainer. The user only needs to point to the network file and the solver file. Here again, a training example is added using the Mnist digit dataset.

Work5: As stated above, CNN based training requires a network file and a solver file. A solver in the caffe library has around 20 parameters, and it is tedious to write the solver file from scratch every time a training run has to be started. For this reason, a GUI has been introduced that exposes all the parameters involved in the solver file and lets the user include or exclude each one. This commit changed or added 9 files, the most crucial change being the addition of the gtkmm library files to the source. GTKMM is a C++ library for building GUI based applications. We decided on a GUI because handling the solver file effectively means handling a set of 19 parameters. Passing these 19 parameters as C++ command-line arguments would have made the application very cumbersome. Also, not every parameter has to be written to the solver every time, so a GUI appeared to be the most practical option from the user’s end. Around 1250 lines of code integrated this module into the opendetection library. The following are a few features of the GUI:

  • The GUI prompts the user if a mistake is made on the user's end.
  • Pressing the update button every time may be time consuming, so in the latest commits the parameters can be edited without pressing the buttons.
  • The update buttons after every parameter are kept so that, in future developments, intermediate parameter values can be accessed.
  • Not many open source libraries have this functionality.
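The include/exclude behaviour can be pictured as a list of key/value pairs with an on/off flag, written out as "key: value" lines. This is only a sketch of the idea and not the GUI's actual code; the struct and function names are hypothetical, while base_lr, momentum and max_iter are real caffe solver fields.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One solver entry as a GUI might hold it: key, value, and whether the
// user chose to include it in the generated file.
struct SolverParam {
    std::string key;
    std::string value;
    bool enabled;
};

// Serializes only the enabled entries as "key: value" lines.
std::string writeSolver(const std::vector<SolverParam>& params) {
    std::ostringstream out;
    for (const SolverParam& p : params) {
        if (p.enabled) out << p.key << ": " << p.value << "\n";
    }
    return out.str();
}
```

Unticking a parameter simply flips its flag, so the entry stays in memory but is left out of the solver file.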

Work6: After the solver, the next important thing for training is the network file. A network file describes the structure of the CNN: the layers, their individual properties, weight initializers, etc. Like the solver maker, we have created a module that provides a GUI to build this network. Every network has many properties, and writing them manually into the file is time consuming. With the GUI, any layer can be added to the network with just a few clicks and details.

a) The activation category includes the following activation layers

  • Absolute Value (AbsVal) Layer
  • Exponential (Exp) Layer
  • Log Layer
  • Power Layer
  • Parameterized rectified linear unit (PReLU) Layer
  • Rectified linear unit (ReLU) Layer
  • Sigmoid Layer
  • Hyperbolic tangent (TanH) Layer

b) The critical category includes the most crucial layers

  • Accuracy Layer
  • Convolution Layer
  • Deconvolution layer
  • Dropout Layer
  • InnerProduct (Fully Connected) Layer
  • Pooling Layer
  • Softmax classification Layer

c) The weight initializers include the following options

  • Constant
  • Uniform
  • Gaussian
  • Positive Unit Ball
  • Xavier
  • MSRA
  • Bilinear

d) Normalization layer includes the following options

  • Batch Normalization (BatchNorm) Layer
  • Local Response Normalization (LRN) Layer
  • Mean-Variance Normalization (MVN) Layer

e) Loss Layer includes the following options:

  • Hinge Loss Layer
  • Contrastive Loss Layer
  • Euclidean Loss Layer
  • Multinomial Logistic Loss Layer
  • Sigmoid Cross Entropy Loss Layer
f) Data and Extra Layers:

  • Maximum Argument (ArgMax) Layer
  • Binomial Normal Log Likelihood (BNLL) Layer
  • Element wise operation (Eltwise) Layer
  • Image Data Layer
  • LMDB/LEVELDB Data Layer

g) Every layer has all its parameters listed in the GUI; the optional parameters can be kept commented out using the radio button in the GUI.

h) One more important feature is that the user can display the layers. The GUI also makes it possible to delete any particular layer, or to add a layer at the end or between two existing layers.

These properties of the GUI were made possible with around 6500 lines of code over around 12-15 files.
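Adding a layer at the end or between two existing layers, and deleting a layer, map naturally onto an ordered list. A minimal sketch assuming the network is held as a vector of layer names; this container and the function names are illustrative and not taken from the GUI code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Inserts `layer` so that it ends up at index `pos`; positions past the
// end append the layer.
void insertLayer(std::vector<std::string>& net, std::size_t pos,
                 const std::string& layer) {
    if (pos > net.size()) pos = net.size();
    net.insert(net.begin() + static_cast<std::ptrdiff_t>(pos), layer);
}

// Removes the layer at index `pos`; returns false if `pos` is out of range.
bool deleteLayer(std::vector<std::string>& net, std::size_t pos) {
    if (pos >= net.size()) return false;
    net.erase(net.begin() + static_cast<std::ptrdiff_t>(pos));
    return true;
}
```

Inserting "Pooling" at index 1 of {"Convolution", "InnerProduct"} places it between the two existing layers.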

Work7: Active Appearance Model feature points over the face have many applications, such as emotion detection and face recognition. Finding these feature points using Convolutional Neural Networks is one of the personal research projects we have undertaken. The network and the trained weights presented in the library example are one of the base models we have used. The main reason to add this feature was to show users how widespread the uses of integrating the caffe library with opendetection could be. Very few works exist in this area, which is why the research was taken up. This is a very crude and preliminary model, meant to show young users how far a CNN may go and how the opendetection library helps facilitate it.

Work8: Object recognition has two components: object localization and then classification. The classification module has already been included in the system; the localization part is introduced in this work. The task of object localization has been completed using the selective search algorithm. Put simply, the algorithm involves graph based image segmentation, followed by computing different features of all the segmented parts, then measuring the closeness between the features of neighboring parts, and finally merging the closest parts and continuing until the algorithm terminates. The image segmentation was adopted from the graph based image segmentation mentioned here, with proper permissions.

The next part is image preprocessing: converting the BGR image to YCrCb, equalizing the first channel and converting the equalized YCrCb image back to BGR. The image is then stored in “.ppm” format, as the segmentation code only accepts images in that format. The image is segmented using the segment_image function. To find the number of segments, num, the segmented image is converted to grayscale, and the number of distinct colors then represents the number of segments. The next step is to create a list of those segments. It is often not possible to create a uchar grayscale mask with opencv here, because such a mask supports values from 0 to 255 and in most cases there are more than 255 segments. Thus, we first store every pixel’s value in the previous rgb image, together with the pixel’s location, in a text file named “segmented.txt”.

The remaining steps are: calculating histograms of the different features (hessian matrix, orientation matrix, color matrix, differential excitation matrix), finding neighbors for each clustered region, finding similarities (or closure distance) between two regions based on the histograms of the different features, merging the closest regions, removing very small and very big clusters, and adding ROIs to the image based on the merged regions. This selective search has a set of 13 parameters which drive the entire algorithm. The work was completed with the addition of around 2000 lines of code.
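One common way to compare two regions by their feature histograms is histogram intersection. The sketch below assumes that measure and normalized histograms; the post does not say which similarity the module actually uses, so treat the choice as an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Histogram intersection of two normalized histograms: the sum of
// bin-wise minima. It is 1.0 for identical histograms and 0.0 for
// histograms with no overlapping mass. Extra bins in the longer
// histogram are ignored.
double histogramIntersection(const std::vector<double>& a,
                             const std::vector<double>& b) {
    double sum = 0.0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) sum += std::min(a[i], b[i]);
    return sum;
}
```

In a merge loop, the pair of neighboring regions with the highest score would be merged first.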

Work9: Segnet is a caffe derived library used for object recognition and segmentation. It is widely used and its components are very similar to those of the caffe library. It was therefore logical to include the library so that users may run segnet based training/classification/segmentation examples through the opendetection wrapper. Adding it allows segnet users to attach it to opendetection in the same way as done with the caffe library. The example included for now is a python wrapper based image segmentation preview. The network and the weights are adopted from the segnet example module.

Work10: Any image classifier training requires the dataset to be annotated. For this reason, we have added an annotation tool, which enables users to label, crop or create bounding boxes over an object in an image. The output of this tool is customized to the format required by the caffe library.

The features and some usage points involved are:

  • User may load a single image from a location using the “Select the image location” button or the user may point towards a complete image dataset folder.
  • Even if the user points to a dataset folder, there is an option of choosing an image from another location while the annotation process is still on.
  • Even if user selects a single image, the user may load more single images without changing the type of annotation.
  • The first type of annotation facility is, annotating one bounding box per image.
  • The second, annotating and cropping one bounding box per image.
  • The third one, annotating multiple bounding boxes per image, with attached labels.
  • The fourth one, cropping multiple sections from same image, with attached labels.
  • The fifth one, annotating a non-rectangular ROI, with attached labels.
  • If the user makes a mistake in annotation, the annotation can be reset too.

Note: Every image that is loaded is resized to 640×480 dimensions, but the output file stores the points of the bounding boxes in the coordinates of the original image size.
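Mapping a point clicked on the 640×480 view back to the original image is a simple rescale per axis. A minimal sketch with illustrative names; the rounding used by the actual tool is not stated, so integer truncation is assumed here.

```cpp
// A pixel position.
struct PixelPoint {
    int x;
    int y;
};

// Maps a point clicked on the 640x480 display back to coordinates in
// the original image. Integer division truncates toward zero.
PixelPoint toOriginal(PixelPoint shown, int origWidth, int origHeight) {
    const int kShownWidth = 640;
    const int kShownHeight = 480;
    return PixelPoint{shown.x * origWidth / kShownWidth,
                      shown.y * origHeight / kShownHeight};
}
```

For a 1280×960 original, a click at (320, 240) on the display maps to (640, 480) in the output file.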

The output files generated in each case contain the following annotation details:

  • First case: every line in the output text file has an image name followed by four points x1 y1 x2 y2, the first two representing the top left coordinate of the box and the last two representing the bottom right coordinate of the box.
  • Second case: every line in the output text file has an image name followed by four points x1 y1 x2 y2, the first two representing the top left coordinate of the box and the last two representing the bottom right coordinate of the box. The cropped images are stored in the same folder as the original image, with the name <original_image_name>_cropped.<extension_of_the_original_image>
  • Third case: every line in the output text file has an image name followed by a label and then the four points x1 y1 x2 y2, the first two representing the top left coordinate of the box and the last two representing the bottom right coordinate of the box. If there are multiple bounding boxes, then after the image name there is a label, then four points, followed by another label and the corresponding four points, and so on.
  • Fourth case: once the file is saved, the cropped images are saved in the same folder as the original image with the name <original_image_name>_cropped_<label>_<unique_serial_id>.<extension_of_the_original_image>.
  • Fifth case: the output is saved as the filename, followed by a unique id for the ROI, the label of the roi and the set of points in the roi, then another id, its label and its points, and so on.
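To read the third-case output back, each line can be split into an image name followed by repeated groups of a label and four coordinates. This sketch assumes whitespace-separated fields and an image name without spaces; the exact separator written by the tool is not stated above, and the type and function names are illustrative.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One labeled bounding box: top left (x1, y1), bottom right (x2, y2).
struct LabeledBox {
    int label, x1, y1, x2, y2;
};

struct Annotation {
    std::string imageName;
    std::vector<LabeledBox> boxes;
};

// Parses "name label x1 y1 x2 y2 [label x1 y1 x2 y2 ...]".
// Reading stops at the first incomplete or non-numeric group.
Annotation parseAnnotationLine(const std::string& line) {
    std::istringstream in(line);
    Annotation a;
    in >> a.imageName;
    LabeledBox b;
    while (in >> b.label >> b.x1 >> b.y1 >> b.x2 >> b.y2) {
        a.boxes.push_back(b);
    }
    return a;
}
```

A line with two boxes yields an Annotation whose boxes vector has two entries, in the order they were drawn.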

To select any of these cases, select the image/dataset and then press the “Load the image” button.

First case usage

  • Select the image or the dataset folder.
  • Press the “Load the image” button.
  • To create an roi, first left click on the top left point of the intended roi and then right click on the bottom right point of the intended roi. A green rectangular box will appear.
  • Now, if it is not the one you meant, please click the “Reset Markings” button and mark the new roi.
  • If the ROI is fine, press “Select the ROI” button.
  • Now, load another image or save the file.

Second case usage

  • Select the image or the dataset folder.
  • Press the “Load the image” button.
  • To create an roi, first left click on the top left point of the intended roi and then right click on the bottom right point of the intended roi. A green rectangular box will appear.
  • Now, if it is not the one you meant, please click the “Reset Markings” button and mark the new roi.
  • If the ROI is fine, press “Select the ROI” button.
  • Now, load another image or save the file.

Third case usage

  • Select the image or the dataset folder.
  • Press the “Load the image” button.
  • To create an roi, first left click on the top left point of the intended roi and then right click on the bottom right point of the intended roi. A green rectangular box will appear.
  • Now, if it is not the one you meant, please click the “Reset Markings” button and mark the new roi.
  • If the ROI is fine, please type an integer label in the text box and press the “Select the ROI” button.
  • Now, you may draw another roi, load another image, or save the file.
  • Note: In the third case, the one with multiple ROIs per image, if a bounding box has already been selected for an image and you press the reset button while drawing another one, the selected roi will not be deleted. A selected roi cannot be deleted as of now.

Fourth case usage

  • Select the image or the dataset folder.
  • Press the “Load the image” button.
  • To create an roi, first left click on the top left point of the intended roi and then right click on the bottom right point of the intended roi. A green rectangular box will appear.
  • Now, if it is not the one you meant, please click the “Reset Markings” button and mark the new roi.
  • If the ROI is fine, please type an integer label in the text box and press the “Select the ROI” button.
  • Now, you may draw another roi, load another image, or save the file.
  • Once the file is saved, the cropped images will be saved in the same folder as the original image with the name <original_image_name>_cropped_<label>_<unique_serial_id>.<extension_of_the_original_image>

Fifth case usage

  • Select the image or the dataset folder.
  • Press the “Load the image” button.
  • To create an roi, click on the required points using only left clicks.
  • Now, if it is not the one you meant, please click the “Reset Markings” button and mark the new roi.
  • If the ROI is fine, please type an integer label in the text box and press the “Select the ROI” button. A green marking covering the region and passing through the points you have selected will appear.
  • Now, you may draw another roi, load another image, or save the file.

Thus, this tool is an extremely important addition to the project and was added as 1600 lines of code over around 6-8 files in the opendetection library.

The corresponding source-codes, brief tutorials and commits, can be accessed here

For Compilation of the library, refer to the link here

Upcoming Work:

a) Resolve the issue of cpp version of AAM and segnet based classifier

b) Heat map generator using cnn (will require time as it is quite a research intensive part)

c) Work to be integrated with Giacomo’s work and to be pushed to master.

d) API Documentation for the codes added.

e) Adding video Tutorials to the blog.


Happy Coding 🙂 !!!


Basic Image Feature Extraction Tools

Authors : Abhishek Kumar Annamraju, Akashdeep Singh, Devprakash Satpathy, Charanjit Nayyar

Hello Friends,

I think it’s been almost 6-7 months since my last post came up. Well, I will make sure that this doesn’t happen again. As for my research this semester, I will post some cool stuff on image filtering techniques, advanced bio-medical image processing techniques, implementation of neural networks with image processing, object detection, tracking, and 3D representation techniques, and a touch-up of basic mosaicing techniques. It’s a long way to go……………

Today it’s time to brush up on some basics. The main aim of this post is to introduce the basic image feature extraction tools. Tools!!!!!! By tools I mean the simple old school algorithms which bring out the best from images and help the process of advanced image processing.

Let’s start with understanding the meaning of image feature extraction. In machine learning, pattern recognition and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretations. In computer vision, a widely used application of feature extraction is retrieving desired images from a large collection on the basis of features that can be automatically extracted from the images themselves. To summarize:
1) It involves building derived values, the information, from the pool of data.
2) It yields a non-redundant set of data.
3) It is related to dimensionality reduction.

Here are some basic feature extraction codes and their respective results, and mind my words, I will be playing with your heads, or to put it simply, extracting the main features of your brain…confused???

a) BRISK Features: Binary Robust Invariant Scalable Keypoints. A comprehensive evaluation on benchmark datasets reveals BRISK’s adaptive, high quality performance as in state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in cases). The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood.

Here’s the research paper to BRISK :

Lets get to the code : https://drive.google.com/file/d/0B-_KU2rDr3aTT3JLdGFzM2x1ZEU/view?usp=sharing


main_image                 feature_brisk

I think now you got what I meant by extracting features off your brain!!!!!!!!!!!!!!!!!!!!

Here in the code you will find two main things, one is a constructor while other is the respective operator,

BRISK brisk(30, 3, 1.0f);
brisk(image, Mat(), keypoints, result, false );
BRISK brisk(int thresh, int octaves, float patternScale);

Application based analysis of Parameters:
1) Thresh: The greater the value, the fewer features are detected. That doesn’t mean you should set it to 0, because in that case the detected features may be redundant.
2) Octaves: The value varies from 0 to 8. The greater the value, the more the image is scaled to extract the features.
3) PatternScale: The lower the value, the more features as well as redundancies.
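The "bit-string descriptor from intensity comparisons" idea can be shown without OpenCV. This is a toy sketch, not BRISK itself: the sampling pattern and pair selection are hypothetical, and the descriptor is cut to 8 bits (real BRISK descriptors are 512 bits) to keep it readable.

```cpp
#include <bitset>
#include <cstddef>
#include <utility>
#include <vector>

const std::size_t kBits = 8;  // toy length; real BRISK uses 512 bits

// Builds a bit string: bit i is 1 when the first sample of pair i is
// brighter than the second. `samples` holds intensities sampled around
// a keypoint; `pairs` holds indices into `samples`.
std::bitset<kBits> describe(
    const std::vector<int>& samples,
    const std::vector<std::pair<std::size_t, std::size_t> >& pairs) {
    std::bitset<kBits> bits;
    for (std::size_t i = 0; i < pairs.size() && i < kBits; ++i) {
        bits[i] = samples[pairs[i].first] > samples[pairs[i].second];
    }
    return bits;
}

// Binary descriptors are compared by counting the bits that differ.
std::size_t hammingDistance(const std::bitset<kBits>& a,
                            const std::bitset<kBits>& b) {
    return (a ^ b).count();
}
```

Because matching is an XOR plus a bit count, comparing binary descriptors is much cheaper than computing Euclidean distances between floating point descriptors.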

b) FAST Features: Features from Accelerated Segment Test. The algorithm operates in two stages: in the first stage, a segment test based on relative brightness is applied to each pixel of the processed image; the second stage refines and limits the results by non-maximum suppression. As the non-maximum suppression is only performed on the small subset of image points that passed the segment test, the processing time remains short.
FAST.pdf : https://drive.google.com/file/d/0B-_KU2rDr3aTLV9ndlJidHRBUmM/view?usp=sharing

Lets get to the code : https://drive.google.com/file/d/0B-_KU2rDr3aTLVpwSFNjSi1RaEE/view?usp=sharing

main_image          feature_fast

In the code you will find this segment

FASTX(InputArray image, vector<KeyPoint>& keypoints, int threshold, bool nonmaxSuppression, int type);

Application based analysis of Parameters:
1) Threshold: The lower the value, the more features as well as redundancies.
2) nonmaxSuppression: Non-maximum suppression is often used along with edge detection algorithms. The image is scanned along the image gradient direction, and pixels that are not part of the local maxima are set to zero. This suppresses all image information that is not part of local maxima. When true, the suppression is applied.
3) Type: FastFeatureDetector::TYPE_a_b: a candidate point passes the segment test when “a” contiguous pixels on a circle of “b” pixels around it are all brighter or all darker than the point by more than the threshold (for example, TYPE_9_16).
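The first stage, the segment test, can be sketched on its own. The function below is a plain illustration and not OpenCV's optimized implementation; it takes the ring intensities as a vector and assumes the caller has already sampled the circle around the candidate pixel.

```cpp
#include <cstddef>
#include <vector>

// ring: intensities of the pixels on the circle around the candidate,
// in circular order (16 pixels for TYPE_9_16).
// Returns true if at least `arc` contiguous ring pixels are all brighter
// than center + threshold or all darker than center - threshold.
bool segmentTest(int center, const std::vector<int>& ring, int threshold,
                 std::size_t arc = 9) {
    const std::size_t n = ring.size();
    if (n == 0 || arc > n) return false;
    for (int sign = -1; sign <= 1; sign += 2) {  // -1: darker, +1: brighter
        std::size_t run = 0;
        // Walk the ring twice so that runs wrapping past the end count.
        for (std::size_t i = 0; i < 2 * n; ++i) {
            const int diff = ring[i % n] - center;
            if (sign * diff > threshold) {
                if (++run >= arc) return true;
            } else {
                run = 0;
            }
        }
    }
    return false;
}
```

Raising the threshold makes the test harder to pass, which matches the note above that a lower threshold yields more features.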

c) Harris Corner Detector: Harris corner detector is based on the local autocorrelation function of a signal which measures the local changes of the signal with patches shifted by a small amount in different directions.
Harris Corner.pdf : https://drive.google.com/file/d/0B-_KU2rDr3aTYjhOeVpCeWtWQ0k/view?usp=sharing

Code: https://drive.google.com/file/d/0B-_KU2rDr3aTelFLVkZ3dzk4UDg/view?usp=sharing

Results :

main_image          feature_harris_corner

cornerHarris( image_gray, dst, blockSize, apertureSize, k, BORDER_DEFAULT );

Application based analysis of Parameters:
1) blockSize: The larger the size, the more the blurring and the fewer the detected corners.
2) apertureSize: It is the kernel size of the Sobel operator; the greater the value, the greater the filtering of detected corners.
3) k: The greater the value, the more the edges are preserved and the fewer corners are detected.
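The role of k is easiest to see in the Harris response itself, R = det(M) - k * trace(M)^2, where M is the 2x2 matrix of gradient products summed over the blockSize window. A minimal sketch of that formula with illustrative argument names:

```cpp
// Harris response for the structure tensor M = [[sxx, sxy], [sxy, syy]],
// where sxx, syy and sxy are the gradient products Ix*Ix, Iy*Iy and
// Ix*Iy summed over the window: R = det(M) - k * trace(M)^2.
double harrisResponse(double sxx, double syy, double sxy, double k) {
    const double det = sxx * syy - sxy * sxy;
    const double trace = sxx + syy;
    return det - k * trace * trace;
}
```

A corner has strong gradients in both directions, so det(M) is large and R is positive; an edge has one dominant direction, so det(M) is near zero and R turns negative. Increasing k subtracts more of the trace term, so fewer points end up with a positive response.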

d) ORB Features: ORB (Oriented FAST and Rotated BRIEF) is a fast, robust local feature detector, first presented by Ethan Rublee et al. in 2011, that can be used in computer vision tasks like object recognition or 3D reconstruction. It is based on the visual descriptor BRIEF (Binary Robust Independent Elementary Features) and the FAST keypoint detector. Its aim is to provide a fast and efficient alternative to SIFT.
ORB.pdf : https://drive.google.com/file/d/0B-_KU2rDr3aTeC1UUkNBNlhoRFU/view?usp=sharing

Code : https://drive.google.com/file/d/0B-_KU2rDr3aTbWVhZEtpY1dqeWc/view?usp=sharing

Results :

main_image     feature_orb

Here again, in the code you will find two main things, one is a constructor while other is the respective operator,

ORB orb(500, 1.2f, 8, 31, 0, 2, ORB::HARRIS_SCORE, 31);
	orb(image, Mat(), keypoints, result, false );
ORB(int nfeatures, float scaleFactor, int nlevels, int edgeThreshold, int firstLevel, int WTA_K, int scoreType=ORB::HARRIS_SCORE, int patchSize);

Application based analysis of Parameters:
1) nfeatures: Indicates the maximum number of features to be detected.
2) scaleFactor: Pyramid decimation ratio, greater than 1. scaleFactor==2 means the classical pyramid, where each next level has 4x less pixels than the previous, but such a big scale factor will degrade feature matching scores dramatically. On the other hand, too close to 1 scale factor will mean that to cover certain scale range you will need more pyramid levels and so the speed will suffer (as per OPENCV WEBSITE).
3) nlevels: The number of pyramid levels. The smallest level will have linear size equal to input_image_linear_size/pow(scaleFactor, nlevels)
4) edgeThreshold: The greater the value, the fewer the feature points.
5) WTA_K: The number of points that produce each element of the oriented BRIEF descriptor. The default value 2 means the BRIEF where we take a random point pair and compare their brightnesses, so we get 0/1 response. Other possible values are 3 and 4. For example, 3 means that we take 3 random points (of course, those point coordinates are random, but they are generated from the pre-defined seed, so each element of BRIEF descriptor is computed deterministically from the pixel rectangle), find point of maximum brightness and output index of the winner (0, 1 or 2). Such output will occupy 2 bits, and therefore it will need a special variant of Hamming distance, denoted as NORM_HAMMING2 (2 bits per bin). When WTA_K=4, we take 4 random points to compute each bin (that will also occupy 2 bits with possible values 0, 1, 2 or 3) (as per OPENCV WEBSITE).
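The trade-off between scaleFactor and nlevels is easier to judge with the level sizes written out. The sketch below assumes level i has the base size divided by scaleFactor^i, starting from level 0 at full size; it is an illustration of the pyramid idea and not OpenCV's internal code.

```cpp
#include <cmath>
#include <vector>

// Linear size of each pyramid level, assuming level i is the base size
// divided by scaleFactor^i (level 0 is the input image itself).
std::vector<int> pyramidSizes(int baseSize, double scaleFactor, int nlevels) {
    std::vector<int> sizes;
    for (int level = 0; level < nlevels; ++level) {
        const double scaled = baseSize / std::pow(scaleFactor, level);
        sizes.push_back(static_cast<int>(std::lround(scaled)));
    }
    return sizes;
}
```

With scaleFactor 2 the size halves at every level, so a handful of levels already cover a wide scale range; with 1.2 many more levels are needed for the same range, which is the speed cost mentioned above.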

e) Shi Tomasi Corner Detector : We have come up with this earlier, https://abhishek4273.com/2014/07/20/motion-tracking-using-opencv/

f) SIFT Features: Scale Invariant Feature Transform. SIFT keypoints of objects are first extracted from a set of reference images and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors. From the full set of matches, subsets of keypoints that agree on the object and its location, scale, and orientation in the new image are identified to filter out good matches. The determination of consistent clusters is performed rapidly by using an efficient hash table implementation of the generalized Hough transform. Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed model verification and subsequently outliers are discarded. Finally the probability that a particular set of features indicates the presence of an object is computed, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
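The matching step, finding the database feature closest to a query feature by Euclidean distance, can be sketched as a brute-force search. The names are illustrative, and real systems usually replace the linear scan with an approximate nearest neighbour index.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<float> Descriptor;

// Squared Euclidean distance between two descriptors, assumed to have
// equal length. Squaring preserves the ordering of distances, so the
// square root is not needed for nearest neighbour search.
double squaredDistance(const Descriptor& a, const Descriptor& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the database descriptor closest to `query`;
// returns database.size() when the database is empty.
std::size_t nearestNeighbour(const Descriptor& query,
                             const std::vector<Descriptor>& database) {
    std::size_t best = database.size();
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < database.size(); ++i) {
        const double d = squaredDistance(query, database[i]);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

Real SIFT descriptors have 128 dimensions; the function works for any length as long as query and database entries agree.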

sift1.pdf : https://drive.google.com/file/d/0B-_KU2rDr3aTTWk3M2xJVnU4SkE/view?usp=sharing
sift2.pdf : https://drive.google.com/file/d/0B-_KU2rDr3aTbDlZQkcycXNzdmc/view?usp=sharing

code: https://drive.google.com/file/d/0B-_KU2rDr3aTWThQdTJiWktDaUk/view?usp=sharing

Results :

main_image        feature_sift

g) SURF Features: Speeded Up Robust Features. SURF is a high-performance detector and descriptor of points of interest in an image. The image is transformed into coordinates using a multi-resolution technique: copies of the original image are built in a Gaussian or Laplacian pyramid, giving images of the same size but with reduced bandwidth. This produces a special blurring effect on the original image, called scale-space, and ensures that the points of interest are scale invariant. The SURF algorithm is based on its predecessor, SIFT.
surf.pdf: https://drive.google.com/file/d/0B-_KU2rDr3aTMDhvanl0TlhLVEU/view?usp=sharing

code : https://drive.google.com/file/d/0B-_KU2rDr3aTNWVDNU10aGJjQTA/view?usp=sharing

Results :

main_image       feature_surf

So this is it from my side with respect to basic feature detection. Keep looking forward to my posts.

Thank you guys!!!!
Adios Amigos!!!!!!!


Hello Friends,

While researching various trackers for my hexapod project I came across a very simple python code that tracked on the basis of movement. But it was based on old Matlab API, so I wanted to implement it in OpenCV. Tracking an object in a video is a very important part of the field of Robotics. For example, suppose you want to track moving vehicles at traffic signals (Project Transpose, IIM Ahmedabad), track moving obstacles for an autonomous robot (Project Hexapod, Bits Pilani KK Birla Goa Campus), find signs of life in unmanned areas, etc.

You can download the code from here: https://github.com/abhi-kumar/OPENCV_MISC/blob/master/track_motion.cpp

Let’s go through the major snippets of the code.

#include <stdio.h>
#include <cv.h>
#include <highgui.h>

These are the headers for the old C based OpenCV modules

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/video/tracking.hpp"

These are the headers for the new C++ based OpenCV modules

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

The standard C/C++ headers

float MHI_DURATION = 0.05;                
float MAX_TIME_DELTA = 12500.0;
float MIN_TIME_DELTA = 5;
int visual_trackbar = 2;

These are the parameters used in the tracking function. Please note that they may need to change according to the type of camera being used.
1. Timestamp: Current time in milliseconds or other units.
2. MHI_DURATION: Maximal duration of the motion track, in the same units as timestamp.
3. MAX_TIME_DELTA, MIN_TIME_DELTA: Maximal and minimal allowed difference between mhi values within a pixel neighborhood.

calcMotionGradient(motion_history, mg_mask, mg_orient, 5, 12500.0, 3);
segmentMotion(motion_history, seg_mask, seg_bounds, timestamp, 32);

To understand these major lines, together with the related updateMotionHistory function, go through these links
1. http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html#updatemotionhistory

2. http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html#calcmotiongradient

3. http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html#segmentmotion
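The motion history image that these functions operate on follows a simple per-pixel rule, described in the first link: moving pixels take the current timestamp, entries older than the duration are cleared, and everything else is left alone. A plain C++ sketch of that rule on flat arrays, as an illustration and not OpenCV's implementation:

```cpp
#include <cstddef>
#include <vector>

// Per-pixel motion history update:
//  - pixels where motion occurred (silhouette != 0) take the timestamp,
//  - entries older than (timestamp - duration) are reset to 0,
//  - all other entries are left unchanged.
void updateHistory(const std::vector<unsigned char>& silhouette,
                   std::vector<float>& mhi, float timestamp,
                   float duration) {
    for (std::size_t i = 0; i < mhi.size() && i < silhouette.size(); ++i) {
        if (silhouette[i] != 0) {
            mhi[i] = timestamp;
        } else if (mhi[i] < timestamp - duration) {
            mhi[i] = 0.0f;
        }
    }
}
```

Recent motion therefore shows up as high values and older motion as lower values, which is what lets calcMotionGradient estimate a direction of movement.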

Now for compiling and running the application in ubuntu:
1. Download the code.
2. Open a terminal and navigate to the folder containing the code.
(Assuming you named the code file “track_motion.cpp”)
a) chmod +x track_motion.cpp
b) g++ -ggdb `pkg-config --cflags opencv` -o `basename track_motion.cpp .cpp` track_motion.cpp `pkg-config --libs opencv`
c) ./track_motion

The default trackbar will be set to binary view; any motion detected will be tracked in white color. Changing the trackbar position to “1” gives a grayscale view; in the same way, “0” is RGB and “3” is HSV.

Here is a demo video link to get an overview of the different views of the application:

I hope you benefit from this code.

Thank you 🙂


Hello Friends,

Until now I have been working with OpenCV on the Ubuntu platform. But we should also know how to use the OpenCV libraries with Visual Studio on Windows. Here I will explain how to integrate OpenCV with Visual Studio on 32-bit and 64-bit versions of Windows.


Download OpenCV from here: http://opencv.org/downloads.html

Double click on the downloaded exe file and, when it is being extracted, set the folder name to “opencv” and extract it to your C drive.


Now we need to add the path of the extracted libraries to the environment variables.
Go to “Control Panel” >> “System and Security” >> “System” and click on “Advanced system settings”; a window will appear (the one inside the red rectangle):
Screenshot (18)

Now click on “Environment Variables”; under “System variables” select “Path” and edit it by adding these lines to it:
64-Bit users

32-Bit users

Assuming that you are using Visual Studio 2010
You are basically adding path to bin in the environment variables, so make sure the path is correct and make appropriate changes if necessary.

Click on “OK” in every window that has been opened to make the changes.

Open Visual Studio 2010 and create a new Visual C++ Win32 Console Application project. Give it a name and create it.

Now select the “View” menu and click on “Property Manager”.

Only for 64-Bit version users.

Select the “Project” Menu and click on “Properties”.
Screenshot (19)

Click on the “Configuration Manager”
Select the “Active Solution Configuration” as “Release”
Select the “Active Solution Platform” and click on “”
Screenshot (20)

And in that Select the new platform as “X64”, click on “ok” and close the configuration manager.

Now in the properties window,
Select “Configuration Properties” >> “C/C++” >> “General” and edit “Additional Include Directories” by adding these two lines to it:
Screenshot (21)

Here we are adding the paths to the include folders, so make sure the paths are correct for your computer.

Select “Configuration Properties” >> “C/C++” >> “Preprocessor”, select “Preprocessor Definitions” and edit it by adding this to it:

Now in the properties window,
Select the “Configuration Properties” >> “Linker” >> “General” and select and edit the “Additional Library Directories”
Screenshot (22)

64-Bit version users add this line to it:

32-Bit version users add this line to it:

Make sure the path to lib is correct as per your settings.

Select the “Configuration Properties” >> “Linker” >> “Input” and click on “Additional dependencies” and edit it:
Screenshot (23)


Note: In …..246.lib, 246 is the version of OpenCV; for me it is OpenCV-2.4.6, so make the appropriate changes according to the version you have downloaded.

Click on “apply” and “ok”

Now DELETE everything in the code and copy the test code from here: https://github.com/abhi-kumar/OPENCV_MISC/blob/master/tracker.cpp
Note: at the top of the code add this line: #include "stdafx.h"

Keep the mode as release and run it:
Screenshot (24)

Get the details of the code from : https://abhishek4273.wordpress.com/2014/07/05/track-the-region-of-interest/

So, now you have integrated OpenCV with Windows Visual Studio

Thanks 🙂


Authors: Abhishek Kumar Annamraju, Akashdeep Singh, Adhesh Shrivastava

Hello Friends,

Let's go through some interesting stuff that computer vision can provide. Confused by the title?

With the application I am going to introduce, you can track a region in a live video stream. Suppose you take a live stream from your webcam and draw a rectangle in that window using your mouse; in the following frames the application will track that region for as long as it stays in the frame. The crux of the application is “Good features to track” and “Optical flow”.

Seems interesting!!!!!!

Download the code from here:

Now let's understand the code:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <iostream>

These are the standard C and C++ headers.

#include <cv.h>
#include <highgui.h>

These headers are the ones from the OpenCV C API.

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/video/tracking.hpp"

These headers are the ones from the OpenCV C++ API, also called opencv2.

using namespace cv;
using namespace std;

Using the cv and std namespaces.

IplImage* frame, * img1;
CvPoint point;
int drag = 0;
int x_point,width_point,y_point,height_point;

Declaring the variables used to capture video from the webcam and to handle the mouse.

int key = 0;
CvRect rect;
Rect region_of_interest;
int test;
Mat src,src_gray,image,src_gray_prev,src1,src_gray1,copy,copy1,frames,copy2;
int maxCorners = 23;
RNG rng(12345);
vector<Point2f> corners,corners_prev,corners_temp;
double qualityLevel = 0.01;
double minDistance = 10;
int blockSize = 3;
bool useHarrisDetector = false;
double k = 0.04;
vector<uchar> status;
vector<float> err;
float x_cord[100];
float y_cord[100];

These are the parameters we will use in “Good features to track” and “Optical flow”.

void mouseHandler(int event, int x, int y, int flags, void* param)
{
    // Left button pressed: remember the first corner and start dragging
    if (event == CV_EVENT_LBUTTONDOWN && !drag)
    {
        point = cvPoint(x, y);
        drag = 1;
    }
    // Mouse moved while dragging: draw the rectangle on a copy of the frame
    if (event == CV_EVENT_MOUSEMOVE && drag)
    {
        img1 = cvCloneImage(frame);
        cvRectangle(img1,point,cvPoint(x, y),CV_RGB(255, 0, 0),1,8,0);
        cvShowImage("result", img1);
    }
    // Left button released: store the selected rectangle
    if (event == CV_EVENT_LBUTTONUP && drag)
    {
        rect = cvRect(point.x,point.y,x-point.x,y-point.y);
        x_point = point.x;
        y_point = point.y;
        width_point = x-point.x;
        height_point = y-point.y;
        cvShowImage("result", frame);
        drag = 0;
    }

    // Right button released: cancel the drag
    if (event == CV_EVENT_RBUTTONUP)
    {
        drag = 0;
    }
}

This is the code to draw and select a region of interest from the video window.
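
One thing to keep in mind with the handler above: width_point and height_point are computed as x - point.x and y - point.y, so dragging upwards or to the left gives negative values. The following is a small sketch of one way to normalise the two corners, written in plain C++ so that it stands on its own. The Box struct and normalizeDrag are hypothetical helpers made up for this illustration; they are not part of the original code or of OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for a rectangle: top-left corner plus size.
struct Box { int x, y, width, height; };

// Build a rectangle with non-negative size from the two corners of a drag,
// regardless of the direction in which the mouse was moved.
Box normalizeDrag(int x0, int y0, int x1, int y1)
{
    Box b;
    b.x = std::min(x0, x1);        // left edge is the smaller x
    b.y = std::min(y0, y1);        // top edge is the smaller y
    b.width = std::abs(x1 - x0);
    b.height = std::abs(y1 - y0);
    assert(b.width >= 0 && b.height >= 0);
    return b;
}
```

For example, a drag that starts at (100, 80) and ends at (40, 20) gives a box at (40, 20) with a size of 60 by 60.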

int main(int argc, char *argv[])
{
    // capture is assumed to be a CvCapture* declared earlier in the full source
    capture = cvCaptureFromCAM( 0 );
    if ( !capture ) {
        printf("Cannot initialize webcam!\n" );
        return 1;
    }
    cvNamedWindow( "result", CV_WINDOW_AUTOSIZE );
    int small,big; //declares integer
    int x = 1;

The above snippet opens the webcam and creates the display window.

while( key != 'q' )
        frame = cvQueryFrame( capture );

These lines make sure that the video keeps running until you press the key “q”.

if (rect.width>0)

This checks whether the rectangle has been chosen or not.

if(corners.size() == 0 || x==0)
{
    Mat frames(frame);
    src = frames.clone();
    cvtColor( src, src_gray, CV_BGR2GRAY );
    // Mask that is non-zero only inside the selected rectangle
    cv::Mat mask1 = cv::Mat::zeros(src.size(), CV_8UC1);
    cv::Mat roi(mask1, cv::Rect(x_point,y_point,width_point,height_point));
    roi = cv::Scalar(255, 255, 255);
    copy1 = src.clone();
    goodFeaturesToTrack( src_gray,
                 corners,
                 maxCorners,
                 qualityLevel,
                 minDistance,
                 mask1,
                 blockSize,
                 useHarrisDetector,
                 k );

    int rad = 3;
    for( int i = 0; i < corners.size(); i++ )
    {
        circle( copy1, corners[i], rad, Scalar(rng.uniform(0,255), rng.uniform(0,255),
                rng.uniform(0,255)), -1, 8, 0 );
    }
    IplImage test1 = copy1;
    IplImage* test2 = &test1;
    x = 1;

    cvShowImage("result", test2);
}

If the rectangle has just been drawn in the previous frame, the above code finds good features inside it and saves them.

else
{
    // Keep the previous frame and its features for optical flow
    src_gray_prev = src_gray.clone();
    corners_prev = corners;
    Mat framess(frame);
    src = framess.clone();
    cvtColor( src, src_gray, CV_BGR2GRAY );
    cv::Mat mask = cv::Mat::zeros(src.size(), CV_8UC1);
    cv::Mat roi(mask, cv::Rect(x_point,y_point,width_point,height_point));
    roi = cv::Scalar(255, 255, 255);
    Mat copy;
    copy = src.clone();
    goodFeaturesToTrack( src_gray,
                 corners,
                 maxCorners,
                 qualityLevel,
                 minDistance,
                 mask,
                 blockSize,
                 useHarrisDetector,
                 k );
    calcOpticalFlowPyrLK(src_gray_prev, src_gray, corners_prev, corners, status, err);
    int r = 3;
    for( int i = 0; i < corners.size(); i++ )
    {
        circle( copy, corners[i], r, Scalar(rng.uniform(0,255), rng.uniform(0,255),
                rng.uniform(0,255)), -1, 8, 0 );
        x_cord[i] = corners[i].x;
        y_cord[i] = corners[i].y;
    }
    IplImage test3 = copy;
    IplImage* test4 = &test3;
    cvShowImage("result", test4);
}

Once the features have been saved, they are tracked from frame to frame using optical flow.
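
calcOpticalFlowPyrLK also fills the status vector: an entry is set to 1 when the flow for the corresponding feature was found and to 0 otherwise. The code above draws every point regardless of status. The sketch below shows one possible refinement, keeping only the points that were actually tracked. It is written in plain C++ so that it runs without OpenCV; the Pt struct stands in for Point2f, and keepTracked is a hypothetical helper that is not in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cv::Point2f so the sketch has no OpenCV dependency.
struct Pt { float x, y; };

// Return only the points whose status flag is non-zero,
// i.e. the features for which optical flow was found.
std::vector<Pt> keepTracked(const std::vector<Pt>& points,
                            const std::vector<unsigned char>& status)
{
    assert(points.size() == status.size());
    std::vector<Pt> kept;
    for (std::size_t i = 0; i < points.size(); ++i) {
        if (status[i] != 0) {
            kept.push_back(points[i]);
        }
    }
    return kept;
}
```

In the tracker, the same idea would be applied to corners and status right after the optical flow call, before drawing the circles.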

cvSetMouseCallback("result", mouseHandler, NULL);
key = cvWaitKey(10);
if( (char) key== 'r' )
{
    rect = cvRect(0,0,0,0); cvResetImageROI(frame);
    x = 0;
}
cvShowImage("result", frame);

This registers the mouse handler function and sets up the key “r” to reset the region of interest.

This is the major explanation for the code.

Compilation and running (for Ubuntu users):
1) Save the code and name it tracker.cpp
2) Open a terminal, navigate to the folder where you saved the code and type:
a) chmod +x tracker.cpp
b) g++ -ggdb `pkg-config --cflags opencv` -o `basename tracker.cpp .cpp` tracker.cpp `pkg-config --libs opencv`
c) ./tracker

Now the video will open; select a box and play with the tracker. If you want to reset the tracker to draw a new box, press the key “r”.

I hope you like the application. Will be back with a more revised and more robust application in a few days.

Thank you 🙂


Hello friends,

After doing a lot of research on the Point Cloud Library (PCL), I managed to integrate OpenCV with it successfully.

I hope you have OpenCV and PCL installed by now. If not, see:

This post is the result of inspiration from http://ramsrigoutham.com/2012/06/28/integrating-pcl-and-opencv-passthrough-filter-example/

Since I was not able to compile and run the code from the blog mentioned above on Ubuntu 12.04, I came up with another solution.

The code below filters the point cloud from a PCD file and visualizes it.

The code should work on any PCD file, provided the filter limit parameters are set appropriately for it.

Download the code from here : https://github.com/abhi-kumar/OPENCV_MISC/blob/master/opencv_pcl_filter.cpp

Here goes the explanation:

#include <pcl/point_cloud.h>
#include <pcl/io/pcd_io.h>
#include <pcl/io/ply_io.h>
#include <pcl/point_types.h>
#include <pcl/filters/passthrough.h>
#include <pcl/visualization/pcl_visualizer.h>
#include <vtkSmartPointer.h>

These are the required PCL headers.

int a = 22;
int b = 12;
int c=  10;

These are the pre-set values for the three trackbars.

pcl::visualization::PCLVisualizer viewer ("Get the view here");

This is the window to display/visualize stuff

pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_filtered(new pcl::PointCloud<pcl::PointXYZ>);

Here we have declared two point clouds of type PointXYZ; you can change the type according to your PCD file.

pcl::io::loadPCDFile (argv[1], *cloud);
pcl::copyPointCloud( *cloud,*cloud_filtered);

Now we have loaded the file into the first cloud and then copied it into the other cloud.

cv::namedWindow( "picture");
cvCreateTrackbar("X_limit", "picture", &a, 30, NULL);
cvCreateTrackbar("Y_limit", "picture", &b, 30, NULL);
cvCreateTrackbar("Z_limit", "picture", &c, 30, NULL);

This will look familiar to OpenCV users. For those who don't know it, this is a way of creating trackbars to control any parameter; here, the filter limits.

pcl::PassThrough<pcl::PointXYZ> pass;

Creating an object of the PassThrough filter class.

while (!viewer.wasStopped ())
{
    // Restore the full cloud from the unfiltered copy
    pcl::copyPointCloud(*cloud_filtered, *cloud);

    // Convert the trackbar positions into filter limits
    i = 0.1*((float)a);
    j = 0.1*((float)b);
    k = 0.1*((float)c);

//  cout << "i = " << i << " j = " << j << " k = " << k << endl;

    pass.setInputCloud (cloud);
    pass.setFilterFieldName ("y");
    pass.setFilterLimits (-j, j);
    pass.filter (*cloud);

    pass.setInputCloud (cloud);
    pass.setFilterFieldName ("x");
    pass.setFilterLimits (-i, i);
    pass.filter (*cloud);

    pass.setInputCloud (cloud);
    pass.setFilterFieldName ("z");
    pass.setFilterLimits (-k,k);
    pass.filter (*cloud);

    viewer.addPointCloud (cloud, "scene_cloud");
    viewer.spinOnce ();
}

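Each pass through the loop restores the full cloud from the copy, converts the trackbar positions into filter limits (0.1 times the position), applies the pass-through filter along y, x and z in turn, and displays the result.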
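
To make the filtering step concrete without needing PCL installed, here is a minimal plain C++ sketch of the same idea: scale the integer trackbar positions by 0.1 and keep only the points that lie within the symmetric limits on all three axes. Point3, trackbarToLimit and passThroughSketch are hypothetical names made up for this illustration, and treating the limits as inclusive is an assumption; the real work in the post is done by pcl::PassThrough.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for pcl::PointXYZ so the sketch has no PCL dependency.
struct Point3 { float x, y, z; };

// Trackbar positions are integers; the post scales them by 0.1.
float trackbarToLimit(int position)
{
    return 0.1f * static_cast<float>(position);
}

// Keep the points with |x| <= xLimit, |y| <= yLimit and |z| <= zLimit.
// Inclusive bounds are an assumption of this sketch.
std::vector<Point3> passThroughSketch(const std::vector<Point3>& cloud,
                                      float xLimit, float yLimit, float zLimit)
{
    assert(xLimit >= 0.0f && yLimit >= 0.0f && zLimit >= 0.0f);
    std::vector<Point3> kept;
    for (std::size_t n = 0; n < cloud.size(); ++n) {
        const Point3& p = cloud[n];
        const bool inside = p.x >= -xLimit && p.x <= xLimit &&
                            p.y >= -yLimit && p.y <= yLimit &&
                            p.z >= -zLimit && p.z <= zLimit;
        if (inside) {
            kept.push_back(p);
        }
    }
    return kept;
}
```

With the pre-set trackbar values 22, 12 and 10, the limits come out at roughly 2.2, 1.2 and 1.0, so a point at (3, 0, 0) would be dropped while a point at (2, 1, 0.5) would be kept.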

Now for the compiling part.
1)Download this file and save it as pcl.sh in the same folder where you have saved opencv_pcl_filter.cpp : https://github.com/abhi-kumar/OPENCV_MISC/blob/master/pcl.sh

2)Download this sample PCD file and save as test.pcd in the same folder : https://github.com/abhi-kumar/OPENCV_MISC/blob/master/test.pcd

3) Open a terminal and type:
a)chmod +x pcl.sh
b)./pcl.sh opencv_pcl_filter.cpp
c)./opencv_pcl_filter test.pcd

And you will see something like this:
Screenshot from 2014-07-01 17:29:43

And that's it, we have successfully created a filter in PCL.

Thanks 🙂


Hello Friends,

This is the partial test implementation of http://blindperception.wordpress.com/2014/05/18/object-detection-in-multi-modal-images-using-genetic-programming/

Please visit the above link to understand what is given below.

Today I would like to share with you a part of the project I am currently working on. In this post I will just show how a series of operators can detect a region/object of interest in a given image.

Later I will introduce genetic programming into it, but for now let's just look at the basic (core) part of the project. We will stick to testing on satellite images.

My result:-

A T72 tank image from a satellite:

Its ground truth detection:

My detection:

The series of operators used were :-

Download the testing code from:-

The entire implementation will be up in a few days.

Thank you 🙂