Friday, November 18, 2016

Testing Classifier Performance

As I previously mentioned, I wrote a Python script to test how well our classifier detects narrow helixes. I wanted to start with a small sample size, so I took 5 ear samples from a folder within our research gdrive to test against. The results were a bit discouraging, but I realized there are a few things I can do to better train the cascade. (source code)


Trial 1 - 4 Narrow Helixes Detected
Trial 2 - 7 Narrow Helixes Detected
Trial 3 - 4 Narrow Helixes Detected
Trial 4 - 5 Narrow Helixes Detected
Trial 5 - 5 Narrow Helixes Detected

The results from testing were very inaccurate; there should be only one narrow helix detected in a sample image. A few things I believe will help. The first is providing more samples to train with: this classifier has two positive samples per negative sample, whereas many of the cascades in the resources I found were trained with a high ratio of negative samples to positives. The second is my test script's minNeighbors parameter, which I set to 5. This means a candidate region must be supported by at least 5 overlapping detections before the program declares that a narrow helix is found. I believe that increasing the minimum neighbors will make the detection more accurate.
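The idea behind minNeighbors can be illustrated with a simplified sketch. This is not OpenCV's actual implementation (which clusters similar rectangles, as in cv2.groupRectangles); this toy version just counts overlapping raw hits, but it shows why a higher threshold discards stray detections:

```python
# Simplified illustration of minNeighbors: a candidate rectangle is kept
# only if at least `min_neighbors` other raw detections overlap it.
# (Conceptual sketch, not OpenCV's real grouping algorithm.)

def overlaps(a, b):
    """True if rectangles (x, y, w, h) a and b intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def filter_candidates(raw, min_neighbors):
    """Keep candidates supported by at least min_neighbors overlapping hits."""
    kept = []
    for i, r in enumerate(raw):
        support = sum(1 for j, other in enumerate(raw)
                      if j != i and overlaps(r, other))
        if support >= min_neighbors:
            kept.append(r)
    return kept

# Five raw hits clustered around one location, plus one isolated stray:
raw = [(10, 10, 24, 24), (11, 10, 24, 24), (10, 12, 24, 24),
       (12, 11, 24, 24), (9, 9, 24, 24), (200, 200, 24, 24)]
print(len(filter_candidates(raw, 4)))  # → 5 (the stray is discarded)
```

With min_neighbors raised to 5 even the clustered hits fall below the threshold, which is the trade-off: a higher value suppresses false positives but can also suppress real ones.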

Thursday, November 17, 2016

Testing Our Classifier to Detect Narrow Helixes

To test the performance of our newly created classifier, I created a Python script that runs the classifier against a set of ear samples that were not used to train it; none of the positive samples that went into building our positive vector file were included. Our wonderful mentor Dr. Washington stressed, "Don't test on what you train!" Doing so greatly skews the results and does not provide an accurate depiction of the classifier's quality.
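A sketch of the approach our script takes (the file and directory names here are illustrative, not our actual paths, and the OpenCV calls require the trained cascade and test images on disk):

```python
# Sketch: evaluate the cascade only on held-out samples, i.e. images
# that never appeared in the positive description file used for training.

import os

def held_out(sample_dir, training_list_file):
    """Return image paths in sample_dir that were NOT used during training.

    training_list_file is assumed to hold one training entry per line,
    starting with the image path (the opencv description-file layout).
    """
    with open(training_list_file) as f:
        trained = {line.split()[0] for line in f if line.strip()}
    return [os.path.join(sample_dir, name)
            for name in sorted(os.listdir(sample_dir))
            if name.lower().endswith((".jpg", ".png"))
            and os.path.join(sample_dir, name) not in trained]

def count_helixes(cascade_path, image_path):
    """Count detections in one grayscale image."""
    import cv2  # imported here so held_out stays usable without OpenCV
    cascade = cv2.CascadeClassifier(cascade_path)
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    hits = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(hits)

# Example driver (requires OpenCV, our cascade, and test images on disk):
# for path in held_out("test_ears", "narrow_positives.txt"):
#     print(path, count_helixes("classifier/cascade.xml", path))
```

Ideally each held-out ear image should yield exactly one detection.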

Classifier Format

The train cascade tool created and converted our cascade into multiple XML files. Our narrow_helix_cascade directory contains an XML file for each stage run during training (stage0.xml, stage1.xml, stage2.xml, etc.), params.xml contains the arguments supplied to the train_cascade command, and the classifier.xml file holds the features and results from all stages of training.

Training Our Classifier

After constructing our vector file, our next task is to use the file as input for training our classifier. This is done with the opencv_traincascade command line tool.

opencv_traincascade -data classifier -vec narrow_positives.vec -bg narrow_negatives.txt\
  -numStages 3 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 20\
  -numNeg 10 -w 24 -h 24 -mode ALL -precalcValBufSize 256\
  -precalcIdxBufSize 256

In our case, classifier is the directory where we want the classifier files to be stored. The -vec flag takes the vec file we generated in our last step, and the -bg flag takes the file that lists the paths to all the negative samples we created.

-precalcValBufSize indicates the amount of memory we allow the program for precalculated feature values, 256 MB in our case. If we had a larger sample size, more memory would make processing faster, but since we have a small sample size and this is one of our first trials, we won't need much. The number of positive and negative samples is given with -numPos and -numNeg, and the number of stages we want the classifier to undergo is given with the -numStages parameter. -minHitRate is the minimal desired hit rate for each stage of the classifier.

When trying to train the classifier, we ran into a few issues.

The attempt above failed at the first stage. At first I was missing parameters or giving the command's parameters incorrect values. I also believe that finding the right number of stages to train affected the outcome, as did this trial's lack of negative samples compared to the number of positives.

Eventually we were able to get a successful run. (see below)

cascadeDirName: classifier
vecFileName: narrow_positives.vec
bgFileName: narrow_negatives.txt
numPos: 20
numNeg: 10
numStages: 3
precalcValBufSize[Mb] : 256
precalcIdxBufSize[Mb] : 256
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: HAAR
sampleWidth: 24
sampleHeight: 24
boostType: GAB
minHitRate: 0.999
maxFalseAlarmRate: 0.5
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100
mode: ALL
Number of unique features given windowSize [24,24] : 261600

===== TRAINING 0-stage =====
POS count : consumed   20 : 20
NEG count : acceptanceRatio    10 : 1
Precalculation time: 0
|  N |    HR   |    FA   |
|   1|        1|        0|
Training until now has taken 0 days 0 hours 0 minutes 1 seconds.

===== TRAINING 1-stage =====
POS count : consumed   20 : 20
NEG count : acceptanceRatio    10 : 0.217391
Precalculation time: 0
|  N |    HR   |    FA   |
|   1|        1|        0|
Training until now has taken 0 days 0 hours 0 minutes 2 seconds.

===== TRAINING 2-stage =====
POS count : consumed   20 : 20
NEG count : acceptanceRatio    4 : 0.1
Required leaf false alarm rate achieved. Branch training terminated.

Constructing a Vec File Based on Positive Narrow Helix Samples

After creating our description files of positive and negative samples, the next step towards building our classifier is packing the positive samples into a vec file.

Building the vector file is done via the opencv_createsamples utility, which allows us to generate a large number of samples from a small number of input images by applying distortions and transformations to the positive samples.

We wrote shell scripts to automate a few of the OpenCV command line tools. The shell script for createsamples is below.
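Our actual script is not reproduced here; as a rough equivalent, the invocation it wraps can be built in Python like this (file names follow the traincascade step, and only standard opencv_createsamples flags are used):

```python
# Sketch: assemble the opencv_createsamples invocation our script wraps.
# -info is the positive description file, -vec the output vector file,
# -num the number of samples, -w/-h the sample window size.

def createsamples_cmd(info, vec, num, width=24, height=24):
    """Build the argument list for one opencv_createsamples run."""
    return ["opencv_createsamples",
            "-info", info, "-vec", vec,
            "-num", str(num), "-w", str(width), "-h", str(height)]

cmd = createsamples_cmd("narrow_positives.txt", "narrow_positives.vec", 20)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The 24x24 window matches the -w/-h values we later pass to opencv_traincascade; the two must agree or training fails.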

Part of our vec file generated

Creating Our Negative Samples + Negative Description File

When researching different ways to develop negative samples, we found that we obtain the best results for the classifier when a slight variant of the feature we wish to detect is embedded in an image that does not otherwise contain the object's characteristics.
Negative images can be anything, but the classifier is more accurate if they include a variant of a positive sample. Ideally, negative images would look exactly like the positive samples, except they would not contain the object we want to recognize.

Using Gimp, an image manipulation program, we placed images of ears in the foreground of a background/backdrop.

Examples of Negative Samples:

Negative Description File:
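The negative description file itself is simply a list of image paths, one per line, which is the format the -bg flag of opencv_traincascade expects. A sketch that generates such a file (the directory and file names here are illustrative):

```python
# Sketch: write the background/negative description file, one image
# path per line, for opencv_traincascade's -bg flag.

import os

def write_negative_description(neg_dir, out_file):
    """List every image in neg_dir, one path per line."""
    with open(out_file, "w") as f:
        for name in sorted(os.listdir(neg_dir)):
            if name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp")):
                f.write(os.path.join(neg_dir, name) + "\n")

# Example (directory name illustrative):
# write_negative_description("negatives", "narrow_negatives.txt")
```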

Friday, November 4, 2016

Creating our description file of positive narrow helix samples

After collecting positive training images of narrow helixes, we cropped our sample images of ears to just the portion that contained the helix. This cropping was done using an open source object marker tool written in Python.

The object marker allows us to specify the region of interest by drawing a bounding rectangle in each positive image; it then produces a text file describing the coordinates corresponding to the location of the helixes.
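Assuming the marker's output follows the standard opencv_createsamples "info" format (image path, number of objects, then x y w h per object), each line of the description file can be built like this:

```python
# Sketch: format one line of the positive description file in the
# opencv_createsamples "info" layout: path, object count, then one
# x y w h quadruple per marked rectangle. Names are illustrative.

def description_line(path, boxes):
    """boxes: list of (x, y, w, h) rectangles drawn on the image."""
    coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
    return f"{path} {len(boxes)} {coords}"

print(description_line("ears/ear01.jpg", [(48, 30, 60, 45)]))
# → ears/ear01.jpg 1 48 30 60 45
```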

This data will be used to construct our positive vector file to eventually train our classifier.

Thursday, November 3, 2016

Next Steps

The Next Steps for our project:
  • Work through Haar-training tutorials
  • Generate XML file for Helix haartraining
  • Verify + Test Helix Classifier by feeding dummy images
    • Test classifier against sample images like trucks and other vehicles to ensure matches aren't returned.

Introduction to Haar Cascades

Now that we're starting to build our extraction tool, we needed to gain more background information to acquire a better understanding of how Haar cascades work.

Background Info:

A Haar cascade is used to detect objects within images. This feature-based classifier was first introduced in the Viola-Jones algorithm, explained in the paper "Rapid Object Detection using a Boosted Cascade of Simple Features" by Paul Viola and Michael Jones. The detection method is based on machine learning: a cascade function is trained from negative and positive images, and once trained it can be used to detect the desired object within sample images.

Viola Jones Algorithm

The Viola-Jones detection algorithm depends on "Haar features" to detect the presence of a desired object in a sample, and on the "integral image", a representation of the original image that allows a detector to evaluate features quickly. Only a few operations are performed per pixel to build it; once computed, any Haar feature can be evaluated in constant time regardless of its position or scale in the image. "AdaBoost" is another vital part of the algorithm and is used for feature selection: it increases the speed of classification by excluding irrelevant features and focusing on a small subset of Haar-like features. Cascading, as previously mentioned, is one of the algorithm's major contributions to object detection. It increases the speed of the classifier by focusing on the critical portions of the image: non-promising regions of a sample are disregarded, and increasingly complex processing is applied only once a region of interest survives the earlier stages.
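The constant-time property comes from the summed-area table: after one pass over the image, the sum of any rectangle takes just four lookups. A small NumPy sketch (using the same zero-padded layout as cv2.integral):

```python
# Sketch: integral image (summed-area table) and O(1) rectangle sums,
# the trick that lets Haar features be evaluated at any position/scale.

import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[0:y, 0:x], with a leading zero row/column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle at (x, y) using only 4 table lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # sum of [[5, 6], [9, 10]] → 30
```

A Haar feature is then just a difference of such rectangle sums (e.g. a dark strip minus the light strips beside it), so its cost never depends on the rectangle's size.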

Explored Sources:
"Rapid Object Detection using a Boosted Cascade of Simple Features"
Face Detection using Haar Cascades

Wednesday, November 2, 2016

Helix Distinction: Wide vs. Narrow

The helix, located in the upper portion of the ear, consists of cartilage and resembles a y-shaped curve (see diagram of ear below).

For feature extraction, to analyze the helix portion of the ear we created two categories of helix: wide and narrow. We distinguish between the two categories by looking at the amount of cartilage contained in the sample. Sample images where the helix appears to have a lot of cartilage are considered wide, whereas helixes that are small with a very defined outer rim are considered narrow in our classifier.