Image Classification

Image classification is performed using a pre-trained model, NASNet Mobile 224, which we chose for its balance of size, performance, and accuracy. To get a basic understanding of how this works, you can read Image Classification using Deep Neural Networks.
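
For illustration, here is a minimal sketch of what inference with a frozen NASNet graph can look like using the TensorFlow Go bindings. The file name and the operation names (input, final_layer/predictions) are placeholders, not the exact names used in our code:

```go
// A minimal sketch of classification with the TensorFlow Go bindings.
// The model file name, tensor shape, and operation names are assumptions
// for illustration only.
package main

import (
	"fmt"
	"io/ioutil"
	"log"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	// Load the frozen NASNet Mobile graph from disk.
	model, err := ioutil.ReadFile("nasnet_mobile.pb")
	if err != nil {
		log.Fatal(err)
	}

	graph := tf.NewGraph()
	if err := graph.Import(model, ""); err != nil {
		log.Fatal(err)
	}

	session, err := tf.NewSession(graph, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// A single 224x224 RGB image, normalized to [0, 1] (batch of one).
	var pixels [1][224][224][3]float32
	input, err := tf.NewTensor(pixels)
	if err != nil {
		log.Fatal(err)
	}

	// Run the graph and fetch the class probabilities.
	output, err := session.Run(
		map[tf.Output]*tf.Tensor{
			graph.Operation("input").Output(0): input,
		},
		[]tf.Output{
			graph.Operation("final_layer/predictions").Output(0),
		},
		nil,
	)
	if err != nil {
		log.Fatal(err)
	}

	probabilities := output[0].Value().([][]float32)[0]
	fmt.Println("classes:", len(probabilities))
}
```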

In addition, we manually mapped the model's classes to the labels you see in our UI:

```yaml
cat:
  label: cat        # label shown in the UI
  threshold: 0.3    # minimum probability required to apply the label
  priority: 5       # rank relative to other matching labels
  categories:       # categories this label belongs to
    - animal

tabby cat:
  see: cat          # alias: use the rule defined for "cat"
```
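
To make the matching logic concrete, the following sketch resolves a model class against such rules, following see aliases and applying the probability threshold. The types and logic are simplified illustrations, not our actual implementation:

```go
// A simplified sketch of matching a classification result against the
// rules above. Types and resolution logic are illustrative assumptions.
package main

import "fmt"

// LabelRule mirrors one entry in rules.yml (assumed shape).
type LabelRule struct {
	Label      string
	See        string // alias: use the rule of another label instead
	Threshold  float32
	Priority   int
	Categories []string
}

var rules = map[string]LabelRule{
	"cat":       {Label: "cat", Threshold: 0.3, Priority: 5, Categories: []string{"animal"}},
	"tabby cat": {See: "cat"},
}

// resolve follows "see" aliases and applies the probability threshold.
func resolve(name string, probability float32) (LabelRule, bool) {
	rule, ok := rules[name]
	if !ok {
		return LabelRule{}, false // unknown classes are not indexed
	}
	if rule.See != "" {
		return resolve(rule.See, probability)
	}
	if probability < rule.Threshold {
		return LabelRule{}, false // too uncertain to show in the UI
	}
	return rule, true
}

func main() {
	if rule, ok := resolve("tabby cat", 0.42); ok {
		fmt.Println(rule.Label, rule.Categories) // cat [animal]
	}
}
```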

This was necessary because we could not find a taxonomy suitable for consumers (existing ones are mostly scientific) and needed a lot of control to fine-tune terms and their probability thresholds. The raw results were not useful to a typical user. Indexing too many words, categories, and alternatives also negatively affects performance and leads to noise.

It took us several months of testing until we were happy with the results, and there are still labels to improve.

Updating Labels

After editing or adding labels in rules.yml, run make generate in the main project directory to regenerate the native Go source from this file.
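
The generated file is plain Go data, so the rules are compiled into the binary and no YAML parsing is needed at runtime. As a rough sketch, reusing the LabelRule type from the sketch above (identifiers are illustrative, the actual generated source looks different):

```go
// Hypothetical excerpt of the generated source: rules.yml entries become
// static Go data. Identifiers are illustrative assumptions.
var rules = map[string]LabelRule{
	"cat":       {Label: "cat", Threshold: 0.3, Priority: 5, Categories: []string{"animal"}},
	"tabby cat": {See: "cat"},
}
```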

Pre-trained Models

See also: TensorFlow Hub

Source: https://github.com/tensorflow/models/blob/master/research/slim/README.md

Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, there are various pre-trained models available. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.

Note that the VGG and ResNet V1 parameters have been converted from their original Caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed using a single image crop. PhotoPrism uses three crops, except for square images.
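
The following sketch shows one way to implement such a multi-crop evaluation: take three square crops along the longer side of a non-square image and average the class probabilities. Crop placement and averaging are assumptions for illustration, not our exact strategy:

```go
// A rough sketch of multi-crop classification. Crop placement and
// averaging are illustrative assumptions.
package classify

import "image"

// squareCrops returns square crops at the start, center, and end of the
// longer dimension; square images need only a single crop.
func squareCrops(b image.Rectangle) []image.Rectangle {
	w, h := b.Dx(), b.Dy()
	if w == h {
		return []image.Rectangle{b}
	}
	size := h
	if w < h {
		size = w
	}
	var crops []image.Rectangle
	for _, f := range []float64{0, 0.5, 1} {
		x, y := b.Min.X, b.Min.Y
		if w > h {
			x += int(f * float64(w-size))
		} else {
			y += int(f * float64(h-size))
		}
		crops = append(crops, image.Rect(x, y, x+size, y+size))
	}
	return crops
}

// average combines the probability vectors of all crops into one.
func average(perCrop [][]float32) []float32 {
	mean := make([]float32, len(perCrop[0]))
	for _, p := range perCrop {
		for i, v := range p {
			mean[i] += v / float32(len(perCrop))
		}
	}
	return mean
}
```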

| Model | TF-Slim File | Checkpoint | Top-1 Accuracy | Top-5 Accuracy |
|-------|--------------|------------|----------------|----------------|
| Inception V1 | Code | inception_v1_2016_08_28.tar.gz | 69.8 | 89.6 |
| Inception V2 | Code | inception_v2_2016_08_28.tar.gz | 73.9 | 91.8 |
| Inception V3 | Code | inception_v3_2016_08_28.tar.gz | 78.0 | 93.9 |
| Inception V4 | Code | inception_v4_2016_09_09.tar.gz | 80.2 | 95.2 |
| Inception-ResNet-v2 | Code | inception_resnet_v2_2016_08_30.tar.gz | 80.4 | 95.3 |
| ResNet V1 50 | Code | resnet_v1_50_2016_08_28.tar.gz | 75.2 | 92.2 |
| ResNet V1 101 | Code | resnet_v1_101_2016_08_28.tar.gz | 76.4 | 92.9 |
| ResNet V1 152 | Code | resnet_v1_152_2016_08_28.tar.gz | 76.8 | 93.2 |
| ResNet V2 50^ | Code | resnet_v2_50_2017_04_14.tar.gz | 75.6 | 92.8 |
| ResNet V2 101^ | Code | resnet_v2_101_2017_04_14.tar.gz | 77.0 | 93.7 |
| ResNet V2 152^ | Code | resnet_v2_152_2017_04_14.tar.gz | 77.8 | 94.1 |
| ResNet V2 200 | Code | TBA | 79.9* | 95.2* |
| VGG 16 | Code | vgg_16_2016_08_28.tar.gz | 71.5 | 89.8 |
| VGG 19 | Code | vgg_19_2016_08_28.tar.gz | 71.1 | 89.8 |
| MobileNet_v1_1.0_224 | Code | mobilenet_v1_1.0_224.tgz | 70.9 | 89.9 |
| MobileNet_v1_0.50_160 | Code | mobilenet_v1_0.50_160.tgz | 59.1 | 81.9 |
| MobileNet_v1_0.25_128 | Code | mobilenet_v1_0.25_128.tgz | 41.5 | 66.3 |
| MobileNet_v2_1.4_224^* | Code | mobilenet_v2_1.4_224.tgz | 74.9 | 92.5 |
| MobileNet_v2_1.0_224^* | Code | mobilenet_v2_1.0_224.tgz | 71.9 | 91.0 |
| NASNet-A_Mobile_224# | Code | nasnet-a_mobile_04_10_2017.tar.gz | 74.0 | 91.6 |
| NASNet-A_Large_331# | Code | nasnet-a_large_04_10_2017.tar.gz | 82.7 | 96.2 |
| PNASNet-5_Large_331 | Code | pnasnet-5_large_2017_12_13.tar.gz | 82.9 | 96.2 |
| PNASNet-5_Mobile_224 | Code | pnasnet-5_mobile_2017_12_13.tar.gz | 74.2 | 91.9 |

^ ResNet V2 models use Inception pre-processing and an input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when evaluating with eval_image_classifier.py). Performance numbers for ResNet V2 models are reported on the ImageNet validation set.

(#) More details about the NASNet architectures are available in this README.

All 16 float MobileNet V1 models reported in the MobileNet paper and all 16 quantized TensorFlow Lite-compatible MobileNet V1 models can be found here.

(^*) More details on MobileNetV2 models can be found here.

(*): Results quoted from the paper.

Here is an example of how to download the Inception V3 checkpoint:

```bash
$ CHECKPOINT_DIR=/tmp/checkpoints
$ mkdir ${CHECKPOINT_DIR}
$ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
$ tar -xvf inception_v3_2016_08_28.tar.gz
$ mv inception_v3.ckpt ${CHECKPOINT_DIR}
$ rm inception_v3_2016_08_28.tar.gz
```

Landmark Detection

DELF (DEep Local Features), TensorFlow implementation: https://github.com/tensorflow/models/tree/master/research/delf

Example query image: https://gitcdn.xyz/cdn/Tony607/blog_statics/ce9c3391932e24655b78e27a54543f28f11f3af0/images/landmark/query.jpg

Types of Neural Networks

For an overview chart of common architectures, see The Neural Network Zoo: http://www.asimovinstitute.org/neural-network-zoo/

External Resources