Image Classification¶
Image classification is performed using a pre-trained model, NASNet Mobile 224, that we have chosen because of its size, performance and accuracy. To get a basic understanding of how this works, you can read Image Classification using Deep Neural Networks.
In addition, we manually mapped the model's classes to the labels you see in our UI:
cat:
  label: cat
  threshold: 0.3
  priority: 5
  categories:
    - animal

tabby cat:
  see: cat
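How such rules might be applied to a raw model result can be sketched as follows. This is an illustration only, not PhotoPrism's generated Go code: it assumes that `see` redirects a model class to another rule, that `threshold` is the minimum probability required to show a label, and that classes without a rule are dropped. The names `Rule`, `resolve` and `to_ui_label` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One entry of the label rules (field names follow the YAML keys)."""
    label: str = ""
    threshold: float = 0.0
    priority: int = 0
    categories: list = field(default_factory=list)
    see: str = ""  # assumed meaning: use the rule with this name instead


# The two example rules from the text, written out by hand.
RULES = {
    "cat": Rule(label="cat", threshold=0.3, priority=5, categories=["animal"]),
    "tabby cat": Rule(see="cat"),
}


def resolve(rules, name, max_hops=10):
    """Follow `see` references until a rule with its own settings is found."""
    for _ in range(max_hops):
        rule = rules.get(name)
        if rule is None:
            return None
        if not rule.see:
            return rule
        name = rule.see
    return None  # too many redirects, most likely a cycle


def to_ui_label(rules, model_class, probability):
    """Map a raw model class to a UI label, or None if it should be hidden."""
    rule = resolve(rules, model_class)
    if rule is None or probability < rule.threshold:
        return None
    return {
        "label": rule.label,
        "priority": rule.priority,
        "categories": list(rule.categories),
        "probability": probability,
    }
```

With these assumptions, `to_ui_label(RULES, "tabby cat", 0.42)` yields the `cat` label in the `animal` category, while the same class at probability 0.2 is filtered out by the threshold.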
This was necessary because we couldn't find a taxonomy suitable for consumers (the existing ones are mainly scientific), and we needed a lot of control to fine-tune terms and their probability thresholds. The raw results were not useful to a typical user. Indexing too many words, categories and alternatives also hurts performance and adds noise.
It took us several months of testing until we were happy with the results and there are still labels to improve.
Updating labels¶
After editing or adding labels in rules.yml, run make generate in the main project directory to generate native Go source from this file.
Pre-trained Models¶
See also: TensorFlow Hub
Source: https://github.com/tensorflow/models/blob/master/research/slim/README.md
Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, there are various pre-trained models available. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.
Note that the VGG and ResNet V1 parameters have been converted from their original Caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed using a single image crop. PhotoPrism uses three crops, except for square images.
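The text states only that three crops are used for non-square images; it does not say where they are taken. A minimal sketch, assuming square crops at the start, center and end of the longer side, could look like this. The function name `crop_boxes` and the `(left, top, right, bottom)` box layout are hypothetical, and Python is used for illustration only.

```python
def crop_boxes(width, height):
    """Return square crop boxes as (left, top, right, bottom) tuples.

    Assumption: a square image gets a single crop, any other image gets
    three crops spread evenly along its longer side.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    side = min(width, height)
    if width == height:
        return [(0, 0, side, side)]

    if width > height:
        # Landscape: slide the square from left to right.
        offsets = [0, (width - side) // 2, width - side]
        return [(x, 0, x + side, side) for x in offsets]

    # Portrait: slide the square from top to bottom.
    offsets = [0, (height - side) // 2, height - side]
    return [(0, y, side, y + side) for y in offsets]
```

Each crop would then be resized to the model's input size and classified separately, and the per-crop results combined; how they are combined is not covered by this sketch.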
^ ResNet V2 models use Inception pre-processing and an input image size of 299 (use --preprocessing_name inception --eval_image_size 299 when using eval_image_classifier.py). Performance numbers for ResNet V2 models are reported on the ImageNet validation set.
(#) More information and details about the NASNet architectures are available at this README
All 16 float MobileNet V1 models reported in the MobileNet Paper and all 16 quantized TensorFlow Lite compatible MobileNet V1 models can be found here.
(^#) More details on MobileNetV2 models can be found here.
(*): Results quoted from the paper.
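To make the "Inception pre-processing" mentioned in the ResNet V2 note more concrete: at evaluation time, TF-Slim's Inception pre-processing keeps a central region of the image, resizes it to the input size, and rescales pixel values to the range [-1, 1]. The sketch below shows only the crop-box arithmetic and the rescaling in plain Python. It is a simplification with hypothetical function names, the default fraction of 0.875 is assumed here, and resizing is left out.

```python
def central_crop_box(width, height, fraction=0.875):
    """Return (left, top, right, bottom) of a centered crop.

    `fraction` is the share of each side that is kept.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    left = int((width - width * fraction) / 2)
    top = int((height - height * fraction) / 2)
    return (left, top, width - left, height - top)


def scale_pixels(values):
    """Map 8-bit channel values (0..255) to floats in [-1.0, 1.0]."""
    return [(v / 255.0 - 0.5) * 2.0 for v in values]
```

For example, a 400 x 400 image keeps its central 350 x 350 region, which would then be resized to 299 x 299 before the pixel values are rescaled.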
Here is an example of how to download the Inception V3 checkpoint:
$ CHECKPOINT_DIR=/tmp/checkpoints
$ # -p avoids an error if the directory already exists
$ mkdir -p "${CHECKPOINT_DIR}"
$ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
$ tar -xvf inception_v3_2016_08_28.tar.gz
$ mv inception_v3.ckpt "${CHECKPOINT_DIR}"
$ # remove the archive once the checkpoint has been extracted
$ rm inception_v3_2016_08_28.tar.gz
Landmark detection¶
DELF: DEep Local Features - https://github.com/tensorflow/models/tree/master/research/delf - TensorFlow implementation
Types of neural networks¶
Source: http://www.asimovinstitute.org/neural-network-zoo/
External Resources¶
- https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/ - An optimized method for pretraining self-supervised NLP systems
- zihangdai/xlnet - Generalized Autoregressive Pretraining for Language Understanding
- https://pjreddie.com/darknet/yolo/ - real time image detection
- https://pjreddie.com/darknet/imagenet/ - using Darknet to classify images
- ZanLabs/go-yolo - Golang binding for YOLO/Darknet recognition framework
- https://itnext.io/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe - Implementing YOLO v3 in Tensorflow (TF-Slim)
- chewxy/lingo - provides the data structures and algorithms required for natural language processing
- https://modelzoo.co/ - Discover open source deep learning code and pretrained models
- https://polarr.ai/ - Efficient and Immersive C.V. experiences on the edge
- gildasch/gildas-ai
- https://www.tensorflow.org/lite/guide/hosted_models
- https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
- https://www.wikidata.org/wiki/Wikidata:Database_download
- ropensci/wikitaxa
- https://datahub.io/collections/yago - YAGO3 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames
- Google AI Blog: Improving Inception and Image Classification in TensorFlow
- CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more
- Gildas Chabot - AI image search with Go & Tensorflow (slides)