
Movidius Neural Compute Stick

Intel makes a device that can be plugged into a Raspberry Pi 3 (among other supported boards) to run neural networks efficiently. The device is called the Neural Compute Stick (NCS) and attaches over a USB port. Intel provides a toolchain that can be used to port TensorFlow and Caffe models to the format the NCS supports.

Choosing a network

We wanted to port our own custom-trained model to the NCS. The journey has been, well, insightful and interesting. The task at hand was object detection. We wanted to use TensorFlow because we were already familiar with the framework. We decided to use InceptionV3 as our base network and retrain it.

Training the network

There are various ways to retrain a model to bias it towards the objects you want to detect. We used the retrain.py script provided by TensorFlow, which uses TensorFlow Hub to simplify the task.

The script takes InceptionV3 and adds a few layers, such as a Placeholder and a softmax, along with some variables. It then retrains the model. You can read the comments in the script for the details of the training phases (it creates bottleneck files, has options to distort images, and so on).

Porting InceptionV3

To port this retrained model, we need to use the NCSDK, which provides a compiler, a checker and a few other tools.

The retraining script outputs various files: checkpoints, labels, graph, weights etc. We passed the final protobuf file (.pb extension) to the NCSDK's compiler:

mvNCCompile retrained_graph.pb -in=Placeholder -on=final_result -o retrained.graph

and…

Toolkit Error: Stage Details Not Supported: VarHandleOp.

Not very useful, but it was something. If you want to know more about this issue, follow our conversation on the ncsforum.

The first suspects were unsupported layers or variables that were added during the retraining phase. But it was not clear which layer(s) it could have been. While exploring the NCS forums, somebody suggested that specifying a different input layer worked for them. Taking the hint, we tried various input layers, and the following compilation worked:

mvNCCompile model.pb -in=input/BottleneckInputPlaceholder -on=final_result -o retrained.graph

The success was encouraging for a moment, but the limitation soon became obvious: the input tensor of the compiled graph had dimensions (1, 1, 2), which was a problem. As its name suggests, BottleneckInputPlaceholder appears to sit after the feature extractor, so a graph compiled from that point no longer takes an image as its input.

Still, the result made it clear that unsupported layers were what caused the NCS compiler to fail.

We modified the script to give the input layer a specific, concrete name and tried to compile again. This time, however, we had changed the NCSDK's code to enable debug messages, as suggested by a helpful person on the forums. Here is what the compiler reported:

Toolkit Error: Stage Details Not Supported: Top Not Found module_apply_default/hub_input/Sub

Well, that is more information than the previous error message (VarHandleOp not supported). It looked like something inserted by the TensorFlow Hub APIs was causing the problem.
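Both error messages name either an op type (VarHandleOp) or a node (module_apply_default/hub_input/Sub), so one way to narrow the search is to list what the .pb file actually contains. The following is a minimal sketch rather than anything the NCSDK provides: the helper names are hypothetical, and it assumes a TensorFlow version that exposes tf.compat.v1.GraphDef.

```python
from collections import Counter


def load_nodes(pb_path):
    """Return (name, op) pairs for every node in a GraphDef protobuf file."""
    # Imported lazily so the pure helpers below work without TensorFlow.
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [(node.name, node.op) for node in graph_def.node]


def op_histogram(nodes):
    """Count how often each op type appears in the graph."""
    return Counter(op for _, op in nodes)


def nodes_matching(nodes, suspects):
    """Names of nodes whose op type or name contains any suspect string."""
    return [name for name, op in nodes
            if any(s in op or s in name for s in suspects)]


# Example use (file name taken from the compile command above):
#   nodes = load_nodes("retrained_graph.pb")
#   for op, count in op_histogram(nodes).most_common():
#       print("{:5d}  {}".format(count, op))
#   print(nodes_matching(nodes, ["VarHandleOp", "hub_input"]))
```

Comparing the histogram of a graph that compiles with one that does not gives a short list of op types to investigate.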

We thought that perhaps some variables needed to be frozen and the graph transformed for deployment. So we used TensorFlow's graph_transforms tool:

transform_graph \
  --in_graph=retrained.pb \
  --out_graph=optimized_retrained.pb \
  --inputs='Placeholder' \
  --outputs='final_result' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms'

We took the optimized graph and compiled it. No luck, of course!
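The transform_graph invocation above depends on shell quoting, which is easy to get wrong because the transforms string itself contains double quotes. As a hedged alternative, the same arguments can be assembled in Python and handed over as a list, so no shell quoting is involved. The function names are hypothetical, and the sketch assumes the transform_graph binary has been built and is on PATH.

```python
import subprocess


def transform_graph_args(in_graph, out_graph, inputs, outputs, transforms):
    """Build the argument list for TensorFlow's transform_graph tool."""
    return [
        "transform_graph",
        "--in_graph={}".format(in_graph),
        "--out_graph={}".format(out_graph),
        "--inputs={}".format(inputs),
        "--outputs={}".format(outputs),
        # All transforms go into one whitespace-separated argument.
        "--transforms=" + " ".join(transforms),
    ]


def run_transform(args):
    """Run the tool; raises CalledProcessError if it exits non-zero."""
    subprocess.run(args, check=True)


# The transforms used for the 299x299 InceptionV3 graph above.
INCEPTION_TRANSFORMS = [
    'strip_unused_nodes(type=float, shape="1,299,299,3")',
    "remove_nodes(op=Identity, op=CheckNumerics)",
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
    "fold_old_batch_norms",
]

args = transform_graph_args(
    "retrained.pb", "optimized_retrained.pb",
    "Placeholder", "final_result", INCEPTION_TRANSFORMS)

# run_transform(args)  # uncomment where transform_graph is available
```

Keeping the transforms in a list also makes it easy to add or drop a single transform when experimenting.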

We should admit that, up to this point, we had not successfully ported any other TensorFlow model, not even the default ones, because we had not even tried. Our bad, so we did just that.

The Movidius guide says the NCSDK supports many pretrained TensorFlow models out of the box with a few commands (see the link for details). And sure enough, InceptionV3 was converted to an NCS-supported graph. It was not a retrained model, but it was a good sign.

Then we decided to follow the example given in the guide to port an MNIST model. We studied it and followed all the instructions. The resulting graph was indeed supported by the NCSDK, and we could compile it.

We decided that we needed to stop using TensorFlow Hub and write the process from scratch. Along the way we discovered TensorFlow for Poets, which does not use Hub. This saved us the effort of writing the training script from scratch.

You can study its retraining script, which closely resembles TensorFlow's retrain script but differs in some crucial aspects: it does not add the same layers, and it does not use TensorFlow Hub anywhere in the process. This gives us much better control over the training process.

As a cursory check, we tried to retrain on InceptionV3 with this new script and port it with NCSDK. You know what? It failed. Yes, it failed!

We do not give up just like that, oh no. What we did instead was retrain on MobileNet with the new script. This time, it worked. We could compile the graph file:

python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=4000 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/tmp \
  --output_graph=tf_files/retrained.pb \
  --output_labels=tf_files/tmp.txt \
  --image_dir=training_data/

transform_graph \
  --in_graph=retrained.pb \
  --out_graph=optimized_retrained.pb \
  --inputs='input' \
  --outputs='final_result' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,224,224,3")
    remove_nodes(op=Identity, op=CheckNumerics, op=PlaceholderWithDefault)
    fold_batch_norms
    fold_old_batch_norms'

We took the optimized graph and compiled it:

mvNCCompile optimized_retrained.pb -in=input -on=final_result -o retrained.graph

The final generated graph is usable on the Movidius stick.
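For completeness, here is a rough sketch of loading such a compiled graph onto the stick and reading back a result. It assumes the NCSDK v1 Python API (the mvnc module; NCSDK 2 renamed these calls) and an image that has already been preprocessed the same way as during training, which is not shown. The function names run_on_stick and top_k are hypothetical.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def run_on_stick(graph_path, image):
    """Run one inference on the first attached stick (NCSDK v1 API).

    `image` is assumed to be a preprocessed float16 NumPy array with the
    shape the network expects, e.g. (224, 224, 3) for the MobileNet above.
    """
    # Imported lazily: mvnc only exists where the NCSDK is installed.
    from mvnc import mvncapi as mvnc

    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Neural Compute Stick found")

    device = mvnc.Device(devices[0])
    device.OpenDevice()
    try:
        with open(graph_path, "rb") as f:
            graph = device.AllocateGraph(f.read())
        try:
            graph.LoadTensor(image, "user object")
            output, _ = graph.GetResult()
        finally:
            graph.DeallocateGraph()
    finally:
        device.CloseDevice()
    return output


# Example use (paths taken from the commands above):
#   labels = [line.strip() for line in open("tf_files/tmp.txt")]
#   output = run_on_stick("retrained.graph", image)
#   print(top_k(output, labels))
```

The try/finally blocks make sure the graph and the device are released even if the inference call fails.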

We also tried using MobileNet with the TensorFlow Hub script, but the resulting graph could not be ported with the NCSDK.

We used various versions of MobileNet as base networks and measured performance. Here are some of the data:

[Image: performance data for the MobileNet versions]
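How the measurements were taken is not described here. If you want to collect comparable numbers for your own models, a simple timing harness along these lines is one option. This is a generic sketch, not our measurement code, and measure_latency is a hypothetical helper.

```python
import statistics
import time


def measure_latency(infer, runs=50, warmup=5):
    """Return (mean, stdev) in milliseconds over `runs` calls to infer().

    `runs` must be at least 2 so a standard deviation can be computed.
    """
    # Warm-up calls are excluded so one-off setup cost does not skew results.
    for _ in range(warmup):
        infer()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)
```

Pass in a zero-argument callable that performs one inference, and run it once per MobileNet version on the same set of images.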

Concluding Remarks:

  • If you want to use the pretrained InceptionV3 on Movidius, you can do so without any fuss.
  • We could not port a custom-trained InceptionV3 model to run on the Movidius stick.
  • Retraining with TensorFlow Hub seems to make the models incompatible with the NCSDK.
  • MobileNet retrained without TensorFlow Hub is a good option for running models on Movidius.
  • The performance differences between the MobileNet versions can help you decide which model suits your needs.
  • We suggest you go through our discussion on the ncsforum for more technical detail on the process.