The hypothesis

A big stumbling block with machine learning is data. You need a lot of data to train a system, but where do you get it? In our case, we need a large corpus of images of plants, labelled with the interesting parts (fruits, vegetables, branch structure).

One way of doing this would be to label the images manually, but human labelling is very expensive in terms of labour cost.

An alternative is to create an artificial data set that is as close to the real thing as possible, so the system can be trained accurately enough to work in the real world. We've seen an example of this approach in the use of a computer game (Grand Theft Auto) to train autonomous vehicle systems.

Our hypothesis is that we can use artificial, simulated data as a starting point for our learning algorithms and improve the simulation over time, until the difference between simulated and real data is small enough to jump the gap to a real-world system:

First experiment

For the first experiments we're using Blender to create the artificial environment, and Keras/TensorFlow as the ML system.

We created two sets of images - cubes and spheres:

Cubes

Spheres

This was a tiny set, just to get to know Keras/TensorFlow, and the experiment was based on this Keras tutorial.
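For context, the tutorial trains a small convolutional network as a binary classifier. Something along these lines (the input size and layer widths here are my assumptions, not the exact model):

```python
# Minimal binary classifier sketch (cube vs sphere), loosely following
# the Keras tutorial. Input size and layer widths are assumptions.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),  # binary output: cube vs sphere
])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```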

Predicting individual cubes and spheres was fine, but the results for detecting spheres/cubes in an image containing both were absolutely rubbish.

On the left is an image I created using cubes & spheres, and on the right you can see a heat map of the localisation result using a sliding window:

I created a system that detects empty space 👍🏼
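The sliding-window approach itself is straightforward: slide a fixed-size crop across the image, classify each crop, and collect the scores into a grid. A rough sketch, assuming a trained binary model and a 64-pixel window (window size and stride are assumptions, not the values I used):

```python
import numpy as np

def sliding_window_heatmap(image, model, win=64, stride=16):
    """Classify overlapping crops and return the scores as a 2D heat map.

    `image` is an HxWx3 array scaled to [0, 1].
    """
    rows = (image.shape[0] - win) // stride + 1
    cols = (image.shape[1] - win) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            crop = image[i * stride:i * stride + win,
                         j * stride:j * stride + win]
            # One score per window; higher means "object present".
            heat[i, j] = model.predict(crop[np.newaxis])[0, 0]
    return heat
```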

I played around with a couple of things, but it was basically hopeless.

Second experiment

The next step was to create a larger data set rather than relying on the Keras data-augmentation to inflate the size of the set. I wrote a Python/Blender script that generates a virtual “plant”, with a very simple branch structure and a single “tomato”:

The script orbits the camera around the object of interest and generates images for the training set, in two classes (a sketch of the orbit idea follows the examples below).

With the tomato

Without the tomato
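The orbit itself is only a few lines of Blender's Python API. A sketch of the idea (the object names, orbit radius, and output path are hypothetical, not the exact script):

```python
import math
import bpy

target = bpy.data.objects['Plant']    # hypothetical object name
camera = bpy.data.objects['Camera']
radius, height, steps = 5.0, 2.0, 36  # orbit parameters (assumed)

for i in range(steps):
    angle = 2 * math.pi * i / steps
    # Place the camera on a circle around the target...
    camera.location = (target.location.x + radius * math.cos(angle),
                       target.location.y + radius * math.sin(angle),
                       target.location.z + height)
    # ...and aim it at the target.
    direction = target.location - camera.location
    camera.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    # Render one training image per camera position.
    bpy.context.scene.render.filepath = '//renders/with_tomato_%03d.png' % i
    bpy.ops.render.render(write_still=True)
```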

Again, individual prediction results looked good, but in a composite image, using a sliding window for localisation, the results were again rubbish:

Success

Looking at the results, it seemed like detecting rotated trees could be a problem (all the trees in the training set are upright).

I simply added rotation_range=360 to the Keras image generator for the training set. This means each image will be rotated by a random amount (0-360 degrees).
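In code, the change is one argument on the generator (the other arguments shown are just for context and are assumed, not my exact setup):

```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=360,  # the fix: rotate each image by a random angle
)
train_generator = train_datagen.flow_from_directory(
    'data/train',        # hypothetical layout: with_tomato/ and without_tomato/
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary',
)
```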

And success:

I’m happy with that as a first stab at feature detection and sliding localisation using artificial data (Blender) and Keras/Tensorflow.

I’d love to hear your comments and suggestions below; this is a new field for me and I’m learning.

You can follow me on Twitter if you’re interested in updates.