In my previous post I demonstrated the first successful results from the sliding window object recognition code.

The next step was to use that system to build a 3D localisation prototype. The idea was to triangulate or project the 2D results onto a 3D space and use that to generate a probabilistic model of where the “tomatoes” are located.

2D description

I will illustrate the approach with a 2D example.

Let’s say we have a 2D Cartesian space where some objects of interest are located, and a camera:

In the 3D case, we have a probability for each window of the image; in the 2D case it becomes a probability along the line of the camera’s field of view (FOV).

That probability can be projected onto the 2D space, where each line in the field of view of the camera has an associated probability.

In other words, for each pixel in the field of view of the camera, a line can be drawn in the 2D space, where each point on that line has the same probability of having an object.

If we discretise the space into voxels (I’m saying “voxels” because we’re really dealing with 3D objects, even though the illustration is 2D), the probability that an object exists in each voxel can be determined by projecting the voxel back onto the camera’s probability line:

and repeating for all voxels. Then, if things work, we expect a result like this:

where the voxels that lie in the path of the camera’s view towards an object have correspondingly high probabilities.
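To make that projection step concrete, here is a minimal 2D sketch of the idea. The grid size, the simple angular camera model, and the `pixel_probs` input (one detector probability per image column) are all assumptions of mine, not the actual implementation:

```python
import numpy as np

GRID = 50      # number of cells along each axis of the 2D space
CELL = 0.1     # cell size in metres

def project_to_grid(cam_pos, cam_angle, fov, pixel_probs, max_range=5.0, n_steps=200):
    """Project per-pixel detection probabilities from one camera into a 2D grid.

    Each "pixel" corresponds to a ray through the field of view; every cell
    that the ray passes through is assigned that pixel's probability.
    """
    grid = np.zeros((GRID, GRID))
    # One ray per pixel, spread evenly across the field of view
    angles = cam_angle + np.linspace(-fov / 2, fov / 2, len(pixel_probs))
    for prob, angle in zip(pixel_probs, angles):
        direction = np.array([np.cos(angle), np.sin(angle)])
        # Step along the ray and mark every cell it crosses
        for t in np.linspace(0.0, max_range, n_steps):
            ix, iy = np.floor((cam_pos + t * direction) / CELL).astype(int)
            if 0 <= ix < GRID and 0 <= iy < GRID:
                grid[ix, iy] = prob
    return grid
```

With a single camera this produces exactly the “streaks” of equal probability along each view line described above.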

Since a single viewpoint can’t distinguish where along the view line the object resides, multiple viewpoints must be used and the results combined (similar to radio triangulation) to attempt to get the following result:

If the camera always orbits around the outside, some details of what’s inside may be unknowable.
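Combining the per-camera grids can be as simple as averaging them, so only the cells that several viewpoints agree on keep a high combined probability. This is just one possible combination rule (my assumption); multiplying the grids instead would act more like an AND and suppress single-view false positives more aggressively.

```python
import numpy as np

def combine_views(grids):
    """Average a list of per-camera probability grids into a single grid."""
    return np.mean(grids, axis=0)
```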

3D results

This is what I did for the 3D case, using the previous results:

  • Use N random camera locations
  • For each camera, use the 2D probability result to determine the probability for each voxel
  • Combine the results into a single probability for each voxel (a rough sketch of this loop is shown below the list)
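Here is a rough sketch of what that loop could look like in 3D. I’m assuming a plain pinhole camera model, a per-pixel probability image from the detector, and averaging as the combination rule; the actual code may differ on all three points.

```python
import numpy as np

GRID = 40        # voxels along each axis
EXTENT = 2.0     # the grid covers [-1, 1] m on each axis

def voxel_centres():
    """World coordinates of every voxel centre, shape (GRID**3, 3)."""
    axis = np.linspace(-EXTENT / 2, EXTENT / 2, GRID)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

def voxel_probs_for_camera(cam_pos, cam_rot, prob_image, focal, centre):
    """For every voxel, look up the detector probability of the pixel it
    projects onto in this camera (simple pinhole model)."""
    pts = (voxel_centres() - cam_pos) @ cam_rot.T     # world -> camera frame
    z = pts[:, 2]
    z_safe = np.where(z > 1e-6, z, np.inf)            # ignore voxels behind the camera
    u = focal * pts[:, 0] / z_safe + centre[0]
    v = focal * pts[:, 1] / z_safe + centre[1]
    h, w = prob_image.shape
    visible = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    probs = np.zeros(len(z))
    probs[visible] = prob_image[v[visible].astype(int), u[visible].astype(int)]
    return probs.reshape(GRID, GRID, GRID)

def combined_voxel_probs(cameras):
    """Average the per-voxel probabilities over all camera poses.

    `cameras` is a list of (position, rotation, probability image,
    focal length, principal point) tuples, one per random camera pose.
    """
    return np.mean([voxel_probs_for_camera(*cam) for cam in cameras], axis=0)
```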

Here is the resulting voxel probability for a single camera (the “plant” under test is also shown):

And here is the result of combining 20 camera positions:

The same result visualised with isosurfaces using marching cubes:

and showing the plant:
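For reference, one way to do the isosurface extraction and rendering is with scikit-image’s marching cubes and matplotlib (what I show here is just a sketch; the threshold level is a placeholder that would need tuning against the real probabilities):

```python
from skimage import measure
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

def plot_isosurface(voxel_probs, level=0.5):
    """Extract and draw an isosurface of the combined voxel probabilities."""
    verts, faces, _normals, _values = measure.marching_cubes(voxel_probs, level=level)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.add_collection3d(Poly3DCollection(verts[faces], alpha=0.4))
    ax.set_xlim(0, voxel_probs.shape[0])
    ax.set_ylim(0, voxel_probs.shape[1])
    ax.set_zlim(0, voxel_probs.shape[2])
    plt.show()
```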

Next steps

You might notice the isosurfaces are “flat” in the XY plane. This is because the camera target vector is constrained to the XY plane in these experiments; the code should be extended to also support looking “down” onto the plant or “up” from beneath.

However, the robotic arm I have ordered would need an additional tilt mechanism to point the camera up/down, and the math is easier with the current constraint, so I’m not sure when that will happen.
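For what it’s worth, the change itself is mostly a matter of placing the camera on a sphere around the plant rather than a circle, so the target vector gains a vertical component. A minimal sketch (the angle names and parameters are mine):

```python
import numpy as np

def camera_on_sphere(target, radius, azimuth, elevation):
    """Camera position on a sphere around `target`, looking at `target`.

    With elevation fixed at 0 this reduces to the current behaviour:
    the camera orbits in the XY plane.
    """
    offset = radius * np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    position = target + offset
    direction = (target - position) / np.linalg.norm(target - position)
    return position, direction
```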

What I really want to do first is prepare for getting the hardware and reduce the gap between simulation and the real world, probably by 3D printing a real-world “plant” from the same model being used in silico.

I’d love to hear your comments/suggestions below; this is a new field for me and I’m learning.

You can follow me on Twitter if you’re interested in updates.