Terrain Classification
This page includes:
Terrain classification is very useful for autonomous mobile robots in
real-world environments for path planning, target detection, and as a
pre-processing step for other perceptual tasks. We were provided with a
large dataset acquired by a robot equipped with a SICK2 laser scanner.
The robot drove around a campus environment and acquired around 35 million
scan readings. Each reading was a point in 3D space, represented by its
coordinates in an absolute frame of reference. Thus, the only available
information is the location of points, which was fairly noisy because of
localization errors.
Our task is to classify the laser range points into four
classes:
ground, building,
tree, and
shrubbery. Since classifying ground
points is trivial given their absolute z-coordinate, we classify them
deterministically by thresholding the z coordinate at a value close to
0. After we do that, we are left with approximately 20 million non-ground
points. Each point is represented simply as a location in an absolute 3D
coordinate system. The features we use require pre-processing to infer
properties of the local neighborhood of a point, such as how planar the
neighborhood is, or how much of the neighbors are close to the ground. The
features we use are invariant to rotation in the x-y plane, as well as the
density of the range scan, since scans tend to be sparser in regions
farther from the robot.
Our first type of feature is based on the principal plane around it. For each
point we sample 100 points in a cube of radius 0.5 meters. We run PCA on
these points to get the plane of maximum variance (spanned by the first two
principal components). We then partition the cube into 3x3x3 bins around
the point, oriented with respect to the principal plane, and compute the
percentage of points lying in the various sub-cubes. We use a number of
features derived from the cube such as the percentage of points in the
central column, the outside corners, the central plane, etc. These features
capture the local distribution well and are especially useful in finding
planes. Our second type of feature is based on a column around each
point. We take a cylinder of radius 0.25 meters, which extends
vertically to include all the points in a "column". We
then compute what percentage of the points lie in various segments of this
vertical column (e.g., between 2m and 2.5m). Finally, we also use an
indicator feature of whether or not a point lies within 2m of the
ground. This feature is especially useful in classifying shrubbery.
For training we select roughly 30 thousand points that represent the
classes well: a segment of a wall, a tree, some bushes.
We considered three different models: SVM, Voted-SVM and AMNs.
All methods use the same set of features, augmented with a quadratic
kernel.
The first model is a multi-class SVM with a quadratic kernel over the
above features. This model (right panel) achieves reasonable performance
in many places, but fails to enforce local consistency of the
classification predictions. For example arches on buildings and other
less planar regions are consistently confused for trees, even though they
are surrounded entirely by buildings.
We improved upon the SVM by smoothing its predictions using voting. For
each point we took its local neighborhood (we varied the radius to get the
best possible results) and assigned the point the label of the majority of
its 100 neighbors. The Voted-SVM model (middle panel) performs
slightly better than SVM: for example, it smooths out trees and
some parts of the buildings. Yet it still fails in areas like arches of
buildings where the SVM classifier has a locally consistent wrong
prediction.
The final model is a pairwise AMN over laser scan points, with associative
potentials to ensure smoothness.
Each point is connected to 6 of its neighbors: 3 of them are sampled
randomly from the local neighborhood in a sphere of radius 0.5m, and
the other 3 are sampled at random from the vertical cylinder column of
radius 0.25m. It is important to ensure vertical consistency since
the {\sf SVM} classifier is wrong in areas that are higher off
the ground (due to the decrease in point density) or because objects
tend to look different as we vary their z-coordinate (for example, tree
trunks and tree crowns look different). While we experimented with a
variety of edge features including various distances between points,
we found that even using only a constant feature performs well.
We trained the AMN model using CPLEX to solve the quadratic program; the
training took about an hour on a Pentium 3 desktop. For inference, we used
min-cut combined with the alpha-expansion algorithm of Boykov et al.
described above. We split up the dataset into 16 square
(in the xy-plane) regions, the largest of which contains around 3.5
million points. The implementation is largely dominated by I/O time, with
the actual min-cut taking less than a minute even for the largest segment.
We can see that the predictions of the AMN (left panel) are much smoother:
for example building arches and tree trunks are predicted correctly. We
hand-labeled around 180 thousand points of the test set and computed
accuracies of the predictions (excluding ground, which was classified by
pre-processing). The differences are dramatic: SVM: 68%,
Voted-SVM: 73% and AMN: 93%.
This fly-through movie
shows a part of the dataset labeled by the AMN model.
Below are the enlarged images of the results shown in the paper:
Below are the images of the labeled portion of the test set:
| Labeled test image (180K points) |
|
| SVM |
|
| Voted-SVM |
 |
| AMN |
 |