crownsegmentr is an R package with a C++ implementation of the AMS3D
algorithm (Ferraz et. al, 2016) for tree crown
segmentation in airborne lidar data. For a general description of how
the tree segmentation works, see the documentation of the
segment_tree_crowns generic. Pseudo code of
the AMS3D algorithm is listed
below.
This is a basic example which shows you how to segment a normalized point cloud and visualize the result:
# Load a point cloud of some trees included in the lidR package
point_cloud <- lidR::readLAS(system.file(
"extdata/MixedConifer.laz",
package = "lidR"
))
# Segment a normalized point cloud
segmented_point_cloud <- crownsegmentr::segment_tree_crowns(
point_cloud,
crown_diameter_to_tree_height = 0.25,
crown_height_to_tree_height = 0.5
)
# Generate random crown colors
crown_colors <- lidR::pastel.colors(
n = length(unique(segmented_point_cloud@data[["crown_id"]]))
)
# Plot the segmented crown bodies
lidR::plot(
segmented_point_cloud,
color = "crown_id",
pal = crown_colors,
nbreaks = length(crown_colors),
size = 3,
axis = TRUE
)The package provides four functions: The central function
segment_tree_crowns(), the preprocessing
functions watershed_diameter_raster()
and li_diameter_raster(), and the
postprocessing function
remove_small_trees().
This is the central function of the package, which implements the AMS3D-Algorithm. It takes a point cloud and returns the same point cloud with an additional attribute called “crown_id” (or differently, depending on user specification). In addition, the user must specify the two arguments crown_diameter_to_tree_height and crown_length_to_tree_height, to determine the size of the search kernel depending on the estimated shapes of the trees. Those values can either be given as single numbers, or as raster, which allows to account for different tree shapes in different parts of the point cloud. They can also be set to 0, if non-zero values are given for crown_diameter_constant and crown_length_constant, respectively. For details see the documentation of the function, or the pseudo code of the algorithm.
These two preprocessing functions do the same thing, but in different
ways: They take a point cloud, and do a preliminary segmentation of tree
crowns with a quick algorithm (lidR::li2012() or lidR::watershed()).
Then, they return a raster of local crown_diameter_to_tree_height
ratios, based on the preliminary segmentation. This raster can then be
used as input for segment_tree_crowns. Usually, this improves the
segmentation accuracy of the AMS3D algorithm. However, in cases where
the preliminary segmentation performs poorly, it may be worse than
specifying reasonable fixed numbers for crown_diameter_to_tree_height.
The watershed variant requires the Bioconductor package
EBImage. If you have it available,
watershed_diameter_raster() is recommended. Otherwise,
li_diameter_raster() offers a good alternative. Both functions can
also be called from within segment_tree_crowns(), by specifying
crown_diameter_to_tree_height = “li2012” or
crown_diameter_to_tree_height = “watershed”. In these cases, the
functions will be called with default parameters. In order to vary
parameters, the function must be called manually.
The AMS3D-Algorithm often yields numerous smaller crown clusters in the
lower part of the point cloud, especially if the point cloud is thin in
its lower parts, or the constants for crown diameter and crown length
are small. To reduce such over-segmentation, you can post-process the
segmentation result with remove_small_trees(). Crown ID’s whose crowns
have diameters or heights below user-defined thresholds will be set to
NA.
The AMS3D algorithm works best for airborne laser scanning (ALS) point clouds with densities between 5 and 20 points per m². Typically, point densities higher than this do not improve the accuracy of the segmentation, while substantially increasing the computation time: The processing time increases roughly quadratically with the density of the input point cloud. For high-density point clouds, we recommend thinning the point cloud before segmentation, or using other algorithms.
The code base ofsegment_tree_crowns() is split into an R “front-end”
and a C++ “back-end”. The other three functions are less complex, and
implemented in one R file each.
The function segment_tree_crowns is an S4
generic , i.e. it can
handle point cloud data stored in different data types and behave
differently according to that type. More specifically, the generic
function chooses one out of several so called “methods” based on the
input data type. There are methods for
data.frames/data.tables,lidR::LASobjects, andlidR::LAScatalogs.
While the other functions (li_diameter_raster(),
watershed_diameter_raster() and remove_small_trees()) are also
implemented as S4 generics, they can only take point clouds of type
lidR::LAS.
The data type specific methods deal with specifics of their data type.
The actual segmentation is done by the internal function
segment_tree_crowns_core, which is
used by the data.frame/data.table and lidR::LAS methods. The
lidR::LAScatalog method internally calls the lidR::LAS method.
This function performs the segmentation by first calling the C++
back-end to calculate centroids and by then clustering the terminal
centroids with the DBSCAN algorithm (as implemented in the
dbscan::dbscan function).
It takes point cloud data in the tabular form of a data.frame or
data.table and returns a list with at most three elements. The first
element always contains a vector of crown IDs with one ID for each point
(i.e. row) in the input data. The second and third elements are optional
and contain coordinates of terminal and prior centroids together with
crown IDs and (row) indices of the points they belong to.
This method just calls the core function and binds the returned crown IDs to the input table. It returns this extended table and, if requested, also the terminal centroids or all centroids returned by the core function.
Similar to the data.frame/data.table method in that it extends the
input object with a crown ID attribute and, if requested, returns the
centroids as separate lidR::LAS objects. The centroid objects are
assigned the metadata of the input object.
For context: The lidR R
package offers a framework for
processing point clouds of large areas, possibly stored in multiple
files and referenced by so called
LAScatalogs.
LAScatalogs organize point clouds in adjacent chunks which are
processed individually. The chunks each get a buffer area around them so
that edge effects can be accounted for. In the case of individual tree
segmentation, edge effects would be that tree crowns are cut off at the
edge of a chunk when not using a buffer. Parallel processing of the
chunks is also supported.
The segment_tree_crowns method for lidR::LAScatalogs internally
defines a function which segments a chunk of a LAScatalog. This
function is then applied to all chunks of the LAScatalog provided by
the user. The “chunk function” first passes the chunk to the lidR::LAS
method. Afterwards, it excludes both tree crowns and unsegmented points
in the buffer area. Since crown IDs overlap across chunks (the IDs in
each chunk start at 1), the chunk function also calculates unique
replacements for the crown IDs based on the apices’ absolute
coordinates.
- validation functions for method arguments
- helper functions
extract_coordinate_valuesandcollect_scale_n_offset_of_LAScatalog_files - test suite for R functions/methods
The back-end is a small C++ library which implements the AMS3D algorithm. The core functionality can be found in:
namespace ams3d: Functionality for calculating a point’s terminal centroid with the AMS3D algorithm (header), andnamespace spatial: a facade to the Boost Geometry library which provides e.g. the spatial index used for finding points inside cylinders (header).
There is also some R interface code outside of any namespace called
ams3d_R_interface(header).
Note: The namespaces ams3d and spatial contain internal functions,
classes, etc. which should not be used in other namespaces. These
internal components are indicated by an underscore at the beginning of
their name.
This code contains the C++ functions that are callable from R. They are
not contained in any namespace, since this is a requirement of the
Rcpp package which does the
actual exposition to R. The functions are
calculate_centroids_normalized(source) for processing normalized point clouds,calculate_centroids_terraneous(source) for processing non-normalized point clouds, andcalculate_centroids_flexible(source) for processing both normalized and non-normalized while possibly also using rasters for the crown diameter and crown height to tree height parameters.
They are all doing basically the same thing, which is looping over
points they get from R and call the functionality exposed by ams3d and
spatial to calculate terminal centroids for these points.
In addition, there are also some helper functions in the
namespace ams3d_R_interface_util
(source).
The ams3d_R_interface is the only part of the C++ code which calls
R-specific functions (a.o. it manages a progress bar provided by the R
package progress). By
separating the core functionality from the R-specific C++ code it is
possible to use the core functionality with other C++ code when not
using R.
This namespace only exposes two functions (header):
calculate_terminal_centroidcalculate_all_centroids
They do exactly the same thing, i.e. calculate the centroids of a point,
except that the *_plus_centroids variant also returns the prior
centroids which were calculated during the process. Both functions are
overloaded three times:
- The most simple overload assumes a normalized point cloud with
ground height at zero and takes single numbers for the
crown_diameter_to_tree_heightandcrown_height_to_tree_heightarguments (source). - The second overload can deal with not normalized point clouds by reading ground heights from a raster (source).
- The third overload reads both ground height and cylinder dimension parameters from rasters (source).
There is also an internal _Kernel class
(source) that models the cylinder used to find
points in the neighborhood of a point or centroid. It also contains the
logic to calculate the weighted centroid of all points inside the
cylinder. A _Kernel object is instantiated with a point or centroid
and the crown-diameter-to-tree-height and crown-height-to-tree-height
parameters. It features only one public method:
calculate_centroid_in( point_cloud ). Cylinder dimensions are
calculated in the constructor and the centroid calculation logic is
implemented in a few private methods.
There is one more very small internal namespace in ams3d called
_math_functions that contains the gaussian and epanechnikov functions
used for weighing the points inside a cylinder during centroid
calculation.
This namespace mainly exposes functionality of and based on the C++
library
Boost Geometry
for dealing with point data. Most of this functionality consists of data
types (header) but there are also some
functions and one functor (header and
source). Additionally, there is some raster
functionality (header) and a few custom
iterators (header) for
inserting points into the spatial index provided by the Boost Geometry
library (an R*-tree).
The data types are:
coordinate_tfor coordinate values,distance_tfor distances,point_2d_tandpoint_3d_tfor 2D and 3D points,index_for_3d_points_tfor R*-tree index structures, andbox_tfor 3D boxes.
A few simple functions of boost::geometry are directly forwarded:
distance( geometry1, geometry2 )returns the distance between two geometric objects,get_x( point ),get_y( point ), andget_z( point )return the respective coordinate value of a point, andget_xy_point( point )returns a point’s xy-coordinates as a 2D point.
Exposed functions with own logic are
get_points_intersecting_vertical_cylinder( <cylinder dimensions and an index structure> )which searches an R*-tree index andweighted_mean_of( points, weights )which calculates a weighted average position of a collection of points.
In addition there is an internal functor class called
_within_xy_distance_functor whose objects are needed for queries to
the R*-tree index. Functors are function objects, i.e. objects with a
()-operator, making it possible to use such an object like a function.
For handling raster data in the C++ back-end, a small set of three raster classes was set up (header):
I_Rasteran interface class which defines methods that every raster object should have.Rastera normal raster class which implements the methods demanded byI_Raster.Single_value_pseudo_rasteralso implements the methods demanded byI_Rasterbut internally stores just one value. It can return this value faster than aRasterobject could if it was used for storing just one value.
The raster classes were set up like this to make it possible to pass
either a single value or an actual raster to the same function argument.
Without this it would be necessary to code every combination of function
parameters where the ground height and the
crown_diameter_to_tree_height and crown_height_to_tree_height
parameters can be either a single value or a raster of values. With the
I_Raster class it is possible to pass either
Single_value_pseudo_raster or Raster objects to the same function
parameter of type I_Raster.
The spatial index provided by the Boost Geometry library can be
constructed with a packing algorithm that results in faster queries. In
order to use this feature, the index constructor needs to be passed all
of the points it should use in the form of a begin and end iterator. In
order to skip points with non-finite coordinate values and points below
certain heights, custom iterators for std::vector< point_3d_t >
(header) are used by a few
public functions to set up spatial indices:
create_index_of_finite( points, min_height )skips points with non-finite coordinate values and points belowmin_heightcreate_index_of_above_ground( points, min_height_above_ground, ground_height_grid )same as above but does not assume a normalized point cloud and instead reads ground heights from a rastercreate_index_of_above_ground( points, min_height_above_ground_grid, ground_height_grid )same as above but reads minimum above-ground heights also from a raster
There may be a few syntax constructs in the C++ code which appear unfamiliar to R users. This section gives the rationale for some of these constructs. Most of it is based on information found at the website learncpp.com.
There are a few different ways to create and assign a value to objects in C++:
int foo = 0; // copy initializationint foo(0); // direct initializationint foo{0}; // list initializationstd::vector<int> foos{1, 2, 3}; // list initialization
According to
learncpp.com,
list initialization is the preferred option. However, as you can see in
the examples, it works a little bit differently for array-like objects
like e.g. std::vectors or Rcpp::Lists. So if you want to create a
vector with a certain size instead of with some elements, you need to
use direct initialization instead,
e.g. Rcpp::NumericVector foos( <size of the vector> ); for Rcpp
vectors.
Instances (i.e. objects) of a class are initialized by constructors. Constructors are basically functions without a return value which internally assign values to the properties of an object. According to learncpp.com, the most direct way to do these assignments is to use member initializer lists. The syntax of these initializer lists looks like this:
some_class (
<arguments to constructor>
):
property_1{ <initial value> },
property_2{ <initial value> } // ...and so on
{
<the actual body of the constructor with possibly more code>
}
# outer loop over the points in the point cloud
for each point in the point cloud:
find_mode_of( point )
# inner loop to find the mode of a single point
find_mode_of( point ):
# this assignment is needed in the loop below
current_centroid = point
# "move" towards the nearest mode by calculating a new centroid at the
# location of the previous centroid and repeating this until the
# centroids converge
do:
former_centroid = current_centroid
current_centroid = calculate_weighted_mean_of (
points_in_neighborhood_of( former_centroid )
)
while( distance_of( former_centroid, current_centroid ) > very_small
AND number_of_iterations < too_many )
# the terminal centroid is returned as approximation to the mode
return( current_centroid )
# the neighborhood of a point is defined by a crown-sized cylinder
points_in_neighborhood_of( point ):
return( points_in( vertical_cylinder_at( point ) ) )
# the cylinder's size is calculated using its above-ground height and the
# four main parameters to the algorithm
vertical_cylinder_at( point ):
cylinder = new cylinder (
height = above_ground_height_of( point ) * cl_2_th + cl_c
diameter = above_ground_height_of( point ) * cd_2_th + cd_c
)
return( upper_three_quarters_of( cylinder ) )
# The numbers "cl_2_th", "cd_2_th", "cl_c" and "cd_c" are parameters to the
# algorithm and stand for "crown length to tree height",
# "crown diameter to tree height", "crown length constant" and
# "crown diameter constant".
points_in( cylinder ):
<use a spatial index for finding the points in cylinder>
calculate_weithed_mean_of( points ):
return( weighted_average_position_of (
points,
weights_of( points )
) )
weights_of( points ):
return( for each point in points:
exp( -5 * relative_horizontal_distance_of_cylinder_center_to( point )^2 )
* ( 1 - relative_vertical_distance_of_cylinder_center_to( point )^2 )
)
# horizontal: gaussian profile
# vertical : epanechnikov profile
Ferraz, A.; Saatchi, S.; Mallet, C. & Meyer, V., (2016) “Lidar detection of individual tree size in tropical forests”, In: Remote Sensing of Environment, 183, 318-333, DOI: 10.1016/j.rse.2016.05.028 Ferraz, A.; Bretar, F.; Jaquemond, S.; Gonçalves, G.; Pereira, L.;, Tomé, M. & Soares, P., “3-D mapping of a multi-layered Mediterranean forest using ALS data”, In: Remote Sensing of Environment, 121, 210-223, DOI: 10.1016/j.rse.2012.01.020