Connected components labeling for giga-cell multi-categorical rasters

Resource information

Date of publication
December 2013
Resource Language
ISBN / Resource ID
AGRIS:US201600064190
Pages
24-30

Labeling connected components in an image or a raster of non-imagery data is a fundamental operation in pattern recognition and machine intelligence. Most of the effort devoted to designing efficient connected components labeling (CCL) algorithms has concentrated on binary images, where labeling is required for a computer to recognize objects. In contrast, in Geographical Information Science (GIS) a CCL algorithm is mostly applied to multi-categorical rasters, either to convert a raster to a shapefile or to characterize individual clumps statistically. Recently, it has become necessary to label connected components in very large, giga-cell, multi-categorical rasters, but existing CCL algorithms are too slow for this task. In this paper we present a modification of the popular two-scan CCL algorithm that enables labeling of giga-cell, multi-categorical rasters. Our approach applies a divide-and-conquer technique coupled with parallel processing to a standard two-scan algorithm. Specifically, we have developed a variant of the standard CCL algorithm implemented as r.clump in GRASS GIS. We have established optimal values for the data blocks (stemming from the divide-and-conquer technique) and the optimal number of computational threads (stemming from parallel processing) for the new algorithm, called r.clump3p. The performance of the new algorithm was tested on a series of rasters of up to 160 Mcells; for the largest test raster the speedup over the original algorithm is 74 times. Finally, we have applied the new algorithm to the National Land Cover Dataset 2006 raster, which has 1.6×10¹⁰ cells. Labeling this raster took 39 h on a two-processor, 16-core computer and resulted in 221,718,501 clumps. The estimated speedup over the original algorithm is 450 times. r.clump3p works within the GRASS environment and is available in the public domain.
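
To make the idea in the abstract concrete, here is a minimal sketch of two-scan labeling on a multi-categorical raster, split into horizontal blocks that are labeled independently and then merged along their seams. It is not the r.clump or r.clump3p code. The function name `label_clumps`, the `block_rows` parameter, the use of 4-connectivity, and the union-find bookkeeping are all assumptions made for illustration. The blocks are processed sequentially here; a parallel version would presumably give each block to its own thread with a private label range.

```python
def find(parent, x):
    """Return the root label of x, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Record that labels a and b belong to the same clump."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_clumps(raster, block_rows=2):
    """Label 4-connected clumps of equal category (hypothetical sketch)."""
    rows, cols = len(raster), len(raster[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 is unused so that real labels start at 1

    # Scan 1, block by block: assign provisional labels. The row above
    # the first row of a block is ignored, so blocks are independent.
    for top in range(0, rows, block_rows):
        for r in range(top, min(top + block_rows, rows)):
            for c in range(cols):
                cat = raster[r][c]
                left = labels[r][c - 1] if c > 0 and raster[r][c - 1] == cat else 0
                up = labels[r - 1][c] if r > top and raster[r - 1][c] == cat else 0
                if left and up:
                    union(parent, left, up)
                    labels[r][c] = left
                elif left or up:
                    labels[r][c] = left or up
                else:
                    parent.append(len(parent))  # new label is its own root
                    labels[r][c] = len(parent) - 1

    # Merge step: join clumps that touch across a seam between blocks.
    for top in range(block_rows, rows, block_rows):
        for c in range(cols):
            if raster[top][c] == raster[top - 1][c]:
                union(parent, labels[top][c], labels[top - 1][c])

    # Scan 2: replace provisional labels with consecutive final ids.
    final = {}
    for r in range(rows):
        for c in range(cols):
            root = find(parent, labels[r][c])
            labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels


raster = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
    [3, 1, 1],
]
for row in label_clumps(raster, block_rows=2):
    print(row)
```

In this toy raster the two cells of category 1 in the bottom row receive a different clump id from the three category-1 cells at the top left, because they share a category but are not connected. The category-2 clump crosses the seam between the two blocks and is joined by the merge step. Final ids are assigned in scan order, so the result does not depend on the block size chosen.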

Authors and Publishers

Author(s), editor(s), contributor(s)

Netzel, Pawel
Stepinski, Tomasz F.

Publisher(s)
Data Provider