-
Notifications
You must be signed in to change notification settings - Fork 192
DBScan
For a summary of DBScan, see https://en.wikipedia.org/wiki/DBSCAN.
The DBScan implementation in GeoWave is incrementally evolving. As of 0.9.0, DBScan is implemented using Map-Reduce jobs. Ideally, the algorithm will be moved into a Spark implementation.
The approach chosen meets the following requirements.
- Geo-spatial (orthdromic) distance is part of the distance measurement. Two objects cannot be close if they are not Geo-spatial close.
- Distance measures can be furthered refined to include other attributes. Two objects can be Geo-spatial close and be far along other dimensions.
- The solution is scalable, The implementation is compatible with Hadoop 2.0 infrastructure.
- Support Points, Polygons, Lines and Line String
- Ideally, the solution is not restricted to Simple Features.
- Leverage GeoWave's components, where possible. This is intended to be a GeoWave solution.
The last requirement is intended to take advantage of GeoWave's components. There are a number of general purpose DBScan algorithms available. Some perform quite well. However, assuming geo-spatial distances, GeoWave can offer some short-cuts that present advantages.
Just prior to developing DBScan, support for subclass of Near-Neighbors class of algorithms was introduced. with the intent of reuse, the support infrastructure lay the ground work for DBScan. The specific near-neighbors problem set is summarized as follows.
For each item in the data set, find all other items in the same data set that are considered neighbors by a provided distance function.
The distance function assume geo-spatial proximity as one of the primary measures, meeting requirements 1 and 2. Incrementally applied, the infrastructure could be used to build a nearest neighbor graph, clusters, etc. An example use case is finding those items whose cardinality of the set of neighbors exceed some minimum bound.
The solution for near-neighbors
With the intent of reuse this ground work
At this time, The algorithm focused The basic approach of the algorithm is