Intrepid Monkey in the Point Cloud: When do point sets intersect? (part 1)

I woke up to 2021 and there was this burning question in my mind: when two PCs $A$ and $B$ do intersect? And if they do intersect, how does it happen? And how to compute the intersection properties in $\mathcal{O}(|A|+|B|)$ manner? As usual, answers are many depending on the data and the application. There are simple answers like checking if bounding boxes or convex hulls intersect

(actually computing the convex hull is a $\mathcal{O}(|A|\log(|A|)+...)$ effort), but for many efficient techniques one needs rounded sets. A set $A$ can be rounded to $\{A\}_\epsilon$ by:

\[ \{A\}_\epsilon= _{df}\, \{\text{round}(a/\epsilon)\epsilon\,|\, a\in A\} \]

Rounding allows local census to be made. How many points in the neighborhood are from a set $A$ and how many from the set $B$? A natural visualization shows the relative sizes of the sets at the same time with the overlap distribution. Below there is another application of rounded sets. A PC ($A$, black dots) sampling to a nearly Poisson disk distributed subset (red circles). Red crosses represent $\{A\}_\epsilon$. A formula for the selected subset $C \subseteq A$ (depicted by red circles) is:

\[ C= \{NN(p,A)\,|\, p\in \{A\}_\epsilon\} \],

or, allowing set arguments for nearest neighbor operator $NN(.,.)$: $C=NN(\{A\}_\epsilon, A)$.

A synthetic test set $A$ and a subset selection $NN(\{A\}_\epsilon, A)$ by rounding.

The distance distributions $l_A=\text{mean}_{p\in A,q\in N(p)} \|p-q\|$ is the same as the distribution of the edge lengths in Delaunay triangularization. The black curve is the original and the red one of the subsample. Theoretical means $l_0$ shown with dashed lines assume the optimal hexagonal packing. Basically, inside each equilateral triangle of hexagonal lattice there are 3/6 points and the point density $\rho=(3/6)/(l_0^2 \sqrt{3}/2\times 1/2)$ (number of points per area), from where:

\[ l_0= \sqrt{\frac{2}{\sqrt{3}\rho}}\]

An infinite hexagonal packing has the smallest possible mean distance $l_0$.

The ratio of the observed $l_A$ and the theoretical $l_{0A}$ is called the packing ratio $1\le l_A/l_{0A}$. One can modify the rounding operator $\{.\}_\epsilon$ to round to a hexagonal grid (or face-centered cubic (FCC) in 3D), which keeps the packing ratio more stable $l_A/l_{0A}\approx l_C/l_{0C}$, although this ratio is truly a scale-dependent thing for most of the natural PCs. The only difficulty is with the surface area. It can be defined e.g. by choosing an alpha shape radius $r_\alpha$ and define a triangularization and the surface area it covers. But there are other ways, including the counting of the rounded grid nodes: $\text{area}(A)\approx \|\{A\}_\epsilon\|\epsilon^2$. But this is again a scale dependent property.

So, returning to the original question, how much point sets $A$ and $B$ intersect? One can make a local poll within the rounded representations and produce an estimate $0\le\lambda\le 1$:

\[\lambda= \frac{|\{A\}_\epsilon \cap \{B\}_\epsilon|}{|\{A\}_\epsilon \cup \{B\}_\epsilon|}\]

This is again a scale ($\epsilon$) dependent property. Let us have an experiment!

Overlap of two point clouds $A$ and $B$ analyzed by two different methods. The overlap based on point insidence is depicted in red line and the rounding method in blue in the left detail.

The little plot on the left shows $\lambda$ by rounding in blue. The curve could be smoothed by choosing several grid alignments and averaging. Now we see a value $\lambda=0.86$ over a range of $\epsilon\in [2.0,7.5]$ m. Above that range, the estimate becomes unstable, because both PCs (blue and green) have rather restless perimeters.

The estimate at right is based on a poll over natural neighbors $N(p, A\cup B)\subset A\cup B$, where $A$ and $B$ are combined before the Delaunay triangularization. The local insidence ratio $\lambda'(p)$ is the ratio of neighbors $q\in N(p, A\cup B)$ belonging to the same point cloud as $p$:

\[\lambda'(p)=\frac{|N(p,A\cup B)\cap A|}{|N(p,A\cup B)|},\] in case of $p\in A$. In case of $p\in B$, the denominator has $N(p,A\cup B)\cap B$.

The insidence $\lambda'(p)$ with $p$ in the center of 6 points.

One problem remains, what single value of total overlap $\lambda$ to choose by the insidence distribution? The mean $\text{mean}_{p\in A\cup B} \lambda'(p) = 0.58$. This means point sets are rather mixed: there are not many solo invaders with $\lambda'(p)<0.2$, and not many inland dwellers with $0.8 < \lambda'(p)$. An estimate based on local mixing needs a $\lambda_0'$ limit and counting only the truly 'mixed' cases:

\[ \lambda= \{p\in A\cup B\,|\, \lambda_0' \le \lambda'(p) \le 1-\lambda_0' \} \]

A choice with $\lambda_0'= 0.2$ gives $\lambda=0.74$. With two PCs, which are more solid by their outer boundary, these two measures coincide quite well.

There is also a possibility to remove largest triangles from the Delaunay triangularization. An easy trick is to remove triangles $t$ with an edge longer than a limit length $\eta$. The scale parameter $\epsilon$ makes the point identity more coarse, and $\eta$ ignores larger structures. Two control parameters is more messy but gives more control. Below a triangularization with $\eta= 10$ m is depicted.

Delaunay triangularization over $A\cup B$.

There is also a question of how much our intuitive concept of point set overlapping is useful for each application. Anyways, overlap concepts based on the Delaunay triangularization are of complexity $\mathcal{O}(n\log n)$, and they were introduced here just for comparing with the rounding method. But for experimenting more, let's add one layer, a neighborhood voting, to the computations. But that could be in another posting.

Anyways, there is a saying from the ancient past: a meaningful geometric feature has a specific scale interval, where it is stable.

Intrepid Monkey in the Point Cloud

Hexagonal lattice rotated

Monday, January 4, 2021

When do point sets intersect? (part 1)

No comments:

Post a Comment