I woke up to 2021 and there was this burning question in my mind: when two PCs $A$ and $B$ do intersect? And if they do intersect, how does it happen? And how to compute the intersection properties in $\mathcal{O}(|A|+|B|)$ manner? As usual, answers are many depending on the data and the application. There are simple answers like checking if bounding boxes or convex hulls intersect
(actually computing the convex hull is a $\mathcal{O}(|A|\log(|A|)+...)$ effort), but for many efficient techniques one needs rounded sets. A set $A$ can be rounded to $\{A\}_\epsilon$ by:
\[ \{A\}_\epsilon= _{df}\, \{\text{round}(a/\epsilon)\epsilon\,|\, a\in A\} \]
Rounding allows local census to be made. How many points in the neighborhood are from a set $A$ and how many from the set $B$? A natural visualization shows the relative sizes of the sets at the same time with the overlap distribution. Below there is another application of rounded sets. A PC ($A$, black dots) sampling to a nearly Poisson disk distributed subset (red circles). Red crosses represent $\{A\}_\epsilon$. A formula for the selected subset $C \subseteq A$ (depicted by red circles) is:
\[ C= \{NN(p,A)\,|\, p\in \{A\}_\epsilon\} \],
or, allowing set arguments for nearest neighbor operator $NN(.,.)$: $C=NN(\{A\}_\epsilon, A)$.
 |
| A synthetic test set $A$ and a subset selection $NN(\{A\}_\epsilon, A)$ by rounding. |
The distance distributions $l_A=\text{mean}_{p\in A,q\in N(p)} \|p-q\|$ is the same as the distribution of the edge lengths in Delaunay triangularization. The black curve is the original and the red one of the subsample. Theoretical means $l_0$ shown with dashed lines assume the optimal hexagonal packing. Basically, inside each equilateral triangle of hexagonal lattice there are 3/6 points and the point density $\rho=(3/6)/(l_0^2 \sqrt{3}/2\times 1/2)$ (number of points per area), from where:
\[ l_0= \sqrt{\frac{2}{\sqrt{3}\rho}}\]
 |
| An infinite hexagonal packing has the smallest possible mean distance $l_0$. |
The ratio of the observed $l_A$ and the theoretical $l_{0A}$ is called the packing ratio $1\le l_A/l_{0A}$. One can modify the rounding operator $\{.\}_\epsilon$ to round to a hexagonal grid (or face-centered cubic (FCC) in 3D), which keeps the packing ratio more stable $l_A/l_{0A}\approx l_C/l_{0C}$, although this ratio is truly a scale-dependent thing for most of the natural PCs. The only difficulty is with the surface area. It can be defined e.g. by choosing an alpha shape radius $r_\alpha$ and define a triangularization and the surface area it covers. But there are other ways, including the counting of the rounded grid nodes: $\text{area}(A)\approx \|\{A\}_\epsilon\|\epsilon^2$. But this is again a scale dependent property.
So, returning to the original question, how much point sets $A$ and $B$ intersect? One can make a local poll within the rounded representations and produce an estimate $0\le\lambda\le 1$:
\[\lambda= \frac{|\{A\}_\epsilon \cap \{B\}_\epsilon|}{|\{A\}_\epsilon \cup \{B\}_\epsilon|}\]
This is again a scale ($\epsilon$) dependent property. Let us have an experiment!
 |
| Overlap of two point clouds $A$ and $B$ analyzed by two different methods. The overlap based on point insidence is depicted in red line and the rounding method in blue in the left detail. |
The little plot on the left shows $\lambda$ by rounding in blue. The curve could be smoothed by choosing several grid alignments and averaging. Now we see a value $\lambda=0.86$ over a range of $\epsilon\in [2.0,7.5]$ m. Above that range, the estimate becomes unstable, because both PCs (blue and green) have rather restless perimeters.
The estimate at right is based on a poll over natural neighbors $N(p, A\cup B)\subset A\cup B$, where $A$ and $B$ are combined before the Delaunay triangularization. The local insidence ratio $\lambda'(p)$ is the ratio of neighbors $q\in N(p, A\cup B)$ belonging to the same point cloud as $p$:
\[\lambda'(p)=\frac{|N(p,A\cup B)\cap A|}{|N(p,A\cup B)|},\] in case of $p\in A$. In case of $p\in B$, the denominator has $N(p,A\cup B)\cap B$.
 |
| The insidence $\lambda'(p)$ with $p$ in the center of 6 points. |
One problem remains, what single value of total overlap $\lambda$ to choose by the insidence distribution? The mean $\text{mean}_{p\in A\cup B} \lambda'(p) = 0.58$. This means point sets are rather mixed: there are not many solo invaders with $\lambda'(p)<0.2$, and not many inland dwellers with $0.8 < \lambda'(p)$. An estimate based on local mixing needs a $\lambda_0'$ limit and counting only the truly 'mixed' cases:
\[ \lambda= \{p\in A\cup B\,|\, \lambda_0' \le \lambda'(p) \le 1-\lambda_0' \} \]
A choice with $\lambda_0'= 0.2$ gives $\lambda=0.74$. With two PCs, which are more solid by their outer boundary, these two measures coincide quite well.
There is also a possibility to remove largest triangles from the Delaunay triangularization. An easy trick is to remove triangles $t$ with an edge longer than a limit length $\eta$. The scale parameter $\epsilon$ makes the point identity more coarse, and $\eta$ ignores larger structures. Two control parameters is more messy but gives more control. Below a triangularization with $\eta= 10$ m is depicted.
 |
| Delaunay triangularization over $A\cup B$. |
There is also a question of how much our intuitive concept of point set overlapping is useful for each application. Anyways, overlap concepts based on the Delaunay triangularization are of complexity $\mathcal{O}(n\log n)$, and they were introduced here just for comparing with the rounding method. But for experimenting more, let's add one layer, a neighborhood voting, to the computations. But that could be in another posting.
Anyways, there is a saying from the ancient past: a meaningful geometric feature has a specific scale interval, where it is stable.