Intrepid Monkey in the Point Cloud: January 2021

Monday, January 11, 2021

Lattice lambada

For many scale-dependent features, a space partition of some sorts is needed for finding a possible scale interval where the feature is meaningful and approximately constant. The overlap ratio $\lambda$ of two point sets was defined by: \[\lambda_\epsilon(A,B)= \frac{|\{A\}_\epsilon \cap \{B\}_\epsilon|}{|\{A\}_\epsilon \cup \{B\}_\epsilon|}\]

And the earlier post mentioned, that it can be improved by a shift: \[\lambda:= \left(\lambda_\epsilon(A,B)+ \lambda_\epsilon(A+\mathbf{1}_A\epsilon/2, B+\mathbf{1}_B\epsilon/2)\right)/2, \] where $\mathbf{1}_A$ is a $|A|\times d$ matrix filled with ones. The term $\mathbf{1}\,\epsilon/2$ shifts the rounding creating a different version of $\lambda$ calculation. Averaging from the two versions stabilizes the scale-dependent features.

Shifted rounding grid (red).

Rounded sets can be formed by hexagonal lattice arrangement, too. And the shifting trick can be utilized there, too. A hexagonal lattice has the following matrix $E=(n_1\,n_2\,n_3)$:

\[E= \begin{pmatrix} 1 & 1/2 & 1/2 \\ 0 & \sqrt{3}/2 & 1/(2\sqrt{3}) \\ 0 & 0 & \sqrt{2/3} \end{pmatrix}.\]

Now, any given point $p\in \mathbb{R}^3$ has a unique projection \[\nu^*=\text{arg min}_{\nu\in\mathbb{Z}^3} \|p-E\nu\|\] to the lattice indices $\nu$ and this can be found by rounding. So, for a scale $\epsilon$: \[\{P\}_\epsilon=\{E\,\text{round}(E^{-1}p/\epsilon)\epsilon\,|\, p\in P\}.\]

The 2D case follows by removing the last row and column from the matrix $E$. The orthogonal rounding had $c= \mathbf{1}\epsilon/2$ as the crucial shift for the shifted rounding. Now, we use $c=\left(1/2,\,1/(2\sqrt{3}),\,1/(2\sqrt{6})\right)\epsilon$, where $c=\sum_i n_i/4$ is the center of an tetrahedron, when $n_i$ are the three vertices and origo is the fourth one. Happiness abounds, since the hexagonal rounding gives much better selection $C\subset A$ of sparse subsets $C=NN(\{A\}_\epsilon,A\}$, and the stable scale interval is usually wider with any scale-dependent feature.

Just as in a previous article, black dots are $A$, magenda crosses are $\{A\}_\epsilon$ and members of $C$ are depicted by red circles. One can see that the packing ratio of $C$ has gone from $l/l_0\approx 1.05$ to close to 1.

A synthetic test set $A$ and a subset selection $NN(\{A\}_\epsilon, A)$ by hexagonal rounding.

An example of the overlap factor $\lambda$ in the case of point sets $A$ and $B$ of the previous blog post had a block rounding (or cubic rounding) as the black curve in the Fig. below. Red line is the overlap estimate from the local insidence histogram. And blue is the hexagonal rounding by different scale factors $\epsilon$, and the dashed blue line is the shifted hexagonal rounding. As one can see, the hexagonal rounding is more stable on the scale interval 2...8 m than the cubic rounding. Surprisingly, the hexagonal rounding is more unstable above this scale reaching $\lambda=1$. The case of PCs A and B is actually a rather difficult one. Different exampels give much better and more unisono results.

Overlap estimates by hexagonal rounding, cubic rounding and shifted hexagonal rounding are being compared to original cubic rounding.

Wednesday, January 6, 2021

When do point sets intersect (part 2)

In our previous exploitation of the subject, we found a nice $\mathcal{O}(|A|+|B|)$ way of checking how much two point set (D2 or D3) intersect. We found that a simple rounding trick delivered most of the goodies needed. But a corresponding $\mathcal(n\log(n)), n=|A\cup B|$ method was void of a similar scale sensitivity analysis. How to add that? An answer is a modified Laplacian interpolation of local insidense ratios $\lambda'(p)$ shown in the picture below right. The original ratios are here. The color bar shows values of $0\le \lambda'(p)\le 1$. (Blue: a lonely stranger in a foreign land. Yellow: a citizen among citizens).

The adjacency matrix $\mathcal{A}$ has elements $a_{ij}\in \mathcal{A}$ in such a way, that $a_{ij}= 1/|N(p)|$ if and only if $p_j\in N(p_i)$ and $a_{ii}\equiv 0$. one can modify this a bit e.g. by using weighted inverse distances $a_{ij}= w_{ij}/\sum_{p_j\in N(p_i} w_{ij}$, where $w_{ij}= 1/\text{max}(\delta_0,\|p_i-p_j\|)$, where $\delta_0= 0.01$ m is the bumper margin for too close point pairs. Now, original insidence values can be modified by: \[ \bar{\lambda}':= A^u \bar{\lambda}' \]

where $\bar{\lambda}=\{\lambda'(p)\}_{p\in A\cup B}$ is an assembly vector. One can choose a power $u=0.23$ yielding a similar value $\lambda= 0.86$ as the rounding method. As long as weights on an individual row $a_i\in\mathcal{A}$ equal 1: $\sum_{j=1}^{|A|}a_{ij}=1$, the resulting smoothing $f:=\mathcal{A}^uf$ has a unit property, which means that a constant function $f(p)$ stays constant. This can be extended to full interpolation, but that topic may come up later, if ever.

Interestingly, the scale associated with $u=0.23$ is $\epsilon= 5.4$ m, which is within the rounding method stability area $2.2\le \epsilon \le 8$ m. There are several ways to estimate this associated scale, the handiest one uses an assumption of uniform distribution and the mean distance $l_0= 3.5$ m of the example PCs $A$ and $B$, but a more general method seeks for a stability area in the same way as was done with the rounding method. This is an interesting research question: How to choose $u$ so that the scale factor becomes controlled in a general PC (with relatively uniform spatial distribution)? And: are there meaningful interpolants $f'= \sum_{j=1}^m A^{u_j}f$ with a relatively short length 4 m so that the scale is well controlled?

But, above there is the original pointwise incidenses (left) and a result of adjacency matrix smoothing (right). If we used mere $1/|N(p)|$ weight and $u=1$, it would have been graph Laplacian interpolation.

Matrix power $u=2$ is special, because it keeps a sparse matrix $\mathcal{A}$ sparse. Below are the non-zero elements of the initial matrix, and the power $\mathcal{A}^2$.

An interesting detail with both the inverse length smoothing and Laplacian is that on each iteration, the original values disappear and the mean of the surrounding values substitute them. Since the mean natural neighborhood size $\bar{n}= \text{mean}_{p\in P}|N(p)| \approx 6$ (by Leonhard Euler, btw.!), approximately 1/7th of the information disappears while the node values (in this case $\lambda'(p)$'s) get smoother by substitution and while the overall value field progresses towards a harmonic function. And harmonic functions have a minimum of local information density of all smooth functions. So, the adjacency power interpolation is a fancy way of regularization, where the information content gets destroyed by a power constant $u$ chosen.

More about geometric information perhaps in the next year 2022!

Monday, January 4, 2021

When do point sets intersect? (part 1)

I woke up to 2021 and there was this burning question in my mind: when two PCs $A$ and $B$ do intersect? And if they do intersect, how does it happen? And how to compute the intersection properties in $\mathcal{O}(|A|+|B|)$ manner? As usual, answers are many depending on the data and the application. There are simple answers like checking if bounding boxes or convex hulls intersect

(actually computing the convex hull is a $\mathcal{O}(|A|\log(|A|)+...)$ effort), but for many efficient techniques one needs rounded sets. A set $A$ can be rounded to $\{A\}_\epsilon$ by:

\[ \{A\}_\epsilon= _{df}\, \{\text{round}(a/\epsilon)\epsilon\,|\, a\in A\} \]

Rounding allows local census to be made. How many points in the neighborhood are from a set $A$ and how many from the set $B$? A natural visualization shows the relative sizes of the sets at the same time with the overlap distribution. Below there is another application of rounded sets. A PC ($A$, black dots) sampling to a nearly Poisson disk distributed subset (red circles). Red crosses represent $\{A\}_\epsilon$. A formula for the selected subset $C \subseteq A$ (depicted by red circles) is:

\[ C= \{NN(p,A)\,|\, p\in \{A\}_\epsilon\} \],

or, allowing set arguments for nearest neighbor operator $NN(.,.)$: $C=NN(\{A\}_\epsilon, A)$.

A synthetic test set $A$ and a subset selection $NN(\{A\}_\epsilon, A)$ by rounding.

The distance distributions $l_A=\text{mean}_{p\in A,q\in N(p)} \|p-q\|$ is the same as the distribution of the edge lengths in Delaunay triangularization. The black curve is the original and the red one of the subsample. Theoretical means $l_0$ shown with dashed lines assume the optimal hexagonal packing. Basically, inside each equilateral triangle of hexagonal lattice there are 3/6 points and the point density $\rho=(3/6)/(l_0^2 \sqrt{3}/2\times 1/2)$ (number of points per area), from where:

\[ l_0= \sqrt{\frac{2}{\sqrt{3}\rho}}\]

An infinite hexagonal packing has the smallest possible mean distance $l_0$.

The ratio of the observed $l_A$ and the theoretical $l_{0A}$ is called the packing ratio $1\le l_A/l_{0A}$. One can modify the rounding operator $\{.\}_\epsilon$ to round to a hexagonal grid (or face-centered cubic (FCC) in 3D), which keeps the packing ratio more stable $l_A/l_{0A}\approx l_C/l_{0C}$, although this ratio is truly a scale-dependent thing for most of the natural PCs. The only difficulty is with the surface area. It can be defined e.g. by choosing an alpha shape radius $r_\alpha$ and define a triangularization and the surface area it covers. But there are other ways, including the counting of the rounded grid nodes: $\text{area}(A)\approx \|\{A\}_\epsilon\|\epsilon^2$. But this is again a scale dependent property.

So, returning to the original question, how much point sets $A$ and $B$ intersect? One can make a local poll within the rounded representations and produce an estimate $0\le\lambda\le 1$:

\[\lambda= \frac{|\{A\}_\epsilon \cap \{B\}_\epsilon|}{|\{A\}_\epsilon \cup \{B\}_\epsilon|}\]

This is again a scale ($\epsilon$) dependent property. Let us have an experiment!

Overlap of two point clouds $A$ and $B$ analyzed by two different methods. The overlap based on point insidence is depicted in red line and the rounding method in blue in the left detail.

The little plot on the left shows $\lambda$ by rounding in blue. The curve could be smoothed by choosing several grid alignments and averaging. Now we see a value $\lambda=0.86$ over a range of $\epsilon\in [2.0,7.5]$ m. Above that range, the estimate becomes unstable, because both PCs (blue and green) have rather restless perimeters.

The estimate at right is based on a poll over natural neighbors $N(p, A\cup B)\subset A\cup B$, where $A$ and $B$ are combined before the Delaunay triangularization. The local insidence ratio $\lambda'(p)$ is the ratio of neighbors $q\in N(p, A\cup B)$ belonging to the same point cloud as $p$:

\[\lambda'(p)=\frac{|N(p,A\cup B)\cap A|}{|N(p,A\cup B)|},\] in case of $p\in A$. In case of $p\in B$, the denominator has $N(p,A\cup B)\cap B$.

The insidence $\lambda'(p)$ with $p$ in the center of 6 points.

One problem remains, what single value of total overlap $\lambda$ to choose by the insidence distribution? The mean $\text{mean}_{p\in A\cup B} \lambda'(p) = 0.58$. This means point sets are rather mixed: there are not many solo invaders with $\lambda'(p)<0.2$, and not many inland dwellers with $0.8 < \lambda'(p)$. An estimate based on local mixing needs a $\lambda_0'$ limit and counting only the truly 'mixed' cases:

\[ \lambda= \{p\in A\cup B\,|\, \lambda_0' \le \lambda'(p) \le 1-\lambda_0' \} \]

A choice with $\lambda_0'= 0.2$ gives $\lambda=0.74$. With two PCs, which are more solid by their outer boundary, these two measures coincide quite well.

There is also a possibility to remove largest triangles from the Delaunay triangularization. An easy trick is to remove triangles $t$ with an edge longer than a limit length $\eta$. The scale parameter $\epsilon$ makes the point identity more coarse, and $\eta$ ignores larger structures. Two control parameters is more messy but gives more control. Below a triangularization with $\eta= 10$ m is depicted.

Delaunay triangularization over $A\cup B$.

There is also a question of how much our intuitive concept of point set overlapping is useful for each application. Anyways, overlap concepts based on the Delaunay triangularization are of complexity $\mathcal{O}(n\log n)$, and they were introduced here just for comparing with the rounding method. But for experimenting more, let's add one layer, a neighborhood voting, to the computations. But that could be in another posting.

Anyways, there is a saying from the ancient past: a meaningful geometric feature has a specific scale interval, where it is stable.