Fig. Hexagonal lattice, rotated; matching pairs of two regular lattices circled.

Wednesday, July 31, 2024

Lattice lambada in higher dimensions

I wrote in 2021 about how to do a quick approximate Poisson disc sampling by rounding data $P\subset\mathbb{R}^{n\times d}$, and I got a question from the audience about how to do the same in higher dimensions $2<d$. So, here it comes.
Let us have a unit simplex with $d+1$ vertices, one of them (with index 0) at the origin. The second vertex $e_1= (1,0,\ldots)$ has length $\lambda_1=1$, where $\lambda_n$ denotes the perpendicular distance of each new vertex from the simplex face formed by the earlier vertices.
That way \[\lambda_n^2= 1 - \sum_{i=1}^{n-1} (\lambda_i/(i+1))^2 \] and we can assemble a matrix $E\in\mathbb{R}^{d\times d}$ from the vertices other than the origin, one row per vertex: \[e_n= (\lambda_1/2,\; \lambda_2/3,\; \ldots,\; \lambda_{n-1}/n,\; \lambda_n,\; 0,\; \ldots,\; 0).\]

For example, $\lambda_2= \sqrt{3}/2$ and $\lambda_3= \sqrt{2/3}$ are the heights of an equilateral triangle and of a regular tetrahedron, respectively. The rows $e_i\in E$ have the following property: \[e_i\cdot e_j= \tfrac{1}{2}(1+\delta_{ij}).\]
That means the simplex edges starting from the origin are unit vectors with a $60^\circ$ angle between all of them. Sounds like a tight grid!? Well, not always a maximally tight one... E.g. with $d= 24$ a denser packing of unit spheres is possible, but the sampling presented here (when seen as a packing problem) performs consistently well in most dimensions.
 
Now, let us refresh the procedure from the 2021 post. We form an index matrix $A\subset\mathbb{Z}^{n\times d}$
to find its unique rows $A_u\subseteq A$: \[A= \{\text{round}(p\,E^{-1}/\epsilon)\mid p\in P\}\] and finally the sampling \[\{P\}_\epsilon= \epsilon\, A_u E.\] If one attaches the census information $n(q),\; q\in \{P\}_\epsilon$, which tells how many samples $p\in P$ get rounded to $q$, we obtain a histogram with a rather nice spatial regularity, which does not occur with Poisson disc sampling. Otherwise, these two methods (Poisson and rounding by the slanted grid $E$) serve the same purposes in ML. The time complexity is $\mathcal{O}(nd)$ when a suitable per-column hashing method is used to find the unique rows of $A$. (The ranges of the columns of $A$ can be balanced by first centering $P$ at the origin.)
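
As a concrete illustration, here is a minimal numpy sketch of the whole procedure with the points as rows of $P$. The helper names (`simplex_basis`, `lattice_sample`) and the random test data are my own, not code from the 2021 post.

```python
import numpy as np

def simplex_basis(d):
    """Rows e_1..e_d: unit simplex edges from the origin, 60 degrees between every pair."""
    E = np.zeros((d, d))
    lam = np.zeros(d + 1)
    for n in range(1, d + 1):
        i = np.arange(1, n)                          # indices 1 .. n-1
        lam[n] = np.sqrt(1.0 - np.sum((lam[i] / (i + 1)) ** 2))
        E[n - 1, :n - 1] = lam[i] / (i + 1)          # coordinates shared with earlier vertices
        E[n - 1, n - 1] = lam[n]                     # new perpendicular coordinate
    return E

def lattice_sample(P, eps):
    """Round the rows of P onto the slanted grid eps*E and keep the unique grid points."""
    P = P - P.mean(axis=0)                           # center P to balance the index ranges
    E = simplex_basis(P.shape[1])
    A = np.rint(P @ np.linalg.inv(E) / eps).astype(int)     # A = round(p E^-1 / eps)
    A_u, n_q = np.unique(A, axis=0, return_counts=True)     # unique rows and census n(q)
    return eps * A_u @ E, n_q

# e.g. 10000 points in d = 5 dimensions, grid pitch eps = 0.5
Q, n_q = lattice_sample(np.random.default_rng(0).normal(size=(10000, 5)), 0.5)
```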

Thursday, June 20, 2024

Tangentially, yours!

A problem popped up recently on the mental horizon: how close does a predicted trajectory of a ship come to the real trajectory? For a moment, let us consider only the unit sphere and unit vectors, and we happily identify points $p$ of a unit sphere centered at the origin $O$ with the vectors $p-O$... Also, we identify a point $p= (\theta,\phi)$, expressed by its latitude $\theta$ and longitude $\phi$, with its 3D map $p\rightarrow q$ image $q= (\cos\theta\cos\phi,\, \cos\theta\sin\phi,\, \sin\theta)$. So, reader, be aware. (And if the notation gets you baffled, check the notation page...)

Let us assume that an arc $[p_0,b]$ is a part of the predicted trajectory and $p_1$ is the ground truth (pun intended). Here $p_0$ can be the last reference point and $b= \hat{p}_1$ is the prediction for $p_1$. The problem can be stated as finding the arc distance of the point $p_1$ from the great circle $C$ defined by the arc $[p_0,b]\subset C$. This great circle $C$ can be identified with a unit vector $c$, the normal of the plane of $C$.

Fig. 1. A point $a\in [b,p_0]$ closest to a point $p_1$.

OK, $c\perp b$ and $c\perp p_0$, that is: $c= \pm(p_0\times b)^0$, and we can find a vector $a\in [p_0,b]\subseteq C$ which is the projection of $p_1$ onto the plane of $C$, scaled so that $\|a\|=1$: \[a= (p_1 - (p_1\cdot c)\, c)^0.\;\; (1)\] Now, the arc lengths shown in Fig. 1 are: \[\alpha_\perp= |[a,p_1]|= \pi/2-\cos^{-1}(c\cdot p_1)\;\; (2)\] \[\alpha_\parallel= |[a,p_0]|= \cos^{-1}(a\cdot p_0).\;\; (3)\]

The definition of the normal vector $c$ leaves the sign undecided. One needs to fix it so that $c\cdot p_1 \ge 0$. As for the planet Earth with radius $R= 6370$ km, we get \[l_{[a,b]}= R\,|[a,b]| = R\cos^{-1}(a\cdot b)\;\;(4)\] for the length of an arc $[a,b]$ in kilometres.

Eq. (2) is useful for estimating the relative rendezvous error $e_p=|[a,p_1]|/|[p_0,p_1]|$, and Eq. (3) leads to the relative time error $e_t= |[p_0,a]|/|[p_0,b]|-1$. The estimate of $e_t$ assumes that time information was not a part of the training of the predictor, which is not always the case.
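
For concreteness, here is a small numpy sketch of Eqs. (1)-(4) and the error ratios $e_p$ and $e_t$; the helper names and the example coordinates are made up for illustration only.

```python
import numpy as np

R_EARTH = 6370.0                                   # km, as in Eq. (4)

def to_xyz(lat, lon):
    """Map (latitude, longitude) in radians to a 3D unit vector."""
    return np.array([np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)])

def unit(v):
    """The (.)^0 normalization."""
    return v / np.linalg.norm(v)

def arc(u, v):
    """Arc length |[u, v]| on the unit sphere."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def arc_errors(p0, b, p1):
    """Eqs. (1)-(3): arc distances of p1 from the great circle C through p0 and b."""
    c = unit(np.cross(p0, b))
    if np.dot(c, p1) < 0.0:                        # fix the sign so that c . p1 >= 0
        c = -c
    a = unit(p1 - np.dot(p1, c) * c)               # Eq. (1)
    return np.pi / 2 - arc(c, p1), arc(a, p0)      # Eqs. (2) and (3)

p0, b, p1 = to_xyz(1.050, 0.430), to_xyz(1.052, 0.436), to_xyz(1.0525, 0.435)
alpha_perp, alpha_par = arc_errors(p0, b, p1)
e_p = alpha_perp / arc(p0, p1)                     # relative rendezvous error
e_t = alpha_par / arc(p0, b) - 1.0                 # relative time error
print(R_EARTH * alpha_perp, e_p, e_t)              # km off the predicted track, Eq. (4)
```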

Now, the question arises... Should we use spherical geometry here? Would the computations of $e_p$ and $e_t$ become faster, or would the numerical accuracy be better, or both? The latter is important, since it is easy to run into numerical instability at distances below 20 km. As will be revealed soon (after the Midsummer Feast, I hope), the answer is a surprisingly close call. In general, a lot of trigonometric trickery can be substituted by vector formulas (using either geometric algebra or vector algebra) without much loss in computational efficiency, when the operations happen in modern processing environments and the intermediate results have enough memory to dwell in...

But while you are waiting, here is the approach where all points get projected to the plane defined by the points $p_0$, $p_1$ and $b$: $l_\perp= \sqrt{1-\cos^2\beta}\,\|p_1-p_0\|$ and $l_\parallel= \cos\beta\,\|p_1-p_0\|$, where $\cos\beta= (p_1-p_0)^0\cdot (b-p_0)^0$.

It yields quite good results as long as the distances are under 200 km ($|[u,v]|/\|u-v\| - 1 < 32\times10^{-3}$).
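
A sketch of the same chord-based shortcut, reusing the `unit` helper from the previous snippet (the function name `planar_errors` is mine):

```python
def planar_errors(p0, b, p1):
    """Chord approximation: cos(beta) between the chords p1 - p0 and b - p0."""
    cos_beta = np.dot(unit(p1 - p0), unit(b - p0))
    chord = np.linalg.norm(p1 - p0)
    return np.sqrt(max(0.0, 1.0 - cos_beta ** 2)) * chord, cos_beta * chord  # l_perp, l_par
```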

Monday, April 8, 2024

Eliminating close gaps between points

One posting (When do point sets intersect (part 1)) touched on Poisson disk sampling (PDS). A grid-based approximation was used there because... it served the particular case considered there. There are many good algorithms; e.g. Bridson's algorithm (2007) is $\mathcal{O}(d\,|A|)$, where $A\subset\mathbb{R}^d$ is a point set dense enough that the Poisson disk radius $r$ is smaller than the mean distance $l_0$ between points: $r< l_0$.

PDS is sometimes needed in manifold operations (e.g. for creating a well-behaving kernel or a balanced sampling), and quite fast algorithms exist for metric spaces. The defining property of PDS is that natural neighbor (NN) samples tend to be at distance $r$. We are interested in a case where the PDS distance $r < l_0$ is used to remove close points (red circles in Fig. 1) so that the distance histogram becomes cut off (e.g. in order to apply algorithms which are numerically sensitive to small gaps). We also want the samples not to get attracted to local voids, and therefore we fill in some temporary bogus samples (green in Fig. 1).

Fig. 1. The initial point set (black dots) is modified by removing close points (red circles) and adding bogus points (green) to free spaces.

The algorithm eliminates some points from the dense areas of the point cloud (PC) $A$ and adds the same number of points to a PC $B$. The algorithm focuses on the set $C_i=A_i\cup B_i$ at each iteration $i$ over steps 1-3. Initially $i:= 0,\; A_0:= A$ and $B_0:= \{\}$:

  1. Iteration $i$: Produce a Delaunay triangulation $(C_i,T_i)$.
  2. Remove some points $p\in C_i$ which are very close to another point.
  3. Add the same number of points to the centers $c_t$ of the largest triangles $t\in T_i$.
  4. Go to 1 as long as changes happen in steps 2 and 3.
  5. Output $A_n$ or $C_n$, depending on the application case.
This is actually just a scheme for a set of algorithms, and there is room for many variations at:
  • Step 1: $(C_i,T_i)$ can be approximate, especially when the dimensionality $3<d$. Then a neighbor set $NN(p)$ (in the 2D case a triangle) can have a variable size, i.e. the property $|NN(p)| \equiv d+1$, characteristic of Delaunay simplices, does not need to hold.
  • Step 2: The number $k$ of points to be removed can be controlled. For example: remove $p$ or $q$ if $\|p-q\| \le l_{min}$, and increase $l_{min}$ slowly until it reaches the value $r$ you want.
  • Step 3: The size of a triangle $t$ refers to its measure (area/volume/hypervolume...) or to an approximation of it. Addition may not be possible at some iterations, but it may resume later. If a much denser initial set is available, the added points can be substituted by their nearest matches from it; otherwise the added points $B$ are excluded from the final result.
  • Step 4 can include a control for the current $l_{min}$ to approach the Poisson disk radius $r$.
Step 3 fills in synthetic points $b\in B$, which do not belong to the original data set. This is to prevent the Poisson points from concentrating at the edges of gaps in the data (a phenomenon which is prevalent in higher dimensions).
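
Below is a minimal 2D sketch of steps 1-4, assuming `scipy.spatial.Delaunay` is available. It uses a fixed threshold $r$ instead of a slowly growing $l_{min}$, and the function name and details are my own illustration rather than a reference implementation.

```python
import numpy as np
from scipy.spatial import Delaunay

def thin_and_fill(A, r, max_iter=50):
    """Steps 1-4 in 2D: remove close points, add bogus points to the largest triangles."""
    C = np.asarray(A, dtype=float)
    is_bogus = np.zeros(len(C), dtype=bool)           # marks the points of B
    for _ in range(max_iter):
        tri = Delaunay(C)                             # step 1
        drop = np.zeros(len(C), dtype=bool)           # step 2: one end of each short edge
        for s in tri.simplices:
            for i, j in ((0, 1), (1, 2), (2, 0)):
                p, q = s[i], s[j]
                if not (drop[p] or drop[q]) and np.linalg.norm(C[p] - C[q]) < r:
                    drop[q] = True
        if not drop.any():
            break                                     # step 4: nothing changed any more
        verts = C[tri.simplices]                      # step 3: centroids of largest triangles
        d1, d2 = verts[:, 1] - verts[:, 0], verts[:, 2] - verts[:, 0]
        areas = 0.5 * np.abs(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0])
        n_add = min(int(drop.sum()), len(areas))
        new = verts[np.argsort(areas)[-n_add:]].mean(axis=1)
        C = np.vstack([C[~drop], new])
        is_bogus = np.concatenate([is_bogus[~drop], np.ones(n_add, dtype=bool)])
    return C[~is_bogus]                               # step 5: output A_n (drop the bogus B)
```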

The iteration time of the above algorithm depends on the ratio $\lambda= r/l_0$. And here is the interesting question: how fast does the iteration time grow when $r$ grows towards $l_0$? There has to be a certain point where there is no space left for the addition of bogus points...

Anyways, if this algorithm scheme is used just to eliminate close points, one has to remove the synthetic points. Misquoting Hamlet: "There are more things about creating synthetic points on a manifold, Horatio, than are dreamt of by your brain".