What is a KdTree?!
Saturday, September 4th, 2010So what is a KdTree? There are heaps of explanations you can find on the web. Wikipedia’s explanation is ok -although kinda hard to read. Anyway I figured I’d give a quick overview of what its all about.
So a KdTree is a type of “spatial partitioning” data structure. You feed in a bunch of spatial data and the KdTree organizes it in a way that makes certain queries really fast. In particular nearest neighbour searches.
I originally started toying with this after trying to get my skin weight loading script a bit smarter when dealing with radically different geometries. Searching within a tolerance doesn’t work well if the geometries can be vastly different. After all, some verts could be right next to each other, while others have matches further away. Weighting the further points helps, but it can screw up the points that have more exact matches.
So how do we solve this? Well I use an algorithm that finds the closest point, grabs the distance to this closest point and given a ratio looks for others within this radius. You give the search function a factor – it defaults to 2 – and it sees if there are any other points within that radius. So basically the tolerance is dynamic, not fixed. So if the closest point is very close, the search radius used is very small. However if the closest point is very far away, then all points within a much larger radius get considered. It turns out this works really well when transferring vertex data like skin weighting between radically different geometries.
Anyway, this is totally possible to do using the binary search, but because we’re now searching twice (finds the closest then the tolerance search) we now need every single bit of speed we can extract – hence my delving into faster ways of searching. Of course, I should probably be doing this all in c++, but… Maybe next.
Let me finish off with some numbers. On my 3 year old, dual core 2Ghz laptop, given a 10k point KdTree, I can run 10k queries on the data in about 2.2 seconds given similar inputs, or if the data is vastly different it takes about 7 seconds. As you can see, the early out test I wrote about in the previous post can make the code up to 4 times faster. By contrast, the binary search algorithm takes about 25 seconds.




