Efficient K-nearest neighbor queries using clustering with caching
MetadataShow full item record
We introduce a new algorithm for K-nearest neighbor queries that uses clustering and caching to improve performance. The main idea is to reduce the distance computation cost between the query point and the data points in the data set. We use a divide-and-conquer approach. First, we divide the training data into clusters based on similarity between the data points in terms of Euclidean distance. Next we use linearization for faster lookup. The data points in a cluster can be sorted based on their similarity (measured by Euclidean distance) to the center of the cluster. Fast search data structures such as the B-tree can be utilized to store data points based on their distance from the cluster center and perform fast data search. The B-tree algorithm is good for range search as well. We achieve a further performance boost by using B-tree based data caching. In this work we provide details of the algorithm, an implementation, and experimental results in a robot navigation task.