Graduate School of Information Science and Technology, Hokkaido University

Open Software

Open software maintained by PRML lab

Fast PC k-NN algorithm

We developed a probably correct k-nearest-neighbor search algorithm that works even for very high-dimensional data. Given an error tolerance, it returns an answer whose error probability stays below that tolerance. The key idea is simple: if two points are close to each other in the original high-dimensional space, they must also be close in a projection onto a few dimensions, provided a sufficient number of samples is available. However, the distance order is not preserved under projection; the 1-NN in the original space may no longer be the 1-NN in the projection space. We therefore evaluate, in probability, the possible distance of the true 1-NN in a projection space spanned by a small number of PCA features. For a given tolerance, we narrow the search area in the projection space so that the probability of the true neighbor falling outside that area is less than the specified value. For details, see reference [1].

[1] J. Toyama, M. Kudo and H. Imai, "Probably Correct k-Nearest Neighbor Search in High Dimensions," Pattern Recognition, in press, DOI: 10.1016/j.patcog.2009.09.026.
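The idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' algorithm: the PCA projection and candidate filtering follow the text, but the radius-inflation rule (`alpha`) is a crude stand-in for the paper's probabilistic bound, and all names are invented for the example.

```python
# Sketch: project to a few PCA dimensions, search a widened radius there,
# then verify candidates with exact distances in the original space.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 100))   # database: 2000 points, 100 dims
q = rng.standard_normal(100)           # query point

# --- offline: PCA projection to a small number of dimensions ---
d_proj = 5
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:d_proj].T                      # top principal axes (100 x 5)
Xp = Xc @ P                            # projected database

# --- online: approximate search with a safety margin ---
qp = (q - mean) @ P
dp = np.linalg.norm(Xp - qp, axis=1)   # distances in the projection space

k = 1
alpha = 1.5                            # margin factor (illustrative stand-in
                                       # for the probabilistic radius bound)
r = alpha * np.sort(dp)[k - 1]         # widened search radius
candidates = np.flatnonzero(dp <= r)   # points that could be the true k-NN

# exact distances computed only for the surviving candidates
d_full = np.linalg.norm(X[candidates] - q, axis=1)
nn = candidates[np.argmin(d_full)]
```

The payoff is that the exact, high-dimensional distance is computed only for the (usually small) candidate set, while the tolerance parameter controls how often the true neighbor escapes the search radius.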

Kernelized Supervised Laplacian Eigenmap for Visualization and Classification of Multi-Label Data

We previously proposed a supervised Laplacian eigenmap for visualization (SLE-ML) that can handle multi-label data. SLE-ML can also control the balance between class separability and preservation of local structure with a single trade-off parameter. However, SLE-ML cannot transform new data; that is, it suffers from the "out-of-sample" problem. In this paper, we show that this problem is solvable: the same transformation can be reproduced perfectly by a set of linear sums of reproducing kernels (KSLE-ML) with a nonsingular Gram matrix. We experimentally showed that the difference between training and testing performance is not large; thus, high class separability in a low-dimensional space is achievable with KSLE-ML by assigning an appropriate value to the trade-off parameter. This opens the possibility of separability-guided feature extraction for classification. In addition, to optimize the performance of KSLE-ML, we conducted both kernel selection and parameter selection, and found that parameter selection is more important than kernel selection. We experimentally demonstrated the advantage of KSLE-ML for visualization and for feature extraction compared with several typical algorithms.

[2] M. Tai, M. Kudo, A. Tanaka, H. Imai and K. Kimura, "Kernelized Supervised Laplacian Eigenmap for Visualization and Classification of Multi-Label Data," Pattern Recognition, DOI: 10.1016/j.patcog.2021.108399 (26 Oct 2021).

Source code (Python) is available as a Jupyter notebook at

Copyright © 情報認識学研究室 (PRML Lab) All Rights Reserved.