Open software maintained by PRML lab
We developed a probably correct k-nearest neighbor search algorithm that works even for very high dimensionality. It receives a tolerance error and outputs a correct answer with an error rate below the specified value. The key idea is simple: if a pair of points is close in the original high-dimensional space, then the points must also be close in a few dimensions. This clearly holds if we have a sufficient number of samples. However, we must be careful that the order is no longer preserved; that is, the 1-NN in the original space may not remain the 1-NN in the projection space. Therefore, we evaluate, in probability, the possible distance of the 1-NN in a projection space spanned by a small number of (PCA) features. For a given tolerance error, we narrow the search area in the projection space so that the probability of the true nearest neighbor falling outside the area is less than the specified value. For details, see the reference below.
 J. Toyama, M. Kudo and H. Imai, "Probably Correct k-Nearest Neighbor Search in High Dimensions," Pattern Recognition, Vol.43, No.4, pp.1361-1372, 2010. (DOI: 10.1016/j.patcog.2009.09.026)
You may download the C++ source code from the link below.
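The projection-then-verify idea above can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: it projects the data onto a few PCA dimensions, widens the candidate radius by a margin (a stand-in for the tolerance-derived probabilistic bound in the paper), and then ranks only the surviving candidates by their exact high-dimensional distance. The function name and the `margin` parameter are illustrative assumptions.

```python
import numpy as np

def probably_correct_knn(X, q, k=1, n_dims=8, margin=1.5):
    """Sketch of projection-then-verify k-NN search.

    X: (n, D) data matrix, q: (D,) query vector.
    `margin` widens the candidate radius; in the actual algorithm this
    enlargement is derived from the user-specified tolerance error.
    """
    # PCA via SVD of the centered data
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_dims].T                      # (D, n_dims) projection matrix
    Z, zq = Xc @ P, (q - mu) @ P           # data and query in projection space
    d_low = np.linalg.norm(Z - zq, axis=1)
    # radius of the k-th neighbor in the projection, enlarged by the margin
    r = np.partition(d_low, k - 1)[k - 1] * margin
    cand = np.where(d_low <= r)[0]         # narrowed search area
    # exact distances are computed only for the candidates
    d_high = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(d_high)[:k]]
```

Because projection onto an orthonormal subspace never increases distances, the true nearest neighbors always have small projected distances; the margin controls how unlikely they are to fall outside the narrowed search area.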
Matlab/Octave library for Multi-Label Classification
We developed a library for multi-label classification in Matlab/Octave. It allows many algorithms for feature dimension reduction, label dimension reduction, and clustering to be combined, and covers many state-of-the-art multi-label classification algorithms. Datasets are also included.
- The github site is https://github.com/KKimura360/MLC_toolbox
Kernelized Supervised Laplacian Eigenmap for Visualization and Classification of Multi-Label Data
We had previously proposed a supervised Laplacian eigenmap for visualization (SLE-ML) that can handle multi-label data. In addition, SLE-ML can control the trade-off between class separability and local structure with a single trade-off parameter. However, SLE-ML cannot transform new data; that is, it suffers from the "out-of-sample" problem. In this paper, we show that this problem is solvable: the same transformation can be simulated perfectly using a set of linear sums of reproducing kernels (KSLE-ML) with a nonsingular Gram matrix. We experimentally showed that the difference between training and testing is small; thus, high class separability in a low-dimensional space is achievable with KSLE-ML by assigning an appropriate value to the trade-off parameter. This offers the possibility of separability-guided feature extraction for classification. In addition, to optimize the performance of KSLE-ML, we conducted both kernel selection and parameter selection, and found that parameter selection is more important than kernel selection. We experimentally demonstrated the advantages of KSLE-ML for visualization and feature extraction compared with several typical algorithms.
 Mariko Tai, Mineichi Kudo, Akira Tanaka, Hideyuki Imai and Keigo Kimura, "Kernelized Supervised Laplacian Eigenmap for Visualization and Classification of Multi-Label Data," Pattern Recognition. Online paper available at DOI: 10.1016/j.patcog.2021.108399 (26 Oct 2021).
- Source code (Python, as a Jupyter notebook) is available at https://github.com/KudoMineichi/KSLE-ML
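The core idea of a supervised Laplacian eigenmap with a single trade-off parameter can be sketched generically as below. This is not the SLE-ML/KSLE-ML implementation from the repository above; the affinity constructions, the parameter name `beta`, and the function name are illustrative assumptions. The sketch blends a k-NN affinity on features (local structure) with a label-overlap affinity (class separability) and embeds via the normalized graph Laplacian.

```python
import numpy as np

def supervised_laplacian_eigenmap(X, Y, n_components=2, beta=0.5, n_neighbors=10):
    """Generic supervised Laplacian eigenmap sketch.

    X: (n, D) features; Y: (n, L) multi-label 0/1 indicator matrix.
    `beta` trades off label (class-separability) affinity against
    local feature affinity, in the spirit of SLE-ML's single parameter.
    """
    n = X.shape[0]
    # feature affinity: symmetric k-NN graph with 0/1 weights
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    W_feat = np.zeros((n, n))
    idx = np.argsort(d, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    W_feat[rows, idx.ravel()] = 1.0
    W_feat = np.maximum(W_feat, W_feat.T)
    # label affinity: normalized count of shared labels
    W_lab = (Y @ Y.T).astype(float)
    norm = np.sqrt(np.outer(Y.sum(1), Y.sum(1)))
    W_lab = np.divide(W_lab, norm, out=np.zeros_like(W_lab), where=norm > 0)
    np.fill_diagonal(W_lab, 0.0)
    # single parameter trades off local structure vs. class separability
    W = (1 - beta) * W_feat + beta * W_lab
    # normalized Laplacian: solve L v = lam D v via D^{-1/2} symmetrization
    deg = np.maximum(W.sum(axis=1), 1e-12)
    Dm = np.diag(1.0 / np.sqrt(deg))
    L_sym = Dm @ (np.diag(deg) - W) @ Dm
    _, vecs = np.linalg.eigh(L_sym)
    # skip the trivial constant eigenvector, keep the next smallest ones
    return Dm @ vecs[:, 1:n_components + 1]
```

The kernelized version in the paper additionally expresses this embedding as linear sums of reproducing kernels so that new (out-of-sample) points can be transformed; that step is not reproduced here.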