Graduate School of Information Science and Technology, Hokkaido University

Pattern Recognition

[Papers] [Software]

We put priority on the following two goals:

  • To develop efficient and highly reliable classifiers
  • To find a harmony between data and features

Construction of classifiers

The goal is to construct the classifier most appropriate to a given problem. So far, we have proposed and improved many classifiers (algorithms): linear classifiers, quadratic classifiers, piecewise-linear classifiers [Tenmo98a], non-linear classifiers [Sato98], k-nearest neighbor classifiers [Masu98], mixture-model based classifiers [Tenmo98c], and projection-based classifiers [Kazuki10]. In particular, a series of subclass methods using sets of hyper-rectangles is our original and powerful family of classifiers [Mine98a]. One extension uses sets of convex hulls of data [tetsuji09, tetsuji10, tetsuji_2011]. Another characteristic work is class-dependent feature subset selection [Kazu03].

[Figure: conv_clf.jpg]

Margin-maximizing classifiers using convex hulls
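
To make the hyper-rectangle idea concrete, here is a minimal Python sketch, assuming for simplicity a single axis-aligned bounding box per class; the actual subclass methods [Mine98a] grow multiple rectangles per class and are considerably more refined.

    import numpy as np

    class RectangleClassifier:
        """Toy subclass-style classifier: one axis-aligned hyper-rectangle
        per class (the published methods use many rectangles per class)."""

        def fit(self, X, y):
            self.boxes_ = {}
            for c in np.unique(y):
                Xc = X[y == c]
                self.boxes_[c] = (Xc.min(axis=0), Xc.max(axis=0))
            return self

        def _dist(self, x, lo, hi):
            # Euclidean distance from x to the box (0 if x lies inside it).
            return np.linalg.norm(np.maximum(lo - x, 0) + np.maximum(x - hi, 0))

        def predict(self, X):
            labels = list(self.boxes_)
            return np.array([labels[int(np.argmin(
                [self._dist(x, *self.boxes_[c]) for c in labels]))] for x in X])

A query point is assigned to the class whose rectangle it falls inside, or, failing that, to the class of the nearest rectangle.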

Volume prototypes

When we deal with hundreds of thousands of data points, keeping all of them is inefficient and sometimes even impossible. An alternative is to keep only a few representative points, at the cost of a poorer approximation of the underlying distribution. We therefore propose to use a set of volume prototypes instead of traditional point prototypes. Consider a time series of high-dimensional data generated from a distribution whose center and covariance change over time. Our volume prototypes can track such a “population drift” through a forgetting mechanism.

[Figure: VP.jpg]

Volume prototypes (small green dots are data points; ellipses are volume prototypes)
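
A minimal sketch of how a single volume prototype might be updated online, assuming an ellipsoid parameterized by a mean and covariance and a fixed forgetting factor; the prototype creation, splitting, and deletion rules of the actual method are omitted.

    import numpy as np

    class VolumePrototype:
        """Ellipsoidal prototype (mean + covariance) with exponential
        forgetting: old data fade out, so the prototype can follow a
        drifting population."""

        def __init__(self, x0, alpha=0.05):
            self.mean = np.asarray(x0, dtype=float)
            self.cov = np.eye(len(self.mean))   # initial volume
            self.alpha = alpha                  # forgetting factor in (0, 1)

        def update(self, x):
            diff = np.asarray(x, dtype=float) - self.mean
            # Exponentially forgetting estimates of mean and covariance.
            self.mean += self.alpha * diff
            self.cov = ((1 - self.alpha) * self.cov
                        + self.alpha * np.outer(diff, diff))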

Fast probably correct (PC) k-nearest neighbor classifiers

The k-nearest neighbor classifier is one of the most promising classifiers when a sufficient number of training samples is available. However, its computational cost is high: proportional to the product of the number of data points and the number of features. Many algorithms have been proposed to reduce this cost. One of the most successful approaches is to allow approximation, that is, to solve an “approximate k-nearest neighbors” problem instead of the exact one. However, no algorithm has achieved reasonable efficiency in high dimensions. To cope with this, we proposed a “probably correct” framework: instead of requiring exact solutions, we guarantee exactness with a high probability, and we showed that this relaxation greatly reduces the computation cost. In some experiments, our algorithm was the fastest among all well-known algorithms, including LSH.

PC k-nearest neighbor (for a given query point (black), it suffices to check the data between the two lines; a: small-data case, b: large-data case).
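
The following Python fragment conveys only the flavor of the probably-correct relaxation, not the lab's actual algorithm: if any point among the closest q-fraction of the data is acceptable, a random sample of m = ceil(ln(1/delta)/q) points contains one with probability at least 1 - delta, because (1 - q)^m <= e^(-qm) <= delta, independently of the dimensionality.

    import numpy as np

    def pc_nearest(X, query, q=0.01, delta=1e-3, rng=None):
        """Return the index of a point that, with probability >= 1 - delta,
        ranks among the closest q-fraction of X to the query."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(X)
        # (1 - q)^m <= delta  holds when  m >= ln(1/delta) / q
        m = min(n, int(np.ceil(np.log(1.0 / delta) / q)))
        idx = rng.choice(n, size=m, replace=True)
        d = np.linalg.norm(X[idx] - query, axis=1)
        return idx[int(np.argmin(d))]

Note that the sample size m depends only on q and delta, not on the number of features, which is why this kind of relaxation remains useful in high dimensions.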

Feature Selection

In pattern recognition, reducing the number of features (the dimensionality) is often beneficial for classification, because too many features are likely to cause the curse of dimensionality. Feature selection aims to find a small number of features that retain the discriminative information of the full feature set. In particular, we study classifier-independent feature selection. Today we face a wide variety of classification problems, such as weather data and traffic data, and feature selection helps improve performance on them.

We study classifier-independent feature selection for time-varying data using volume prototypes. Our approach can select a subset of discriminative features that changes over time in dynamic environments such as population drift [tsukioka2014].

[Figure: project_induction_2.png]

This figure shows an example: a static class (center, yellow) and a time-varying class (surrounding, red). Each ellipse is a volume prototype. We select a subset of features using a divergence criterion between the two classes.
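
As an illustration of a divergence-based criterion, the sketch below ranks features by the symmetric Kullback-Leibler divergence between the two classes, modeling each feature as a univariate Gaussian per class; this is a simplified stand-in for the divergence computed between volume prototypes in [tsukioka2014].

    import numpy as np

    def gauss_kl(m1, v1, m2, v2):
        # KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians.
        return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

    def select_by_divergence(X1, X2, k):
        """Indices of the k features with the largest symmetric KL
        divergence between class 1 (rows of X1) and class 2 (rows of X2)."""
        m1, v1 = X1.mean(axis=0), X1.var(axis=0) + 1e-12
        m2, v2 = X2.mean(axis=0), X2.var(axis=0) + 1e-12
        score = gauss_kl(m1, v1, m2, v2) + gauss_kl(m2, v2, m1, v1)
        return np.argsort(score)[::-1][:k]

Re-running the selection as new data arrive lets the selected subset follow the drift shown in the figure.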

Multi-label Classification

In recent years we have witnessed an increasing demand for multi-label classification (MLC) in a wide range of applications, such as text categorization, semantic image annotation, bioinformatics, and audio emotion detection. For this reason, numerous classic machine learning techniques have been specifically adapted and successfully applied to MLC.

Different from traditional multi-class classification, where each instance is associated with a single label, the task of MLC is to assign a set of labels, drawn from a limited label vocabulary, to an unseen instance. To illustrate MLC, consider the problem of classifying news articles into multiple categories. Given an article titled “Ukrainian banks quit Crimea”, it could be associated with a label set comprising at least three labels: “Europe”, “Economy”, and “Politics”, as it reports a European country's economic decision strongly influenced by regional politics in the Ukrainian crisis. The difficulties of learning the number of labels belonging to an instance and of modeling label dependency pose a major challenge to researchers coping with real-world multi-label problems. (In charge: Sun Lu)

A simple example comparing a traditional classification problem with a multi-label problem.

[Figure: mlc.jpg]

(a) shows a typical multi-class classification problem, where examples from two classes are difficult to separate in the feature space. (b) shows a multi-label problem, where the * data point belongs to both classes simultaneously.
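
As a concrete baseline, the sketch below shows binary relevance, the simplest MLC approach: one independent binary classifier per label (it deliberately ignores the label dependency discussed above). The data are hypothetical toy values; the scikit-learn classes used are standard.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    # Hypothetical toy data: feature vectors and label *sets*.
    X = np.random.rand(6, 4)
    Y_sets = [{"Europe", "Economy"}, {"Politics"}, {"Europe", "Politics"},
              {"Economy"}, {"Europe"}, {"Economy", "Politics"}]

    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(Y_sets)   # label sets -> binary indicator matrix

    # Binary relevance: one logistic-regression classifier per label.
    clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
    print(mlb.inverse_transform(clf.predict(X[:1])))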
