Graduate School of Information Science and Technology, Hokkaido University

Imbalance problems


An imbalanced dataset is one in which the number of samples differs greatly between head classes (classes with many samples) and tail classes (classes with few samples). It is well known that classifiers trained on such a dataset tend to underestimate tail classes, because misclassifying them has little impact on overall accuracy. Many practical problems, such as anomaly detection, medical diagnosis, and e-mail filtering, are highly imbalanced. When the number of classes is very large and many tail classes have extremely few samples, imbalanced problems are also called long-tailed problems. We study methods for tackling long-tailed problems.
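To illustrate why overall accuracy hides poor performance on tail classes, here is a minimal sketch on hypothetical toy data (950 head-class samples labeled "normal" and 50 tail-class samples labeled "anomaly"; these labels and counts are assumptions, not data from our work). A degenerate classifier that always predicts the head class still reaches 95% accuracy, while its recall on the tail class is zero. Balanced accuracy (the mean of per-class recalls) is shown as one common alternative metric that exposes this.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for classes that never got a correct prediction.
    return {c: hits[c] / n for c, n in totals.items()}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy data: one head class and one tail class.
y_true = ["normal"] * 950 + ["anomaly"] * 50
# A degenerate classifier that always predicts the head class.
y_pred = ["normal"] * len(y_true)

print(accuracy(y_true, y_pred))           # 0.95
print(per_class_recall(y_true, y_pred))   # {'normal': 1.0, 'anomaly': 0.0}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Misclassifying every tail sample costs only 5 points of accuracy here, which is exactly the effect described above.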

An example of a long-tailed dataset. The classes are sorted in descending order of sample size (birds dataset from the UCI Machine Learning Repository).
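A plot like the one above can be produced by counting samples per class and sorting the counts in descending order. The sketch below does this on hypothetical labels (not the birds dataset), and also reports the imbalance ratio (largest class size divided by smallest). The head/tail split uses a hypothetical threshold of 20 samples chosen only for illustration; there is no single standard cut-off.

```python
from collections import Counter


def long_tail_profile(labels):
    """(class, count) pairs sorted by count, largest first."""
    return Counter(labels).most_common()


def imbalance_ratio(labels):
    """Size of the largest class divided by size of the smallest class."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def split_head_tail(labels, threshold):
    """Split classes into head (>= threshold samples) and tail (< threshold).

    The threshold is a hypothetical parameter for illustration only.
    """
    profile = long_tail_profile(labels)
    head = [c for c, n in profile if n >= threshold]
    tail = [c for c, n in profile if n < threshold]
    return head, tail


# Hypothetical labels with a long-tailed class distribution.
labels = ["a"] * 500 + ["b"] * 120 + ["c"] * 30 + ["d"] * 8 + ["e"] * 2

print(long_tail_profile(labels))     # [('a', 500), ('b', 120), ('c', 30), ('d', 8), ('e', 2)]
print(imbalance_ratio(labels))       # 250.0
print(split_head_tail(labels, 20))   # (['a', 'b', 'c'], ['d', 'e'])
```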

Copyright © 情報認識学研究室 (Information Recognition Laboratory) All Rights Reserved.