Maximizing information, minimizing costs: Optimal selection of variables in multi-label classification
The article "Cost-constrained feature selection in multilabel classification using an information-theoretic approach" has been published in the journal Pattern Recognition. The article was co-authored by Tomasz Klonecki, a student of the Information and Biomedical Technologies Polish Academy of Sciences (TIB PAN) Doctoral School, Dr. Paweł Teisseyre from the Institute of Computer Science of the Polish Academy of Sciences, and Prof. Jaesung Lee from Chung-Ang University, Republic of Korea.
In the paper, the authors propose a new method of selecting variables for multi-label classification models. Multi-label classification refers to a situation where multiple target variables are predicted simultaneously (for example, different diseases in a patient) based on explanatory variables. Unlike existing approaches, the described method takes into account the costs associated with obtaining variable values. The algorithm uses information theory to define a variable importance measure.
The problem of cost-sensitive variable selection described in the paper is of great practical importance, especially in medical applications, where acquiring variable values is often very costly (for example, performing tests or diagnostic examinations). The presented method can be used in conjunction with any classification model (for example, linear classifiers or neural networks). It can be recommended when the budget for obtaining variable values is limited.
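To make the general idea concrete, the following is a minimal sketch of budget-limited variable selection driven by an information-theoretic importance measure. It is not the algorithm from the paper: the scoring rule (summed mutual information with each label, divided by cost), the greedy loop, the function names and the toy data are all assumptions chosen for illustration, and the sketch ignores redundancy between variables.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(features, labels, costs, budget):
    """Greedy, budget-limited selection (illustrative assumption, not the paper's method).

    features: dict mapping variable name -> list of discrete values
    labels:   list of label columns, each a list of discrete values
    costs:    dict mapping variable name -> positive acquisition cost
    budget:   total cost that may be spent on variables
    """
    # Importance of a variable: sum of its mutual information with every label.
    relevance = {
        name: sum(mutual_information(col, lab) for lab in labels)
        for name, col in features.items()
    }
    selected, remaining = [], budget
    candidates = set(features)
    while True:
        # Only variables that still fit in the remaining budget are considered.
        affordable = sorted(f for f in candidates if costs[f] <= remaining)
        if not affordable:
            break
        best = max(affordable, key=lambda f: relevance[f] / costs[f])
        if relevance[best] <= 1e-12:
            break  # nothing informative is left
        selected.append(best)
        candidates.remove(best)
        remaining -= costs[best]
    return selected


# Hypothetical toy data: two labels (e.g. two diseases) and three variables.
y1 = [0, 0, 0, 0, 1, 1, 1, 1]
y2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [y1, y2]
features = {
    "cheap_test": [0, 0, 0, 0, 1, 1, 1, 1],      # informative about y1 only
    "expensive_scan": [0, 0, 1, 1, 2, 2, 3, 3],  # informative about both labels
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],           # independent of both labels
}
costs = {"cheap_test": 1, "expensive_scan": 10, "noise": 1}

print(select_features(features, labels, costs, budget=5))
print(select_features(features, labels, costs, budget=20))
```

With a small budget, the sketch can only afford the inexpensive variable. With a larger budget, it also adds the costly but more informative one, and it never selects the uninformative variable. The selected variables could then be passed to any classifier.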
The work summarizes part of the research conducted by Tomasz Klonecki as part of the preparation of his doctoral thesis.
The article is available on the publisher's website (doi: 10.1016/j.patcog.2023.109605).