Labels, Information, and Computation: Efficient Learning Using Sufficient Labels

Authors: Shiyu Duan, Spencer Chang, Jose C. Principe; Published in 2023, Volume 24(31):1−35.

Abstract

Supervised learning usually requires a large amount of fully-labeled training data, which is costly to obtain. However, full label information is not always necessary for every training example in order to train an effective classifier. Drawing inspiration from the principle of sufficiency in statistics, we propose "sufficiently-labeled data": a statistic, or summary, of the fully-labeled data that captures nearly all of the information relevant to classification. We show that this statistic is both sufficient and efficient for learning optimal hidden representations, on top of which a competent classifier head can be trained with as little as a single randomly-chosen fully-labeled example per class. Sufficiently-labeled data can be obtained directly from annotators, without first collecting fully-labeled data, and is easier to obtain than full labels. Moreover, it inherently provides more security, since it stores relative rather than absolute label information. Extensive experimental results are provided to support our theory.
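
Below is a minimal, hypothetical sketch (in PyTorch) of the two-stage idea the abstract describes: an encoder is first trained from pairs annotated only with whether the two examples share a class (a form of sufficiently-labeled data), and a classifier head is then fit from a single fully-labeled example per class. The toy data, architecture, and contrastive-style pairwise loss are illustrative assumptions, not the authors' exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: 2-D points from 3 well-separated classes (stand-in for a real dataset).
num_classes, dim = 3, 2
centers = torch.tensor([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
x = torch.randn(300, dim) + torch.repeat_interleave(centers, 100, dim=0)
y = torch.repeat_interleave(torch.arange(num_classes), 100)

encoder = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 8))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)

# Stage 1: train the encoder from random pairs labeled only with whether the two
# examples share a class (the "sufficiently-labeled" statistic), using a simple
# contrastive-style loss on cosine similarity (an illustrative choice).
for step in range(200):
    i = torch.randint(0, len(x), (256,))
    j = torch.randint(0, len(x), (256,))
    same = (y[i] == y[j]).float()
    zi = F.normalize(encoder(x[i]), dim=1)
    zj = F.normalize(encoder(x[j]), dim=1)
    sim = (zi * zj).sum(dim=1)          # cosine similarity in [-1, 1]
    loss = F.binary_cross_entropy_with_logits(5.0 * sim, same)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: one fully-labeled example per class (here simply the first of each)
# serves as a class anchor; prediction is nearest-anchor in the learned space.
with torch.no_grad():
    anchors = torch.stack([F.normalize(encoder(x[y == c][:1]), dim=1)[0]
                           for c in range(num_classes)])
    z = F.normalize(encoder(x), dim=1)
    pred = (z @ anchors.T).argmax(dim=1)
    print(f"accuracy with 1 labeled example per class: {(pred == y).float().mean():.2f}")
```

The sketch only illustrates the division of labor: relative (same-class / different-class) annotations shape the representation, while absolute class identities are needed only to name the classes, which a single example per class suffices to do.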
