Multi-Label Image Classification with Transfer Learning

This article uses transfer learning to apply a pretrained VGG16 model to a multi-label image classification problem. The data come from Kaggle, and each image can carry several tags at once. Model accuracy is quantified with the F_{\beta} score, which is built from the confusion matrix below:

                        Predicted Positive (1)    Predicted Negative (0)
Actual Positive (1)     TP                        FN
Actual Negative (0)     FP                        TN

For example, if the true labels are (1,0,1,1,0,0) and the predicted labels are (1,1,0,1,1,0), then TP=2, FN=1, FP=2, TN=1, and

Precision=\frac{TP}{TP+FP},\text{  }Recall=\frac{TP}{TP+FN},\text{  }F_{\beta}=\frac{(1+\beta^2)\cdot Precision\cdot Recall}{\beta^2\cdot Precision+Recall}

The smaller \beta is, the more weight Precision carries in the F_{\beta} score; when \beta=0, F_{\beta} reduces to Precision. The larger \beta is, the more weight Recall carries; as \beta\to\infty, F_{\beta} reduces to Recall. This metric can be custom-defined in Keras (y_pred denotes the predicted probabilities):
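The original code for this metric does not survive in the text; below is a minimal sketch using the Keras backend. Thresholding the probabilities at 0.5 by rounding, and the default \beta=2, are assumptions, not something stated in the article:

```python
from tensorflow.keras import backend as K

# F-beta metric: computed per sample, then averaged over the batch.
# Predicted probabilities are thresholded at 0.5 via rounding (an assumption).
def fbeta(y_true, y_pred, beta=2):
    y_pred = K.round(K.clip(y_pred, 0, 1))      # probabilities -> 0/1 labels
    tp = K.sum(y_true * y_pred, axis=-1)        # true positives
    fp = K.sum((1 - y_true) * y_pred, axis=-1)  # false positives
    fn = K.sum(y_true * (1 - y_pred), axis=-1)  # false negatives
    p = tp / (tp + fp + K.epsilon())            # precision
    r = tp / (tp + fn + K.epsilon())            # recall
    bb = beta ** 2
    return K.mean((1 + bb) * p * r / (bb * p + r + K.epsilon()))
```

On the worked example above (TP=2, FP=2, FN=1), Precision=0.5 and Recall=2/3, giving F_2 = 0.625.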

The loss function also differs between multi-label and multi-class classification: multi-label classification uses binary crossentropy loss. Suppose a sample's true labels are (1,0,1,1,0,0) and the predicted probabilities are (0.2, 0.3, 0.4, 0.7, 0.9, 0.2); then

\text{binary crossentropy loss}=-(\ln 0.2 + \ln 0.7 + \ln 0.4 + \ln 0.7 + \ln 0.1 + \ln 0.8)/6\approx 0.96

where each term is \ln p when the true label is 1 and \ln(1-p) when it is 0. In addition, the output layer's activation for multi-label classification should be sigmoid rather than softmax. The model architecture is as follows:
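The architecture itself is missing from the text; below is a sketch of one common transfer-learning setup consistent with the description (a frozen VGG16 convolutional base, a new sigmoid output head, binary crossentropy loss). The 17-tag output size, the 128-unit hidden layer, and the Adam optimizer are assumptions:

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten

# frozen VGG16 base + new classifier head; out_shape=17 assumes this
# dataset's 17 tags
def define_model(in_shape=(128, 128, 3), out_shape=17, weights='imagenet'):
    base = VGG16(include_top=False, input_shape=in_shape, weights=weights)
    for layer in base.layers:
        layer.trainable = False                 # freeze the pretrained base
    flat = Flatten()(base.layers[-1].output)
    hidden = Dense(128, activation='relu')(flat)
    out = Dense(out_shape, activation='sigmoid')(hidden)  # sigmoid, not softmax
    model = Model(inputs=base.inputs, outputs=out)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model
```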

1. Download the data from the Kaggle website, unzip it, and process it into a format the model can read:

from os import listdir
from numpy import zeros, asarray, savez_compressed
from pandas import read_csv
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# create a mapping of tags to integers given the loaded mapping file
def create_tag_mapping(mapping_csv):
    labels = set() # create a set of all known tags
    for i in range(len(mapping_csv)):
        tags = mapping_csv['tags'][i].split(' ') # convert spaced separated tags into an array of tags
        labels.update(tags) # add tags to the set of known labels
    labels = sorted(list(labels)) # convert set of labels to a sorted list 
    # dict that maps labels to integers, and the reverse
    labels_map = {labels[i]:i for i in range(len(labels))}
    inv_labels_map = {i:labels[i] for i in range(len(labels))}
    return labels_map, inv_labels_map

# create a mapping of filename to a list of tags
def create_file_mapping(mapping_csv):
    mapping = dict()
    for i in range(len(mapping_csv)):
        name, tags = mapping_csv['image_name'][i], mapping_csv['tags'][i]
        mapping[name] = tags.split(' ')
    return mapping

# create a one hot encoding for one list of tags
def one_hot_encode(tags, mapping):
    encoding = zeros(len(mapping), dtype='uint8') # create empty vector
    # mark 1 for each tag in the vector
    for tag in tags: encoding[mapping[tag]] = 1
    return encoding

# load all images into memory
def load_dataset(path, file_mapping, tag_mapping):
    photos, targets = list(), list()
    # enumerate files in the directory
    for filename in listdir(path):
        photo = load_img(path + filename, target_size=(128,128)) # load image
        photo = img_to_array(photo, dtype='uint8') # convert to numpy array
        tags = file_mapping[filename[:-4]] # get tags
        target = one_hot_encode(tags, tag_mapping) # one hot encode tags
        photos.append(photo)
        targets.append(target)
    X = asarray(photos, dtype='uint8')
    y = asarray(targets, dtype='uint8')
    return X, y

filename = 'train_v2.csv' # load the target file
mapping_csv = read_csv(filename)
tag_mapping, _ = create_tag_mapping(mapping_csv) # create a mapping of tags to integers
file_mapping = create_file_mapping(mapping_csv) # create a mapping of filenames to tag lists
folder = 'train-jpg/' # load the jpeg images
X, y = load_dataset(folder, file_mapping, tag_mapping)
print(X.shape, y.shape)
savez_compressed('planet_data.npz', X, y) # save both arrays to one file in compressed format

2. Build two helper functions: one to split the data into training and validation sets, and one to plot the model's learning curves during training:
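These helpers do not appear in the text; the sketch below assumes the arrays were saved to planet_data.npz in step 1. The 70/30 split ratio, the metric name 'fbeta' in the History object, and the two-panel plot layout are assumptions; the curve colors match the note under the plots (blue for training, orange for validation):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen so the figure is saved to disk
from matplotlib import pyplot
from numpy import load
from sklearn.model_selection import train_test_split

# load the saved arrays and split them into train and validation sets;
# returns trainX, testX, trainY, testY (the train_test_split order)
def load_and_split(filename='planet_data.npz', test_size=0.3):
    data = load(filename)
    X, y = data['arr_0'], data['arr_1']
    return train_test_split(X, y, test_size=test_size, random_state=1)

# plot loss and F-beta learning curves from a Keras History object
def summarize_diagnostics(history):
    pyplot.subplot(211)
    pyplot.title('Binary Crossentropy Loss')
    pyplot.plot(history.history['loss'], color='blue', label='train')
    pyplot.plot(history.history['val_loss'], color='orange', label='validation')
    pyplot.subplot(212)
    pyplot.title('F-beta')
    pyplot.plot(history.history['fbeta'], color='blue', label='train')
    pyplot.plot(history.history['val_fbeta'], color='orange', label='validation')
    pyplot.legend()
    pyplot.tight_layout()
    pyplot.savefig('learning_curves.png')
    pyplot.close()
```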

3. Use data augmentation to increase the effective number of samples and train the model:
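The training code is also missing; the sketch below shows the augmentation pipeline with Keras's ImageDataGenerator. The random arrays and the tiny stand-in network are placeholders so the snippet runs on its own — in the article the arrays come from the train/validation split and the model is the VGG16 transfer model. The specific augmentations (flips and 90° rotations) are assumptions:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# hypothetical stand-in arrays; in the article these come from the
# train/validation split of planet_data.npz
trainX = np.random.randint(0, 256, (16, 32, 32, 3)).astype('uint8')
trainY = np.random.randint(0, 2, (16, 17)).astype('uint8')

# augmentation: rescale pixels, random horizontal/vertical flips, rotations
datagen = ImageDataGenerator(rescale=1.0 / 255.0, horizontal_flip=True,
                             vertical_flip=True, rotation_range=90)
train_it = datagen.flow(trainX, trainY, batch_size=8)

# toy stand-in model; the article trains the VGG16 transfer model instead
model = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    GlobalAveragePooling2D(),
    Dense(17, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
history = model.fit(train_it, steps_per_epoch=len(train_it),
                    epochs=1, verbose=0)
```

Because the generator yields freshly transformed batches every epoch, the model never sees exactly the same image twice, which acts as a regularizer.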

The blue lines represent the training set, the yellow lines the validation set.
