
Multi-class classification

by 볼록티 2020. 6. 20.
lec04-2-multiclass-classification-newswires

Classifying newswires: a multiclass classification example

  • Now, we know how to classify vector inputs into two mutually exclusive classes using a densely connected neural network.
  • Here, we will build a network to classify Reuters newswires into 46 mutually exclusive topics.
  • Since we have many classes, this problem is an instance of multi-class classification.
  • single-label, multiclass classification vs. multilabel, multiclass classification

The distinction is whether each sample belongs to several classes or to exactly one class; here we focus on single-label, multiclass classification. The toy targets below illustrate the difference.
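As a quick illustration (a minimal sketch with made-up four-class targets, not taken from the Reuters data):

# single-label, multiclass: each sample belongs to exactly one class,
# so its (one-hot) target has exactly one 1
single_label_target = [0, 0, 1, 0]   # toy example: the sample is in class 2 only

# multilabel, multiclass: a sample can belong to several classes at once,
# so its (multi-hot) target can contain several 1s
multilabel_target = [0, 1, 1, 0]     # toy example: the sample is in classes 1 and 2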

The Reuters dataset

  • A set of short newswires and their topics, published by Reuters in 1986

Load the data.

In [1]:
from tensorflow.keras.datasets import reuters

# Like IMDB, the argument num_words restricts the data to 
# the 10,000 most frequently occurring words 
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
In [2]:
len(train_data)
Out[2]:
8982
In [3]:
len(test_data)
Out[3]:
2246
In [4]:
train_data[10] # each sample is a list of integers; each integer is a word index
Out[4]:
[1, 245, 273, 207, 156, 53, 74, 160, 26, 14, 46, 296, 26, 39, 74, 2979, 3554, 14, 46, 4689, 4329, 86, 61, 3499, 4795, 14, 61, 451, 4329, 17, 12]
In [5]:
# decoding newswires back to text
word_index = reuters.get_word_index() # load the word-to-index dictionary
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# indices are offset by 3 because 0, 1, and 2 are reserved for "padding", "start of sequence", and "unknown"
decoded_newswire = ' '.join([reverse_word_index.get(i-3, '?') for i in train_data[0]])
In [6]:
print(decoded_newswire) # map the indices back to the original words
? ? ? said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3
In [7]:
train_labels[10]
Out[7]:
3
In [8]:
train_labels
Out[8]:
array([ 3,  4,  3, ..., 25,  3, 25], dtype=int64)
In [9]:
len(train_labels)
Out[9]:
8982

Preparing the data

Prepare the training data.

In [10]:
import numpy as np


# function that turns each sequence into a binary (multi-hot) vector
def vectorize_sequences(sequences, dimension=10000): # 10,000 dimensions -> the 10,000 most frequent words
  results = np.zeros((len(sequences), dimension))
  for i, sequence in enumerate(sequences):
    results[i, sequence] = 1.
  return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
In [11]:
x_train.shape
Out[11]:
(8982, 10000)
  • To vectorize the labels, we can use one-hot encoding.
  • One-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place of the label index.
In [12]:
# one-hot encode the labels as well, so each label becomes a 46-dimensional vector over the 46 classes

def to_one_hot(labels, dimension=46):
  results = np.zeros((len(labels), dimension))
  for i, label in enumerate(labels):
    results[i, label] = 1.
  return results

one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
In [13]:
train_labels[100]
Out[13]:
20
In [14]:
one_hot_train_labels[100]
Out[14]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [15]:
# Keras provides a built-in utility that does the same one-hot encoding
from tensorflow.keras.utils import to_categorical

one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)

Building the network

Build the network.

In [16]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential() # create a Sequential model instance
model.add(layers.Dense(64, activation='relu', input_shape=(10000,))) # dense layer with 64 units
model.add(layers.Dense(64, activation='relu')) # add another hidden layer
model.add(layers.Dense(46, activation='softmax')) # 46 units, one per class; softmax maps the outputs to class probabilities
  • softmax activation in the last layer

    • The network will output a probability distribution over the 46 classes.
    • For every input sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that the sample belongs to class i.
    • The sum of output[i] for all i will be 1.
  • categorical_crossentropy loss

    • It measures the distance between two probability distributions: think of it as the distance between the one-hot label vector and the probability vector the network outputs (see the small NumPy sketch after this list).
    • Here, between the probability distribution output by the network and the true distribution of the labels
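To make these two ideas concrete, here is a minimal NumPy sketch with a toy four-class example (illustrative numbers only, not the library implementation):

import numpy as np

# toy example with 4 classes instead of 46
logits = np.array([2.0, 1.0, 0.1, -1.0])          # raw scores from the last layer
probs = np.exp(logits) / np.sum(np.exp(logits))   # softmax turns them into a probability distribution
print(probs, probs.sum())                         # the entries are in (0, 1) and sum to 1

one_hot_label = np.array([1., 0., 0., 0.])        # the true class is class 0
loss = -np.sum(one_hot_label * np.log(probs))     # categorical crossentropy = -log(probability of the true class)
print(loss)                                       # small when probs[0] is close to 1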
In [17]:
# compile the model
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Validation

  • Use 1,000 samples in the training data as a validation set.
In [18]:
# set aside 1,000 samples as a validation set
x_val = x_train[:1000]
partial_x_train = x_train[1000:]

y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
In [19]:
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
Train on 7982 samples, validate on 1000 samples
Epoch 1/20
7982/7982 [==============================] - 1s 162us/sample - loss: 2.5312 - accuracy: 0.5449 - val_loss: 1.6569 - val_accuracy: 0.6370
Epoch 2/20
7982/7982 [==============================] - 0s 57us/sample - loss: 1.3874 - accuracy: 0.7088 - val_loss: 1.2821 - val_accuracy: 0.7080
Epoch 3/20
7982/7982 [==============================] - 0s 58us/sample - loss: 1.0481 - accuracy: 0.7725 - val_loss: 1.1513 - val_accuracy: 0.7360
Epoch 4/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.8402 - accuracy: 0.8166 - val_loss: 1.0181 - val_accuracy: 0.7790
Epoch 5/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.6676 - accuracy: 0.8566 - val_loss: 0.9655 - val_accuracy: 0.7910
Epoch 6/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.5413 - accuracy: 0.8835 - val_loss: 0.9245 - val_accuracy: 0.8090
Epoch 7/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.4296 - accuracy: 0.9104 - val_loss: 0.9149 - val_accuracy: 0.8040
Epoch 8/20
7982/7982 [==============================] - 0s 58us/sample - loss: 0.3484 - accuracy: 0.9263 - val_loss: 0.8869 - val_accuracy: 0.8210
Epoch 9/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.2843 - accuracy: 0.9380 - val_loss: 0.8682 - val_accuracy: 0.8220
Epoch 10/20
7982/7982 [==============================] - 0s 58us/sample - loss: 0.2431 - accuracy: 0.9444 - val_loss: 0.8768 - val_accuracy: 0.8200
Epoch 11/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.2032 - accuracy: 0.9489 - val_loss: 0.8887 - val_accuracy: 0.8180
Epoch 12/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.1801 - accuracy: 0.9530 - val_loss: 0.9167 - val_accuracy: 0.8100
Epoch 13/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.1636 - accuracy: 0.9560 - val_loss: 0.9170 - val_accuracy: 0.8060
Epoch 14/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.1502 - accuracy: 0.9564 - val_loss: 0.9419 - val_accuracy: 0.8090
Epoch 15/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.1365 - accuracy: 0.9560 - val_loss: 0.9945 - val_accuracy: 0.8070
Epoch 16/20
7982/7982 [==============================] - 0s 58us/sample - loss: 0.1331 - accuracy: 0.9567 - val_loss: 0.9775 - val_accuracy: 0.8110
Epoch 17/20
7982/7982 [==============================] - 0s 57us/sample - loss: 0.1262 - accuracy: 0.9573 - val_loss: 1.0285 - val_accuracy: 0.8100
Epoch 18/20
7982/7982 [==============================] - 0s 60us/sample - loss: 0.1173 - accuracy: 0.9577 - val_loss: 1.0583 - val_accuracy: 0.7870
Epoch 19/20
7982/7982 [==============================] - 0s 59us/sample - loss: 0.1186 - accuracy: 0.9574 - val_loss: 1.0790 - val_accuracy: 0.7990
Epoch 20/20
7982/7982 [==============================] - 0s 51us/sample - loss: 0.1147 - accuracy: 0.9555 - val_loss: 1.0626 - val_accuracy: 0.7990

We can see that the validation loss decreases at first and then starts to increase.

  • Plotting the training and validation loss
In [20]:
import matplotlib.pyplot as plt

loss = history.history['loss'] 
val_loss = history.history['val_loss']

epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'bo', label='Training loss') 
plt.plot(epochs, val_loss, 'b', label='Validation loss') 
plt.title('Training and validation loss') 
plt.xlabel('Epochs') 
plt.ylabel('Loss') 
plt.legend()

plt.show()
  • Plotting the training and validation accuracy
In [21]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

Judging that about nine epochs is enough, we train the network again.

  • We can observe that the network begins to overfit after nine epochs.
  • Retraining a model from scratch
In [22]:
model.fit(partial_x_train, 
          partial_y_train, 
          epochs=9, 
          batch_size=512, 
          validation_data=(x_val, y_val)) 

results = model.evaluate(x_test, one_hot_test_labels)
Train on 7982 samples, validate on 1000 samples
Epoch 1/9
7982/7982 [==============================] - 1s 67us/sample - loss: 0.1070 - accuracy: 0.9579 - val_loss: 1.0627 - val_accuracy: 0.8090
Epoch 2/9
7982/7982 [==============================] - 0s 58us/sample - loss: 0.1055 - accuracy: 0.9592 - val_loss: 1.2025 - val_accuracy: 0.7860
Epoch 3/9
7982/7982 [==============================] - 0s 60us/sample - loss: 0.1046 - accuracy: 0.9557 - val_loss: 1.0833 - val_accuracy: 0.8010
Epoch 4/9
7982/7982 [==============================] - 0s 59us/sample - loss: 0.1024 - accuracy: 0.9577 - val_loss: 1.1566 - val_accuracy: 0.7960
Epoch 5/9
7982/7982 [==============================] - 0s 58us/sample - loss: 0.1002 - accuracy: 0.9579 - val_loss: 1.1376 - val_accuracy: 0.7980
Epoch 6/9
7982/7982 [==============================] - 0s 58us/sample - loss: 0.0985 - accuracy: 0.9583 - val_loss: 1.1708 - val_accuracy: 0.7940
Epoch 7/9
7982/7982 [==============================] - 0s 59us/sample - loss: 0.0955 - accuracy: 0.9585 - val_loss: 1.1728 - val_accuracy: 0.7980
Epoch 8/9
7982/7982 [==============================] - 0s 61us/sample - loss: 0.0974 - accuracy: 0.9588 - val_loss: 1.2222 - val_accuracy: 0.7940
Epoch 9/9
7982/7982 [==============================] - 0s 55us/sample - loss: 0.0958 - accuracy: 0.9569 - val_loss: 1.1581 - val_accuracy: 0.8000
2246/2246 [==============================] - 0s 110us/sample - loss: 1.4068 - accuracy: 0.7809
In [23]:
print(results) # cross-entropy loss and accuracy
[1.4068127171014528, 0.7809439]

This is clearly a harder prediction task than binary classification.
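Note that the retraining cell above keeps fitting the same model instance, so its first epoch already starts from the previously learned weights (the training loss begins around 0.11 rather than 2.5). A minimal sketch of a literal from-scratch retrain, reusing the same architecture and compile settings as before:

# rebuild the network so training really starts from fresh weights
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(partial_x_train,
          partial_y_train,
          epochs=9,
          batch_size=512,
          validation_data=(x_val, y_val))

results = model.evaluate(x_test, one_hot_test_labels)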

  • Retraining a model from scratch is not a good idea if we have a large-scale training set.
  • In this case, we can use the callbacks functionality in Keras.

  • Before that, we need to mount Google Drive storage on our Colab instance.

What if the number of epochs is too large? With Keras callbacks, the performance measures can be recorded at the end of each epoch during fitting.

Using a callback, the model is saved after every epoch.

In [ ]:
from google.colab import drive
drive.mount('/content/gdrive') # mount my Google Drive so that it can be accessed from this Colab instance
In [0]:
%cd /content/gdrive
In [0]:
!ls
In [0]:
%cd 'My Drive'/exp # change into an empty folder
In [0]:
# use a ModelCheckpoint callback
from tensorflow.keras.callbacks import ModelCheckpoint

filepath = '/content/gdrive/My Drive/exp/model.{epoch:02d}.hdf5' # everything under gdrive is synchronized with my Google Drive
modelckpt = ModelCheckpoint(filepath=filepath)

model.fit(partial_x_train, 
          partial_y_train, 
          epochs=20, 
          batch_size=512, 
          validation_data=(x_val, y_val),
          callbacks=[modelckpt]) 

results = model.evaluate(x_test, one_hot_test_labels)
  • Load the trained model at epoch 9
In [0]:
best_model_path = '/content/gdrive/My Drive/exp/model.09.hdf5'
best_model = models.load_model(best_model_path) # load the saved model instance again, with all of its weights restored
In [0]:
results = best_model.evaluate(x_test, one_hot_test_labels)
print(results)

Generating predictions on new data

In [0]:
predictions = model.predict(x_test) # one probability distribution over the 46 topics per test sample
In [0]:
predictions.shape # (2246, 46): one 46-dimensional vector per test sample
In [0]:
predictions[0] # the predicted probability distribution for the first test sample
In [0]:
np.sum(predictions[0]) # the probabilities sum to (approximately) 1
In [0]:
np.argmax(predictions[0]) # the class with the largest predicted probability
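As a quick sanity check (a small sketch using the variables above), the fraction of argmax predictions that match the integer test labels should be close to the accuracy reported by model.evaluate:

predicted_classes = np.argmax(predictions, axis=1)        # most probable class per test sample
test_acc = np.mean(predicted_classes == np.array(test_labels))
print(test_acc)                                           # should roughly match the evaluate() accuracy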

A different way to handle the labels and the loss

In [0]:
y_train = np.array(train_labels)
y_test = np.array(test_labels)
In [0]:
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy', # use this if you don't want to one-hot encode the labels
              metrics=['acc'])
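sparse_categorical_crossentropy is mathematically the same loss; only the label interface changes, so the integer labels can be passed to fit directly. A minimal sketch under that setup (not executed here), using the same 1,000-sample validation split as before:

# with sparse_categorical_crossentropy the targets stay as integer class indices
model.fit(partial_x_train,
          y_train[1000:],                               # integer labels instead of one-hot vectors
          epochs=9,
          batch_size=512,
          validation_data=(x_val, y_train[:1000]))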

The importance of having sufficiently large hidden layers

In [25]:
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(4, activation='relu')) # what if we use only 4 units? The layer's capacity drops; too many units, on the other hand, lead to overfitting
model.add(layers.Dense(46, activation='softmax'))

model.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

model.fit(partial_x_train,
          partial_y_train,
          epochs=20,
          batch_size=128,
          validation_data=(x_val, y_val))
Train on 7982 samples, validate on 1000 samples
Epoch 1/20
7982/7982 [==============================] - 1s 134us/sample - loss: 3.1503 - accuracy: 0.1709 - val_loss: 2.5320 - val_accuracy: 0.4360
Epoch 2/20
7982/7982 [==============================] - 1s 75us/sample - loss: 2.0464 - accuracy: 0.4535 - val_loss: 1.8149 - val_accuracy: 0.4610
Epoch 3/20
7982/7982 [==============================] - 1s 76us/sample - loss: 1.5929 - accuracy: 0.5028 - val_loss: 1.5725 - val_accuracy: 0.6510
Epoch 4/20
7982/7982 [==============================] - 1s 79us/sample - loss: 1.3414 - accuracy: 0.6937 - val_loss: 1.4370 - val_accuracy: 0.6710
Epoch 5/20
7982/7982 [==============================] - 1s 79us/sample - loss: 1.1907 - accuracy: 0.7135 - val_loss: 1.3901 - val_accuracy: 0.6650
Epoch 6/20
7982/7982 [==============================] - 1s 77us/sample - loss: 1.0940 - accuracy: 0.7270 - val_loss: 1.3904 - val_accuracy: 0.6740
Epoch 7/20
7982/7982 [==============================] - 1s 80us/sample - loss: 1.0211 - accuracy: 0.7414 - val_loss: 1.3542 - val_accuracy: 0.6730
Epoch 8/20
7982/7982 [==============================] - 1s 80us/sample - loss: 0.9591 - accuracy: 0.7506 - val_loss: 1.3948 - val_accuracy: 0.6710
Epoch 9/20
7982/7982 [==============================] - 1s 78us/sample - loss: 0.9082 - accuracy: 0.7583 - val_loss: 1.4124 - val_accuracy: 0.6730
Epoch 10/20
7982/7982 [==============================] - 1s 77us/sample - loss: 0.8637 - accuracy: 0.7750 - val_loss: 1.4120 - val_accuracy: 0.6750
Epoch 11/20
7982/7982 [==============================] - 1s 78us/sample - loss: 0.8217 - accuracy: 0.7845 - val_loss: 1.4878 - val_accuracy: 0.6790
Epoch 12/20
7982/7982 [==============================] - 1s 77us/sample - loss: 0.7874 - accuracy: 0.7886 - val_loss: 1.4796 - val_accuracy: 0.6760
Epoch 13/20
7982/7982 [==============================] - 1s 78us/sample - loss: 0.7573 - accuracy: 0.7947 - val_loss: 1.5259 - val_accuracy: 0.6820
Epoch 14/20
7982/7982 [==============================] - 1s 81us/sample - loss: 0.7291 - accuracy: 0.8044 - val_loss: 1.5988 - val_accuracy: 0.6650
Epoch 15/20
7982/7982 [==============================] - 1s 78us/sample - loss: 0.7042 - accuracy: 0.8067 - val_loss: 1.6308 - val_accuracy: 0.6800
Epoch 16/20
7982/7982 [==============================] - 1s 78us/sample - loss: 0.6840 - accuracy: 0.8175 - val_loss: 1.6486 - val_accuracy: 0.6740
Epoch 17/20
7982/7982 [==============================] - 1s 77us/sample - loss: 0.6657 - accuracy: 0.8247 - val_loss: 1.7441 - val_accuracy: 0.6780
Epoch 18/20
7982/7982 [==============================] - 1s 76us/sample - loss: 0.6477 - accuracy: 0.8241 - val_loss: 1.7829 - val_accuracy: 0.6710
Epoch 19/20
7982/7982 [==============================] - 1s 77us/sample - loss: 0.6313 - accuracy: 0.8244 - val_loss: 1.8267 - val_accuracy: 0.6730
Epoch 20/20
7982/7982 [==============================] - 1s 75us/sample - loss: 0.6188 - accuracy: 0.8257 - val_loss: 1.8867 - val_accuracy: 0.6720
Out[25]:
<tensorflow.python.keras.callbacks.History at 0x1d7eef9ccf8>

To get the best model, the hyperparameters have to be tuned carefully.

It is worth remembering how to save models to Drive with a callback.

The Drive-related cells above could not be executed here because this notebook was run locally rather than in a Colab environment.
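For reference, the same callback works outside Colab with a local path; a minimal sketch (the checkpoints directory name is only an example), using save_best_only so that only the model with the lowest validation loss is kept:

import os
from tensorflow.keras.callbacks import ModelCheckpoint

os.makedirs('checkpoints', exist_ok=True)                  # example local folder
local_ckpt = ModelCheckpoint(filepath='checkpoints/model.best.hdf5',
                             monitor='val_loss',
                             save_best_only=True)          # keep only the best model by validation loss

model.fit(partial_x_train,
          partial_y_train,
          epochs=20,
          batch_size=512,
          validation_data=(x_val, y_val),
          callbacks=[local_ckpt])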


SangheumHwang [deep learning class]
