Classifying newswires: a multiclass classification example¶
- Now we know how to classify vector inputs into two mutually exclusive classes using a densely connected neural network.
- Here, we will build a network to classify Reuters newswires into 46 mutually exclusive topics.
- Since we have many classes, this problem is an instance of multi-class classification.
- single-label, multiclass classification vs. multilabel, multiclass classification
Data may belong to several classes at once or to exactly one class; here we look at single-label, multiclass classification (a small sketch of the difference follows below).¶
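For intuition, a minimal sketch with made-up toy labels (not the Reuters data) of how the targets differ between the two settings:

In [ ]:
import numpy as np

# Single-label, multiclass: every sample belongs to exactly one of C classes,
# so targets are integer class indices (or one-hot vectors).
single_label_targets = np.array([2, 0, 1])       # sample i -> one class index

# Multilabel, multiclass: a sample may belong to several classes at once,
# so targets are multi-hot vectors with possibly more than one 1 per row.
multi_label_targets = np.array([[1, 0, 1],       # sample 0 -> classes 0 and 2
                                [0, 1, 0],       # sample 1 -> class 1 only
                                [1, 1, 0]])      # sample 2 -> classes 0 and 1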
The Reuters dataset¶
- A set of short newswires and their topics, published by Reuters in 1986
Load the data.¶
In [1]:
from tensorflow.keras.datasets import reuters
# Like IMDB, the argument num_words restricts the data to
# the 10,000 most frequently occurring words
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
In [2]:
len(train_data)
Out[2]:
In [3]:
len(test_data)
Out[3]:
In [4]:
train_data[10] # each sample is a list of integers; each index stands for a word
Out[4]:
In [5]:
# decoding newswires back to text
word_index = reuters.get_word_index()  # load the word-to-index dictionary
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# indices are offset by 3 because 0, 1, and 2 are reserved indices
decoded_newswire = ' '.join([reverse_word_index.get(i-3, '?') for i in train_data[0]])
In [6]:
print(decoded_newswire) # reconstruct the original text from the indices
In [7]:
train_labels[10]
Out[7]:
In [8]:
train_labels
Out[8]:
In [9]:
len(train_labels)
Out[9]:
Preparing the data¶
In [10]:
import numpy as np
# helper that converts each sequence into a binary (multi-hot) vector
def vectorize_sequences(sequences, dimension=10000):  # 10,000 dims -> one per word in the vocabulary
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
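As a quick sanity check (a small sketch, reusing the arrays defined above), the multi-hot vector of the first newswire should have 1s exactly at the word indices it contains:

In [ ]:
# positions set to 1 in the vectorized sample should match the set of
# word indices appearing in the original integer sequence
assert set(np.nonzero(x_train[0])[0]) == set(train_data[0])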
In [11]:
x_train.shape
Out[11]:
- To vectorize the labels, we can use one-hot encoding.
- One-hot encoding of the labels consists of embedding each label as an all-zero vector with a 1 in the place of the label index.
In [12]:
# one-hot encode the labels as well, into vectors over the 46 classes
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results
one_hot_train_labels = to_one_hot(train_labels)
one_hot_test_labels = to_one_hot(test_labels)
In [13]:
train_labels[100]
Out[13]:
In [14]:
one_hot_train_labels[100]
Out[14]:
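Note that Keras ships a built-in utility for this: to_categorical in tensorflow.keras.utils produces the same one-hot encoding, as used below.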
In [15]:
from tensorflow.keras.utils import to_categorical
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
Building the network¶
In [16]:
from tensorflow.keras import models
from tensorflow.keras import layers
model = models.Sequential()  # create the model instance
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))  # dense layer with 64 units
model.add(layers.Dense(64, activation='relu'))  # add another layer
model.add(layers.Dense(46, activation='softmax'))  # 46 units (one per class); softmax gives class probabilities
- softmax activation in the last layer
  - The network will output a probability distribution over the 46 classes.
  - For every input sample, the network will produce a 46-dimensional output vector, where output[i] is the probability that the sample belongs to class i.
  - The sum of output[i] over all i will be 1.
- categorical_crossentropy loss
  - It measures the distance between two probability distributions: here, between the one-hot vector carried by the label and the probability vector output by the network.
  - That is, the distance between the probability distribution output by the network and the true distribution of the labels (a small numerical sketch follows below).
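As a rough numerical sketch (toy numbers, not actual model outputs): with a one-hot label, categorical crossentropy for one sample reduces to the negative log of the probability the model assigns to the true class.

In [ ]:
import numpy as np

true_dist = np.array([0., 0., 1., 0.])        # one-hot label: class 2 is correct
pred_dist = np.array([0.1, 0.2, 0.6, 0.1])    # softmax output of the network
# cross-entropy: -sum(p_true * log(p_pred)); with a one-hot label this is
# simply -log of the probability predicted for the true class
loss = -np.sum(true_dist * np.log(pred_dist))  # == -log(0.6), about 0.51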
In [17]:
# compile the model
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
Validation¶
- Use 1,000 samples in the training data as a validation set.
In [18]:
# set aside 1,000 samples as a validation set
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
In [19]:
history = model.fit(partial_x_train,
partial_y_train,
epochs=20,
batch_size=512,
validation_data=(x_val, y_val))
You can see that the validation loss decreases for a while and then starts to increase.¶
- Plotting the training and validation loss
In [20]:
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
- Plotting the training and validation accuracy
In [21]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Judging that about nine epochs is enough, we retrain the network.¶
- We can observe that the network begins to overfit after nine epochs.
- Retraining a model from scratch
In [22]:
# rebuild and recompile the model so that it really is retrained from scratch
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(partial_x_train, partial_y_train,
          epochs=9, batch_size=512,
          validation_data=(x_val, y_val))
results = model.evaluate(x_test, one_hot_test_labels)
In [23]:
print(results) # cross-entropy loss and accuracy
We can see this is a harder prediction task than binary classification.¶
- Retraining a model from scratch is not a good idea if we have a large-scale training set.
- In this case, we can use the callbacks functionality in keras.
- Before that, we need to mount Google Drive storage on our Colab instance.
In [ ]:
from google.colab import drive
drive.mount('/content/gdrive') # mount my Google Drive so it can be accessed from Colab
In [0]:
%cd /content/gdrive
In [0]:
!ls
In [0]:
%cd 'My Drive'/exp # change into an empty folder
In [0]:
# use a ModelCheckpoint callback
from tensorflow.keras.callbacks import ModelCheckpoint
filepath = '/content/gdrive/My Drive/exp/model.{epoch:02d}.hdf5' # everything under gdrive is synced with my Drive
modelckpt = ModelCheckpoint(filepath=filepath)
model.fit(partial_x_train,
partial_y_train,
epochs=20,
batch_size=512,
validation_data=(x_val, y_val),
callbacks=[modelckpt])
results = model.evaluate(x_test, one_hot_test_labels)
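As a side note, instead of writing a checkpoint every epoch, ModelCheckpoint can keep only the single best model by monitoring the validation loss (it relies on validation_data being passed to fit, as above). A minimal sketch; the file path is just an example:

In [ ]:
best_ckpt = ModelCheckpoint(
    filepath='/content/gdrive/My Drive/exp/best_model.hdf5',  # example path
    monitor='val_loss',      # watch the validation loss
    save_best_only=True)     # overwrite the file only when val_loss improves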
- Load the trained model at epoch 9
In [0]:
best_model_path = '/content/gdrive/My Drive/exp/model.09.hdf5'
best_model = models.load_model(best_model_path) # reload the model instance; all weights are restored as-is
In [0]:
results = best_model.evaluate(x_test, one_hot_test_labels)
print(results)
Generating predictions on new data¶
In [0]:
predictions = model.predict(x_test)
In [0]:
predictions.shape
In [0]:
predictions[0]
In [0]:
np.sum(predictions[0])
In [0]:
np.argmax(predictions[0])
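To relate these per-sample probabilities back to an overall score, a small sketch (reusing the arrays already defined above) that derives the test accuracy from the argmax of each prediction:

In [ ]:
# predicted class for each sample = index of the largest probability;
# comparing against the integer test labels gives the overall accuracy
predicted_classes = np.argmax(predictions, axis=1)
print(np.mean(predicted_classes == test_labels))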
A different way to handle the labels and the loss¶
In [0]:
y_train = np.array(train_labels)
y_test = np.array(test_labels)
In [0]:
model.compile(optimizer='rmsprop',
loss='sparse_categorical_crossentropy', # use this if one-hot encoding the labels is a hassle
metrics=['acc'])
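With this loss the integer label arrays can be passed to fit() directly, with no one-hot encoding step. A brief sketch (here the full training set is used, without the validation split from before):

In [ ]:
# with the sparse loss, targets stay as integer class indices
model.fit(x_train, y_train, epochs=9, batch_size=512)
results = model.evaluate(x_test, y_test)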
The importance of having sufficiently large hidden layers¶
In [25]:
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(4, activation='relu')) # what if only 4 units? The layer's capacity shrinks; too many units, on the other hand, can lead to overfitting
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit(partial_x_train,
partial_y_train,
epochs=20,
batch_size=128,
validation_data=(x_val, y_val))
Out[25]:
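To quantify how much the 4-unit bottleneck hurts, one could evaluate this smaller model on the test set and compare against the earlier result (a quick sketch, reusing the arrays defined above):

In [ ]:
# the gap versus the earlier 64-unit model reflects information lost by
# squeezing 46-way class information through only 4 hidden units
results = model.evaluate(x_test, one_hot_test_labels)
print(results)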
In [0]:
Sangheum Hwang [deep learning class]