Multinomial logistic regression#
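Let \(\mathcal Y\) be a finite set, e.g. \(\mathcal Y=\{1, 2, \ldots, K\}\), and let the training dataset be
\[
\mathcal D = \{(\boldsymbol x_i, y_i)\}_{i=1}^n, \quad \boldsymbol x_i \in \mathbb R^d, \quad y_i \in \mathcal Y.
\]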
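Multinomial logistic regression predicts a vector of probabilities
\[
\boldsymbol{\widehat y} = (\widehat y_1, \ldots, \widehat y_K), \quad \widehat y_k > 0, \quad \sum_{k=1}^K \widehat y_k = 1,
\]
where \(\widehat y_k\) is the predicted probability of class \(k\).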
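How do we predict these probabilities? We can use a linear model whose output is a vector of \(K\) numbers, one score per class:
\[
z_k = \boldsymbol x^\top \boldsymbol w_k, \quad \boldsymbol w_k \in \mathbb R^d, \quad k = 1, \ldots, K.
\]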
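Now convert the vector \(\boldsymbol z \in \mathbb R^K\) to the vector of probabilities \(\boldsymbol{\widehat y}\) via softmax:
\[
\widehat y_k = \mathrm{Softmax}(\boldsymbol z)_k = \frac{e^{z_k}}{\sum\limits_{j=1}^K e^{z_j}}, \quad k = 1, \ldots, K.
\]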
If we need to pick a single class, we can choose the most probable one:
\[
\widehat k = \arg\max\limits_{1 \leqslant k \leqslant K} \widehat y_k.
\]
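The softmax and argmax steps can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `softmax`, the array names, and the toy scores are assumptions made for the example.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Hypothetical scores for 2 samples and 3 classes;
# the second row would overflow in a naive exp(z) / sum(exp(z))
Z_toy = np.array([
    [2.0, 1.0, 0.1],
    [1000.0, 1001.0, 1002.0],
])

Y_hat_toy = softmax(Z_toy)                # each row is a probability vector
labels_toy = Y_hat_toy.argmax(axis=1)     # most probable class for each row

print(Y_hat_toy)
print(Y_hat_toy.sum(axis=1))              # every row sums to 1 (up to rounding)
print(labels_toy)
```

Note that `argmax` returns 0-based indices, whereas the classes in the text are numbered from \(1\) to \(K\).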
The parameters \(\boldsymbol w_k\) naturally form a matrix
\[
\boldsymbol W = [\boldsymbol w_1 \; \boldsymbol w_2 \; \ldots \; \boldsymbol w_K].
\]
Q. What is the shape of this matrix? How many parameters does multinomial regression have?
Loss function#
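The optimal parameters \(\boldsymbol W\) are solutions of the following optimization problem:
\[
\mathcal L(\boldsymbol W) = \sum\limits_{i=1}^n \sum\limits_{k=1}^K [y_i = k] \log \widehat y_{ik} \to \max\limits_{\boldsymbol W}, \tag{27}
\]
where \([y_i = k]\) equals \(1\) if \(y_i = k\) and \(0\) otherwise, and \(\widehat y_{ik} = \frac{e^{\boldsymbol x_i^\top \boldsymbol w_k}}{\sum_{j=1}^K e^{\boldsymbol x_i^\top \boldsymbol w_j}}\) is the predicted probability of class \(k\) for the \(i\)-th sample.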
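If the targets \(y_i\) are one-hot encoded, then they form a matrix
\[
\boldsymbol Y = (y_{ik}) \in \{0, 1\}^{n \times K}, \quad y_{ik} = [y_i = k], \quad \sum\limits_{k=1}^K y_{ik} = 1.
\]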
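Accordingly, the loss function (27) can be written as
\[
\mathcal L(\boldsymbol W) = \sum\limits_{i=1}^n \sum\limits_{k=1}^K y_{ik} \log \widehat y_{ik},
\]
which is the cross-entropy loss (3) between the targets and the predictions, summed over the training samples and taken with the opposite sign.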
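The sketch below evaluates the log-likelihood \(\sum_i \log \widehat y_{i, y_i}\) in two ways: by picking the probability of the true class directly, and through one-hot encoded targets. The two values coincide, and the cross-entropy is the same number with the opposite sign. The array names and toy numbers are assumptions for illustration, and classes are indexed from 0, as is usual in NumPy.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 samples and 3 classes (rows sum to 1)
Y_hat = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.25, 0.25],
])
y = np.array([0, 1, 2, 0])        # true classes, 0-based

# Indicator form: take the log-probability of the true class of each sample
loglik_indicator = np.log(Y_hat[np.arange(len(y)), y]).sum()

# One-hot form: Y[i, k] = 1 if y[i] == k else 0
Y = np.eye(Y_hat.shape[1])[y]
loglik_one_hot = (Y * np.log(Y_hat)).sum()

# Cross-entropy is the same quantity with the opposite sign
cross_entropy = -loglik_one_hot
print(loglik_indicator, loglik_one_hot, cross_entropy)
```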
Question
Denote \(\boldsymbol{\widehat Y} = (\widehat y_{ik}) = \mathrm{Softmax}(\boldsymbol {XW})\) (softmax is applied to each row). Rewrite the loss function (27) in matrix form.
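Answer
\[
\mathcal L(\boldsymbol W) = \sum\limits_{i=1}^n \sum\limits_{k=1}^K y_{ik} \log \widehat y_{ik} = \mathrm{tr}\big(\boldsymbol Y^\top \log \boldsymbol{\widehat Y}\big),
\]
where the logarithm is applied elementwise.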
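Regularized version:
\[
\sum\limits_{i=1}^n \sum\limits_{k=1}^K y_{ik} \log \widehat y_{ik} - C \Vert \boldsymbol W \Vert_F^2 \to \max\limits_{\boldsymbol W},
\]
where \(C > 0\) controls the regularization strength and \(\Vert \cdot \Vert_F\) is the Frobenius norm.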
Q. Why do we see minus before the regularization term?
Example: MNIST#
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
%config InlineBackend.figure_format = 'svg'
X, Y = fetch_openml('mnist_784', return_X_y=True, parser='auto')
X = X.astype(float).values / 255
Y = Y.astype(int).values
Visualize data#
Visualize some random samples:
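The helper `plot_digits` is called in this example, but its definition is not included here. A minimal sketch of such a helper follows. It assumes 28x28 images stored as rows of a NumPy array, and its signature (`y_pred`, `n`, `random_state` and their defaults) is a guess that merely matches the calls made in this section.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_digits(X, y_true, y_pred=None, n=4, random_state=123):
    """Show an n x n grid of randomly chosen 28x28 digit images.

    Hypothetical helper: the signature mirrors the calls in this example,
    but the original definition is not available.
    """
    rng = np.random.default_rng(random_state)
    indices = rng.choice(len(X), size=n * n, replace=False)
    plt.figure(figsize=(2 * n, 2 * n))
    for pos, idx in enumerate(indices, start=1):
        plt.subplot(n, n, pos)
        plt.imshow(np.asarray(X[idx]).reshape(28, 28), cmap="gray")
        plt.xticks([])
        plt.yticks([])
        if y_pred is None:
            title = str(y_true[idx])
        else:
            title = f"{y_true[idx]} / {y_pred[idx]}"   # true label / prediction
        plt.title(title, size=20)
    plt.tight_layout()
    plt.show()
```

With this sketch, `n * n` must not exceed the number of samples, and passing predictions as the third argument adds them to each title after the true label.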
plot_digits(X, Y, random_state=12)
Splitting into train and test#
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=10000)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Check that the classes are balanced:
np.unique(y_test, return_counts=True)
Fit and evaluate#
Fit the logistic regression:
%%time
LR = LogisticRegression(max_iter=100)
LR.fit(X_train, y_train)
Make predictions:
y_hat = LR.predict(X_test)
y_hat
We can also predict probabilities:
y_proba = LR.predict_proba(X_test)
y_proba[:3]
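For a multiclass problem fitted with the default solver, the probabilities returned by `predict_proba` should coincide with the softmax of the linear scores returned by `decision_function`. The sketch below checks this on the small `load_digits` dataset, which is used here only to keep the run short and is not the MNIST data above. It relies on `scipy.special.softmax`, and the variable names are illustrative.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Small 8x8 digits dataset, scaled to [0, 1] (pixel values range from 0 to 16)
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16

clf = LogisticRegression(max_iter=1000)
clf.fit(X_small, y_small)

scores = clf.decision_function(X_small[:5])   # linear scores, shape (5, 10)
proba = clf.predict_proba(X_small[:5])        # probabilities, shape (5, 10)

# Compare softmax of the scores with predict_proba,
# and the most probable class with the output of predict
print(np.allclose(softmax(scores, axis=1), proba))
print(np.array_equal(clf.classes_[proba.argmax(axis=1)], clf.predict(X_small[:5])))
```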
Calculate metrics:
print("Accuracy:", accuracy_score(y_test, y_hat))
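Accuracy is the fraction of samples whose predicted class equals the true class. A small sketch on hypothetical toy labels (not the MNIST predictions) compares a manual computation with `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy labels, for illustration only
y_true_toy = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y_pred_toy = np.array([3, 1, 4, 7, 5, 9, 2, 0])

manual_accuracy = np.mean(y_true_toy == y_pred_toy)   # 6 of 8 predictions match
print(manual_accuracy, accuracy_score(y_true_toy, y_pred_toy))
```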
Visualize performance#
plt.figure(figsize=(10, 8))
plt.title("Logistic regression on MNIST")
# Rows are true digits, columns are predicted digits; fmt="d" prints integer counts
sns.heatmap(confusion_matrix(y_test, y_hat), annot=True, fmt="d");

Plot some samples with predictions and ground truths:
plot_digits(X_test, y_test, y_hat)