Mathematical Intuition Behind Support Vector Machines (SVM)
Support Vector Machines (SVM) are supervised learning models used for
classification and regression. SVMs are particularly well-suited for
classification tasks and aim to find the optimal hyperplane that
separates different classes in the feature space.
Key Concepts
- Hyperplane:
- In an n-dimensional space, a hyperplane is an (n-1)-dimensional
subspace that separates the space into two half-spaces.
- For a 2D space, the hyperplane is a line; for a 3D space, it is a
plane.
- Margin:
- The margin is the distance between the hyperplane and the nearest
data points from each class. These points are called support
vectors.
- SVM aims to maximize this margin, making the classifier more
robust.
- Support Vectors:
- These are the data points that are closest to the hyperplane and
influence its position and orientation.
- They lie on the edge of the margin.
- Optimal Hyperplane:
- The optimal hyperplane is the one that maximizes the margin between
the two classes.
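To make these definitions concrete, here is a minimal NumPy sketch (the hyperplane parameters w, b and the points are made-up, illustrative values) that computes the signed distance of each point to the hyperplane w⋅x − b = 0 and the width of the margin, 2/∥w∥:

```python
import numpy as np

# Hypothetical hyperplane parameters and points (illustrative values only).
w = np.array([1.0, 1.0])     # normal vector of the hyperplane
b = 1.0                      # offset; the hyperplane is w . x - b = 0
points = np.array([[2.0, 2.0], [0.5, 0.5], [-1.0, -1.0]])

# Signed distance of each point to the hyperplane.
distances = (points @ w - b) / np.linalg.norm(w)
print("signed distances:", distances)

# If the closest points of each class satisfy |w . x - b| = 1,
# the margin (distance between the two supporting hyperplanes) is 2 / ||w||.
print("margin width:", 2.0 / np.linalg.norm(w))
```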

Objective Function
For training data (xi, yi) with labels yi ∈ {−1, +1}, we require every point to lie on the correct side of the hyperplane with a functional margin of at least 1: yi(w⋅xi − b) ≥ 1. Since the geometric margin equals 2/∥w∥, maximizing the margin is equivalent to minimizing ∥w∥ subject to this constraint. This leads to the following optimization problem:
min(w,b) 1/2 * ∥w∥^2
subject to yi(w⋅xi − b) ≥ 1 for all i
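This primal problem is small enough to hand to a generic constrained optimizer as a sanity check. The sketch below is only an illustration under my own assumptions (a toy data set and SciPy's SLSQP solver, neither of which comes from the text): it minimizes 1/2 ∥w∥^2 under the margin constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: two classes in 2D, labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1.0, 1, 1, -1, -1, -1])

# Decision variables packed as [w_1, w_2, b].
def objective(p):
    w = p[:2]
    return 0.5 * np.dot(w, w)          # (1/2) ||w||^2

def margin_constraints(p):
    w, b = p[:2], p[2]
    return y * (X @ w - b) - 1.0       # y_i (w . x_i - b) - 1 >= 0

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints={"type": "ineq", "fun": margin_constraints})
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```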
Dual Problem
Introducing a Lagrange multiplier αi ≥ 0 for each constraint, we form the Lagrangian, minimize it with respect to w and b, and substitute the result back in. This yields the dual problem:
max(α) Σi αi − 1/2 * Σi Σj αi αj yi yj (xi⋅xj)
subject to αi ≥ 0 for all i, and Σi αi yi = 0
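The dual can be solved numerically in the same spirit. This is again only a sketch with assumed toy data; the objective is negated solely because SciPy minimizes rather than maximizes.

```python
import numpy as np
from scipy.optimize import minimize

# Same toy data as in the primal sketch (illustrative values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1.0, 1, 1, -1, -1, -1])

# G_ij = y_i y_j (x_i . x_j)
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negative of the dual objective, since scipy minimizes.
    return 0.5 * alpha @ G @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method="SLSQP",
               bounds=[(0, None)] * len(y),                       # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y}) # sum alpha_i y_i = 0
alpha = res.x
print("alpha =", np.round(alpha, 4))  # non-zero entries mark the support vectors
```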
Solution
Once the αi are found by solving the dual problem, the weight vector w and bias b can be computed as:
w = Σi αi yi xi
b = w⋅xk − yk
for any support vector xk, i.e. any point with αk > 0.
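This recovery can also be checked against a library fit. The sketch below assumes scikit-learn is available, uses the same illustrative toy data, and approximates the hard-margin solution with a very large C; none of these choices come from the original text.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors,
# so w = sum_i alpha_i y_i x_i is a single matrix product.
w = clf.dual_coef_ @ clf.support_vectors_
# scikit-learn's decision function is w . x + intercept_, while this text
# writes f(x) = w . x - b, so b = -intercept_.
b = -clf.intercept_
print("w =", w.ravel(), "b =", b, "matches coef_:", np.allclose(w, clf.coef_))
```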
Predictions
Given a new data point x, the decision function is:
f(x) = w⋅x − b
The predicted class label is:
yhat = sign(f(x))
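As a tiny illustration of the decision rule, the snippet below applies sign(w⋅x − b) with hypothetical, hand-picked parameters w and b (not values derived anywhere in the text):

```python
import numpy as np

# Hypothetical parameters learned by an SVM (illustrative values).
w = np.array([0.4, 0.4])
b = 0.6

def predict(x):
    f = x @ w - b          # decision function f(x) = w . x - b
    return np.sign(f)      # predicted class label in {-1, +1}

print(predict(np.array([2.0, 2.0])), predict(np.array([-1.0, -1.0])))
```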
Soft Margin SVM
For non-linearly separable data, we introduce slack variables ξi ≥ 0 that allow some points to violate the margin, leading to the soft-margin SVM. The optimization problem becomes:
min(w,b,ξ) 1/2 * ∥w∥^2 + C * Σi ξi
subject to yi(w⋅xi − b) ≥ 1 − ξi and ξi ≥ 0 for all i
where C > 0 controls the trade-off between a wide margin and few margin violations.
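The effect of C can be seen empirically. This sketch assumes scikit-learn and uses made-up overlapping data (no separating hyperplane exists), comparing the resulting margin width and number of support vectors for a small and a large C:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes: no hyperplane separates them perfectly (illustrative data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [1.5, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# A smaller C tolerates more margin violations (larger slack); a larger C penalizes them.
for C in (0.1, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_.ravel()
    print(f"C={C}: margin = {2.0 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}")
```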