SUPPORT VECTOR MACHINES

Linearly Separable Data

Mathematical Intuition Behind Support Vector Machines (SVM)

Support Vector Machines (SVMs) are supervised learning models used for classification and regression. They are particularly well suited to classification tasks, where they aim to find the optimal hyperplane that separates the classes in the feature space.

Key Concepts

  1. Hyperplane:

    • In an n-dimensional space, a hyperplane is an (n−1)-dimensional subspace that separates the space into two half-spaces.
    • For a 2D space, the hyperplane is a line; for a 3D space, it is a plane.
  2. Margin:

    • The margin is the distance between the hyperplane and the nearest data points from each class. These points are called support vectors.
    • SVM aims to maximize this margin, making the classifier more robust.
  3. Support Vectors:

    • These are the data points that are closest to the hyperplane and influence its position and orientation.
    • They lie on the edge of the margin.
  4. Optimal Hyperplane:

    • The optimal hyperplane is the one that maximizes the margin between the two classes.

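A minimal sketch of these ideas, assuming scikit-learn and a small made-up, linearly separable toy dataset: fitting a linear SVM exposes the support vectors and the margin width.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (assumed values, for illustration only).
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                      # normal vector of the learned hyperplane
print("support vectors:\n", clf.support_vectors_)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
```

Only the points printed as support vectors determine the hyperplane; removing any of the other points would leave it unchanged.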

Formulation

Given a dataset $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i \in \mathbb{R}^n$ are the feature vectors and $y_i \in \{-1, 1\}$ are the class labels, the goal is to find the hyperplane that best separates the classes. The hyperplane can be represented as:

$$w \cdot x - b = 0$$

where $w$ is the normal vector to the hyperplane and $b$ is the bias term. For any training point $x_i$, the decision rule is:

$$w \cdot x_i - b \ge 1 \quad \text{if } y_i = 1$$

$$w \cdot x_i - b \le -1 \quad \text{if } y_i = -1$$

This can be combined into one constraint:

$$y_i (w \cdot x_i - b) \ge 1$$
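As a quick numeric check, with hand-picked values for $w$, $b$, and a few labeled points (all made up for illustration), the combined constraint can be evaluated directly:

```python
import numpy as np

# Hypothetical hyperplane parameters and labeled points, chosen by hand.
w = np.array([1.0, 1.0])
b = 3.0
X = np.array([[2.0, 2.0], [2.5, 3.0], [0.0, 0.0], [0.5, 1.0]])
y = np.array([1, 1, -1, -1])

# y_i * (w . x_i - b) must be >= 1 for every training point.
margins = y * (X @ w - b)
print(margins)                 # [1.  2.5 3.  1.5]
print(np.all(margins >= 1))    # True: all points satisfy the constraint
```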

Objective Function

The objective is to maximize the margin, which is equivalent to minimizing $\|w\|$: the two margin boundaries $w \cdot x - b = 1$ and $w \cdot x - b = -1$ lie a distance $2/\|w\|$ apart, so a smaller $\|w\|$ means a wider margin. This leads to the following optimization problem:

$$\min_{w,\, b} \; \frac{1}{2} \|w\|^2$$

$$\text{subject to } y_i (w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m$$

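One way to see this problem concretely is to hand it to a generic convex solver. The sketch below assumes the cvxpy package and the same toy data as above; it is only a numerical check of the hard-margin primal, not part of the derivation.

```python
import cvxpy as cp
import numpy as np

# Same toy, linearly separable data as above (assumed values).
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin primal: minimize (1/2)||w||^2  s.t.  y_i (w . x_i - b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w - b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
print("margin width =", 2 / np.linalg.norm(w.value))
```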

Lagrangian Formulation

To solve this optimization problem, we use the method of Lagrange multipliers. The Lagrangian is formulated as:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 \right]$$

where $\alpha_i \ge 0$ are the Lagrange multipliers.


Dual Problem

The primal problem is converted to the dual by setting the derivatives of the Lagrangian with respect to $w$ and $b$ to zero (which gives $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$), substituting back, and then maximizing over the multipliers $\alpha$:

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)$$

$$\text{subject to } \alpha_i \ge 0 \text{ and } \sum_{i=1}^{m} \alpha_i y_i = 0$$
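The dual is a quadratic program in $\alpha$ alone. A minimal way to solve it numerically (a sketch, assuming SciPy and the same toy data) is to minimize the negated dual objective under the bound and equality constraints:

```python
import numpy as np
from scipy.optimize import minimize

# Same toy data as above (assumed values).
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m = len(y)

# Q_ij = y_i y_j (x_i . x_j)
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negate the dual objective so we can use a minimizer.
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(m),
    bounds=[(0.0, None)] * m,                            # alpha_i >= 0
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
)
alpha = res.x
print("alpha =", np.round(alpha, 4))
```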

Solution

Once the $\alpha_i$ are found by solving the dual problem, the weight vector $w$ and bias $b$ can be computed as:

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad b = w \cdot x_k - y_k$$

for any support vector $x_k$ (a training point with $\alpha_k > 0$, which lies exactly on the margin).
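Continuing the SciPy sketch above, $w$ and $b$ follow directly from the recovered multipliers; points whose $\alpha_i$ is noticeably above zero are the support vectors:

```python
# Recover the primal solution from the dual multipliers (continuing the sketch).
w = (alpha * y) @ X                 # w = sum_i alpha_i y_i x_i

support = alpha > 1e-6              # support vectors have alpha_i > 0
k = np.argmax(support)              # index of any one support vector
b = w @ X[k] - y[k]                 # b = w . x_k - y_k

print("w =", w, " b =", b)
print("support vectors:\n", X[support])
```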

Predictions

Given a new data point $x$, the decision function is:

$$f(x) = w \cdot x - b$$

The predicted class label is:

$$\hat{y} = \operatorname{sign}(f(x))$$
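In code, the decision function and the predicted label are one line each. The values of $w$, $b$, and the query point below are hand-picked for illustration:

```python
import numpy as np

def predict(x, w, b):
    """Return the predicted class label sign(w . x - b) for a single point x."""
    f = x @ w - b           # positive on one side of the hyperplane, negative on the other
    return int(np.sign(f))

x_new = np.array([2.2, 2.8])    # hypothetical query point
print(predict(x_new, w=np.array([1.0, 1.0]), b=3.0))   # -> 1
```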

For non-linearly separable data, we introduce slack variables $\xi_i$ to allow some misclassifications, leading to the soft-margin SVM. The optimization problem becomes:

$$\min_{w,\, b,\, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{subject to } y_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

where the regularization parameter $C > 0$ controls the trade-off between a wide margin and a small total slack.
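In scikit-learn this trade-off is exposed through the `C` parameter of `SVC`: a small `C` tolerates more margin violations, while a large `C` approaches hard-margin behaviour. A short sketch on a variant of the toy data (assumed values, with one +1 point placed inside the −1 cluster so the classes are not separable):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5], [0.4, 0.4],   # last +1 point sits inside the -1 cluster
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])
y = np.array([1, 1, 1, 1, -1, -1, -1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:6.1f}  margin width={2 / np.linalg.norm(w):.3f}  "
          f"#support vectors={len(clf.support_vectors_)}")
```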
