GitHub - samankhalilinia/MachineLearning-SupportVectorMachines: Support vector machines implemented from scratch in Python.

Machine Learning-Support Vector Machines

Description:

A Python script to estimate from scratch Support Vector Machines for linear, polynomial and Gaussian kernels utilising the quadratic programming optimisation algorithm from library CVXOPT.
Support Vector Machines implemented from scratch and compared to scikit-learn's implementation.
All labelled examples are simulated data.

Linear hard margin fits linearly separable data. Linear soft margin fits linearly separable data with some overlap in class examples. Both Gaussian and polynomial kernels estimate non-linearly separable examples.

Using this Python Script	Using SKLearn Library
Hard Margin	Hard Margin
Soft Margin	Soft Margin
Polynomial Kernel	Polynomial Kernel
Gaussian Kernel	Gaussian Kernel

Given two classes of labelled examples, we are interested in finding a decision boundary resulting from an appropriate choice of support vectors.

Model

Simulate labelled training dataset where there are N couples of and k is the number (dimension) of x variables.
We are interested in SVM disrimitive analysis by finding the optimum decision boundary resulting from a choice of S support vectors. This SVM optimization problem is a constrained convex quadratic optimization problem. Meaning it can be formulated in terms of Lagrange multipliers. For the Lagrange multipliers to be applied under constraints, can use the Wolfe Dual Principle which is appropriate as long as the Karush Kuhn Tucker (KKT) conditions are satisfied. Further, as this is a convex problem, Slater’s condition tells us that strong duality holds such that the dual and primal optimums give the solution.
The Wolfe dual soft margin formula with kernel is given by

Where

are the Lagrange multipliers, is the kernel function, N are the number of training samples in the dataset, x is the matrix of training samples, y is the vector of target values, C is a supplied hyperparameter.
The non-zero Lagrange multipliers are the data points which contribute to the formation of the decision boundary.
The hypothesis function is the decision boundary. The hypothesis formula in terms of the Kernel function is given by:

Where S is the set of support vectors, is the Lagrange multiplier, b is the bias term, y is the target from the examples, is the Kernel and

Linear Kernel

Applies to linearly separable data in the space.
When the constraint C is zero. The data must be completely linearly separable and the decision boundary is referred to as 'hard margin'.
When the parameter C is non-zero, the approach allows for some overlap in the data and the decision boundary is referred to as 'soft-margin'.

Polynomial Kernel

Applies to non-linearly separable data in space.

Gaussian Kernel (aka Radial Basis Function (RBF))

Applies to non-linearly separable data in . space.

Where is a free scalar parameter chosen based on the data and defines the influence of each training example.

CVXOPT Library

Note: indicates component-wise vector inequalities. It means that each row of the matrix represents an inequality that must be satisfied.

To use the CVXOPT convex solver API. The Wolfe dual soft margin formula is re-written as follows

Where

G is a Gram matrix of all possible dot products of vectors .

How to use

python supportVectorMachines.py

Expected Output

Estimating kernel: linear
     pcost       dcost       gap    pres   dres
 0: -1.4469e+01 -2.6709e+01  5e+02  2e+01  2e+00
 1: -1.6173e+01 -1.1054e+01  1e+02  6e+00  6e-01
 2: -2.1081e+01 -1.0259e+01  1e+02  4e+00  3e-01
 3: -6.7928e+00 -4.0658e+00  1e+01  4e-01  3e-02
 4: -3.1094e+00 -3.4085e+00  9e-01  2e-02  2e-03
 5: -3.2416e+00 -3.2970e+00  7e-02  4e-04  4e-05
 6: -3.2908e+00 -3.2914e+00  7e-04  4e-06  4e-07
 7: -3.2913e+00 -3.2913e+00  7e-06  4e-08  4e-09
 8: -3.2913e+00 -3.2913e+00  7e-08  4e-10  4e-11
 9: -3.2913e+00 -3.2913e+00  7e-10  4e-12  4e-13
10: -3.2913e+00 -3.2913e+00  7e-12  4e-14  7e-15
Optimal solution found.
============================================================
        SUPPORT VECTOR MACHINE TERMINATION RESULTS
============================================================
               **** In-Sample: ****
3 support vectors found from 160 examples.
               **** Predictions: ****
40 out of 40 predictions correct.
============================================================
Finished

Highlights

SVMs are particularly well-suited for classification of complex but small- or medium-sized linear and non-linearably separable datasets.
The SKLearn SVC class is based on the libsvm library, which implements an algorithm that supports the kernel trick. The training time complexity is usually between and . This means that training becomes slow when the number of training instances are large.
Support Vector Machines in Machine Learning generally follow
Logistic regression in studies. This work is a natural stepping stone and advances the candidates Machine Learning knowledge from this step as follows:

Strong introduction to vectors and vector operations.
Appreciation and usage of advanced optimisation routines such as convex optimisation and Sequential Minimal Optimization (SMO).
SKLearn implements SVM optimisation using LIBSVM which utilises an SMO routine.

Requirements

Python (>2.7), Numpy, CVXOPT, sklearn and matplotlib.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
images		images
svgs		svgs
README.md		README.md
README.md_tex		README.md_tex
datageneration.py		datageneration.py
plotting.py		plotting.py
requirements.txt		requirements.txt
supportVectorMachines.py		supportVectorMachines.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Machine Learning-Support Vector Machines

Description:

Using this Python Script

Using SKLearn Library

Hard Margin

Hard Margin

Soft Margin

Soft Margin

Polynomial Kernel

Polynomial Kernel

Gaussian Kernel

Gaussian Kernel

Model

Linear Kernel

Polynomial Kernel

Gaussian Kernel (aka Radial Basis Function (RBF))

CVXOPT Library

How to use

Expected Output

Highlights

Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Machine Learning-Support Vector Machines

Description:

Using this Python Script

Using SKLearn Library

Hard Margin

Hard Margin

Soft Margin

Soft Margin

Polynomial Kernel

Polynomial Kernel

Gaussian Kernel

Gaussian Kernel

Model

Linear Kernel

Polynomial Kernel

Gaussian Kernel (aka Radial Basis Function (RBF))

CVXOPT Library

How to use

Expected Output

Highlights

Requirements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages