Alexandre Trilla, PhD - Data Scientist

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome.


Post 61

Support Vector Machines and the importance of regularisation to approach the optimal boundary

27-Nov-2011

In this post I would like to show some empirical evidence of the importance of model regularisation, which controls complexity so as not to overfit the data and thus generalise better. To this end, I make use of this week's work on Support Vector Machines (SVM) in the Machine Learning class. Firstly, a scenario similar to the one in Post 52 is set up. Then, the optimal Bayes boundary is drawn, along with the boundary learnt by an unregularised SVM (C=100; note that in the SVM formulation, C is inversely related to the amount of regularisation, so a large C means little regularisation). Up to this point, the performance of this classifier is expected to be as modest as that of the former unregularised Logistic Regression: since it depends tightly on the observed samples, the slope of its linear boundary differs considerably from the optimum. Instead, when a considerable amount of regularisation is applied, i.e., an SVM with C=1, the model is more conservative and reluctant to fit outliers, so its final slope approaches the optimum, thus increasing its expected effectiveness.

[Figure: optimal Bayes boundary vs. linear SVM boundaries with C=100 (unregularised) and C=1 (regularised)]

The Octave code used to generate the figure is available here. Note that the rest of the ml-class files need to be present alongside this script.
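
For reference, here is a minimal Octave sketch along the lines of that script, not the original code itself: the data-generation step (two Gaussian classes with equal spherical covariance, whose Bayes boundary is then the perpendicular bisector of the class means), the class means and the sample size are assumptions for illustration. It only relies on the ml-class files svmTrain.m and linearKernel.m being on the path.

    % Minimal sketch, not the original script: class means, sample size
    % and plotting details are illustrative assumptions.
    % Requires the ml-class files svmTrain.m and linearKernel.m.

    % Two Gaussian classes with equal spherical covariance: the optimal
    % Bayes boundary is the perpendicular bisector of the means,
    % which for means (0,0) and (2,2) is the line x2 = 2 - x1.
    N = 50;
    X = [randn(N, 2); randn(N, 2) + repmat([2 2], N, 1)];
    y = [zeros(N, 1); ones(N, 1)];

    % Little regularisation (C = 100) vs strong regularisation (C = 1).
    modelUnreg = svmTrain(X, y, 100, @linearKernel);
    modelReg   = svmTrain(X, y, 1,   @linearKernel);

    % A linear SVM boundary satisfies w'x + b = 0,
    % i.e. x2 = -(w(1)*x1 + b) / w(2).
    xp = linspace(min(X(:, 1)), max(X(:, 1)), 100)';
    boundary = @(model) -(model.w(1) * xp + model.b) / model.w(2);

    figure; hold on;
    plot(X(y == 0, 1), X(y == 0, 2), 'ko');
    plot(X(y == 1, 1), X(y == 1, 2), 'k+');
    plot(xp, boundary(modelUnreg), 'r-');  % unregularised: swayed by outliers
    plot(xp, boundary(modelReg), 'b-');    % regularised: closer to optimum
    plot(xp, 2 - xp, 'k--');               % optimal Bayes boundary
    legend('class 0', 'class 1', 'C = 100', 'C = 1', 'Bayes');
    hold off;

The intuition behind the comparison: the soft-margin SVM minimises (1/2)||w||^2 + C*sum(slacks), so a small C gives more weight to the margin term relative to the slack penalties, which is what makes the C=1 boundary less sensitive to individual outlying samples.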


