
Comment by steppi

3 years ago

The article you’ve linked is incorrect. As Dr_Birdbrain said, fitting an SVM is a convex problem with a unique global optimum. sklearn.svm.SVC relies on libsvm, which initializes the weights to 0 [0]. The random state is used only to shuffle the data when computing probability estimates with Platt scaling [1]. Regarding the random_state parameter, the sklearn documentation for SVC [2] says:

Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.
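A quick sketch illustrating the point (the dataset and seed values here are arbitrary, chosen just for the demonstration): with probability=False, the libsvm solver is deterministic given the same data, so models fit with different random_state values produce identical coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Arbitrary synthetic classification data.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Fit the same linear SVM with several different random_state values.
# probability defaults to False, so random_state should have no effect.
models = [SVC(kernel="linear", random_state=s).fit(X, y) for s in (0, 1, 42)]

# The fitted coefficients agree exactly: there is no random initialization.
for m in models[1:]:
    assert np.array_equal(models[0].coef_, m.coef_)
```

With probability=True, by contrast, random_state controls the data shuffling for the internal cross-validation used by Platt scaling, so predict_proba outputs can vary slightly across seeds even though the decision function does not.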

[0] https://github.com/scikit-learn/scikit-learn/blob/2a2772a87b...

[1] https://en.wikipedia.org/wiki/Platt_scaling

[2] https://scikit-learn.org/stable/modules/generated/sklearn.sv...

Which article is incorrect? Indeed, it looks like there is no random initialization in libsvm, nor therefore in sklearn.svm.SVC or anywhere else in sklearn.svm.*. I seem to have confused the random initialization used in simulated annealing with SVMs. Though now TIL that there are annealing-based SVMs, that SVMs do work with wave functions (though mapping the wave functions into feature space with quantum state tomography is optional), and that there are SVMs for the D-Wave quantum annealer.

From "Support vector machines on the D-Wave quantum annealer" (2020) https://www.sciencedirect.com/science/article/pii/S001046551... :

Kernel-based support vector machines (SVMs) are supervised machine learning algorithms for classification and regression problems. We introduce a method to train SVMs on a D-Wave 2000Q quantum annealer and study its performance in comparison to SVMs trained on conventional computers. The method is applied to both synthetic data and real data obtained from biology experiments. We find that the quantum annealer produces an ensemble of different solutions that often generalizes better to unseen data than the single global minimum of an SVM trained on a conventional computer, especially in cases where only limited training data is available. For cases with more training data than currently fits on the quantum annealer, we show that a combination of classifiers for subsets of the data almost always produces stronger joint classifiers than the conventional SVM for the same parameters.