We had a request for coverage of several topics, including support vector machines and fraud detection. We'll run through several flavors of SVM and how to compute them, and in doing so we'll cover their use in fraud (or outlier) detection. Here are the references we'll use along the way.
Notes from Stanford class CS229, covering basic SVM: 13-svm.pdf
Microsoft tech report on using SVM for single-class classification: OneClassSVM-tr-99-87.pdf. The method essentially estimates the support of an unknown density function from samples. You can think of this as finding the region of feature space where all the normal cases (non-fraud, non-outlier) live. This approach has the strength that it trains well on huge volumes of normal data, which matters because fraud and outlier detection usually present relatively few examples of the cases we want to identify.
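To make the "support of the normal data" idea concrete, here is a minimal pure-Python sketch of a one-class SVM with an RBF kernel. It is not the class's oneClassSVM.R script: the function names (`rbf`, `fit_one_class`, `score`), the pairwise solver, and the values of `nu` and `gamma` are all illustrative assumptions. It solves the usual dual (minimize 0.5*a'Ka subject to 0 <= a_i <= 1/(nu*n) and sum(a) = 1) by repeatedly moving weight between the two points that most violate optimality.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def fit_one_class(X, nu=0.1, gamma=0.5, tol=1e-4, max_iter=5000):
    """Return (alpha, rho) for the one-class SVM dual, via pairwise updates."""
    assert 0.0 < nu < 1.0
    n = len(X)
    cap = 1.0 / (nu * n)                      # upper bound on each alpha_i
    K = [[rbf(a, b, gamma) for b in X] for a in X]
    alpha = [1.0 / n] * n                     # feasible start: sums to 1
    grad = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]

    def extremes():
        # i: smallest gradient among alphas that can grow;
        # j: largest gradient among alphas that can shrink.
        i = min((t for t in range(n) if alpha[t] < cap - 1e-12),
                key=grad.__getitem__)
        j = max((t for t in range(n) if alpha[t] > 1e-12),
                key=grad.__getitem__)
        return i, j

    for _ in range(max_iter):
        i, j = extremes()
        if grad[j] - grad[i] < tol:           # approximately optimal
            break
        curv = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 1e-12)
        step = min((grad[j] - grad[i]) / curv, cap - alpha[i], alpha[j])
        alpha[i] += step                      # shift weight from j to i
        alpha[j] -= step
        for t in range(n):
            grad[t] += step * (K[t][i] - K[t][j])

    i, j = extremes()
    rho = 0.5 * (grad[i] + grad[j])           # offset, midpoint of the final pair
    return alpha, rho


def score(x, X, alpha, rho, gamma=0.5):
    """Non-negative inside the estimated support, negative outside."""
    return sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X) if a > 0.0) - rho


# Train only on "normal" cases (synthetic here), then score new points.
rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(150)]
alpha, rho = fit_one_class(normal, nu=0.1, gamma=0.5)
for candidate in [(0.2, -0.1), (12.0, 12.0)]:
    label = "normal" if score(candidate, normal, alpha, rho) >= 0 else "outlier"
    print(candidate, label)
```

A point far from every training example has kernel values near zero, so its score is roughly -rho and it is flagged as an outlier. No fraud examples are needed for training, and `nu` roughly controls the fraction of training points allowed to fall outside the estimated region.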
Here are some more references on fast algorithms for SVM:
fastKernelClassifiers-bordes-1.pdf
fastScalableLocalKernelMachine-segata.pdf
multiclassSVM-LaRank-bordes.pdf
p1737-bordes.pdf
stochasticSubGradient-shalev-shwartz.pdf
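Of the fast-algorithm references above, the stochastic subgradient approach is the simplest to sketch. Assuming stochasticSubGradient-shalev-shwartz.pdf is the Pegasos paper (an inference from the filename, not something this page confirms), a minimal pure-Python version for a linear SVM might look like the following. It omits the bias term and the optional projection step, and the names `pegasos`, `predict`, `lam` and `n_iter` are illustrative choices.

```python
import random


def pegasos(X, y, lam=0.01, n_iter=5000, seed=0):
    """Stochastic subgradient descent on the regularized hinge loss.

    X: list of feature tuples, y: list of labels in {-1, +1}.
    Returns the weight vector w (no bias term).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(X))             # pick one example at random
        eta = 1.0 / (lam * t)                 # decaying step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # Subgradient step for the regularizer: shrink w.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Subgradient step for the hinge loss, only if the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Classify x by the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Tiny synthetic example: two well-separated clusters.
rng = random.Random(1)
labels = [1 if k % 2 == 0 else -1 for k in range(100)]
points = [(3 * s + rng.gauss(0, 0.5), 3 * s + rng.gauss(0, 0.5)) for s in labels]
w = pegasos(points, labels)
print("weights:", w)
```

Each update touches a single example, so the cost per step does not grow with the size of the training set, which is what makes this family of methods attractive for large data.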
Here are a couple of papers evaluating the relative performance of different supervised learning algorithms (including SVM):
Modest-sized data sets: performanceCompSupervisedLearning-caruana.pdf
High-dimension data sets: perfEvalSupervisedLearningHighDim-caruana.pdf
Here's the R code for the examples from class:
svmExamp.R
oneClassSVM.R
Simple Paper on SVM: http://patriciahoffmanphd.com/resources/papers/SVM/SupportVectorJune3.pdf