My university offers a Certificate of Advanced Study in Data Science:
As the world’s data grow exponentially, organizations need to understand, manage and use big data sets. These data spawned the term “big data,” which now monopolizes forward-thinking business dialogue.
Well, I have some medium-size data from last semester: the grades of business calculus students at the academic drop deadline. At that moment the available data consisted of 18 homework sets, 7 quizzes, and 1 midterm exam: 26 dimensions in total. Based on these data, can we tell whether a student should drop the course?
Looks like we need some dimension reduction here. I achieved it by simply replacing the 18 homework scores with their average, and likewise for the quizzes. This has the effect of projecting the 26-dimensional data to 3-dimensional space. I scaled each coordinate to the interval $[0,1]$, so that the data is represented by 139 points in the cube $[0,1]^3$.
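In numpy, the reduction might look like this sketch (the score arrays are randomly generated stand-ins, since the gradebook itself is not published):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in score arrays, one row per student; the real gradebook is not public.
hw = rng.random((139, 18))   # 18 homework sets
qz = rng.random((139, 7))    # 7 quizzes
ex = rng.random((139, 1))    # 1 midterm exam

# Replace the 18 homework columns and the 7 quiz columns by their averages:
# a projection of the 26-dimensional data onto 3 dimensions.
X = np.column_stack([hw.mean(axis=1), qz.mean(axis=1), ex[:, 0]])

# Scale each coordinate to [0, 1]; every student is now a point in the unit cube.
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```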
Suppose that the minimal acceptable grade is C-; a grade of D or F is worse than dropping the class. Accordingly, the points are colored blue (C- or better) or red (D or F).

The goal is to find a plane that separates the red dots from the blue ones. Since there is no plane that separates the groups perfectly, the task calls for a soft-margin support vector machine. I decided to get my hands dirty with actual computations in a spreadsheet.
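As a sanity check (though not the route taken here), a library such as scikit-learn fits this kind of soft-margin model in a few lines; `LinearSVC` minimizes a squared-hinge objective similar to the one defined below. The labels here are random stand-ins for the blue/red coloring:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in labels: +1 for blue (C- or better), -1 for red (D or F).
rng = np.random.default_rng(1)
sigma = np.where(rng.random(139) < 0.8, 1.0, -1.0)

clf = LinearSVC(C=1.0)            # soft-margin linear SVM; squared hinge loss by default
clf.fit(X, sigma)                 # X: the 139 x 3 feature matrix built above
print(clf.coef_, clf.intercept_)  # normal vector and offset of the fitted plane
```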
My objective is
$$F(a,b,c,d) = \sum_i \max\big(0,\ \epsilon - \sigma_i f(p_i)\big)^2$$
subject to the normalization $a^2 + b^2 + c^2 = 1$. Here
$$f(x,y,z) = ax + by + cz + d,$$
where $(x,y,z)$ are the homework average, the quiz average, and the exam score, and $\sigma_i = +1$ for blue dots and $\sigma_i = -1$ for red dots. The logic is that I want $f(p_i)$ to have the sign $\sigma_i$, that is, to be on the correct side of the separating plane. Squaring the penalty terms helps with minimization. The term $\epsilon$ introduces a penalty for being too close to the separating plane; this ought to improve the quality of separation.
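In code, the objective might look like this sketch (with `w` standing for the normal vector $(a,b,c)$ and `sigma` holding the $\pm 1$ colors):

```python
import numpy as np

def F(w, d, X, sigma, eps):
    """Sum of squared hinge penalties max(0, eps - sigma_i * f(p_i))^2,
    where f(p) = w . p + d, w = (a, b, c), and sigma_i is +1 or -1."""
    slack = np.maximum(0.0, eps - sigma * (X @ w + d))
    return np.sum(slack ** 2)
```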
The gradient of $F$ is easy to calculate: for example,
$$\frac{\partial F}{\partial a} = -2 \sum_i \sigma_i x_i \max\big(0,\ \epsilon - \sigma_i f(p_i)\big),$$
where $x_i$ is the first coordinate (the homework average) of $p_i$.
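The full gradient, in the same hypothetical notation; terms with zero slack drop out, which is what makes the squared hinge differentiable:

```python
def grad_F(w, d, X, sigma, eps):
    """Gradient of F with respect to (a, b, c, d)."""
    slack = np.maximum(0.0, eps - sigma * (X @ w + d))
    grad_w = -2.0 * (sigma * slack) @ X     # partial derivatives in a, b, c
    grad_d = -2.0 * np.sum(sigma * slack)   # partial derivative in d
    return grad_w, grad_d
```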
The spreadsheet recalculated both $F$ and $\nabla F$ every time I changed the values of $a, b, c, d$ (the value of $\epsilon$ was fixed at a small positive constant). This made it easy to do gradient descent “by hand”, starting with a rough initial guess for $(a,b,c)$ on the unit sphere and for $d$.
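Automating the by-hand process gives a projected gradient descent; the starting point, the step size, and the value of $\epsilon$ below are stand-ins, not the values used in the original spreadsheet:

```python
import numpy as np

def descend(X, sigma, eps=0.1, lr=0.01, steps=5000):
    """Projected gradient descent: ordinary descent steps, followed by
    renormalizing (a, b, c) back onto the unit sphere."""
    w = np.ones(3) / np.sqrt(3.0)   # stand-in start: plane tilted equally in all coordinates
    d = -0.5                        # stand-in start for the offset
    for _ in range(steps):
        gw, gd = grad_F(w, d, X, sigma, eps)
        w -= lr * gw
        d -= lr * gd
        w /= np.linalg.norm(w)      # re-impose a^2 + b^2 + c^2 = 1
    return w, d

w, d = descend(X, sigma)
```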

At the final step the gradient of $F$ is parallel to the gradient of the constraint function $g(a,b,c) = a^2 + b^2 + c^2$: we are at a critical point. Since $F$ is convex, the point is a minimum. In the optimal plane $ax + by + cz + d = 0$, the exam coefficient $c$ has the largest magnitude and the homework coefficient $a$ the smallest, which tells us that the Exam 1 score is the most effective predictor while the homework average is the least effective. (Homework was done online, with unlimited attempts, and no control over who actually did the work.)
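The parallel-gradient condition is easy to test numerically with the sketches above: in the $(a,b,c)$ directions $\nabla F$ should be a multiple of $\nabla g = 2(a,b,c)$, and the $d$-component of $\nabla F$ should vanish:

```python
gw, gd = grad_F(w, d, X, sigma, eps=0.1)

# At a constrained minimum, the (a, b, c) part of grad F is a multiple of
# grad g = 2(a, b, c), and the d-component of grad F is zero.
print(np.linalg.norm(np.cross(gw, w)))   # should be near 0 (parallel gradients)
print(gd)                                # should be near 0
```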

The plane gives the correct prediction in 123 out of 139 cases. Not bad, considering that the data is incomplete and somewhat biased; it does not include the students who made a forward-thinking business decision and dropped the class by the deadline.
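A tally like 123 out of 139 corresponds to a check of this shape (again with the stand-in names from the sketches above):

```python
pred = np.sign(X @ w + d)             # which side of the plane each student falls on
correct = int(np.sum(pred == sigma))  # number of students classified correctly
print(f"{correct} out of {len(sigma)} correct")
```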