My university offers a Certificate of Advanced Study in Data Science:

As the world’s data grow exponentially, organizations need to understand, manage and use big data sets. These data spawned the term “big data,” which now monopolizes forward-thinking business dialogue.

Well, I have some medium-size data from last semester: the grades of business calculus students at the academic drop deadline. At that moment the available data consisted of 18 homework sets, 7 quizzes, and 1 midterm exam: 26 dimensions total. Based on these data, can we tell whether the student should drop the course?

Looks like we need some dimension reduction here. I achieved it by simply replacing the 18 homework scores with their average, and likewise for the 7 quizzes. This has the effect of projecting 26-dimensional data to 3-dimensional space. I scaled each coordinate to the interval ${[0,1]}$, so that the data are represented by 139 points in the cube ${[0,1]^3}$.
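A minimal sketch of this reduction in Python, for readers who prefer code to spreadsheets. The function name and the maximum scores (`hw_max`, `quiz_max`, `exam_max`) are assumptions made for illustration; the actual point scales of the course are not stated here. The coordinate order (exam, quiz average, homework average) follows the later remark that ${x}$ is the exam and ${z}$ is the homework average, which leaves ${y}$ for quizzes.

```python
def reduce_student(homework, quizzes, exam,
                   hw_max=10.0, quiz_max=10.0, exam_max=100.0):
    """Map one student's raw scores to a point (x, y, z) in the unit cube.

    x = exam score, y = quiz average, z = homework average,
    each divided by its maximum possible value (the maxima are assumed).
    """
    x = exam / exam_max
    y = sum(quizzes) / len(quizzes) / quiz_max
    z = sum(homework) / len(homework) / hw_max
    return (x, y, z)


# Made-up scores for one hypothetical student: 18 homework sets, 7 quizzes, 1 exam.
point = reduce_student([10] * 18, [5] * 7, 80)
```

Averaging is a linear map, so this really is a projection of the 26-dimensional score vector onto three coordinates, followed by rescaling.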

Suppose that the minimal acceptable grade is C-; the grades of D or F are worse than dropping the class. Accordingly, the points are colored blue (C- or better) or red (D or F).

The goal is to find a plane ${Ax+By+Cz=D}$ that separates the red dots from the blue ones. Since there is no plane that separates the groups perfectly, the task calls for a soft-margin support vector machine. I decided to get my hands dirty with actual computations in a spreadsheet.

My objective is

$\displaystyle \mathcal{E}(A,B,C,D) = \sum_{i=1}^{139} ((0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+)^2 \rightarrow \min$

subject to the normalization ${A+B+C=1}$. Here ${t^+=\max(t,0)}$ denotes the positive part, ${\epsilon_i=1}$ for blue dots and ${\epsilon_i=-1}$ for red dots. The logic is that I want ${Ax_i+By_i+Cz_i-D}$ to have the sign ${\epsilon_i}$, that is, the point should be on the correct side of the separating plane. Squaring the penalty terms makes ${\mathcal{E}}$ continuously differentiable, which helps with gradient-based minimization. The term ${0.1}$ introduces a penalty for being too close to the separating plane; this ought to improve the quality of separation.
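The objective translates almost word for word into code. The sketch below is not the spreadsheet itself; the function name and the `margin` parameter (set to the 0.1 used above) are my own labels, and the sample points are made up.

```python
def squared_hinge_objective(A, B, C, D, points, labels, margin=0.1):
    """Sum of squared positive parts of margin - eps * (A x + B y + C z - D).

    points: iterable of (x, y, z); labels: +1 for blue dots, -1 for red dots.
    """
    total = 0.0
    for (x, y, z), eps in zip(points, labels):
        slack = margin - eps * (A * x + B * y + C * z - D)
        total += max(slack, 0.0) ** 2  # only violations of the margin are penalized
    return total


# Two made-up points tested against the plane x = 0.5 (A=1, B=C=0, D=0.5).
well_separated = squared_hinge_objective(1, 0, 0, 0.5, [(1.0, 0.0, 0.0)], [1])
wrong_side = squared_hinge_objective(1, 0, 0, 0.5, [(1.0, 0.0, 0.0)], [-1])
```

A point that is on the correct side and at least 0.1 away (in the value of ${Ax+By+Cz-D}$) contributes nothing; a point on the wrong side contributes the square of its shortfall.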

The gradient of ${\mathcal{E}}$ is easy to calculate: for example,

$\displaystyle \frac{ \partial \mathcal{E}}{\partial A}= -2 \sum_{i=1}^{139} \epsilon_i x_i (0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+$
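For reference, the other partial derivatives come from the same chain-rule computation. These are derived from the objective above rather than copied from the spreadsheet:

$\displaystyle \frac{ \partial \mathcal{E}}{\partial B}= -2 \sum_{i=1}^{139} \epsilon_i y_i (0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+$

$\displaystyle \frac{ \partial \mathcal{E}}{\partial C}= -2 \sum_{i=1}^{139} \epsilon_i z_i (0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+$

$\displaystyle \frac{ \partial \mathcal{E}}{\partial D}= 2 \sum_{i=1}^{139} \epsilon_i (0.1-\epsilon_i (Ax_i+By_i+Cz_i-D))^+$

If ${C}$ is eliminated through ${C=1-A-B}$, the derivative of the reduced function in ${A}$ is ${\partial \mathcal{E}/\partial A-\partial \mathcal{E}/\partial C}$, and likewise for ${B}$.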

The spreadsheet recalculated both ${\mathcal{E}}$ and ${\nabla \mathcal{E}}$ every time I changed the values of ${A,B,D}$ (the value of ${C}$ was set to ${1-A-B}$). This made it easy to do gradient descent “by hand”, starting with ${A=B=C=1/3}$ and ${D=0.7}$.
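The same by-hand procedure can be sketched as a loop. This is an illustration under stated assumptions, not a record of the steps I actually took: the fixed step size, the iteration count, the function names, and above all the six data points are invented for the example; only the starting values ${A=B=1/3}$, ${D=0.7}$ and the substitution ${C=1-A-B}$ come from the text.

```python
def objective_and_gradient(A, B, D, points, labels, margin=0.1):
    """Return E and its partial derivatives in (A, B, D) with C = 1 - A - B."""
    C = 1.0 - A - B
    E = gA = gB = gD = 0.0
    for (x, y, z), eps in zip(points, labels):
        slack = max(margin - eps * (A * x + B * y + C * z - D), 0.0)
        E += slack ** 2
        # C depends on A and B, so x becomes (x - z) and y becomes (y - z).
        gA += -2.0 * eps * (x - z) * slack
        gB += -2.0 * eps * (y - z) * slack
        gD += 2.0 * eps * slack
    return E, (gA, gB, gD)


def descend(points, labels, A=1/3, B=1/3, D=0.7, step=0.01, iterations=2000):
    """Plain gradient descent with a fixed step (step and count are assumptions)."""
    for _ in range(iterations):
        _, (gA, gB, gD) = objective_and_gradient(A, B, D, points, labels)
        A, B, D = A - step * gA, B - step * gB, D - step * gD
    return A, B, 1.0 - A - B, D


# Made-up toy data, NOT the course grades: (exam, quiz, homework) in [0, 1].
toy_points = [(0.9, 0.8, 0.9), (0.8, 0.7, 1.0), (0.7, 0.9, 0.8),
              (0.4, 0.5, 0.9), (0.5, 0.3, 0.7), (0.3, 0.6, 1.0)]
toy_labels = [1, 1, 1, -1, -1, -1]

E_start, _ = objective_and_gradient(1/3, 1/3, 0.7, toy_points, toy_labels)
A, B, C, D = descend(toy_points, toy_labels)
E_end, _ = objective_and_gradient(A, B, D, toy_points, toy_labels)
```

The step size that works for six points would likely be too large for 139, since the gradient grows with the number of terms in the sum; in the spreadsheet I could simply adjust the step by eye.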

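At the final step the gradient of ${\mathcal {E}}$ is parallel to the gradient of the constraint function ${A+B+C}$, which is ${(1,1,1,0)}$: the partial derivatives in ${A}$, ${B}$, ${C}$ are equal and the one in ${D}$ vanishes, so we are at a critical point. Since ${\mathcal{E}}$ is convex and the constraint is linear, this critical point is a minimum. The optimal plane has the equation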

$\displaystyle 0.4703x+0.3285y+0.2012z=0.69035$

which tells us that Exam 1 score $x$ is the most effective predictor while homework average $z$ is the least effective. (Homework was done online, with unlimited attempts, and no control over who actually did the work.)

The plane gives the correct prediction in 123 out of 139 cases. Not bad, considering that the data is incomplete and somewhat biased; it does not include the students who made a forward-thinking business decision and dropped the class by the deadline.
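Applying the plane as a classifier takes only a few lines. In this sketch the coefficients are the ones found above, while the function names and the two test points are made up for illustration; the tie-breaking rule (a point exactly on the plane counts as red) is also my assumption.

```python
PLANE = (0.4703, 0.3285, 0.2012, 0.69035)  # A, B, C, D of the optimal plane


def predict(point, plane=PLANE):
    """Return +1 (C- or better) if the point lies above the plane, else -1."""
    A, B, C, D = plane
    x, y, z = point
    return 1 if A * x + B * y + C * z - D > 0 else -1


def accuracy(points, labels, plane=PLANE):
    """Fraction of points whose predicted sign matches the label."""
    hits = sum(predict(p, plane) == eps for p, eps in zip(points, labels))
    return hits / len(points)


# Two made-up students: perfect scores and all zeros.
top, bottom = predict((1.0, 1.0, 1.0)), predict((0.0, 0.0, 0.0))

# The success rate reported above: 123 correct out of 139.
reported_rate = round(123 / 139, 3)
```

The reported 123 out of 139 corresponds to an accuracy of about 88.5%.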

## One thought on “Forward-thinking business dialogue”

1. Lianxin says:

Even students with points above the green plane might consider dropping. They may or may not perform well in the final, which is an inherent risk for them; the risk can be quantified through the outliers to the green plane. Dropping the class, by contrast, is a sure withdrawal with no uncertainty at all. Each student should calculate their own optimal exit boundary according to their own risk preference through a mean-variance optimization. Assuming students are risk averse, they should exit at a plane above the green one; except, of course, for the intense gamblers.