Dummy Coding for Linear Regression

Praveen Alex Mathew
2 min read · Apr 14, 2020

In computer programs, an enum is typically used to represent a field that takes one of a fixed set of values. For instance, when representing a set of athletes, the field stating the sport played would be represented by an enum.

[Figure: Athlete’s years in career]

However, an enum numbers its elements in serial order, starting from 0.

#include <stdio.h>

/* Enumerators are numbered 0, 1, 2 in declaration order. */
enum sport { cricket, football, basketball };

int main(void) {
    printf("Cricket = %d\n", cricket);
    printf("Football = %d\n", football);
    printf("Basketball = %d\n", basketball);
    return 0;
}

OUTPUT:
Cricket = 0
Football = 1
Basketball = 2

Although this encoding is convenient for programmers, using it in a regression would mislead the model, because the numbers that represent the sports would be treated as magnitudes. In the athletes example, a linear regression model would treat the first sport in the enum, cricket, as ranking below all the other sports, the second sport, football, as ranking below every sport except the first, and so on. It would also assume that the gap between any two consecutive sports is the same.
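To make the problem concrete, here is a minimal sketch of what a one-feature linear model does when it is fed the raw enum value. The function name predict_from_code and the intercept and slope values are made up for illustration; they are not fitted to any data.

```c
enum sport { cricket, football, basketball };

/* Prediction of a linear model that uses the raw enum code as its input.
   The intercept and slope are hypothetical, not estimated from real data. */
double predict_from_code(double intercept, double slope, enum sport s) {
    /* cricket = 0, football = 1, basketball = 2 are treated as magnitudes. */
    return intercept + slope * (double)s;
}
```

With an intercept of 5.0 and a slope of 2.0, the predictions for cricket, football and basketball come out as 5.0, 7.0 and 9.0. Whatever the slope is, the step from cricket to football is forced to equal the step from football to basketball, which is an artifact of the numbering and not something the data said.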

One Hot Form

Using dummy variables, we convert the sport field from an enum into a table. Each element gets its own column in the table. For each record, the column of the selected element is set high (1) while all the other columns are set low (0).

[Figure: One Hot Form]
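The conversion from an enum value to a one-hot row can be sketched in a few lines of C. The helper name one_hot_encode and the NUM_SPORTS constant are assumptions made for this example, and the column order simply follows the order of the enum.

```c
enum sport { cricket, football, basketball };
#define NUM_SPORTS 3

/* Fill out[0..NUM_SPORTS-1] with the one-hot row for sport s:
   1 in the column that belongs to s, 0 in every other column. */
void one_hot_encode(enum sport s, int out[NUM_SPORTS]) {
    for (int col = 0; col < NUM_SPORTS; col++) {
        out[col] = (col == (int)s) ? 1 : 0;
    }
}
```

Calling one_hot_encode(football, row) on an int row[NUM_SPORTS] leaves row as {0, 1, 0}. No column is numerically larger than another, so the model no longer sees an ordering between the sports.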

Reference Category

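However, if all the columns are given to the regression model, the result is perfect multicollinearity: the columns of every record sum to 1, so any one column can be computed from the others, and a model with an intercept has no unique set of coefficients. To avoid this we choose a reference category. One of the elements, say football, is left out of the table, as its value can be inferred from the others: a record with 0 in every remaining column must be football.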

[Figure: If football is the reference category]
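Dropping the reference column can be sketched the same way. The names dummy_encode, reference_indicator and NUM_DUMMIES are hypothetical, chosen for this example; the remaining columns keep the order of the enum with the reference column skipped.

```c
enum sport { cricket, football, basketball };
#define NUM_SPORTS 3
#define NUM_DUMMIES (NUM_SPORTS - 1)

/* Write the dummy-coded row for s, leaving out the column of `reference`.
   A record whose sport is the reference category becomes all zeros. */
void dummy_encode(enum sport s, enum sport reference, int out[NUM_DUMMIES]) {
    int col = 0;
    for (int level = 0; level < NUM_SPORTS; level++) {
        if (level == (int)reference) {
            continue; /* the reference category gets no column */
        }
        out[col++] = (level == (int)s) ? 1 : 0;
    }
}

/* Recover the dropped column: it is 1 exactly when every kept column is 0. */
int reference_indicator(const int dummies[NUM_DUMMIES]) {
    int sum = 0;
    for (int col = 0; col < NUM_DUMMIES; col++) {
        sum += dummies[col];
    }
    return 1 - sum;
}
```

With football as the reference, the two remaining columns are cricket and basketball, so cricket encodes as {1, 0}, basketball as {0, 1} and football as {0, 0}. The second function shows why nothing is lost: the football column can always be rebuilt from the other two.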

Written by Praveen Alex Mathew

Software Developer. Masters in Computer Science @Arizona State University. https://praveenmathew92.github.io/
