On this page, W3schools.com collaborates with , to deliver digital training content to our students.


Categorical Data

When your data has categories represented by strings, it will be difficult to use them to train machine learning models which often only accepts numeric data.

Instead of ignoring the categorical data and excluding the information from our model, you can tranform the data so it can be used in your models.

Take a look at the table below, it is the same data set that we used in the chapter.

Example

import pandas as pd

cars = pd.read_csv('data.csv')
print(cars.to_string())

Result

             Car       Model  Volume  Weight  CO2
  0       Toyoty        Aygo    1000     790   99
  1   Mitsubishi  Space Star    1200    1160   95
  2        Skoda      Citigo    1000     929   95
  3         Fiat         500     900     865   90
  4         Mini      Cooper    1500    1140  105
  5           VW         Up!    1000     929  105
  6        Skoda       Fabia    1400    1109   90
  7     Mercedes     A-Class    1500    1365   92
  8         Ford      Fiesta    1500    1112   98
  9         Audi          A1    1600    1150   99
  10     Hyundai         I20    1100     980   99
  11      Suzuki       Swift    1300     990  101
  12        Ford      Fiesta    1000    1112   99
  13       Honda       Civic    1600    1252   94
  14      Hundai         I30    1600    1326   97
  15        Opel       Astra    1600    1330   97
  16         BMW           1    1600    1365   99
  17       Mazda           3    2200    1280  104
  18       Skoda       Rapid    1600    1119  104
  19        Ford       Focus    2000    1328  105
  20        Ford      Mondeo    1600    1584   94
  21        Opel    Insignia    2000    1428   99
  22    Mercedes     C-Class    2100    1365   99
  23       Skoda     Octavia    1600    1415   99
  24       Volvo         S60    2000    1415   99
  25    Mercedes         CLA    1500    1465  102
  26        Audi          A4    2000    1490  104
  27        Audi          A6    2000    1725  114
  28       Volvo         V70    1600    1523  109
  29         BMW           5    2000    1705  114
  30    Mercedes     E-Class    2100    1605  115
  31       Volvo        XC70    2000    1746  117
  32        Ford       B-Max    1600    1235  104
  33         BMW         216    1600    1390  108
  34        Opel      Zafira    1600    1405  109
  35    Mercedes         SLK    2500    1395  120
  

In the multiple regression chapter, we tried to predict the CO2 emitted based on the volume of the engine and the weight of the car but we excluded information about the car brand and model.

The information about the car brand or the car model might help us make a better prediction of the CO2 emitted.


ADVERTISEMENT


Login
ADS CODE