Data Science - Statistics Variance

Variance

Variance is another number that indicates how spread out the values are.

In fact, if you take the square root of the variance, you get the standard deviation. Conversely, if you multiply the standard deviation by itself, you get the variance.
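The relationship between the two measures can be checked with a short sketch. The list `values` below is a small made-up example chosen only for illustration (it is not part of the health data), and `np.var` and `np.std` are used with their default settings:

```python
import numpy as np

# Illustrative, made-up values (not taken from the health data set)
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = np.var(values)  # population variance
std_dev = np.std(values)   # population standard deviation

print(variance)  # 4.0
print(std_dev)   # 2.0

# Square root of the variance equals the standard deviation, and vice versa
print(np.isclose(np.sqrt(variance), std_dev))   # True
print(np.isclose(std_dev * std_dev, variance))  # True
```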

We will first use the data set with 10 observations to give an example of how we can calculate the variance:

Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
30        80             120        240              10          7
30        85             120        250              10          7
45        90             130        260              8           7
45        95             130        270              8           7
45        100            140        280              0           7
60        105            140        290              7           8
60        110            145        300              7           8
60        115            145        310              8           8
75        120            150        320              0           8
75        125            150        330              8           8

Tip: Variance is often represented by the symbol sigma squared: σ^2


Step 1 to Calculate the Variance: Find the Mean

We want to find the variance of Average_Pulse.

1. Find the mean:

(80+85+90+95+100+105+110+115+120+125) / 10 = 102.5

The mean is 102.5
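As a quick check, the same mean can be computed in plain Python. The variable names here (`average_pulse`, `mean`) are chosen for this sketch only:

```python
# Average_Pulse values from the 10-observation data set
average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# Mean = sum of the values divided by the number of values
mean = sum(average_pulse) / len(average_pulse)
print(mean)  # 102.5
```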


Step 2: For Each Value - Find the Difference From the Mean

2. Find the difference from the mean for each value:

80 - 102.5 = -22.5
85 - 102.5 = -17.5
90 - 102.5 = -12.5
95 - 102.5 = -7.5
100 - 102.5 = -2.5
105 - 102.5 = 2.5
110 - 102.5 = 7.5
115 - 102.5 = 12.5
120 - 102.5 = 17.5
125 - 102.5 = 22.5
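The differences can also be produced with a list comprehension. This is a minimal sketch with variable names chosen for illustration. Note that the differences add up to zero, which is why they cannot simply be averaged as they are:

```python
# Average_Pulse values from the 10-observation data set
average_pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
mean = sum(average_pulse) / len(average_pulse)

# Difference from the mean for each value
differences = [value - mean for value in average_pulse]
print(differences)
# [-22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5]

# Negative and positive differences cancel each other out
print(sum(differences))  # 0.0
```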


Step 3: For Each Difference - Find the Square Value

3. Find the square value for each difference:

(-22.5)^2 = 506.25
(-17.5)^2 = 306.25
(-12.5)^2 = 156.25
(-7.5)^2 = 56.25
(-2.5)^2 = 6.25
2.5^2 = 6.25
7.5^2 = 56.25
12.5^2 = 156.25
17.5^2 = 306.25
22.5^2 = 506.25

Note: We square the differences so that negative and positive differences do not cancel each other out when we add them up.
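The squaring step might be sketched as follows, starting from the differences found in Step 2 (the names `differences` and `squared` are chosen for this example):

```python
# Differences from the mean, as found in Step 2
differences = [-22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5]

# Squaring makes every value non-negative
squared = [d ** 2 for d in differences]
print(squared)
# [506.25, 306.25, 156.25, 56.25, 6.25, 6.25, 56.25, 156.25, 306.25, 506.25]
```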



Step 4: The Variance is the Average of These Squared Values

4. Sum the squared values and find the average:

(506.25 + 306.25 + 156.25 + 56.25 + 6.25 + 6.25 + 56.25 + 156.25 + 306.25 + 506.25) / 10 = 206.25

The variance is 206.25.
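The four steps can be put together in a few lines of NumPy and compared with `np.var`. This is a sketch of the manual calculation, with variable names chosen for illustration:

```python
import numpy as np

# Average_Pulse values from the 10-observation data set
average_pulse = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])

# Steps 1-3: mean, differences from the mean, squared differences
mean = average_pulse.mean()
squared_differences = (average_pulse - mean) ** 2

# Step 4: the variance is the average of the squared differences
variance = squared_differences.mean()
print(variance)  # 206.25

# np.var performs the same calculation with its default settings
print(np.isclose(variance, np.var(average_pulse)))  # True
```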


Use Python to Find the Variance of health_data

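We can use the var() function from NumPy to find the variance (remember that we are now using the first data set with 10 observations). By default, var() divides by the number of values, which matches the manual calculation above: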

Example

import numpy as np

# health_data is the first data set with 10 observations
var = np.var(health_data)
print(var)

The output:

[Output image: Variance]
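Since `health_data` is loaded elsewhere, the following self-contained sketch rebuilds the 10 observations from the table above as a plain NumPy array. The names `columns`, `data` and `column_variances` are assumptions made for this example. Passing `axis=0` makes it explicit that one variance per column is wanted:

```python
import numpy as np

# Column names and rows copied from the 10-observation table
columns = ["Duration", "Average_Pulse", "Max_Pulse",
           "Calorie_Burnage", "Hours_Work", "Hours_Sleep"]
data = np.array([
    [30,  80, 120, 240, 10, 7],
    [30,  85, 120, 250, 10, 7],
    [45,  90, 130, 260,  8, 7],
    [45,  95, 130, 270,  8, 7],
    [45, 100, 140, 280,  0, 7],
    [60, 105, 140, 290,  7, 8],
    [60, 110, 145, 300,  7, 8],
    [60, 115, 145, 310,  8, 8],
    [75, 120, 150, 320,  0, 8],
    [75, 125, 150, 330,  8, 8],
])

# axis=0 gives one (population) variance per column
column_variances = np.var(data, axis=0)

for name, value in zip(columns, column_variances):
    print(f"{name}: {value}")
```

The value printed for Average_Pulse should agree with the 206.25 found by hand in Step 4.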

Use Python to Find the Variance of Full Data Set

Here we calculate the variance of each column in the full data set:

Example

import numpy as np

# full_health_data is the full data set
var_full = np.var(full_health_data)
print(var_full)

The output:

[Output image: Variance]
