Variance is another number that indicates how spread out the values are.
In fact, if you take the square root of the variance, you get the standard deviation. Or the other way around, if you multiply the standard deviation by itself, you get the variance!
We will first use the data set with 10 observations to give an example of how we can calculate the variance:
Duration | Average_Pulse | Max_Pulse | Calorie_Burnage | Hours_Work | Hours_Sleep |
---|---|---|---|---|---|
30 | 80 | 120 | 240 | 10 | 7 |
30 | 85 | 120 | 250 | 10 | 7 |
45 | 90 | 130 | 260 | 8 | 7 |
45 | 95 | 130 | 270 | 8 | 7 |
45 | 100 | 140 | 280 | 0 | 7 |
60 | 105 | 140 | 290 | 7 | 8 |
60 | 110 | 145 | 300 | 7 | 8 |
60 | 115 | 145 | 310 | 8 | 8 |
75 | 120 | 150 | 320 | 0 | 8 |
75 | 125 | 150 | 330 | 8 | 8 |
Tip: Variance is often represented by the symbol Sigma Square: σ^2
We want to find the variance of Average_Pulse.
1. Find the mean:
(80+85+90+95+100+105+110+115+120+125) / 10 = 102.5
The mean is 102.5
2. Find the difference from the mean for each value:
80 - 102.5 = -22.5
85 - 102.5 = -17.5
90 - 102.5 = -12.5
95 - 102.5 = -7.5
100 - 102.5 = -2.5
105 - 102.5 = 2.5
110 - 102.5 = 7.5
115 - 102.5 = 12.5
120 - 102.5 = 17.5
125 - 102.5 = 22.5
3. Find the square value for each difference:
(-22.5)^2 = 506.25
(-17.5)^2 = 306.25
(-12.5)^2 = 156.25
(-7.5)^2 = 56.25
(-2.5)^2 = 6.25
2.5^2 = 6.25
7.5^2 = 56.25
12.5^2 = 156.25
17.5^2 = 306.25
22.5^2 = 506.25
Note: We must square the values to get the total spread.
4. Sum the squared values and find the average:
(506.25 + 306.25 + 156.25 + 56.25 + 6.25 + 6.25 + 56.25 + 156.25 + 306.25 + 506.25) / 10 = 206.25
The variance is 206.25.
We can use the var()
function from Numpy to find the variance (remember that we now use the first data set with 10 observations):
The output:
Here we calculate the variance for each column for the full data set:
The output: