
4. Selecting Statistical Variables

Transcript

Continuing from the previous lecture on how we use statistics in agriculture, we will now look at the concept of skewness. Skewness means that the data points are not centered: they are shifted either to the left-hand side or to the right-hand side. What does that tell us? Ideally, the data would be symmetric about its center point, but in reality that is very difficult to achieve. When we compare two things, the mean and the median, there are three possibilities: left skewness, right skewness, and symmetry. When the mean and the median are equal, you have symmetry, and this is the best condition. If the mean is less than the median, you have left skewness; if the mean is greater than the median, you have right skewness. Skewness also helps us interpret data acquired in practice. We always check whether the data is symmetric or not, because that decides whether we should report the mean or the median. If the data is not symmetric, it is skewed, and the mean and the median will be different, so choose the median. There are two such cases: right skewness and left skewness. Please make a note of this, because skewness is very, very important. We have seen three different measures: mean, median, and mode. Here we are comparing the mean and the median.
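The mean-versus-median comparison described above can be sketched in a few lines of Python. This is only an illustration of the lecture's rule of thumb, not a formal skewness test. The function name `skew_direction`, the `tolerance` parameter, and the apple diameters are all assumptions made up for this example.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Classify skew using the lecture's rule: compare the mean with the median.

    `tolerance` (hypothetical parameter) is how far apart the mean and the
    median may be while the data is still treated as symmetric.
    """
    m = mean(values)
    med = median(values)
    if abs(m - med) <= tolerance:
        return "symmetric"
    # Mean below the median: left skew. Mean above the median: right skew.
    return "left" if m < med else "right"


# Hypothetical apple diameters in millimeters (illustrative values only).
symmetric_sample = [45, 48, 50, 52, 55]
right_sample = [45, 46, 47, 48, 70]
left_sample = [30, 52, 53, 54, 55]

for name, sample in [
    ("symmetric_sample", symmetric_sample),
    ("right_sample", right_sample),
    ("left_sample", left_sample),
]:
    print(name, mean(sample), median(sample), skew_direction(sample))
```

In real field data the mean and the median are rarely exactly equal, so a small non-zero `tolerance` is usually more practical than the default of zero.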

The next one is the measure of dispersion. It is very important for us to know how the data is dispersed, that is, how much dispersion you have with respect to the central point. For that we have another three parameters. The first three we saw were mean, median, and mode, and now we are adding three more so that you can say more about the data you have acquired.

The first one is the range. Let us go back to the same example of apple diameter, where the mean is 50 millimeters. Suppose the apples fall within 50 plus or minus 5 millimeters. Then the diameters run from 45 millimeters to 55 millimeters, a spread of 10 millimeters. This is the range. If you draw the data as a box, the distance from the start of the box to the end of the box is called the range.

The next one is the standard deviation, together with the variance. The standard deviation is otherwise called sigma. So, how do you calculate sigma? Sigma is the square root of 1 by (n minus 1), where n is the number of items, multiplied by the summation, from i equal to 1 to n, of the current value minus the mean, the whole squared: sigma = sqrt( (1 / (n - 1)) * sum over i = 1 to n of (x_i - mean)^2 ). Why do you square and then take the root? You square so that the negative deviations do not cancel the positive ones, and when you take the root you undo the squaring and return to the original units. The value before you take the root is the variance, and its square root is the standard deviation. In engineered products, if the process is consistent, we always try to maintain 12 sigma, 6 sigma, or 3 sigma, but in natural products this sigma variation is very high. Now look at commercial products today, for example chips or cut strawberries. Every company would like to maintain a very good sigma, so naturally the requirement goes back to the agricultural farm, where the produce should have a small variation rather than a large one; maybe from 50 plus or minus 5 it is reduced to 50 plus or minus 1. Then we say the standard deviation and the variance are less. So, this is the second parameter that describes the dispersion of the data.

The third one is the interquartile range. Take a box running from 0 to 100 percent and mark it at 0 percent, 25 percent, 50 percent, 75 percent, and 100 percent. These marks divide the data into four quarters: Q1 is the value at 25 percent, Q2 is the value at 50 percent, which is the median, and Q3 is the value at 75 percent. The interquartile range is the distance from Q1 to Q3, so it covers the middle 50 percent of the data.

Now, which one should you choose? If the distribution is symmetrical, we often use the average as the measure of location, and the standard deviation is used to supplement it. So, when a result is written as 50 plus or minus some value, the 50 is the average and the plus-or-minus value is the standard deviation, which describes the dispersion.
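The three dispersion measures can be computed with Python's standard `statistics` module, as in the minimal sketch below. The helper name `dispersion_summary` and the diameters are hypothetical. Note that `stdev` and `variance` use the n minus 1 form given in the lecture, and that `quantiles` uses its default "exclusive" method, which is only one of several common conventions for locating Q1 and Q3.

```python
from statistics import quantiles, stdev, variance


def dispersion_summary(values):
    """Return range, variance, standard deviation, and IQR for a sample."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _q2, q3 = quantiles(values, n=4)
    return {
        "range": max(values) - min(values),  # largest minus smallest value
        "variance": variance(values),        # sample variance, divides by n - 1
        "std_dev": stdev(values),            # square root of the sample variance
        "iqr": q3 - q1,                      # width of the middle 50 percent
    }


# Hypothetical apple diameters in millimeters (illustrative values only).
diameters_mm = [45, 47, 49, 50, 50, 51, 53, 55]

summary = dispersion_summary(diameters_mm)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A tighter batch, say diameters between 49 and 51 millimeters, would give smaller values for all four measures, which is the "50 plus or minus 1" situation the lecture describes.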

Coming back to the standard deviation: it is, after all, based on the average. So if the distribution is skewed, we often use the median as the measure of location, and it is supplemented by the IQR as the measure of dispersion. Also, the average will often increase with time, and the spread often increases with an increasing average. These two points are very important for you to understand and digest. In this lecture we saw mean, median, and mode. We saw range, standard deviation, and IQR, which is the interquartile range. So we saw six parameters that describe the center location of the data and the spread of the data.
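The choice between the two pairs of measures could be sketched as follows. The function name `summarize` and the `tolerance` threshold are assumptions for illustration; the lecture does not give a numeric cutoff for when data counts as skewed, so the value used here is arbitrary.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values, tolerance=0.5):
    """Pick location and dispersion measures following the lecture's guidance.

    Roughly symmetric data (mean close to median): mean with standard deviation.
    Skewed data (mean far from median): median with interquartile range.
    `tolerance` is a hypothetical cutoff in the data's own units.
    """
    m = mean(values)
    med = median(values)
    if abs(m - med) <= tolerance:
        return {"location": "mean", "center": m,
                "dispersion": "standard deviation", "spread": stdev(values)}
    q1, _q2, q3 = quantiles(values, n=4)
    return {"location": "median", "center": med,
            "dispersion": "IQR", "spread": q3 - q1}


# Hypothetical apple diameters in millimeters (illustrative values only).
symmetric_mm = [45, 47, 49, 50, 50, 51, 53, 55]
skewed_mm = [45, 46, 47, 48, 49, 50, 51, 80]  # one unusually large apple

print(summarize(symmetric_mm))
print(summarize(skewed_mm))
```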

Thank you.

 

Licence


Statistical Techniques for Agriculturists Copyright © by Commonwealth of Learning (COL) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
