5. Basic Terminology in Statistics
Transcript
Welcome to the next lecture in the course Statistics for Agriculturists. In this lecture and the ones that follow, we will focus on statistical variables, so there are some basic terms we need to understand first. Let us understand the basics and then move forward. The data we collect is classified into quantitative data and qualitative data. Quantitative data is always expressed as numbers: for example, one, two, or three fruits; fruits per unit area; the number of fish per unit area; or the number of green patches in a given area. All of these are quantitative data, because you measure them as 1, 2, 3 and attach a unit. Writing just 1, 2, 3 has no meaning; you have to write the units as well. Qualitative data, on the other hand, does not have a number; it only gives you a description: presence or absence, soft or hard, a fish is large or small, fruits are ripe, semi-ripe, or raw. All of these are non-numerical data. In agriculture we try, as far as possible, to extract quantitative data. Keep in mind that qualitative data is subjective, so you have to find a correlation or an indirect measure to convert qualitative data into quantitative data.

Quantitative data can be further classified as discrete or continuous. Discrete data are values such as 1, 2, 3, 4; if I say 1, 1.1, 1.2, 1.3, those are continuous values. Discrete data are whole numbers that cannot be broken down further, such as a number of items. When you count fish, you cannot have 1.5 fish; when you count tomatoes, you cannot count one and three-fourths of a tomato. One apple, one person, one tree: these are all discrete.
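The distinction between qualitative, discrete, and continuous data can be sketched as a small Python helper. This is only a heuristic under stated assumptions: it treats non-numeric values as qualitative and columns of whole numbers as discrete. In practice the real test is how a variable was obtained (counted versus measured), since a measured weight can happen to be a whole number. The function name `classify_values` and the example data are hypothetical.

```python
def classify_values(values):
    """Rough guess at the data type of a column of observations.

    Heuristic (an assumption, not a rule): non-numeric values -> qualitative;
    numbers that are all whole -> discrete; otherwise -> continuous.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if not numeric:
        return "qualitative"
    if all(float(v).is_integer() for v in values):
        return "quantitative, discrete"
    return "quantitative, continuous"


# Hypothetical field observations
fish_per_net = [3, 5, 2, 7]               # counted
fruit_weight_g = [1.1, 1.3, 1.8]          # measured, in grams
ripeness = ["ripe", "semi-ripe", "raw"]   # described, not measured

for name, column in [("fish_per_net", fish_per_net),
                     ("fruit_weight_g", fruit_weight_g),
                     ("ripeness", ripeness)]:
    print(name, "->", classify_values(column))
```

Note that `classify_values([2.0, 3.0])` reports "discrete" even if those were measured weights, which is exactly why the heuristic cannot replace knowing how the data were collected.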
Continuous data, by contrast, can take any value within a range: the weight of a fruit can be 1 gram, 1.1 grams, 1.2 grams, and so on. In practice, much of the data in agriculture is continuous. For example, if you take a basket of fish and measure the length or weight of every fish, you get values such as 1.1 grams, 1.3 grams, 1.8 grams; this is continuous data.

Continuous data can in turn be split into two types: interval data and ratio data. Interval data are numbers where the difference between values is known and meaningful, but the scale has no true zero; temperature in degrees Celsius is a typical example, since 20 °C is 10 degrees warmer than 10 °C but not twice as warm. Whatever the scale, record the values with a consistent spacing and precision: if one reading is 1.1, the next 1.123, and the next 1.3, the uniformity in the data is lost. So it is generally a good idea to record continuous data at uniform intervals. Ratio data also have measurable, known differences, but in addition they have a true zero, so ratios between values are meaningful, as with height or weight: a 2 kg fish is twice as heavy as a 1 kg fish. We also often form a ratio of two measurements and work with that, for example the body mass index (BMI) of a human being, which is weight divided by the square of height.

To summarize the classification: data are either qualitative or quantitative; quantitative data are discrete or continuous; and continuous data are interval or ratio. These are very important basics you should understand before doing any statistical analysis.

Qualitative data, in turn, are divided into nominal data and ordinal data. As we have already seen, qualitative data are categorical, such as a yes/no answer or a response such as the color of an eye or the color of a fruit. Nominal data are used for naming variables, such as hair color, the ripening stage of a fruit, or the smell of a mango. All of these are nominal.
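What each type of data allows you to compute differs. A minimal Python sketch with hypothetical values: ratio data such as weight and height can be divided, so a derived ratio like BMI (weight in kilograms divided by the square of height in metres) is meaningful, whereas nominal data such as fruit color can only be counted, so the most frequent category (the mode) is about the only "average" available. The helper names `bmi` and `most_common_category` are illustrative.

```python
from collections import Counter


def bmi(weight_kg, height_m):
    """Body mass index: a ratio of two ratio-scale measurements (kg / m^2)."""
    return weight_kg / height_m ** 2


def most_common_category(labels):
    """For nominal data, counting is the meaningful operation; return the mode."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical measurements and observations
print(round(bmi(68.0, 1.70), 1))  # 23.5

fruit_color = ["red", "green", "red", "yellow", "red", "green"]
print(Counter(fruit_color))                # counts per color
print(most_common_category(fruit_color))  # red
```

Asking for the "average color" makes no sense, which is the practical sign that a variable is nominal.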
Ordinal data are used to describe the order of values: for example, I give 1 for happy and 2 for sad, or 1 for fresh, 2 for slightly fresh, and so on. Whenever we design a questionnaire, we follow this ordinal scheme. So, again, qualitative data are divided into nominal and ordinal data.
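Ordinal data can be stored as ranks, which keeps the order without pretending the gaps between categories are equal. A minimal sketch with a hypothetical three-level freshness scale (`FRESHNESS` and its labels are illustrative, not a standard scale): because the distance from "fresh" to "slightly fresh" need not equal the distance from "slightly fresh" to "stale", the median rank is a safer summary than the mean.

```python
import statistics

# Hypothetical ordinal scale for a questionnaire: only the order matters.
FRESHNESS = {"fresh": 1, "slightly fresh": 2, "stale": 3}
LABELS = {rank: label for label, rank in FRESHNESS.items()}


def median_response(responses):
    """Median of ordinal responses, reported back as a category label."""
    ranks = [FRESHNESS[r] for r in responses]
    # median_low always returns an actual rank, even for an even number of responses.
    return LABELS[statistics.median_low(ranks)]


responses = ["fresh", "stale", "slightly fresh", "fresh", "fresh"]
print(median_response(responses))                 # fresh
print(FRESHNESS["fresh"] < FRESHNESS["stale"])    # True: order is meaningful
```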
Now that we have covered the types of data, we will also study the parameters involved. After that, I would like to represent the data in graphical form. A picture is worth a thousand words, and a graph can summarize thousands, even tens of thousands, of data points. The more elegantly and legibly you represent the data, the more inferences you can draw. Representing data is a very important step, and there are about seven or eight common types of graphs for it.
The first is the bar graph. A bar graph has bars plotted between the x- and y-axes; here x takes the values a, b, c, d, which can be intervals or any categories, for example green apples, red apples, and so on. A histogram, in contrast, represents continuously flowing data: the x-axis is divided into intervals such as 60 to 70, 70 to 80, and 80 to 90, and the height of each bar shows how many values fall in each interval. The difference between a histogram and a bar graph is that a bar graph is discrete: it does not need continuous data, and each category a, b, c, d, e is represented separately. The next one is the frequency table, which records how often something occurs: a particular number, a defect, or a size. To compress the data in a frequency table, we choose a class width, which is the span each class covers on the x-axis, for example 10 to 12, 12 to 14, 14 to 16, 16 to 18, and 18 to 20. We then note how many values fall in each class using tally marks: four vertical strokes, with a fifth drawn across them, form a group of five. Once the tallying is done, you count the groups of five: four groups make four times five, twenty, plus another two gives twenty-two. This lets you write down the data quickly. Such tables are called frequency tables, and they are still used extensively today.
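The frequency-table procedure above (a fixed class width, then tally marks in groups of five) can be sketched in Python. Assumptions: classes are half-open, so a value equal to an upper boundary is counted in the next class; the fruit lengths are hypothetical; and the string "||||/" stands in for four strokes crossed by a fifth.

```python
def frequency_table(data, start, stop, width):
    """Return (lower, upper, count) for equal-width classes [lower, upper)."""
    table = []
    lower = start
    while lower < stop:
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, count))
        lower = upper
    return table


def tally(count):
    """Tally marks: each complete group of five is drawn as '||||/'."""
    groups, rest = divmod(count, 5)
    marks = ["||||/"] * groups
    if rest:
        marks.append("|" * rest)
    return " ".join(marks)


# Hypothetical fruit lengths (cm)
lengths_cm = [11, 13, 12, 15, 17, 10, 14, 16, 13, 19, 12, 18, 11, 15, 13]

for lower, upper, count in frequency_table(lengths_cm, 10, 20, 2):
    print(f"{lower}-{upper}  {tally(count):<12} {count}")

print(tally(22))  # ||||/ ||||/ ||||/ ||||/ ||
```

The last line mirrors the counting in the lecture: four groups of five plus two single strokes give twenty-two.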
The next one is the circle graph, or pie chart. Here the whole circle represents one hundred percent, so you can easily see what percentage each individual part contributes. Then comes the line graph: you first plot all the points and then connect them to get a line. A line graph has an x-axis and a y-axis; if you want to show two quantities, you can also have two vertical axes, y1 and y2, so that a second line is plotted against the second axis.
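The pie chart's "share of one hundred percent" is a simple calculation: each category's percentage is its amount divided by the total, times 100, and its slice angle is the same fraction of 360 degrees. A short sketch with hypothetical crop areas; the function name `pie_shares` is illustrative.

```python
def pie_shares(amounts):
    """Convert category amounts into (percentage, slice angle in degrees)."""
    total = sum(amounts.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in amounts.items()
    }


# Hypothetical areas (hectares) under each crop on a farm
crop_area_ha = {"rice": 50, "wheat": 30, "pulses": 20}

for crop, (percent, angle) in pie_shares(crop_area_ha).items():
    print(f"{crop:<7} {percent:5.1f}%  {angle:6.1f} degrees")
```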
The next one is the stem-and-leaf plot. Each value is split into a stem, usually the leading digit, and a leaf, the last digit; the leaves are written next to their stem, so you can see how many values occur in each stem group. The next one is the line plot: each time a value occurs, you put a mark above it on a number line, counting 1, 2, 3, 4. The last one is the box-and-whisker plot, which shows the minimum, the maximum, and how the data are spread in between (the quartiles and the median). We will use all of these graphs, or pick one of them, to represent our data. If you can represent the data properly, ninety percent of your interpretation is done quickly. Formulas are one thing, but interpreting a graph is very, very important.

The next topic is the formation of class intervals. This is the width I mentioned earlier: classes such as 9 to 11, 11 to 13, and 13 to 15 each span two units. Forming class intervals is important because you need the classes before you can draw a histogram. It is an easy three-step process. Use intervals of equal length, with convenient rounded-off numbers as boundaries. Always keep in mind that the intervals must be equal in length: if the first class is 9 to 11, the next cannot be 11 to 14; it should be 11 to 13, then 13 to 15, and so on. Decide the lowest and the highest numbers, and always keep them as whole numbers. To decide the starting and ending numbers, look at the data: if most of the data are clustered in a small range, choose the minimum and maximum accordingly.
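The stem-and-leaf plot, line plot, and box-and-whisker plot described above can each be sketched in a few lines of Python using only the standard library. Assumptions: the plant heights are hypothetical non-negative integers, stems are tens digits and leaves are units digits, and quartiles come from `statistics.quantiles` with its default "exclusive" method. Other software may use a different quartile convention and report slightly different Q1 and Q3 for the same data.

```python
import statistics
from collections import Counter, defaultdict


def stem_and_leaf(data):
    """Stem = tens digit, leaf = units digit (assumes non-negative integers)."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    return dict(stems)


def line_plot(data):
    """One 'x' per occurrence of each value, like marks above a number line."""
    counts = Counter(data)
    return {value: "x" * counts[value] for value in sorted(counts)}


def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum: what a box-and-whisker plot draws."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return min(data), q1, median, q3, max(data)


# Hypothetical plant heights (cm)
heights_cm = [12, 15, 21, 23, 23, 27, 30, 34, 38, 41, 45]

for stem, leaves in stem_and_leaf(heights_cm).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

for value, marks in line_plot(heights_cm).items():
    print(f"{value:>3} {marks}")

print(five_number_summary(heights_cm))  # (12, 21.0, 27.0, 38.0, 45)
```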
When deciding the range, suppose nothing much happens in the data beyond 51, or beyond 45. Then we may concentrate the classes on the range 9 to 45, where the data are clustered. We do not throw the remaining values away; I mention this only to show how the range is chosen. Within 9 to 45, if you want a much finer picture, you can use classes such as 9 to 10, 10 to 11, and so on: the finer the class interval, the more detailed your interpretation. The number of intervals should also be proportional to the size of the data set: for a small data set, use a small number of intervals; for a large data set, use more (and therefore narrower) intervals. A small data set may need only a few wide classes such as 9 to 15, 15 to 21, and so on. Forming classes is very important whenever you want to convert the data into a graph, whether a bar graph or a histogram.
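The class-formation rules above (equal widths, whole-number boundaries, and a number of classes suited to the size of the data) can be tried out with a short sketch. It builds half-open classes [lower, upper) so that a boundary value is never counted twice, an assumption that matches the 9 to 11, 11 to 13 style of boundaries. The 15 values and the function names are hypothetical. With width 2 the sketch produces 19 classes, nearly all holding zero, one, or two values, while width 6 gives 7 classes that show the shape more clearly, which is the point about using fewer intervals for small data sets.

```python
import math


def class_intervals(data, width, start=None):
    """Equal-width classes [lower, lower + width) covering every value in data.

    By default the first class starts at the smallest value rounded down to a
    whole number; pass `start` to choose a rounder starting point.
    """
    lower = math.floor(min(data)) if start is None else start
    classes = []
    while lower <= max(data):
        classes.append((lower, lower + width))
        lower += width
    return classes


def class_counts(data, classes):
    """Number of observations falling in each class."""
    return [sum(1 for x in data if lo <= x < hi) for lo, hi in classes]


# Hypothetical observations spread between 9 and 45
values = [9, 10, 12, 14, 15, 15, 18, 21, 22, 25, 28, 31, 33, 38, 45]

for width in (2, 6):
    classes = class_intervals(values, width)
    print(f"width {width}: {len(classes)} classes")
    for (lo, hi), n in zip(classes, class_counts(values, classes)):
        print(f"  {lo}-{hi}: {n}")
```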
In this lecture, we went through what data are and the different types of data; from there we looked at graphs and how to represent data with them, and finally we saw how to form classes. As you go through the lectures, assignment problems will be posted for you. Please try to solve them; they will give you a better understanding of the course. If you have doubts, please post them on the blog, and we will discuss and clarify them.
Thank you.