4. Varying probability sampling
Transcript
Welcome to lecture number four in week two. Now we are going to see varying probability sampling. We often assume that whatever event we plan happens 100 percent of the time, but it does not. Whether an event happens or not can be modelled with a binomial distribution, and once we say binomial distribution, a probability comes in. So, in this lecture we will introduce the concept of probability into sampling, and varying probability sampling will be the focus. Till now we have covered simple random sampling, which is a probability sampling method, as well as systematic sampling, stratified sampling, cluster sampling and multi-stage sampling. What we are seeing now is varying probability sampling.
In unequal or varying probability sampling, units are selected with unequal probabilities. When the units differ a lot in size, this can provide more efficient estimators than equal probability sampling. Suppose I am trying to spray some insecticide or some medicine, or I am trying to throw some feed into an aqua farm. What is the guarantee that all the pieces will settle where I want? What is the guarantee that the spray hits the target exactly? It is only a probability. Today more and more statistics is getting into agriculture, to talk about how much water a farm consumes and how to improve its productivity. In such cases we bring a little bit of the probability sampling concept into our analysis.
Units are selected with probability proportional to a given measure of size. This sampling scheme is also called probability proportional to size sampling, abbreviated as pps sampling. The main limitation of varying probability sampling is that it involves writing down the successive cumulative totals. We will see a problem and then you will understand: it is time consuming and tedious, especially if the number of units in the population is extremely large. We are supposed to use probability sampling, but you should also understand that there are some nuances in implementing it in the field.
Let us take a simple example. A village has 10 orchards containing 150, 50, 80, 100, 200, 160, 40, 220, 60 and 140 trees respectively. It is desired to select a sample of four orchards with replacement and with probability proportional to the number of trees in the orchard. So, you write down all 10 orchards, you write down the size Xi of each, and in the next column you write down the cumulative size, that is, the running sum: 150, 200, 280, 380, 580, 740, 780, 1000, 1060, and the last number comes to 1200.
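If you want to see the numbers associated with each orchard, it is going to be 1 to 150 for the first, then 151 to 200 for the second, and like this it will go up to 1061 to 1200 for the tenth. A random number between 1 and 1200 is then drawn, and the orchard whose range contains that number is selected; this is repeated until four orchards have been drawn.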
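The cumulative totals and the associated number ranges for the orchard example can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the names `orchard_for` and `draw_pps_sample` are made up here, and the random seed is arbitrary. The sketch also computes the normalized sizes (each size divided by the total of 1200).

```python
import bisect
import itertools
import random

# Orchard sizes (number of trees) from the lecture example.
sizes = [150, 50, 80, 100, 200, 160, 40, 220, 60, 140]

# Successive cumulative totals: 150, 200, 280, ..., 1200.
cumulative = list(itertools.accumulate(sizes))
total = cumulative[-1]

# Range of numbers associated with each orchard: (1, 150), (151, 200), ...
ranges = [(c - x + 1, c) for x, c in zip(sizes, cumulative)]

# Normalized sizes, i.e. each orchard's probability of selection per draw.
probabilities = [x / total for x in sizes]

def orchard_for(number):
    """Return the 1-based orchard whose range contains `number` (hypothetical helper)."""
    return bisect.bisect_left(cumulative, number) + 1

def draw_pps_sample(n, rng):
    """Draw n orchards with replacement, probability proportional to size."""
    return [orchard_for(rng.randint(1, total)) for _ in range(n)]

rng = random.Random(0)  # seed chosen arbitrarily, only for repeatability
sample = draw_pps_sample(4, rng)

print(cumulative)
print(ranges[0], ranges[1], ranges[-1])
print(probabilities[0])
print(sample)
```

Because the draws are with replacement, the same orchard may appear more than once in `sample`. Larger orchards cover more of the numbers from 1 to 1200, which is what makes the selection proportional to size.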
The next step is to write down the pps values. For the first orchard this is 150 divided by 1200, that is, 0.125. What you are doing is almost normalizing the data: each size is divided by the total, and the value you get is noted down and taken further in your processing. This is fine if you have 10 units, but if you have 100 units it becomes really challenging to do by hand.
So, let us take another simple example. A village has 10 holdings consisting of 50, 30, 45, 25, 40, etcetera, up to 27 fields respectively. Select a sample of four holdings with replacement and with probability proportional to the number of fields in the holding. Try to solve this problem, and you will see which holdings to choose and how we choose them. Here N equals 10, the number of holdings, and M equals 50, the largest holding size. First we select a pair of random numbers (i, j), with i between 1 and N and j between 1 and M. The first pair is (10, 20); since 20 does not exceed 27, the size of the 10th holding, the 10th unit is selected into the sample. Similarly, the other pairs drawn are (4, 13), (5, 45) and (7, 15).
The pair (5, 45) is rejected, as 45 is greater than 40, the size of the 5th holding, and so another pair is drawn, which turned out to be (6, 30). Hence, the sample finally selected with replacement consists of the holdings with serial numbers 10, 4, 7 and 6. So, we have just solved two simple examples to understand the topic of varying probability sampling.
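The accept-or-reject rule used in the second example avoids cumulative totals altogether; sampling textbooks usually call it Lahiri's method, although the lecture does not name it. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, only the holding sizes actually stated in the lecture are used for checking the pairs (holdings 6 to 9 are not given), and the full sampler is run on the orchard data from the first example instead.

```python
import random

def lahiri_accepts(size_i, j):
    """Accept the pair (i, j) when j does not exceed the size of unit i."""
    return j <= size_i

def lahiri_sample(sizes, n, rng):
    """Draw n units with replacement; returns 1-based unit numbers."""
    N, M = len(sizes), max(sizes)
    chosen = []
    while len(chosen) < n:
        i = rng.randint(1, N)  # candidate unit
        j = rng.randint(1, M)  # number compared against that unit's size
        if lahiri_accepts(sizes[i - 1], j):
            chosen.append(i)   # rejected pairs are simply redrawn
    return chosen

# Holding sizes stated in the lecture (holdings 6 to 9 are not given there).
known_sizes = {1: 50, 2: 30, 3: 45, 4: 25, 5: 40, 10: 27}
for i, j in [(10, 20), (4, 13), (5, 45)]:
    verdict = "accepted" if lahiri_accepts(known_sizes[i], j) else "rejected"
    print((i, j), verdict)

# Full run on the orchard data from the first example (seed is arbitrary).
orchards = [150, 50, 80, 100, 200, 160, 40, 220, 60, 140]
print(lahiri_sample(orchards, 4, random.Random(0)))
```

The first loop reproduces the decisions from the lecture: (10, 20) and (4, 13) are accepted and (5, 45) is rejected. The trade-off against the cumulative total method is that some draws are wasted on rejected pairs, in exchange for not having to build the cumulative column.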
Thank you.