Let's say the probability of success is \(p = 0.001\), and there are \(n = 1000\) trials. We can apply the binomial distribution here:
\[
P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}
\]
where \(k\) is the number of successes. The expected value of \(X\) is \(\mu = np\). Suppose we know \(\mu\), but we don't know anything about \(p\) and \(n\) except that \(p\) is very small and \(n\) is very large. Can we build a model with \(\mu\) as the only parameter? Let's create a new probability function \(P_n\) by expanding our binomial probability function:
\[
P_n(X = k) = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}
\]
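As a quick numerical sanity check, here's a small Python sketch (the helper name `binomial_pmf` is mine, not from the article) showing that the expected value computed directly from the binomial pmf does come out to \(np\):

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 1000, 0.001
mu = n * p  # expected value: 1.0

# Sum k * P(X = k) over all possible k to get E[X].
expected = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(expected)  # ≈ 1.0, matching mu = np
```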
Let's get rid of \(p\) by substituting \(p = \mu / n\):
\[
P_n(X = k) = \frac{n!}{k!\,(n-k)!} \left(\frac{\mu}{n}\right)^{k} \left(1 - \frac{\mu}{n}\right)^{n-k}
\]
If \(n\) is very large, then the last term below would be very close to \(1\):
\[
\left(1 - \frac{\mu}{n}\right)^{n-k} = \left(1 - \frac{\mu}{n}\right)^{n} \left(1 - \frac{\mu}{n}\right)^{-k} \approx \left(1 - \frac{\mu}{n}\right)^{n}
\]
In this article, I represent \(e^{x}\) as a limit, and that representation can be used here:
\[
e^{x} = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^{n}
\quad\Longrightarrow\quad
\left(1 - \frac{\mu}{n}\right)^{n} \approx e^{-\mu} \text{ for very large } n
\]
So far we have:
\[
P_n(X = k) \approx \frac{n!}{k!\,(n-k)!}\, \frac{\mu^{k}}{n^{k}}\, e^{-\mu}
\]
In \(n!\), the factors after \(n - k + 1\) cancel against the \((n-k)!\) in the denominator:
\[
\frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)
\quad\Longrightarrow\quad
P_n(X = k) \approx \frac{n(n-1)\cdots(n-k+1)}{k!\, n^{k}}\, \mu^{k} e^{-\mu}
\]
For the next step, I will swap the positions of \(k!\) and \(n^{k}\):
\[
P_n(X = k) \approx \frac{n(n-1)\cdots(n-k+1)}{n^{k}}\, \frac{\mu^{k}}{k!}\, e^{-\mu}
\]
Since there are \(k\) factors in both the numerator and the denominator, we can split the fraction:
\[
\frac{n(n-1)\cdots(n-k+1)}{n^{k}} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n}
\]
As \(n\) approaches infinity, the difference between \(n\) and \(n - k + 1\) stays fixed at \(k - 1\), which becomes negligible relative to \(n\), so every one of these \(k\) fractions gets closer to one. So for very large \(n\):
\[
\frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \approx 1
\]
Therefore, when \(n\) is very large and \(p\) is very small, the probability function is:
\[
P(X = k) = \frac{\mu^{k} e^{-\mu}}{k!}
\]
This is the Poisson distribution, with \(\mu\) as its only parameter.
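To see how good the approximation is, here's a small Python comparison (the function names are my own) of the exact binomial probabilities for \(n = 1000\), \(p = 0.001\) against the Poisson formula with \(\mu = 1\):

```python
import math

def binomial_pmf(k, n, p):
    # Exact binomial probability: C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    # Poisson probability: mu^k * e^(-mu) / k!
    return mu**k * math.exp(-mu) / math.factorial(k)

n, p = 1000, 0.001
mu = n * p

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 6), round(poisson_pmf(k, mu), 6))
# The two columns agree to about three decimal places.
```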
If you sum the probabilities from \(k = 0\) to \(k = \infty\), do they add up to one? Using the representation of \(e^{x}\) as an infinite series, we can see that they do:
\[
\sum_{k=0}^{\infty} \frac{\mu^{k} e^{-\mu}}{k!} = e^{-\mu} \sum_{k=0}^{\infty} \frac{\mu^{k}}{k!} = e^{-\mu}\, e^{\mu} = 1
\]
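The same fact can be checked numerically in Python, truncating the infinite sum at \(k = 100\), beyond which the terms are vanishingly small:

```python
import math

mu = 1.0
# Partial sum of mu^k * e^(-mu) / k! for k = 0 .. 100.
total = sum(mu**k * math.exp(-mu) / math.factorial(k) for k in range(101))
print(total)  # ≈ 1.0
```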
Also, since the variance of a binomial distribution is \(np(1-p)\), the variance of the Poisson distribution would be:
\[
np(1-p) = \mu(1-p) \approx \mu \quad \text{for very small } p
\]
So the Poisson distribution has both its mean and its variance equal to \(\mu\).
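Here's a Python sketch (helper name `binomial_pmf` is mine) confirming this numerically: the variance computed directly from the binomial pmf for \(n = 1000\), \(p = 0.001\) equals \(np(1-p) = 0.999\), already very close to \(\mu = 1\):

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 1000, 0.001
mu = n * p

# Var(X) = E[(X - mu)^2], computed directly from the pmf.
var = sum((k - mu)**2 * binomial_pmf(k, n, p) for k in range(n + 1))
print(var)  # ≈ 0.999 = np(1 - p), close to mu = 1
```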