Let's say the probability of success is \(p = 0.001\), and there are \(n = 1000\) trials. We can apply the binomial distribution here:
\[P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}\]
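To make these numbers concrete, here is a minimal Python sketch that evaluates the binomial probability for the first few values of \(k\). The function name `binomial_pmf` is my own choice for illustration, and only the standard library is assumed.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 1000, 0.001  # the values used in the text

# Print the probability of 0, 1, 2 and 3 successes.
for k in range(4):
    print(k, binomial_pmf(k, n, p))
```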
where \(k\) is the number of successes. The expected value of \(X\) is \(\mu = np\). Suppose we know \(\mu\) but we don't know anything about \(p\) and \(n\) except that \(p\) is very small and \(n\) is very large. Can we have a model with \(\mu\) as the only parameter? Let's create a new probability function \(P_n\) by expanding our binomial probability function:
\[P_n(k) = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}\]
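Let's get rid of \(p\) by substituting \(p = \mu/n\):
\[P_n(k) = \frac{n!}{k!\,(n-k)!} \left(\frac{\mu}{n}\right)^{k} \left(1-\frac{\mu}{n}\right)^{n} \left(1-\frac{\mu}{n}\right)^{-k}\]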
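If \(n\) is very large, then the last term, \(\left(1-\frac{\mu}{n}\right)^{-k}\), would be very close to \(1\):
\[P_n(k) \approx \frac{n!}{k!\,(n-k)!} \cdot \frac{\mu^{k}}{n^{k}} \left(1-\frac{\mu}{n}\right)^{n}\]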
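In this article, I represent \(e^{x}\) as a limit, \(e^{x} = \lim_{n \to \infty} \left(1+\frac{x}{n}\right)^{n}\), and that representation can be used here with \(x = -\mu\):
\[P_n(k) \approx \frac{n!}{k!\,(n-k)!} \cdot \frac{\mu^{k}}{n^{k}}\, e^{-\mu}\]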
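If you want to see how quickly \(\left(1-\frac{\mu}{n}\right)^{n}\) settles toward \(e^{-\mu}\), a small Python sketch like the one below can help. The helper name `limit_term` and the value \(\mu = 1\) are assumptions made only for this illustration.

```python
from math import exp


def limit_term(mu: float, n: int) -> float:
    """The finite-n expression (1 - mu/n)^n that approaches e^(-mu)."""
    return (1 - mu / n) ** n


mu = 1.0  # hypothetical mean, chosen for illustration

# Compare the finite-n value with e^(-mu) for growing n.
for n in (10, 1_000, 100_000):
    print(n, limit_term(mu, n), exp(-mu))
```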
In \(n!\), the factors after \(n-k+1\) cancel against \((n-k)!\):
\[P_n(k) \approx \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{\mu^{k}}{n^{k}}\, e^{-\mu}\]
For the next step, I will switch \(k!\) and \(n^k\):
\[P_n(k) \approx \frac{n(n-1)\cdots(n-k+1)}{n^{k}} \cdot \frac{\mu^{k}}{k!}\, e^{-\mu}\]
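Since there are \(k\) factors in both the numerator and the denominator, we can pair them up:
\[P_n(k) \approx \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \cdot \frac{\mu^{k}}{k!}\, e^{-\mu}\]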
If \(n\) approaches infinity while \(k\) stays fixed, all of these factors get closer to one. So for very large \(n\):
\[P_n(k) \approx \frac{\mu^{k}}{k!}\, e^{-\mu}\]
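The claim that the product of the \(k\) factors \(\frac{n}{n}, \frac{n-1}{n}, \ldots, \frac{n-k+1}{n}\) tends to one can be checked numerically. This is only a sketch; `falling_ratio` is a hypothetical helper name, and \(k = 3\) is an arbitrary choice.

```python
def falling_ratio(n: int, k: int) -> float:
    """Product of (n - i) / n for i = 0, ..., k - 1."""
    ratio = 1.0
    for i in range(k):
        ratio *= (n - i) / n
    return ratio


k = 3  # arbitrary small number of successes

# The product moves toward 1 as n grows with k held fixed.
for n in (10, 1_000, 1_000_000):
    print(n, falling_ratio(n, k))
```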
Therefore, when \(n\) is very large and \(p\) is very small, the probability function is:
\[P(k) = \frac{\mu^{k} e^{-\mu}}{k!}\]
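As a sanity check, we can compare the binomial probabilities for \(n = 1000\), \(p = 0.001\) with the limiting formula \(\frac{\mu^{k} e^{-\mu}}{k!}\) for \(\mu = np\). The sketch below assumes only the Python standard library; the function names are mine.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mu: float) -> float:
    """Limiting probability mu^k * e^(-mu) / k!."""
    return mu**k * exp(-mu) / factorial(k)


n, p = 1000, 0.001  # the values used in the text
mu = n * p

# Print k, the binomial value, and the limiting value side by side.
for k in range(5):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, mu))
```

With these values the two columns should agree to about three decimal places, and the agreement improves as \(n\) grows with \(\mu\) held fixed.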
If you sum the probabilities from \(k=0\) to \(k=\infty\), does it add up to one?
\[\sum_{k=0}^{\infty} \frac{\mu^{k} e^{-\mu}}{k!} \stackrel{?}{=} 1\]
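Using the representation of \(e^x\) as an infinite series, \(e^{x} = \sum_{k=0}^{\infty} \frac{x^{k}}{k!}\), we rewrite the above as:
\[e^{-\mu} \sum_{k=0}^{\infty} \frac{\mu^{k}}{k!} = e^{-\mu}\, e^{\mu} = 1\]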
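The same normalization can be checked numerically by truncating the infinite sum. In this sketch, the cutoff of 50 terms and the value \(\mu = 1\) are assumptions for illustration; the neglected tail is far below floating-point precision for small \(\mu\).

```python
from math import exp, factorial


def poisson_pmf(k: int, mu: float) -> float:
    """Probability mu^k * e^(-mu) / k!."""
    return mu**k * exp(-mu) / factorial(k)


mu = 1.0  # hypothetical mean, chosen for illustration

# Truncated version of the infinite sum over k.
total = sum(poisson_pmf(k, mu) for k in range(50))
print(total)
```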
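And we know that this is true. Also, since the variance of a binomial distribution is \(np(1-p)\), the variance of the Poisson distribution is what remains as \(p\) goes to zero with \(np = \mu\) held fixed:
\[\sigma^{2} = \lim_{n \to \infty} np(1-p) = \lim_{n \to \infty} \mu\left(1-\frac{\mu}{n}\right) = \mu\]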
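To check the variance numerically, one option is to compute \(E[k^2] - E[k]^2\) directly from the Poisson probabilities and compare it with both \(\mu\) and the binomial value \(np(1-p)\). This is a rough sketch; the helper names and the 60-term cutoff are assumptions.

```python
from math import exp, factorial


def poisson_pmf(k: int, mu: float) -> float:
    """Probability mu^k * e^(-mu) / k!."""
    return mu**k * exp(-mu) / factorial(k)


def poisson_variance(mu: float, terms: int = 60) -> float:
    """Variance E[k^2] - E[k]^2 from a truncated sum over k."""
    mean = sum(k * poisson_pmf(k, mu) for k in range(terms))
    second_moment = sum(k * k * poisson_pmf(k, mu) for k in range(terms))
    return second_moment - mean**2


n, p = 1000, 0.001  # the values used in the text
mu = n * p

print(n * p * (1 - p))       # binomial variance
print(poisson_variance(mu))  # variance from the limiting formula
```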