(A) Here the observed data are ten interaction sites, five with high conservation and five with low. As expected, the likelihood peaks at p2 = 0.5. The prior, B(7,3), reflects prior knowledge that interaction sites tend to be highly conserved. It corresponds to adding seven pseudocounts to the C = high category and three to C = low, and it peaks above p2 = 0.5. The posterior is also shown, together with the MAP estimate of p2. Because the observed counts are low, the prior clearly influences the posterior.
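The ML and MAP estimates in panel A can be reproduced with a short sketch. It assumes the standard conjugate Beta-binomial update: a B(a,b) prior combined with n_high high and n_low low observations gives a B(a + n_high, b + n_low) posterior, and the MAP estimate is that posterior's mode. The function name `beta_binomial_estimates` is hypothetical. Under this convention the panel A MAP estimate is 11/18 ≈ 0.61. Reading the prior parameters directly as pseudocounts added to the observed counts would instead give 12/20 = 0.6.

```python
def beta_binomial_estimates(n_high, n_low, a, b):
    """Return (ML, MAP) estimates of p2 = P(C = high).

    Assumption: conjugate update, so a Beta(a, b) prior plus n_high "high"
    and n_low "low" observations gives a Beta(a + n_high, b + n_low)
    posterior, whose mode is taken as the MAP estimate.
    """
    n = n_high + n_low
    ml = n_high / n  # ML estimate: observed fraction of high-conservation sites
    alpha, beta = a + n_high, b + n_low
    # Mode of Beta(alpha, beta), valid for alpha, beta > 1
    map_est = (alpha - 1) / (alpha + beta - 2)
    return ml, map_est


# Panel A: five high and five low conservation scores, B(7,3) prior
ml, map_est = beta_binomial_estimates(5, 5, 7, 3)
print(f"ML = {ml:.3f}, MAP = {map_est:.3f}")  # ML = 0.500, MAP = 0.611
```

The MAP estimate lies between the ML estimate (0.5) and the peak of the prior (0.75). This is because ten observations carry roughly as much weight as the prior's pseudocounts.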
(B) Learning from 100 training examples (75 high, 25 low). Here the weak B(7,3) prior has little influence on the posterior distribution, and with this larger training set the ML and MAP estimates are similar (p2 ≈ 0.75). The posterior distribution for p2 is also narrower. The evidence (the training examples) has removed some of the uncertainty about its value.
(C) A stronger prior, B(70,30), still indicates that the most likely value of p2 is about 0.7. Note, however, that this prior is much narrower, so a lot of evidence would be needed to be convinced that p2 was less than, say, 0.6. Small samples are more susceptible to noise than large ones. For a training set with five high and five low conservation scores, the ML estimate (p2 = 0.5) is quite different from the MAP estimate of about 0.7, which takes the prior into account. Hopefully, this illustrates why priors are useful, but it also cautions against choosing the wrong prior (or one that is too strong or too weak)!
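One way to check how strongly each prior resists the idea that p2 < 0.6 is to compute the prior probability below 0.6. The sketch below uses only the standard library. It relies on the identity that, for positive integer shape parameters, the Beta CDF equals a binomial tail probability. The helper name `beta_cdf_int` is hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """P(p <= x) for p ~ Beta(a, b), with a and b positive integers.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    which holds only for integer shape parameters.
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


# Prior mass below p2 = 0.6 for the weak and the strong prior
for a, b in [(7, 3), (70, 30)]:
    print(f"B({a},{b}): prior P(p2 < 0.6) = {beta_cdf_int(0.6, a, b):.3f}")
```

Roughly a quarter of the B(7,3) prior mass lies below 0.6, compared with only a few percent for B(70,30). This is why the stronger prior needs much more contrary data before the MAP estimate moves below 0.6.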
(D) This final example combines the B(70,30) prior with training data containing 75 high and 25 low conservation scores, and shows the resulting ML and MAP estimates. Of the four examples, this combination of a good prior and a larger training set leaves the least uncertainty about the value of p2.
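The posterior's spread gives one way to compare the uncertainty remaining in each panel. This sketch again assumes the conjugate update B(a + n_high, b + n_low). It computes the posterior standard deviation of p2 for all four panels. The names `beta_sd` and `panels` are illustrative only.

```python
from math import sqrt


def beta_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return sqrt(alpha * beta / (s * s * (s + 1)))


# (panel, prior a, prior b, n_high, n_low), taken from panels A-D
panels = [
    ("A", 7, 3, 5, 5),
    ("B", 7, 3, 75, 25),
    ("C", 70, 30, 5, 5),
    ("D", 70, 30, 75, 25),
]

sds = {}
for name, a, b, n_high, n_low in panels:
    # Assumed conjugate update: Beta(a, b) prior -> Beta(a + n_high, b + n_low) posterior
    sds[name] = beta_sd(a + n_high, b + n_low)
    print(f"Panel {name}: posterior sd of p2 = {sds[name]:.3f}")

print("Narrowest posterior:", min(sds, key=sds.get))
```

Under these assumptions, panel A's posterior is by far the widest (standard deviation about 0.11), and panel D's is the narrowest (about 0.03).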