Thanks to a comment by Josh on my last post, I have realized that my rationale for programming a ratio-of-factorials calculator was, um, flawed. Since I’m full to the brim with integrity and honesty, I’ll leave the original post as is, tempting as it is to erase the evidence of my stupidity.

It turns out that I was wrong about the role of the normalizing ratio-of-factorials in calculating the BIC (see the original post for details). Worse, I was wrong because of what is, in retrospect, an obvious flaw in my thinking. Better, somewhat, is the fact that my ratio-of-factorials calculator works. Worse, again, is the fact that it took me as long as it did to solve what wasn’t really a difficult problem, that I don’t really have a use for it, and that writing about it is as good as pointing out, with mathematical rigor, exactly how dumb I’m capable of being. Better, perhaps, is the fact that I have a very small audience (1 ± 1). If I ever come to write posts for a popular blog, it will be nice to have worked out such kinks well beforehand in relative anonymity.

To recap, the BIC (or Bayesian Information Criterion) is a model fit statistic that takes into account the goodness of fit and the complexity of the model needed to achieve the fit. It is defined as

-2 log(*L*) + *k* log(*N*)

where the left term is (twice) the (negative) log likelihood of a model, and the right term is the product of the number of free parameters (*k*) and the log of the sample size (*N*). The lower the BIC, the better the fit. If you exponentiate the BIC, you get

*N*^{*k*} / *L*^{2}

The likelihood I am dealing with (the multinomial likelihood) has two (multiplied) parts, and I was worried that one part (the normalizing constant) could cause problems by (when present) multiplying only the left, ‘fit’ term of the BIC, leaving the right, ‘complexity’ term alone. I “figured” that the BIC *without* the normalizing constant could lead to one conclusion, while the BIC *with* the normalizing constant could lead to a different conclusion. Now that I’ve thought about it more carefully, I see that the normalizing term in the BIC is an additive constant, since it’s a multiplier in the ‘raw’ likelihood and the BIC contains the *log* likelihood. The normalizing constant depends only on the data, not on the model, so it shifts every model’s BIC by the same amount. The end result is that comparing BICs will give the same answer with or without the normalizing constant.
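For anyone who wants to check the additive-constant argument numerically, here is a minimal Python sketch. The counts, the two models, and all of the function names are made up for illustration (they are not from my actual analysis), and I use `math.lgamma` from the standard library to get the log of the normalizer.

```python
import math

def log_normalizer(counts):
    # log of N! / (n_1! * ... * n_m!), using lgamma(x + 1) = log(x!)
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def log_kernel(counts, probs):
    # log of the model-dependent part of the multinomial likelihood
    return sum(c * math.log(p) for c, p in zip(counts, probs))

def bic(log_lik, k, n):
    # -2 log(L) + k log(N)
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical data and models, chosen only for illustration.
counts = [30, 50, 20]
n = sum(counts)
log_c = log_normalizer(counts)

model_a = ([1 / 3, 1 / 3, 1 / 3], 0)  # fixed equal probabilities, no free parameters
model_b = ([0.3, 0.5, 0.2], 2)        # fitted probabilities, two free parameters

def bic_pair(probs, k):
    # BIC without and with the normalizing constant
    kern = log_kernel(counts, probs)
    return bic(kern, k, n), bic(kern + log_c, k, n)

a_without, a_with = bic_pair(*model_a)
b_without, b_with = bic_pair(*model_b)

# The normalizer shifts both BICs by -2 * log_c, so the difference is unchanged.
diff_without = a_without - b_without
diff_with = a_with - b_with
print(diff_without, diff_with)
```

The two printed differences should agree up to floating-point error, which is the whole point: the normalizer cancels out of any comparison between models fit to the same data.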

Anyway, fueled by erroneous thinking, I had for some time been working out, in the back of my mind, how to overcome the computational difficulties presented by the multinomial normalizing constant. Well, I figured out how to do it, and it seems quite simple in retrospect. The multinomial normalizer is a ratio of factorials, but the factorials I need to calculate are too big for direct calculation. The solution is to switch the order of operations around, put the elements of the factorials into vectors, and divide element by element before carrying out the multiplication.
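A rough Python sketch of the divide-first idea follows. It is a reconstruction rather than my original code, and the function name and the choice to sort the denominator are assumptions made for illustration. The numerator *N*! contributes the factors 1 through *N*, and the denominator contributes the factors of each count's factorial, which also add up to *N* factors in total, so the two vectors line up.

```python
import math

def multinomial_coef_elementwise(counts):
    # Hypothetical helper: N! / (n_1! * ... * n_m!) without forming any factorial.
    n = sum(counts)
    # Factors of the numerator N!: 1, 2, ..., N
    numerator = list(range(1, n + 1))
    # Factors of all the denominator factorials, pooled and sorted ascending
    # (sorting is my choice here; it pairs small factors with small factors).
    denominator = sorted(f for c in counts for f in range(1, c + 1))
    # Divide element by element first, then multiply the ratios together.
    ratios = [a / b for a, b in zip(numerator, denominator)]
    return math.prod(ratios)

# 600! is far too large for a float, but the ratio itself still fits.
print(multinomial_coef_elementwise([200, 300, 100]))
```

This still breaks down once the normalizer itself is too big for a floating-point number, since only the intermediate factorials are avoided, not the final product.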

Better yet, if you take the logs of the elements of the factorials first, and substitute subtraction and addition for division and multiplication, you can work with much larger factorials and still get the desired multinomial normalizer.
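The log version can be sketched like this in Python. Again, the function names are made up for illustration. I sum the logs with `math.fsum` to limit rounding error; the standard library's `math.lgamma(n + 1)` would give the same log factorial more directly, but the explicit sum is closer to the idea described above.

```python
import math

def log_factorial(n):
    # log(n!) = log(1) + log(2) + ... + log(n); log(1) = 0, so start at 2
    return math.fsum(math.log(i) for i in range(2, n + 1))

def log_multinomial_normalizer(counts):
    # Subtraction replaces division, addition replaces multiplication.
    return log_factorial(sum(counts)) - math.fsum(log_factorial(c) for c in counts)

# The normalizer for these counts would overflow a float,
# but its log is an ordinary number.
print(log_multinomial_normalizer([2000, 3000, 1000]))
```

Since the BIC uses the log likelihood anyway, the log of the normalizer is all that would ever be needed, and it never has to be exponentiated.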

The end result is that I’m glad I worked out how to calculate the normalizer for the multinomial likelihood, but it would be nice to have had a good reason to do it.