Egads! you may be saying, if you don't have an interest in math. Well, although I include all the math, I also wrote this page to be accessible to those without a strong math background. I also wrote it to amuse myself, and I hope it amuses you too. I have gotten some positive feedback, so you might go ahead and take a chance. Live a little!
If you don't understand how the Electoral College works, I suggest reading a little more about it now. The National Archives and Records Administration Federal Register maintains a complete page on the U.S. Electoral College. Click on the Frequently Asked Questions image, then look for the "What is the Electoral College?" link in the menu just under the eagle.
The methods of Banzhaf and others before him were published in various law review journals, notably the Villanova Law Review, Winter 1968 [1]. I learned about Banzhaf's work in an article by Ian Stewart in the July 1995 issue of Scientific American [6] and recursively searched for the other articles. Here is the list of related references of which I know.
Note: The Electoral College system does not actually require the electors to cast their electoral votes for the candidate for whom the popular votes were cast. Aside from showing the absurdity of the system, this failure also confounds any attempt at analysis. Such things do happen, although rarely. On the NARA's Electoral College site, see the 1988, 1976, and 1972 elections, to name a few recent examples.

This expression comes from the standard combinatorial formula for the number of ways you can choose p items (in this case, voters who vote for candidate #1) from n total items (i.e., total voters). That quantity is the combination C(n,p) and is given by

    C(n,p) = n! / [p! (n-p)!]
Banzhaf continues:
... Consider a group made up of N+1 citizen-voters, where N is an even number. The total number of ways in which each citizen-voter could vote for either of the two principal candidates is 2^(N+1), which is equal to 2 x 2^N. Each would be able to cast a critical vote only where the other N citizen-voters were equally divided into two groups: N/2 voting yea and N/2 voting nay. This, as previously indicated, can happen in

    C(N, N/2) = N! / [(N/2)! (N/2)!]

ways. Thus, an individual citizen-voter would be critical in determining the outcome of a vote in the following fraction of combinations.

    C(N, N/2) / 2^N
You might think that the probability has an inverse linear relationship to the population of the block. However, as Banzhaf and others have shown, this is not correct. The probability is related to the inverse square root of the block's population. This is rather surprising at first glance, but you can either accept the equations below as proof, or do it yourself and see.
So Banzhaf continues by simplifying the expression and using Stirling's approximation for factorials of large numbers to arrive at the final probability of an individual voter being critical in determining the block's decision. This gives the inverse-square-root relationship.
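For the curious, the whole simplification compresses into one line. This is my reconstruction using the form of Stirling's approximation n! ≈ sqrt(2πn)(n/e)^n, not a quotation of Banzhaf's own steps:

```latex
\binom{N}{N/2}\,2^{-N}
  = \frac{N!}{\left[(N/2)!\right]^{2}\,2^{N}}
  \approx \frac{\sqrt{2\pi N}\,(N/e)^{N}}{\pi N\,\left(\tfrac{N}{2e}\right)^{N} 2^{N}}
  = \sqrt{\frac{2}{\pi N}}
```

The (N/e)^N in the numerator exactly cancels the (N/(2e))^N 2^N in the denominator, leaving only the square-root term.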

So that's the first factor in Banzhaf's Power Index. (inhale, exhale)
The next factor really cannot be computed directly with a simple (ahem) formula like Stirling's approximation. Or at least, I don't know of one (and aren't you glad). Apparently, neither did Banzhaf. He did say in a footnote (rather dryly, if I might editorialize momentarily)

On another occasion, Banzhaf said slightly more about what he did.

So the question is now, "What did Shapley and Mann do to compute their numbers?" Well, they had a slightly different definition of the power of a block in a block voting system, but that's not really important for the current purpose. What is important is that they used a very general approach to computing probabilities over large sets of possible outcomes, known as the Monte Carlo approach. This will surprise few mathematicians and computer scientists.
For those unfamiliar with the Monte Carlo technique, the brief description is that you sample the possible outcomes some number of times and count the number of times that the event in question occurs. Then you assume that your sampling was unbiased, so that the true probability of the event is simply the number of occurrences you saw divided by the number of samples you took. Obviously, you need to cover a significant portion of the possible outcomes before this will give accurate results. Monte Carlo simulation is a large topic, and I won't go into it any further here. I searched for Monte Carlo simulation on a search engine and came up with some 22,000 web pages. Most apply it to one particular topic, although the topics range from economics to applied physics. Find one to which you can relate if you want to know more. One interesting note is that we can only hope that the pseudorandom number generator doesn't give us the exact same combination more than once in a particular simulation.
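As a concrete miniature of the technique, the following C sketch (hypothetical, not taken from banzhaf.c) estimates the probability that 100 fair-coin voters split exactly 50-50, which is the very event analyzed in the mathematics above:

```c
/* Monte Carlo estimation in miniature: estimate the probability that
 * n_voters fair-coin voters split exactly evenly. This is a stand-alone
 * illustration, not code from banzhaf.c. */
#include <stdlib.h>

double estimate_even_split(int n_voters, long trials, unsigned seed)
{
    long hits = 0;
    srand(seed);
    for (long t = 0; t < trials; t++) {
        int yeas = 0;
        for (int v = 0; v < n_voters; v++)
            yeas += (rand() >> 5) & 1;    /* one fair coin flip per voter */
        if (yeas == n_voters / 2)
            hits++;                        /* count occurrences of the event */
    }
    return (double)hits / (double)trials;  /* occurrences / samples */
}
```

The exact value is C(100,50)/2^100, about 0.0796, and Banzhaf's sqrt(2/(pi*N)) approximation gives about 0.0798; a few hundred thousand trials should land within roughly 0.002 of these.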
So I wrote some code that would generate a coalition from amongst the blocks in the input. It then determines which, if any, of the blocks are critical according to Banzhaf's definition, which means that they have power in that coalition. They get a point for that trial of the Monte Carlo simulation. After some number of trials, you divide the points that each block has accumulated by the number of trials and that gives you the second factor in the Banzhaf Power Index.
Banzhaf's definition of power was as follows. A given block has power in a given coalition if and only if the block, by leaving the coalition, can turn it from a winning coalition into a losing coalition. (Clearly, leaving a losing coalition does not affect the outcome, since there are no negative weights.) Thus if the current number of 'yea' votes is Y and it requires at least M votes to win, a block with B votes is critical if and only if Y >= M and Y - B < M.
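That definition translates into a one-line check. This is a minimal sketch of the test (my own illustration, not the code from banzhaf.c):

```c
/* Banzhaf's criticality test for a single block: with Y total 'yea' votes
 * cast and M votes required to win, a block casting B of those yeas is
 * critical iff the coalition wins (Y >= M) but would lose without the
 * block's votes (Y - B < M). */
int block_is_critical(long Y, long B, long M)
{
    return Y >= M && Y - B < M;
}
```

For example, with the Electoral College's M = 270, a 15-vote block in a 280-vote winning coalition is critical (280 - 15 = 265 < 270), while a 10-vote block in the same coalition is not (280 - 10 = 270, still a win).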
There are other definitions of power, discussed below under Criticisms of the Banzhaf Power Index.
Finally, the probability that a citizen voter will change the outcome of the election with his/her vote is the product of the two probabilities above (i.e. the probability that both the individual changes the block's decision and the block's votes change the election). According to the laws of probability, the probability that two independent events happen at the same time is the product of the probability that each happens. So we multiply the two numbers together for each block to get the power attributed to each voter within the block. This is the Banzhaf Power Index.
Banzhaf also analyzed two commonly suggested alternatives to the Electoral College:
Stewart then assigns power to individual voters by simply dividing the block's power by the number of residents. Stewart also gives the example of Tompkins County, NY in 1982, which managed to create a weighted system with power per capita for each district that was within 1.8% of the others. (The table is below, but read the paragraphs between here and the table first!)
you should be saying. Doesn't that assume that power within a block has an inverse linear relationship with the population? In other words, don't you want the power divided by the square root of the population to be equal amongst the districts, not power per capita (power divided by population)?
The answer is actually 'Yes.' This is what Stewart left out of his article. Apparently, the election board of Tompkins County, New York also left this part out of their distribution of votes among blocks. Where's John Allen Paulos (author of Innumeracy: Mathematical Illiteracy and Its Consequences) when you need him?
So here's the table. Note that the numbers in the fifth column are nearly the same for each district. Stewart says that this implies that each voter in the entire election has nearly the same power. But look at the last column. This gives the ratios of power divided by the square root of the population. (That is, for each block, divide column four by the square root of column two. Then for each block, divide that number by the smallest of such numbers over all the blocks.) Since the constant part of the sqrt(2/(pi p)) factor in the Banzhaf Power Index cancels out in the ratios, these numbers are equal to the Banzhaf Power Ratios of each district. That is, divide the Banzhaf Power Index for each district by the Power Index for the smallest district and you get a measure of relative strength.
Note that the Banzhaf Power Ratios are not all that close to each other. As Banzhaf noted in general, the voters in larger districts have more power than those in smaller districts. For example, the voters of the district of Lansing have approximately 1.33 times the power of voters in the district of Dryden West. The one exception is between the two districts with an equal number of votes. Then the voters in the district with 10 extra people have slightly less power, which makes sense.
Block             Pop.   Weight   Power   Power/Pop   Norm. Power/Sqrt(Pop)
                         (Votes)                      = Banzhaf Power Ratio
Lansing           8317     404     4734     0.569        1.335714
Dryden_East       7604     333     4402     0.579        1.298965
Enfield_Newfield  6776     306     3934     0.581        1.229748
Ithaca_3          6550     298     3806     0.581        1.210087
Ithaca_4          6002     274     3474     0.579        1.153852
Ithaca_SE         5932     270     3418     0.576        1.141931
Ithaca_1          5630     261     3218     0.572        1.103571
Ithaca_2          5378     246     3094     0.575        1.085621
Ithaca_NE         5235     241     3022     0.577        1.074743
Groton            5213     240     3006     0.577        1.071306
Caroline_Danby    5203     240     3006     0.578        1.072335
Ithaca_5          5172     238     2978     0.576        1.065526
Ithaca_West       4855     224     2798     0.576        1.033288
Ulysses           4666     214     2666     0.571        1.004283
Dryden_West       4552     210     2622     0.576        1.000000
I don't think that Banzhaf ignores combinations with more than a simple majority so much as he discounts the extra votes as being non-critical, then gives every voter within the block the opportunity to be the critical voter in that block, and identifies as a critical block every block that has enough votes to change the election. This strikes me as being nearly identical to Sickels' description of the Shapley-Shubik method (above). In fact, Banzhaf noted that his method and the Shapley-Shubik method

The difference really only comes into play when the election is a landslide (or at least is not close). Banzhaf does not assign critical status to anyone when the election is a landslide, while Shapley and Shubik do. I think that Banzhaf's definition is more correct, since power in a landslide is basically nonexistent from an intuitive sense. I mean, should anyone be considered to be critical (i.e., to have power) in an election decided by 525-13 (e.g., the 1984 U.S. Presidential election)? Also, one can see why the difference is "at most a few percent," since by definition, only a few percent of coalitions form a landslide.
It would be relatively easy to modify my program (see below) to compute the Shapley-Shubik index, although I believe the program would not be quite as efficient, because Shapley-Shubik only assigns power to the one district that puts the coalition over the majority. That means the order of voting is important in each trial, and then every block is given the opportunity to be the "crucial" block. But this strategy would require generating many more coalitions, since now there are not only the coalitions themselves, but also R orders in which each coalition can be achieved. And R is a big number for systems as large as the U.S. Electoral College. R = B! (B factorial), where B is the number of blocks. 51! is approximately 1.6 x 10^(66); yes, that's a 67-digit number. In Monte Carlo methods such as what I use to compute the solution for a system as large as the Electoral College, there is a minimum percentage of the possible output space you should sample (i.e., a minimum number of trials). Even with Banzhaf's definition, I'm probably not sampling enough. And now I'm supposed to multiply the number of samples (and length of time) by a 67-digit number?!? No, thank you!
Also, the discussion up to this point has assumed that a winning coalition requires only a simple majority. However, voting assemblies sometimes require more than a simple majority in order to pass a measure. The Banzhaf Power Index can be adjusted to this case quite easily by simply changing the number of votes required to form a winning coalition. All the other mechanics of the computations remain the same, as does the definition of "critical." I have modified my code to allow the number of votes required to form a winning coalition to be input by the user. Thanks to Elaine Garcia for pointing this case out to me. Even if it was because she needed the code to compute that case....
test.dat and TompkinsCountyNY.dat are from the Stewart article, while usa.dat corresponds to the 1990 census configuration of the Electoral College and usa1960.dat to the 1960 Electoral College.
My code uses the brute-force method (i.e., counting the total number of coalitions and the number in which each block is critical) when the number of blocks is below a predefined constant. There's nothing magical about the number, but I do not recommend raising it high enough to try the brute-force method on a sample as large as the U.S. Electoral College (51 blocks). It may not finish in your lifetime. It should be possible to run the program on a few not-too-small samples for timing data, then pick an appropriate constant for your machine, using the fact that there are 2^n combinations of n objects when each has a yes/no choice.
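For the record, the brute-force idea can be sketched in a few lines: walk all 2^n subsets with a bitmask and tally criticality per block. This is a stand-alone illustration, not the actual code from banzhaf.c:

```c
/* Exact Banzhaf counting for a small weighted voting system: enumerate
 * every coalition as a bitmask and count, for each block, the coalitions
 * in which it is critical (winning, but losing without that block). */
void banzhaf_counts(const long *weights, int n, long quota, long *critical)
{
    for (int b = 0; b < n; b++)
        critical[b] = 0;
    for (unsigned long mask = 0; mask < (1UL << n); mask++) {
        long yea = 0;
        for (int b = 0; b < n; b++)
            if (mask & (1UL << b))
                yea += weights[b];
        if (yea < quota)
            continue;                  /* losing coalition: nobody is critical */
        for (int b = 0; b < n; b++)    /* member b critical iff leaving loses */
            if ((mask & (1UL << b)) && yea - weights[b] < quota)
                critical[b]++;
    }
}
```

For the toy system with weights {2, 1, 1} and quota 3, this yields critical counts of 3, 1, and 1: the large block is three times as powerful despite having only twice the weight.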
For "large" numbers of blocks, the program uses the Monte Carlo method, which means that the banzhafNextCoalition() function in banzhaf.c has an if statement that basically divides it into two different functions. For either "large" or "small" numbers of blocks, the equations listed above under Mathematics of the Banzhaf Power Index for the power of each voter to change the block's decision are used to compute that factor. The two factors are put together in banzhafComputeRatios().
If you find a bug in my code or have your own implementation, please send me mail. Feel free to use the code here to help get you started. Please give credit to me if you do.
State  Population  EV  PR  State  Population  EV  PR  
CA  29760021  54  3.344  LA  4219973  9  1.308  
NY  17990455  33  2.394  MS  2573216  7  1.302  
TX  16986510  32  2.384  SC  3486703  8  1.278  
FL  12937926  25  2.108  IA  2776755  7  1.253  
PA  11881643  23  2.018  AZ  3665228  8  1.247  
IL  11430602  22  1.965  KY  3685296  8  1.243  
OH  10847115  21  1.923  OR  2842321  7  1.239  
MI  9295297  18  1.775  NM  1515069  5  1.211  
NC  6628637  14  1.629  AK  550043  3  1.205  
NJ  7730188  15  1.617  VT  562758  3  1.192  
VA  6187358  13  1.564  RI  1003464  4  1.190  
GA  6478216  13  1.529  ID  1006749  4  1.188  
IN  5544159  12  1.524  NE  1578385  5  1.186  
WA  4866692  11  1.490  AR  2350725  6  1.167  
TN  4877185  11  1.489  DC  606900  3  1.148  
WI  4891769  11  1.486  KS  2477574  6  1.137  
MA  6016425  12  1.463  UT  1722850  5  1.135  
MO  5117073  11  1.453  HI  1108229  4  1.132  
MN  4375099  10  1.428  NH  1109252  4  1.132  
MD  4781468  10  1.366  ND  638800  3  1.118  
OK  3145585  8  1.346  WV  1793477  5  1.113  
AL  4040587  9  1.337  DE  666168  3  1.095  
WY  453588  3  1.327  NV  1201833  4  1.087  
CT  3287116  8  1.317  ME  1227928  4  1.076  
CO  3294394  8  1.315  SD  696004  3  1.071  
MT  799065  3  1.000 
I have run the program on the 1964 data (1960 census data) for which Banzhaf published figures. You can click on the file of results from that simulation below.
The numbers do not agree exactly; both are probably slightly off from the "true" answers according to the definitions above. Banzhaf's original numbers will be slightly off because of the approximation he made so as to avoid lengthy computational processing in the 1960s (a wise decision). Maybe someday I will update this page with a description of what he did; I have spoken to him about this. My numbers will be off because Monte Carlo numbers are probably always slightly off, and I'm not really able to generate enough samples with the standard numerical precision that is supported in this implementation. But repeat runs were really, really close, so I doubt the errors are large.
Some details, for the geeks among you: I computed 4.29 billion coalitions using an SGI Onyx2 with an R10000 processor. That took 24 hours, 49 minutes to run on the 1990 data.
Let me take this opportunity to climb on my soapbox for a moment and plead with everyone (especially in the U.S., where voter turnout is so low) to vote in each and every election. It is never difficult to register (not in the U.S., at least), and it does matter to you who is elected, whether or not you can see it in your daily life. It may seem from these numbers that it doesn't matter, but it does. Also note that if you do not vote, you have ZERO voting strength. Don't throw away what you could use to affect your government.
That's enough of the soapbox for today.
NOTE
I have in the past made a few small changes to the code. I hope the current version supports all the claims I make on this page. See the code for an option to be faster but less accurate. Also, the program now accepts an argument to require more than a simple majority.
This is the Tompkins County, NY input data shown above. This is the data for which Stewart's description of the Banzhaf Power Index is misleading. My sample output file is not completely generated by my program. I manually entered the power-divided-by-square-root-of-population column to show the difference between what Stewart and Tompkins County did and what Banzhaf intended. I also show more decimal places than my program currently prints, but you can change that easily enough.
These numbers were obtained with 4.29 billion (yes, with a 'b') samples. The numbers from Banzhaf's article of course will not quite agree, but they are close. Differences are due to the Monte Carlo approach being based on pseudorandom numbers (and of course the algorithms for generating them have changed since the late 1960s, when Banzhaf did his work), and possibly on changes in division and square root algorithms. (That is, modern machines likely have more digits and allegedly more accurate approximations to these operations.)
Of course, any time you present an analysis of this type, you really ought to compute the variance of your numbers. (For the non-mathematical, variance says how far away the answer could be, and implicitly assumes that you know what the probability distribution between the stated estimate and the possible values is. Hmmm, maybe that wasn't the best description for the non-mathematical either. Sigh.) What follows here is very much a "back-of-the-envelope" calculation, but it should give the average person some idea of the accuracy of the numbers in the outputs above without undertaking such heroic effort as looking in a math book. Gasp!
For the Monte Carlo approach, one expects the error to be bounded by 3*sqrt(v/N) (three standard errors), where N is the number of trials and v is the variance of each trial. N for the sample outputs I have above is 4.29 billion. I haven't actually calculated v, but it would go something like this. Variance is defined as the expected value of (x - u)^2, where u is the mean. In this case, the mean is the probability for the particular block. Of course, these are unknown a priori. But they are bounded by 0 <= u <= 1. I think that in the worst case, we're looking at a variance for each trial of 1.0. That's an error of no worse than about 4.58e-05 for the probabilities computed for each block.
For Stirling's approximation, there are numerous variations. I used the one Banzhaf used, given above. The error in this approximation is bounded by 1/(12 * N). N here is the number of people in the block, so we're looking at numbers that range from around 450,000 to 29.76 million for the 1990 data, 226,000 to 16.78 million for the 1960 data. So, let's approximate all such numbers with 1/220000, or an error of around 4.55e-06. That is, the approximations using Stirling's approximation should be off by no more than that number, and actually are better for all the 1990 data, and larger states have more accurate values than smaller states.
Next, we plug this error in each approximation to a large-number factorial into the equation given above for the probability of an individual being the deciding vote in the block. Recall (or scroll up and see) that that equation has three such factorials, all of which theoretically need to be approximated. In fact, Banzhaf plugged the approximation in symbolically and then simplified. But note that the two factorials in the denominator are half the size of the factorial in the numerator. So the worst thing that could happen is an error that looks like

The upshot of all this is that I expect (unless the pseudorandom number generator did something really evil) that both factors in the power computed for each block have 4 significant figures (i.e., meaningful digits to begin the answer, not counting zeros between the decimal point and the first significant digit). In other words, I expect that if I run the simulation again (or if you run your own), the numbers will agree through four significant digits. But that's in the raw power computed. What I'm showing you (and what Banzhaf published) are ratios between the smallest computed power and the other blocks. That division by a rather small number has the potential to introduce error. However, given that the highest ratio is only 3.344, it should introduce no more than about 3.6e-03 of error, a change at the level of the third decimal place (the fourth significant digit once the ratios are computed) when the answers are rounded as I have done here (and in my program; i.e., I don't look at more digits than I'm showing you here). So this could introduce a variation of three or four in the fourth significant digit. A variation that large is not likely, of course. I've been generous with the error estimates I'm giving here, so the real values are somewhat less and thus the variation is almost surely less than four in the fourth significant digit.
So I expect that another simulation will indeed give exactly the same numbers for the power ratios to within the three decimal places I show.
I would classify that as a successful test and I firmly believe that you won't see a variation of more than four in the last decimal place from the numbers I have here. You probably won't see a variation of more than one in the fourth decimal place. By all means, tell me if you do and I'll put it on this page. That should include any changes to improve the accuracy of Stirling's approximation or an increase in the number of samples. Obviously, if you decrease the number of samples or the accuracy of Stirling's approximation, then this claim would not apply.
Apparently, it also does not apply if you try to use each bit of the pseudorandom number as an independent variable, a change I introduced into the code in order to squeeze some more speed out of the computations. It did yield better speed (24:49 hours:minutes versus 38:46), but it changed some of the numbers in the third significant digit. Thus it has introduced some inaccuracy into the computations that might be problematic. I think the problem would be most severe for voting systems that just surpass the threshold for which the exact computations are done. I have not tested this, however. I have left both blocks of code in the C program, and you can use either one by compiling in the version you like. (There's a preprocessor directive you should define or undefine.) The default was at one time the faster, less accurate version, but is now the slower, more accurate version.
And if you've skipped down from the top, stop cheating!
Second, let me muse about things I might try differently if I were to do this again. Yeah, right!
This is by far the most likely alternative to make any difference in the computed numbers, and thus (I think) the most interesting of the ideas I list here. I gave the definition in the section on criticisms of Banzhaf's definition. I noted that Banzhaf said the two will yield basically the same results and gave at least one intuitive reason why. And, from the way I understand what Banzhaf said, Shapley helped him prepare his numbers, so I assume that Shapley did not balk at the claim that the two methods yielded similar results. But I can only assume.... At any rate, you could make the change. I strongly suggest that you decrease the sample size and implement a parallelization scheme if you are going to do this. Read the note about the number of samples you would need under Responses to the Criticisms and the idea for parallelizing the code cleanly in item 3 below.
I'm still only sampling a very small percentage of the possible outcomes for coalitions. There are 2.2518 x 10^{15} possible, and I sample only 4.29 x 10^{9} of them, or about 0.0002%. That's a much smaller percentage than would normally be used in a Monte Carlo simulation, from what I read in my glancing through the literature looking for the formula for variance. But more would require some fancier handling of the arithmetic and/or parallelizing the code, I believe.
I also thought about splitting the sample space up by allowing the user to enter a prefix to be prepended to the random bit stream. Using three prefix bits, for example, would also offer easy eight-way parallelism of the code (8 = 2^3) by simply running eight processes with the eight possible three-bit prefixes. That might make it more tolerable to generate more samples.
Assuming that each of the subprocesses uses the same (large) number of samples, you could simply average the resulting power ratios from each process to get the final values. That assumes that the raw power computed for the smallest block is always the same, and that each process decides that the same block has the least power. With large sample sizes in each process, that seems rather likely. If you don't want to make that assumption, then have the program print out the power indices, which can always be averaged.
The nice thing about the prefix strategy for parallelization is that it guarantees that no two processes will create the same coalition, since they are guaranteed to differ in at least one of the first three assignments of blocks to coalitions (in or out). Of course, it cannot guarantee that a single process will not generate a coalition twice.
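A sketch of what that prefix might look like in code (names and structure are my own, not from banzhaf.c): the first k in/out decisions are forced to the process's prefix, and only the remaining blocks are drawn at random.

```c
/* Sketch of the prefix idea: force the first k in/out choices of every
 * generated coalition to a per-process bit prefix, so that 2^k processes
 * sample disjoint parts of the coalition space. */
#include <stdlib.h>

unsigned long prefixed_coalition(int n_blocks, int k, unsigned long prefix)
{
    unsigned long mask = prefix & ((1UL << k) - 1UL);  /* fixed prefix bits */
    for (int b = k; b < n_blocks; b++)                 /* rest chosen at random */
        if ((rand() >> 5) & 1)
            mask |= 1UL << b;
    return mask;
}
```

Eight processes running with k = 3 and prefixes 0 through 7 can then never generate the same mask, since their low three bits always differ.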
It might be interesting to see how a better version of Stirling's approximation would affect the final computed power ratios. I've already said that I expect the answers not to change by more than four in the fourth significant digit, though.
I can't see that more decimal places would mean anything, but arbitrary-precision libraries are available on the 'net. I used double precision.
Mark Livingston