I’ve been digging into some (actually, lots of) data on ballot layout and preference voting from the Swedish electoral authority, Valmyndigheten, recently, mainly as a way of practicing my RDD skills. In the process I’ve come across some funky results that I have a hard time getting my head around, which has led me to wonder: is the electoral authority following its own rules regarding list placement?
So first of all, why bother with ballots? That the design of ballot papers can affect how voters choose their candidates is already well established, one example being list effects (the top of the list is favoured, all else being equal). This might not come as a surprise to anyone (and perhaps it shouldn’t) but it does carry certain implications for voter rationality – after all, if voters were truly rational and deliberate about choosing their preferred candidates, the design of the list shouldn’t matter whatsoever.
Swedish party ballot papers typically contain 30 or more candidates per list, which means that the candidates often end up filling both the front and the back of the ballot. This is good news for us as researchers (or is it? see the discussion at the end), since it allows us to investigate certain types of list effects. In this case: does it matter whether a candidate ends up on the back side? Intuitively, we might expect voters to be less likely to pick a candidate from the back side of the ballot paper simply because they may not bother flipping it over and having a look. People are lazy.
Now, if we were to simply compare the number of preference votes between candidates on the front of the paper and candidates on the back, we would be capturing a great deal of other things. Candidates at the very top of the list are usually there because they are incumbents and therefore better known, and list placement also signals how competent the party internally considers a candidate to be. Any average difference between candidates on the front and the back of the ballot could therefore not be interpreted as an effect of which side they ended up on.
Enter regression discontinuity design – RDD. For those who may not know, RDD is a method of causal inference that exploits thresholds around which we can plausibly assume as-good-as-random assignment (local randomization). In this case, it seems likely that the last candidates on the front side and the first candidates on the back side are not systematically different from each other, in the way that the first candidates on the front side and the first candidates on the back side would be. In a nutshell, we can therefore restrict the analysis to the candidates just around the threshold (the breaking point where the remaining candidates are moved to the back side), and any systematic difference in outcome (share of preference votes) can then be interpreted as a causal effect of ending up on one side of the ballot paper or the other.
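To make the close-margins idea concrete, here is a minimal sketch (not my actual analysis code), assuming a pandas data frame with placeholder columns `list_place` (1 = top of the list), `cutoff` (the last list place printed on the front side of that particular list) and `pref_share` (the candidate’s share of preference votes):

```python
import pandas as pd

def close_margins_gap(df: pd.DataFrame, bandwidth: int = 2) -> float:
    """Mean difference in preference vote share between the first `bandwidth`
    candidates on the back side and the last `bandwidth` on the front side."""
    d = df.copy()
    # Centre the forcing variable so that 0 is the last candidate on the front side.
    d["centered"] = d["list_place"] - d["cutoff"]
    local = d[(d["centered"] > -bandwidth) & (d["centered"] <= bandwidth)]
    back = local.loc[local["centered"] > 0, "pref_share"].mean()
    front = local.loc[local["centered"] <= 0, "pref_share"].mean()
    return back - front
```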
So, I dug deep into the electoral authority’s data banks and built two datasets of candidates’ preference vote shares and list placements. One is from the parliamentary elections in 2010 (4,281 observations in total), and the other is from the municipal elections in 2014 (25,273 observations in total).
The cutoff on Swedish ballots is, according to the electoral authority (and this is crucial), determined as follows. If there are fewer than 28 candidates, they all fit on the front side. If there are exactly 28 candidates, 26 are placed on the front side and the remaining two on the back side. If there are more than 28 but fewer than 65 candidates, 27 are placed on the front side and the rest on the back side. If there are 65 candidates or more, 31 are placed on the front side and the rest on the back side. A caveat is that in rare cases, a candidate’s name and description may be long enough to fill more than one line on the ballot. In those cases, it is the number of lines rather than the number of candidates that guides the cutoff, and the cutoff list place will therefore be lower for that particular list. This has not been taken into account here, due to the difficulty of determining when a candidate description is long enough to spill onto a second line, which means that a small number of observations will actually have an erroneous (too high) list-place cutoff (note: too high – this is also crucial later).
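Transcribed directly into code (and ignoring the multi-line caveat), the stated rules amount to something like this:

```python
def front_side_capacity(n_candidates: int) -> int:
    """Number of candidates (strictly speaking, lines) printed on the front side,
    according to the rules as stated by Valmyndigheten."""
    if n_candidates < 28:
        return n_candidates   # everyone fits on the front side
    if n_candidates == 28:
        return 26             # 26 on the front, the remaining two on the back
    if n_candidates < 65:     # 29 to 64 candidates
        return 27
    return 31                 # 65 candidates or more
```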
RDD graphs with cutoff-centered list placement for these two datasets (fixed-size bins of 100 observations each, with third-degree polynomials fitted) look like this:
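(For the curious, a binned plot of this kind can be put together roughly as in the sketch below – again with the placeholder columns from before, and certainly not the exact code behind the figures.)

```python
import numpy as np
import matplotlib.pyplot as plt

def rdd_plot(df, bin_size=100, degree=3):
    d = df.copy()
    d["centered"] = d["list_place"] - d["cutoff"]   # 0 = last front-side candidate
    d = d.sort_values("centered")
    # Equal-count bins: each dot is the mean of `bin_size` adjacent candidates.
    d["bin"] = np.arange(len(d)) // bin_size
    bins = d.groupby("bin")[["centered", "pref_share"]].mean()
    plt.scatter(bins["centered"], bins["pref_share"], s=10)
    # Third-degree polynomial fitted separately on each side of the cutoff.
    for side in (d[d["centered"] <= 0], d[d["centered"] > 0]):
        coefs = np.polyfit(side["centered"], side["pref_share"], degree)
        grid = np.linspace(side["centered"].min(), side["centered"].max(), 200)
        plt.plot(grid, np.polyval(coefs, grid))
    plt.axvline(0, linestyle="--")
    plt.xlabel("List place relative to the front/back cutoff")
    plt.ylabel("Preference vote share")
    plt.show()
```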
Ok, big bummer. There is nothing there. No clear jump around the threshold whatsoever. In the face of this, any further analysis is sort of superfluous, or unconvincing at best, but I would like to bring up something slightly different that weirds me out just a little bit.
If we were to have found something there, a typical robustness check would be to test a number of placebo thresholds as well – essentially “pretending” that the real threshold is somewhere else and seeing if we find effects anyway. If we do, that might be an indication that something else is going on in the data. I ran placebo threshold tests with two different models: the standard, old-fashioned close-margins one, where only the observations closest to the threshold are included, and a local-linear one, where a slightly wider window is used but linear controls for the forcing variable (the list placement) are included. The graphs are presented below. As a side note, all models also included a dummy for the candidate supposedly at the top of the back side, to rule out the previously mentioned top-of-list effects that could otherwise skew these estimates.
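In case the setup is unclear, the placebo exercise boils down to something like the following sketch (placeholder columns as before; statsmodels here is just a stand-in, not the software actually used):

```python
import statsmodels.formula.api as smf

def placebo_estimate(df, placebo_shift=0, bandwidth=3):
    """Estimated jump at a threshold shifted `placebo_shift` places to the
    right of the stated front/back cutoff (0 = the real threshold)."""
    d = df.copy()
    real_centered = d["list_place"] - d["cutoff"]
    # Dummy for the candidate supposedly at the top of the back side, so that
    # genuine top-of-page effects do not contaminate the placebo estimates.
    d["top_of_back"] = (real_centered == 1).astype(int)
    d["centered"] = real_centered - placebo_shift
    d["after"] = (d["centered"] > 0).astype(int)
    local = d[d["centered"].abs() <= bandwidth]
    # Local-linear variant: linear control for the (re-centred) forcing variable.
    fit = smf.ols("pref_share ~ after + centered + top_of_back", data=local).fit()
    return fit.params["after"], fit.bse["after"]

# Sweep a range of pretend thresholds on either side of the stated one, e.g.:
# placebo_curve = {shift: placebo_estimate(df, shift) for shift in range(-10, 11)}
```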
Is it just me, or isn’t it a bit peculiar that both of these large, independent datasets, and two types of specifications, seem to show significant effects just a few list places after the “real” threshold (that is, to the right of 0 in the graphs)? To be sure, there are numerous problems here: the confidence bands differ from zero in multiple places – but those places differ between the two datasets, and also between the close-margins and local-linear models. Meanwhile, the dip just after the threshold is apparent everywhere. Further, when running a multitude of models (close margins with different bandwidths, local linear with ditto, and flexible polynomials) and robustness checks (bandwidth sensitivity, regressions with placebo outcomes) with a threshold that is a few places further down (list place 30, to be precise), the picture looks very consistent.
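The bandwidth-sensitivity part of that exercise can be sketched in the same spirit: re-centre on the oddball threshold and re-estimate the jump for a range of window widths (column names are placeholders, as before):

```python
import statsmodels.formula.api as smf

def bandwidth_sensitivity(df, placebo_place=30, bandwidths=range(2, 11)):
    """Re-estimate the jump at an absolute list place (here 30) for a range
    of window widths, to check that the estimate is not bandwidth-driven."""
    estimates = {}
    for bw in bandwidths:
        d = df.copy()
        d["centered"] = d["list_place"] - placebo_place   # 0 = list place 30
        d["after"] = (d["centered"] > 0).astype(int)
        local = d[d["centered"].abs() <= bw]
        fit = smf.ols("pref_share ~ after + centered + after:centered",
                      data=local).fit()
        estimates[bw] = (fit.params["after"], fit.bse["after"])
    return estimates
```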
What about the caveat with candidates spanning multiple lines? The crucial point is that any such cases push the real threshold back (that is, towards earlier places on the list). They therefore couldn’t possibly explain a bump to the right of zero; if anything, they would dilute any effect at the stated threshold.
What does all this mean? Is it really just noise? Is the information regarding the cutoff on the ballots from the electoral authority simply wrong, and the real cutoff is a few places further down? Are the stated rules just not being followed? Have we captured some hitherto unknown oddball effect?
As for the rules for ballot ordering simply not being consistently followed – a simple Google search reveals several examples of ballot lists from recent elections that appear to break these rules (see for example this, this or this). That in itself reveals a degree of, in my mind, unacceptable carelessness on the part of whoever is typesetting these lists, since it poses a significant problem for people who (like me) may want to actually use this data, which is otherwise meticulously collected by the authorities. What’s the use then? Either way: is rule-breaking in list typesetting common enough to produce these results? That is impossible to know from the data that Valmyndigheten provides.
I’m perhaps leaning towards noise as the most plausible explanation, but that is also slightly upsetting. It’s an extremely straightforward and intuitive hypothesis (that voters are less likely to pick someone from the backside of the ballot), and the datasets are very large – yet not a hint of an effect. And instead, clear, robust effects at an (apparently!) completely nonsensical threshold just slightly further down on the list. If I didn’t know better, I’d say someone was playing tricks on me.