Some failed replications: should we blame motivation crowding theory or MTurk?

I wrote last week about motivation crowding theory, and in that post I hinted that I’d return to the subject of replications of this phenomenon. A quick recap: motivation crowding can occur when someone acts in a certain way out of pro-social sentiments (other-regard, altruism or the like), but once financial rewards are introduced, ceases to care about that motivation in favor of a simple self-interested calculus: is it worth the money I’m getting? That is, the pro-social motivation that previously drove the behavior is crowded out by the new incentive. Where money goes in, care for others goes out. As a consequence, an added monetary incentive can actually decrease the likelihood of engaging in that very behavior.

Since I’ve been dealing a bit with motivation crowding, I thought it’d be interesting to see how two older studies would fare in a type of conceptual replication.

The first study is by Frey & Oberholzer-Gee (1997) and investigates the introduction of monetary incentives in advance of a referendum about the construction of a nuclear waste storage facility in a particular canton in Switzerland. The authors sent out surveys asking a number of Swiss citizens whether they would vote yes or no to such a proposal. The treatment was that some respondents were also given a brief description of a proposal to compensate the inhabitants of that canton with a sizable sum of money (to be paid annually) for the inconvenience of having a waste site in their “backyard”, and were asked how they would vote if that proposal were put forward. The result, as you may have guessed, was that those who were asked about the policy offering substantial economic compensation were much less likely to say they would vote yes than those who were not offered any compensation (the affirmative vote dropped from roughly 50% to roughly 25%). Voila – motivation crowding. Where they may previously have been motivated by a sense of responsibility (it has to be built somewhere, after all), they were now simply weighing the stated monetary payoff against the inconvenience of having the facility.

The other study is the famous charity collection study by Gneezy & Rustichini (2000). The authors randomly divided Israeli high-school students, who are occasionally sent out to collect money for charity, into three groups. The first was a control group, sent out to collect money as usual, without compensation. The second group was told they would be paid the equivalent of 1% of the money they collected. The third group was identical to the second, but the compensation was 10%. The 1% group collected substantially less than the control group, whereas the 10% group collected only slightly (and not significantly) less. So there appears to be motivation crowding when incentives are introduced, but once incentives are in place, they have the right “sign”: higher incentives mean more effort.

Nuclear facility scenario

I wanted to see whether these two “classic” effects could be recreated in a more abstract setting, namely a survey experiment on MTurk. No sooner said than done: I constructed scenarios modeled on the two original studies. For the first one, I simply asked whether respondents would support building a nuclear waste storage facility in their county; half of them were also told that they would receive economic compensation ($2000 per person per year, in line with the original study). The vignette was formulated as follows, with the sentence about compensation appearing only in the treatment group (and thus forming the stimulus):

Please read the following scenario carefully and make sure that you have understood it. Then answer the questions.

Nuclear power plants produce waste products that need to be stored safely for a very long time after use. To store this waste, specific locations have to be found with characteristics that can guarantee its safe storage.

Suppose that the United States Department of Energy (the DOE), after completing exploratory drilling, proposes to build a storage facility for low- and mid-level radioactive waste in your hometown. Federal experts examine this proposition, and the federal government ratifies the proposal to build the storage facility in your community. Moreover, the government decides to compensate all residents of your community with $2000 per person and per year. In a referendum, you are asked to decide whether you accept or reject this proposition.
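The post does not say how respondents were assigned to the two arms (survey platforms usually handle this). As an illustrative sketch, assuming a simple shuffle-and-split into two equally sized arms, the manipulation amounts to toggling a single sentence of the vignette. The names `assign_arms` and `build_vignette` are hypothetical, and the control wording (the same text with the compensation sentence dropped) is an assumption.

```python
import random

# Vignette text split into the shared part and the treatment-only sentence.
# Assumption: the control version is identical except for the omitted sentence.
BASE_TEXT = (
    "Suppose that the United States Department of Energy (the DOE), after "
    "completing exploratory drilling, proposes to build a storage facility for "
    "low- and mid-level radioactive waste in your hometown. Federal experts "
    "examine this proposition, and the federal government ratifies the "
    "proposal to build the storage facility in your community."
)
TREATMENT_TEXT = (
    " Moreover, the government decides to compensate all residents of your "
    "community with $2000 per person and per year."
)
CLOSING_TEXT = (
    " In a referendum, you are asked to decide whether you accept or reject "
    "this proposition."
)


def assign_arms(respondent_ids, seed=None):
    """Hypothetical helper: shuffle respondents and split them into two arms.

    With an odd number of respondents, control gets the extra person.
    """
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {rid: ("treatment" if i < half else "control") for i, rid in enumerate(ids)}


def build_vignette(arm):
    """Hypothetical helper: return the vignette text shown to a given arm."""
    if arm == "treatment":
        return BASE_TEXT + TREATMENT_TEXT + CLOSING_TEXT
    return BASE_TEXT + CLOSING_TEXT
```

Randomizing at the respondent level is what makes the simple comparison of vote shares between arms interpretable as the effect of the compensation sentence.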

Following the vignette, respondents were simply asked how they would vote. A total of 364 American respondents answered, and here is what came out:

[Figure: nuclear waste scenario results]

A massive positive effect of the incentive: support increased by more than 20 percentage points with economic compensation. No motivation crowding in sight!

Charity volunteering study

For the second study, the scenario had to be changed slightly, so that it answers a somewhat (but not dramatically) different question. Respondents were first told some facts about childhood leukemia in the United States, and then told about a hypothetical charity that raises money in support of children battling leukemia. They were then randomized to receive information that street fundraisers either worked as volunteers or received an hourly wage ($7.25). Finally, they were asked whether they would consider working as a street fundraiser for this charity. The vignette, shown here in its treatment version with the hourly wage:

Please read the following information carefully.

Leukemia, or cancer of the bone marrow and blood, is the most common form of cancer among children in the United States today. Around 10,000 children are diagnosed and up to 1,500 children die every year from leukemia. Treatments often cause painful and debilitating side effects such as nausea.

A newly formed charity is seeking street fundraisers to help collect money. The collected money will go to a fund that will be used to support children battling leukemia, and is expected to be of great help to the affected children. Street fundraisers will be given the opportunity to collect money for 2-4 hours per month and are paid an hourly wage of $7.25.

This experiment thus departs from the original study in that the question was whether respondents would participate at all, not what their level of effort would be. As I said, this is conceptually a slightly different question, but with regard to the underlying theory of motivation crowding, it should not matter. Additionally, only one level of compensation was tested. I also anticipated that social desirability bias could be a serious issue (since respondents were well aware they would not be committing to anything), so I added an extra touch: I asked those who answered Yes whether they would like the author of the survey to get in touch with them if the opportunity arose. This, I figured, would weed out at least some of the “socially desirable” yea-sayers. A total of 264 respondents answered, and the results were:

[Figure: volunteering scenario results]

The proportion of respondents who say they would volunteer increases by 10 percentage points with economic incentives. A simple t-test for the difference in group means gives $p<0.05$.[1] Again, no motivation crowding in sight.
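For readers who want to see what a “simple t-test for difference in group means” looks like on Yes/No answers, here is a minimal sketch. It codes each answer as 1 (would volunteer) or 0 (would not) and applies Welch’s unequal-variance t-test; the post does not say which t-test variant was used, so Welch is an assumption. To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is close at these sample sizes; `scipy.stats.ttest_ind(a, b, equal_var=False)` would give the exact t-distribution p-value. The helper names and the counts in the usage lines are hypothetical and are not the survey’s data.

```python
import math
from statistics import mean, variance


def welch_t_test(treated, control):
    """Welch's t-test for a difference in means between two groups.

    Returns (difference in means, t statistic, two-sided p-value), where the
    p-value comes from a normal approximation to the t distribution.
    """
    n1, n0 = len(treated), len(control)
    diff = mean(treated) - mean(control)
    # Standard error from each group's own sample variance (unequal variances).
    se = math.sqrt(variance(treated) / n1 + variance(control) / n0)
    t = diff / se
    # Two-sided p-value: P(|Z| > |t|) = erfc(|t| / sqrt(2)) for standard normal Z.
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, t, p


def from_counts(yes, total):
    """Expand a count of Yes answers into a list of 0/1 outcomes."""
    return [1] * yes + [0] * (total - yes)


# Hypothetical counts for illustration only; these are not the survey's data.
paid = from_counts(50, 100)
unpaid = from_counts(40, 100)
diff, t, p = welch_t_test(paid, unpaid)
print(f"difference = {diff:.3f}, t = {t:.2f}, p = {p:.3f}")
```

With 0/1 outcomes, the difference in group means is exactly the difference in the proportions answering Yes, which is why the percentage-point effect and the t-test describe the same quantity.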

What now?

Why are the original effects not only absent here, but replaced by a very conventional, strong economic effect? I can see four possible explanations.

One explanation could be that MTurkers are already motivated by economic incentives to participate in the survey at all, and thus had their pro-social motivations crowded out at the outset. If this explanation carries any weight, it would be a potentially big problem for quite a lot of survey research that investigates questions involving any type of pro-social motivation or altruism.

Another explanation could be that the hypothetical nature of the scenarios fails to induce pro-social motivation in the first place. Although the respondents were asked to imagine that it was real, this simply may not translate to externally valid results.

A third explanation could be that the differences in design, particularly in the second experiment, make all the difference. As noted, that experiment measures whether respondents would participate at all rather than their level of effort, as in the original study. Additionally, my respondents were told they would get a fixed hourly wage instead of a percentage of the money they collected.

Or – perhaps – motivation crowding as a phenomenon is simply rarer, or more brittle, than the published literature so far suggests. Publication bias is a nasty bug. I would not dare call the whole field into question based on two shoddy MTurk experiments, but the question needs to be asked. At any rate, I would like to find out which of the above explanations for my failure to find a crowding effect is true. If the first or second explanation holds water, it calls other survey experiments into question too, particularly those whose participants are compensated with money. Answers to these more general questions, about vignette experiments on the one hand and the compensation of respondents on the other, would therefore be methodologically valuable for others.

[1] Additionally, when 52 respondents who failed an attention check were removed from the sample, the effect size increased to 18 percentage points ($p<0.001$), but as always, this type of per-protocol comparison is liable to be biased.

2 thoughts on “Some failed replications: should we blame motivation crowding theory or MTurk?”

spamgirl says:

April 11, 2016 at 1:50 pm

I would love to see your scenarios replicated in other pools – undergrads who are paid to participate or not, Prolific Academic, etc. I have a feeling that the fact that Turkers are labourers, not volunteer participants, exerts a bias on the data that few are actively investigating, but the role of which needs to be known to determine whether outcomes are representative or not. When in the past has the entire subject pool been financially dependent on the compensation received in exchange for study participation?


RAhlskog says:
  
  April 11, 2016 at 2:09 pm
  
  Precisely! This paper (http://nsse.indiana.edu/pdf/presentations/2015/AIR_Forum_2015_Cole_Sarraf_Wang_paper.pdf) had a few good references, but it still seems unclear whether incentives change motivations in a way that carries over to fictional scenarios in the survey. I might dig more into that when I get the chance!
  