Jeff Tollefson:
(From previous issue)
Visits to health clinics were 60% higher in participating communities than in the control group. Children in those communities also had a 23% reduction in illness and an 18% reduction in anaemia, and overnight hospital stays fell by half across several age groups.
These data helped to solidify support for the programme. Now known as Prospera, it covers almost all of Mexico’s poorest citizens and has inspired similar initiatives across Latin America and into Africa.
“PROGRESA was one of the first major national programmes of its kind to get a rigorous evaluation,” says William Savedoff, who works on aid effectiveness and health policy at the Center for Global Development, a think tank in Washington DC. “Today conditional cash-transfer programmes are some of the most heavily evaluated programmes in the world, and that is I think a direct consequence of the Mexican experience.”
The idea of developing hard evidence to test public policies was bubbling up in parallel in the United States. One of the first trials began in 1994 with a small initiative to analyse the effect of supplying textbooks and uniforms as well as basic classroom improvements to a group of schools in Kenya. Economist Michael Kremer at Harvard University in Cambridge had taught in Kenya years earlier. A friend of his who worked for a non-profit group was initiating the programme, and Kremer suggested that the group roll it out as an experiment. “I didn’t necessarily expect anything to come of this,” he says.
Working with the group, Kremer collected data on students in 14 schools, half of which received the intervention. School attendance increased, but test scores did not. Similar results came from an experiment in 1995 that involved 100 schools. That trial suggested that providing textbooks had little effect on average test scores (ref. 2), owing perhaps to language barriers – the textbooks were in English, which was not the native language of many students. Students who were already scoring higher than their peers, however, pulled further ahead if they had the books.
Kremer continued to run RCTs of other programmes, but it was Duflo – then a student of his – who pushed the idea into the mainstream. Duflo’s 1999 dissertation looked in part at an education initiative in Indonesia that had built 61,000 primary schools over 6 years in the 1970s. She wanted to test a common concern that such a rapid expansion would lead to a decline in the quality of education, thereby offsetting any gains. Running an experiment was impossible, but Duflo was able to use data on the differences across regions to show that the programme had, in fact, increased educational opportunities as well as wages.
This and other early work inspired Duflo to look at RCTs as a way to generate data and definitively measure the effectiveness of policies and programmes. “As soon as I had a longer time horizon and some money I started working on setting some up,” she says.
One of Duflo’s early papers (ref. 3), published in 2004, capitalized on a 1993 amendment to India’s constitution that devolved more power over public investments to local councils and reserved the leadership of one-third of those councils, chosen at random, for women. Duflo realized that this effectively created an RCT that could test the effect of having women-led councils. In analysing the data, she found that councils led by women boosted political engagement by other women and directed investment towards issues raised by them. In some areas, women are in charge of obtaining drinking water, for instance, and councils led by women typically invested more in water infrastructure than did those run by men. “The scale of the policy and the topic were at the time unusual,” Duflo says. “It gave me a sense of the range of things that the tool could possibly cover.”
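The analytical logic behind such a design is straightforward: because leadership reservation was assigned at random, a simple comparison of average outcomes between reserved and unreserved councils estimates the effect of women-led councils without the usual confounding. The sketch below illustrates that difference-in-means calculation; the simulated councils, column names and effect size are hypothetical stand-ins for illustration, not Duflo's actual data or code.

```python
# Minimal sketch of the difference-in-means estimate behind a randomized
# (or randomly reserved) design. The data are simulated and the effect size
# is assumed, purely to illustrate the logic of the comparison.
import random
import statistics

random.seed(0)

# Simulate councils: 'reserved' is assigned at random, as in the 1993 amendment.
councils = []
for _ in range(300):
    reserved = random.random() < 1 / 3          # one-third reserved for women leaders
    base_spend = random.gauss(100, 20)          # water-infrastructure spending (arbitrary units)
    effect = 15 if reserved else 0              # assumed treatment effect, illustration only
    councils.append({"reserved": reserved, "water_spend": base_spend + effect})

treated = [c["water_spend"] for c in councils if c["reserved"]]
control = [c["water_spend"] for c in councils if not c["reserved"]]

# Because assignment was random, this simple difference in means is an
# unbiased estimate of the average effect of a women-led council.
diff = statistics.mean(treated) - statistics.mean(control)
print(f"Estimated effect on water investment: {diff:.1f} units")
```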
By the early 2000s, the randomistas were on the upswing. In 2002, Karlan, one of Duflo’s students, joined with her and other researchers to form Development Innovations – now known as Innovations for Poverty Action – in New Haven. The following year, Duflo co-founded what is now known as the Abdul Latif Jameel Poverty Action Lab (J-PAL) in Cambridge with fellow MIT economists Abhijit Banerjee and Sendhil Mullainathan.
The work quickly expanded, and J-PAL has now run nearly 600 evaluations in 62 countries, and trained more than 6,600 people. One of Duflo’s latest projects will revisit her dissertation on education in Indonesia, only this time with secondary schools and randomized control groups. “We will have a randomized version of a paper on the benefits to education soon I hope,” Duflo says.
One enthusiastic convert to the randomista philosophy is Rajiv Shah, a Gates Foundation official who became head of USAID in 2010. Once there, he created a fund called Development Innovation Ventures (DIV) to test and scale up solutions to development problems, and he enlisted Kremer as its scientific director. The goal, Shah said, was to “move development into a new realm” through the use of evidence.
Since then, DIV has invested in more than 100 development projects, and nearly half involve RCTs. One, conducted in Kenya by a pair of researchers from Georgetown University in Washington DC, tested a simple method for reducing traffic accidents involving minibuses – crashes that Kremer calls a major and growing killer. “Two of them crash into each other, and 40 people die,” he says.
In 2008, the researchers worked with more than 1,000 drivers to place stickers on buses that urged passengers to speak up about reckless driving (ref. 4). They then collected information from four major insurance companies and found that claims for serious accidents had dropped by 50% on buses with stickers compared with those without. DIV provided a grant to conduct a larger trial – which found that claims dropped by 25-33% – and a second grant of nearly $3 million to help to scale up the project throughout Kenya.
“The really big win is when developing countries, or firms or NGOs [non-governmental organizations] change their policies,” Kremer says. But one question now facing DIV is whether such a strategy – or indeed any project that proves effective in one setting – can be repackaged and deployed in other countries, where different cultural factors are at play (see Nature 523, 516-518; 2015).
Effecting policy change is the precise aim of the Global Innovation Fund, which was launched in September 2014 with $200 million over 5 years from the UK Department for International Development, USAID and others, and which follows the DIV model of rigorous testing. Interim director Jeffrey Brown, who is on loan from USAID, says that the fund has already received more than 1,800 applications for projects in 110 different countries and will be announcing its first suite of grants later this year. “We are essentially trying to become a bridge over the valley of death for good development ideas,” he says.
But such organizations still provide only a tiny fraction of the billions of dollars that are spent each year on development aid, let alone the trillions of dollars that are spent by governments on domestic social programmes. Even at lending institutions that have taken this evidence-based framework on board, the portion of investments that is covered by rigorous evaluations is small.
At the World Bank, which started a Development Impact Evaluation division in 2005, the number of projects receiving formal impact evaluations – through RCTs or other means – rose from fewer than 20 in 2003 to 193 in 2014, mostly covering things such as agriculture, health and education. But that still represents just 15% of the bank’s projects, says evaluation-division head Arianna Legovini, who leads a team of 23 full-time staff and has an annual budget of roughly $18 million. Although many of these evaluations more than pay for themselves over the long term, one constraint is the up-front cost: the average price of an impact evaluation is around $500,000. “If I did not have donor funding,” she says, “these studies just would not happen.”
The World Bank is trying to make the most of its resources by working directly with developing countries on implementation. More than 3,000 people have attended its workshops and training sessions since 2005, most of whom were government officials in developing countries that are receiving funds from the bank.
The bank is also making efforts to assess the impact-evaluation programme itself – although the analysis relies largely on whether project payments are made on time as a proxy for implementation. An analysis by Legovini and two of her team suggests that development projects that undergo a formal impact evaluation are more likely to be implemented on time than are those that do not, probably because of the extra attention given to initial set-up, roll-out and monitoring (ref. 5).
This finding is good news for individual projects, but it is also a potential thorn in the side of many RCTs. Positive effects seen in a trial setting may disappear when the programme is scaled up, governments take over and all the extra attention disappears (see Nature 523, 146-148; 2015).
“The fad now is let’s pilot it, and if it works we’ll take it to scale,” says Annette Brown, who heads the Washington DC office of the International Initiative for Impact Evaluation, an organization that funds impact evaluations as well as meta-analyses of existing studies. Brown says that researchers and governments should probably conduct rigorous studies when any programme is scaled up to ensure that the results continue to hold true – just as the government in Haryana is doing now.
From a political perspective, the strongest argument in favour of well-constructed RCTs – that they do not lie – may also be the biggest factor working against them. Local politicians often want to cut ribbons and release money into communities, whereas international donors, including governments and NGOs, want flagship programmes that show how they are improving the world. Neither welcomes results showing that initiatives are not working. Even in Mexico, Levy says, some of the subsidies that he fought against when he created PROGRESA have regained political favour.
But the randomistas have been accused of succumbing to their own biases. Some fear that their insistence on RCTs has skewed research towards smaller policy questions and given short shrift to larger, macroeconomic ones. One example comes from Martin Ravallion, an economist at Georgetown University and a former research director at the World Bank. He cites an antipoverty programme in China that received $464 million from the bank in the 1990s. Although the programme involved road construction, housing, education, health and even conditional cash payments for poor families, a study based on data collected in 2005, 4 years after disbursement ended, found minimal average impact on citizens (ref. 6). “That was the only long-term study of integrated rural development, which is the most common form of development assistance,” Ravallion says.
Yet some families did benefit, and by combining statistics with economic modelling, he and his team showed that the difference lay in basic issues, such as education level. For Ravallion, the message is that aid is best targeted at the literate poor, or more broadly at issues such as literacy. “Governments need to know these things,” he says. “They can’t just know about the subset of things that are amenable to randomization.”
To Alexis Diamond, a former student of Duflo’s who manages project evaluations at the International Finance Corporation, the private-sector development arm of the World Bank in Washington DC, the debate between the randomistas and the old-guard economists is in many ways about status and clout. The latter have spent their careers delving into ever more complex and abstract models, he says. And then “the randomistas came along and said ‘We don’t care about any of that. This is about who has a seat at the table’.”
Diamond says that he tries to strike a balance at his organization, where most evaluations still rely on a mixture of quantitative and qualitative data, including expert judgement.
Duflo shrugs off the debate and says that she is merely trying to provide government officials with the information – and tools – that they need to help them spend their money more wisely. “The best use of international aid money should be to generate evidence and lessons for national governments,” she says.
She points to an anti-pollution programme in industrial plants in the Indian state of Gujarat. Partnering with a group of US researchers, the state ran an experiment in 2009 that divided nearly 500 plants into two groups. Those in the control group continued with the conventional system, in which industries hire their own auditors to check compliance with pollution regulations. The others tested a scheme in which independent auditors were paid a fixed price from a common pool. The hope was that this would eliminate auditors’ fear of being blackballed for filing honest reports. And it did: independent auditors were 80% less likely to falsely give plants a passing grade, and many of the industrial plants covered by those audits responded by curbing their pollution. In January, regulators rolled out the programme across the state.
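The arithmetic behind such a result is again a two-group comparison: the rate of falsely issued passing grades among plants assigned independently paid auditors versus the rate under the old self-hired system. A minimal sketch follows, using made-up counts chosen only to mirror the reported 80% reduction rather than the study's actual figures.

```python
# Rough sketch of the two-group comparison in an audit experiment like Gujarat's.
# The counts below are hypothetical, chosen only to mirror the reported ~80%
# reduction in falsely issued passing grades; they are not the study's data.
control_false_pass, control_audits = 120, 400   # plants using self-hired auditors
treated_false_pass, treated_audits = 24, 400    # plants with independent, pooled-pay auditors

control_rate = control_false_pass / control_audits
treated_rate = treated_false_pass / treated_audits

# Relative reduction in the false-pass rate attributable to the new scheme,
# assuming (as in an RCT) that plants were randomly assigned to each group.
reduction = 1 - treated_rate / control_rate
print(f"Control false-pass rate:   {control_rate:.0%}")
print(f"Treatment false-pass rate: {treated_rate:.0%}")
print(f"Relative reduction:        {reduction:.0%}")
```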
“My hope, in a best-case scenario, is that in the next ten years you are going to have many, many of these projects run as a matter of course by governments in the spaces where they want to learn,” Duflo says.
(Jeff came to Nature from Congressional Quarterly, where he covered energy, climate and the environment for two years.)