As I discussed in an earlier blog post, David Card and co-authors have just published an update of their 2010 meta-analysis of 207 active labour market programmes. While that blog post focused on the substantive lessons emerging, this one reflects on some of the methodological issues.
The first set of points relates to the studies that they consider. In terms of geographical coverage, about 25% of their sample covers ‘Germanic’ countries (Austria, Germany and Switzerland); another quarter are ‘Nordics’ (Denmark, Finland, Norway and Sweden); just over 10% are ‘Anglo-Saxon’ (Australia, Canada, New Zealand, U.K. and U.S.); the rest are lower/middle-income countries, of which about 10% are non-OECD. In contrast, in our evidence reviews we only look at OECD countries. There’s clearly a trade-off here: spreading the net wider provides more studies, but at least 16% of Card et al.’s studies are for non-OECD countries, and we might question the relevance of these to a UK context.
Card and co-authors include programmes from as far back as 1980, though the majority of estimates are from the 1990s and early 2000s. We’ve taken a similar approach in our reviews, although it’s always tricky to know how far back we should go. In practice, the quality cut-offs that we use tend to restrict the number of older studies we consider, because many of them don’t meet the quality hurdle we impose (more on this below).
Card and co-authors’ search criteria are based on:
- Snowballing – asking NBER and IZA labour market economists to suggest studies, and looking for studies which cite Kluve et al. (2010);
- A Google Scholar search;
- A search of the NBER working paper database (strings “training”, “active”, “public sector employment”, and “search assistance”);
- Searching some online databases (the International Initiative for Impact Evaluation’s “Repository of Impact Evaluation Published Studies,” the online project list of the Abdul Latif Jameel Poverty Action Lab (J‐PAL), and the list of Latin American program evaluations reviewed by Ibarrarán and Rosas (2009));
- Looking through IZA research fellows’ profiles for publications by those listing an interest in ‘program evaluation’.
In contrast, our search approach is more structured across many databases, although we don’t use snowballing or surveys of researchers. This is partly because we think our more systematic searches should find all relevant studies, but also because we wanted our approach to be replicable by other researchers (this is why the online appendices that accompany each report spell out the full list of search terms used and the places searched – see, for example, this list of search terms for Employment Training).
As with any systematic review, once they have identified a set of relevant papers, Card et al. have to decide which papers to include in their analysis: ‘well-documented’ studies are included if they use micro data and have a counterfactual/control group design or some kind of selection correction. These are similar quality filters to the ones we use for our evidence reviews, but unlike us they don’t try to rank the studies they do include. We use internal rankings of study quality to help us interpret results – for example, to check how confident we can be about findings that are based on the strongest or weakest methods.
Overall, 20% of the studies covered by Card and co-authors use Randomised Controlled Trials (RCTs), and over 60% of results after 2004 use RCTs. It’s encouraging to see RCT approaches spreading in ALMP evaluation, as documented in the Card et al. study: this type of policy is amenable to randomisation and experimental techniques. For example, one of our demonstrators uses an RCT setting to evaluate an extension to the Work Programme in some London boroughs.
The final important thing to note is the difference in methodological approach. Card and co-authors conduct a formal meta-analysis. This means taking each study, breaking out the individual results (on specific outcomes) and then running statistical analysis on those results. One set of results uses a ‘vote count’ approach, where they count up the number of positive significant, insignificant and negative significant effect estimates. This is similar to the approach that we use (although we don’t conduct a formal meta-analysis).
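To make the vote-count idea concrete, here is a minimal sketch in Python of how a set of effect estimates might be tallied into the three categories. The numbers are invented for illustration and are not taken from Card and co-authors’ data.

```python
# Minimal, illustrative 'vote count' tally.
# The estimates below are made up for illustration; they are not
# drawn from Card et al.'s dataset.

estimates = [
    # (point estimate, standard error) for each programme/outcome
    (0.08, 0.02),   # positive and significant
    (0.01, 0.03),   # insignificant
    (-0.05, 0.02),  # negative and significant
    (0.04, 0.05),   # insignificant
]

def vote(est, se, critical=1.96):
    """Classify one estimate by sign and statistical significance."""
    t = est / se
    if t > critical:
        return "positive significant"
    if t < -critical:
        return "negative significant"
    return "insignificant"

counts = {"positive significant": 0, "insignificant": 0, "negative significant": 0}
for est, se in estimates:
    counts[vote(est, se)] += 1

print(counts)
# {'positive significant': 1, 'insignificant': 2, 'negative significant': 1}
```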
For the 111 studies that look at employment outcomes (specifically, the probability of employment) the meta-analysis is also able to compare effect sizes. Here, the effect size is given by the impact of the programme on the employment rate of the treatment group, divided by the standard deviation of employment in the comparison group. We don’t do this, for a number of reasons. Partly because the wide range of outcomes we cover, even in a single review, means we are unable to directly compare mean effect sizes (e.g. for employment and for earnings from employment training programmes). But also because we wanted a methodology that we could apply to the wide range of areas that we consider – areas where meta-analyses are not currently available.
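To spell the calculation out, a simple way to write that effect size measure (using our own notation rather than the paper’s) is:

$$
\text{effect size} = \frac{\hat{\Delta}_{\text{employment}}}{\sigma_{C}}
$$

where $\hat{\Delta}_{\text{employment}}$ is the estimated programme impact on the employment rate of the treatment group and $\sigma_{C}$ is the standard deviation of employment in the comparison group.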
Given that we rely on vote counts for our work, it is reassuring to see Card and co-authors find that vote counts are a decent proxy for effect sizes. As they put it: the mean effect size for significant positive estimates is relatively large and positive, the mean effect size for significant negative estimates is relatively large and negative, and the mean effect size for insignificant estimates is close to zero. That is, variation in the sign and significance of programme effects is driven by the underlying effect sizes, rather than by other factors (such as variation in sampling errors).
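Extending the hypothetical sketch above, checking this amounts to grouping standardised effect sizes by their vote-count category and comparing the group means; again, the numbers below are invented for illustration rather than drawn from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (standardised effect size, vote-count category) pairs,
# for illustration only; these are not Card et al.'s estimates.
effect_sizes = [
    (0.35, "positive significant"),
    (0.25, "positive significant"),
    (0.03, "insignificant"),
    (-0.01, "insignificant"),
    (-0.30, "negative significant"),
]

by_category = defaultdict(list)
for size, category in effect_sizes:
    by_category[category].append(size)

# If vote counts are a good proxy for effect sizes, the group means should
# show Card et al.'s pattern: clearly positive, near zero, clearly negative.
for category, sizes in by_category.items():
    print(f"{category}: mean effect size {mean(sizes):.2f}")
```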
This provides some reassurance about our decision to use vote counts for employment training. But as our work develops we will need to place more emphasis on effect sizes and the cost-effectiveness of different approaches. While we would love to be able to draw on meta-analyses to underpin that work (as is the case at some of our sister What Works Centres), there isn’t enough high-quality evidence available to allow this. For now, the paper by Card and co-authors will have to serve as an example of what will, hopefully, one day be possible for other local economic growth policy areas.