What Works Growth helps make local growth policy more cost-effective. Improving the use of evidence in policymaking is a key element of achieving this, with a particular focus on impact evaluation evidence.
Impact evaluation examines whether a policy, programme or project had an impact on an outcome. It is the only type of evidence that can establish whether the outcomes are due to the intervention (known as ‘causality’). Impact evaluation evidence helps us understand which policies are effective and ensures past experience is incorporated into future policymaking.
This webpage sets out why it is important to use evidence from impact evaluations, how they can be used, and some evaluation concepts that will help you understand evaluation findings. It also explains why evidence reviews are important and outlines the What Works Growth resources that can help with developing policy.
Why use impact evaluation?
When tackling a policy goal, it is important to choose policies that are likely to achieve it cost-effectively. For example, if the aim is to increase employment within a local authority area, we want to be confident that the intervention being considered will achieve this. Ineffective policies waste money and undermine public confidence in approaches that could be effective. Impact evaluation is the only type of evidence that can establish effectiveness.
Causation: Impact evaluation is the only type of evaluation that can answer the question ‘did it work?’ – i.e. did the policy cause the change in outcomes? Other types of evaluation can answer other questions. For example, process evaluation can assess why the policy worked (or not).
Comparison: Impact evaluation can answer this question because it uses a comparison group. The people, businesses or places that benefited from a policy are compared with similar ones that did not. The difference in outcomes between the two groups is used to estimate the difference made by the policy (a minimal sketch of this calculation follows below).
Calibre: Impact evaluation can only answer the causation question – ‘did it work?’ – if it is high quality and uses a well-chosen comparison group. That is why What Works Growth evidence resources only draw on research that meets our quality standards. These are discussed in more detail later.
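To make the comparison point concrete, here is a minimal sketch of the simplest possible impact estimate: the difference in average outcomes between the two groups. The numbers are invented for illustration and are not drawn from any real evaluation.

```python
from statistics import mean

# Hypothetical outcomes (e.g. jobs created) for firms that received support
# and for a comparison group of similar firms that did not.
supported = [12, 9, 15, 11, 14]
comparison = [10, 8, 11, 9, 10]

# The difference in average outcomes is the estimated impact of the policy,
# provided the comparison group is a good stand-in for what would have
# happened to the supported firms without the policy.
estimated_impact = mean(supported) - mean(comparison)
print(f"Estimated impact: {estimated_impact:+.1f} jobs per firm")  # +2.6
```

Whether that +2.6 can be read as the causal effect of the policy depends entirely on how well the comparison group was chosen – the calibre point above.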
More information on impact evaluation is available here.
How can impact evaluation evidence be used?
To make decisions about an existing policy
Impact evaluation can be used to understand whether a policy is delivering its intended outcomes. This can inform decisions about the policy, including whether funding should continue or whether its scale or focus should change. For example, if the evaluation finds your policy is not affecting the intended outcome, you may wish to stop funding it. If the effect is positive only for one sub-group, you may wish to scale back, focus on that group, and try different approaches for other groups.
This requires commissioning an impact evaluation. What Works Growth has published a range of ‘how to evaluate’ resources to help with evaluating policies. We also offer training and one-to-one support.
To inform new policies
When identifying and appraising different options for achieving a policy goal, reviewing the findings from existing impact evaluations can help you understand the potential effectiveness of the different options. This relies on previous policies having been evaluated and the results being available. Ideally, you should draw on all relevant evaluations and avoid ‘cherry-picking’. Below we discuss how evidence reviews can help.
Impact evaluation evidence can be used to develop a theory of change for the policy, setting out how the inputs and activities are expected to lead to outputs, outcomes, and impacts. One way of setting out the theory of change is a logic model. What Works Growth has published a guide to using logic models for local growth policy and offers training on using logic models.
Some evaluation concepts
When reviewing an impact evaluation, it is useful to understand what makes a good evaluation. This section outlines some concepts that should help.
Internal validity
An impact evaluation has internal validity if it accurately measures the causal relationship between the intervention and the outcome. The highest quality impact evaluations, randomised controlled trials, have high internal validity because randomisation ensures the control group is similar to the treatment group, meaning the only systematic difference between the groups is the intervention. The more alternative explanations there are for the difference in outcomes between the treated and comparison groups, the lower the internal validity of the evaluation. Our guide to scoring the evidence ranks different evaluation methods by their internal validity and should help with assessing the robustness of any evaluation you are using.
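A small simulation shows why randomisation matters for internal validity. This is a hypothetical sketch with invented numbers: we assume an unobserved characteristic (here called ‘motivation’) that drives both outcomes and, when units self-select, take-up of the policy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_effect = 10_000, 2.0

# 'Motivation' raises outcomes and, in the self-selected case, take-up.
motivation = rng.normal(0, 1, n)
baseline = 10 + 3 * motivation + rng.normal(0, 1, n)

# Randomised assignment: treatment is independent of motivation.
rand_treated = rng.random(n) < 0.5
y_rand = baseline + true_effect * rand_treated

# Self-selection: more motivated units are more likely to opt in.
self_treated = (motivation + rng.normal(0, 1, n)) > 0
y_self = baseline + true_effect * self_treated

def diff_in_means(y, treated):
    return y[treated].mean() - y[~treated].mean()

print(f"True effect:            {true_effect:.2f}")
print(f"Randomised estimate:    {diff_in_means(y_rand, rand_treated):.2f}")  # close to 2.0
print(f"Self-selected estimate: {diff_in_means(y_self, self_treated):.2f}")  # biased well above 2.0
```

The randomised comparison recovers the true effect because motivation is balanced across groups; the self-selected comparison attributes the motivation gap to the policy.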
External validity
An impact evaluation has external validity if its findings can be applied to other settings, contexts or populations. Another term for this is generalisability. External validity requires the treated and comparison groups to be representative of the population you would like to extrapolate the results to. For example, the findings of an evaluation of an employment training programme helping unemployed young people (16 to 24 years old) move into work might not generalise to interventions targeted at older people (over 50), as the employment barriers faced by the two groups are different.
Having multiple impact evaluations of an intervention, in different settings and with different populations, provides greater confidence about its effectiveness. Evidence reviews (see below), systematic reviews, and meta-evaluations synthesise findings across many impact evaluations. When using evaluation evidence to develop policy (most commonly during the appraisal stage), always consider how similar the context of the evaluated intervention is to your scenario. Place greater weight on evaluations from similar contexts.
Statistical significance
Impact evaluation uses statistical analysis. Having a larger number of units of observation (the people, businesses or areas for which outcome data are collected) in the treated and comparison groups increases the likelihood that an effect, if there is one, will be detected. The more units of observation you have, the more robust your findings. If the numbers are low, impact evaluation is normally not feasible (and would be poor value for money). If you are considering commissioning an impact evaluation of your intervention, think about how many units of observation there are likely to be and whether this is likely to yield statistically significant results. If you are using an existing evaluation in policy development, check the number of observations.
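The link between the number of observations and detectability can be illustrated with a simulation. This is a hypothetical sketch: the effect size and noise level are invented, and a standard two-sample t-test stands in for whatever estimator an evaluation actually uses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def detection_rate(n_per_group, effect=0.3, n_sims=2_000, alpha=0.05):
    """Share of simulated evaluations whose t-test detects a real effect."""
    hits = 0
    for _ in range(n_sims):
        treated = rng.normal(effect, 1.0, n_per_group)  # a true effect is present
        control = rng.normal(0.0, 1.0, n_per_group)
        if stats.ttest_ind(treated, control).pvalue < alpha:
            hits += 1
    return hits / n_sims

for n in (20, 50, 200):
    print(f"{n:>3} units per group -> real effect detected in {detection_rate(n):.0%} of runs")
```

With small groups the real effect is missed most of the time; with large groups it is detected in the great majority of runs. This is why low observation numbers make impact evaluation poor value for money.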
Statistically significant results can be attributed to the intervention with confidence, as they are unlikely to be due to chance or outside factors. For example, if an impact evaluation finds that receiving business advice has a statistically significant effect on employment at supported firms, the effect on employment is unlikely to be due to chance. Statistical significance is measured at different levels: a 95 percent confidence level means there is a 5 percent (one in 20) probability that the finding is due to chance, while a 99 percent confidence level means that probability is one in 100.
Statistical significance is one of the reasons why evaluating the impact of an intervention on a large number of outcomes is discouraged. For example, if an evaluation includes 20 outcomes and uses a 95 percent confidence level, one statistically significant finding would be expected purely by chance (20 × 0.05 = 1).
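A quick simulation confirms that arithmetic. This is a sketch with invented data in which the intervention truly affects none of the 20 outcomes, yet roughly one ‘significant’ result per evaluation still appears.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_outcomes, alpha, n_sims = 20, 0.05, 1_000

false_positives = 0
for _ in range(n_sims):
    for _ in range(n_outcomes):
        # No real effect on any outcome: both groups drawn from the same distribution.
        treated = rng.normal(0, 1, 100)
        control = rng.normal(0, 1, 100)
        if stats.ttest_ind(treated, control).pvalue < alpha:
            false_positives += 1

print(f"Expected by arithmetic: {n_outcomes * alpha:.1f} significant outcomes per evaluation")
print(f"Simulated average:      {false_positives / n_sims:.2f} significant outcomes per evaluation")
```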
Also be cautious of non-significant results: with few units of observation, a real effect may simply go undetected, so a lack of statistical significance is not proof that the policy had no effect.
Some questions to ask when reviewing an impact evaluation
About the evaluation
- What were the evaluation questions? Are they causal questions?
- Is the evaluation relevant for our scenario? Are the context and issues similar?
About the methodology
- What methodology is used? How robust is it?
- How was the comparison group selected? What assumptions were made? Is it a good comparison group?
- How has the method been implemented?
About the findings
- What are the findings? Focus on those that are statistically significant.
- Does the evaluation answer the questions without bias (internal validity)?
- Can the results be generalised (external validity)?
Using impact evaluation evidence reviews
What are evidence reviews?
An evidence review is a systematic summary of the available evidence on a topic. It attempts to find all published evidence related to a specific research or policy question that meets given criteria, using search methodologies designed to be transparent, unbiased, and reproducible. Within this, one type of evidence review focuses specifically on impact evaluations that use causal methods.
Why use evidence reviews summarising multiple impact evaluations?
Using multiple impact evaluations provides a wider evidence base than looking at a single evaluation. For example, if only one evaluation shows positive effects on an outcome (such as employment), you may have concerns about whether the intervention would work for you. A review covering 20 evaluations should increase your confidence (assuming most found positive effects). Even if the findings of the evaluations in the review vary, this gives a more realistic view of likely outcomes and may provide insights into when the intervention tended to be most effective (for example, for different groups or in different contexts).
As evidence reviews should include all evaluations that meet clearly specified selection criteria, they give greater confidence that your policy is based on the full evidence base rather than on ‘cherry-picked’ evaluations that best support the case being made. Evidence reviews are also an efficient way to find information on previous successes and failures. This can help you develop policies based on what has worked in similar contexts and avoid those that have not.
Although good evidence reviews take a systematic, transparent and neutral approach, they can only find evaluations that have been undertaken and published. The systematic exclusion of studies showing particular results (especially negative results or no effects) is known as ‘publication bias’. Publication bias means that even though an evidence review includes all published studies, it may not accurately reflect the effectiveness of the policy. Consider publication bias when using evidence reviews for policy development. Even when reviews cover all available evaluations, be cautious of results based on one or a handful of studies.
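A small simulation shows how publication bias can mislead even a complete review of the published literature. This is a hypothetical sketch: we invent 200 studies of a policy with no true effect and assume only significant positive results get published.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_studies, n_per_arm = 200, 50

published = []
for _ in range(n_studies):
    treated = rng.normal(0.0, 1.0, n_per_arm)  # the policy truly does nothing
    control = rng.normal(0.0, 1.0, n_per_arm)
    estimate = treated.mean() - control.mean()
    pvalue = stats.ttest_ind(treated, control).pvalue
    if pvalue < 0.05 and estimate > 0:  # only significant positive results appear in print
        published.append(estimate)

print(f"Studies run: {n_studies}, studies published: {len(published)}")
print(f"Average published effect: {np.mean(published):.2f} (true effect is 0.00)")
```

A review of the published studies alone would conclude the policy works, even though it does not. This is why results resting on one or a handful of studies deserve extra caution.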
Finally, whilst evidence from evaluations and evidence reviews is important, it should be used alongside other evidence. Policies should be informed by an accurate diagnosis of the challenges being faced, drawing on data, other research, and local knowledge.
Are evidence reviews always best?
Evidence reviews are helpful. However, if you are making decisions about an existing policy and are able to undertake a robust impact evaluation, this will tell you whether your policy is working as intended in your context – something evidence reviews cannot do.
What Works Growth evidence resources
What Works Growth evidence reviews
Our evidence reviews summarise the evaluation evidence on local growth policies. For example, our business advice evidence review summarises all the impact evaluations of business advice interventions in OECD countries that met our minimum standard and were published in English. Our reviews focus on broad policy areas – such as apprenticeships, business advice, employment training, innovation, or transport. Since 2020, What Works Growth has moved to publishing rapid evidence reviews. These meet the same standards as evidence reviews but involve a less intensive search process.
We use a modified version of the Maryland Scientific Methods Scale (SMS) to assess the robustness of impact evaluations. SMS 5 is the highest score, given to the most robust impact evaluations, whilst SMS 1 is the lowest. Scores reflect the extent to which the method deals with selection bias and how it has been implemented. Our guide to scoring the evidence sets out why we place more importance on some evidence, and explains what makes different evaluation methods more or less robust and thus more useful for understanding policy effectiveness. Our evidence reviews include any relevant evaluation that scores SMS 3 or above. Some rapid evidence reviews that are focused on policies where there is a more limited evidence base also include studies that score SMS 2.
The findings within evidence reviews are organised by economic outcomes, such as employment, wages, and productivity. Outcomes are categorised as:
- Positive – where the outcome is in the direction intended and statistically significant.
- Negative – where the outcome is opposite to the direction intended and statistically significant.
- No effect (or zero) – where there are no statistically significant effects.
- Mixed – where there is variation in findings (for example, the effect of a training course on likelihood of being in employment is positive for men but negative for women, or is positive in the short-term but has no effect over the medium- to long-term).
For example, in our business advice evidence review, 17 studies look at the impact on employment, with six finding positive effects (i.e. the intervention led to an increase in employment), eight finding no effects, and three finding mixed effects.
Our evidence reviews also set out ‘best bets’ – the approaches that have performed most strongly based on the best available impact evaluations. They do not address the specifics of ‘what works where’ or ‘what will work for a particular place, individual or business’. Detailed local knowledge and context remain crucial. Exercise caution if you decide to introduce a programme which has not worked well elsewhere. Our rapid evidence reviews do not provide ‘best bets’ but set out ‘things to consider’.
When to use the evidence reviews?
Evidence reviews should be used during policy development to help identify and appraise options. They can also help define realistic objectives and sense check the outcomes and impacts in a theory of change.
What Works Growth toolkits
Whilst our evidence reviews focus on broad policy areas (for example, business advice), our toolkits focus on more specific interventions or implementation issues. For example, within the broad area of business advice, we have toolkits on public advisors, mentors, subsidised consultancy, tailored support, training, accelerators, incubators, investment promotion agencies, export promotion agencies, and export credit agencies. Examples of toolkits focused on implementation issues include those on mentoring and financial incentives within apprenticeships. Other toolkits relate to issues that apply across policy areas, including increasing take-up and estimating local multipliers.
As with evidence reviews, toolkits summarise the impact evaluation evidence, focusing on studies that are from OECD countries and published in English. Studies are assessed against the Maryland Scientific Methods Scale (SMS). As the focus of a toolkit is narrower than an evidence review, the number of available studies tends to be smaller, so we apply a less stringent standard than for evidence reviews, with those scoring SMS 2 or above included.
Since 2020 we have moved away from publishing toolkits, publishing rapid evidence reviews instead.
When to use the toolkits?
As with evidence reviews, toolkits should be used during policy development to help consider specific interventions or elements of policy design. The limited number of evaluations available and the lower evidence standard mean greater care should be taken when applying their findings.
What Works Growth evidence briefings
Evidence briefings provide a framework to help think through the benefits and costs of pursuing a local growth policy. Some also consider wider benefits of a policy, such as health benefits, that may affect local growth over the long term. They draw on findings from (rapid) evidence reviews and toolkits, as well as economic evidence and theory.
When to use the evidence briefings?
As evidence briefings provide a framework to help think through the benefits and costs of a policy, including how to quantify these benefits, they can help you decide whether or not to pursue a policy. If the policy has already been decided upon, they can help you understand the scale of impacts you might achieve and how you can maximise the benefits. Understanding potential benefits and costs should also help you think about how best to monitor and evaluate the policy.