How to evaluate – Collect data

So far in this series of blogs, I’ve talked about the importance of starting early, of defining success, of thinking about what to evaluate and of finding a control group.

Today’s blog considers another key step – the importance of collecting data. It might seem simple, but the availability of appropriate data is often one of the biggest stumbling blocks, especially when trying to retrofit evaluation to programmes that are already well under way.

There are many points that could be discussed, but I am going to highlight three major ones.

First, you need information on the identity of programme participants. Which individuals, firms or areas have benefited from policy support? I think many people would be surprised that such information is often not readily available, even though it is needed for accountability purposes if nothing else. But in the fifteen-or-so years I’ve been actively involved in thinking about the impact of government policy I’ve seen numerous occasions where such information is not being collected (even when large sums of money are involved). It can be a particular problem when third parties hand out the money on government’s behalf.

For what it is worth, my personal opinion is that such data should be gathered regardless of evaluation plans. Either way, such data are critical to any attempt to understand the impact of policy. This information will need to be gathered and stored systematically, and preferably in a way that allows it to be matched to other sources of information on participants (of which, more below). And if we truly want to get at questions of cost-effectiveness we also need to keep information on what kind of support participants have received (e.g. how much money), particularly if the kind of support can vary a lot across participants.

The second crucial data issue is to decide which data will best capture outcomes that are linked to the objectives of the programme. Thinking about how we define success should help clarify this. Once we’ve figured out the outcomes in which we are interested, there are two more things to think about:

  • we need this data before and after the programme so we can see whether there has been any measurable change in the outcome for participants;
  • we need similar before and after data for the control group that we are going to use as a comparison group for participants.

In fact, these two requirements – the use of before and after data, combined with a suitable control group – are key building blocks in our evidence reviews. Evaluations that don’t have both fall short of the standards we set for those reviews, as we discuss further in our scoring guide.
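To see why both requirements matter, the arithmetic can be sketched as a simple difference-in-differences calculation: the change for participants minus the change for the control group. This is a minimal sketch, not the method of any particular evaluation. The function name is hypothetical and the employment figures are invented purely for illustration.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change for participants minus change for the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Invented employment counts for four supported firms and four comparison firms.
treated_before = [10, 12, 8, 10]
treated_after = [13, 15, 10, 12]
control_before = [11, 9, 10, 10]
control_after = [12, 10, 11, 11]

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"Estimated effect: {effect} jobs per firm")
```

In this made-up example, looking at participants alone would suggest a gain of 2.5 jobs per firm. The comparison firms grew by 1 job per firm over the same period, so the estimate attributable to the programme is 1.5. Without the "before" data or without the control group, that adjustment is not possible.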

The third decision that we need to make is how we will collect this data on outcomes. There are essentially two possibilities. The first is to use bespoke survey data that is collected specifically for evaluation purposes. Unfortunately such data can be expensive. It also brings a temptation to try to collect large amounts of data – on the process, on a large range of outcomes, on whether people think the programme is making a difference, etc. This then leads to large sprawling evaluation reports which consider a huge range of issues. For most local decision makers it’s hard to believe that the cost of such bespoke data is justified.

I’d argue that we should focus attention on getting a smaller amount of data on key outcomes. At the end of the day, in combination with a suitable control group, information on only one or two outcomes such as changes in employment or wages allows us to answer the most important questions of all – did the policy work or is one kind of support more cost effective than another?

Another option for keeping costs down is to make much better use of secondary data. Using such data also helps address concerns about imposing a burden on the non-treated members of the control group, as the data is already being collected for other reasons (small bespoke surveys also help address that concern).

At the moment, using secondary data is easier for firms and areas than it is for individuals. For example, for firms, if we have information on firm names, Companies House identifiers and postcodes we can achieve very good matches with administrative data (such as the Inter-Departmental Business Register) which already provides information on key outcomes of interest (particularly employment). Even for individuals, efforts are being made to improve access to data sets – such as those held by the Department for Work and Pensions – which provide detail on a range of outcomes. Indeed, writing on our blog earlier this week, Majeed Neky described how this data – combined with a randomised controlled trial – will hopefully allow a group of London councils to evaluate their ‘Working Capital’ active labour market programme.
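As a rough illustration of what matching participant records to administrative data involves, here is a minimal sketch that links firms on an identifier where one has been recorded and falls back to name plus postcode where it has not. All field names, records and the function names are hypothetical. They do not reflect the actual layout of the Inter-Departmental Business Register or any other real data set, and real matching exercises typically need more careful cleaning than this.

```python
def normalise_postcode(postcode):
    """Strip spaces and upper-case so 'ab1 2cd' and 'AB1 2CD' compare equal."""
    return "".join(postcode.split()).upper()


def match_participants(participants, register):
    """Attach register employment to each participant where a match exists."""
    by_id = {row["company_number"]: row for row in register}
    by_name_postcode = {
        (row["name"].strip().lower(), normalise_postcode(row["postcode"])): row
        for row in register
    }
    matched, unmatched = [], []
    for p in participants:
        # Prefer the identifier; fall back to name plus postcode.
        row = by_id.get(p.get("company_number"))
        if row is None:
            key = (p["name"].strip().lower(), normalise_postcode(p["postcode"]))
            row = by_name_postcode.get(key)
        if row is None:
            unmatched.append(p)
        else:
            matched.append({**p, "employment": row["employment"]})
    return matched, unmatched


# Invented records: a stand-in register and a programme's participant list.
register = [
    {"company_number": "00000001", "name": "Alpha Ltd", "postcode": "AB1 2CD", "employment": 25},
    {"company_number": "00000002", "name": "Beta Ltd", "postcode": "EF3 4GH", "employment": 8},
]
participants = [
    {"company_number": "00000001", "name": "Alpha Limited", "postcode": "AB1 2CD", "grant": 5000},
    {"company_number": None, "name": "beta ltd", "postcode": "ef34gh", "grant": 2000},
    {"company_number": None, "name": "Gamma Co", "postcode": "IJ5 6KL", "grant": 1000},
]

matched, unmatched = match_participants(participants, register)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

The first firm matches on its identifier even though its name is recorded differently, which is why collecting identifiers at the outset is so valuable. The third firm cannot be matched at all, and so drops out of any analysis that relies on the secondary data.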

Of course, one problem with these secondary data sources is that there is usually a time lag before they are available. Whether this is a major problem will depend on the time frame over which any effects of the policy will be felt. For many local economic growth programmes, where effects are expected to be longer term, this shouldn’t be such a problem – something we’ll discuss more in our next blog.