Grade Delusion

Posted on September 4, 2012

There is a terrible failure in reasoning amongst politicians and the media: the assumption that exam grades and standards have a causal link, that somehow a rise or fall in grades denotes the opposite movement in standards. Not only is this reasoning flawed, the argument both invalid and unsound, but the premise that grade inflation or deflation matter at all is itself irrelevant.

The media have been whipping up a storm about grade inflation for years now. As more and more students obtain higher grades, they have deduced that the only reasonable cause is that exams are getting easier. Apart from the fact that correlation does not imply causation, grades are not even a measurement of difficulty. Grades are orthogonal to difficulty. Two people could take the same exam, with the same difficulty of questions, and give the same answers to those questions. The difficulty of the individual questions is fixed. Apply two different marking schemes, however, and those two individuals would get different grades. This can be achieved by adjusting a single variable, such as the grade boundaries. That is, essentially, what Ofqual did.

Where grades relate to difficulty is in the difficulty of obtaining the grade, not in the difficulty of the exam itself. So, reusing the example above: the questions are at the same level of difficulty, but by moving the grade boundaries up or down we make it harder or easier to achieve a particular grade. This is an important distinction, because it means you can have an easy exam where it is hard to get the top grade (because you would have to score 100%) or a hard exam where it is easy to get the top grade (because you would only need 5%).
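To make that concrete, here is a minimal sketch. The boundaries and marks are invented for illustration (they are not Ofqual's actual figures): the same paper and the same raw marks, run through two different boundary schemes, produce two different grades.

```python
# Same exam, same raw marks; only the grade boundaries differ.
def grade(mark, boundaries):
    """Return the highest grade whose boundary the mark meets."""
    for letter, threshold in boundaries:
        if mark >= threshold:
            return letter
    return "U"

# Two hypothetical boundary schemes (invented for this example).
lenient = [("A", 70), ("B", 60), ("C", 50)]
strict = [("A", 80), ("B", 70), ("C", 60)]

for mark in (75, 65, 55):
    print(f"{mark}: {grade(mark, lenient)} vs {grade(mark, strict)}")
# 75: A vs B
# 65: B vs C
# 55: C vs U
```

The difficulty of the paper never enters into it: identical answers to identical questions, different grades.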

Incidentally, this relationship is one of the many reasons why fixed percentages in grade boundaries (that is to say, the top 10% get an A, the top 20% a B, and so on) are fundamentally flawed. Change the difficulty of the exam and the grade distribution stays exactly the same, making the two exams appear equal in difficulty, and this breaks the critical purpose of the grading system: to compare peers.
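A quick simulation of the flaw, with entirely made-up numbers: take one cohort, subtract a flat 15 marks to simulate a harder paper, and award an A to the top 10% of each sitting. The grade distribution is identical both times, so the grades alone cannot tell you the second paper was harder.

```python
import random

random.seed(0)
cohort = [random.gauss(60, 12) for _ in range(1000)]  # raw marks, easy paper
harder = [mark - 15 for mark in cohort]               # same cohort, harder paper

def top_10_percent_cutoff(scores):
    """Mark needed to land in the top 10% of this sitting."""
    return sorted(scores, reverse=True)[len(scores) // 10]

# Exactly 100 candidates get an A in each sitting, by construction,
# even though everyone scored 15 marks lower on the harder paper.
print(f"easy paper A cutoff:   {top_10_percent_cutoff(cohort):.1f}")
print(f"harder paper A cutoff: {top_10_percent_cutoff(harder):.1f}")
```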

Ofqual really, really cocked up when they moved the grade boundaries suddenly. They should know better; this is their business; they need their heads placed on the chopping block. The single most crucial purpose of exam grades (grades, not exams) is to enable the comparison of peers. That is to say: I advertise a job for someone with nine GCSEs, some recently qualified students apply, and I need to be able to tell which one is better qualified. I will use their grades to compare them. Ofqual screwed that up; they broke the grades. If I had two applicants who took their exams in spring 2012 and two who took them before, I would not be able to compare their grades side by side. The same is true for university, college or sixth-form applicants.

What Ofqual forgot is that peers are compared across a time period, so two applicants who took their exams a year apart need to be distinguishable. This is why there is a preference for setting grades by attempting to keep the exams at the same level of difficulty and fixing the boundaries around marks: students who sit two separate exams at different times can then be compared. Using fixed percentages breaks this; moving the grade boundaries breaks this.

The problem is that there is a risk of grade creep. The only way to protect against grade inflation is to make performance both measurable and repeatable. Because we can do the former, people make the mistake of believing we have the latter, which is not the case. Athletics is a good example of where both can be achieved: an indoor 100m sprint, where the biggest influences are the length of the track and the runners themselves, can be run again and again fairly consistently, and the runners can then make direct comparisons between themselves and each other. But exams need to protect against cheating, which means they inherently cannot be repeatable. If we cannot make exams repeatable, we cannot measure difficulty in an absolute way, and because of a number of statistical and rounding errors we have to accept that a risk of grade inflation or deflation is inherent in the system.
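One way to see why the drift is inherent is to treat each year's boundary-setting as adding a small, unbiased error to the pass rate. A toy random walk (the error size here is invented) shows how those errors accumulate even when standards never move:

```python
import random

random.seed(1)
pass_rate = 60.0  # starting pass rate, in percent (arbitrary)
for year in range(20):
    # Each year the boundary-setting process misses slightly,
    # with no bias in either direction.
    pass_rate += random.gauss(0, 0.5)
print(f"pass rate after 20 years: {pass_rate:.1f}% (started at 60.0%)")
```

The expected error each year is zero, but the errors compound: after n years the drift is typically on the order of 0.5 × √n percentage points, in either direction. That is grade inflation or deflation with no change in standards at all.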

But grade inflation and deflation are irrelevant. Firstly, as already demonstrated, grades don't reflect the difficulty of the exam. And even if you are trying your hardest to link the difficulty of the questions to the difficulty of obtaining the grades (apparently Gove can achieve the impossible, however), it is not the end of the world if you let a little drift in. Why? Because we don't break the core purpose of grades: to compare peers.

Now this is where the media get really riled. They will point to the number of students who got A grades in the 1980s and the number now and say: look, massive grade inflation, exams are getting easier (repeat after me: correlation is not causation). But it is irrelevant. No sane person is trying to compare someone's English GCSE from nearly thirty years ago with someone's now, apart from those students who want to gloat that they did better than their parents. Those exam grades are only useful for a set period of time. Once your GCSE grades get you into college and you have done your A levels, nobody cares about them any more; once your A levels get you into university and you have your degree, no-one looks at them; and after a few years of work experience, people forget your degree. Do you care what GCSEs your mechanic got? Or your lawyer or doctor? No. It is absolutely irrelevant; they are measured on other things.

The other statistic is our position in international tables such as the OECD's Programme for International Student Assessment (PISA). People point out that while our grades go up, we fall down the tables. Again, a bucketload of faulty reasoning trying to link two complex variables together and extract causation (please stop doing this, it's getting boring). Apart from the fact that we could be getting better, just not as fast as everyone else, or that the criteria are biased towards certain education systems, or a host of other causes, table positions are a stupid measure of performance. Here's an example: in the 2012 Olympic men's 100m final, every runner apart from Asafa Powell finished in under 10 seconds. An amazing achievement: the difference between Usain Bolt in first place and Richard Thompson in seventh was a paltry 0.35 seconds. Yet if you look at the table, Richard Thompson came second to last. This is the same error we are in danger of making by looking at positions in tables; instead we should be using more sophisticated statistical methods to determine whether we are the educational equivalent of 0.35 seconds away from the fastest man in the world, or several.
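Here is that point as a few lines of code, using the finishing times as published for that final (Powell, who pulled up injured, is excluded, as above). Rank alone says "second to last"; the margin says 0.35 seconds.

```python
times = {  # 2012 Olympic men's 100m final, seconds
    "Bolt": 9.63, "Blake": 9.75, "Gatlin": 9.79, "Gay": 9.80,
    "Bailey": 9.88, "Martina": 9.94, "Thompson": 9.98,
}
best = min(times.values())
for rank, (name, t) in enumerate(sorted(times.items(), key=lambda kv: kv[1]), start=1):
    # Print the league-table view (rank) next to the honest view (margin).
    print(f"{rank}. {name}  {t:.2f}s  (+{t - best:.2f}s)")
```

A league table throws the margin away and keeps only the rank, which is precisely the information-destroying step being criticised here.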

What is important is that our exams are robust, challenge our children, instil knowledge and prepare them for the future. But we must first acknowledge that grades are absolutely orthogonal to this, otherwise we risk ruining our education system in pursuit of a worthless goal. At the moment it is worrying that our politicians, the media, Ofqual and even Ofsted are obsessing over the wrong things.