In my last blog post I asserted that randomizing the order of questions does not affect exam difficulty. In other words, if Person A gets a set of exam questions in one order and Person B gets the same questions in a different order, both exams will be of the same difficulty.
I made the same claim about choice order in multiple choice questions. In other words, if Person A sees choices A, B, C, D and Person B sees the same question but with the choices in the order B, D, A, C, I asserted that the two questions are of the same difficulty.
This is important because many (most?) test developers use both the question randomization and the choice randomization features of their testing system to make it more difficult to cheat.
Imagine if this weren’t true. Then each person who took randomized exams (questions and/or choices) would have an exam with a different difficulty level. Having a single passing score would be meaningless!
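To make the two kinds of randomization concrete, here is a minimal Python sketch. It does not reflect how any particular testing system works: the function name `randomized_exam`, the dictionary layout for questions, and the idea of seeding the shuffle with a candidate ID are all assumptions made for illustration.

```python
import random

def randomized_exam(questions, candidate_id,
                    shuffle_questions=True, shuffle_choices=True):
    """Return a per-candidate copy of the exam with shuffled order.

    Hypothetical helper: `questions` is a list of dicts with the keys
    "stem", "choices" and "answer". The answer is stored as text, not as
    a position, so it stays valid after the choices are shuffled.
    """
    # Seeding with the candidate ID makes each candidate's order reproducible.
    rng = random.Random(candidate_id)

    # Copy the questions so the original item bank is left untouched.
    exam = [
        {"stem": q["stem"], "choices": list(q["choices"]), "answer": q["answer"]}
        for q in questions
    ]

    if shuffle_choices:
        for q in exam:
            rng.shuffle(q["choices"])  # choice randomization
    if shuffle_questions:
        rng.shuffle(exam)              # question randomization
    return exam


# A tiny made-up item bank, used only to demonstrate the function.
BANK = [
    {"stem": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": "4"},
    {"stem": "Capital of France?",
     "choices": ["Paris", "Rome", "Oslo", "Bern"], "answer": "Paris"},
    {"stem": "Largest planet?",
     "choices": ["Mars", "Venus", "Jupiter", "Mercury"], "answer": "Jupiter"},
]

exam_a = randomized_exam(BANK, "person-a")
exam_b = randomized_exam(BANK, "person-b")
```

Both candidates receive exactly the same questions and the same choices; only the ordering may differ. That is the condition the claim about equal difficulty is concerned with.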
One of my readers posted a comment asking for studies to back up these assertions. I didn’t have the studies at my fingertips, so I started looking through the research literature. What I found made me realize that it’s a bit more complicated than I thought.
Here’s what I found:
In this study:
The authors were looking primarily at perception of performance rather than performance per se, though they measured actual performance too. As the title suggests, they found (as have other studies) that students whose test questions were ordered from Easy to Hard perceived that they had done better on the exam than students whose questions were ordered from Hard to Easy. They also included a group whose questions were arranged in random order of difficulty. But how did the students actually perform on the test? In one of the experiments, the Easy-to-Hard group outperformed both the Hard-to-Easy group and the Randomized group, though the effect was small. In other experiments there was no statistically significant difference in performance.
In this study:
The researchers compared tests in which items were sequenced in the order of the curriculum (S) to tests in which the items were randomized (R). The conclusion, in the authors’ words:
“… no evidence was found for superior performance on S-format tests in any of the three experiments.”
But what about the condition we are interested in — where everyone gets a randomized test?
So, let’s look at one more study:
The authors considered four combinations of randomized and partially randomized (randomized within topic) questions along with randomized and non-randomized choices. Their conclusions? In the authors’ own words:
“… we investigated the effect of randomization of the questions and possible answers on student performance and found that, as might be expected from previous empirical studies, there was no evidence for any effect.”
So, I stand by my assertion that randomization of questions and choices does not affect performance (though it can affect perception of performance).