I’ve written and reviewed tens of thousands of test questions. To state the obvious to anyone who has ever done it: Writing good test questions is difficult. That’s why serious testing organizations, like those that produce the SAT and the ACT, spend millions of dollars creating quality questions.
Those of us who work in corporate training don’t have this luxury. We often write test questions on top of our other responsibilities, or we farm the work out to vendors (who are often not very good at it either). So what to do? There are standard rules for writing valid questions, and if you want them, just e-mail us (firstname.lastname@example.org) and we’ll be happy to send them along. But even if you follow the rules, that’s no guarantee that the questions will do what you intend them to do.
And what’s that? What does a good question do? It measures the learner’s mastery of the subject. Ideally a good question is capable of classifying learners into two groups: those who have mastered the material and those who have not. In testing circles this is called the question’s ability to “discriminate.”
So, how do you know if your questions discriminate? Fortunately, there is a simple statistic that will tell you, and Intela produces this statistic for every question you use in our assessments. The statistic is called the point-biserial correlation and here’s what it looks like in Intela:
Reading from left to right: That’s the question on the left, then its choice distribution, shown graphically (how many test takers selected each choice), followed by the percent of learners who got the question correct (the question’s difficulty level) and, finally, the point-biserial correlation in the right-most column. Here’s how to interpret it:
- Like all correlations it’s a number that is between -1 and 1.
- Numbers above zero mean that the question discriminates positively. Learners with higher exam scores tend to get this question correct and learners with lower exam scores tend to get this question wrong. It’s a good question.
- Numbers below zero mean that the question has had the opposite effect. Learners with low exam scores are getting it right and learners with high exam scores are getting it wrong. Most likely one of two things is wrong with the question: (1) the question is confusing or poorly worded or (2) you marked the wrong choice as the correct answer when you created the question. Either way: You must discard or correct this question.
- What about point-biserial correlations at or near zero? This means the question doesn’t discriminate at all. Most likely, nearly everyone is getting this question correct, which you can confirm from the percent-correct column. That’s not necessarily bad. It just means it’s an easy question.
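If you want to check the statistic by hand, here is a minimal Python sketch of the standard point-biserial formula. It is not Intela's implementation. The function names, the sample data, and the `tolerance` cutoff for "near zero" are all assumptions made for illustration. It also assumes each learner's exam score is the overall score on the exam.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between one question and overall exam scores.

    item_correct: list of 0/1 flags (1 = learner answered this question correctly).
    total_scores: each learner's overall exam score, in the same order.
    Returns None when the statistic is undefined: everyone right, everyone
    wrong, or no spread in the exam scores.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    right = [s for c, s in zip(item_correct, total_scores) if c == 1]
    wrong = [s for c, s in zip(item_correct, total_scores) if c == 0]
    spread = pstdev(total_scores) if total_scores else 0
    if not right or not wrong or spread == 0:
        return None
    p = len(right) / len(total_scores)  # proportion correct (difficulty level)
    # Standard formula: (mean of right group - mean of wrong group) / SD * sqrt(p * q)
    return (mean(right) - mean(wrong)) / spread * sqrt(p * (1 - p))


def interpret(r, tolerance=0.1):
    """Map the statistic to the reading described above.

    tolerance is a hypothetical cutoff for "near zero"; pick your own.
    """
    if r is None:
        return "undefined: no variation in answers or scores"
    if abs(r) <= tolerance:
        return "does not discriminate: probably very easy, check percent correct"
    if r > 0:
        return "discriminates positively: keep it"
    return "discriminates negatively: reword it or check the answer key"


# Made-up data: six learners, their exam scores, and two versions of one question.
scores = [90, 85, 80, 60, 55, 50]
well_keyed = [1, 1, 1, 0, 0, 0]  # high scorers get it right
mis_keyed = [0, 0, 0, 1, 1, 1]   # same question with the wrong choice marked correct

for label, answers in [("well keyed", well_keyed), ("mis-keyed", mis_keyed)]:
    r = point_biserial(answers, scores)
    print(label, round(r, 2), interpret(r))
```

Note that when literally everyone answers a question correctly, the correlation cannot be computed at all, because there is no variation to correlate. The sketch returns `None` in that case instead of zero.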
At Intela we have a lot of testing expertise and we share it with clients and non-clients through our workshop on “Creating Fair, Valid and Reliable Assessments.” Let us know if you are interested.