In studying assessment deeply during the past decade, I have worked with many of the gurus. All describe the same research problem: send assessment samples to educators for scoring, and the results come back spanning an enormous spectrum. The same paper or test can receive anywhere from 30 to 100 on a 100-pt scale. Interestingly, math teachers tend to span the greatest spectrum because of the practice of partial credit.
With a four- to six-point scale, though, quality descriptors or standards can be prepared and agreed upon prior to assessment. Reliability and validity of scoring skyrocket with well-understood, tighter scales. And can we really discern among an 83, an 85, and an 87? (Dare you to write quality descriptors for a full 100-pt scale!) Computing made 100-pt averaging the vogue, but maybe this is a case of “we can, but should we?”
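To make the 83/85/87 question concrete, here is a minimal sketch of what happens when 100-pt scores are read against a four-level scale. The cut-offs and the `to_level` helper are purely hypothetical, chosen for illustration; in practice the levels would come from agreed quality descriptors, not from percentage bands.

```python
# Hypothetical cut-offs (minimum percent, level), chosen only for illustration.
# A real four-point scale would be defined by quality descriptors instead.
LEVELS = [
    (90, 4),
    (80, 3),
    (70, 2),
    (0, 1),
]

def to_level(percent):
    """Return the first level whose cut-off the percent meets or exceeds."""
    for cutoff, level in LEVELS:
        if percent >= cutoff:
            return level
    raise ValueError("percent must be 0 or greater")

# Under these assumed bands, the three scores land on the same level.
print([to_level(score) for score in (83, 85, 87)])  # prints [3, 3, 3]
```

Under these assumed bands, the three scores that looked different on the 100-pt scale carry the same meaning on the tighter one.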
If you’re not into a grading revolution, perhaps we could use criterion-referenced A-B-C-D-F (not norm-referenced). Several experts recommend A-B-C-NY (“Not Yet”). Or we could use the first letter of certain criterion-referenced categories, or we could use 1-4 or 1-6. As Guskey explains, there are advantages and disadvantages to any scoring/grading system, but I do hope we move further and further away from the 100-pt, average-based system.
60-60-60 #24: What if we disaggregated the single score?
60-60-60 #25: What if we used karate belts instead of averages?