Quantifying Metacognition — Some Numeracy behind Self-Assessment Measures

Ed Nuhfer, Retired Professor of Geology and Director of Faculty Development and Director of Educational Assessment, enuhfer@earthlink.net, 208-241-5029

Early this year, Lauren Scharff directed us to what might be one of the most influential reports on quantification of metacognition, which is Kruger and Dunning’s 1999 “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments.” In the 16 years that since elapsed, a popular belief sprung from that paper that became known as the “Dunning-Kruger effect.” Wikipedia describes the effect as a cognitive bias in which relatively unskilled individuals suffer from illusory superiority, mistakenly assessing their ability to be much higher than it really is. Wikipedia thus describes a true metacognitive handicap in a lack of ability to self-assess. I consider Kruger and Dunning (1999) as seminal because it represents what may be the first attempt to establish a way to quantify metacognitive self-assessment. Yet, as time passes, we always learn ways to improve on any good idea.

At first, quantifying the ability to self-assess seems simple. It appears that comparing a direct measure of confidence to perform taken through one instrument with a direct measure of demonstrated competence taken through another instrument should do the job nicely. For people skillful in self-assessment, the scores on both self-assessment and performance measures should be about equal. Seriously large differences can indicate underconfidence on one hand or “illusory superiority” on the other.

The Signal and the Noise

In practice, measuring self-assessment accuracy is not nearly so simple. The instruments of social science yield data consisting of the signal that expresses the relationship between our actual competency and our self-assessed feelings of competency and significant noise generated by our human error and inconsistency.

In analogy, consider signal as your favorite music on a radio station, the measuring instrument as your radio receiver, the noise as the static that intrudes on your favorite music, and the data as the actual sound mix of noise and signal that you hear. The radio signal may truly exist, but unless we construct suitable instruments to detect it, we will not be able to generate convincing evidence that the radio signal even exists. Failures can lead to the conclusions that metacognitive self-assessment is no better than random guessing.

Your personal metacognitive skill is analogous to an ability to tune to the clearest signal possible. In this case, you are “tuning in” to yourself—to your “internal radio station”—rather than tuning the instruments that can measure this signal externally. In developing self-assessment skill, you are working to attune your personal feelings of competence to reflect the clearest and most accurate self-assessment of your actual competence. Feedback from the instruments has value because they help us to see how well we have achieved the ability to self-assess accurately.

Instruments and the Data They Yield

General, global questions such as: “How would you rate your ability in math?” “How well can you express your ideas in writing?” or “How well do you understand science?” may prove to be crude, blunt self-assessment instruments. Instead of single general questions, more granular instruments like knowledge surveys that elicit multiple measures of specific information seem needed.

Because the true signal is harder to detect than often supposed, researchers need a critical mass of data to confirm the signal. Pressures to publish in academia can cause researchers to rush to publish results from small databases obtainable in a brief time rather than spending the time, sometimes years, needed to generate the database of sufficient size that can provide reproducible results.

Understanding Graphical Depictions of Data

Some graphical conventions that have become almost standard in the self-assessment literature depict ordered patterns from random noise. These patterns invite researchers to interpret the order as produced by the self-assessment signal. Graphing of nonsense data generated from random numbers in varied graphical formats can reveal what pure randomness looks like when depicted in any graphical convention. Knowing the patterns of randomness enables acquiring the numeracy needed to understand self-assessment measurements.

Some obvious questions I am anticipating follow: (1) How do I know if my instruments are capturing mainly noise or signal? (2) How can I tell when a database (either my own or one described in a peer-reviewed publication) is of sufficient size to be reproducible? (3) What are some alternatives to single global questions? (4) What kinds of graphs portray random noise as a legitimate self-assessment signal? (5) When I see a graph in a publication, how can I tell if it is mainly noise or mainly signal? (6) What kind of correlations are reasonable to expect between self-assessed competency and actual competency?

Are There Any Answers?

Getting some answers to these meaty questions requires more than a short blog post, but some help is just a click or two away. This blog directs readers to “Random Number Simulations Reveal How Random Noise Affects the Measurements and Graphical Portrayals of Self-Assessed Competency” (Numeracy, January 2016) with acknowledgments to my co-authors Christopher Cogan, Steven Fleisher, Eric Gaze and Karl Wirth for their infinite patience with me on this project. Numeracy is an open-source journal, and you can download the paper for free. Readers will likely see self-assessment literature in different ways way after reading the article.