Scratch and Win or Scratch and Lose? Immediate Feedback Assessment Technique

By Aaron S. Richmond, Ph.D., Metropolitan State University of Denver

When prepping my courses for this spring semester, I was thinking about how I often struggle to provide quick and easy feedback on quiz and exam performance to my students. I expressed this to my colleague, Dr. Anna Ropp (@AnnaRopp), and she quickly suggested that I check out the Immediate Feedback Assessment Technique (IF-AT) by Epstein Educational Enterprises. When she showed me the IF-ATs, I was intrigued and thought I might as well give it a try, so I ordered some. IF-AT provides instantaneous performance feedback to learners by letting students scratch off what they believe to be the correct answer on a multiple-choice quiz or exam. See Figures 1a and 1b for student examples of a completed IF-AT. Students find out whether their chosen answer is correct simply by scratching it (see question 1 in Figure 1a), and they can scratch more than one answer to find the correct one (see question 2 in Figure 1a). You may also use it to provide partial credit for sequenced attempts (e.g., scratch one choice for full credit if correct, then scratch a second choice, and maybe a third, for decreasing amounts of partial credit); see question 6 in Figure 1b for an example of this, and the scoring sketch after Figure 1. Epstein and colleagues suggest that IF-AT not only assesses student learning but also teaches at the same time. However, it occurred to me that this is not merely an assessment and teaching tool; it is also a great opportunity to increase metacognition.

Figure 1. (a) A completed, unscored 10-question IF-AT. (b) A completed 10-question IF-AT, scored by student and teacher.
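
If you adopt the sequenced partial-credit option just described, a tiny scoring helper makes the bookkeeping explicit. This is only a minimal sketch: the credit weights below are hypothetical examples, not an official Epstein scoring rule, so set them to match your own grading policy.

    # Hypothetical credit weights, keyed by the scratch attempt on which the
    # correct answer was uncovered; attempts beyond the third earn nothing.
    CREDIT_BY_ATTEMPT = {1: 1.00, 2: 0.50, 3: 0.25}

    def score_question(attempts_to_correct, points=1.0):
        """Credit earned for one question, given how many scratches it took
        to reveal the correct answer (None if the student never found it)."""
        if attempts_to_correct is None:
            return 0.0
        return points * CREDIT_BY_ATTEMPT.get(attempts_to_correct, 0.0)

    # Example: answers found on the 1st, 2nd, and 4th scratch, respectively.
    attempts = [1, 2, 4]
    total = sum(score_question(a) for a in attempts)
    print(f"Quiz score: {total:.2f} / {len(attempts)}")  # -> 1.50 / 3

Decreasing weights keep the incentive to think hard before the first scratch while still rewarding error correction.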

How to Use IF-AT
Epstein and colleagues suggest that IF-AT is fair, fast, active, fun, and respectful, and that it builds knowledge. The IF-AT scratch assessments come in 10-, 25-, or 50-item versions, each available in five different orders of correct answers. The Epsteins suggest that IF-AT can be used in many ways: for chapter tests; individual study (at home or in class); quizzes; pyramidal-sequential-process quizzing; exams; team-based and cooperative learning; study-buddy learning; and, most importantly, as a feedback mechanism (see http://www.epsteineducation.com/home/about/uses.aspx for further explanation).

There are several metacognitive functions of the IF-AT, although the Epsteins do not couch their claims in this term. First, the Epsteins argue that you can arrange your IF-AT so that the first question (and the immediate feedback of the correct answer) can be used in a pyramidal sequential process. That is, the correct answer to the first question is needed to answer subsequent questions because it is foundational knowledge for the remaining questions. This sequential process allows the instructor and student to pinpoint where the student's knowledge of the integrated content broke down. This is implicit modeling of a student's metacognitive knowledge that should be made explicit: by explaining to your students how the exam is set up, students can use cues and knowledge from previous questions and answers on the test to aid their understanding of subsequent questions and answers. This is a key step in the learning process.

Second, the IF-AT may be used in a team-based way (i.e., distributed cognition) by forming groups, problem solving, and having the team discover the correct answer. IF-AT may also be used in dyads to allow students to discuss correct and incorrect answers: students read a question, discuss the correct and incorrect answers, then cooperatively make a decision and receive immediate feedback.

Third, IF-AT may be used to strengthen cognitive and metacognitive strategies. Because the feedback arrives immediately, students (if you explicitly instruct them to do so) may adjust their cognitive and metacognitive strategies for future study. For example, a student who studied with flashcards and did poorly may want to adjust how they construct and use flashcards (e.g., with distributed practice).

Finally, and most importantly, IF-AT may improve students' metacognitive regulation via calibration (i.e., the accuracy of knowing when you do and don't know the answer to a question). By providing immediate feedback, students may become more accurate in their judgments of knowing, or even their feelings of knowing, based on that feedback. A minimal sketch of how such calibration might be tracked follows below.
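
To make the calibration idea concrete, here is one way an instructor might track it, assuming students jot down a confidence rating before scratching each answer. The data, the 0-to-1 confidence scale, and the simple absolute-gap measure are my own illustrative choices, not part of the IF-AT materials.

    # One student's records for a 5-question IF-AT quiz (hypothetical data).
    # Each pair: (confidence before scratching, 0..1; was the first scratch correct?)
    responses = [
        (0.9, True),   # sure, and right
        (0.8, False),  # confident, but wrong -> overconfident
        (0.4, True),   # unsure, but right -> underconfident
        (0.7, True),
        (0.6, False),
    ]

    # A simple calibration measure: the mean absolute gap between confidence
    # and outcome (0 = perfectly calibrated, 1 = maximally miscalibrated).
    gap = sum(abs(conf - (1.0 if correct else 0.0))
              for conf, correct in responses) / len(responses)
    print(f"Mean calibration gap: {gap:.2f}")

A student whose gap shrinks from quiz to quiz is becoming better calibrated, which is exactly the regulation the immediate feedback is meant to foster.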

Is it Scratch to Win or Scratch to Lose?
As described, by using the IF-AT, students get immediate feedback on whether they answered a question correctly and, if not, what the correct answer is. From a metacognitive perspective, this is outstanding. Students can calibrate (i.e., adjust their estimates of and confidence in knowing an answer) in real time, engage in distributed cognition, get feedback on their choice of cognitive and metacognitive strategies, and increase their cognitive monitoring and regulatory control. These are all WIN, WIN, WIN byproducts. HOWEVER, is there a downside to instantaneously knowing you are wrong? That is, does IF-AT provoke emotional reactions that students must then regulate? As I have been implementing the IF-AT, I have noticed (anecdotally) that about 1 in 10 students react negatively, and it seems to increase their test anxiety. Presumably, the other 90% of students love it and appreciate the feedback. Yet what about the 10%? Does IF-AT stunt or hinder their performance? Again, my esteemed colleague Dr. Anna Ropp and I engaged in some scholarly discourse to answer this question, and Anna suggested that I make the first 3-5 questions on each IF-AT "soft-ball" questions, that is, questions that 75% of students will answer correctly, so that students' fears and anxiety are reduced to some degree. Another alternative is to provide students with a copy of the test or exam and let them rank order or weight their answers (see Chris Was' IwM blog, 2014, on how to do this). Despite these two sound suggestions, there still may be an affective reaction that is detrimental to student learning. To date, no research has investigated this issue, and there are only a handful of well-designed studies of IF-AT at all (e.g., Brosvic et al., 2006; Dihoff et al., 2005; Epstein et al., 2002, 2003; Slepkov & Shiell, 2014). As such, more well-constructed and well-executed empirical research is needed (Hint: all you scholars looking for a SoTL project…here's your sign).

Concluding Thoughts and Questions for You
After investigating, reflecting on, and using IF-AT in my classroom, I think it is a valuable addition to your quiver of assessments for increasing metacognition, though of course not an educational panacea. Furthermore, in my investigation of this assessment technique, more questions (as usual) popped up about the use of IF-AT. So I will leave you with a charge and a call to help me answer the questions below:

  1. Are there similar assessments that provide immediate feedback that you use? If so, are they less expensive or free?
  2. If you are using IF-AT, what is your favorite way to use it?
  3. Do you think IF-AT could cause substantial test anxiety? If so, to whom and to what level within your classes?
  4. How could IF-AT be used more efficiently as a tool for calibration? Or, what other ways do you think IF-AT can be used to increase metacognition?
  5. I think there are enormous opportunities for SoTL on IF-AT (e.g., the effects on calibration, distributed cognition, cognitive monitoring, conditional knowledge of strategy use, etc.), which means we all have some more work to do.

References
Brosvic, G. M., Epstein, M. L., Dihoff, R. E., & Cook, M. J. (2006). Acquisition and retention of Esperanto: The case for error correction and immediate feedback. The Psychological Record, 56(2), 205.

Dihoff, R. E., Brosvic, G. M., Epstein, M. L., & Cook, M. J. (2005). Adjunctive role for immediate feedback in the acquisition and retention of mathematical fact series by elementary school students classified with mild mental retardation. The Psychological Record, 55(1), 39.

Epstein, M. L., Brosvic, G. M., Costner, K. L., Dihoff, R. E., & Lazarus, A. D. (2003). Effectiveness of feedback during the testing of preschool children, elementary school children, and adolescents with developmental delays. The Psychological Record, 53(2), 177.

Epstein, M. L., Lazarus, A. D., Calvano, T. B., & Matthews, K. A. (2002). Immediate feedback assessment technique promotes learning and corrects inaccurate first responses. The Psychological Record, 52(2), 187.

Slepkov, A. D., & Shiell, R. C. (2014). Comparison of integrated testlet and constructed-response question formats. Physical Review Special Topics - Physics Education Research, 10(2), 020120.

Was, C. (2014, August). Testing improves knowledge monitoring. Improve with Metacognition. Retrieved from https://www.improvewithmetacognition.com/testing-improves-knowledge-monitoring/


Collateral Metacognitive Damage

Why Seeing Others as “The Little Engines that Could” beats Seeing Them as “The Little Engines Who Were Unskilled and Unaware of It”

by Ed Nuhfer, Ph.D., Professor of Geology, Director of Faculty Development and Director of Educational Assessment, California State Universities (retired)

What is Self-Assessment?

At its root, self-assessment registers as an affective feeling of confidence in one’s ability to perform in the present. We can become consciously mindful of that feeling and begin to distinguish the feeling of being informed by expertise from the feeling of being uninformed. The feeling of ability to rise in the present to a challenge is generally captured by the phrase “I think I can….” Studies indicate that we can improve our metacognitive self-assessment skill with practice.

Quantifying Self-Assessment Skill

Measuring self-assessment accuracy lies in quantifying the difference between a felt competence to perform and a measure of the actual competence demonstrated. However, what at first glance appears to be a simple subtraction has proven to be a nightmarish challenge for researchers trying to present the data clearly and interpret it accurately. I speak of this "nightmare" with personal familiarity. Some colleagues and I recently summarized different kinds of self-assessments, self-assessment's relationship to self-efficacy, the importance of self-assessment to achievement, and the complexity of interpreting self-assessment measurements (Nuhfer and others, 2016; 2017).
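
As a minimal sketch of that "simple subtraction," assuming each participant provides a self-assessed rating and earns a measured score on the same 0-100 percentage scale (the variable names here are mine, purely for illustration):

    # One participant's paired measures, both on a 0-100 scale (hypothetical data).
    felt_competence = 78.0          # self-assessed competence rating
    demonstrated_competence = 64.0  # score on the matching competence measure

    # Positive error = overestimation; negative error = underestimation.
    self_assessment_error = felt_competence - demonstrated_competence
    print(f"Self-assessment error: {self_assessment_error:+.1f} percentage points")

The subtraction itself is trivial; as the paragraph above notes, the nightmare lies in presenting and interpreting such differences across many people without being fooled by noise.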

Can we or can’t we do it?

The children's story The Little Engine That Could is a well-known tale of the power of positive self-assessment. The throbbing "I think I can, I think I can…" and the success that follows offer an uplifting view of humanity's ability to succeed. That view is close to the traits of the "Growth Mindset" of Stanford psychologist Carol Dweck (2016). It is certainly more uplifting than an alternative title, "The Little Engine That Was Unskilled and Unaware of It," which predicts a disappointing ending to "I think I can, I think I can…." The dismal idea that our possible competence is capped by what nature conferred at birth is a close analog to the traits of Dweck's "Fixed Mindset," which her research revealed as toxic to intellectual development.

As writers of several Improve with Metacognition blog entries have noted, “Unskilled and Unaware of It” are key words from the title of a seminal research paper (Kruger & Dunning, 1999) that offered one of the earliest credible attempts to quantify the accuracy of metacognitive self-assessment. That paper noted that some people were extremely unskilled and unaware of it. Less than a decade later, psychologists were claiming: “People are typically overly optimistic when evaluating the quality of their performance on social and intellectual tasks” (Ehrlinger and others, 2008). Today, laypersons cite the “Dunning-Kruger Effect” and often use it to label any individual or group that they dislike as “unskilled and unaware of it.” We saw the label being applied wholesale in the 2016 presidential election, not just to the candidates but also to the candidates’ supporters.

Self-assessment and vulnerability

Because self-assessment is about taking stock of ourselves rather than judging others, using the Dunning-Kruger Effect to label others is already on shaky ground. But what are the odds that the label, applied to those we are tempted to call "unskilled and unaware of it," is actually correct? While the consensus in the psychology literature seems to indicate that the odds are good, our investigation of the numeracy underlying that consensus indicates otherwise (Nuhfer and others, 2017).

We think that nearly two decades of studies concluding that people are "…typically overly optimistic…" replicated one another because they all relied on variants of a single graphic introduced in the seminal 1999 paper. These graphs generate, from random numbers as readily as from actual data, the very patterns expected from a Dunning-Kruger Effect, and those artifacts are easily mistaken for expressions of actual human self-assessment traits.
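
You can see the artifact with a few lines of simulation. This sketch is my own illustration of the random-number argument in Nuhfer and others (2016), not their code: it generates self-assessed and actual scores that are pure, independent noise, then summarizes them with the quartile convention of the 1999 paper.

    import random

    random.seed(1)
    N = 1000
    # Pure noise: each person's (actual, self-assessed) scores are independent,
    # so these simulated people have NO self-assessment traits at all.
    people = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(N)]

    # Kruger-Dunning graphing convention: rank by actual score, split into
    # quartiles, and compare each quartile's mean self-assessed vs. actual score.
    people.sort(key=lambda pair: pair[0])
    q = N // 4
    for i, name in enumerate(["Bottom", "2nd", "3rd", "Top"]):
        group = people[i * q:(i + 1) * q]
        mean_actual = sum(actual for actual, _ in group) / len(group)
        mean_felt = sum(felt for _, felt in group) / len(group)
        print(f"{name} quartile: actual {mean_actual:5.1f}, self-assessed {mean_felt:5.1f}")

Although these simulated people possess no self-assessment skill whatsoever, the bottom quartile appears to overestimate by roughly 37 points and the top quartile to underestimate, precisely the pattern routinely read as a Dunning-Kruger Effect.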

After we gained some understanding of the hazards posed by the devilish nature of self-assessment measures, our quantitative results showed that people, in general, have surprisingly good awareness of their capabilities (Nuhfer and others, 2016; 2017). About half of our studied populace of over a thousand students and faculty self-assessed their performance accurately within ±10 percentage points (ppts), and about two-thirds proved accurate within ±15 ppts. About 25% might qualify as having inadequate self-assessment skills (errors greater than ±20 ppts), but only about 5% of our academic populace might merit the label "unskilled and unaware of it" (overestimating their abilities by 30 ppts or more). The odds seem high against a randomly selected person being seriously "unskilled and unaware of it," and they are very high against the label being validly applicable to a group.
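
As a minimal sketch of how such accuracy bands can be tallied, using the cutoffs from the paragraph above (the error values are a tiny hypothetical sample, not our data):

    # Self-assessment errors (self-assessed minus demonstrated, in percentage
    # points), one value per participant; hypothetical sample values.
    errors = [3.2, -8.1, 12.4, -18.9, 31.0, 6.6, -2.3, 22.5]

    n = len(errors)
    within_10 = sum(abs(e) <= 10 for e in errors) / n   # accurate self-assessors
    within_15 = sum(abs(e) <= 15 for e in errors) / n
    inadequate = sum(abs(e) > 20 for e in errors) / n   # inadequate skill
    unaware = sum(e >= 30 for e in errors) / n          # overestimate by 30+ ppts

    print(f"Within +/-10 ppts: {within_10:.0%}")
    print(f"Within +/-15 ppts: {within_15:.0%}")
    print(f"Beyond +/-20 ppts: {inadequate:.0%}")
    print(f"'Unskilled and unaware' (+30 ppts or more): {unaware:.0%}")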

Others often rise to the expectations we have of them.

Consider the collective effects when people accept beliefs that they or others are "unskilled and unaware of it." This negative perspective can predispose an organization to accept, as a given, that people are less capable than they really are. Further, those of us with power, such as instructors over students or tenured peers over untenured instructors, should become aware of the term "gaslighting." In gaslighting, our negatively biased actions or comments can strip away the self-confidence of others who accept us as credible, trustworthy, and important to their lives. This type of influence can lead to lower performance, thus seeming to substantiate the initial negative perspective. When gaslighting is deliberate, it constitutes a form of emotional abuse.

Aren’t you CURIOUS yet?

Wondering about your self-assessment skills and how they compare with those of novices and experts? Give yourself about 45 minutes and try the self-assessment instrument used in our research at <http://tinyurl.com/metacogselfassess>. You will receive a confidential report if you furnish your email address upon completing the self-assessment.

Several of us, including our blog founder Lauren Scharff, will be presenting the findings and implications of our recent numeracy studies in August, at the Annual Meeting of the American Psychological Association in Washington DC. We hope some of our fellow bloggers will be able to join us there.

References

Dweck, C. (2016). Mindset: The New Psychology of Success. New York: Ballantine.

Ehrlinger, J., Johnson, K., Banner, M., Dunning, D., & Kruger, J. (2008). Why the unskilled are unaware: Further explorations of absent self-insight among the incompetent. Organizational Behavior and Human Decision Processes, 105, 98–121. http://dx.doi.org/10.1016/j.obhdp.2007.05.002

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134. http://dx.doi.org/10.1037/0022-3514.77.6.1121

Nuhfer, E. B., Cogan, C., Fleisher, S., Gaze, E., & Wirth, K. (2016). Random number simulations reveal how random noise affects the measurements and graphical portrayals of self-assessed competency. Numeracy, 9(1), Article 4. http://dx.doi.org/10.5038/1936-4660.9.1.4

Nuhfer, E. B., Cogan, C., Fleisher, S., Wirth, K., & Gaze, E. (2017). How random noise and a graphical convention subverted behavioral scientists' explanations of self-assessment data: Numeracy underlies better alternatives. Numeracy, 10(1), Article 4. http://dx.doi.org/10.5038/1936-4660.10.1.4