Metacognition and Mindset for Growth and Success: Part 2 – Documenting Self-Assessment and Mindset as Connected

by Steven Fleisher, California State University
Michael Roberts, DePauw University
Michelle Mason, University of Wyoming
Lauren Scharff, U. S. Air Force Academy
Ed Nuhfer, Guest Editor, California State University (retired)

Self-assessment measures and mindset categorizations both rely on self-reported metacognitive responses that produce noisy data. Interpreting noisy data poses difficulties and generates peer-reviewed papers with conflicting results. Some published peer-reviewed works question the legitimacy and value of self-assessment and mindset.

Yeager and Dweck (2020) express frustration when other scholars deprecate mindset and claim that it makes no difference which mindset students bring to their education. Indeed, that seems similar to arguing that students’ enjoyment of education and their attitudes toward it make no difference in the quality of their education.

We empathize with that frustration because we recall our own. Class after class, we saw that our students were not “unskilled and unaware of it,” and we reported those observations while a dominant consensus that “students can’t self-assess” proliferated. The fallout from our advocacy in our workplaces (mentioned in Part 2 of the entries on privilege) came with opinions that, since “the empiricists have spoken,” there was no reason for us to study self-assessment further. Nevertheless, we found good reason to do so. Some of our findings may serve as an analogy for demonstrating the value of mindsets despite the criticisms being leveled against them.

How self-assessment research became a study of mindset

In the summer of 2019, the guest editor and the first author of this entry taught two summer workshops on metacognition and learning at CSU Channel Islands to nearly 60 Bridge students about to begin their college experience. We employed a knowledge survey for the weeklong program, and the students also took the paired-measures Science Literacy Concept Inventory (SLCI). Students had the option of furnishing an email address if they wanted a feedback letter. About 20% declined feedback, and their mean score was 14 points lower (significant at the 99.9% confidence level) than that of those who requested feedback.

In revisiting our national database, we found that every campus revealed a similar significant split in performance. It mattered not whether the institution was open admissions or highly selective; the mean score of the majority who requested feedback (about 75%) was always significantly higher than those who declined feedback. We wondered if the responses served as an unconventional diagnosis of Dweck’s mindset preference.

Conventional mindset diagnosis employs a battery of agree-disagree queries to determine mindset inclination. Co-author Michael Roberts suggested we add a few mindset items to the SLCI, and Steven Fleisher selected three items from Dweck’s survey battery. After responses from a few hundred student participants revealed only a marginally definitive relationship between mindset diagnosed by these items and SLCI scores, Steve increased the number of items to five.

Who operates in fixed, and who operates in growth mindsets?

The personal act of choosing to receive or to avoid feedback on a concept inventory offers a delineator for classifying mindset preference that differs from the usual method of doing so through a survey of agree-disagree queries. We compare here the mindset preferences of 1734 undergraduates from ten institutions using (a) feedback choice and (b) the five agree-disagree mindset survey items that are now part of Version 7.1a of the SLCI. That version has been in use for about two years.

We start by comparing the two groups’ demonstrable competence measured by the SLCI. Both methods of sorting participants into fixed or growth mindset preferences confirmed a highly significant (99.9% confidence) greater cognitive competence in the growth mindset disposition (Figure 1A). As shown in the figure, feedback choice created two groups of fixed and growth mindsets whose mean SLCI competency scores differed by 12 percentage points (ppts). In contrast, the agree-disagree survey items defined the two groups’ means as separated by only 4 ppts. The two methods also split the student populace differently: feedback choice determined that about 20% of the students operated in the fixed mindset, whereas the agree-disagree items approach determined that nearly 50% were operating in that mindset.

We next compare the mean self-assessment accuracy of the two mindsets. In a graph, it is easy to compare groups’ self-assessment skills by comparing the scatter, shown as one standard deviation (1 Sigma) above and below each group’s mean (Figure 1B). The group members’ scatter in overestimating or underestimating their actual scores reveals a group’s developed capacity for self-assessment accuracy. Groups of novices show a larger scatter in their miscalibrations than do groups with better self-assessment skills (see Figure 3 of the resource at this link).

Graphs showing how fixed and growth mindsets relate to SLCI scores, differing based on how mindset is categorized.

Figure 1. A: Comparisons of competence (SLCI scores) of 1734 undergraduates between growth mindset participants (color-coded blue) and fixed mindset participants (color-coded red), as deduced by two methods: (left) agree-disagree survey items and (right) acceptance of or opting out of receiving feedback. B: Spreads of one standard deviation (1 Sigma) in self-assessment accuracy for the growth (blue) and fixed mindset (red) groups as deduced by the two methods. The thin black line at 0 marks perfect self-assessment accuracy, above which lie overconfident estimates and below which lie underconfident estimates. The smaller the standard deviation revealed by the height of the rectangles in 1B, the better the group’s ability to self-assess accurately. Differences shown in A of 4 and 12 ppts and in B of 2.3 and 3.5 ppts are differences between means.

On average, students classified as operating in a growth mindset have better-calibrated self-assessment skills (less spread of over- and underconfidence) than those operating in a fixed mindset by either classification method (Figure 1B). However, the difference between fixed and growth was greater and more statistically significant when mindset was classified by feedback choice (99% confidence) rather than by the agree-disagree questions (95% confidence).
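For readers who want to see how such a comparison can be computed, the following is a minimal sketch under stated assumptions, not the authors’ analysis code: the file slci_paired_measures.csv and the column names (self_assessed, slci_score, mindset_by_feedback) are hypothetical placeholders. Miscalibration is the self-assessed rating minus the demonstrated score, and each group’s self-assessment accuracy is summarized by the 1 Sigma spread of its members’ miscalibrations.

```python
# A minimal sketch (not the authors' analysis code) of comparing self-assessment
# accuracy between mindset groups. The file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("slci_paired_measures.csv")  # one row per participant (hypothetical file)

# Miscalibration: positive values indicate overconfidence, negative underconfidence.
df["miscalibration"] = df["self_assessed"] - df["slci_score"]

# Summarize each mindset group (here classified by feedback choice) by the mean and
# the 1-Sigma spread of miscalibrations; a smaller spread means better accuracy.
summary = df.groupby("mindset_by_feedback")["miscalibration"].agg(["mean", "std", "count"])
print(summary)
```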

Overall, Figure 1 supports Dweck and others advocating for the value of a growth mindset as an asset to learning. We urge contextual awareness by referring readers to Figure 1 of Part 1 of this two-part thematic blog on self-assessment and mindset. We have demonstrated that choosing to receive or decline feedback is a powerful indicator of cognitive competence and at least a moderate indicator of metacognitive self-assessment skills. Still, classifying people into mindset categories by feedback choice addresses only one of the four tendencies of mindset shown in that Figure. Nevertheless, employing a more focused delineator of mindset preference (e.g., choice of feedback) may help to resolve the contradictory findings reported between mindset type and learning achievement.

At this point, we have developed the connections between self-assessment, mindset, and feedback we believe are most valuable to the readers of the IwM blog. Going deeper is primarily of value to those researching mindset. For them, we include an online link to an Appendix to this Part 2 after the References, and the guest editor offers access to SLCI Version 7.1a to researchers who would like to use it in parallel with their investigations.

Takeaways and future direction

Studies of self-assessment and mindset inform one another. Discovering one’s mindset and gaining self-assessment accuracy require knowing self, and knowing self requires metacognitive reflection. Content learning provides the opportunity to develop that understanding of self by practicing for self-assessment accuracy and acquiring the feeling of knowing while struggling to master the content. Learning content without using it to know self squanders immense opportunities.

The authors of this entry have nearly completed a separate stand-alone article for a follow-up in IwM that focuses on using metacognitive reflection by instructors and students to develop a growth mindset.

References

Dweck, C. S. (2006). Mindset: The new psychology of success. New York: Random House.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112. https://doi.org/10.3102/003465430298487


Metacognition and Mindset for Growth and Success: APPENDIX to Part 2 – Documenting Self-Assessment and Mindset as Connected

by Ed Nuhfer, Guest Editor, California State University (retired)
Steven Fleisher, California State University
Michael Roberts, DePauw University
Michelle Mason, University of Wyoming
Lauren Scharff, U. S. Air Force Academy

This Appendix stresses numeracy and employs a dataset of 1734 participants from ten institutions to produce measures of cognitive competence, self-assessed competence, self-assessment accuracy, and mindset categorization. The database is sufficient to address essential issues introduced in our blogs.

Finding replicable relationships in noisy data requires working with groups drawn from a database collected with instruments proven to produce high-reliability measures (see Figure 10 at this link). If we assemble groups, say groups of 50 as shown in Figure 1B, we can attenuate the random noise in individuals’ responses (Fig. 1A) and produce a clearer picture of the signal hidden within the noise (Fig. 1B).

graphs showing postdicted self-assessment and SLCI a) individual data and b) group data

Figure 1. Raw, person-by-person data on over 9800 participants (Fig. 1A) show a highly significant correlation between measures of actual competence from SLCI scores and postdicted self-assessed competence ratings. Aggregating the data into over 180 groups of 50 (Fig. 1B) reduces random noise and clarifies the relationship.
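To illustrate the principle with simulated values (not the study data), the sketch below generates noisy paired measures for 9800 hypothetical participants and shows how averaging sorted groups of 50 attenuates the random noise in individual responses.

```python
# Simulation sketch: averaging groups of 50 shrinks random error by about 1/sqrt(50),
# so group means reveal a signal that individual-level noise partly obscures.
import numpy as np

rng = np.random.default_rng(0)
n = 9800
competence = rng.uniform(20, 95, n)                  # hypothetical "true" competence scores
self_assessed = competence + rng.normal(0, 20, n)    # noisy self-assessed ratings

# Person-by-person correlation (noisy, analogous to Fig. 1A)
r_individual = np.corrcoef(competence, self_assessed)[0, 1]

# Sort, aggregate into groups of 50, and correlate the group means (analogous to Fig. 1B)
order = np.argsort(competence)
groups = np.array_split(order, n // 50)
mean_comp = np.array([competence[g].mean() for g in groups])
mean_self = np.array([self_assessed[g].mean() for g in groups])
r_groups = np.corrcoef(mean_comp, mean_self)[0, 1]

print(f"individuals: r = {r_individual:.2f}   groups of 50: r = {r_groups:.2f}")
```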

Random noise is not simply an inconvenience. In certain graphic types, random noise generates patterns that do not intuitively appear random. Researchers easily interpret these noise patterns as products of a human behavior signal. The “Dunning-Kruger effect” appears to be built on many researchers doing exactly that for over twenty years.

Preventing the confusion of noise with signal requires knowing what randomness looks like. Researchers can achieve this by ensuring that the surveys and test instruments used in any behavioral science study have high reliability and then constructing a simulated dataset by completing these instruments with random-number responses. The simulated population should equal that of the participants in the research study, and the simulated data should be graphed with the same graphic types researchers intend to use to present the participants’ data in a publication.
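A minimal sketch of constructing such a random-response baseline follows. The item count, number of answer choices, and 0–100 rating scale are illustrative assumptions; researchers should substitute the properties of their actual instruments and the graphic type they intend to publish.

```python
# Random-response baseline: same population size as the real study, instruments
# "completed" with random responses, then graphed in the intended graphic type.
# Item count, choices per item, and the rating scale are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n_participants = 1734       # match the real sample size
n_items, n_choices = 25, 4  # assumed instrument properties

# Random guessing on every inventory item yields simulated percent-correct scores.
sim_score = 100 * rng.binomial(n_items, 1 / n_choices, n_participants) / n_items
# Random self-assessed competence ratings on the same 0-100 scale.
sim_self = rng.uniform(0, 100, n_participants)

plt.scatter(sim_score, sim_self, s=8, alpha=0.4)
plt.xlabel("Simulated score (% correct)")
plt.ylabel("Simulated self-assessed competence (%)")
plt.title("Pure random noise in the intended graphic type")
plt.show()
```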

The 1734 participants addressed in Parts 1 and 2 of this blog’s theme pair on mindset are part of the larger dataset represented in Figure 1. The number is smaller than 9800 because we only recently added mindset questions. 

The blog entry that links to this Appendix showed the two methods of classifying mindset to be consistent in designating a growth mindset as associated with higher scores on cognitive measures and with more accurate self-assessments. However, this finding does not directly test how the two classification methods are related to one another. The fact, noted in the blog, that the two methods classified people differently gave us reason to anticipate that the two might not prove directly statistically related.

We need to employ groups to attenuate noise, and ideally, we want large groups with good prospects of a spread of values. We first picked the groups associated with furnishing information about privilege (Table 1) because these are groups large enough to attenuate random noise. Further, the groups displayed highly significant statistical spreads when we looked at self-assessed and demonstrable competence within these categories. Note well: we are not trying to study privilege aspects here. Our objective, for now, is to understand the relationship between mindset defined by agree-disagree items and mindset defined by requests for feedback.

We have aggregated our data in Table 1 from four parameters to yield eight paired measures and are ready to test for relationships. Because we already know the relationship between self-assessed competence and demonstrated competence, we can verify whether our existing dataset of 1734 participants, presented as eight paired-measures groups, is sufficient to recover the relationship we already know. Looking at self-assessment serves as a calibration to help answer, “How good is our dataset likely to be for distinguishing the unknown relationships we seek about mindset?”
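As a sketch of that calibration step, assuming the Table 1 group means were exported to a file with hypothetical column names, the known relationship can be checked with a simple least-squares fit:

```python
# Calibration check sketch: do the eight group means recover the known relationship
# between self-assessed and demonstrated competence? (Hypothetical file and columns.)
import pandas as pd
from scipy.stats import linregress

table1 = pd.read_csv("table1_group_means.csv")
fit = linregress(table1["mean_self_assessed"], table1["mean_slci_score"])
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.1f}, "
      f"r = {fit.rvalue:.2f}, p = {fit.pvalue:.4f}")
```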


Table 1. Mindset and self-assessment indicators by large groups. The table reveals each group’s mindset composition derived both from survey items and from feedback choice, along with the populace size of each group.

Figure 2 shows that our dataset in Table 1 proved adequate in capturing the known significant relationship between self-assessed competence and demonstrated competence (Fig. 2A). The fit-line slope and intercept in Figure 2A reproduce the relationship established from much larger amounts of data (Fig. 1B). However, the dataset did not confirm a significant relationship between the results generated by the two methods of categorizing people into mindsets (Fig. 2B).

In Figure 2B, there is little spread. The plotted points cluster tightly, and the correlation is close to significant. Nevertheless, the points cluster so tightly that we are apprehensive that the linear relationship would replicate in a future study of a different populace. Because we chose categories with a large populace and large spreads, more data entered into these categories probably would not change the relationships in Figure 2A or 2B. More data might bump the correlation in Figure 2B into significance, but this could be more a consequence of the spread of the categories chosen for Table 1 than a product of a tight direct relationship between the two methods employed to categorize mindset. We can resolve this by doing something analogous to producing the graph in Figure 1B above.


Figure 2. Relationships between self-assessed competence and demonstrated competence (A) and growth mindset diagnosed by survey items and requests for feedback (B). The data graphed is from Table 1.

We next place the same participants from Table 1 into different groups, thereby removing the spread advantages conferred by the groups in Table 1. We randomize the participants to get a good mix of the populace from the ten schools, sort the randomized data by class rank to be consistent with the process used to produce Figure 1B, and aggregate them into groups of 100 (Table 2).
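A minimal sketch of that regrouping, with hypothetical file and column names standing in for the actual fields, follows:

```python
# Regrouping sketch: shuffle participants, sort by class rank, bin into consecutive
# groups of 100, and compute group means of the four measures. File and column names
# are assumptions; match them to the actual dataset fields.
import pandas as pd

df = pd.read_csv("mindset_participants.csv")                 # hypothetical per-participant file
df = df.sample(frac=1, random_state=1)                        # randomize to mix the ten schools
df = df.sort_values("class_rank", kind="stable").reset_index(drop=True)
df = df.iloc[: (len(df) // 100) * 100]                        # drop the remainder so all groups have 100
df["group"] = df.index // 100

measures = ["slci_score", "self_assessed", "mindset_by_survey", "mindset_by_feedback"]
table2_like = df.groupby("group")[measures].mean()
print(table2_like)
```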


Table 2. 1700 students are randomized into groups of 100, and the means are shown for four categories for each group.

The results employing different participant groupings appear in Figure 3. Figure 3A confirms that the different groupings in Table 2 attenuate the spread introduced by the groups in Table 1.


Figure 3. The data graphed is from Table 2. Relationships between self-assessed competence and demonstrated competence appear in (A). In (B), plotting mindset classified by agree-disagree survey items versus mindset classified by requesting or opting out of feedback fails to replicate the pattern shown in Figure 2B.

The matched pairs of self-assessed competence and demonstrable competence continue, in Figure 3A, to reproduce a consistent line fit that, despite a diminished correlation, still attains significance, as in Figures 1B and 2A.

In contrast, the ability to show replication between the two methods for categorizing mindsets has completely broken down. Figure 3B shows a very different relationship from that displayed in Figure 2B. The direct relationship between the two methods of categorizing mindset proves not replicable across different groupings.

For readers who may wish to try different groupings, we provide the raw dataset used for this Appendix; it can be downloaded from https://profcamp.tripod.com/iwmmindsetblogdata.xls.
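For example, assuming the xlrd dependency that pandas needs for legacy .xls files is installed, the dataset can be loaded directly and then regrouped along the lines of the sketch above:

```python
# Load the raw Appendix dataset and inspect its fields before trying other groupings.
# If a direct read from the URL is blocked, download the file first and pass a local path.
import pandas as pd

url = "https://profcamp.tripod.com/iwmmindsetblogdata.xls"
raw = pd.read_excel(url)        # legacy .xls files require the xlrd package
print(raw.columns.tolist())     # inspect the actual column names before regrouping
print(raw.describe())
```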

Takeaways

The two methods of categorizing mindset generally designate a growth mindset as associated with higher scores on tests of cognitive competence and, to a lesser extent, with better self-assessment accuracy. However, the two methods do not show a direct relationship with each other, which indicates that they address different dimensions of the multidimensional character of “mindsets.”