Distributed Metacognition: Insights from Machine Learning and Human Distraction

by Philip Beaman, Ph.D., University of Reading, UK

Following the success of Google’s AlphaGo programme in competition with a human expert over five games, a result previously considered beyond the capabilities of mere machines (https://deepmind.com/alpha-go), there has been much interest in machine learning. Broadly speaking, machine learning comes in two forms: supervised learning (where the machine is trained by means of examples and the errors it makes are corrected) and unsupervised learning (where there is no error signal to indicate previous failures). AlphaGo, as it happens, was trained in part by supervised learning on examples of human expert-level games, and it is this type of learning which looks very much like meta-cognition, even though the meta-cognitive monitoring and correction of the machine’s performance are external to the system itself, although not necessarily to the machine which is running the system. For example, an artificial neural network (perhaps of the kind which underpins AlphaGo) is trained to output Y when presented with X by means of a programme which stores the training examples, and calculates the error signal from the neural network’s first attempts, outside the neural network software itself but on the same hardware. This is of interest because it illustrates the fluid boundary between a cognitive system (the neural network implemented on computer hardware) and its environment (other programmes running on the same hardware to support the neural network), and demonstrates that metacognition, like first-order cognition, is often a form of situated activity. Here, the monitoring and the basis for correction of performance are (as in all supervised learning) external to the learning system itself.
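
The division of labour described above can be made concrete with a minimal sketch. It assumes a single linear unit trained by the delta rule as a stand-in for a neural network, and it is not a description of how AlphaGo itself is built. The names `Learner` and `Supervisor`, the training set and the parameter values are all hypothetical, chosen only for illustration. The point is that the learner never sees a target and never computes its own error: the examples and the error signal live in a separate programme on the same machine.

```python
import random


class Learner:
    """The 'cognitive system': maps X to Y and can adjust its weights,
    but stores no targets and never computes its own error."""

    def __init__(self, n_inputs):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def adjust(self, x, error, rate):
        # Delta rule: move each weight in proportion to the error it is handed.
        self.weights = [w + rate * error * xi for w, xi in zip(self.weights, x)]
        self.bias += rate * error


class Supervisor:
    """The 'environment': a separate programme that stores the training
    examples and calculates the error signal on the learner's behalf."""

    def __init__(self, examples):
        self.examples = list(examples)

    def mean_abs_error(self, learner):
        errors = [abs(y - learner.predict(x)) for x, y in self.examples]
        return sum(errors) / len(errors)

    def train(self, learner, epochs=500, rate=0.1, seed=0):
        rng = random.Random(seed)
        order = list(self.examples)
        for _ in range(epochs):
            rng.shuffle(order)
            for x, y in order:
                error = y - learner.predict(x)  # monitoring happens out here
                learner.adjust(x, error, rate)  # correction is applied in there


# Made-up training set (an assumption): Y = 2*x1 - x2 + 1 for four binary inputs.
examples = [((0, 0), 1.0), ((0, 1), 0.0), ((1, 0), 3.0), ((1, 1), 2.0)]
learner = Learner(n_inputs=2)
supervisor = Supervisor(examples)

before = supervisor.mean_abs_error(learner)
supervisor.train(learner)
after = supervisor.mean_abs_error(learner)
print(f"mean error before: {before:.3f}, after: {after:.6f}")
```

Deleting the `Supervisor` object would leave the `Learner` able to respond but unable to improve, which is the sense in which the monitoring is external to the learning system.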

In contrast, when psychologists talk about metacognition, we tend to assume that all the processing is going on internally (in the head), whereas in fact it is usually only partly in the head and partly in the world. This is not news to educationalists or to technologists: learners are encouraged to make effective use of external aids which help manage work and thought, but external aids to cognition are often overlooked by psychological theories and investigations. This was not always the case. In the book “Plans and the Structure of Behavior”, which introduced the term “working memory” to psychology, Miller, Galanter and Pribram (1960) spoke of working memory as being a “special state or place” used to track the execution of plans, where the place could be in the frontal lobes of the brain (a prescient suggestion for the time!) or “on a sheet of paper”. This concept, originally defined in wholly functional terms, has in subsequent years morphed into a cognitive structure with a specific locus, or loci, of neural activity (e.g., Baddeley, 2007; D’Esposito, 2007; Henson, 2001; Smith, 2000).

We have come across the issue of distributed metacognition in our own work on auditory distraction. For many years, our lab (along with several others) collected and reported data on the disruptive effects of noise on human cognition and performance. We carefully delineated the types of noise which cause distraction and the tasks which were most sensitive to distraction but, at least until recently, neither we nor (so far as we know) anyone else gave any thought to meta-cognitive strategies which might be employed to reduce distraction outside the laboratory setting. Our experiments all involved standardized presentation schedules of material for later recall and imposed environmental noise (usually over headphones) which participants were told to ignore but which they could not avoid. More recent studies both asked participants for their judgments of learning (JoLs) concerning the material and gave them the opportunity to control their own learning or recall strategy (e.g., Beaman, Hanczakowski & Jones, 2014), and their results are of considerable interest. Theoretically, one of three things might happen: meta-cognition might not influence the ability to resist distraction in any way, meta-cognitive control strategies might ameliorate the effects of distraction, or meta-cognition might itself be affected by distraction, potentially escalating the disruptive effects. For now, let’s focus on the meta-cognitive monitoring judgments, since these need to be reasonably accurate in order for people to have any idea that distraction is happening and that counter-measures might be necessary.

One thing we found was that people’s judgments of their own learning were fairly well calibrated, with judgments of recall in the quiet and noise conditions mirroring the actual memory data. This is not a surprise because earlier studies, including one by Ellermeier and Zimmer (1997), also showed that, when asked to judge their confidence in their memory, people are aware of when noise is likely to detract from their learning. What is of interest, though, is where this insight comes from. No feedback was given after the memory test (i.e., in neural network terms this was not supervised learning), so it isn’t that participants were able to compare their memory performance in the various conditions to the correct answers. Ellermeier and Zimmer (1997) included in their study a measure of participants’ confidence in their abilities before they ever took the test, and this measure was less well calibrated with actual performance. Successful metacognitive monitoring therefore does seem to depend upon recent experience with these particular distractors and the particular memory test used, rather than being drawn from general knowledge or past experience. What then is the source of the information used to monitor memory accuracy (and hence the effects of auditory distraction on memory)? In our studies, the same participants experienced learning trials in noise and in quiet in the same sessions, the lists of items they were required to recall were always of the same set length, and recall was recorded on a physical device (responses were either written or typed). Meta-cognitive monitoring, in other words, could be achieved in many of our experiments by learning the approximate length of the list to be recalled and comparing the physical record of the number of items recalled with this learned number on a trial-by-trial basis.
This kind of meta-cognitive monitoring is very much distributed because it relies upon the physical record of the number of items recalled on each trial to make the appropriate comparison. Is there any evidence that something like this is actually happening? An (as yet unpublished) experiment of ours provides a tantalising hint: if you ask people to write down the words they recall, but give one group a standard pen to do so and another group a pen which is filled with invisible ink (so both groups are writing their recall, but only one is able to see the results), then it appears that monitoring is impaired in the latter case. This suggests (perhaps) that meta-cognition under distraction benefits from distributing some of the relevant knowledge away from the head and into the world.
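
The counting strategy suggested above can be sketched in a few lines. This is purely an illustration of the logic and not a model fitted to any data: the function names, the list length of eight and the trial records are all made-up assumptions. A monitoring estimate is formed by comparing the number of items visible in the written record with the learned list length, so that when the record cannot be seen (the invisible ink case, represented here by `None`) this route to monitoring is simply unavailable.

```python
from statistics import mean


def judge_recall(visible_record, learned_list_length):
    """Estimate the proportion recalled from a record that can be seen.

    Returns None when there is no visible record to count (for example,
    responses written in invisible ink)."""
    if visible_record is None:
        return None
    return min(len(visible_record) / learned_list_length, 1.0)


def monitor_by_condition(trials, learned_list_length):
    """Average the trial-by-trial estimates separately for each condition.

    `trials` is a list of (condition, visible_record) pairs."""
    estimates = {}
    for condition, record in trials:
        estimate = judge_recall(record, learned_list_length)
        if estimate is not None:
            estimates.setdefault(condition, []).append(estimate)
    return {condition: mean(values) for condition, values in estimates.items()}


# Hypothetical records, NOT experimental data: placeholder items written
# down on four trials, with an assumed learned list length of eight.
LIST_LENGTH = 8
trials = [
    ("quiet", ["w1", "w2", "w3", "w4", "w5", "w6"]),
    ("noise", ["w1", "w2", "w3", "w4"]),
    ("quiet", ["w1", "w2", "w3", "w4", "w5", "w6"]),
    ("noise", ["w1", "w2", "w3", "w4"]),
]

visible_ink = monitor_by_condition(trials, LIST_LENGTH)
invisible_ink = monitor_by_condition([(c, None) for c, _ in trials], LIST_LENGTH)
print(visible_ink)    # estimates per condition, read off the page
print(invisible_ink)  # empty: nothing in the world to count
```

Note that the sketch counts items without checking whether they are correct, which is all that a participant without feedback could do from the page alone.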

References:

Baddeley, A. D. (2007). Working memory, thought and action. Oxford: Oxford University Press.

Beaman, C. P., Hanczakowski, M., & Jones, D. M. (2014). The effects of distraction on metacognition and metacognition on distraction: Evidence from recognition memory. Frontiers in Psychology, 5, 439.

D’Esposito, M. (2007). From cognitive to neural models of working memory. Philosophical Transactions of the Royal Society B: Biological Sciences, 362, 761-772.

Ellermeier, W. & Zimmer, K. (1997). Individual differences in susceptibility to the “irrelevant sound effect”. Journal of the Acoustical Society of America, 102, 2191-2199.

Henson, R. N. A. (2001). Neural working memory. In: J. Andrade (Ed.) Working memory in perspective. Hove: Psychology Press.

Miller, G. A., Galanter, E. & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt.

Smith, E. E. (2000). Neural bases of human working memory. Current Directions in Psychological Science, 9, 45-49.