Theoretical Account of Capacity Limits: Unresolved Issues

Below, I will address several fundamental theoretical questions about capacity limits. (1) Why does the capacity limit occur? (2) What is the nature of this limit: Is there a single capacity limit or are there multiple limits? (3) What are the implications of the present arguments for alternative theoretical accounts? (4) Finally, what are the boundaries of the central-capacity-limit account? An enigma in Miller (1956), regarding absolute judgments, will be touched upon to examine the potential breadth of the present framework.

4.1. Why the capacity limit? Future research must establish why there is a capacity limit. One possible reason is that the capacity limit has been optimized through adaptive processes in evolution. Two relevant teleological arguments can be made on logical grounds. More recently, arguments have also been made concerning the physiological basis of storage capacity limits. Any such account must accommodate details of the data, including the individual variability in capacity estimates discussed in Section 3.1.3 above. These issues will be addressed in turn.

4.1.1. Teleological accounts. Several investigators have provided mathematical arguments relevant to what the most efficient size of working memory would be. Dirlam (1972) asked if there is any one chunk size that is more efficient than any other if it is to be the basis of a memory search. He assumed that STM is a multi-level, hierarchically structured system and that the search process is random. The nodes at a particular level are searched only until the correct one is found, at which time the search is confined to subnodes within that node. This process was assumed to continue until, at the lowest level of the hierarchy, the searched-for item is identified. In other words, the search was said to be self-terminating at each level of the hierarchy. Dirlam then asked what rule of chunking would minimize the expected number of total node and item accesses regardless of the number of items in the list, and calculated that the minimum would occur with an average chunk size of 3.59 items at each level of the hierarchy, in close agreement with the capacity of short-term memory that has been observed empirically in many situations (see above).
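Dirlam's optimum can be reproduced numerically. The sketch below is an illustration, not Dirlam's original derivation: it assumes that a self-terminating search costs (s + 1)/2 accesses on average at each level of a hierarchy with chunks of size s, and that each level resolves ln s of information, so total accesses for any list length are proportional to the ratio of the two.

```python
import math

def accesses_per_unit_information(s):
    """Mean self-terminating search cost per level, (s + 1)/2, divided by
    the information resolved per level, ln(s). Total accesses for a list
    of N items are proportional to this ratio, whatever N is."""
    return (s + 1) / (2 * math.log(s))

# Grid search over candidate chunk sizes between 1.1 and 10.
best = min((i / 1000 for i in range(1100, 10001)),
           key=accesses_per_unit_information)
print(round(best, 2))  # 3.59, Dirlam's optimal chunk size
```

The minimum is flat near the optimum, so chunk sizes of 3 or 4 are nearly as efficient as 3.59.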

MacGregor (1987) asked a slightly different question: What is the maximal number of items for which a one-level system is more efficient than a two-level system? The consequences of both self-terminating search and exhaustive search assumptions were examined. A concrete example helps to explain how. Suppose that one received a list that included eight items. Further suppose that one had the option of representing this list either in an unorganized manner or as two higher-level chunks, each containing 4 items. With a self-terminating search method, if one had to search for a particular item in the unorganized list, one would search on the average through 4.5 of the items (the average of the numbers 1 through 8). If one had to search through the list organized into two chunks, one would have to search on the average through 1.5 chunks to find the right chunk and then an average of 2.5 items within that chunk to find the right item, or 4.0 accesses in all. On the average, the hierarchical organization would be more efficient. With an exhaustive search method, if one had to search for a particular item in the unorganized list, one would have to search through 8 items. For the organized list, one would need 2 searches to find the right chunk and then 4 searches to find the right item within that chunk, or 6 accesses in all. On the average, again, the organized list would be more efficient. In contrast, consider a self-terminating search for a list of 4 items that could be represented in an unorganized manner or as 2 chunks of 2 items each. The unorganized list would require an average of 2.5 searches whereas the organized list would require that 1.5 clusters and 1.5 items in that cluster be examined, for a total of 3.0 searches. In this case, the unorganized list is more efficient on average.
MacGregor calculated that organizing list items into higher-level chunks is beneficial with an exhaustive or a self-terminating search when there are more than 4 or 5.83 items, respectively.
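MacGregor's worked examples and break-even points can be checked directly. The functions below are a minimal sketch of the search-cost arithmetic described above (the names `one_level` and `two_level` are illustrative, not MacGregor's).

```python
from math import sqrt

def one_level(n, exhaustive):
    """Expected accesses to find one item in an unorganized list of n items."""
    return n if exhaustive else (n + 1) / 2

def two_level(n, k, exhaustive):
    """Expected accesses with n items grouped into k chunks of n / k items."""
    m = n / k
    return (k + m) if exhaustive else (k + 1) / 2 + (m + 1) / 2

# The worked examples from the text: 8 items as 2 chunks of 4 ...
assert one_level(8, exhaustive=False) == 4.5
assert two_level(8, 2, exhaustive=False) == 4.0
assert one_level(8, exhaustive=True) == 8
assert two_level(8, 2, exhaustive=True) == 6
# ... and 4 items as 2 chunks of 2, where grouping hurts.
assert one_level(4, exhaustive=False) == 2.5
assert two_level(4, 2, exhaustive=False) == 3.0

# Break-even points, using the optimal split k = sqrt(n):
#   exhaustive:        n = 2*sqrt(n)            ->  n = 4
#   self-terminating:  (n + 1)/2 = sqrt(n) + 1  ->  n = (1 + sqrt(2))**2
print(round((1 + sqrt(2)) ** 2, 2))  # 5.83, matching MacGregor's value
```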

Although these theoretical findings depend on some untested assumptions (e.g., that the difficulty of search is the same at every level of a hierarchy), they do provide useful insight. The empirically observed capacity limit of about 4 chunks corresponds to what has been predicted for how many items can be advantageously processed in an ungrouped manner when the search is exhaustive (MacGregor, 1987). These theoretical and empirical limits may correspond because very rapid searches of unorganized lists are, in fact, exhaustive (Sternberg, 1966). However, slower, self-terminating searches along with more elaborate mental organization of the material also may be possible, and probably are advantageous if there is time to accomplish this mental organization. That possibility can help to explain why the empirically observed limit of about 4 chunks is close to the optimal chunk size when multiple levels of organization are permitted in a self-terminating search (Dirlam, 1972).

Another teleological analysis can be formulated on the basis of Kareev (1995). He suggested that a limited working memory is better than an unlimited one for detecting imperfect correlations between features in the environment. To take a hypothetical example, there could be a population with a 70% correlation between the height of an individual and the pitch of his or her voice. In a statistical procedure, when one uses a limited sample size to estimate the correlation (e.g., an observation of 4-8 individuals), the modal value of the observed correlation is larger than the population value. The smaller the sample size, the higher the modal value. Thus, a smaller sample size would increase the chances that a moderate correlation would be noticed at all. In human information processing, the limit in the sample size could be caused by the capacity limit of the observer's short-term memory; more samples may have been observed but the observer bases his or her perceived estimate of the correlation on only the number of examples that fit into the focus of attention at one time. Kareev, Lieberman, and Lev (1997) showed that, in fact, low-working-memory subjects were more likely to notice a population correlation of .2 to .6. In this regard, it bears mention that in the statistical sampling procedure, the modal value of the sample correlations for sample sizes of 6 and 8 were shown to be only moderately greater than the true population value (which was set at .6 or .7); but for a sample size of 4, the modal value of the sample correlations was almost 1.0. Here, then, is another reason to believe that a basic capacity limit of 4 could be advantageous. It could take a moderate correlation in the real world and turn it into a perceived strong correlation. At least, this could be advantageous to the extent that decisiveness in decision-making and definiteness in the perception of stimulus relationships are advantageous.
For example, it makes sense to walk away from someone displaying traits that are moderately correlated with injurious behavior, and it makes sense to perceive that people usually say please when they are asking for a favor.
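Kareev's statistical point is easy to verify by simulation. The sketch below assumes bivariate-normal data with a population correlation of .6 (one of the values used in the studies cited above); it draws many small samples and shows that samples of 4 observations produce near-perfect correlations far more often than samples of 8.

```python
import numpy as np

rng = np.random.default_rng(0)
RHO = 0.6                        # population correlation (assumed here)
COV = [[1.0, RHO], [RHO, 1.0]]

def sample_correlations(n, reps=20_000):
    """Pearson correlations of `reps` bivariate-normal samples of size n."""
    xy = rng.multivariate_normal([0.0, 0.0], COV, size=(reps, n))
    x, y = xy[..., 0], xy[..., 1]
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(1) / np.sqrt((xc ** 2).sum(1) * (yc ** 2).sum(1))

# Near-perfect correlations (r > .9) are far more common with 4
# observations than with 8, although the population value is only .6.
near_one_4 = (sample_correlations(4) > 0.9).mean()
near_one_8 = (sample_correlations(8) > 0.9).mean()
assert near_one_4 > near_one_8
```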

There is a strong similarity between the theoretical analysis of Kareev and earlier proposals that a large short-term memory capacity can be a liability rather than a strength in the early stages of language learning. Newport (1990) discussed a "less is more" hypothesis to explain why language learners who are developmentally immature at the time of initial learning have an advantage over more mature learners for some language constructs. An alternative to the nativist theory of language learning, this theory states that immature language learners grasp only small fragments of language at a time, which helps them to break up a complex language structure into smaller parts. Consistent with this proposal, Elman (1993) found that a computer implementation of a parallel distributed processing model of cognition learned complex language structure more easily if the short-term memory capacity of the model started out small and increased later in the learning process, rather than taking on its mature value at the beginning of learning.

Below, neurophysiological accounts of capacity limits will be reviewed. The teleological arguments still will be important to the extent that they can be seen as being consistent with the physiological mechanisms underlying capacity limits (or, better yet, motivating them).

4.1.2. Neurophysiological accounts. In recent years, a number of investigators have suggested a basis of capacity limits that can be traced back to information about how a single object is represented in the brain. In a theoretical article on visual shape recognition, Milner (1974, p. 532) suggested that "cells fired by the same figure fire together but not in synchrony with cells fired by other figures...Thus, features from a number of figures could be detected and transmitted through the network with little mutual interference, by a sort of time-sharing arrangement." In support of this hypothesis, Gray, König, Engel, and Singer (1989), in an experiment on cats, found that two columns of cortical cells that represented different portions of the visual field were active in a correlated manner only if they were stimulated by different portions of the same object, and not if they were stimulated by different objects. This led to the hypothesis that the synchronization of activity for various features represents the binding of those features to form an object in perception or STM. More recently, these findings have been extended to humans. Tiitinen et al. (1993) found that the 40-Hz oscillatory cycle, upon which these synchronizations are thought to ride, is enhanced by attention in humans. Rodriguez et al. (1999) reported electrical synchronizations between certain widely separated scalp locations 180-360 msec after a stimulus was presented when an object (a silhouetted human profile) was perceived, but not when a random field (actually an upside down profile not detected as such) was perceived. The scalp locations appeared to implicate the parietal lobes, which Cowan (1995) also proposed to be areas involved in the integration of features to form objects. Miltner, Braun, Arnold, Witte, and Taub (1999) further showed that the binding can take place not only between perceptual features, but also between a feature and an activated mental concept.
Specifically, cyclic activity in the gamma (20-70 Hz) band was synchronized between several areas of the brain in the time period after the presentation of a conditioned stimulus (CS+), a color illuminating the room, but before the presentation of the unconditioned stimulus (UCS), electric shock that, as the subjects had learned, followed the conditioned stimulus. No such synchronization occurred after a different color (CS-) that did not lead to electric shock.

If objects and meaningful events can be carried in synchronized gamma-wave activity in the brain, then the question for STM capacity becomes, "How many objects or events can be represented simultaneously in the brain?" Investigators have addressed this question. Lisman and Idiart (1995) suggested that "each memory is stored in a different high-frequency ('40 hertz') subcycle of a low-frequency oscillation. Memory patterns repeat on each low-frequency (5 to 12 hertz) oscillation, a repetition that relies on activity dependent changes in membrane excitability rather than reverberatory circuits." In other words, the number of subcycles that fit into a low-frequency cycle would define the number of items that could be held in a capacity-limited STM. This suggestion was intended by Lisman and Idiart to motivate the existence of a memory span of about seven items (e.g., [40 subcycles / sec] / [5.7 cycles / sec] = 7 subcycles / cycle). However, it could just as well be used to motivate a basic capacity of about 4 items (e.g., [40 subcycles / sec] / [10 cycles / sec] = 4 subcycles / cycle). This proposal also was intended to account for the speed of retrieval of information stored in the capacity-limited STM but, again, just as well fits the 4-item limit. If 40 subcycles occur per second then each subcycle takes 25 msec, a fair estimate of the time it takes to search one item in STM (Sternberg, 1966). Luck and Vogel (1998) made a proposal similar to that of Lisman and Idiart but made it explicit that the representation of each item in STM would involve the synchronization of neural firing representing the features of the item. The STM capacity limit would occur because two sets of feature detectors that fire simultaneously produce a spurious synchronization corrupting memory by seeming to come from one object.
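The arithmetic behind these capacity estimates reduces to the ratio of the fast and slow oscillation rates; a minimal sketch:

```python
def capacity(fast_hz, slow_hz):
    """Items held = fast subcycles that fit into one slow cycle."""
    return fast_hz / slow_hz

assert round(capacity(40, 5.7)) == 7   # Lisman & Idiart's span of ~7 items
assert capacity(40, 10) == 4           # the ~4-chunk limit argued here
assert 1000 / 40 == 25                 # msec per subcycle, cf. Sternberg (1966)
```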

Other theorists (Hummel and Holyoak, 1997; Shastri and Ajjanagadde, 1993) have applied this neural synchronization principle in a way that is more abstract. It can serve as an alternative compatible with Halford et al.'s (1998) basic notion of a limit on the complexity of relations between concepts, though Halford et al. instead worked with a more symbolically based model in which "the amount of information that can be represented by a single vector is not significantly limited, but the number of vectors that can be bound in one representation of a relation is limited" (p. 821). Shastri and Ajjanagadde (1993) formulated a physiological theory of working memory very similar to Lisman and Idiart (1995), except that the theory was meant to explain "a limited-capacity dynamic working memory that temporarily holds information during an episode of reflexive reasoning" (p. 442), meaning reasoning that can be carried out "rapidly, spontaneously, and without conscious effort" (p. 418). The information was said to be held as concepts or predicates that were in the form of complex chunks; thus, it was cautioned, "Note that the activation of an entity together with all its active superconcepts counts as only one entity" (p. 443). It was remarked that the bound on the number of entities in working memory, derived from facts of neural oscillation, falls in the 7 ± 2 range; but the argument was not precise enough to distinguish that from the lower estimate offered in the present paper. Hummel and Holyoak (1997) brought up similar concepts in their theory of thinking with analogies. They defined "dynamic binding" (a term that Shastri and Ajjanagadde also relied upon to describe how entities came about) as a situation in which "units representing case roles are temporarily bound to units representing the fillers of those roles" (p. 433). They estimated the limit of dynamic binding links as "between four and six" (p. 434).
In both the approaches of Shastri and Ajjanagadde (1993) and Hummel and Holyoak (1997), these small limits were supplemented with data structures in long term memory or "static bindings" that appear to operate in the same manner as the long-term working memory of Ericsson and Kintsch (1995), presumably providing the "active superconcepts" that Shastri and Ajjanagadde mentioned.

One problem for the interpretation of synchronous oscillations of nervous tissue is that they can be observed even in lower animals in situations that appear to have little to do with the possibility of conscious awareness of particular stimuli (e.g., Braun, Wissing, Schäfer, & Hirsch, 1994; Kirschfeld, 1992). This, in itself, need not invalidate the role of oscillations in binding together the features of an object or the objects in a capacity-limited store in humans. It could be the case that mechanisms already present in lower animals form the basis of more advanced skills in more advanced animals, just as the voice apparatus is necessary for speech but is present even in non-speaking species. Thus, von der Malsburg (1995, p. 524) noted that "As to the binding mechanism based on temporal signal correlations, its great advantage [is] being undemanding in terms of structural requirements and consequently ubiquitously available and extremely flexible..."

4.1.3. Reconciliation of teleological and neurophysiological accounts. One concern here is whether the teleological and physiological accounts of capacity limits are consistent or inconsistent with one another. The process of scanning through the items in STM has been employed theoretically by both the teleological and the physiological theorists. For example, the teleological argument that MacGregor (1987) built using an exhaustive scan resulted in the conclusion that the scan would be most efficient if the number of items per group were 4. This conclusion was based on the assumption that the amount of time it takes to access a group to determine whether a particular item is present within it is equal to the amount of time it then takes to access each item within the appropriate group once that group is selected, so as finally to identify the probed item. This concept can be mapped directly onto the concept of the set of items (or chunks) in capacity-limited STM being represented by a single cycle of a low-frequency oscillation (5 to 12 Hz) with each item mapped onto a different cycle of a 40-Hz oscillation, riding on top of the 5 to 12 Hz oscillation. These figures are in line with the teleological data and memory capacity data reviewed above if the rate for the slow oscillation is close to about 10 Hz, so that four items would fit in each of the slower cycles. As suggested by Shastri and Ajjanagadde (1993) and others, the cyclic search process could be employed recursively. For example, at one point in a probed recognition process there could be up to 4 chunks in the capacity-limited store. Once the correct chunk is identified, the contents of STM would be replaced by the items contained within that chunk, now "unpacked," so that the contents of the chunk can be scanned in detail. In present theoretical terms, the focus of attention need not focus on multiple levels of representation at the same time.

4.1.4. What is the basis of individual differences? We will not have a good understanding of capacity limits until we are able to understand the basis of the marked developmental and individual differences in measured capacity that were observed by Cowan et al. (1999) and comparable individual differences observed in other procedures (Henderson, 1972; Luck & Vogel, personal communication, January 18, 1999). One possible basis would be individual differences in the ratio of slow to fast oscillatory rhythms. Miltner et al. (1999) found the most rapid oscillatory activity at 37-43 Hz, but some residual activity at 30-37 Hz and 43-48 Hz. One can combine a 12-Hz slow cycle with a 30-Hz rapid cycle to predict the low end of the range of memory capacities (30/12 = 2.5 items), or one can combine an 8-Hz slow cycle with a 48-Hz fast cycle to predict the high end of the range (48/8 = 6 items). According to these figures, however, one would not expect the slow cycle to go below 8 Hz given the capacity limits observed empirically. Here, then, is a physiological prediction based on a combination of existing physiological and behavioral results. An important next step may be the acquisition of data that can help to evaluate the psychological plausibility of the theoretical constructs surrounding this type of theory. As one promising example, the finding of Tiitinen et al. (1993) that the 40-Hz neural cycle is enhanced by attention is consistent with the present suggestion that the fundamental storage capacity limit of about 4 items is based on the 40-Hz cycle and is in essence a limit in the capacity of the focus of attention. It is easy to see how research on this topic also could clarify the basis of individual differences in capacity. Specifically, one could determine if individual differences in oscillatory rates mirror behavioral differences in the limited storage capacity.
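The same ratio arithmetic brackets the predicted range of individual capacities; a sketch using the band limits just cited, where capacity is the fast rate divided by the slow rate:

```python
# capacity = fast rate / slow rate (subcycles per slow cycle)
low = 30 / 12    # slowest observed gamma with the fastest slow cycle
high = 48 / 8    # fastest observed gamma with the slowest slow cycle
assert (low, high) == (2.5, 6.0)  # spans the observed range of capacities
```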

It remains to be explained why attended speech shows such an intriguing, simple relationship to unattended speech (Figure 4) in which attended speech is increased above the unattended speech limit by a variable amount. This figure makes it apparent that individuals use the same processes in both conditions, plus supplementary processes for attended speech. This difference might be accounted for most simply by the process of chunking (formation of inter-item associations) during attended list presentations.

It is important not to become too reductionistic in the interpretation of biological effects. It is possible that stimulus factors and/or behavioral states modulate biological cycle frequencies under some circumstances. Some studies with an automatized response or a rapid response have resulted in smaller individual differences. The highly trained subjects of Sperling (1960) appeared to produce capacity (whole report) estimates deviating from the population mean by no more than about 0.5 items, although there were few subjects. In an enumeration task in which a reaction time measure defined the subitizing range, Chi and Klahr (1975) found no difference between 5- and 6-year-olds versus adults in the subitizing range. Perhaps there is an intrinsic, baseline capacity of the focus of attention that shows few differences between individuals, and perhaps under some circumstances but not others, the level and direction of effort at the time of recall modulate that capacity. Further study of individual differences in memory capacity is thus likely to be important theoretically.

4.2. Central capacity or separate capacities? In most of the research that I have discussed, the capacity limit is examined with a coherent field of stimulation. I have not directly tackled the question of whether there is one central capacity limit or whether there are separate limits for domains of cognition (e.g., separate capacities for the visual versus auditory modalities; for verbal versus spatial representational codes; or, perhaps, for two collections of items distinguished by various other physical or semantic features). According to the models of Cowan (1988, 1995) and Engle et al. (1999), the capacity limit would be a central one (the focus of attention). Some fine points must be kept in mind on what should count as evidence for or against a central limit.

Ideally, evidence for or against a central capacity limit could be obtained by looking at the number of items recalled in two tasks, A and B, and then determining whether the total number of items recalled on a trial can be increased by presenting A and B together and adding the number of items recalled in the two tasks. For example, suppose that one can recall 3 items in Task A and 4 items in Task B, and that one can recall 6 items all together in a combined, A + B task. Although performance on the component tasks is diminished when the tasks are carried out together, the total number of items recalled is greater than for either task presented alone. This savings would serve as initial evidence for the existence of separate storage mechanisms (with or without an additional, central storage mechanism). Further, if there were no diminution of performance in either task when they were combined, that would serve as evidence against the central storage mechanism or capacity limit.

This type of reasoning can be used only with important limitations, however. As discussed above, several different mechanisms contribute to recall, including not only the capacity-limited focus of attention, but also the time- or interference-limited sources of activation of long-term memory (sensory stores, phonological and spatial stores, and so on). If the focus of attention could shift from examining one source of activation to examining another dissimilar source, it would be possible to recall items from Task A and then shift attention to activated memory representations of the items in Task B, bringing them into the focus of attention for recall in turn. If all of the information need not be entered into the focus of attention at one time, performance in the combined task would overestimate central storage capacity. This possibility contaminates many types of evidence that initially look as if they could provide support for multiple capacity-limited stores. These include various studies showing that one can recall more in two tasks with different types of materials combined than in a single task, especially if the modalities or types of representations are very different (Baddeley, 1986; Frick, 1984; Greene, 1989; Henderson, 1972; Klapp & Netick, 1988; Luck & Vogel, 1997; Martin, 1980; Penney, 1980; Reisberg, Rappaport, & O'Shaughnessy, 1984; Sanders & Schroots, 1969; Shah & Miyake, 1996).

Theoretically, it should be possible to overcome methodological problems in order to determine if there are true multiple capacity limits. One could make it impossible for the subject to rehearse items during presentation of the materials by using complex arrays of two types concurrently; perhaps concurrent visual and auditory arrays. It would also be necessary to make sure that the focus of attention could not be used recursively, shifting from one type of activated material to the next for recall. If the activated representations were sensory in nature, this recursive recall might be prevented simply by backward-masking one or both of the types of materials. These requirements do not seem to have been met in any extant study. Martin (1980) did use simultaneous left- and right-sided visual and auditory lists (4 channels at once, only 2 of them meaningful at once, with sequences of 4 stimuli presented at a fast, 2 / sec rate on each of the 4 channels). She found that memory for words presented concurrently to the left and right fields in the same modality was, on the average, 51.6% correct, whereas memory for pairs containing one printed and one spoken word was 76.9% correct. However, there was nothing to prevent the shifting of attention from visual to auditory sensory memory in turn.

Another methodological possibility is to document the shifting of attention rather than preventing it. This can be accomplished with reaction time measures. One enumeration study is relevant. Atkinson, Francis, and Campbell (1976) presented two sets of dots separated by their organization into lines at different orientations, by two different colors, or by their organization into separate groups. Separation by spatial orientation or grouping was capable of eliminating errors when there was a total of 5 to 8 dots. Color separation reduced, but did not eliminate, errors. However, the grouping did not reduce the reaction times in any of these conditions. It seems likely that some sort of apprehension process took place separately for each group of 4 or fewer dots and that the numbers of dots in each group were then added together. Inasmuch as the reaction times were not slower when the items were grouped, one reasonable interpretation is that subitizing in groups and then adding the groups is the normal enumeration process for fields of 5 or more dots, even when there are no physical cues for the groups. The addition of physical cues simply makes the subitizing process more accurate (though not faster). This study provides some support for Mandler's (1985) suggestion that the capacity limit is for sets of items that can be combined into a coherent scheme. By dividing the sensory field into two coherent, separable schemes, the effective limit can be increased; but different schemes or groups can become the limit of attention only one at a time, explaining why perceptual grouping cues increase accuracy without altering the reaction times.

Physiological studies also may help if they can show a reciprocity between tasks that do not appear to share specific processing modes. One study using event-related potentials (ERPs) by Sirevaag, Kramer, Coles, and Donchin (1989) is relevant. It involved two tasks with very little in common, both of which were effortful. In one task, the subject controlled a cursor using a joystick, in an attempt to track the movement of a moving target. The movement could be in one or two dimensions, always in discrete jumps, and the cursor could be controlled by either the velocity or the acceleration of the joystick, resulting in four levels of task difficulty. In the second task, administered concurrently, the subject heard a series of high and low tones and was to count the number of occurrences of one of the tones. The P300 component of ERP responses to both tasks was measured. This component is very attention-dependent. The finding was that, across conditions, the P300 to the tracking targets and the P300 to the tones exhibited a reciprocity. The larger the P300 was to the tracking targets, the smaller it was to the tones, and vice versa. The sum of the P300 amplitudes was practically constant across conditions. The simplest interpretation of these results is that there is a fixed capacity that can be divided among the two tasks in different proportions, and that the relative P300 amplitudes reflect these proportions.

In sum, the existing literature can be accounted for with the hypothesis that there is a single capacity-limited store that can be identified with the focus of attention. This store is supplemented with other storage mechanisms that are not capacity limited although they are limited by the passage of time and/or the presentation of similar interfering material. The focus of attention can shift from one type of activated memory to another and will recoup considerable information from each type if the materials represented are dissimilar.

