Are some individuals more visually or verbally inclined, and do they therefore learn better from either film or text? Based on his dual-coding theory, Paivio (1968) concluded that this may be true. In recent research there seems to be a broad consensus that individuals remember and learn visual images and visual information better than words, especially in relation to multimedia (Fletcher & Tobias, 2005; Harskamp et al., 2007; Clark & Mayer, 2008). On the other hand, some argue on a more general level that there are individual differences as to what presentation form is preferred (Boekaerts, 1982; Kirby et al., 1988; Stiller et al., 2009). The argument that images are remembered better than words is especially emphasized when information processing is anchored and examined in relation to channel theories. The main objective of this study was therefore to investigate the importance of visual and verbal channel capacity for learning outcome from multimedia and text.
A study by Torgersen (2012) showed that increased short-term memory capacity generally was related to increased learning outcome for both film and text. However, the question is still whether various levels of specific sub-processes in short-term memory related to presentation form are significant for learning outcome from film and text. The specific goals of this study are therefore to measure the capacities of various forms of processing connected to short term memory (STM). The visual and verbal channel capacity is central, and the importance of individual differences within these capacities is related to the learning of details and contexts (understanding) from film and text. Capacity and channel theories within STM are the theoretical basis for this study.
Paivio’s dual-coding theory assumed two processing systems, the verbal and the visual system (Paivio, 1986). There was no distinction between representational forms within the same processing system, such as auditory-verbal information (speech) or visual-verbal representation (text). Central to this theory is that the systems partly work together and mutually support each other during information processing, and partly that they can work independently.
However, Broadbent (1958) and later Waugh & Norman (1965) and Atkinson & Shiffrin (1968) presented information processing models with several specific departments or channels. This means that external stimulation is captured by specialized units depending on the representation form of the stimuli. The principle here was that the information is captured in the corresponding channels in the sensory register (SR), and then processed further in the corresponding channels in the STM. Atkinson & Shiffrin (1968) defined the visual and verbal channels, allowing the possibility of more channels, such as channels for auditory and tactile stimuli. Several newer models have defined multiple channels, so-called multimodal channel systems, with various channels depending on the representation form, where information is processed in an interaction (Cohen, 1983; Simatos & Spencer, 1992; Engelskamp & Zimmer, 1994; Spencer, 1996). In these multi-channel theories and models the term channel is associated with the information presented and processed in STM, rather than classifying it within the senses. For example, the visual channel could consist of two processing channels, one for visual-iconic (pictorial) information and one for visual-verbal (word) information. Text and speech will thus each be defined as having their own specific channel, even if they contain the same information, or occur at different times in the course of a presentation (progressive or successive presentation). Furthermore, this approach is more appropriate for analyzing learning processes when using multimedia, where the representation forms are more integrated.
Classic short-term memory studies such as Miller (1956) also based their capacity measurements of STM related to words, pictures and figures on channel division. Here the stimuli were simple learning material, “bits”, and he defined “capacity” as “...the upper limit on the extent to which the observer can match his responses to the stimuli we gave him” (Miller, 1956: p. 82). Norman (1969) made similar studies, including capacity measurements of simultaneous presentation forms, such as text combined with sound based on the same content of learning material. These basic studies showed, as previously suggested, that pictures are remembered somewhat better than text, while speech and text had a relatively equal effect, and combined presentations (dual coding) may give a better effect than single presentations.
Similar classic effect studies have been made of more comprehensive learning materials, such as educational films and television programs, where the theoretical analysis was based on multi-channel theories and “information summation” [cue summation] (Hartman, 1961a, b; Travers, 1964, 1967; Severin, 1967a, b, c; Hsia, 1968; Nugent, 1982). Visual media such as film and television, with stimulation in several channels simultaneously, often gave a better learning score than text alone (Salomon, 1984). This was also concluded in the extensive studies by Day & Beach (1950). However, these results are based on the media of that time, and are often founded on different theoretical assumptions and conceptual structures. Even the central concept of information summation [cue summation] was used differently.
While Hartman (1961a, b) defined the term as the summation of the same information presented in several forms (or channels), i.e. a form of direct double-coding, Severin (1967a, b, c) assumed that “cues” involved information in a number of representation forms, not necessarily identical information, but with a high degree of relevance. However, the overall picture within both classic and recent studies on learning material shows conflicting results as to what presentation form gives the best learning outcome (Kozma, 1991; Bayraktar, 2002; Johnson & Mayer, 2009; Lei, 2010). Within a cognitive perspective on multimedia learning, it is however still widely believed that pictures and words together provide a broader, or more thorough, learning outcome than words alone (Tindall-Ford et al., 1997; Mayer, 2005a; Fletcher & Tobias, 2005). This is also called the “multimedia principle”, according to Mayer (2001, 2006, 2009), and is closely related to the concept of information summation.
Central to these concepts is, in other words, that presentation of information through two channels simultaneously gives better learning than through one single channel, as long as pictorial information (visual-iconic) is represented by one of the two channels. However, the STM capacity is a key limiting factor in this principle.
Furthermore, based on 15 studies, Mayer (2009) claims that voice [narration] is less stressful for the STM than written text, when these forms of representation occur simultaneously with pictures/visual information. Mayer (ibid.) calls this the Modality Principle. This is still debated, and Jahn (2011) showed that the Modality Principle was not entirely verifiable with other educational materials than the ones Mayer used.
In addition, some studies, although not well-documented, show that the capacity may be somewhat adaptive (or plastic) when exposed to multi-presentations (Brünken et al., 2004; Sweller et al., 2005; Mayer, 2005b; Harskamp et al., 2007). This indicates that the STM capacity may be extended slightly at a high load, i.e. that there is a compensation effect (Barrouillet & Camos, 2001).
This effect is also associated with Mayer’s (2009) Multimedia Principle(s). The consequence of this plastic feature is that multi-presentation leads to better learning than expected when it comes to the total load on short-term memory, especially if the presentation is a combination of speech and images rather than text and images.
In studies of the STM processes connected to learning from multimedia, the term “split attention effect” is used when information comes simultaneously from multiple channels (Chandler & Sweller, 1991, 1992; Mayer & Moreno, 1998; Kalyuga et al., 1999; Kester et al., 2005; Ginns, 2006; Sweller & Sweller, 2006; Florax & Ploetzner, 2010). The split-attention effect (SA-effect) means that information from multiple channels is processed, including the linking of new information with the old (retrieval from long-term memory), simultaneously in the relevant representation channel in STM. The concept of capacity is related to both total processing and the sub-capacities for the individual channel processing. These presentation channels also process in interaction with each other when simultaneous stimuli are received from different presentation channels (e.g. visual-iconic and verbal-auditory). If the stimulus represents only one channel, the information is processed in its corresponding representation channel. This happens without interaction with the other channels, but subject to its specific channel capacity, when information is processed in the context of subsequent information in the same channel and any conversion to other representation forms.
Examples of this may be an “inner voice” when reading a text (cf. the “phonological loop hypothesis”; Baddeley, 1986, 1998; Koroghlanian & Sullivan, 2000), or mixed visual and verbal processes between the visual and verbal channels or storage (Morey, 2009). A split-attention effect may occur if information is provided as separate visual and verbal information in a simultaneous presentation (cf. information summation). Attention will be split (divided) between these presentations, with the risk of overloading the capacity of both channels. Due to this, learning outcomes may be reduced (negative SA-effect). However, if the load does not exceed the channel capacities, individually or collectively, the integration of information from the channels may contribute to positive learning outcomes (positive SA-effect).
According to Cognitive Load Theory (CLT) (Sweller, 2005; Ayres & Sweller, 2005; Artino, 2008) optimal learning will therefore occur if the amount of information does not overload the capacity of the different channels at the same time (multi-presentation). Visual and verbal information should be integrated into each other. Integration of this kind is easier to design with multimedia than with analog media, but on the other hand, it may overload individual channels, or cause several small and flickering divided attention spans between parts of information in the integrated representations, with reduced learning outcome as a consequence. The study of Florax & Ploetzner (2010) showed an example of this, where learning outcomes (detailed knowledge) of integrated representations were significantly better than learning outcomes from the separate text and images (segmentation), without short text-based pictorial explanations (“labels”). However, the result was non-significant in terms of understanding.
Based on eight studies, Mayer (2006, 2009) also found that placing text and corresponding images close to each other in a multimedia presentation (screen) gave a better learning outcome than if they were separated (Cohen’s d = 1.30). He described his finding as the “Contiguity Principle”. A possible explanation for the better learning outcome is that the integrated representation may reduce the split-attention effect, precisely because the visual and verbal information were integrated into each other. The outcome of the split-attention effect in relation to learning outcome appears to be connected to the manner in which visual and verbal information are presented in relation to one another in a multimedia presentation (screen).
However, an alternative to integrated or multi-based presentations is to present a little information at a time, either through the same channel, or at different times through different channels (for example, shifting between image and speech), a so-called progressive or successive presentation. This presentation form is possible in multimedia presentation, and has some similar aspects to the process of reading text. In this way the attention is led to the channel that gives information, and the effect of “shared attention” is reduced.
Central to this process is the sensory register (SR; also termed primary memory). The sensory register represents the first mental processes that handle information before it is submitted to the short-term memory. SR has a high capacity, with a short processing time (1/4-2 s), and contributes to the selection of information. It is thus important in the choice of which information and channels will actually be loaded or processed in the STM (Unsworth & Engle, 2007). It is therefore also interesting to examine the capacity of the SR in relation to the study of learning outcome from different forms of presentation.
However, common to these studies of the capacity for individual learning materials and effect studies of educational films and multimedia programs is that they either only measure the capacity goals, or only the learning outcome from the representation forms or media. The studies have mainly taken place in controlled laboratory conditions and not in ordinary classroom conditions. Where capacity studies of short-term memory have been used in connection with effect studies of various presentation types (hybrid studies), this has mainly been to investigate any differences between experimental groups (e.g. Florax & Ploetzner, 2010), and not the importance of STM capacity for learning outcomes of multimedia and text. In other studies which have focused on individual differences in STM capacity and multimedia learning, the STM capacity has been described as one single measure of a general capacity (Sanchez & Wiley, 2006; Doolittle & Altstaedter, 2009; Lusk et al., 2009). Newer hybrid studies have also shown that individual differences in capacity are important for higher cognitive functions, including attention, learning outcome and problem solving in general (Unsworth & Engle, 2007; Unsworth et al., 2009; Redick et al., 2011).
Therefore, the specific objectives of this study are to examine the following:
1) The association between STM-categories in general and learning outcome from film and text.
2) Differences in learning outcomes between three capacity levels of the STM categories PC, MC and SC for learning from film and text.
3) The associations and differences in learning outcomes between three capacity levels of visual and verbal channel capacity for learning with film and text.
In this study the term “STM-categories” is used to cover three central processing categories in short-term memory and the capacity for them; progressive capacity (PC), multi-capacity (MC), and sensory capacity (SC), while the term channel is used for visual and verbal processing channel (see 2.2.2).
2.1. Samples and Procedures
The sample (n = 396) consisted of students at undergraduate level, including officers (n = 94, Military Academy), student teachers (n = 194) and a mixed group of engineering and psychology students (n = 101). The sample contained 193 women and 185 men. In this study, the field of study was not used as a variable. The respondents were treated as one total group, divided according to whether they were exposed to film as a presentation (n = 192) or to text (n = 192). The distribution of men and women was 99/88 for the film group and 94/97 for the text group. The overall response rate was 95.5%.
The survey was carried out in the respondents’ regular classroom or lecture hall, and was conducted in connection with a regular lesson. First a brief (5 min) introduction was given, and forms for anonymity and informed consent were addressed. Then followed the STM test conducted in plenary with the use of PowerPoint (about 20 min). The respondents checked off their answers on the distributed form. Finally, the educational film was seen or the text was read, followed by a knowledge test (about 20 min in total). The entire survey was completed in about 60 min.
In this study, two main variables were measured. One was the learning outcome from the film or text, and the other was the level of short-term memory (STM) capacity (category capacity and channel capacity).
2.2.1. Measurement of Learning Outcomes
Learning outcomes were measured in two samples. One group was exposed to a (digital) film presentation and another group received an (analogue) text as a presentation form. The film consisted of a selected sequence of 9 minutes and 15 seconds from an educational presentation which dealt with an era in Norwegian history, the unification conflict (800-1270 AD).
The film sequence was chosen in accordance with certain criteria, amongst others certain effects in the film (which were to be tested in another study). The teaching material had to be relatively unknown to the test group, but without demanding any prior knowledge of the subject (criteria of unfamiliarity). When testing and comparing learning outcome from various presentation forms it is necessary to be certain that the knowledge is a result of the presentation, and not of prior knowledge of the subject (Torgersen & Barlaug, 2004) .
The text material was identical with the film’s narrative, with a total of 1113 words. The allotted time for text reading was 8 minutes and 25 seconds, which gave the same exposure time for both film and text, based on a normal reading speed of about 140 words per minute. The learning outcomes from both presentation forms were measured with a knowledge test consisting of 13 questions where the answers were divided equally between the film and text. In the knowledge tests the scores were coded as dichotomous variables, showing right or wrong answers. Wrong could be actual wrong answers or unanswered questions. Sum scores were made according to the number of right answers from the respondents. The questions also measured the teaching material or subject matter. The difference between detail and context (understanding) was emphasized. Details meant knowledge about certain dates and names, and this was measured with 9 questions. The knowledge that required context and further explanations to answers was measured by 4 questions. The knowledge test gave five response options for each of the questions. All the response options were relevant to the subject, but only one of the five response alternatives was correct according to what was presented in the film or the text. The responses were only oriented toward the verbal information, either just given verbally (also reproduced in the text), or both verbally and by labeling in the film. The term “labeling” means that important verbal information also appears as a short text in the picture, i.e. a form of double coding, meaning simultaneous visual-verbal and visual-iconic coding.
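The scoring rule described above (dichotomous coding, unanswered counted as wrong, separate sum scores for details and contexts) can be illustrated with a minimal sketch. The answer key, the response data and the assumption that the first nine items are the detail questions are hypothetical and serve only as an illustration; the source does not state which items belong to which subscale.

```python
from typing import Optional, Sequence, Set, Tuple


def score_test(
    responses: Sequence[Optional[str]],
    answer_key: Sequence[str],
    detail_items: Set[int],
) -> Tuple[int, int]:
    """Return (detail_sum, context_sum) for one respondent.

    Each item is coded 1 (right) or 0 (wrong); an unanswered item
    (None) is coded as wrong, as described in the text.
    """
    detail_sum = 0
    context_sum = 0
    for index, (given, correct) in enumerate(zip(responses, answer_key)):
        point = 1 if given is not None and given == correct else 0
        if index in detail_items:
            detail_sum += point
        else:
            context_sum += point
    return detail_sum, context_sum


# Hypothetical 13-item key with five response options (A-E) per item.
ANSWER_KEY = list("ABCDEABCDEABC")
# Assumption for illustration: items 0-8 are the 9 detail questions,
# items 9-12 are the 4 context questions.
DETAIL_ITEMS = set(range(9))

example = list(ANSWER_KEY)
example[0] = None   # unanswered detail item, counted as wrong
example[10] = "E"   # wrong answer on a context item
print(score_test(example, ANSWER_KEY, DETAIL_ITEMS))
```

With this coding the maximum scores are 9 for details and 4 for contexts, which matches the scale reported for the learning outcome measurements.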
2.2.2. Measure of STM Category Capacity and Channel Capacity
In this study we have applied both Paivio’s dual-coding theory and a multi-modal channel understanding, as well as two central principles within cognitive load theory, the split-attention effect and the multimedia principle of Mayer (2009). Another central aspect was the possible adaptation effect of the STM capacity during multimedia presentations and stimulation of two channels simultaneously (visual and verbal). Torgersen & Barlaug’s (2004) STM test was utilized to measure the STM capacity. Channel capacity is measured with all the visual and verbal STM tests available in this measuring instrument. The processing time, defined as short-term memory, is set to the interval 2 - 30 s, and SR to the interval 1 - 2 s, as the basis for the construction of the STM test (cf. Howard, 1983; Conway et al., 2005). Other STM tests, such as the Wilde Intelligence Test (Jäger & Althoff, 1983), were considered insufficiently nuanced with respect to the measurement of channel capacities for this study.
Based on a multi-modal understanding, the capacity of the objectives was evaluated in relation to the information density (from the film) and the manner in which the information was given. A distinction was made between three types of processing channels. The first one is called Progressive capacity (PC), which measures the capacity to process a little information at a time (progressive presentation). The second one is called Multi Capacity (MC), which measures the capacity to work with a lot of information that is given simultaneously.
In addition, the capacity to remember brief glimpses of information (1 - 2 s) is defined as a separate category, and corresponds to the classical use of the term “Sensory Index” (SI). The term sensory capacity (SC) is used for this. These divisions have also been made with regard to Study 3, which examines the significance of measures in film presentations that match these STM-related channel types. In this study the capacity is divided into three levels: low, medium and high. This is done to investigate the importance of individual differences in STM capacity related to learning.
Progressive Capacity (PC)
Progressive capacity (PC) is defined as the ability to process information that comes in stages or a little at a time (progressively or successively). This capacity was measured with three subtests: 1) PMvi (Progressive visual memory), 2) PMvv (Progressive verbal memory) and 3) MviB (Visual figurative Multi-capacity). Each of these was divided into the three capacity levels low, medium and high, depending on the number of correct scores. The number n in each category was different, because the relevant category (low, medium or high) depended on the frequency distribution of correct answers on the tests. This is the reason why quartile divisions were not used. A division of this kind requires several more test questions than were used in this study.
The division into three capacity levels has been made to clarify the capacity levels. A MANOVA analysis was conducted with the presentation form and progressive memory capacity as independent variables, and learning outcomes (details and contexts) as the dependent variables. The division was conducted as follows: On the PMvi test 0 to 12 correct answers could be achieved. Individuals with a low score had between 0 and 6 correct answers. The medium value was between 7 and 8 correct responses, while the high value was between 9 and 12 correct responses. The distribution of correct responses was almost normal, with a standardized skewness of −.66 and a standardized kurtosis of .62. This test provides a measurement of increasing capacity, not of simultaneous progressive presentation. However, as testing is conducted with one channel at a time, the split-attention effect between the visual and the verbal channel can be excluded. Yet a direct capacity for simultaneous progressive presentation is not measured, where a successive split-attention may take place, because the individual channels prepare for new information while others process information.
For PMvv the responses vary between 0 and 12 correct answers. The lowest category has a score between 0 and 6, the medium category between 7 and 8, and the high category ranges between 9 and 12. The distribution is relatively normal, but has a standardized kurtosis of 1.39 and a standardized skewness of −.77. Both PMvi and PMvv had the same number of questions in the test; in other words, the same number of correct answers is utilized for the category divisions in both tests.
The MviB became a separate independent variable, because this test measures a different visual form than the other visual tests. The test was used to measure both progressive capacity and multi-capacity, but was also excluded in some studies. MviB contained a total of 20 questions. The lowest category was between 0 and 11 correct answers, the medium category had 12 to 13 correct answers, while the high category had 14 to 18 correct answers. The distribution was negatively skewed (the standardized skewness was −1.53 and the standardized kurtosis was 3.71). The raw score on MviB is not normally distributed. The division into low, medium and high is therefore based on the frequency distribution of correct answers.
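The cut points given above for the three PC subtests can be summarized in a short sketch. The function and dictionary names are illustrative and not part of the original instrument; for MviB, every score of 14 or more is treated as high, which is an assumption for scores above the reported maximum of 18.

```python
# Upper bounds (inclusive) of the low and medium levels for each PC subtest,
# taken from the cut points reported in the text.
PC_CUTS = {
    "PMvi": (6, 8),    # low 0-6, medium 7-8, high 9-12
    "PMvv": (6, 8),    # low 0-6, medium 7-8, high 9-12
    "MviB": (11, 13),  # low 0-11, medium 12-13, high 14 and above
}


def capacity_level(test: str, correct: int) -> str:
    """Map a raw number of correct answers to 'low', 'medium' or 'high'."""
    if correct < 0:
        raise ValueError("number of correct answers cannot be negative")
    low_max, medium_max = PC_CUTS[test]
    if correct <= low_max:
        return "low"
    if correct <= medium_max:
        return "medium"
    return "high"


for test_name, score in [("PMvi", 6), ("PMvv", 8), ("MviB", 14)]:
    print(test_name, score, capacity_level(test_name, score))
```

Because the levels follow the frequency distribution of correct answers rather than quantiles, the group sizes differ between levels, as noted in the text.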
Multi Capacity (MC)
Multi Capacity (MC) is defined as the ability to simultaneously process large amounts of information, visually and/or verbally. Processing this type of information requires split-attention (simultaneously between visual and verbal channels) or for example different elements in a detailed picture (visual multi-capacity). A possible split attention effect could therefore be measured by these tests, even though they do not measure the visual and verbal capacity in the same test sequence, but separately.
Multi-capacity was measured with three tests, one based on a photograph (Multi Visual Image, MviB), and two matrices, one of which shows 30 characters (Multi Visual Figures, MviF) and the second showing 30 words at a time (multi-visual-verbal word, MvvG). The different capacity types were recoded into categories for levels of multi-capacity based on the number of right answers to the tests, similar to MviB. The MviF test contained a total of 10 right responses, in which 0 - 3 correct are encoded as low capacity (21.7%, n = 84), 4 - 5 correct as medium capacity (37.8%, n = 148) and 6 - 10 correct as high capacity (40.8%, n = 160). The same number of correct answers was used for the different levels of the MvvG test. Here low capacity was 31.9% (n = 120), medium capacity was 45.7% (n = 175) and high capacity was 22.5% (n = 86). This is a theoretical estimation in relation to the raw score. It was chosen to have a larger number of people in the low and high capacity groups.
This division is justified by an assessment of the contents of the tests. This is the reason why the measures 4 and 5 are set as a basis for the medium category. Also, the MviF and MvvG tests had the same division according to correct numbers and levels, as these were to measure the same type of capacity, but with different representation forms, verbal and visual. A more statistical division would have resulted in small groups. For example, a division based on standard deviation would have resulted in about 10% (approximately n = 40) or less in the low and high capacity categories (i.e., above/below +/− one standard deviation).
These groups would have decreased to half their size when further divided into the film and text groups. Larger groups were therefore selected. Before this categorization the divisions of MviF and MvvG were on an interval level. With categorization the variation form is usually lost, but this categorization makes it possible to perform MANOVA analysis where presentation form, STM tests and the interactions between them are the independent variables.
Sensory Capacity (SC)
Sensory capacity is the ability to quickly capture and process information that is presented in a brief moment (1/4-1s). This was measured by two tests, one with visual stimuli (SMvi) and one with verbal stimuli (SMvv). The breakdown of the capacity levels was made based on the number of correct responses. Low capacity was defined as 0 - 2 correct, 3 - 4 correct as medium capacity and high capacity 5 - 6 correct. This division was made based on the number of possible correct answers ranging from 0 to 6. Division into quartiles was not used because of too few questions.
Channel capacity is the ability to process either visual or verbal information, in which progressive, multi-oriented and sensory forms of presentation are all included. Visual channel capacity was measured by the tests PMvi, MviF and SMvi, and verbal channel capacity was measured by the tests PMvv, MvvG and SMvv. The picture test (MviB) was excluded from the construction of visual channel capacity, because this test was somewhat special in relation to the others. Both the visual and the verbal STM capacity were divided into three categories based on quartile divisions. The first category (low) was composed of approximately the first quartile. The second category (medium) was composed of approximately the second and third quartiles, and the third category (high) consisted of approximately the fourth quartile.
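A quartile-based division of this kind can be sketched as follows. The source does not state how the three subtests were combined into one channel score, so summing the raw scores is an assumption made here for illustration only; the handling of ties at the quartile boundaries is likewise an assumption.

```python
import statistics
from typing import List, Sequence


def composite_scores(*subtests: Sequence[int]) -> List[int]:
    """Sum each respondent's raw scores across subtests (assumed combination rule)."""
    return [sum(scores) for scores in zip(*subtests)]


def quartile_levels(scores: Sequence[float]) -> List[str]:
    """Label each score low (first quartile), medium (middle half) or high (top quartile)."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    levels = []
    for score in scores:
        if score <= q1:
            levels.append("low")
        elif score > q3:
            levels.append("high")
        else:
            levels.append("medium")
    return levels


# Hypothetical raw scores for eight respondents on PMvi, MviF and SMvi.
pmvi = [3, 5, 6, 7, 8, 9, 10, 12]
mvif = [2, 3, 4, 4, 5, 6, 7, 9]
smvi = [1, 2, 2, 3, 3, 4, 5, 6]
visual_channel = composite_scores(pmvi, mvif, smvi)
print(list(zip(visual_channel, quartile_levels(visual_channel))))
```

With discrete test scores many respondents share the same value, so the groups only approximate 25%, 50% and 25%, which is consistent with the word “approximately” in the description above.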
2.2.3. Statistical Analysis
In order to examine the relative importance of the STM categories PC, MC and SC in relation to the learning outcome for details or contexts from film or text, a regression analysis was conducted in each group. In addition, separate ANOVA analyses for film and text were conducted, as well as a MANOVA analysis to identify the interaction between the various tests included in the capacity measurements and their interaction with the presentation form.
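As an illustration of the kind of comparison a one-way ANOVA makes between the three capacity levels, the F ratio can be computed as the between-group mean square divided by the within-group mean square. The sketch below uses invented scores and is not a reproduction of the study's analysis, which also included regression and MANOVA models.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F ratio for two or more groups of scores."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: group size times squared deviation
    # of the group mean from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical detail scores for low, medium and high capacity groups.
low, medium, high = [1, 2, 3], [2, 3, 4], [5, 6, 7]
print(round(one_way_anova_f([low, medium, high]), 2))
```

The resulting F value is then compared with the F distribution with (k − 1, N − k) degrees of freedom, where k is the number of groups and N the total number of respondents.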
3.1. The Relation between STM Categories in General and Learning Outcome from Film and Text
There were no strong significant relations between the learning outcome from film and text and the three STM categories PC, MC and SC. However, Table 1 showed that film and text differed in the relative importance of the STM categories for the learning of details. As shown in the table the relations were not strong, but the three predictor variables PC, MC and SC explained more variance in the learning of details from text (R² = .14) than in the learning of details from film (R² = .05). For learning of details from film the PC category explained all the variance. For learning of details for the text group, the MC and SC categories together explained most of the variance. This is shown by the significant standardized β-coefficients for MC (.20, p < .01) and SC (.21, p < .01).
Table 1. Separate linear regression models for the relationship between STM categories (independent variables) and learning outcomes (dependent variables) for film and text, using the regression method “Enter”.
*= β-coefficients are significant at the 0.05 level (two-tailed); **= β-coefficients are significant at the 0.01 level (two-tailed).
The relative significance of the STM categories for learning of contexts was generally weaker than for learning of details. Nevertheless, there was a tendency toward the PC category and the MC category together having significance for the learning outcome of contexts from film. Thus, the PC category was important for both the learning of details and the learning of contexts from film. For the text group the PC category was most significant for learning of contexts. For film and text together, the PC category meant most for learning of both details and contexts, but nevertheless there was still a tendency towards all the three categories having significance for the learning of details.
3.2. Differences in Outcomes between Three Levels of Capacity Types by STM-Categories (PC)
A MANOVA analysis showed that there was no consistent significant interaction effect between presentation form and progressive STM capacity (PC) (p > .05). This may be because the groups receiving, respectively, film and text had a fairly equal STM capacity. An interaction effect was not expected here. In contrast to this there was a significant interaction effect between capacity types PMvi, PMvv and MviB (p < .01) in the learning of contexts (F = 2.42, p > .05, Wilks’ λ = .90, p < .01). This shows that STM capacity, measured as the interaction between the various capacity types that were the target of PC, was important in the learning of contexts. A similar significant interaction effect was not measured for learning of details.
Table 2 shows that the PMvi level had significant differences in learning outcome from film for both details (F = 3.58, p < .05) and contexts (F = 3.12, p < .05). The PMvv levels had an even stronger significant difference for learning from film, but only for details (F = 5.57, p < .01). There was also a significant difference between the PMvv levels for text and learning outcomes from details (F = 3.16, p < .05). This suggests that there was a correlation between the increasing capacity of PMvi and PMvv and learning outcomes from both text and film, especially for the learning outcome of details.
A similar analysis shows that there was essentially no difference in learning outcome between the levels in multi-capacity (MC), but there were generally rising levels of learning outcome in both the visual (MviF) and the verbal (MvvG) test for multi-capacity. However, only the levels of MviF for learning of details from text had a significant difference in learning outcome (F = 3.97, p < .05).
Table 2. Average values for learning the details and connections from film and text in relation to different levels of progressive memory capacity.
*= p < .05, **= p < .01. The learning outcome measurements are specified on a scale showing 0 as the lowest, 9.00 as the best for details, and 4.00 as the best for learning contexts. Marked measurements indicate overall average for capacity types. †This difference in learning outcome for details between the PMvi categories in film is curvilinear. If the F value is considered only between low and medium STM values, it is non-significant (F = 1.19). The significant correlation lies in the relationship between medium and high PMvi capacity.
An ANOVA for sensory memory (SM) also shows a tendency for a higher capacity of sensory memory (both visual and verbal) to give better learning outcome for details from text (F = 5.82, p < .01). A similar linearity was not found in connection to either film or text in relation to the learning of contexts. A slight tendency for differences in learning outcome was detected for SMvi for contexts for both film (F = 2.42) and text (F = 2.86), but the size of n may have contributed to this being statistically non-significant. The overall results show that it is especially the group with low values on both SC-tests that learns least from text.
In addition, a MANOVA was conducted. None of the Wilks’ λ values was significant. This indicates that, in interaction with the presentation form, the effect of sensory capacity is weak for learning both details and contexts, from both film and text. However, there was a clear interaction between presentation form and SMvi (F = 4.33, p < .05), so SMvi had a clear impact on learning details from text. This trend was also detected in the ANOVA. It was further supported by a correlation analysis between SMvi and learning outcomes for details, with a Pearson’s r = .31, p < .01.
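For readers who want to see how a Pearson correlation such as the one reported for SMvi and detail learning is obtained, the following is a minimal sketch. The function name pearson_r and the two short score lists are hypothetical illustrations and are not data from this study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores, invented for illustration only
smvi_scores = [1, 2, 3, 4, 5]      # visual sensory memory test
detail_scores = [2, 4, 5, 4, 5]    # detail learning outcome (0 to 9 scale)

r = pearson_r(smvi_scores, detail_scores)
print(round(r, 2))
```

With real data, the significance level would additionally depend on the number of participants, which this sketch does not compute.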
3.3. Differences in Learning Outcomes between Three Levels of Visual and Verbal Channels of Film and Text
Visual channel capacity was measured by the tests PMvi, MviF and SMvi, and verbal channel capacity was measured by the tests PMvv, MvvG and SMvv.
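One conceivable way of combining three tests into a single channel score and dividing participants into low, medium and high levels is sketched below. The procedure (averaging z-scores and splitting into three equally sized groups) is an assumption made for illustration; the text does not state how the levels were actually formed, and the scores are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of scores (assumes the scores are not all identical)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def composite_levels(test_scores):
    """Assign each participant to 'low', 'medium' or 'high'.

    test_scores maps a test name to a list of scores, with participants
    in the same order for every test. The tertile split is an assumption.
    """
    standardized = [z_scores(scores) for scores in test_scores.values()]
    # Composite channel score: mean z-score across the tests
    composite = [mean(person) for person in zip(*standardized)]
    n = len(composite)
    ranked = sorted(range(n), key=lambda i: composite[i])
    labels = ("low", "medium", "high")
    levels = [None] * n
    for rank, idx in enumerate(ranked):
        levels[idx] = labels[min(rank * 3 // n, 2)]
    return levels

# Hypothetical visual channel scores for six participants
visual_tests = {
    "PMvi": [2, 5, 8, 3, 6, 9],
    "MviF": [1, 4, 7, 2, 5, 8],
    "SMvi": [3, 5, 9, 2, 6, 8],
}
levels = composite_levels(visual_tests)
print(levels)
```

Standardizing before averaging keeps a test with a wider score range from dominating the composite, which is why z-scores are used in this sketch.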
3.3.1. Differences in Learning Outcome from Film and Text at Three Levels of Visual Channel Capacity
Table 3 shows the direction of the differences. Average values consistently revealed that learning outcomes increased with increasing channel capacity for both film and text. η-values were significant and showed a clear pattern. For film the clearest difference between capacity levels was connected to learning contexts (F = 7.67, p < .001, η = .27), while with text the difference was most evident in relation to learning details (F = 9.82, p < .001, η = .31). There was also a significant difference between capacity levels for learning details from film (F = 3.28, p < .05, η = .18), and learning contexts from text (F = 3.75, p < .05, η = .20). For text this difference was only between the low and medium channel capacity.
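The F and η values reported above come from comparing learning outcomes across three capacity levels. As a minimal sketch of that kind of comparison, the code below computes a one-way ANOVA F value and η² by hand. The function name one_way_anova and the scores are hypothetical, and it is an assumption here that the reported η corresponds to the square root of η² as computed below.

```python
from math import sqrt

def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation between the capacity levels
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation among individuals within each capacity level
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, eta_squared

# Hypothetical detail scores (0 to 9 scale) for three capacity levels
low = [3, 4, 5, 4]
medium = [5, 6, 5, 6]
high = [6, 7, 7, 6]

f_value, eta_squared = one_way_anova([low, medium, high])
print(round(f_value, 2), round(eta_squared, 2), round(sqrt(eta_squared), 2))
```

The invented groups differ far more than those in the study, so the resulting values are much larger than the ones reported; only the calculation itself is illustrated.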
Table 3. Average values for learning outcome of details and contexts in relation to different levels of visual STM capacity (visual channel).
*= p < .05, **= p < .01; ***= p < .001. Learning outcomes are specified on a scale where 0 is the lowest, 9.00 is best for detail and 4.00 is best for contexts. (There are 396 people in the survey, but some have not defined what presentation form they have received. This resulted in some differences in n between STM tests and measurement of learning outcomes).
For individuals with a low channel capacity the learning outcome was relatively equal between film and text, both for the learning of details and contexts. Individuals with medium channel capacity had a higher average value of learning outcome connected to text (m = 5.70) compared with film (m = 4.91). Individuals with a high channel capacity also achieved higher average values of learning outcome from film (m = 5.76) than from text (m = 5.50) when learning details. Thus, individuals with medium and high visual channel capacity achieved the best learning results from text when it came to learning details. Individuals with a low visual channel capacity had almost the same learning outcomes from film and text when it came to learning of details. The learning outcome for contexts was somewhat different. Here those with high visual channel capacity had the best results with film (m = 2.19) compared to text (m = 1.90), while for other capacity levels learning outcomes were the same.
3.3.2. Differences in Learning Outcome from Film and Text at Three Levels of Verbal Channel Capacity
Table 4 shows the results for the verbal channel capacity in more detail. All differences here are significant from p < .05 to p < .001 level. Average values show that learning outcomes increased with increasing channel capacity, and most clearly for learning details from film (F = 8.47, p < .001, η = .29). There was also a significant difference between capacity levels for the learning outcome of contexts in film (F = 4.05, p < .01, η = .20). The differences were significant between the low and medium capacity levels. For text the difference in learning outcomes between capacity levels was greatest in the learning of contexts (F = 5.14, p < .01, η = .23), while the difference was smaller for learning of details (F = 3.85, p < .001, η = .20). The most significant difference in learning outcomes between capacity levels was between medium (m = 5.12) and high (m = 6.17) for learning of details from film. There was also a corresponding increase in learning outcomes between the middle (m = 5.70) and high (m = 6.50) visual capacity level for learning of text (see Table 4).
Table 4. Average values for learning outcome of details and contexts in relation to different levels of verbal STM capacity (verbal channel).
*= p < .05, **= p < .01; ***= p < .001. Learning outcomes are specified on a scale where 0 is the lowest, 9.00 is best for detail and 4.00 is best for contexts.
The learning outcomes were generally similar or slightly higher for text for all three capacity levels. There were, however, two exceptions. For individuals with high verbal channel capacity learning details from film, the learning outcomes measured somewhat higher (m = 6.17) than for the corresponding group with text (m = 5.93). For those with medium verbal channel capacity learning contexts from film, the learning outcomes also measured somewhat higher (m = 2.00) than for the corresponding group with text (m = 1.80).
The main objective of this study was to investigate the importance of visual and verbal channel capacity for learning from visual and verbal forms of presentation. The results showed that there was a significant correlation between channel capacity and learning from film and text. This applied to both visual and verbal channels. The values were relatively similar for film (r = .57, p < .001) and text (r = .59, p < .001). This may indicate that a split attention effect (SA-effect) does not reduce the learning outcome of multi-presentation compared with text. Instead, a high channel capacity appears to be important in order to take advantage of the SA-effect for learning by multi-presentation, both for learning details and contexts.
This is interesting in terms of selecting presentation forms in the design of teaching and instructional programs. However, if multi-based presentation forms are part of the educational scheme, participants with low-capacity STM require a more thorough educational adaptation than participants with high-capacity STM to ensure good learning outcomes.
Nevertheless the findings in the STM categories Progressive capacity (PC), Multi Capacity (MC), and sensory capacity (SC) varied depending on the educational material. The STM categories had less impact on learning of contexts (understanding) than on learning of details. This was true for both film and text. It was particularly the Progressive visual (PMvi) and Progressive verbal (PMvv) capacity that had an impact on learning outcome of details from film and text. However, only one of the tests that measured visual multi-capacity (MviF) revealed significant differences between low, medium and high capacity and learning of details. This connection was only measured for learning from text. In the category of sensory capacity (SC) there was a tendency that individuals with higher capacity also learned most details from text. This may imply that SC is a feature that probably is more important for learning from text than from film. For learning contexts, there was no significant difference between capacity levels by STM-categories for either film or text.
For learning details from film, however, the difference in learning outcome between the capacity levels was greater for verbal channel capacity than for the visual channel. Average learning outcomes were also higher for those with medium and high verbal capacity than the corresponding levels of visual channel capacity. Thus, the tests including measurement of visual channel capacity contributed to a lesser extent to explain the learning outcomes than the tests that were included for verbal channel capacity. This may be interpreted as verbal channel capacity being of greater importance than visual channel capacity for learning details from film. However, this relationship was reversed when learning with text. Here, the difference in outcomes between the visual capacity levels was greater than for verbal channel capacity.
Individuals with low and intermediate visual and verbal channel capacity had relatively similar learning outcomes, but the average values for individuals with high visual channel capacity were slightly higher than for those with high verbal channel capacity. This may indicate that visual channel capacity has the greatest impact on learning of details from text.
These results may be explained by the fact that increasing verbal channel capacity contributes to absorbing more detailed oral information (verbal-auditory), combined with text and supportive images (multi-presentation). Thus, learning from film will be beneficial, as this medium has these combined and simultaneous presentation forms. On the other hand, for text presentation (visual-verbal), it will be an advantage to have a high visual channel capacity. This is because the reader constructs and connects visual notions to the details being read, and this will engage both visual and verbal channels.
High visual capacity may contribute to this connection occurring more easily and more thoroughly than at a lower visual channel capacity, with better learning outcomes as a consequence. This may indicate that high visual channel capacity is more important for learning outcome from text than verbal channel capacity, as long as the reading speed is satisfactory so that the reader absorbs what is read.
For the learning of contexts (understanding) this relationship was reversed compared to the learning of details. Visual channel capacity explained more of the learning outcome from film than verbal capacity. Individuals with high visual channel capacity also scored slightly better than those with high verbal capacity. For text, the relationship was reversed, as verbal channel capacity explained learning outcomes more than visual capacity and individuals with a high verbal channel capacity scored better from text than from film. In relation to this study’s theories based upon channel capacity and SA-effect, this also makes sense. Educational material with professional purposes requires compilation of information, where pictorial information contributes to clarity and comprehensiveness. The pictorial information connects to the verbal information, and as we know images contain a high information density compared with words (Mayer & Gallini, 1990; Mayer & Simms, 1994) . It is clear that the channel capacities were not overloaded, and a SA-effect may have contributed positively to learning outcome by integrating the information.
In this connection, the load may have been reduced by effects that were used in the film. Another question is whether high channel capacity in general contributes to reducing the vulnerability of shared attention. High channel capacity may possibly also include the capability of changing the attention between channel and internal loop processes in the same channel (cf. Baddeley, 1986, 1998; Koroghlanian & Sullivan, 2000; Morey, 2009). Overall, a high visual channel capacity may be an advantage for learning of contexts from film. For the learning of contexts from text, a high verbal channel capacity is a greater advantage, because this learning process requires the compilation of a lot of verbal information from the text.
Overall, both the capacity of the STM category and the channel capacities had the greatest impact on learning of details from text. As previously mentioned, this might mean that the effects in the film may have contributed to a compensation for multi-presentation and thus reduced the effect of overloading the STM-categories and any negative SA-effect. In a reading process changing channels based on split attention does not occur, but the capacity can be overloaded in the same channel (visual-verbal), because of the demands of high frequency of internal repetitions of the learning material (Mayer & Moreno, 1998; Morey, 2009). This may have occurred here in connection to detail learning, because the educational material contained many details. This might explain why the category capacities had a greater impact on learning of details from text compared with film. In other words, increased category and channel capacity helps the reader to process more details, and thus learn them better, without overloading the capacities. For learning of contexts or understanding, similar cognitive processes may take place, but overload will depend on the complexity of the educational material and presentation form. It is possible that the test material did not contain enough complex contexts for the capacity of the visual-verbal channel to become overloaded, either for film or text. The simultaneous combination of sound (auditive-verbal), text signs (labeling) and visual information (visual-iconic) through the film does not seem to have contributed to a negative SA-effect or overload in such a manner that increasing capacity had any notable impact on learning outcomes. Another explanation may be that the adaptation effect of the STM-capacity when exposed to multimedia presentations is more prominent with complex learning material than with learning of details.
If this is the case, the adaptation or compensation must have had a greater impact for individuals with low and mid-level channel capacity than for those with a high capacity, since the difference in learning outcomes between the two was not significant in relation to learning of context.
However, Study 1 showed that text gave a significantly better learning outcome than film, and it was precisely the learning of details that primarily contributed to this. A certain negative SA-effect can therefore not be excluded as the reason for the poorer learning outcome from film and learning of details.
However, in this case corresponding conditions should have been detected for learning contexts from film. Here the learning outcomes were approximately equal between film and text. If a negative SA-effect has influenced processes in the STM, this has not had a major impact on learning outcomes. The present study also shows that high channel capacity may have an impact on the results of overload connected to split attention.
In addition, the learning outcomes of this study are measured through the number of correct responses on a verbal knowledge test. Therefore, it is not unreasonable to believe that this fact contributes to the higher correlation coefficient measured between the verbal channel and learning outcome (r = .31, p < .01 for film; r = .29, p < .01 for text) than between the visual channel and learning outcome (r = .26, p < .01 for film; r = .26, p < .01 for text). When the knowledge tests are verbally oriented, and require a verbal response (multiple choice answers), the representation form corresponds with the method of learning the knowledge that the test requires. The need for representation transformation between forms in the verbal performance will thus be reduced (Skaalvik, 1977; Torgersen & Vavik, 2005). In other words, the respondents perform in the same form as the learning has taken place. This may be an easier or more direct cognitive task, i.e. one without significant transformation processes between the channels or representation forms, than if these had been different.
On the other hand the correlation was almost as high for both film and text, so presumably the visual support from the film does not seem to have been a disadvantage for learning outcomes even if the performance on the knowledge test was verbal. Any transformations from visual form, i.e. image information in the film into verbal forms were not an obstacle to learning outcome. On the contrary, it might seem to have been an advantage, as the pictures gave the words a wider or deeper meaning. This might indicate that for verbal performances a high verbal channel capacity actually can be an advantage, rather than a high visual channel capacity, regardless of whether the form of presentation is verbal (text) or multi-oriented (film/multimedia).
The conclusion of this study is that channel capacity is important for learning outcomes in both film and text, especially when learning details. The relationships, though, are not very strong. Previous studies have shown this in general with easy educational material in laboratory conditions, and with theoretical models, including Paivio’s dual-coding theory (Paivio, 1986). The present study shows that this is also applicable to realistic educational materials in classroom conditions. Earlier research-based discussions and studies have also shown that the learning outcomes from different forms of presentation depend on individual differences in visual and verbal dispositions, but they have not shown which specific types of STM capacity actually have an impact. The conclusion has often been, in general, that some individuals are more verbal or visual than others, and therefore learn better using one form. The present study has gone deeper, and revealed that these aspects specifically consist of various types of STM-oriented capacity differences. First and foremost, this applies to the capacity to process information presented a little at a time (progressive capacity).
The capacity for simultaneous presentation through multiple channels (multi-capacity) is also important for learning from both film and text. Multi-capacity nevertheless seems to have less impact than expected, especially because the effect of overloading with split attention (negative SA-effect) can probably be countered both by the text reader and by how a multimedia presentation is designed and adapted to the different capacity types in the STM. This study has thus shown that it is underlying STM-related capacities and functions that are important when learning from different forms of presentation.
These findings indicate that educational programs that include multimedia and separate text presentations should be arranged differently depending on the participants’ channel capacity and the educational material. Individuals with a high visual channel capacity may have an advantage in learning contexts from film. For learning of contexts from text, the findings show that individuals with a high verbal channel capacity may have an advantage. Therefore, extra educational preparation should be given to individuals with medium and low visual and verbal channel capacity. In addition, multimedia applications and films should be designed to match the recipients’ STM capacity in general. This is examined in further detail in Torgersen (2012).