Received 28 June 2016; accepted 15 August 2016; published 18 August 2016
Many medical curricula have adopted forms of active learning. The foundations of the benefits of active learning were described by Johnson & Johnson (1999). Two widespread active learning methods are collaborative small group work and peer-teaching. Collaborative group work is thought to stimulate interdependence, individual accountability and team assessment (Smith, 1996), and small group learning may stimulate motivation, group cohesiveness, intellectual development and cognition (Slavin, 1996). As a result, the learning process of students benefits (Cohen, 1994; Dolmans & Schmidt, 2006). Peer-teaching offers education to students on their own cognitive level and enhances intrinsic motivation (Ten Cate & Durning, 2007a), leading to better clarification of problems (Lockspeiser et al., 2008; Campolo et al., 2013). Peer-teaching has also been shown to improve the learning of the peer-tutor (Topping, 1996; Nestel & Kidd, 2003; Bulte et al., 2007; Secomb, 2008).
Ten Cate and Durning (2007b) discerned three dimensions of peer-teaching: 1) the relationship between the peers, 2) the number of peers that are taught and 3) the level of formality that is ascribed to the peer-teaching by the faculty. Peer-teaching studies have usually concerned near-peer teaching (e.g. Bulte et al., 2007; Nelson et al., 2013). Most of these studies showed that students performed equally well on tests whether they were taught by faculty professionals or by peers. Same-level peer-teaching was introduced in anatomy classes (Peppler, 1985; Nnodim et al., 1996; Kassab et al., 2005) and several studies were conducted to measure the learning effects (Hendelman & Boss, 1986; Johnson, 2002; Brueckner & MacPherson, 2004; Krych et al., 2005). From these studies it was concluded that same-level peer-teaching mostly led to better examination results. The size of the group being peer-taught also matters because size influences the dynamics of the group (Tyler, 2006; Ten Cate & Durning, 2007a, 2007b; Edmunds & Brown, 2010). These authors also discussed the levels of formality of peer-teaching: at one extreme the level could be very informal, such as peers working together outside classes, and at the other very formal, such as peers who teach and grade each other in obligatory educational settings.
On top of the three dimensions of peer-teaching formulated by Ten Cate and Durning (2007a) two other dimensions can be recognized: the format in which the peer-teaching takes place (oral presentations or poster presentations; Billington, 1997; Kitto, 2009) and the amount of available preparation time for the peer-teaching event. Oral presentations or communications are commonly used as the vehicle to transfer information during peer-teaching. Chollet et al. (2009) showed that during examinations students performed better on topics covered by their own oral presentations. Higgins-Opitz & Tufts (2010) reported that students who performed in peer-teaching perceived the presentations as beneficial to their learning. Joughin (2007) argued that oral presentations were demanding and therefore would lead to better learning. The time between the assignment to peer-teach and the actual peer-teaching event may vary from minutes (Kågesten & Engelbrecht, 2007; Kooloos et al., 2011) to weeks (Ramaswamy et al., 2001). Despite a gap of four weeks, students reported that they perceived improved understanding of the matter they had presented themselves (Higgins-Opitz & Tufts, 2010).
Preparing for peer-teaching alone was found to be as effective for learning as preparing plus carrying out peer-teaching (Barg & Schul, 1980; Gregory et al., 2011). Students who expected to teach were assumed to change their attitude to the material to be studied (Barg & Schul, 1980). Fiorella & Mayer (2013) showed that peer-teaching has to be actually carried out for the additional material learned during the preparatory phase to be retained.
To the best of our knowledge, the use of the expectancy of teaching as a motivator to induce cooperation in small group work has not yet been studied. Kooloos et al. (2011) studied the effects of different structures and assignments of small group work on learning gain, student satisfaction and perceived participation. Peer-teaching was used in one of their study arms. In that study, however, the supervising staff instructed the groups in varying ways, leading to somewhat different peer-teaching events. In the present study these instructions were controlled for, and it was further explored whether an increased formality level of peer-teaching influenced learning gain, student satisfaction and perceived participation. Formality was defined as the weight placed on the oral presentation by the teacher and, as a consequence, the importance the subsequent peer-teaching event may have acquired in the eyes of the students.
The research questions were: Are perceived participation and satisfaction increased in small groups preparing for higher formality levels of the subsequent peer-teaching event? Does this increased perceived participation and satisfaction also lead to higher learning gain?
This study was based on the premise that higher levels of in-class formality put on peer-teaching by the faculty would increase individual accountability and group coherence during preparation for peer-teaching (“sink or swim together” principle of Johnson & Johnson, 1999). This would enhance the participation in the small groups and would then lead to higher learning gains. The hypothesis was that students who had prepared for peer-teaching on a higher formality level would score higher on learning gain, satisfaction and self-perceived participation than those who had prepared for peer-teaching on a lower formality level.
2. Material and Methods
2.1. Educational Context
This study was carried out during a group assignment among first-year (bio)medical students in a gross anatomy course (October 2010). They were divided into 30 groups of about 15 students by the faculty. The group assignment was made up of three clinical cases which had to be solved within that particular group session. During the group assignment, the 15 students formed 3 subgroups of 5 students, each of which had to solve one of these clinical cases and prepare a short presentation about its conclusions, peer-teaching the students in the other subgroups. Available resources during the group assignment were textbooks, prior knowledge and an expert teacher during the second half of the 2-hour session. For more details on the educational context see Kooloos et al. (2011). The goal of the assignment selected for this investigation was to learn the system of the branches of the abdominal aorta by solving three clinical cases: 1) aneurysm cranial to the bifurcation, 2) atherosclerosis of the common iliac artery, and 3) an accessory renal artery.
A clustered randomized controlled study with four arms (Figure 1) was carried out. Arms A and B held 7 groups of 15 students each, and arms C and D 8 groups each. The formality level differed in each arm in the following way:
Arm A: Students worked in three subgroups solving all three clinical cases during the complete assignment and peer-teaching was absent.
Arm B: Students first worked in three subgroups during one hour, each subgroup solving its own clinical case and preparing for peer-teaching. Peer-teaching occurred with the re-assembled group of 15 students during the following hour. Students were instructed to peer-teach in an informal way: one member of each subgroup was invited to report the solution to its own clinical case while seated and speaking from personal notes only.
Arm C: The same as B, but the subgroups were instructed to prepare a transparency sheet together containing the solutions to their own clinical case. One member of each subgroup presented the sheet in front of the re-assembled group of 15 students. The presenter was selected by the subgroups themselves, and they were free to decide when and how the presenter was designated.
Arm D: The same as C, except that the presenter was designated by the throw of a die just prior to the presentation. So, up to one minute before the peer-teaching event the presenter was not decided upon.
The role of the teacher was the same in all arms: questions could be asked and in case of doubt the teacher assisted. Three senior and well-experienced teachers were semi-randomly assigned to the 30 groups, taking care that each teacher would be distributed evenly over the four arms. Beforehand the teachers had reached consensus about the anatomical content of their responses to possible questions during the group sessions.
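The text does not state how the 30 groups were allocated to the four arms. As an illustration only, a cluster allocation with the reported arm sizes (7, 7, 8 and 8 groups) could be sketched as follows; the function name `allocate_groups`, the group numbering 1 to 30 and the seed are hypothetical and are not taken from the study.

```python
import random

# Arm sizes as reported in the text: 7 groups each in A and B, 8 each in C and D.
ARM_SIZES = {"A": 7, "B": 7, "C": 8, "D": 8}


def allocate_groups(group_ids, arm_sizes, seed=None):
    """Randomly assign whole groups (clusters) to arms with fixed arm sizes.

    Hypothetical helper: the study does not describe its allocation procedure.
    """
    group_ids = list(group_ids)
    if len(group_ids) != sum(arm_sizes.values()):
        raise ValueError("number of groups must equal the sum of the arm sizes")
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    rng.shuffle(group_ids)
    allocation = {}
    start = 0
    for arm, size in arm_sizes.items():
        # Each consecutive slice of the shuffled list becomes one arm.
        for gid in group_ids[start:start + size]:
            allocation[gid] = arm
        start += size
    return allocation


# Assumed numbering of the 30 groups as 1..30; the seed value is arbitrary.
allocation = allocate_groups(range(1, 31), ARM_SIZES, seed=2010)
```

Because whole groups rather than individual students are assigned, all students in a group of about 15 share the same arm, which is what makes the design clustered.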
2.3. Outcome Measurements
All students were subjected to three anatomical knowledge tests: a pretest at the start of the session, a post-test directly following the session and a follow-up test two weeks later. These tests were extended matching questions (EMQs) asking about their understanding of the patterning of the branches of the abdominal aorta (Kooloos et al., 2011). The follow-up test was included in the examination of the complete course two weeks later. The three tests were identical. During the taking of the post-test, the questionnaire was also put forward and completed by the students. The 19 questions in the questionnaire asked about the organization and satisfaction of the group assignment and about their perceived participation. All responses were on a 6-point (Likert) scale, except for the final judgment which was on a scale of 1 - 10. Only the extremes were defined: 1 = fully disagree; not satisfied at all; or much too short and 6 = fully agree, very satisfied, or much too large. Four questions about the short oral presentations (7, 8, 16 and 17) could not be answered by the students in A and B since no (formal) peer-teaching had happened. Additional questions were asked about group characteristics (gender, repeater, etc.). In the final examination of the course, about 2 weeks later, the same EMQ was posed next to nine other EMQs. This EMQ was taken as the follow-up test.
Figure 1. Depiction of the four in-class levels of peer-teaching studied in the four arms: Arm A: no peer-teaching; Arm B: informal peer-teaching; Arm C: formal peer-teaching, peer-teacher designated by peers; Arm D: formal peer-teaching, peer-teacher designated by throwing a dice just prior to the peer-teaching activity. In every quadrant there are three rectangles with 5 smaller “heads”, depicting the students around tables during the work in the subgroups, preparing for peer-teaching (B, C and D only). The arrow represents the change in the format of the group sessions after one hour: a teacher (larger head with a “T”) is present and peer-teaching (a larger head with a “P”) is running, except for A where the teacher rotates among the three subgroups to answer possible questions. In B, C and D the assembly of the subgroups in a larger group around a square of tables is depicted and the teacher takes place in the assembly.
Differences in the experimental arms with respect to group characteristics and to the scores on the questionnaires were analyzed with ANOVA and post-hoc corrections for multiple comparisons, according to Bonferroni when equal variances were found, and according to Dunnett's T3 when the variances were not equal. Equality of the variances was explored with Levene's test. In order to test whether the responses of the students to the questionnaires could be ranked from lower in A to higher in D, a Friedman test was carried out, leaving out the scores of questions 7, 8, 16 and 17. Wilcoxon’s signed ranks test was used to explore differences in the scores of C and D, including the scores of all 19 questions.
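To make the one-way ANOVA and the Bonferroni correction mentioned above concrete, a minimal sketch in plain Python is given here. It is not the software used in the study (which is not named in the text). The function names `one_way_anova_f` and `bonferroni` are hypothetical, and the scores in the example are made up for illustration. The sketch computes only the F statistic and its degrees of freedom; obtaining a p-value from F requires the F distribution from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up scores for three hypothetical groups (NOT data from the study).
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)

# Made-up unadjusted p-values for three pairwise comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With four arms there are six pairwise comparisons, so under the Bonferroni approach each unadjusted p-value would be multiplied by six before being compared with 0.05.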
The pretests and post-tests were scored blindly by one corrector. The scores from these tests and the scores on the same EMQ in the final examination were analyzed using repeated measures ANOVA.
Medical educational research in the Netherlands is exempt from formal ethical approval. Therefore information about the treatment of the students is provided. The students were adequately informed of the purpose of the study: at the start of each group session the teacher read out a short paragraph to inform the students of the goal of the experiment. It was stressed that participation was voluntary, and that they could still participate in the group work if they did not consent to participating in the experiment. The privacy of the students was guarded since all test data were analyzed by someone who is not employed as a teacher and therefore not involved in the course. The results of these tests were matched by student number and then de-identified. All questionnaires were answered anonymously; only the experimental group (study arm) was noted. We were not aware of any vulnerable population among the students that would have required safeguards.
In total 422 students were enrolled and their distribution in the groups was: A 107, B 99, C 113 and D 103.
Although a difference between the groups in preparatory training was found, this difference was absent in post-hoc comparisons (Bonferroni) of the separate groups (Table 1). In the other scored aspects the groups did not differ.
On the whole, the mean scores from the questionnaire (Table 2) increased significantly from A to D for all questions (p < 0.001). It was also found that the overall mean scores in the questionnaire of the students in D were significantly higher than those in C (p = 0.007).
About half of the separate questions showed significant differences between A, B, C and D (†, †† and ††† in Table 2). Post-hoc comparisons showed that for items asking about satisfaction (question numbers in parentheses):
(Q1) A scored lower on satisfaction with the way of working in the assignment than both C (p = 0.014) and D (p = 0.001);
(Q4) A was less satisfied with the assistance of the teachers than D (p = 0.042);
(Q5) A was less satisfied with the group assignment than B (p = 0.012), C (p = 0.001), as well as D (p = 0.001);
(Q6) C was less satisfied with the small group session than D (p = 0.019);
(Q9) A perceived less grip on the learning material than D (p = 0.005);
(Q11) A reported having reached fewer goals than both C (p = 0.009) and D (p = 0.026);
(Q12) A was less satisfied with their learning gain than D (p = 0.003).
Post-hoc comparisons showed that for items asking about participation:
(Q13) A believed they had participated less in the small groups than D (p = 0.034);
(Q14) A perceived less participation among the group members than D (p = 0.024), as did B (p = 0.033);
(Q19) In addition, the students in A gave a lower mark for this group session than students in both C (p = 0.08) and D (p = 0.001).
Learning gain across the pretest, post-test and follow-up test did not differ between the groups. The lines representing the average proportion scored on the three tests run in parallel (Figure 2), and no statistical differences in learning gain were found between the groups (p = 0.087). The average proportion scored did differ significantly between the three time-points (p < 0.001), so students gained substantial anatomical knowledge between the tests.
Table 1. Group characteristics. All figures are in percentages, except for n and the p-value. One-way ANOVA shows a statistical difference between all groups in preparatory training (p = 0.011), but in post-hoc comparison between the separate groups this difference breaks down: there are no statistical differences between C and B (p = 0.091) or C and D (p = 0.098).
Table 2. Results of the questionnaire. The questions posed to the student in the questionnaire, the average score and standard deviations in the four experimental groups are given. Statistical differences between groups are marked as †p < 0.05, ††p < 0.01 and †††p < 0.001. All questions, except the last, are registered on a 6-point scale.
Figure 2. Learning gain per experimental group and at the three tested time-points. Arm A: no peer-teaching: squares and dashed lines; Arm B: informal peer-teaching: triangles and solid lines; Arm C: formal peer-teaching, peer-teacher designated by peers: open circles and solid dotted lines; Arm D: formal peer-teaching, peer-teacher designated by throwing a dice just prior to the peer-teaching activity: crosses and open dotted lines. Data at session start: A = 0.303 ± 0.132; B = 0.302 ± 0.145; C = 0.312 ± 0.145; D = 0.346 ± 0.155. Data at session end: A = 0.523 ± 0.152; B = 0.546 ± 0.160; C = 0.534 ± 0.169; D = 0.550 ± 0.142. Data at follow-up test: A = 0.765 ± 0.137; B = 0.770 ± 0.144; C = 0.750 ± 0.136; D = 0.752 ± 0.133.
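The mean proportions reported in the caption of Figure 2 can be turned into simple differences per arm. The sketch below is illustrative arithmetic only. Taking learning gain to be the difference between mean proportions is an assumption made here, since the study itself analyzed the scores with repeated measures ANOVA. The names `MEAN_SCORES` and `mean_gains` are hypothetical.

```python
# Mean proportion scored per arm, copied from the Figure 2 caption
# (standard deviations omitted).
MEAN_SCORES = {
    "A": {"pre": 0.303, "post": 0.523, "follow_up": 0.765},
    "B": {"pre": 0.302, "post": 0.546, "follow_up": 0.770},
    "C": {"pre": 0.312, "post": 0.534, "follow_up": 0.750},
    "D": {"pre": 0.346, "post": 0.550, "follow_up": 0.752},
}


def mean_gains(scores):
    """Differences in mean proportion relative to the pretest, per arm.

    Assumption: gain is taken as a plain difference of the reported means.
    """
    return {
        arm: {
            "post_minus_pre": round(s["post"] - s["pre"], 3),
            "follow_up_minus_pre": round(s["follow_up"] - s["pre"], 3),
        }
        for arm, s in scores.items()
    }


gains = mean_gains(MEAN_SCORES)
for arm, gain in gains.items():
    print(arm, gain)
```

Under this assumption the difference between pretest and post-test lies between roughly 0.20 and 0.24 in all four arms, in line with the absence of a significant difference in learning gain between the arms.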
This study explored four formality levels of a peer-teaching activity during small group work. The four levels were randomly assigned to 30 similar group assignments of about 15 students. It was hypothesized that cooperation would be stimulated in small groups in which more pressure was put on the subsequent peer-teaching event, because of the increasing appeal to the individual accountability of the students for the shared result. Cooperation was believed to be reflected in student satisfaction with the group session, in perceived participation and in learning gain.
When the results of the questionnaire were analyzed in a single test (the Friedman test), overall satisfaction and perceived participation increased from group A to D. In addition, students in group D were on the whole (as tested with Wilcoxon’s signed ranks test) more satisfied and perceived more participation than the students in group C. Thus, the hypothesis on satisfaction and perceived participation is substantiated by these results. This is further underpinned by the analyzed results of the separate questions. Group A scored significantly the lowest compared to the other groups in half of the questions asked. Group B perceived less participation than group D, and group C was less satisfied with the small group work than D. Provided that the registered student opinions were related to their sensed cooperation during the preparatory phase of teaching in the small group, these results suggest that cooperation is stimulated by announcing higher required levels of formality of the subsequent peer-teaching activity.
The learning gain was not significantly different between the four groups. So regardless of the way the students were instructed to peer-teach, equal learning gains were the result. Two underlying aspects might be at play:
1) the possible increase in anatomical knowledge in the peer-teaching groups with higher formality might have been undetected by the anatomical knowledge test. The test was designed to measure overall knowledge of the studied material, but the students worked together in small groups on just one of the three clinical cases. This means that additional knowledge gain in the peer-teaching groups with higher formality, if present at all, might have been too small to have been registered with the test. So, we should have made three subtests in order to be able to monitor the possible increase of a specific small group working on one specific clinical case.
The same test was used for all three anatomical knowledge tests, so a test effect was present in this study. But since all four groups underwent the same tests, the test effect is expected to have influenced the data from all groups to the same magnitude.
2) the increased perceived participation in peer-teaching groups with higher formality did indeed, as the results show, not lead to higher learning gain in those groups. This could mean that the kind of instructions for a small group preparing for a peer-teaching activity really do not matter. The mere fact that the students work in a small group seems to be enough to reach the desired knowledge gain, and the level of formality does not add to the knowledge gain. A group mark for the quality of the peer-teaching performance might have increased the cooperation in the groups and as a consequence the learning outcomes. Adding an even higher formality level, for instance by using the presentations as a summative test, might also have increased the outcomes of this study.
Possible group differences in the results of the follow-up test 2 weeks later may have been masked by the fact that all students had engaged in other learning activities in order to pass the final course examination, an effect that was also described by Raupach et al. (2010). This masking does not hold for the post-test, however, in which no differences were found either.
Peer-teaching has various established effects: it aids the development of communication skills (Mohamad et al., 2012) and core professional knowledge (Badger, 2010), it prepares students for future roles as residents (Dandavino et al., 2007) and it helps students to become better learners themselves (Peets et al., 2009). Preparing to peer-teach has been shown to have the same beneficial effects as the execution of peer-teaching (Ten Cate & Durning, 2007b; Gregory et al., 2011; Fiorella & Mayer, 2013). The current study contributes to the discussion of the benefits of peer-teaching: our results suggest that preparation for peer-teaching in small group work stimulates cooperation in order to reach a better shared result, although this does not lead to an increase in test scores.
A limitation in this study is the premise that the cooperation felt in the small groups is reflected in the scores of the students about their satisfaction and perceived participation. Videotaped small group work and subsequent analysis of the social behavior of the students and of the quality of their questioning would be needed to elicit elements of cooperation. For now we have to rely on the theory of cooperative learning (Slavin, 1996; Johnson & Johnson, 1999) which postulates that positive interdependence such as a common task of a small group stimulates promotive interaction in the group which affects the efforts of the individual group members to contribute.
Another point is that the students who actually peer-taught were not excluded from further analysis. Gregory et al. (2011) found that preparing to peer-teach combined with actually peer-teaching had a greater learning effect than preparing for peer-teaching only. So, the students who peer-taught should have been considered as a separate group. However, of the about 105 students in each experimental arm at most 24 were peer-teachers, who all peer-taught just one of the three cases offered. So, the influence of the additional learning experience of the peer-teachers themselves on the results is considered limited. Moreover, if peer-teaching itself had had an effect on the data, then the effect might have been the same for the groups within one particular experimental arm. Next to this, the design of this study aimed to evoke motivational differences during the preparatory phase. The actual peer-teaching event was not emphasized and may have led to “knowledge telling” instead of “knowledge building” (Roscoe and Chi, 2007). So the execution of the peer-teaching itself was rather comparable in the experimental arms B, C and D. This may have dampened differences in the learning effect that could have been evoked during a more challenging execution of the actual peer-teaching event.
The calculated increases in scores of the questionnaire were highly significant; however, the analysis of the individual items showed a majority of differences between A and C and/or D and a minority between B, C and D. So our conclusions are based more on differences between the no-peer-teaching group and the two formal peer-teaching groups than on differences among the three peer-teaching groups. However, significant differences between B, C and D were also found, and these still point to more satisfaction and perceived participation in D.
Our study suggests that higher levels of formality during a peer-teaching activity increase satisfaction and perceived participation of students in small group work, although this does not result in more learning gain. These results are in agreement with cooperative learning theory, which regards individual accountability as a driving force of cooperation in small group work.
This study was stimulated by a local committee on the enhancement of small group work. One of us (SvK) was a member of this committee. The other participants were: Prof. J. Willems, PhD, G. Bosman, PhD, H. Heereveld, MD, PhD and Prof. D. Ruiter, MD, PhD; two students completed the committee.