Skills training is inseparably linked to surgical practice (Kneebone, 2003). Although simulation offers a safe training environment (Kneebone, 2003; Sutherland et al., 2006), teaching in the operating room (OR) remains central to surgical education (Agha & Fowler, 2015; De Win et al., 2016; Kneebone, 2003; Konge & Lonn, 2016; Li & George, 2017; Sutherland et al., 2006). It is in the OR, while being taught by experienced surgeons, that trainees develop a set of skills, tacit knowledge, and the ability to cope with complex, stressful and variable situations (Agha & Fowler, 2015). Training trainees in the OR is challenging because teaching and safe surgical care have to be combined. To offer the best possible training, it is important to understand how to successfully support learning in the OR. This may be even more important since the time and opportunities available for teaching in the OR are decreasing (Anderson et al., 2013; McCaskie, Kenny, & Deshmukh, 2011; Reznick & MacRae, 2006; Snyder et al., 2012).
The literature on teaching in the OR is extensive. Behaviors like providing feedback, instructions, explanations, demonstrations, asking questions, and using humor have been identified (Alken, Tan, Luursema, Fluit, & van Goor, 2013, 2015; Blom et al., 2007; Chen, Williams, Sanfey, & Smink, 2015; Hauge, Wanzek, & Godellas, 2001; Roberts, Brenner, Williams, Kim, & Dunnington, 2012; Sutkin, Littleton, & Kanter, 2015) and evaluated (Claridge et al., 2003; Jensen, Wright, Kim, Horvath, & Calhoun, 2012; Levinson, Barlin, Altman, & Satin, 2010; Rose, Waibel, & Schenarts, 2011). Novice trainees need these behaviors to be applied intensively, but as trainees progress, the intensity should be reduced and increasingly substituted by reflective feedback (White, Rodger, & Tang, 2016). Although many conceptual frameworks for teaching in the OR are available, most research relies on trainees’ perceptions of satisfaction or on expert opinions (McKendy et al., 2017). Research into which specific teaching behaviors contribute to the actual, objective acquisition of skills in trainees seems scarce. Studies on simulation training are also extensive (Dawe et al., 2014; Gurusamy, Aggarwal, Palanivelu, & Davidson, 2009; Sturm et al., 2008; Sutherland et al., 2006); however, these too rarely address which specific hands-on teaching behaviors are used. This makes it difficult to draw firm conclusions and make evidence-based recommendations on how to teach in the OR from a behavioral standpoint.
In order to address this evidence gap, we conducted a literature review study according to systematic review principles. Our aim was to identify hands-on, evidence-based teaching behaviors which have been shown to be effective for complex psychomotor skills acquisition in adult trainees. We looked for teaching behaviors that were associated with an objectively measured improvement of psychomotor skills, not a perceived improvement. By gaining more knowledge on this topic and discussing our findings, we aim to formulate recommendations for surgical teachers to help them teach complex surgical psychomotor skills.
2.1. Scoping Search
A scoping search showed that research on evidence-based teaching behaviors for hands-on surgical complex psychomotor skills was scarce. We considered a skill complex if it is executed with specialized equipment, involves multiple actions, and places conscious cognitive demands on the trainee. Previous research on surgical skills teaching turned to the fields of sports and music education due to similarities in complexity and training intensity (McCaskie et al., 2011; White et al., 2016). In keeping with this approach, we searched for teaching behaviors in the fields of surgical, medical, sports, and music education.
2.2. Search Terms
Together with a librarian specialized in systematic reviews, we defined our search strategy and searched for studies published prior to February 14, 2018, in the MEDLINE, PsycINFO and ERIC databases (Table 1). All references were imported into EndNote X7 (Thomson Reuters, Philadelphia, PA, USA).
2.3. Screening of References
All references were screened on title and abstract according to a three-stage pre-screening and screening approach (Figure 1). For pre-screening, a medical student was trained using three sets of 100 references that were independently screened by the student and one author (SA), achieving an inter-rater reliability (Cohen’s kappa) of 0.7. The student then pre-screened all references using inclusion and exclusion criteria (Table 2). We took a conservative approach: in case of any doubt, a reference was forwarded to the screening stage.
In the screening phase we applied stricter inclusion and exclusion criteria (Table 3). For training, two sets of 100 references were independently screened by two researchers (SA and JML) (Figure 1), reaching an inter-rater reliability (Cohen’s kappa) of 0.7. All references were then independently screened by SA and JML, achieving a moderate inter-rater reliability (Cohen’s kappa) of 0.6. Disagreements were resolved by a third researcher (CF) who was informed about the disagreement but blinded to the individual decisions made by the other researchers. Papers that were included by CF were accepted for full text screening.
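The inter-rater reliability figures above are Cohen’s kappa values, which correct observed agreement for the agreement expected by chance. A minimal, self-contained sketch of the computation (function name and example labels are illustrative, not taken from the review’s actual screening data):

```python
# Cohen's kappa for two raters' include/exclude screening decisions.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of references")
    n = len(rater_a)
    # Observed agreement: fraction of references both raters labeled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Example: 3 of 4 decisions agree, but kappa discounts chance agreement.
kappa = cohens_kappa(["in", "in", "out", "out"], ["in", "out", "out", "out"])
```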
For full text screening we applied the strict inclusion and exclusion criteria (Table 3). Because of the moderate inter-rater reliability obtained during screening, two extra training rounds with a total of 14 papers selected by SA were organized to ensure agreement on the interpretation of the inclusion and exclusion criteria. SA then screened all remaining papers on full text, while CF and JML each screened half of them. We only included papers in which the effectiveness of teaching behaviors was analyzed in relation to the objective acquisition of skills in trainees. Reference lists of included papers were screened for relevant references.
Figure 1. The selection of the reviewed articles.
Table 1. Search conducted in this systematic review study.
Table 2. Inclusion and exclusion criteria used during the pre-screening stage.
*We consider psychomotor skills to be complex if (1) specialized equipment is required for their execution, and/or (2) dynamic decision making is needed to select the proper skill to execute, and/or (3) execution of the skill requires conscious attention even after training (e.g., the skill is physically or cognitively challenging).
Table 3. Additional criteria for the screening stage on title and abstract and full text screening.
2.4. Data Extraction, Quality Assessment and Level of Impact
For each experiment in each included study, we extracted general information (authors and field of research), study set-up (research aim and design), outcome measures (evidence-based outcome measures used in the study, data collection methods and bias risk assessment), and the teaching behaviors that were shown to be effective in the trainees’ acquisition of skills.
To assess each experiment’s quality and risk of bias we used the Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale-Education (NOS-E) for quantitative research (Cook & Reed, 2015). The MERSQI focuses on study design, number of institutions used for sampling, response rate, subjective or objective data collection, validity evidence of the applied instruments, appropriate data analyses, and impact of the outcome measures. The NOS-E focuses on representativeness of the trainees, selection and comparability of a comparison group, likelihood of study retention, and use of blinded assessors. Because of their different focus, the two scoring systems are considered complementary (Cook & Reed, 2015).
For each experiment, individual MERSQI and NOS-E items were scored and total scores were calculated (maximum MERSQI score: 18; maximum NOS-E score: 6), and compared to the normative score of 12.3 (MERSQI) and 3.58 (NOS-E) (Cook & Reed, 2015). If an experiment used two methods to analyze data (two outcome measures), total MERSQI and NOS-E scores were given for each method and mean scores were calculated. All scores were assigned by one researcher (SA) and discussed with a second researcher (CF). A consensus process was applied in case of disagreement.
MERSQI and NOS-E total scores give an indication of the overall quality. However, quality assessment should also take into account the individual MERSQI and NOS-E items (Cook & Reed, 2015). Applying the MERSQI and NOS-E enabled us to compare and interpret the quality of studies in relation to the normative scores.
We independently determined the impact level for each experiment based on recommendations for evidence in medical education from Belfield, Thomas, Bullock, Eynon, and Wall (2001), who based their work on the research of Kirkpatrick. We identified four main impact levels (Table 4).
Of 18,337 references, seven studies describing eight experiments met the inclusion criteria (Table 3; Figure 1). Four experiments were conducted in sports education (Harrison, Fellingham, Buck, & Pellett, 1995; Wulf, Lauterbach, & Toole, 1999; Wulf, McConnel, Gartner, & Schwarz, 2002), two in music education (Duke & Henninger, 1998; Henninger, Flowers, & Councill, 2006), one in critical care education (McSparron et al., 2015), and one in surgical education (Flinn et al., 2016). Data from all experiments were derived from quantitative research methods. Six experiments focused entirely on adult teachers training adult trainees (Flinn et al., 2016; Harrison et al., 1995; Henninger et al., 2006; McSparron et al., 2015; Wulf et al., 1999; Wulf et al., 2002), while two included both adult and non-adult trainees (5th and 6th graders (Duke & Henninger, 1998), and 15- to 17-year-olds (Wulf et al., 2002)).
Table 4. Level of impact distinguished in our systematic review study.
Impact levels are ranked from lowest (1) to highest (4). Studies with an impact level of 1a and 1b were excluded from our review study (see inclusion and exclusion criteria; Table 3).
3.1. Study Quality
Using the MERSQI and NOS-E checklists, we identified sources of bias per experiment (Table 5). Total MERSQI and mean total MERSQI scores (ranging from 11.5 to 15) were either near or above the normative score of 12.3 (Cook & Reed, 2015). Total NOS-E and mean total NOS-E scores for all experiments (ranging from 1 to 3.5) were below the normative score of 3.58. Only one experiment used a method that achieved a NOS-E score above the normative score (4; Wulf et al., 2002). All experiments measured impact at the level of learner outcomes (Table 4 and Table 5). This enabled us to assess each experiment’s quality and interpret the strength of the results.
3.2. Research in Surgical and Medical Skills Training
Flinn et al. (2016) compared the effects of four feedback behaviors on the objective acquisition of complex laparoscopic skills (Table 5). Trainees who received harshly criticizing feedback performed worse than trainees who received encouraging positive feedback, minimal and neutral feedback, and no feedback. Trainees who received encouraging positive feedback did not perform better than trainees who received minimal and neutral feedback, and no feedback. The researchers concluded that it was not positive feedback which improved learning, but harshly negative and threatening feedback which impaired learning.
McSparron et al. (2015) analyzed the feedback and instruction behaviors of teachers while they taught subclavian central venous catheter (S-CVC) insertion to a trainee who was instructed to display challenging learning behavior. Subsequently, the teachers’ feedback and instruction behaviors were related to the performance of real novice trainees in a subsequent training session (Table 5). Positive feedback (interpreted by the researchers as constructive feedback), suggestions as to how to improve, and step-by-step demonstrations were positively correlated with trainee performance; the regular repetition of learning goals was negatively correlated. The researchers concluded that repeating learning goals may be less effective for technical skills teaching.
3.3. Research in Sports and Musical Skills Training
Duke and Henninger (1998) compared two types of feedback in teaching a musical performance skill: corrections focusing on what trainees had to improve versus corrections focusing on what trainees had done wrong (Table 5). No significant differences were found in trainees’ acquisition of skills. The researchers concluded that neutrally provided improvement feedback and fault-focused feedback were equally effective.
Wulf et al. (1999, 2002) compared the effectiveness of two types of feedback and instructions in teaching sports skills, which focused either externally, on the task and its effect, or internally, on how to move (Table 5). Regarding accuracy of the trained sports skills, trainees whose teacher provided externally focused feedback and instructions performed better during training and retention, regardless of trainee experience, and especially if teachers had provided feedback after every trial. Trainees could also see for themselves how accurately they had performed (i.e., they could derive perceptual feedback). For movement quality, internally focused feedback and instructions were more effective, but only in novice trainees.
Henninger et al. (2006) compared the teaching behaviors of music teachers in teaching a wind instrument (Table 5). Active trainee involvement, by stimulating trainees to ask questions, think aloud, etc., was shown to be important to teach effectively.
Harrison et al. (1995) compared the effectiveness of volleyball training in which the trainees determined the sequence and speed of progress versus volleyball training in which the teacher made all decisions (Table 5). No clear conclusions on what was more effective could be drawn.
3.4. Strength of the Evidence
The evidence for the described teaching behaviors is mostly weak and limited to improved learner outcomes in a training setting (second lowest level of impact; Table 4). We scored the strength of evidence as weak for the results and conclusions drawn by Harrison et al. (1995), Duke and Henninger (1998), Henninger et al. (2006), McSparron et al. (2015) and Flinn et al. (2016). Table 5 shows detailed information regarding the MERSQI and NOS-E bias risk assessment.
We scored the strength of the studies conducted by Wulf et al. (1999, 2002) as moderate; they provide the strongest evidence of all studies included in this review. Their mean total MERSQI scores are all above the normative score, and the highest of all included experiments. Their mean total NOS-E scores are all around, but mostly below, the normative score (Table 5). Still, important risks of bias remain.
Our goal was to identify evidence-based teaching behaviors which improve complex psychomotor skills acquisition in a hands-on training setting, applicable to the OR. Although research was abundant, only very few studies analyzed the effectiveness of teaching behaviors in relation to objective improvement of skills in trainees. Of 18,337 references, only seven studies met the inclusion criteria. Only one study focused on surgical skills teaching (in a simulation setting). Most evidence was derived from studies on teaching sports, a field that nevertheless involves comparably complex psychomotor skills. We identified evidence-based effective teaching behaviors related to feedback, instruction and active trainee involvement.
Feedback and instructions were effective if provided in a non-threatening (Flinn et al., 2016) and positive (McSparron et al., 2015) manner, if they contained suggestions as to how to improve and step-by-step demonstrations (McSparron et al., 2015), and if they directed trainees’ focus externally, to the task and its effect (Wulf et al., 1999; Wulf et al., 2002).
Threatening feedback was found to be harmful to trainees’ skills acquisition (Flinn et al., 2016). The importance of non-threatening feedback is supported by surgical review studies (McKendy et al., 2017; Timberlake, Mayo, Scott, Weis, & Gardner, 2017). Threatening feedback causes stress in trainees (Flinn et al., 2016), which is considered harmful since it is induced by a factor outside the learning task itself (Joëls, Pu, Wiegert, Oitzl, & Krugers, 2006; Schwabe, Joëls, Roozendaal, Wolf, & Oitzl, 2012; Vogel & Schwabe, 2016), namely the teacher. Interestingly, Flinn et al. (2016) did not find any effect of positive feedback, while McSparron et al. (2015) did find an increase in trainees’ skills. Comparing the two studies is difficult since exact definitions of positive feedback remained unclear. Duke and Henninger (1998) reported that negatively formulated feedback (focused on faults) and positively formulated feedback (focused on how to improve) were equally effective if provided in a neutral tone of voice. Suggestions as to how to improve were also shown to be effective (McSparron et al., 2015), a finding supported by review studies on intra-operative teaching (McKendy et al., 2017; Timberlake et al., 2017). Whether feedback is effective depends not only on how it is provided, but also on how it is received and interpreted (Ridder, McGaghie, Stokking, & Cate, 2015). This is a complex process influenced by variables like tone of voice, intention, non-verbal behavior and the previously established teacher-trainee relationship. It would be interesting to investigate how exactly these variables relate to the content of feedback messages and their effects on skills acquisition.
Externally focused feedback and instructions on task and effect were superior for improving accuracy and movement quality in sport skills (Wulf et al., 1999; Wulf et al., 2002), especially when provided intensively to experienced trainees and in a setting offering perceptual feedback (Wulf et al., 2002). Only novice trainees improved movement quality more with internally focused feedback and instructions on how to move. In surgical teaching, intensively provided feedback and instructions are considered effective only in novice trainees (McKendy et al., 2017; Timberlake et al., 2017) and only if trainees cannot derive sufficient perceptual feedback (Nicholls, Sweet, Muller, & Hyett, 2016; White et al., 2016). The finding that intense, externally focused feedback and instructions are effective in experienced trainees with possibilities for perceptual feedback may shed new light on the role of the focus, content and intensity of instructions and feedback, and on their effectiveness for teaching different surgical (sub)skills in relation to trainee experience and perceptual feedback.
Step-by-step demonstrations have also been shown to improve skills (McSparron et al., 2015). The importance of deconstructing skills into a series of small steps is also stressed in surgical reviews (McKendy et al., 2017; Nicholls et al., 2016; Timberlake et al., 2017). It is considered important to only provide trainees with the key instructions as to what to do. Other information may cause cognitive overload and prevent learning through distraction (Nicholls et al., 2016).
Henninger et al. (2006) found that actively involving trainees was effective in skills training. Although active involvement was not clearly defined, they considered it important for teachers to know when to direct trainees and when to allow them to talk, ask questions, and verbalize actions and thoughts. Surgical review studies support this finding, although their evidence is primarily based on perceptions and not objective measurements (McKendy et al., 2017; Timberlake et al., 2017). Verbalization by trainees is also considered a key step in surgical skills training because gaining insight into trainees’ reasoning processes helps to teach effectively (Nicholls et al., 2016).
Interestingly, the Peyton approach (Nicholls et al., 2016), which is often used in surgical skills training, seems compatible with the identified evidence-based teaching behaviors: non-threatening, externally focused feedback on task and outcome, instructions, suggestions on how to improve, step-by-step demonstrations and active trainee involvement. The effectiveness of the Peyton approach may be improved by the integration of these behaviors. However, the approach requires one skill to be taught and performed at least four times in a row (Nicholls et al., 2016), which, in our view, questions its compatibility with teaching in the OR.
4.1. Strengths and Weaknesses of Our Study
The strength of our review is its focus on the objective measurement of the effects of teaching behaviors on the acquisition of complex psychomotor skills. Our review addresses the growing need for optimal OR teaching behaviors as well as objective assessment of training quality and trainees’ skill level to assure safe and effective surgery performed by surgical residents. Our research underlines that much can be gained in the field of surgical educational research. Since only one surgical paper was included, one might question the inclusion and exclusion criteria we applied. However, the small number of papers from comparable professions and sports may imply that attention to objective outcomes of different teaching behaviors is generally limited.
The teaching behaviors with the strongest evidence originated from sports research, which raises the question of the translatability of our findings to surgery. Differences exist regarding very fine motor skills training (fingers). However, the intensity and extent of training necessary to reach proficiency are comparable. Also, methods proven effective in sports, like mental skills training in athletes, have been shown to be applicable to surgery (Anton & Stefanidis, 2016; Rogers, 2006; White et al., 2016). A limitation of our review study is the use of the MERSQI and NOS-E instruments, which are specifically designed for medical educational research. The items on these instruments, however, are formulated in a way that makes them easily applicable to other research fields. This review only selected papers written in English; to the best of our estimation, only seven of the 18,337 references were excluded because they were written in a language other than English. This makes us believe all relevant research is represented.
4.2. Implications for Future Research and Surgical Practice
We were surprised to find a lack of research investigating the effects of teaching behaviors on the objective acquisition of surgical skills. Future research should focus on defining teaching behaviors that are effective for the objective development of complex surgical skills in hands-on training settings (measuring impact beyond perceptions of effectiveness). We advise studying the possibilities for, and frequency of, externally focused feedback and instructions in surgical skills training, and the effectiveness of clearly defined feedback messages with respect to differences in formulation, tone of voice, intention and non-verbal behavior. Controlled longitudinal studies are required, conducted both in simulated settings, where tasks and teaching behaviors can be standardized, and in the real-life clinical OR setting. We advocate incorporating externally focused teaching behaviors into surgical teaching practice, first in simulation training of simple skills, gradually followed by more complex skills in the real-life OR environment.
Table 5. Summary and bias risk assessment of the experiments which were included in our review study.
M = The mean MERSQI and NOS-E total scores: (A + B) / 2.
* = If instrument validity was not applicable and the maximum potential MERSQI score was 15, the MERSQI score was corrected by multiplying it by a factor of 1.2 in order to obtain a potential MERSQI total score of 18.
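The two score adjustments described in the table notes, averaging the two methods’ total scores and rescaling a 15-point MERSQI maximum to the common 18-point ceiling, amount to simple arithmetic. A sketch (function names are ours, for illustration only):

```python
# Illustrative arithmetic for the score handling described in the table notes.

def scale_mersqi(total, max_possible=15.0, target_max=18.0):
    """Rescale a MERSQI total to the common 18-point ceiling (factor 1.2 when
    the maximum possible score is 15, i.e. instrument validity not applicable)."""
    return total * (target_max / max_possible)

def mean_total(score_a, score_b):
    """Mean of the two methods' total scores: (A + B) / 2."""
    return (score_a + score_b) / 2
```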
We would like to thank Alice Tillema, librarian at the Radboud University Medical Centre, for her help in defining and conducting the search applied in our review study.
The authors have no conflicts of interest to report.