To improve the skills of interviewers and quality of psychological counseling, we focused on reflections of interviews. This chapter summarizes the background of this research and previous studies and describes the purpose of this research.
1.1. Visualization and Quantification of Psychological Counseling
In various cases of psychological support, including assessment and community action, psychological interviews (i.e., counseling) are adopted, and the quality of the interview determines the quality of psychological support. Consequently, the interviewers are required to constantly improve their skills. Egan, G. suggested that interviewers “build on new competences acquired through solving challenges faced in the actual interview based on their knowledge and skills,” emphasizing the importance of understanding the evolutionary process of skills (Egan, 1998). It is useful to visualize psychological interviews to understand this evolutionary process. Also, Corey and Corey have focused on the interaction between clinical psychologist and client in the interview and argued that self-understanding and self-analysis are important in reflecting on the interview (Corey & Corey, 1993). Furthermore, Schön, D. A. has defined a psychological interview as an event in which the interviewer sets new questions in the process of interview each time and improvises his/her response to the questions (Schön, 1984). He then explains the response in reference to the “reflective practitioner” model (Schön, 1984). Thus, reflection of interviews is important in improving the interviewer’s skills and in examining the quality of the interview, for which it is thought that visualization of the counseling process is useful.
In many cases, reflection of the psychological interview is carried out by evaluating the transcribed interview and the interviewer’s reflection by a group or a supervisor. In this method, as the interview is evaluated as a flow based on the transcript, the evaluation varies significantly depending on the supervisor or group, which entails a problem of objectivity (Flyvbjerg, 2006; McLeod, 2010; McLeod & Elliott, 2011). In response to this problem, many attempts have been made to quantitatively evaluate the quality of the psychological counseling (e.g., Kodama, Hori, Tanaka, & Matsui, 2018; Kodama, Tanaka, Shimizu, Hori, & Matsui, 2018; Nagaoka & Komori, 2008; Ramseyer & Tschacher, 2011). As quantitative evaluation should return similar results with different supervisors or discussion groups, it might offer an objective perspective in the reflections of psychological interview, improve reflection and the interviewer’s skills, and contribute to the psychotherapy and psychological support as a whole based on the interview.
1.2. Previous Studies of Descriptions of Psychological Counseling
1.2.1. Qualitative Method
In reaction to the free-style writing suggested by Freud (1901, 1909), a case study method using one or more cases has been developed as a way of evaluating psychological interviews (Stiles, 2007; Turpin, 2001). McLeod (2010, 2013) has identified four types of case study methods: the outcome-oriented case study, theory-oriented case study, pragmatic case study, and experiential or narrative case study. Among these, the pragmatic case study is a relatively detailed approach that clarifies the effect of the treatment plan and intervention, and it is recommended for use with quantitative indices (McLeod, 2010, 2013).
One of these relatively detailed approaches divides the interview into qualitative chunks according to a certain theoretical framework and evaluates each chunk. An example is the five-phase structure of Ivey’s micro-counseling. Micro-counseling is a training program developed to aid the acquisition of counseling skills, and Ivey proposes structuring the interview in five phases: “Rapport: to develop rapport and get information for structuralization”, “Information gathering: to get more information about the client’s competences and determined his/her constructive resources”, “Goals: to listen to the client’s want and discuss about mutual setting of goals”, “Working: to discuss about possible ways of achieving his/her goals”, and “Completion: to help the client’s action and generalization”. This structure also provides a framework for a qualitative description of the psychological interview (Ivey & Authier, 1978).
The strengths of these qualitative methods include the following: They are diverse, allowing discussion from multiple points of view; they make longitudinal examination easier; and it is easier using such methods to take into account contextual information and to present practical behavioral schema as narratives (McLeod & Elliott, 2011). On the other hand, problems such as the following have been pointed out: The method could be easily biased, it has low generalizability, and it is difficult to identify causation (Flyvbjerg, 2006; McLeod, 2010; McLeod & Elliott, 2011).
1.2.2. Motivational Interviewing Skill Code (MISC)
MISC is a method that attempts to evaluate interviews quantitatively. Motivational interviewing is a client-centered and goal-oriented intervention method based on empirical evidence. It supports the client in his/her efforts to explore and resolve ambivalence about behavioral change, focusing on what the client says about behavioral change and how he/she says it. A statement that suggests the client’s orientation toward change is referred to as “change talk” (for example, “I have to stop drinking”), and a statement that aims to maintain the status quo (for example, “I can relax with drink”) is referred to as “sustain talk” (Miller, Moyers, Ernst, & Amrhein, 2013).
It is assumed that the motivational interviewing would facilitate the client to make decisions about change in a client-centered manner, and that the interview integrates concrete behavioral skills. Miller and Rose (2009) have identified two major factors in the motivational interview: relational and technical factors (Miller & Rose, 2009). The relational factor in the motivational interviewing refers to the process in which the empathizing and accepting interviewer builds a collaborative relationship with the client, supports the client’s autonomy, and strengthens the client’s motivation for change. The technical factor comprises certain behaviors of the interviewer to elicit and strengthen the client’s statement (Moyers, 2014). It has been empirically shown that an effective integration of relational and technical factors encourages the client’s utterance at the interview and leads to the treatment outcome (Miller & Rose, 2009).
Researchers mainly use the observation evaluation tool called the Motivational Interviewing Skills Code (MISC) to evaluate the language used in the interview by the interviewer and client, behavioral mechanisms, and effective factors (technical and relational). MISC is designed to evaluate the coherence of the motivational interview and evaluates MI consistent behaviors (MICO), interpersonal relational style, and the client’s statement about behavior change in the motivational interviewing. It counts MI-consistent responses (e.g., affirmation, emphasizing client control) and MI-inconsistent responses (e.g., offering advice without permission, confronting, directing, warning) and provides an overall evaluation based on the interpersonal relational style, i.e., the interviewer’s behavior through the session, such as showing empathy and acceptance. Therefore, while examining the interview process using MISC in this manner, it is important to pay attention to the relational and technical factors that interact to elicit the client’s behavioral change.
MISC 2.5 (Houck, Moyers, Miller, Glynn, & Hallgren, 2010) is used by the evaluator to code an audio-recorded work sample. MISC 2.5 is a comprehensive method to identify the interviewer’s and client’s linguistic behaviors deemed to be important in the process of the motivational interviewing. Using MISC 2.5, the evaluator allocates specific codes to each utterance in order to evaluate sequentially correlated flow of the interviewer’s and client’s utterance. This provides information regarding the direction of the motivational interviewing: How the interviewer is focused on the topic of certain behavioral changes. MISC 2.5 applies six dimensions, empathy, acceptance, collaboration, evocation, autonomy/support, and direction, which are measured on a 5-point Likert scale as global ratings of the interviewer. To measure the global rating of the client, a self-exploration scale is used. In contrast to the overall impression of the interview shown in the global rating, the behavior count shows specific utterances in the interview that correspond to a particular behavior and each behavior count is included in the generated sum. MISC 2.5 codes the interviewer’s behavior into 17 categories (simple reflections, complex reflections, etc.). A simple reflection is the interviewer’s statement in which he/she restates the client’s language. A complex reflection is the interviewer’s statement wherein he/she assumes what the client wants to say before anything has been said, and thus, adds meaning to the client’s statement. The client’s utterances are coded into 15 categories (Follow/Neutral/Ask, change talk, or sustain talk). In MISC, the utterance in the interview is coded based on the audio recording, allowing a quantitative description and evaluation of the interview.
1.2.3. Natural Language Processing
General conversations can be quantified by transcribing the interview and applying natural language processing. The dynamic topic model (Blei & Lafferty, 2006) is a natural language processing method that detects punctuation (topic boundaries) from transitions in word frequency. It allows researchers to construct a probabilistic model of topic shifts across chronologically ordered sentences. As for quantifying psychological interviews through natural language processing, Can et al. attempted to detect counselor reflections from the words used (Can et al., 2016).
While dynamic topic models can be used to detect chronological changes in topics, the topic model (Blei et al., 2003; Liu et al., 2016), another major natural language processing method, can identify topics in specified sections. Applying the topic model to clinical psychological data, which have relatively diverse content from a small number of cases, poses several technical challenges: the complex processing makes it difficult for the interviewer to anticipate the generated result, and accurate processing requires large amounts of data. A classical method that is less likely to fail is text tiling (Hearst, 1994), in which the conversation is divided into simple chronological chunks, from which the characteristics of the language of each period are extracted. In text tiling, a window of a certain width is created around an utterance, and the conversation is divided by determining how different the word occurrence rates in the window are from those in the overall conversation. Characteristic words can also be extracted for each punctuated conversation period through commonly used methods such as Term Frequency-Inverse Document Frequency (TFIDF; Luhn, 1957).
1.2.4. Interpersonal Bodily Synchrony
Recently, nonverbal behavior such as interpersonal bodily synchrony has also been investigated to describe one aspect of psychotherapy quantitatively using video-image processing techniques (e.g., Nagaoka, Yoshikawa, & Komori, 2006; Ramseyer & Tschacher, 2011). It has been investigated widely in human communication studies, including psychology (for reviews, Bernieri & Rosenthal, 1991; Keller, Novembre, & Hove, 2014; Schmidt & Richardson, 2008; Vicaria & Dickens, 2016). Interpersonal bodily synchrony is considered to be interconnected with psychological factors and social functions, such as building social relationships (rapport), affiliations, or empathy, and facilitating communication (Bernieri, 1988; Bernieri, Gillis, Davis, & Grahe, 1996; Hove & Risen, 2009; Maurer & Tindall, 1983; Wiltermuth & Heath, 2008).
In clinical psychology, interpersonal bodily synchrony is considered a significant factor or relevant element of psychotherapy skills (Tschacher & Bergomi, 2011; Tschacher & Pfammatter, 2016). Previous studies have indicated that interpersonal bodily synchrony is associated with session-level processes and therapy outcomes (Ramseyer & Tschacher, 2011) and that interpersonal bodily synchrony can act as an indicator reflecting the mental process of a client or counselor (Nagaoka et al., 2006). These studies have applied quantification methods (e.g., image-processing techniques, time series analysis) to video data to measure interpersonal bodily synchrony, using objective evidence to reveal the importance of such interpersonal bodily synchrony in psychotherapy (for reviews, Koole & Tschacher, 2016; Ramseyer & Tschacher, 2006). We suppose that interpersonal bodily synchrony can reflect psychological processes and social relationships in psychotherapy.
In the long term, this study aims to develop a system that can offer objective indicators that would help in improving psychological interviewing skills. As mentioned previously, because qualitative methods have both strengths and weaknesses, more useful information can be obtained for post-interview reflections by compensating for the weaknesses of the qualitative method with quantitative techniques.
The present paper aims to prove the concept of visualization and punctuation of psychological interviews using quantitative methods in addition to the conventional qualitative method. To this end, we combined a semi-quantitative method in the form of MISC and quantitative data analysis methods (text and video data analyses) with qualitative evaluations. Finally, we visualize the data of a case and examine whether this contributes to reflection of the interview.
One male university student (21 years old) and one female psychotherapist (64 years old) participated in our experiment. Participants were recruited from the universities to which the authors belong. The selected participant was physically and psychologically healthy. The psychotherapist was one of the coauthors, who has more than 10 years of clinical psychotherapy experience. The client was counseled for 50 minutes in the counseling session. The client said he had nothing to tell but, after reporting his work at university and talking about how to make contact with his friends, he began to talk about bullying at school. The procedures were approved by the research ethics committee of Kanagawa University, where the experiment was conducted. Each participant provided written informed consent to participate in this study. The student was paid 1,000 JPN yen/hr for his participation.
The participants faced each other and were seated 1 meter apart. A video camera (Handycam HDR-PJ540, Sony; 24 FPS) and two microphones (4071 Miniature Omnidirectional Microphone, DPA Microphones) were used to record and collect audio-visual data. The two microphones were separately connected to an audio mixer (R-26 Portable Recorder, Roland). The left and right microphones were used for the interviewer and the client, respectively. For data processing and analysis, a personal computer (LET’S NOTE CF-SX2, Panasonic) and software (MATLAB R2014a, MathWorks; R 3.1.2) were used. To quantify how much each participant moved, a video image processing technique was performed on a laptop computer (MacBook Pro 13-inch, Apple) with Motion Energy Analysis software (MEA 3.10, Ramseyer & Tschacher, 2011).
The data were qualitatively analyzed by several researchers and clinical psychologists who observed the video and interviewed the psychotherapist in the experiment. We also identified the phase punctuation of the session in terms of Ivey’s micro-counseling as a qualitative method. At the same time, we conducted various quantitative analyses explained in the Introduction section: MISC, text data-based analysis, and video-based analysis in terms of interpersonal bodily synchrony. Finally, the results of the qualitative and quantitative analyses are compared and discussed.
In the experimental room, a video camera was positioned 3 meters from the participants. The interviewer was seated in the room, waiting, before the client entered. She was asked to provide psychological counseling for 50 minutes, and the experimenter told her that the client was a male university student. She was not informed of the client’s problem before the session. In another room, the client was informed that he could have a free counseling session for 50 minutes. After the client arrived at the room, he sat down in the chair facing the interviewer. This was the first contact between them. The experimenter attached microphones to them, started the recording, and left the room. The counseling session then started and was videotaped by the camera for about 57 minutes. After the session finished, the interviewer called the experimenter into the room and he stopped the recording.
2.4. Data Analysis
2.4.1. Qualitative Method
To qualitatively examine the content and proceedings of the interview, the current study employed the five-phase structure of micro-counseling suggested by Ivey (Ivey & Authier, 1978). Micro-counseling is a training program developed to aid the acquisition of counseling skills (Ivey & Authier, 1978). It is seen as focusing on counseling skills that form the basis of various psychological treatment and counseling theories and as constituting the basic skills or basic model of counseling (meta model) common across schools. The interviewer reviews various counseling skills through micro-counseling and learns to construct the interview intentionally. Ivey has identified five phases structuring the interview: “rapport and structuralization,” “information gathering,” “mutual setting of goals,” “working,” and “completion” (Ivey & Authier, 1978). Furthermore, “the client’s search for positive quality” is positioned at the core of the five phases. As we assume that this should be reflected on the interviewer’s attitudes, in the current study a qualitative analysis based on the micro-counseling model was conducted by the interviewer, one of the authors.
The study used the Japanese version of MISC 2.5. The recorded audio data were punctuated by utterance and coded using the CASAA Application for Coding Treatment Interactions (CACTI), software developed by CASAA (Houck et al., 2010). CACTI is an open-source sequential coding program.
Coding for the MISC 2.5 is performed in three separate passes. In the first pass, the coder listens without interruption to the entire recording and completes a set of five Likert-type global ratings. In the second pass, the recording is parsed into utterances (i.e., thought units). In the third pass, the coder applies behavioral codes to each interviewer and client utterance. The MISC 2.5 is a true sequential coding system that preserves the temporal order of behaviors.
Coding in MISC 2.5 is carried out in three passes. In the first pass, the evaluator listens to the whole interview, twenty minutes long, and makes an overall evaluation. In the second pass, the audio record is punctuated into utterances by the client and interviewer, and in the third pass, the evaluator carries out a behavioral evaluation of each utterance by the interviewer and client.
In the current study, an expert on MISC 2.5 who is also a member of an international organization of motivational interviewing trainers (Eiji Yamada) independently conducted the evaluation; the evaluator was a MINT trainer taught by the first author. Although the interview was not intended as motivational interviewing, the study used two markers as the punctuation of the interview: the client’s utterances showing ambivalence about the topic, which is a focus of motivational interviewing, and changes in his behavior as rated on the self-exploration scale, the global rating scale for the client.
2.4.3. Natural Language Processing
To extract the characteristic words of each period, both the word frequency within the period and the uniqueness of the word to that period relative to other periods must be considered. TFIDF (Wu, Luk, Wong, & Kwok, 2008) is an effective indicator for this task. The TFIDF for the i-th word in the j-th period is defined as
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_{i}.$$
Here, $\mathrm{tf}_{i,j}$ represents term frequency:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}},$$
$\mathrm{idf}_{i}$ is the inverse document frequency and is defined as
$$\mathrm{idf}_{i} = \log \frac{|D|}{|\{\, j : n_{i,j} > 0 \,\}|},$$
where $n_{i,j}$ indicates the frequency at which the i-th word appears in the j-th period and $|D|$ is the total number of periods. We extract the top 10 characteristic words for each period based on the calculated TFIDF values. In order to extract words that express the features of the conversation, words were extracted only from nouns, verbs, and adjectives, and compound nouns were analyzed as a single unit of meaning. In addition, nouns that appeared to have the same meaning were merged into the most frequent variant.
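The TFIDF ranking described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard definition (term frequency normalized by the number of words in the period, multiplied by the logarithm of the number of periods over the number of periods containing the word), it assumes each period has already been reduced to a list of words, and the function names `tfidf_by_period` and `top_words` and the toy word lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_period(periods):
    """Return one {word: tfidf} dict per period.

    `periods` is a list of word lists, one list per conversation period.
    """
    n_periods = len(periods)
    # df[w]: number of periods in which word w appears at least once
    df = Counter()
    for tokens in periods:
        df.update(set(tokens))

    scores = []
    for tokens in periods:
        counts = Counter(tokens)
        total = sum(counts.values())
        scores.append({
            w: (c / total) * math.log(n_periods / df[w])
            for w, c in counts.items()
        })
    return scores


def top_words(score, k=10):
    """Top-k words by TFIDF; ties are broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


# Toy data (hypothetical): three periods of already-tokenized words.
periods = [
    ["station", "campus", "station", "friend"],
    ["research", "sales", "research", "friend"],
    ["friend", "school", "bullying", "school"],
]
scores = tfidf_by_period(periods)
print(top_words(scores[0], k=2))  # prints ['station', 'campus']
```

A word that occurs in every period, such as "friend" in the toy data, receives a score of zero, which is why it never appears among the characteristic words of any single period.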
To calculate TFIDF, all spoken words were converted to text and decomposed to the word level by morphological analysis, for which we used the tokenize function (https://github.com/python/cpython/blob/3.7/Lib/tokenize.py) in Python 3.0. To extract words representing the character of the period, we limited the words analyzed in the morphological process to nouns.
To divide the conversation into periods quantitatively, we used text tiling (Hearst, 1994). Text tiling divides the text into periods by calculating, at each utterance, the similarity between the word groups before and after it, where similarity is calculated in the same manner as with TFIDF. We set the window width, that is, the number of utterances used to calculate the similarity at a given utterance, to 30.
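The windowed comparison used in text tiling can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: it assumes utterances are already tokenized into word lists, it compares raw word counts with cosine similarity instead of TFIDF weights, and the function names (`gap_similarities`, `boundaries`), the threshold value, and the toy utterances are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def gap_similarities(utterances, window=30):
    """Similarity of the `window` utterances before and after each gap.

    Gap g lies between utterances[g - 1] and utterances[g].
    """
    sims = []
    for g in range(1, len(utterances)):
        before = Counter(w for u in utterances[max(0, g - window):g] for w in u)
        after = Counter(w for u in utterances[g:g + window] for w in u)
        sims.append(cosine(before, after))
    return sims


def boundaries(sims, threshold):
    """Gaps whose similarity is a local minimum below `threshold`.

    Returns the index of the first utterance of each new period.
    """
    cuts = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else float("inf")
        right = sims[i + 1] if i + 1 < len(sims) else float("inf")
        if s < threshold and s <= left and s <= right:
            cuts.append(i + 1)
    return cuts


# Toy conversation (hypothetical): the topic changes at the fifth utterance.
utterances = [["town", "station"]] * 4 + [["friend", "school"]] * 4
sims = gap_similarities(utterances, window=2)
print(boundaries(sims, threshold=0.5))  # prints [4]
```

A wider window smooths the similarity curve and yields fewer, longer periods, whereas a narrow window reacts to short digressions, so the window width acts as the main tuning parameter.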
2.4.4. Interpersonal Bodily Synchrony
A video image processing technique was used to quantify how much participants moved (i.e., Motion energy; Ramseyer & Tschacher, 2011). After downsampling from 24 FPS to 12 FPS, smoothing by low-pass filter (2nd-order Butterworth filter, cut-off frequency: 5 Hz), and normalization by Z-score, we calculated the mutual information (Fraser & Swinney, 1986) between the time series of the two participants to quantify the degree of interpersonal bodily synchrony. Mutual information is a nonlinear measure of dependency based on the Shannon entropy (Shannon, 1948) that indicates a reduction in uncertainty (i.e., the gain in information about one of the random variables after observing the other) (Ragert, Schroeder, & Keller, 2013). It can quantify the amount of uncertainty about a random variable (X) reduced by the observation of another variable (Y) (Kostrubiec, Dumas, Zanone, & Kelso, 2015). Previous studies applied it to quantify the degree of synchrony/coordination or coupling between the body movements/sounds of participants (Kodama, Hori, et al., 2018; Papiotis, Marchini, Maestre, & Perez, 2012; Ragert et al., 2013). In this study, the mutual information between the body movements of the participants was calculated to quantify interpersonal bodily synchrony for each 30-second segment of the session.
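The segment-wise mutual information can be illustrated with a simple histogram estimator. This sketch makes several assumptions that are not taken from the study: the two motion-energy series are assumed to be already downsampled, filtered, and z-scored; the values are discretized into a fixed number of equal-width bins (eight here, an arbitrary choice), which differs from the adaptive partitioning of Fraser and Swinney (1986); and the function names and the synthetic sine-wave inputs are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information (bits) between x and y."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(bx)
    pxy = Counter(zip(bx, by))       # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    return sum(
        (c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in pxy.items()
    )


def segment_mi(a, b, fps=12, seconds=30, bins=8):
    """Mutual information for consecutive non-overlapping segments."""
    size = fps * seconds
    n = min(len(a), len(b))
    return [
        mutual_information(a[s:s + size], b[s:s + size], bins)
        for s in range(0, n - size + 1, size)
    ]


# Synthetic stand-ins for two preprocessed motion-energy series (60 s at 12 FPS).
a = [math.sin(t / 10) for t in range(720)]
b = [math.sin(t / 10 + 0.5) for t in range(720)]
print(segment_mi(a, b))
```

Histogram estimates depend on the number of bins and on the segment length, so absolute values from a sketch like this are comparable only within one analysis that keeps both settings fixed.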
3.1. Qualitative Method
In analyzing the interview, as the interview was carried out with an experiment participant who did not have an urgent matter for consultation but participated in the interview in order to generate data, we set rapport and structuralization in the five phases as rapport construction and the topic setting for the interview. Information gathering was set as listening to the participant’s story regarding the set topic, while mutual setting of goals, unlike in a normal interview, was not set as the goals of the interviews that the interviewer and client mutually agreed for the interview; for the interviewer, the goal was set as encouraging the participant’s self-understanding by talking at the interview, and when the participant stated his/her goal regarding what was being talked about, this was described as C1’s goal. For working, we used the definition used in the five phases of the interview and included a variety of statements of the participant until the discussion of the topic concluded, such as the selection of the options to achieve the goal when discussing the topic, and the comprehension of contradictions discovered and the determination of action. When we judged that the participant conveyed a summation of the conclusion of the topic under discussion to the interviewer, it was marked as “completion”.
S1 refers to conversations to form rapport and gather information to set the theme of the interview. In S2, information was gathered after having set the theme of the interview “research” based on the rapport construction and structuralization of the interview. While the conversation on the goal of research took place and work to achieve the goal was carried out, the two subject areas were consolidated with the common phrase “the expansion of the sales route.” As the participant managed to confirm what he had been researching so far, he was encouraged to reflect on post-graduation life, which led to the completion. In S3, self-exploration in reference to a relationship proceeded through working (friends → girlfriend) with some resistance, and it completed when the participant stated that his interpersonal relationship style was maintained by effort and motivation and he also had interpersonal relationships in which he could be natural due to the existence of “local friends” (Table 1).
As the interview sample used in the study was not intended as motivational interviewing, we did not carry out an evaluation of “direction,” “evocation,” and “supporting autonomy,” three dimensions of the MISC global ratings of the interviewer. We did not find a complex reflection in the interviewer’s behavior to increase momentum for the client’s specific behavioral change. In this interview, the interviewer mainly engaged in information gathering with closed questions and provided encouragement, reflecting the overall interviewer’s skills factor. Consequently, we focused on examining the ways in which the relational factor exerted influence on the client’s utterance or self-exploration.
As for the punctuation of the interview, we defined the section from the introduction (structuralization) until settling down on the topic (Th.686) as the first phase. In terms of the number of utterances, this accounts for about 44 percent of the whole interview. Much time was spent finding an agreed focus of the interview. In the first phase, the interviewer gathered information by closed questions and simple reflections and led the interview with prompts such as, “oh, is it?” The interviewer showed active and positive interest in the client and maintained an attitude of respecting the client’s expertise and wisdom. Due to this, the client’s utterances increased. Also, at Th.326, the interviewer asked a question about the client’s interests as shown through the interaction so far, “I think you are interested in people.” After this, the client showed the value to himself of talking with local “people.”
Table 1. Estimated talk period by quantitative method (Structure shows where the segment is located in the five-phase structure of micro-counseling).
In the second phase, ambivalence about continuing to study or finding a job was shown, and the level of self-exploration was evaluated as having deepened by a level. After this, the client voluntarily talked about his concern and the number of utterances touching upon value was increased.
In the third phase, the topic shifted to the ways in which he spent time with his high school friends and fellow seminar members at the university, and the value to behave “in accordance with expectations.” The interviewer accepted the client without judging his behaving “in accordance with expectations” and maintained active interest in him. In the situation in which the relational factor was fully presented, the client stated he found relationship tiring and at the same time showed he spent energy on relationships. The client shifted the topic to his relationship with friends and the amount of utterances showing interest in values in interpersonal relationships increased. The number of utterances touching upon the client’s value was the largest in the third phase. However, we need to bear in mind that this phase is not a punctuation on the basis of the same amount or time of utterance. In the third phase, the client showed that he held strong emotions and his self-exploration was deepest, as seen with reference to the link between emotion and behavior. While expressing a hope to become closer to his fellow seminar members as with his high school friends, he also showed that he was concerned whether he would be accepted as he was (ambivalence). The interviewer did not use the skills factor of exploring and resolving ambivalence in order to facilitate behavioral change but dealt with the client by means of acceptance, empathy, and collaboration.
In the fourth phase, the client made a statement on the coping strategy in interpersonal relationship (behavior) that suggests a sense of self-efficacy such as “I am now trained” and “now I can do this rather well.” The interviewer asked through what kind of efforts and experiences he trained himself, and in general he talked about being bullied in the past, which he would avoid letting others know about. However, the client looked as if he was reporting a past fact and did not seem to be presenting a new insight or emotion, and thus the self-exploration level was not evaluated as deepened (Table 2).
Following the interaction up to the fourth phase, in the fifth phase it was shared that the client was concerned with interpersonal relationships (worry), that he found it comfortable to be accepted as he was, and that he paid great attention to “behavior”. His self-exploration deepened again as he re-acknowledged the meaning of the behavior.
Table 2. Estimated talk period by MISC.
3.3. Natural Language Processing
In Phase 1, the name of the place where the participant lived and place names around the university were extracted as characteristic words, which means that proper nouns uttered during the icebreaker were extracted. In Phase 2, words expressing the character of the place were extracted, which suggests that the participant discussed his location in more detail. In Phase 3, a different place name and commuting-related words were observed, which suggests that the topic was still general but had moved to a different place. In Phase 4, there was a shift to verbs related to the client’s actions, suggesting that the topic moved to the client’s own situation. In Phase 5, words referring to the past appeared, indicating that the problem was being discussed. In Phase 6, the words became general terms, indicating that the conversation was heading toward closing. The last phase did not have many utterances, but it suggested that a conversation with an utterance tendency different from that of Phase 4 was taking place toward closing.
Figure 1. Estimated talk period by text tiling.
Table 3. Characteristic words for the conversation period.
3.4. Interpersonal Bodily Synchrony
Figure 2 shows the motion energy of each 30-sec segment. The X-axis is time (56 minutes). The Y-axis is the motion energy value averaged across each 30-sec segment. The gray and black solid lines indicate the motion energy of the interviewer (mean = 605.45, SD = 317.35) and the client (mean = 279.86, SD = 289.59), respectively. The dashed lines indicate the mean values of the interviewer (gray) and client (black), showing that the interviewer moved more than the client during the psychological counseling. This result agreed with observation of the video data.
Figure 3 shows the mutual information of each 30-sec segment. The X-axis is time (56 minutes). The Y-axis is the mutual information value averaged across each 30-sec segment (mean = 0.094 bits, SD = 0.047). The dotted circles indicate the three points with the highest values of mutual information. In other words, in these segments, the interviewer and client were highly synchronized. The most synchronized segment was the ending scene (Scene 3, 51:30-52:00; mutual information of 0.26 bits). The second most synchronized segment was the later middle scene (Scene 2, 37:00-37:30; mutual information of 0.22 bits). The third most synchronized segment was the former middle scene (Scene 1, 13:30-14:00; mutual information of 0.21 bits). The values of mutual information in these scenes were more than two standard deviations above the mean of all segments.
Figure 2. Motion energy averaged across each 30-sec segment.
Figure 3. Mutual information averaged across each 30-sec segment.
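The segment-wise synchrony measure plotted in Figure 3 can be illustrated with a minimal sketch. It assumes a simple equal-width histogram (plug-in) estimator of mutual information and a population standard deviation for the threshold; the estimator, the number of bins, and all function names are assumptions made for illustration and are not necessarily those used in this study.

```python
import math
import statistics
from collections import Counter


def _bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index 0..bins-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls into the last bin


def mutual_information_bits(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information between x and y, in bits."""
    n = len(x)
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    xb = [_bin_index(v, x_lo, x_hi, bins) for v in x]
    yb = [_bin_index(v, y_lo, y_hi, bins) for v in y]
    joint = Counter(zip(xb, yb))  # joint bin counts
    cx = Counter(xb)              # marginal bin counts for x
    cy = Counter(yb)              # marginal bin counts for y
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with counts
        mi += (c / n) * math.log2(c * n / (cx[i] * cy[j]))
    return mi


def segment_mutual_information(a, b, samples_per_segment, bins=8):
    """Mutual information for each consecutive, non-overlapping segment."""
    n_segments = min(len(a), len(b)) // samples_per_segment
    values = []
    for k in range(n_segments):
        start = k * samples_per_segment
        stop = start + samples_per_segment
        values.append(mutual_information_bits(a[start:stop], b[start:stop], bins))
    return values


def high_synchrony_segments(mi_values, n_sd=2.0):
    """Indices of segments whose value exceeds mean + n_sd * SD (population SD assumed)."""
    threshold = statistics.mean(mi_values) + n_sd * statistics.pstdev(mi_values)
    return [i for i, v in enumerate(mi_values) if v > threshold]
```

Here `a` and `b` would be the motion energy series of the interviewer and the client, and `samples_per_segment` the number of video frames in 30 seconds. With the reported mean of 0.094 bits and SD of 0.047, a two-SD threshold is 0.188 bits, which the three highlighted scenes (0.21, 0.22, and 0.26 bits) exceed.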
4.1. Individual Discussion of Each Result
4.1.1. Subjective Evaluation
The analysis of qualitative data by means of the five-phase structure defined by micro-counseling found that, if we define the opening section in which rapport was formed as the first phase, the interview subsequently developed roughly along two themes. The first theme, “research,” appeared in scenes in which we could not clearly decipher the interviewer’s intention from the transcript. This matched the interviewer’s own post-interview reflection that it was difficult to gather information, to mutually set goals, and to set punctuation in working. This shows that, in the first half of the interview, it was difficult to set the focus of the dialogue between the client and interviewer. However, had the interviewer conducted the interview with the five-phase structure in mind, the theme would be expected to have been clarified even in the early stages of the interview.
As for the second theme, “interpersonal relationship,” we expected from the transcript that, as the client was conscious of this issue, he would feel reluctant to discuss it. However, as the dialogue with the interviewer continued, the client described maintaining his interpersonal relationship style to some extent through effort and motivation, and the interview concluded with him discussing his realization that, thanks to his “close friends,” he had also experienced interpersonal relationships in which he could be himself. The interview thus concluded with positive content.
Although we obtained these results, they are based on the interviewer’s subjective evaluation and derive from qualitative analysis of qualitative data. The problem of the analyst’s cognitive bias therefore remains.
4.1.2. MISC
The interview was not intended as a motivational interview. Consequently, it had a weak orientation toward facilitating change in targeted behavior. From the first to the fifth phase, relational skills such as empathy and collaboration made it easier for the client to talk.
Many closed questions were asked to gather information in the first phase. The interviewer attempted to verbalize her empathetic understanding while being careful not to offend the client. At the end of the first phase, the interviewer shared the client’s situation and interest, touched upon the client’s desires and values, such as “I want to chat with others,” and provided stimuli to get the client to talk more about his interest. In the following second phase, the client managed to show ambivalence about whether to continue studying or to find a job. The interviewer showed active interest in the client’s ambivalence and demonstrated her empathy by repeating the client’s utterance as she understood it. Because of this, in the third phase, the client commented on his “concern” about “how to maintain interpersonal relationships,” which could be ambivalent, and the number of utterances regarding desires and values increased. In the fourth phase, the interviewer repeated the client’s words describing the process through which the desires and values shown in the third phase were strengthened. This made the client reflect on his own weaknesses and experiences, which were usually hidden from sight. In the fifth phase, the client himself gave meaning to his interest and concerns, and he succeeded in reaffirming the ways in which he was trying to optimize his values and behavior in the situation in which he found himself.
It is possible to evaluate this interview as one in which the interviewer explored the client’s values, making the most of the relational factor with acceptance, empathy, and collaboration, and succeeded in focusing on the area in which the client was most interested through the interview process.
4.1.3. Natural Language Processing
We divided the interview into six phases by text tiling and detected characteristic words for each phase using the TFIDF criterion. The results indicated that the frequently appearing characteristic words captured the flow of the conversation and the content of each phase. These results could be used as objective indices when subjective and semi-subjective evaluations disagree. Furthermore, it is a simple method that does not require much computing power and yet ensures repeatability. On the other hand, as this method is based only on differences in the frequencies of uttered words, users should be mindful that it does not take into account subjective classification, contextual information such as phases in MISC, or information other than the interview content. The concept of text tiling could be applied not only to text information but also to other contextual information such as body motion, sound quality, or tone of voice.
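The characteristic-word detection described above can be illustrated with a minimal sketch. It assumes that each phase has already been tokenized into a list of words and treats each phase as one document. The toy English words and the function name `characteristic_words` are hypothetical, and the weighting shown (relative term frequency multiplied by the logarithm of the number of phases divided by the number of phases containing the word) is one common TF-IDF variant that may differ from the exact weighting used in this study.

```python
import math
from collections import Counter


def characteristic_words(phases, top_k=3):
    """Return the top_k highest-scoring TF-IDF words for each phase.

    phases: list of phases, each a list of word tokens (one phase = one document).
    """
    n_phases = len(phases)
    # Document frequency: in how many phases does each word occur?
    doc_freq = Counter()
    for words in phases:
        doc_freq.update(set(words))

    result = []
    for words in phases:
        counts = Counter(words)
        total = len(words)
        scores = {
            w: (c / total) * math.log(n_phases / doc_freq[w])
            for w, c in counts.items()
        }
        # Sort by descending score; break ties alphabetically for repeatability.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result


# Hypothetical toy phases (not data from this study).
toy_phases = [
    ["train", "train", "station", "talk"],
    ["study", "study", "lab", "talk"],
    ["friend", "friend", "talk", "talk"],
]
top_words = characteristic_words(toy_phases, top_k=1)
```

In this toy input, "talk" occurs in every phase, so its inverse document frequency is zero and it is never ranked above a phase-specific word. This illustrates why the method depends only on differences in word frequencies between phases.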
4.1.4. Interpersonal Bodily Synchrony
As reported in the Results Section (3.4), the mutual information analysis detected three scenes where the degree of synchrony was relatively high (Figure 3). Here, we discuss what we could observe in these highly synchronized scenes from psychological and clinical viewpoints and how we can interpret the results of the interpersonal bodily synchrony analysis.
The first highlighted scene (13:30-14:00) is regarded as part of the second, “collecting information” phase of the five steps of psychotherapy explained above (Ivey & Authier, 1978). In this phase, the interviewer asked the client about his graduate study. Observation of the video data showed that he was active in talking about his study topic and that their conversation became lively and interactive. For example, he nodded and said “yes” repeatedly in response to the interviewer’s questions, and sometimes they laughed together. We speculate that the relatively high bodily synchrony between them reflected such interactive communication.
The second highlighted scene (37:00-37:30) is considered part of the fourth, “working” phase of the five steps of psychotherapy (Ivey & Authier, 1978). In this phase, the interviewer asked about the client’s human relationships, particularly with his girlfriend. Video observation showed that they had an active and lively conversation with laughter. As in the first scene, we speculate that such active communication led to their bodily synchrony.
The third highlighted scene (51:30-52:00) is considered part of the last, “completing” phase of the five steps of psychotherapy (Ivey & Authier, 1978). In this phase, the interviewer listened as the client carefully described his human relationships and mentioned his past experiences of being bullied. After that, he reflected on his characteristics and on how to build a personal relationship, and he spoke the important keyword of this session, “how to behave.” This was the first time this phrase appeared in the session, and it seemed to be a meaningful answer for him on how to build a good relationship and behave better as his way of coping. Unlike in the other two scenes above, we did not observe lively conversation in this scene. We found, however, that they nodded deeply and showed empathic behaviors. These interactions might be reflected in the relatively high bodily synchrony found here.
4.2. General Discussion
4.2.1. Differences between Results
Figure 4 shows the subjective periodization, semi-subjective periodization by MISC, and objective periodization by text tiling as well as interpersonal bodily synchronicity on the same chronological axis.
S1 and S2 were weakly classified, mainly based on the tone of the interviewer’s conversation, and show a poor correlation with the other methods. On the other hand, the start of S3 agreed with almost all other methods. It is worth noting that the timing at which the subjective method indicated that the conversation became deeper (the starting point of S3) almost perfectly agreed with the results of MISC and text tiling.
Although M2 was close to M1 in terms of content, it was distinct from M1 because the counsellor’s response was more collaborative, empathetic, and accepting. MISC therefore captured a change that is difficult to detect with text tiling, which only evaluates the uttered words. On the other hand, the starting point of M3 agreed in all evaluations. M4 is a phase in which the ambivalence of the conversation was further strengthened and the topic shifted to love relationships, which almost entirely agreed with the result of text tiling. As for M5, in which a closing conversation was detected, a change in the direction of the conversation was detected, as in M2, which made it difficult for text tiling to pick up.
Figure 4. Difference between results of four methods.
T1 and T2 do not show much change in the clinical relationship, but the conversation topic shifted from accommodation in the first half to university life in the second. Thus, text tiling judged these as two separate phases. T2 and T3 were also separated simply because of a shift in the conversation topic, and M2 also captures some degree of change, which suggests that text tiling could come even closer to methods such as MISC, depending on how similarity is defined. T6 captures the chit chat that concluded the conversation. Detecting and removing the chit chat that helps the client relax at the beginning and end of the interview is a sufficiently useful capability for a simple method such as text tiling.
The last, fifth phase determined by MISC (M5) is an important scene from the psychological and clinical points of view. According to the results of MISC analysis, this scene is also important because it was the moment when the client found an answer, “my behavior is important,” for meaningful coping from his past experience of bullying that was important and of value to himself. We speculate that this scene was important because his self-exploration deepened, and he could obtain a feeling of self-efficacy from finding the answer. Such psychological effects might lead to bodily synchrony.
The subjective evaluation methods are seen to use a wide range of information such as conversational tone. Classification by MISC is a semi-quantitative method, which should yield smaller variation among evaluators than subjective evaluations do. However, it is based on the verbalized content and sound information and cannot consider nonverbal information such as facial expression. Text tiling is the most objective method, and complete objectivity can be secured if data cleansing and the parts of speech to be used are standardized. However, it is based solely on uttered meaningful words and, unlike MISC, cannot capture a change in relationship. Accordingly, it can be argued that MISC is effective if it is used along with a subjective method that detects phases based on nonverbal information. On the other hand, we need to be mindful that the objective method does not consider much of the information an interviewer would use, because the range of data it uses is limited. However, by using a nonverbal analysis such as body motion and a verbal analysis such as text tiling side by side or in combination, we can expand the range of information used in the analysis.
4.2.2. Clinical Implications
Interviews without concrete complaints, such as the one in this study, pose difficulties to the interviewer, as it is challenging to obtain a positive impression or suggestions for the future even if reflection is carried out. However, the qualitative and quantitative analyses conducted in this study suggest the following. The interview roughly followed two main themes, “confirmation of progress in research” and “positive approval of one’s interpersonal relationship style,” obtained through the client’s reflection on daily life, and each theme was completed. The client obtained an acceptable self-image through these processes. Adding quantitative analyses to qualitative ones can lead to positive results from two aspects.
The first aspect is that the result of qualitative analysis can be supported by quantitative methods, which are more objective and reproducible. In general, the scientific nature of qualitative research based on qualitative analyses is secured by whether the conclusion can be trusted (Saijo, 2005), and (internal) validity would require falsifiability as a result of visualization of the process in addition to transparency of the description of the process through which the conclusion is reached. This is accepted if the described result credibly reflects the data from which the conclusion is drawn (Nochi, 2005). However, in qualitative analyses, whether data are credibly reflected needs to be confirmed by other researchers carrying out confirmatory examination using the disclosed (visualized) data and the analytical process. This suggests a problem in reproducibility.
The second aspect is that it becomes easier to confirm the point at which the interview shifts and yields a new realization. For example, in the current study, examining what working led to the completion of a theme would reveal a concrete realization in the interview. The indices used in this study shed light on various aspects of the interview and indicate what went wrong. For example, there was a scene in the first half of the interview in which the interview did not progress smoothly because the interviewer misunderstood what she heard, something the interviewer was unaware of during the interview. In addition, it has been suggested that differences found when cross-referencing MISC, text tiling, and changes in body motion would prompt the interviewer to reflect. When the qualitative and quantitative analyses did not agree with one another, the qualitative method captured the inward shift in the conversation topic to the discussion of interpersonal relationships in the second half as a change in the flow of the conversation (a line). On the other hand, quantitative methods such as text tiling clearly captured the point at which the conversation shifted from accommodation to university life. This is useful for marking an objective shift in conversation that is often overlooked. Consequently, using quantitative indices as well doubly underpins the psychological counseling, both by supporting the subjective impression and by objectively summarizing the conversation.
The abovementioned considerations lead to the conclusion that the proposed method is more effective than the conventional approach for three reasons. First, the quantitative method employed in this study offered reproducible results. All calculations can be automated by fixing some parameters, and the results are available immediately after data acquisition. Second, body movements and frequencies of word usage can provide additional insights that are difficult to attain through conventional methods. For example, although the significance of nonverbal communication is widely recognized, few reflection practices consider body movements in psychological interviews. Finally, information about punctuations and the characteristics of each punctuated period may be useful as a summary to enhance an interviewer’s grasp and recall of past interviews. Figure 4 illustrates an example of the prospective utilization of the proposed method. The proposed method can help interviewers intuitively grasp and recall the contents of their interviews and reflect on them based on objective indicators.
4.2.3. Future Directions
Limitations of MISC 2.5 include its limited ability to capture characteristics that work as the key to another relational factor (the interviewer’s considerate self-disclosure to support the client’s emotional adjustment). MISC 2.5 is a tool that can show detailed information regarding the influence that the interviewer’s utterances and strategy exert on the client’s utterances and the interactive processes between the interviewer and client. Consequently, it can be employed to evaluate the quality of motivational interviewing practice. As free coding software for MISC 2.5 has been developed, the cost of producing transcripts has been reduced. On the other hand, as the coding is carried out based on the interview’s sound data, it is limited in capturing nonverbal characteristics such as facial expression and movement and other factors that are key to the relational factor.
In the future, we plan to continue extracting relational and technical factors related to behavioral change from the audio and video recordings of clinical interviews in Japanese based on MISC 2.5, testing hypotheses, and accumulating interview data. In addition, we need to make more use of Information and Communication Technology to reduce the cost of coding.
In the current study, we applied a frame differencing method (i.e., MEA) and simply quantified the amount of whole-body motion. This presents a limitation: the whole-body motion of participants is analyzed only roughly, whereas different body parts are supposed to have different associations with counseling outcomes (Tschacher, Rees, & Ramseyer, 2014). By using new video image processing techniques such as OpenPose (Cao, Hidalgo, Simon, Wei, & Sheikh, 2018), researchers can detect more detailed information on the human body based on skeletal structure (head, hand, foot, and so on). In a future study, we should apply such a quantitative method to analyze in greater detail the body movements of participants that have clinically important meanings, such as nodding and gestures (Bavelas, Chovil, Lawrie, & Wade, 1992). Among time-series analysis methods, the current study applied a nonlinear method of calculating mutual information to assess interpersonal bodily synchrony. We can also apply other methods of time-series analysis to quantify and visualize psychological counseling in more detail, such as recurrence analysis (Fusaroli, Konvalinka, & Wallot, 2014; Shockley & Riley, 2015), wavelet-based analysis (Fujiwara & Daibo, 2016; Issartel, Bardainne, Gaillot, & Marin, 2014), and causality analysis (Kostrubiec et al., 2015; Okazaki et al., 2015).
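The frame differencing idea behind the motion energy measure can be sketched as follows. This is a minimal illustration that assumes grayscale frames given as nested lists of pixel intensities (0 to 255) and counts the pixels whose intensity changes by more than a threshold between consecutive frames. The threshold value, the function names, and the absence of a per-person region of interest are simplifying assumptions, not details of the implementation used in this study.

```python
def motion_energy(frames, threshold=10):
    """Count changed pixels between each pair of consecutive frames.

    frames: list of grayscale frames, each a list of rows of ints (0-255).
    Returns one value per consecutive frame pair (len(frames) - 1 values).
    """
    energy = []
    for prev, curr in zip(frames, frames[1:]):
        changed = sum(
            1
            for row_prev, row_curr in zip(prev, curr)
            for p, c in zip(row_prev, row_curr)
            if abs(c - p) > threshold  # ignore small changes such as sensor noise
        )
        energy.append(changed)
    return energy


def segment_means(values, samples_per_segment):
    """Average values over consecutive, non-overlapping segments."""
    n_segments = len(values) // samples_per_segment
    means = []
    for k in range(n_segments):
        chunk = values[k * samples_per_segment:(k + 1) * samples_per_segment]
        means.append(sum(chunk) / samples_per_segment)
    return means
```

In practice, each frame would first be cropped to a region covering one participant so that the interviewer and the client yield separate series, and `samples_per_segment` would be the number of frame pairs in 30 seconds. Because every changed pixel is counted equally, the measure cannot distinguish a nod from a hand gesture, which is the limitation that skeleton-based techniques are expected to address.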
Turning to methods for detecting objective conversation periods in a clinical psychological scene in which the number of words per period varies enormously, it is expected that the use of LCseg (Galley et al., 2003) or TopicTiling (Riedl & Biemann, 2012) would make the identification of a period with a small number of words more accurate. On the other hand, it is also possible to develop a more pragmatic punctuation method by providing an algorithm-based definition of effective punctuation in clinical psychology. When significant amounts of data are available, it may be possible to apply machine learning to learn the interviewer’s punctuation. In any case, the present study confirmed the differences in punctuation among the methods only subjectively and supposed that they were valid to some degree. We should quantitatively clarify the validity, features, and differences of the punctuations produced by our methods using other conversation data. With regard to characteristic word detection, the TFIDF used in this study tends to reduce the TF value when the total number of words in a period is large. When the number of words per period varies enormously, there is a limitation in comparing the characteristics of words across periods using TFIDF. Consequently, using Okapi BM25 (Robertson & Zaragoza, 2010), which adds information about the total number of words in the document to TFIDF, would make the calculation more accurate. In developing objective methods for conversation, we should note that the use of complex algorithms would reduce ease of understanding and that indications of method accuracy in the general technical sense could differ from those of clinical effectiveness.
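How Okapi BM25 brings document length into the term-frequency component can be illustrated with a short sketch. The formula below is the standard BM25 term-frequency normalization. The parameter values k1 = 1.2 and b = 0.75 are commonly used defaults rather than values taken from this study, and the function names are hypothetical.

```python
def bm25_tf(count, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency component with document-length normalization.

    count:       occurrences of the word in the period (document)
    doc_len:     total number of words in the period
    avg_doc_len: average number of words per period
    """
    # Periods longer than average are penalized, shorter ones are boosted.
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return count * (k1 + 1.0) / (count + length_norm)


def bm25_weight(count, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Full BM25 weight for one word, given a precomputed IDF value."""
    return idf * bm25_tf(count, doc_len, avg_doc_len, k1=k1, b=b)
```

With these defaults, a word that occurs once in a period of average length receives a term-frequency component of exactly 1.0, the same single occurrence in a period twice as long receives a smaller value, and the component saturates below k1 + 1 however often the word is repeated. Setting b = 0 switches the length normalization off, which makes the effect of period length easy to inspect.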
Other natural language methods such as dynamic topic modeling can also be applied. However, in clinical settings, the use of highly developed summary-generating technology whose accuracy is not guaranteed is limited. We must confirm the degree to which interviewers need to understand the content of therapy by referring to a shortened summary produced by machine learning rather than by going back to the actual recording. For now, it can be argued that the first requirement for acceptance is visualization-level analysis whose calculation algorithm the interviewer can understand. Such basic indices include the mora number, the most basic natural language indicator used in our previous study (Matsui, Kodama, Wang, & Shiino, 2017), and various psychological tests using human sensing equipment such as autonomic nerve measurement and salivary amylase used in our previous study (Matsui & Kodama, 2018). It is expected that additional examination of various data and identification of their clinical significance would lead to the construction of a more pragmatic feedback system.
The current study carried out subjective qualitative analysis, semi-quantitative analysis (MISC), and quantitative analyses (natural language processing and motion energy analysis) for single case data. To improve objectivity and reproducibility, more samples and data must be collected, in addition to refinement and addition of quantitative indices discussed in the previous section. Gathering additional data would enable us to statistically examine the validity and characteristics of indices. Furthermore, conducting various kinds of interviews would provide meaningful suggestions on the robustness of indices according to the conversation content and which indicator is best used when. In addition, chronological change of single case data would facilitate an evaluation of each interview based on clearer results from past interviews (e.g., the success rate). It is assumed that technical innovation would help researchers analyze data at the prediction/optimization level. It would lead to the development of a system to improve interviewers’ skills and suggestions for improvement of interviews beyond the provision of indices for the reflection of interviews.
We are grateful to Mr. Taiyo Kato for his helpful advice and the useful Python code which enabled us to conduct natural language analyses.
*All authors equally contributed.
 Bernieri, F. J., & Rosenthal, R. (1991). Interpersonal Coordination: Behavior Matching and Interactional Synchrony. In R. Feldman, & B. Rimé (Eds.), Studies in Emotion & Social Interaction. Fundamentals of Nonverbal Behavior (pp. 401-432). Cambridge: Cambridge University Press.
 Bernieri, F. J., Gillis, J. S., Davis, J. M., & Grahe, J. E. (1996). Dyad Rapport and the Accuracy of Its Judgment across Situations: A Lens Model Analysis. Journal of Personality and Social Psychology, 71, 110-129.
 Blei, D. M., & Lafferty, J. D. (2006). Dynamic Topic Models. In Proceedings of the 23rd International Conference on Machine Learning (pp. 113-120). New York: ACM Press.
 Can, D., Marín, R. A., Georgiou, P. G., Imel, Z. E., Atkins, D. C., & Narayanan, S. S. (2016). “It Sounds like...”: A Natural Language Processing Approach to Detecting Counselor Reflections in Motivational Interviewing. Journal of Counseling Psychology, 63, 343-350.
 Fujiwara, K., & Daibo, I. (2016). Evaluating Interpersonal Synchrony: Wavelet Transform toward an Unstructured Conversation. Frontiers in Psychology, 7, 1-9.
 Fusaroli, R., Konvalinka, I., & Wallot, S. (2014). Analyzing Social Interactions: The Promises and Challenges of Using Cross Recurrence Quantification Analysis (pp. 137-155). Cham: Springer. https://doi.org/10.1007/978-3-319-09531-8_9
 Galley, M., McKeown, K., Fosler-Lussier, E., & Jing, H. (2003). Discourse Segmentation of Multi-Party Conversation. In Proceedings of the Association for Computational Linguistics (pp. 562-569). Stroudsburg: Association for Computational Linguistics.
 Hearst, M. A. (1994). Multi-Paragraph Segmentation of Expository Text. In Proceedings of the Association for Computational Linguistics (pp. 9-16). Stroudsburg: Association for Computational Linguistics. https://doi.org/10.3115/981732.981734
 Issartel, J., Bardainne, T., Gaillot, P., & Marin, L. (2014). The Relevance of the Cross-Wavelet Transform in the Analysis of Human Interaction—A Tutorial. Frontiers in Psychology, 5, 1-18. https://doi.org/10.3389/fpsyg.2014.01566
 Keller, P. E., Novembre, G., & Hove, M. J. (2014). Rhythm in Joint Action: Psychological and Neurophysiological Mechanisms for Real-Time Interpersonal Coordination. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369, Article ID: 20130394. https://doi.org/10.1098/rstb.2013.0394
 Kodama, K., Hori, K., Tanaka, S., & Matsui, H. (2018). How Interpersonal Coordination Can Reflect Psychological Counseling: An Exploratory Study. Psychology, 9, 1128-1142.
 Kodama, K., Tanaka, S., Shimizu, D., Hori, K., & Matsui, H. (2018). Heart Rate Synchrony in Psychological Counseling: A Case Study. Psychology, 9, 1858-1874.
 Koole, S. L., & Tschacher, W. (2016). Synchrony in Psychotherapy: A Review and an Integrative Framework for the Therapeutic Alliance. Frontiers in Psychology, 7, Article No. 862.
 Kostrubiec, V., Dumas, G., Zanone, P.-G., & Kelso, J. A. S. (2015). The Virtual Teacher (VT) Paradigm: Learning New Patterns of Interpersonal Coordination Using the Human Dynamic Clamp. PLoS ONE, 10, e0142029.
 Liu, L., Tang, L., Dong, W., Yao, S., & Zhou, W. (2016). An Overview of Topic Modeling and Its Current Applications in Bioinformatics. SpringerPlus, 5, 1608.
 Maurer, R. E., & Tindall, J. H. (1983). Effect of Postural Congruence on Client’s Perception of Counselor Empathy. Journal of Counseling Psychology, 30, 158-163.
 McLeod, J., & Elliott, R. (2011). Systematic Case Study Research: A Practice-Oriented Introduction to Building an Evidence Base for Counselling and Psychotherapy. Counselling and Psychotherapy Research, 11, 1-10.
 Nagaoka, C., & Komori, M. (2008). Body Movement Synchrony in Psychotherapeutic Counseling: A Study Using the Video-Based Quantification Method. IEICE Transactions on Information and Systems, 91, 1634-1640.
 Nagaoka, C., Yoshikawa, S., & Komori, M. (2006). Embodied Synchrony of Nonverbal Behaviour in Counselling: A Case Study of Role Playing School Counselling. In The 28th Annual Conference of the Cognitive Science Society (pp. 1862-1867). Mahwah: Lawrence Erlbaum Associates, Inc.
 Okazaki, S., Hirotani, M., Koike, T., Bosch-Bayard, J., Takahashi, H. K., Hashiguchi, M., & Sadato, N. (2015). Unintentional Interpersonal Synchronization Represented as a Reciprocal Visuo-Postural Feedback System: A Multivariate Autoregressive Modeling Approach. PLoS ONE, 10, e0137126. https://doi.org/10.1371/journal.pone.0137126
 Papiotis, P., Marchini, M., Maestre, E., & Perez, A. (2012). Measuring Ensemble Synchrony through Violin Performance Parameters: A Preliminary Progress Report. In International Conference on Intelligent Technologies for Interactive Entertainment (pp. 267-272). Berlin, Heidelberg: Springer.
 Ragert, M., Schroeder, T., & Keller, P. E. (2013). Knowing Too Little or Too Much: The Effects of Familiarity with a Co-Performer’s Part on Interpersonal Coordination in Musical Ensembles. Frontiers in Psychology, 4, Article No. 368.
 Ramseyer, F., & Tschacher, W. (2011). Nonverbal Synchrony in Psychotherapy: Coordinated Body Movement Reflects Relationship Quality and Outcome. Journal of Consulting and Clinical Psychology, 79, 284-295. https://doi.org/10.1037/a0023419
 Riedl, M., & Biemann, C. (2012). TopicTiling: A Text Segmentation Algorithm Based on LDA. In Proceedings of the Association for Computational Linguistics (pp. 37-42). Stroudsburg: Association for Computational Linguistics.
 Schmidt, R. C., & Richardson, M. J. (2008). Dynamics of Interpersonal Coordination. In A. Fuchs, & V. K. Jirsa (Eds.), Coordination: Neural, Behavioral and Social Dynamics (pp. 281-308). Berlin: Springer. https://doi.org/10.1007/978-3-540-74479-5_14
 Shockley, K. D., & Riley, M. A. (2015). Interpersonal Couplings in Human Interactions. In C. L. Webber, & N. Marwan (Eds.), Recurrence Quantification Analysis Theory and Best Practices (pp. 399-421). Berlin: Springer.
 Tschacher, W., & Pfammatter, M. (2016). Embodiment in Psychotherapy—A Necessary Complement to the Canon of Common Factors? In European Psychotherapy, 2016/2017 (pp. 5-21). Norderstedt: Books on Demand.
 Turpin, G. (2001). Single Case Methodology and Psychotherapy Evaluation. In Evidence in the Psychological Therapies: A Critical Guidance for Practitioners (pp. 100-121). Abingdon-on-Thames: Routledge.
 Vicaria, I. M., & Dickens, L. (2016). Meta-Analyses of the Intra- and Interpersonal Outcomes of Interpersonal Coordination. Journal of Nonverbal Behavior, 40, 335-361.
 Wu, H. C., Luk, R. W. P., Wong, K. F., & Kwok, K. L. (2008). Interpreting TF-IDF Term Weights as Making Relevance Decisions. ACM Transactions on Information Systems, 26, Article 13. https://doi.org/10.1145/1361684.1361686