Overview: The Connection Machines project is currently working along two fronts: developing a theory of how rapport is built, maintained, and destroyed among teens, and developing a computational architecture and system implementation that allows a virtual peer to build and maintain rapport (and, if necessary, respond to its destruction) in the context of math tutoring.
Our theory building relies on the analysis of verbal and nonverbal behaviors of two partners engaged in a task. Currently we are working on rapport among students and the ways in which it can improve learning gains in peer tutoring. To this end, we have analyzed a number of datasets in which students tutor one another – over chat or face-to-face, and a dataset where students talk aloud while they tutor a teachable agent. In each case, we look at the most minute of details – the function of smiles, eye gaze, prosody and pitch, and verbal content, such as second person pronoun use and commands vs. requests. We also look at higher-level behaviors – conversational strategies such as disclosing negative information about oneself, insulting or praising the other. And we are interested in features that arise from the interaction of the two interlocutors, such as entrainment and mimicry. In each case we use the tools of hand-annotation and machine learning to derive lessons about how people build, maintain and destroy rapport, and how those rapport behaviors correlate with learning.
Our implementation is based on the human-human interaction data, from which we derive rules and patterns for our virtual peer to follow. This involves a number of technical innovations, including generating utterances that serve task and social goals at the same time, representing the state of the user and the system in a more dyadic or interconnected way, and building computational modules to detect the level of rapport and generate responses to affect that level of rapport. Ultimately we are working towards a fully automatic embodied conversational agent (ECA) that can take part in a peer tutoring setting and engage students in a learning task. By making empirical observations of the ways that students learn together and support each other socially, we can begin to build our own "student", one of an age with the human participants, that can be introduced to novel concepts, and that can adjust its way of behaving with its human partner over the course of a school year or perhaps over a lifetime.
Motivation: One of the most interesting, and least understood, fields of behavioral science involves the social substructure of daily life: friendship, politeness, impoliteness, relationship formation, and rapport. We all know that feeling of "getting along" or "clicking" with someone, and the ways in which as a relationship deepens, rapport builds, but there are few comprehensive theories about what the rapport-building process is, and the mechanisms by which it takes place. And yet, it has been shown that increased rapport plays an important role in everyday life: people learn more from teachers and peers with whom they feel rapport, gain more medical benefits from doctors and therapists with whom they feel rapport, are more honest and more likely to complete surveys when they feel rapport with the interviewer, and so on. As computational devices take on an increasingly important and ubiquitous role in our lives, we believe that these devices should know how to build rapport so as to better support their users over time.
We know from research in the Learning Sciences that intelligent tutoring systems can help students learn, often vastly improving on what students learn in traditional, lecture-based classrooms. Furthermore, we know that when classroom peers collaborate on a learning project, those students who are friends learn more together than those students who are not friends. Finally, research has shown that peer tutoring can lead to more learning gains for the student who does the tutoring than for the one being tutored. When we combine these observations, we conclude that a computerized peer that can engage in reciprocal peer tutoring – teaching and being taught by a human – and can also develop rapport with that human, may be of great use in the classroom. Imagine a teachable agent that knows social cues well enough to say something impolite if that utterance would improve the chances that the tutor learns more. This is the system we are implementing.
While some researchers have studied “instant rapport,” we focus on long-term rapport, and the ways in which people change their behaviors as they come to know and feel deepening rapport with another person. This will allow us to build devices that can truly become a part of our lives over the long term.
Empirical Analysis: We have carried out extensive analyses of three datasets to look at the role of rapport in learning. The first, collected by Erin Walker (Walker et al., 2011), contained data on peer tutoring over a chat interface by 130 high school students. We annotated the text data for the social functions of impoliteness and positivity, and the behaviors that might play a role in those social functions (such as criticisms, praise, insults, condescension, complaining, challenges, off-task behavior, etc.). Our analyses showed that negative behavior such as insults actually predicted learning gains (Ogan et al., 2012) and that both positivity and impoliteness could be automatically detected on the basis of the behaviors that make them up (Wang et al., 2012).
These results suggest that social functionality does play a role in peer tutoring, but that the nature of that social talk may not be the politeness and positivity that one might expect.
To follow up, we collected a second data set of face-to-face peer tutoring. We asked 12 dyads of high school students (half of the dyads were friends and half were strangers; half were girls and half were boys) to take turns tutoring one another in linear equations. The students came into the lab 5 times over 5 weeks. During each session both students in the dyad had the opportunity to tutor the other, with social time breaks built in between the tutoring (social time – tutoring – social time – tutoring – social time). Each session was videotaped from 3 angles so as to capture the face and torso of each individual, and a side view showing both participants. At the end of each session participants filled out a questionnaire about their rapport with and liking for the other, and at the beginning and end of the 5 weeks, the students took a test to evaluate their knowledge of linear equations.
We have been transcribing and annotating the more than 90 hours (60 sessions) of human-human data. Based on prior literature in social psychology and communication studies, we have annotated non-verbal behaviors such as eye gaze, head nods, posture shifts and smiles, and verbal behavior such as insults, external vs. internal complaining, positive and negative self-disclosure, reference to shared experience, and more than 20 other phenomena. The longitudinal nature of the data, as well as the differences between friends and strangers has allowed us to see how friends vs. strangers weather frustration, how they manage a task where one partner (the tutor) is given more power than the other, and what kinds of social support strategies enhance learning and what kinds diminish it.
This dataset has also allowed us to automatically detect friends vs. strangers based on their acoustic and nonverbal behavior (Zhou et al., 2013).
Finally, we used a think-aloud protocol to collect data about students tutoring a virtual agent (called a “teachable agent”) to see whether the results we obtained for human-human tutoring translated to a context where one member of the dyad was a computer. Here too, to our surprise, we found that students who insulted the agent, and students who engaged with the agent and referred to it as “you”, were more likely to learn than students who were polite, or students who referred to the agent as “she” or “it” (Ogan et al., 2012).
Theory: Based on the data analysis described above, as well as a thorough investigation into prior literature from the social sciences on the components that make up the experience of rapport, the way people assess rapport in others, and the goals and strategies people use to build, maintain and destroy rapport, we propose a model for rapport enhancement, maintenance, and destruction in human-human and human-agent interaction. In Spencer-Oatey’s (Spencer-Oatey, 2005) perspective, each of these tasks requires management of face, which, in turn, relies on behavioral expectations and interactional goals. Our data support the tremendous importance of face, as the teens alternately praise and insult one another, all the while hedging their own positive performance on the algebra task in order to highlight the performance of the other. The data also contain numerous examples of mutual attentiveness and coordination as input into rapport management. Unlike prior work such as Tickle-Degnen (Tickle-Degnen & Rosenthal, 1990) and the computational work that is based on it, we found it difficult to code positivity independently of its role in face. Therefore, our model posits a tripartite approach to rapport management, comprising mutual attentiveness, coordination, and face management (Zhao et al., 2014).
Rapport Model (Enhancement/Maintaining)
Rapport Model (Destruction)
Architecture: Having proposed a theoretical framework for rapport management, we have also proposed a computational architecture that allows virtual agents to enhance, maintain and destroy long-term rapport with their users. The proposed architecture is presented in the following figure, and is described in (Papangelis et al., 2014).
Computational Dyadic Architecture for Rapport Management.
The technical innovations represented by this architecture include its dyadic nature, meaning that updates and grounding are done by taking into account both sides of the interaction – both human and agent. While we defined rapport-management strategies above, their effect is not guaranteed (and therefore cannot be grounded) until we observe the user’s reaction. To achieve this, it is necessary to represent a dyadic state modeling what has been grounded; a model of the user, representing the system’s beliefs about the user; and a putative virtual agent state inside that user model, representing the system’s beliefs about how the user perceives it. The data structures in our architecture, derived from our theoretical model of rapport, include the dyadic state (left in the following figure) representing the current state of rapport and a user model (right) containing information we learn during the interaction.
Representation of the dyadic state (left) and user model (right)
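As a rough illustration of the data structures described above, the following Python sketch shows one way a dyadic state, a user model, and a putative agent state nested inside the user model might be represented. It is not the project's implementation: every class name, field name, and the numeric rapport value are assumptions made for this example. The one idea taken from the text is that a rapport-management strategy is grounded only after the user's reaction has been observed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PutativeAgentState:
    """System's beliefs about how the user perceives the agent (hypothetical fields)."""
    perceived_traits: Dict[str, float] = field(default_factory=dict)


@dataclass
class UserModel:
    """System's beliefs about the user, learned during the interaction."""
    known_facts: Dict[str, str] = field(default_factory=dict)
    # The putative agent state lives inside the user model, as in the text.
    agent_as_seen_by_user: PutativeAgentState = field(default_factory=PutativeAgentState)


@dataclass
class DyadicState:
    """Shared state of the dyad: current rapport and what has been grounded."""
    rapport: float = 0.0  # assumed numeric scale, for illustration only
    grounded: List[str] = field(default_factory=list)
    pending_strategy: Optional[str] = None

    def attempt(self, strategy: str) -> None:
        """Record a strategy the agent has tried; its effect is not yet grounded."""
        self.pending_strategy = strategy

    def observe_reaction(self, rapport_change: float) -> None:
        """Ground the pending strategy once the user's reaction is observed."""
        if self.pending_strategy is None:
            raise ValueError("no pending strategy to ground")
        self.rapport += rapport_change
        self.grounded.append(self.pending_strategy)
        self.pending_strategy = None


# Usage: the strategy only counts as grounded after a reaction arrives.
state = DyadicState()
user = UserModel()
state.attempt("praise")
assert state.grounded == []            # nothing grounded before the reaction
state.observe_reaction(0.25)           # hypothetical positive reaction
user.known_facts["topic"] = "linear equations"
```

Keeping the pending strategy separate from the grounded list mirrors the point that a strategy's effect is not guaranteed until the user responds. A fuller system would presumably derive the rapport change from detected verbal and nonverbal behavior rather than take it as a number.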
We continue to iteratively analyze the data and use the results to update our theoretical framework, which, in turn, allows us to refine the computational architecture for a rapport-managing virtual peer.
Demo: This demo was presented at the Fourteenth International Conference on Intelligent Virtual Agents (IVA 2014) (Zhao et al., 2014).