Using affective and behavioural sensors to explore aspects of collaborative music making

Our research considers the role that new technologies could play in supporting emotional and non-verbal interactions between musicians during co-present music making. To gain a better understanding of the underlying affective and communicative processes that occur during such interactions, we carried out an exploratory study where we collected self-report and continuous behavioural and physiological measures from pairs of improvising drummers. Our analyses revealed interesting relationships between creative decisions and changes in heart rate. Self-reported measures of creativity, engagement, and energy were correlated with body motion, whilst EEG beta-band activity was correlated with self-reported positivity and leadership. Regarding co-visibility, lack of visual contact between musicians had a negative influence on self-reported creativity. The number of glances between musicians was positively correlated with rhythmic synchrony, and the average length of glances was correlated with self-reported boredom. Our results indicate that ECG, motion, and glance measurements could be particularly suitable for the investigation of collaborative music making.


Introduction
Modern technologies and interfaces for musical expression have provided new ways for humans to collaborate creatively and make music together (Bevilacqua et al., 2013). These advances are being adopted by a wide range of people, from professional musicians to those less able to play traditional instruments. The devices can be controlled by various means, such as touch surfaces, gesture recognition, and accelerometers (Kuhara and Kobayashi, 2011; Mitchell et al., 2012; Kuyken et al., 2008), allowing sounds to be created through novel modes of expression. Despite their heightened ability to sense human input, these devices predominantly focus on single-user operation and do very little to sense and support the important emotional, interpersonal, and communicative elements of collaborative music making (Carlile and Hartmann, 2005; Fencott and Bryan-Kinns, 2010).
Our research is motivated by a desire to develop interfaces that specifically take into account the presence, behaviours, and emotions of musical collaborators. Such an interface could be integrated into the musical instrument itself, or it could function alongside both traditional and modern instruments. The underlying aim is that it should support and enrich the affective aspects of collaborative performance, and in turn, the creative musical outcomes. In order to achieve this, we must first gain a better understanding of the behavioural and affective processes that accompany collaborative music making and creativity. There is very little existing research in this area, and our work is largely influenced by the fields of Affective Computing and Psychophysiology, both of which deal with the detection of people's inner psychological states based upon externally observable and quantifiable measurements. In the following section we provide an overview of related research in these fields. The remainder of the paper documents a study in which we collected continuous behavioural and affective measurements from pairs of improvising drummers. We use these measures to explore various aspects of collaborative music making.

Related Work
Our multidisciplinary approach is influenced by existing research in a number of different fields. We can broadly organise these studies into two categories: i) studies relating to the measurement of behaviour and affect, and ii) studies dealing with the analysis of co-present, collaborative music making activities. This section discusses relevant work within these two categories.

Measuring Behaviour and Affect
A major component of our research is the use of sensor technologies to measure and quantify aspects of human behaviour and affect. The apparatus and techniques that we employ are influenced by research across a range of disciplines, but most prevalently Psychophysiology, Affective Computing, and Social Signal Processing. We will briefly introduce these disciplines before highlighting literature that is specifically relevant to this study.
Psychophysiology is the study of how psychological experiences (thoughts, feelings, emotions) relate to the physiological activity of the body. The typical approach to psychophysiological studies is to measure physiological variables in the lab, using equipment developed for medical diagnostics. These measurements are then compared to qualitative and quantitative measures of behaviour and experience, according to the specific focus of the study. Over recent years equipment for physiological measurement has become increasingly non-invasive, miniaturised and affordable, making it easier to conduct studies, not just in the lab but also in naturalistic settings (Morgan et al., 2013). Furthermore, these developments are leading towards the integration of physiological sensors in everyday technologies such as phones, smartwatches, and computer game consoles. An advantage of adopting physiological measurement as a means for inferring psychological states is that it often requires no conscious effort on the part of the user. The development of a car seat that can sense the driver's heart rate and detect tiredness (Edwards, 2013) is a great example of this paradigm, involving technology development (Walter et al., 2011), research in psychophysiology (Patel et al., 2011), and an application that revolves around the need for subconscious measurement.
The field of Affective Computing also concerns the measurement of psychological states. However, it focuses more specifically on the development of technologies that are able to recognise, react to, and/or express emotions. The work in this field has mainly focused on categorising the discrete emotional responses of individuals who are presented with pre-recorded, static or virtual stimuli, usually in a laboratory setting (Sariyanidi et al., 2014).
Social Signal Processing (SSP) (Pentland, 2005) is a relatively new domain, which aims to provide computers with the ability to sense and understand human social signals (Vinciarelli et al., 2009). In this context a social signal is defined as a "communicative or informative signal that, either directly or indirectly, provides information about 'social facts', that is, about social interactions, social emotions, social attitudes, or social relations" (Pantic et al., 2011, p. 8). SSP researchers develop tools and techniques for the sensing and machine analysis of behavioural and psychological constructs. A thorough survey of current work in SSP can be found in Vinciarelli et al. (2012).
A common theme within these disciplines is that human experience and affect can be segmented into three components: cognitive (thoughts), behavioural (expressions and actions) and physiological (biochemical and electrical changes in the body). Measuring each of these components presents varying challenges, requiring distinct technologies and processing techniques. The following sections discuss relevant research relating to each component, since our study incorporates measurements of all three.

Cognitive Measurement
Neuroimaging techniques allow us to obtain information about cognitive processes inside the brain. The most commonly used techniques are EEG, which involves placing electrodes on the subject's scalp, and fMRI, which involves the subject lying still in a large magnetic scanner. Both techniques have been used to identify felt emotions by analysing the brain's response to affective stimuli such as images (Schneider et al., 1997) and music (Schmidt and Trainor, 2001). Neuroimaging studies have also sought to uncover links between brain activity and creative behaviour. A comprehensive review of such studies is provided by Dietrich and Kanso (2010), where the authors highlight that the literature is, on the whole, fragmented and inconclusive. They broadly conclude that tasks involving creative cognition induce changes in pre-frontal activity of the brain. In addition, they suggest that creativity may not even be localisable in the brain, given limitations of current neuroimaging systems.
Regarding the practicalities of neuroimaging, more user-friendly EEG systems have been developed in recent years; however, they still suffer from high susceptibility to noise and poor spatial imaging resolution. fMRI has far better spatial imaging resolution, but the requirement for participants to be stationary inside a large and immobile scanner makes it impossible to conduct studies in naturalistic settings. For example, in a study of the neural aspects of musical improvisation, jazz pianists were asked to play whilst lying in an fMRI scanner (Limb and Braun, 2008).

Behavioural Measurement
In the context of our research we are interested in measuring relatively short-term behaviours (in the order of seconds and minutes), many of which can be categorised as non-verbal communicative acts. Argyle (1978) outlines seven forms of non-verbal communication (NVC): facial expressions, gaze, gestures, bodily posture, bodily contact, spatial behaviour (e.g. proximity), and appearance (e.g. clothing). He models NVC as a simple, communications theory-inspired sequence, whereby a sender encodes a 'social signal', which is subsequently decoded by a receiver. It is implicit that this signal transfer is not error free: one person will never perfectly interpret the non-verbal communicative act of another person. Furthermore, the process of NVC does not always involve conscious awareness. This makes it a particularly interesting parameter in the study of human interactions, as it indicates subtle features of the interaction that cannot be revealed through self-report based measures.
Because we are investigating co-present musical interactions, we will be most concerned with gaze, bodily posture, and spatial behaviour. Gaze can reveal a great deal about the dynamics and nature of co-present human interactions. Numerous studies have shown how gaze is closely synchronised with speech during conversations (Kendon, 1967; Cummins, 2012; Oertel et al., 2012). Additionally, the amount of time people spend looking at each other has been shown to relate to dominance and rapport (Argyle, 1978). Mutual gaze has also been shown to be physiologically arousing (Mazur et al., 1980). Gaze is commonly measured by manually annotating video footage, which is a time-consuming task. However, modern eye-tracking glasses are able to continuously track where someone is looking within a scene captured from a head-mounted camera.
With respect to bodily posture and spatial behaviour, accurate measurements can be obtained using a marker-based motion tracking system, which uses multiple cameras to detect small reflective markers positioned on the body. Glowinski et al. (2013) used such a system to study the bodily movements of a string quartet. Their results suggest that head movement features can be used to distinguish between an engaging and non-engaging performance (as rated by the performers). Healey et al. (2005) examined the spatial behaviour of a group of seven improvising musicians. They observed how the use of space played a complex role in maintaining the coherence of the performance, and drew a number of parallels with conversational interactions. The trade-off with marker-based systems is that they take some time to set up and are not particularly portable. With the advent of the Microsoft Kinect, researchers have been able to measure body movements using a single depth-sensitive camera (Hadjakos et al., 2013; Morgan et al., 2013). Other solutions involve using worn accelerometers (Zhou et al., 2008) and pressure-sensitive floors (Bränzel et al., 2013).
It should be noted that the non-verbal behaviours defined by Argyle, and discussed above, are all forms of visual communication and do not encompass auditory behaviours. In the study of musical interactions it is also important to consider the effects that auditory feedback might have on NVC between musicians. For example, the timing synchrony between two musicians tapping a rhythm has been shown to improve when the taps trigger sounds, but to be unaffected by visual information (Nowicki et al., 2013). Measuring the physical properties of auditory information, such as loudness and pitch, is easily achieved. However, it is significantly more challenging to measure how auditory information is processed in the brain.

Physiological Measurement
Mechanical, electrophysical and biochemical changes in the body can be measured using surface electrodes and sensors positioned at specific sites on the body. In a study of flow during piano playing, de Manzano et al. (2010) measured heart rate, respiration and facial muscle movements while professional pianists gave five performances of a pre-prepared piece. For each performance the pianists were subsequently asked to rate their level of flow using a questionnaire. A significant relationship was found between flow and heart rate variability, respiratory depth, and facial muscle movements. Physiological studies have also been conducted to try to measure musicians' emotions during musical performance (Knapp et al., 2009). A potential application of live physiological measurement in music collaboration has been demonstrated by Mealla et al. (2011), who created an interactive musical tabletop, where physiological signals contributed to the generated sounds.
In a study of the physiological reactions of audience members to a live music performance, Egermann et al. (2013) found that unexpected musical events were generally associated with a rise in skin conductance and a decrease in heart rate. A real-world application of these ideas saw the musical score and sequence of scenes in a film being dictated by the emotional responses of the audience, as inferred from physiological measurements (Price, 2011).
Regarding human-human interactions, a study of partner influence during conversation found 'physiological linkage' between the blood pressure (BP) measurements of romantic couples (Reed et al., 2013). Research into user experience with game technologies found differing physiological responses when participants were playing against a computer compared with playing against another human (Mandryk et al., 2006).
In summary, cognitive, behavioural and physiological measurements provide many ways in which we can infer and quantify affective aspects of human experience. In recent years the use of these techniques has started to make the transition from laboratory-based studies to real-world applications. However, there has been comparatively little focus on situations where the interaction is occurring between people.

Co-Present Collaborative Music Making
In the context of our research, we define co-present collaborative music making as any situation where two or more people are jointly involved in the process of creating live music. The following sections discuss research that is relevant to group musical interactions in general, before focusing on improvisation, which is the specific case of co-present collaborative music making that we investigate in this paper.

Group Interactions
Sawyer (2003) proposes two generalised approaches to the study of group interactions: the input-output (IO) approach and the process approach. The former concerns things that take place before and after the interaction, whilst the latter looks at what occurs during the interaction. Quantitative methods can provide interesting insights into group interaction processes. For example, Fencott and Bryan-Kinns (2010) used interaction logs to analyse specific aspects of software-based co-present collaborative music making, such as the amount of co-editing that occurred. Hadjakos et al. (2013) developed a quantitative method for analysing the rhythmic synchronisation of a violin duo using the Kinect motion tracking device. By tracking head movements they were able to demonstrate how complex interaction patterns could be observed.
As an extension to the Theory of Flow (see section 2.1.3), Sawyer (2003) conceives the idea of Group Flow, referring to a state of peak performance at the level of the group, rather than the individual. He points to the importance of factors such as parallel processing (simultaneous awareness of self and collaborator(s)) and visual attention in establishing a state of group flow. However, Sawyer does not back up his theory of group flow with a substantial amount of evidence, and leaves it somewhat unsatisfactorily defined. A similar concept is that of mutual engagement, which has been highlighted as an important feature of group musical interactions (Bryan-Kinns, 2013). Bryan-Kinns and Hamilton (2012) have developed a Mutual Engagement Questionnaire (MEQ) for evaluating the mutual engagement qualities of different musical interfaces.
Emotion also plays an important part in group interactions. In particular, we may be interested in the processes by which the emotional representations of a person or group influence the emotions of another person or group. This is often termed emotional contagion. There is a lack of research on emotional contagion in performing musicians; however, it has been suggested that it is one of the mechanisms by which people experience emotions while listening to music (Lundqvist et al., 2008). Studies of group interactions have shown that emotional contagion affects group processes such as task performance and cooperation (Barsade, 2002). Similar processes of influence have been observed in relation to behavioural displays. Behavioural mimicry is the process whereby actions or emotions represented by one person subconsciously cause congruent behaviour in another person. There is a growing body of evidence for behavioural mimicry (see Chartrand and Lakin (2012) for an extensive review), and its links to cognitive processes. In the musical domain it has been suggested that behavioural mimicry contributes to temporal assimilation and coordinated variations in intensity and intonation during ensemble performances (Keller et al., 2014).

Group Improvisation
Group musical improvisation is defined as a spontaneous process, whereby creative contributions are made within the restrictions of real-time performance (Kenny and Gellrich, 2002; Wilson and MacDonald, 2012). Research on group musical improvisation is limited, and existing studies have predominantly focused on jazz music. Seddon (2005) used videotapes of six jazz musicians during rehearsal and performance in order to investigate modes of communication during jazz improvisation. He defined three modes of non-verbal communication: instruction, cooperation, and collaboration. Instruction involves the demonstration of musical ideas through vocalisation, or use of an instrument. Cooperation occurs when the musicians are producing a cohesive performance with the inclusion of musical and visual cues. Collaboration is the state where musicians are able to stimulate the spontaneous generation of creative contributions within the group, communicating exclusively through musical interaction. During collaboration, Seddon defines the musicians as being 'empathetically attuned', a state of mutual empathy that encourages the musicians to take risks and challenge each other's creativity.
Regarding the cognitive factors involved in improvisation, Kenny and Gellrich (2002) propose eight processes: short, medium, and long-term anticipation; short, medium, and long-term recall; flow status (see section 2.1.3); and feedback (decisions based upon previous experiences). Similar cognitive processes were also identified by Biasutti and Frezza (2009), who developed an Improvisation Process Questionnaire, which they gave to 76 experienced musicians. In addition to anticipation, use of repertoire, flow, and feedback, the authors highlight emotive communication as an important ability required by improvisers.
Comparisons have been made between jazz improvisation and conversation (Monson, 1996; Sawyer, 2005), whereby a common language or vocabulary is used alongside rules (e.g. grammar) in order to form a coherent and emergent interaction between two or more people. Conversation analysts have described how interlocutors use non-verbal behaviours such as eye gaze (Kendon, 1967) and body position (Kendon, 2010) to maintain successful conversations. There is already some evidence that similar processes may occur during collaborative musical creativity (Healey et al., 2005). A noteworthy difference between conversation and musical improvisation is that the former is characterised by turn taking (Sacks et al., 1974), whilst the latter involves simultaneous contributions. Musicians must, therefore, continuously monitor the contributions of their collaborators, whilst also providing their own novel contributions. This involves a combination of both conscious and sub-conscious processing (Sawyer, 2003).
Finally, it is worth considering how the moods and emotions of group members might influence the creative aspects of improvisation. A theory for the relationship between emotion and creativity is provided by the 'dual pathway' model (De Dreu et al., 2008) (see Fig. 1). This model suggests that emotions with high arousal (e.g. anger, elation) lead to greater originality and creative fluency (the number of ideas, insights and solutions generated) when compared to low arousal emotions (e.g. sadness, serenity). It also proposes that positive emotions contribute to this process by facilitating greater cognitive flexibility and inclusiveness, whilst negative emotions facilitate increased persistence and perseverance. It is important to note that this model was developed based upon studies of individual creativity, and it is not clear how it might apply to group creativity. Existing studies of group creativity have predominantly explored the influence of long-term mood on creative output, often in a workplace setting (Amabile et al., 2005; Jamison, 1996). A notable feature of these studies is that they tend to focus on correlations between discrete emotional states and overall creativity. There is an absence of research that addresses continuous affective interactions and their real-time influences on creative tasks.

Summary
Our research is concerned with the use of sensor technologies to explore aspects of collaborative music making. Each of these technologies has associated techniques for the collection of useful data. Following data collection, existing concepts can be used to guide the analysis and interpretation of the data. In this section we have reviewed a wide range of topics with associated technologies, techniques, and concepts. We have seen how motion sensing technologies can be used in tracking and mediating interactions, and in measuring the subtle movements of musicians, whilst small, wireless physiological sensors are used to sense and recognise affect and social signals. Techniques for collecting data in both controlled and naturalistic settings have been described. These include the strictly controlled fMRI scans of improvising pianists, and the collection of subjective and physiological data from audience members watching a live performance. Finally, concepts relating to the phenomena we wish to investigate have been discussed. These include the theory of group flow, the dual pathway approach to creativity, and concepts surrounding non-verbal communication and auditory behaviours. In the following sections we will see how specific sensor technologies and techniques were chosen for our study, and how certain concepts influenced the subsequent analysis of our data-set.

The Study
Incorporating methods and techniques from the research discussed above, we designed an exploratory study to gather subjective and continuous quantitative measures from pairs of co-present, improvising drummers. Co-present interaction was chosen due to our interest in non-verbal communication and the effects of presence. To investigate the importance of visual contact between participants we decided to use two conditions: one where the participants were visible to each other, and one where they were separated by a screen. Under both conditions the pairs of drummers performed two 5-10 minute improvisations. This was deemed long enough for them to engage with the task and improvise a range of rhythms, without becoming bored or fatigued. Improvisation was chosen due to our interest in musical creativity. Our study was not founded on specific hypotheses; instead, our aims were to:
i) Assess challenges and issues associated with the experimental use of behavioural and affective sensors to investigate collaborative music making.
ii) Report exploratory findings to guide and inspire future research.
iii) Identify which measurements and features are most suitable for the investigation of collaborative music making.
We chose to use drumming in our study because it presents some noteworthy advantages over other forms of musical expression. In particular, beat timing and velocity can be accurately recorded using electronic pads. Large amounts of motion are involved, which increases the information conveyed visually, through movement. There is also an absence of melodic content, which might otherwise influence the participants' affective responses. We simplified the experiment further by requiring that each participant only used one hand to drum on a single drum pad. This also enabled us to use the non-drumming hand to collect physiological data, using sensors attached to the fingers. The following sections provide more detailed descriptions of the study design and data collection methods, as well as the steps involved in processing this data.

Research Design and Data Collection
We used a within-subjects design, where each pair of drummers performed drumming tasks under each of the two conditions: visual (V) and non-visual (NV). This design was selected due to high individual variability in drumming ability, behaviour, and affective responses. Throughout the remainder of this paper we use the word "dyad" to refer to a specific pairing of two drummers. We then distinguish each drummer within a dyad as participant 1 or 2. We use the notation Dx.py.C to indicate the dyad (D), participant (p), and condition (C).
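As an illustration of the Dx.py.C notation, the following minimal sketch parses and formats such labels. The concrete spelling assumed here (for example "D3.p2.NV") and the names `Label`, `parse_label`, and `format_label` are hypothetical and are not taken from the study's own tooling.

```python
import re
from typing import NamedTuple


class Label(NamedTuple):
    dyad: int          # x in Dx
    participant: int   # y in py (1 or 2)
    condition: str     # "V" (visual) or "NV" (non-visual)


# Assumed concrete form of the notation, e.g. "D3.p2.NV".
_LABEL_RE = re.compile(r"^D(\d+)\.p([12])\.(NV|V)$")


def parse_label(text: str) -> Label:
    """Split a label such as 'D3.p2.NV' into dyad, participant, condition."""
    match = _LABEL_RE.match(text)
    if match is None:
        raise ValueError(f"not a Dx.py.C label: {text!r}")
    return Label(int(match.group(1)), int(match.group(2)), match.group(3))


def format_label(label: Label) -> str:
    """Inverse of parse_label."""
    return f"D{label.dyad}.p{label.participant}.{label.condition}"
```

Under these assumptions, `parse_label("D3.p2.NV")` refers to participant 2 of dyad 3 in the non-visual condition.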

Participants
Participants were recruited via email lists and word of mouth. We required that all participants had prior drumming experience and were confident enough to improvise rhythms on-the-fly. Five pairs of participants took part in the study (2 mixed-sex pairs, 3 male pairs). Participants in each dyad knew each other, and three of the dyads had previously played music together. We specifically chose to pair people who were known to each other, since we believed that this would maximise the amount of non-verbal communication that we could observe. However, it is beyond the scope of this study to include inter-participant relationship variables as part of the analyses. The participants were aged 26 to 34 (M = 29.1, SD = 3.1), they had been playing percussion for between 1 and 17 years (M = 7.4, SD = 5.0), and their self-rated level of expertise ranged from 2 to 4 (M = 2.7, SD = 0.7) on a five-point scale representing novice (1) to expert (5). Participants were offered £20 as an incentive and signed a consent form before partaking in the study. The study was given ethical approval (QMREC2013/48).

Measures
We chose to collect a wide range of measurements so that we could evaluate different techniques and explore correlations between various types of data and extracted features. To measure heart rate and perspiration we used small (53mm × 32mm × 19mm) wireless ECG and GSR sensors developed by Shimmer Research. We used the Emotiv EEG headset to wirelessly record 14-channel EEG measurements from each participant. All of the physiological sensors contained accelerometers for recording motion. For the drums we used two identical Roland V-Drum electronic drum pads. By recording MIDI data from the pads we were able to log the exact timing and velocity (strength) of each drum beat. Three video cameras were set up: one facing each participant, and one overhead camera to capture the entire interaction. Figure 2 shows an annotated image taken from the overhead camera.
A post-performance questionnaire (PPQ) was designed to collect self-report (SR) data from each participant while they reviewed video footage of their improvised performances. The PPQ asked participants to rate their individual levels of creativity, engagement with the other participant, energy, positivity and boredom on a 9-point scale, as well as who they thought was leading the performance (1 = 'All me', 9 = 'All them'). The full PPQ is shown in Fig. 3. The first item is loosely based upon the widely used Consensual Assessment Technique (CAT) (Amabile, 1982), which proposes that the best way to measure the creativity of an artefact is to simply ask experts in that field to provide a creativity rating. In our case, we were specifically interested in the participant's own subjective rating of their creativity. Previous studies have found moderate correlations between self and expert-rated creativity using the CAT (Hennessey et al., 2011). Item 2 is adapted from the Mutual Engagement Questionnaire (Bryan-Kinns and Hamilton, 2012), developed specifically for musical interactions. Items 3, 4 and 6 are closely related to the items activation, valence, and dominance from the Self Assessment Manikin (SAM), which is commonly used in affect research (Gunes and Schuller, 2013). Item 5 is adapted from a questionnaire used to study user experience with entertainment technologies (Mandryk et al., 2006; Mandryk and Atkins, 2007).

Data Synchronisation
We used two computers and three separate applications (MATLAB, Emotiv TestBench, Logic Pro) to record the continuous measurements. To synchronise this data we placed the physiological and EEG sensors on top of one of the drum pads and hit the drum 10 times. This meant that we had 10 clearly identifiable, short-duration peak events in the EEG and physiological accelerometer data, accompanied by 10 MIDI note events and 10 visible video events. When processing the data, we were able to use these events as reference points, allowing us to align all of the data sources to a high (millisecond) precision.
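The alignment step can be sketched as follows. This is a minimal illustration and not the procedure actually used in the study: it assumes that the synchronisation hits appear as local maxima above a threshold in an accelerometer magnitude trace, and that a single constant clock offset (here the median difference between matched events) is sufficient. The function names, the threshold, and the minimum gap between hits are all hypothetical.

```python
from statistics import median


def find_peaks(samples, rate_hz, threshold, min_gap_s=0.2):
    """Return times (s) of local maxima at or above `threshold`.

    Peaks closer than `min_gap_s` to the previously accepted peak are
    ignored, so that one drum hit is not counted twice.
    """
    peaks = []
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i] > samples[i - 1] and samples[i] >= samples[i + 1]
        if samples[i] >= threshold and is_local_max:
            t = i / rate_hz
            if not peaks or t - peaks[-1] >= min_gap_s:
                peaks.append(t)
    return peaks


def clock_offset(reference_times, other_times):
    """Constant offset to add to `other_times` to land on the reference clock.

    Events are matched in order; the median makes the estimate robust to
    one badly timed hit.
    """
    if len(reference_times) != len(other_times):
        raise ValueError("both sources must contain the same number of events")
    return median(r - o for r, o in zip(reference_times, other_times))


# Toy example: three hits seen by an accelerometer sampled at 100 Hz,
# and the same three hits logged as MIDI notes on a different clock.
accel = [0.0] * 300
for index in (50, 150, 250):
    accel[index] = 5.0
accel_peaks = find_peaks(accel, rate_hz=100, threshold=1.0)
midi_notes = [10.5, 11.5, 12.5]
offset = clock_offset(midi_notes, accel_peaks)
aligned = [t + offset for t in accel_peaks]
```

In practice the achievable precision is bounded by the sampling rate of the slowest source, so the threshold and matching strategy would need tuning against the real recordings.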

Setup
The study was held in a room designed for performance studies, with stage lighting set up to make it feel like a live music venue. The drum pads were positioned in the centre of the room, with speakers either side (see Fig. 2). The two computers were placed behind blank screens at one end of the room; this was also where the experimenter sat during the performances. ECG modules were strapped around each participant's waist, with the electrodes attached to their chest. GSR modules were placed around the wrist of their non-drumming hand, and the electrodes were strapped to their index and middle finger. The EEG headsets were placed on the participants' heads, and electrodes were individually adjusted to obtain a good signal.

Tasks
The experiment consisted of two warm-up tasks followed by an improvisation task. These three tasks were performed twice, once under a non-visual (NV) condition, then again under a visual (V) condition (see Section 5.1 for a discussion of why we chose a fixed order and potential issues associated with this choice). In the NV condition the participants faced a blank screen so that they were unable to see each other. In the V condition they faced towards each other with no obstruction, other than the drum pad. The first warm-up task (∼1 min) required the participants to hit their drum in synchrony with a metronome click track at a tempo of 110 bpm. The second warm-up task (∼1 min) required them to repeat a set rhythmic phrase, which they listened to and learnt prior to the task. These initial tasks allowed the participants to get used to playing the electronic drum and to drumming with one another. The improvisation task (Improv, ∼6-10 min) required the participants to improvise with one another, where the only restriction was that they did not use verbal communication. Verbal communication was prohibited because we wanted to simulate a live performance environment, where musicians would not normally use verbal communication. Following completion of the drum tasks, the participants sat individually and watched the overhead videos of their two improvised performances. After each minute of video they were asked to complete all the SR items on the PPQ, in relation to that particular minute of their performance.

Data Processing

Preparation
The EEG, ECG, GSR, MIDI, and self-report (SR) data were imported into MATLAB. For each dyad the accelerometer synchronisation peaks and MIDI note events were used to align the data to a common start point (t_0). Using the video footage we found the start and end times of each task, relative to t_0. For each data source these time points were used to extract and label datasets corresponding to measurements for each participant within each task and visibility condition.

Feature Extraction
Features were extracted from individual data-sets according to the type of data they contained. We initially segmented the data from the Improv tasks according to the 1 or 2 minute time windows used for the self report questionnaires. We then extracted features from each of these windows (Improv_w), for each participant within each condition (NV or V). This was done to enable us to analyse relationships between extracted features and self report measures, as detailed in the following section. We found that some of the EEG, ECG, and GSR data contained artefactual readings, due to movement and poor electrode contact. We manually labelled these noisy data using a binary coding (1 = noisy, 0 = clean), so that they could be recognised and automatically excluded from our subsequent analysis.

ECG: We used ECGtools to filter the raw ECG data (sampled at 51.2 Hz) and extract the R-peaks, which correspond to individual heart beats. The distance between consecutive peaks was then used to find the instantaneous heart rate (HR) values. These values were interpolated to give an evenly spaced time series from which we extracted the mean, SD, maximum, minimum, the positions of maxima and minima, and the number of extrema divided by the task duration.
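As a rough illustration of the step from R-peaks to an evenly spaced HR series, the following Python sketch converts peak times into instantaneous heart rate and resamples it by linear interpolation. It is not the ECGtools pipeline used in the study; the function names, the choice of linear interpolation, and the grid step are assumptions made for the example.

```python
def instantaneous_hr(r_peak_times):
    """Return (time, bpm) pairs, one per interval between consecutive R-peaks."""
    pairs = []
    for t_prev, t_next in zip(r_peak_times, r_peak_times[1:]):
        rr = t_next - t_prev               # inter-beat interval in seconds
        pairs.append((t_next, 60.0 / rr))  # beats per minute, stamped at the later peak
    return pairs


def resample_linear(pairs, step):
    """Linearly interpolate (time, value) pairs onto an even grid (assumed method).

    Requires at least two pairs with strictly increasing times.
    """
    times = [t for t, _ in pairs]
    values = [v for _, v in pairs]
    out, i, k = [], 0, 0
    while True:
        t = times[0] + k * step
        if t > times[-1] + 1e-9:
            break
        # Advance to the segment [times[i], times[i + 1]] that contains t.
        while i < len(times) - 2 and times[i + 1] < t:
            i += 1
        frac = (t - times[i]) / (times[i + 1] - times[i])
        out.append((t, values[i] + frac * (values[i + 1] - values[i])))
        k += 1
    return out


hr = instantaneous_hr([0.0, 1.0, 2.0, 2.5])   # hypothetical R-peak times in seconds
series = resample_linear(hr, 0.5)
```

Summary features such as the mean, SD, and extrema would then be computed on the resampled series.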
GSR: Skin conductance response (SCR) has been shown to be a useful metric in analysis of GSR data (Kim et al., 2004; Kim and André, 2008). We used Ledalab to extract the timing and amplitude of SCR events from the raw GSR data (sampled at 5 Hz) using Continuous Decomposition Analysis (CDA). Again, interpolation was performed and the mean, SD, positions of maxima and minima, and number of extrema divided by task duration, were calculated from the SCR amplitude series.
EEG: EEG signals contain frequency components that relate to the firing activity of neurones in the brain. Standard Theta, Alpha, and Beta frequency band power values are often computed in EEG studies, as they provide information on cognitive activity (Chaouachi and Frasson, 2010). Using EEGlab (Delorme and Makeig, 2004) we initially bandpass filtered the EEG signal (sampled at 128 Hz) between 3 and 30 Hz. We then performed manual artefact rejection to remove noisy segments of data caused by head and facial muscle movements. The Emotiv EEG recordings consist of 14 channels of data, relating to sensors at different positions on the scalp. Artefactual channels were removed entirely and the average power over all remaining channels was computed within the following standard frequency bands: Theta (4-7 Hz), Alpha (7.5-12.5 Hz), L-Beta (12.5-25 Hz), and H-Beta (25-30 Hz).
Motion: We took the accelerometer readings from the ECG, GSR and EEG sensors and summed the absolute values of the axial components for each sensor. This gave us approximate mean quantity of motion (QoM) values for the head (EEG), torso (ECG), and non-drumming hand (GSR). QoM has previously been shown to be one of the most successful motion features for classifying gestural representations of emotions (Castellano et al., 2007).
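The QoM computation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming three-axis samples and a simple average over the window; the function name is hypothetical, and any baseline or gravity handling is not described in the text and is omitted here.

```python
def quantity_of_motion(samples):
    """Mean of |ax| + |ay| + |az| over a window of accelerometer samples.

    `samples` is a non-empty sequence of (ax, ay, az) tuples. Averaging over
    the window is an assumption made for this sketch.
    """
    totals = [abs(ax) + abs(ay) + abs(az) for ax, ay, az in samples]
    return sum(totals) / len(totals)


qom = quantity_of_motion([(1.0, -2.0, 3.0), (0.0, 0.0, 0.0)])  # hypothetical readings
```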

MIDI: The number of beats per second, SD in time between consecutive beats, and mean velocity were computed as MIDI features. We compared the beat onset times for participants in each dyad (tP1 and tP2) and considered any beats that occurred within 70 ms of each other to be perceptually synchronous rhythmic events, as suggested by Dixon (2001). For these beats we calculated the time difference (tP1 - tP2). We then found the mean over all the absolute difference values. This provides an indicator of the timing synchrony within the dyads. The MIDI data was also used to manually annotate the rhythmic change points (RCPs) for each participant. RCPs were defined as the time points in seconds (relative to t_0) when the participant changed their rhythm from a previously established pattern, where a pattern is defined as a fixed-length sequence of beats that had been repeated at least twice. The identification of RCPs is illustrated in Fig. 4, which shows a section of MIDI data from one of the participants. We can see that the position of the MIDI notes on the score, as well as the velocity of the notes (denoted by colour), helps identify two distinct rhythmic patterns. In addition to visual analysis we were also able to listen to the rhythm in order to identify RCPs.
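The mean absolute inter-beat lag can be illustrated with the Python sketch below. The text does not say how beats from the two drummers were paired, so pairing each of the first participant's onsets with the nearest onset of the second is an assumption, as are the function and variable names.

```python
def mean_abs_lag(onsets_p1, onsets_p2, window=0.070):
    """Mean |tP1 - tP2| over beats that fall within `window` seconds of each other.

    Each onset of participant 1 is paired with the nearest onset of
    participant 2 (assumed pairing rule). Returns None if no beats qualify.
    `onsets_p2` must be non-empty.
    """
    lags = []
    for t1 in onsets_p1:
        nearest = min(onsets_p2, key=lambda t2: abs(t1 - t2))
        lag = t1 - nearest
        if abs(lag) <= window:       # treated as perceptually synchronous
            lags.append(abs(lag))
    return sum(lags) / len(lags) if lags else None


# Hypothetical onset times in seconds relative to t_0.
lag = mean_abs_lag([0.0, 0.5, 1.0], [0.02, 0.45, 1.2])
```

In this example the third beat has no partner within 70 ms and is excluded, so the result is the mean of 0.02 s and 0.05 s. A smaller value indicates tighter timing synchrony.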
Video: We have already described how video footage was used as an elicitation tool to help the participants provide PPQ responses (see Section 3.1.2). In existing studies of conversational interactions video footage has also been used to annotate gaze behaviour (Oertel et al., 2012; Cummins, 2012). We used the footage from the front-facing video cameras to manually annotate the time points when a participant was glancing towards their collaborator during the Improv tasks. The start and end time of each glance was recorded in seconds (relative to t_0). We then used these data to calculate the number of glances, percentage glance time, number of mutual glances, and percentage mutual glance time, within each time window (Improv_w).
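To make the glance features concrete, the sketch below derives them from annotated (start, end) intervals in Python. It treats a mutual glance as any period in which both participants' glance intervals overlap, which matches the description of concurrency given later in the paper; clipping glances at the window boundaries and all function names are assumptions for illustration.

```python
def clip(intervals, lo, hi):
    """Clip (start, end) intervals to the window [lo, hi], dropping those outside."""
    return [(max(s, lo), min(e, hi)) for s, e in intervals if s < hi and e > lo]


def overlaps(a, b):
    """Periods during which an interval from `a` and one from `b` are concurrent."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                out.append((s, e))
    return out


def glance_features(own_glances, other_glances, lo, hi):
    """Glance count, % glance time, mutual glance count, % mutual glance time."""
    own = clip(own_glances, lo, hi)
    mutual = overlaps(own, clip(other_glances, lo, hi))

    def pct(iv):
        return 100.0 * sum(e - s for s, e in iv) / (hi - lo)

    return {"n_glances": len(own), "pct_glance": pct(own),
            "n_mutual": len(mutual), "pct_mutual": pct(mutual)}


# Hypothetical annotations (seconds relative to t_0) for a one-minute window.
features = glance_features([(1, 3), (10, 12)], [(2, 4), (50, 70)], 0, 60)
```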

Analyses & Results
The aim of our analyses was to use our rich data-set to undertake an exploratory investigation of dyadic, collaborative music making. To facilitate this we structured our analyses over three levels: i) individual; ii) dyad; and iii) whole study. This allowed us to analyse general trends and observations over the entire study whilst also paying attention to interesting and suggestive idiosyncrasies at the level of the individual and the dyad. Sample videos of the performances are provided online.

Individual-Level Analyses

Data visualisation
At the level of the individual we were particularly interested in how specific events within the improvisation tasks might be linked to observable changes in the continuously captured physiological and motion-based measures. This interest was inspired by previous studies identifying patterns of physiological change in response to musical events (Egermann et al., 2013), and computer game events (Ravaja et al., 2006). Within the context of our study, the two main event-based measures that we obtained were the rhythmic change points (RCPs), and phases of glance (see Section 4.3) for each participant. We refer to these as continuous features since they derive from continuous measures (audio/MIDI and video). Figure 5 illustrates a simple time-series plot that we designed to enable us to visualise these events alongside other continuous features. By generating such a plot for each dyad within each condition, we were able to visually explore potential patterns within, and relationships between, continuous measures.

Relationships between heart rate and RCPs
One of the most interesting observations to come from our visual analysis was that RCPs for three of the participants seemed to be closely aligned with extrema (peaks and/or troughs) in the participant's heart rate plot. To test whether this relationship was significant we extracted the heart rate extrema time points, for each participant, within each condition, and calculated how many of the corresponding RCPs were aligned. We arbitrarily defined points as aligned if they fell within a quarter of a second of each other. Figure 6 shows an example of the extracted extrema and RCPs, which in this case appear to be aligned to heart rate minima.
Our null hypothesis was that the number of aligned points would not be significantly greater than the chance outcome. In order to estimate the likelihood of a given number of alignments, we considered each set of RCPs as a randomly distributed set of discrete time points, each of which has a probability, P(aligned), of being aligned with an extremum, as defined by Equation 1. The numerator in Equation 1 represents the total number of time points on which an RCP could be considered aligned, where n_e is the number of extrema, W is the window size, and res is the time resolution (seconds between samples). The denominator represents the total number of time points, where T_s is the length of the task in seconds. Our window size and resolution were 0.5 and 0.1 seconds respectively. These values were chosen based upon the precision of our heart rate data and expected error due to RCP annotation inaccuracies.
Our p-values were subsequently calculated using the complementary cumulative distribution function (CCDF) of the binomial distribution, as given in Equation 2. Where F is the binomial CDF, Equation 2 gives the one-tailed probability of obtaining at least n_a aligned points, given the total number of RCPs (n_r) and the probability (P(aligned)) of a single alignment. This is a suitable statistical test for our null hypothesis, since we consider the number of aligned points to be randomly distributed, with a known probability distribution. Table 2 presents the results of this analysis. We can see that four of the participants show significant alignment percentages. For each of the participants with significant alignments in Table 2, we plotted their average heart rates over the 7 seconds preceding and following all of their RCPs. We performed this analysis because it allowed us to look in more detail at how individual participants' heart rates changed on average leading up to, and following, RCPs. This method was successfully used by Egermann et al. (2013) and Ravaja et al. (2006) to reveal time-windowed patterns in event-based physiological responses. The results are shown in Fig. 7. For D1.p1.NV, heart rate rises in the pre-RCP phase and peaks around 2 seconds after the RCP. This moderately corresponds with our results in Table 2, where we showed a significant percentage of maxima alignments for this participant. The strength of the correspondence may be diluted by the fact that we have averaged the HR across all RCPs, and this participant also had a significant alignment percentage for all extrema. For D1.p2.V the changes in HR are also subtle; however, the mean HR does appear to rise prior to the RCP and drop immediately after. This supports our significant results for maxima alignments in the visual condition. For D3.p2.V we see that the mean HR peaks around 1 s prior to the RCP, and troughs around 3 s after the RCP. This does not correspond well with the results in Table 2. Again, this could be a result of the averaging effect, as we can see that this participant also had significant alignments with all extrema. For D5.p1.NV we can see that a substantial drop in HR occurs during the pre-RCP phase, followed by an immediate rise during the post-RCP phase. This strongly supports the significant result for non-visual minima alignments presented in Table 2. In this case the participant had no corresponding maxima alignments in that condition, which might be why we see such a pronounced minima alignment in the plot.
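The alignment test described above can be sketched in Python as follows. The single-alignment probability is written here as (n_e × W / res) / (T_s / res), which is one reading of the verbal description of Equation 1 and should be treated as an assumption; it also ignores any overlap between the windows of neighbouring extrema. The p-value is the binomial upper tail, 1 − F(n_a − 1; n_r, P(aligned)), which follows from the stated "at least n_a aligned points". All function names are hypothetical.

```python
from math import comb


def count_aligned(rcps, extrema, window):
    """Number of RCPs lying within window / 2 seconds of any extremum."""
    half = window / 2.0
    return sum(any(abs(r - e) <= half for e in extrema) for r in rcps)


def p_aligned(n_extrema, window, res, task_len):
    """Chance probability that a single RCP is aligned (assumed form of Equation 1)."""
    aligned_points = n_extrema * (window / res)   # time points counted as aligned
    total_points = task_len / res                 # all time points in the task
    return min(1.0, aligned_points / total_points)


def binom_cdf(k, n, p):
    """Binomial CDF F(k; n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(0, k + 1))


def alignment_p_value(n_aligned, n_rcps, p):
    """One-tailed probability of at least `n_aligned` alignments by chance."""
    return 1.0 - binom_cdf(n_aligned - 1, n_rcps, p)


# Hypothetical example: 40 extrema in a 400 s task, 10 RCPs, W = 0.5 s, res = 0.1 s.
p_single = p_aligned(40, 0.5, 0.1, 400)
n_a = count_aligned([10.0, 20.0], [10.2, 25.0], 0.5)
p_val = alignment_p_value(1, 10, p_single)
```

With these made-up numbers a single alignment among 10 RCPs is unremarkable, since the chance of at least one alignment is roughly 0.4.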

Dyad-Level Analyses

Correlations between self report scores
We were interested in how well participants' self report scores correlated within dyads. The reason for this is that it provides an indication of the level of within-dyad agreement with respect to the subjective evaluation of collaborative music making experiences. We used the windowed SR data in order to perform one-tailed, pairwise Spearman correlation analysis between sets of equivalent SR scores within dyads. Spearman correlation is an appropriate test because we cannot assume that linear correlations exist, nor that the variables are normally distributed. The results are shown in Table 3, along with the mean correlation across dyads for each SR item. We can see that the majority of correlations are weak, indicating that participants did not generally agree with each other on the subjectively rated aspects of their performances. Correlations for the item leadership are strongest (mean r_S = −0.46), with significant results for dyad 3 (r_S(8) = −0.72, p < .01) and dyad 4 (r_S(10) = −0.54, p < .05).
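For readers unfamiliar with the statistic, the Spearman coefficient is the Pearson correlation of the rank-transformed scores. The Python sketch below shows this with average ranks for ties; it is an illustration only, does not compute the one-tailed p-values reported in Table 3, and uses hypothetical function names and data.

```python
def ranks(xs):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0        # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's r_S: Pearson correlation of the ranks (undefined if a series is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-window leadership ratings from the two members of a dyad.
r_s = spearman([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

A strongly negative value for leadership, as in this toy example, would mean that when one participant rated themselves as leading, the other tended to rate themselves as following.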

Heart rate synchrony
For the HR measurements we were interested in undertaking a more focused analysis of correlations between the continuous time series data for each participant across entire Improv tasks. This interest was inspired by the studies on emotional contagion and physiological linkage reviewed in Section 2, which suggest that aspects of co-present experiences might be apparent in physiological measurements. As a starting point for our analysis, we were curious as to whether synchronised changes in HR occurred within dyads, as this would serve as a basic indication of potential physiological linkage. The requirement for synchrony is that a change in one participant's HR should be accompanied by a corresponding change in the other participant's HR. Therefore, to carry out this analysis we differentiated each participant's continuous HR data and performed within-dyad comparisons of the slope at each time point. If two participants' heart rates shared the same direction of change (positive or negative slope) then we labelled that time point as 'in-sync'. We were then able to calculate the proportion of synchrony between participants. The results are shown in Table 4, where 0% and 100% correspond to perfect negative and positive synchrony respectively. The values all lie close to the chance value of 50%, which suggests that there was no synchrony between HR changes within dyads. Consequently, we decided not to explore a more detailed analysis of heart rate synchrony.
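The slope-comparison procedure can be illustrated with the short Python sketch below, which uses first differences of two evenly sampled HR series as the derivative. How flat segments (zero slope) were handled is not stated in the text, so skipping them is an assumption, as is the function name.

```python
def sync_proportion(hr_a, hr_b):
    """Percentage of time points at which two HR series change in the same direction.

    Both series are assumed to be evenly sampled on the same time grid.
    Points where either slope is exactly zero are skipped (assumption).
    Returns None if no comparable points remain.
    """
    same = total = 0
    for i in range(1, min(len(hr_a), len(hr_b))):
        da = hr_a[i] - hr_a[i - 1]
        db = hr_b[i] - hr_b[i - 1]
        if da == 0 or db == 0:
            continue
        total += 1
        same += (da > 0) == (db > 0)   # True counts as 1 when slopes share a sign
    return 100.0 * same / total if total else None


# Hypothetical HR series (bpm) for the two members of a dyad.
sync = sync_proportion([60, 61, 62, 61, 60], [70, 72, 71, 70, 69])
```

Here three of the four slope pairs share a direction, giving 75%. Values near 50%, as reported in Table 4, are what two unrelated series would be expected to produce.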

Study-Level Analyses
In order to perform our analyses at the level of the entire study, we had to take into account the nested structure of our study. Each windowed data point comes from an individual, nested within one of the two conditions. These individuals are, in turn, nested within dyads, which are nested under the entire study group. A popular approach to the analysis of data that is structured over multiple levels is the use of linear mixed models (LMMs). Numerous studies have adopted LMMs for the analysis of continuous and subjective human measurements (Glowinski et al., 2013; Egermann et al., 2013; Baayen et al., 2008; Ravaja et al., 2006). Linear mixed models estimate the relationship between a dependent variable and associated covariates by taking into account both fixed and random effects. They also allow for missing data points for subjects (West et al., 2007), which is the case with our data-set. We used the MIXED procedure in SPSS (Version 22) statistical analysis software to model the relationships between pairs of continuous features. For the random effects we specified random slopes and intercepts for participants. Random effects were not included at the level of the dyad, since dyad association did not generally have systematic effects on the individual participant responses. The analysis was carried out with data across both conditions, as well as within conditions. For the former, we specified in the random effects model that the covariates were nested within the condition. The Restricted Maximum Likelihood (REML) method was used for parameter estimation. To select the best covariance structure for each model (either variance components, or unstructured) we used likelihood ratio tests to compare the -2 Restricted Log Likelihood values for the models obtained using each of the two covariance structures. If the two models were not significantly different at the .05 level then we selected the simpler covariance structure (variance components). The covariance structure used for each analysis is specified in the notes below the corresponding results table.

Correlations between continuous features
Much like our analysis at the level of the individual in Section 4.1, we were interested in how glance and RCP events related to changes in other continuous measures. Table 5 presents the results obtained in our LMM analysis (see Appendix A.1 for more details). Since there are many potential combinations of our 28 continuous features, we only present the significant results. All of the results come from the analysis within the visual condition. We see that the mean absolute inter-beat lag is significantly negatively correlated with the number of individual glances (t(29) = −2.37, p < .05) and mutual glances (t(27) = −2.79, p < .01). Mean HR is significantly negatively correlated with the number of RCPs (t(26) = −2.19, p < .05). The first two results are closely related, since there is a collinearity between the number of mutual glances and the number of individual glances. This is due to the fact that mutual glances occur when the individual glances of two participants within a dyad are concurrent.
In a similar manner to our individual level analysis in Fig. 7, we decided to plot the study-level mean heart rate during the 7 seconds preceding and following RCPs. This was calculated as the mean of the set of all the 15-second normalised HR recordings centred upon every corresponding RCP. The results can be seen in Fig. 8, which shows that in the 7 seconds following rhythmic change (post-RCP) there is a fairly linear average increase in HR. For the pre-RCP phase we see a slight rise and fall in HR occurring in the 5 seconds leading up to an RCP.
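The event-locked averaging behind Figs. 7 and 8 can be sketched in Python as below. The text says the HR segments were normalised but not how, so subtracting each segment's own mean is an assumption, as are the function name, the handling of events near the edges of the recording, and the example sampling rate.

```python
def event_locked_mean(hr, fs, events, half_width=7.0):
    """Average HR in a window of +/- half_width seconds around each event.

    `hr` is an evenly sampled series at `fs` Hz and `events` are event times
    in seconds. Each segment is mean-centred before averaging (assumed
    normalisation). Events too close to the edges are skipped.
    """
    n = int(round(half_width * fs))
    segments = []
    for t in events:
        c = int(round(t * fs))                  # sample index of the event
        if c - n < 0 or c + n >= len(hr):
            continue
        seg = hr[c - n:c + n + 1]
        m = sum(seg) / len(seg)
        segments.append([v - m for v in seg])
    if not segments:
        return None
    return [sum(col) / len(segments) for col in zip(*segments)]


# Hypothetical data: a steadily rising HR sampled at 1 Hz, with RCPs at 3, 20, and 50 s.
hr_series = [float(i) for i in range(100)]
profile = event_locked_mean(hr_series, 1.0, [3.0, 20.0, 50.0])
```

At 1 Hz each segment holds 15 samples (7 before, the event itself, and 7 after), and the event at 3 s is dropped because its window would start before the recording.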

Correlations between continuous features and self report scores
Using the same LMM procedure as above (see Appendix A.2 and Appendix A.3), we performed an analysis of relationships between continuous features and self report scores. Again, the analysis was performed across both conditions, and within conditions. Our aim was to explore which continuous measures might be suitable as predictors of subjective experience. Significant (p < .05) correlations are presented in Table 6. We see that across the entire analysis all of the self report items have at least one associated continuous measure that gives a significant correlation. The mean body quantity of motion is positively correlated with creativity (t(37) = 3.59, p < .001), engagement (t(31) = 3.01, p < .01), and energy (t(33) = 3.16, p < .01). There are significant correlations between the beta-band EEG features and leadership. Again, these correlations are strongest in the NV condition, with leadership correlating with mean L-Beta power (t(21) = 3.33, p < .01), and mean H-Beta power (t(34) = 3.25, p < .01). In the NV condition mean H-Beta power is also positively correlated with positivity (t(22) = 2.70, p < .05), and negatively correlated with boredom (t(21) = −2.30, p < .05). Leadership is also strongly correlated with the number

Estimates calculated using variance components (VC) covariance type or † unstructured covariance type.
The visual condition analyses result in the fewest correlations, with three results that are exclusive to this condition. Firstly, engagement is negatively correlated with the mean MIDI velocity (t(36) = −2.66, p < .05). Secondly, there is a significant correlation between percentage glance time and boredom (t(26) = 2.10, p < .05). Finally, the only GSR-related correlation arises here, with a significant negative relationship between the number of SCR extrema and boredom (t(27) = −2.77, p < .01).

Effects of visibility condition
Our final study-level analyses were carried out to evaluate the potential effects of the visibility condition on both subjectively reported and continuously measured aspects of the improvised performances. We used a similar LMM approach to that adopted above, with visibility condition set as a factor, and random slopes and intercepts for participants (see Appendix A.4). Significant (p < .05) and noteworthy (p < .15) results are shown in Table 7, where we see that the only significant effects of visibility condition are on the creativity SR item, t(9) = −2.26, p < .05. The negative t-value indicates that participants felt more creative in the visual condition than in the non-visual condition.

Discussion
We believe that this study provides three distinct contributions to research on collaborative music making and the use of affective and behavioural sensors in studies of human interactions. Firstly, the techniques, challenges and issues associated with our novel study design serve as a valuable reference for researchers planning to conduct similar studies. In addition to this, our findings pose interesting research questions, which relate to existing concepts, and serve as inspiration for future studies. Finally, our assessment of various measures will assist researchers in deciding which measures and associated sensor technologies are best suited to their particular studies. The following sections discuss these three contributions of our study in more detail.

Design
We designed a study to assess the challenges and issues associated with the experimental use of behavioural and affective sensors to investigate collaborative music making. The conflict between statistical and ecological validity is a noteworthy challenge for researchers attempting to undertake such studies. In our case we are unable to account for the ways in which the experimental environment and measuring devices may have influenced the participants. In Section 2 we saw how researchers investigating musical improvisation and ensemble interactions have chosen to tolerate unnatural and highly controlled experimental settings, such as the confines of an MRI scanner; and abstract tasks, such as finger tapping. We chose a one-handed improvised drumming task because we felt it was representative of a basic on-the-fly musical collaboration, whilst also satisfying practical requirements. However, the specificity of this task means that we must take caution in generalising our findings to other forms of musical interaction.
An additional issue, specific to our design, was the ordering of the visual and non-visual conditions. Our decision to use a fixed order was made partly because randomised ordering would not have been effective with our small study size. In addition to this we thought that holding the V condition first would have allowed participants to use non-verbal communication to appraise aspects of their performance. These appraisals could then have had a noteworthy influence on how the participants performed in the second improvisation. As a consequence of this decision, our results may have been influenced by ordering effects. We highlight the potential presence of these effects when discussing specific findings below.

Findings
The level at which various analyses should be performed is an important consideration for researchers investigating dyadic interactions. It depends both on the nature of the data, and on the questions posed by the researcher. Our broad analyses were performed at the level of the individual, dyad, and entire study. At the level of the individual we looked at how specific events in the performances related to continuous measures. This allowed us to reveal correlations that may not have emerged during study-level analyses, due to cross-participant random effects such as personality factors and musical ability. Conversely, study-level correlations between subjective and continuous features may not have achieved significance at the level of the individual, due to small sample sizes. Our findings are discussed in more detail below, according to their level of analysis.

Individual-level
At the level of the individual, our visual analysis of the continuous data resulted in us pursuing a more detailed analysis of potential relationships between HR extrema and RCPs. The results in Table 2 and Fig. 7 indicate that the extent of these relationships varies a great deal between participants. For the majority of participants the relationship is weak or non-existent. However, for some participants it is significant. A plausible explanation for these alignments is psychophysiological linkage. The reasoning behind this is that heart rate is linked to arousal and stress, and participants are most likely to change rhythm when their arousal is too high (e.g. the rhythm is too challenging) or too low (e.g. the rhythm has become boring). Another interesting consideration is that HR and RCP alignments are a representation of the participant attempting to maintain a state of flow, whereby they never become overly challenged, or overly bored by the task. The alignments for D5.p1 are especially interesting, as they appear to correspond solely with HR minima. During an informal post-performance interview this participant commented that she had been particularly nervous about playing in the presence of the other participant, whom she knew well. She also had 11 years less drumming experience than her collaborator. It is possible that this anxiety could have contributed to a pronounced rise in HR following new rhythmic contributions. This lends support to our suggestion that HR and RCP alignments were a result of psychophysiological linkage. Furthermore, it highlights the possibility that other factors, such as the participants' personalities, expertise, and relationships, may have influenced these findings. A more focused study is required in order to verify whether a causal relationship exists between HR extrema and on-the-fly creative musical decisions. In particular, it would be worth considering whether there are other performance-related events that also coincide with HR extrema (e.g. changes in the musical dynamics or specific non-verbal exchanges between musicians).

Dyad-level
Our analysis of correlations between SR scores at the level of the dyad indicated that participants did not generally agree upon aspects of their experiences, and the creativity of the performances. These results are similar to those of Schober and Spiro (2014), who found that two jazz musicians did not have a high level of agreement on self reported evaluations of their performances, despite the performances being judged to be successful improvisations. The most significant and consistent correlations were for the leadership item. This is a potentially interesting result as it suggests that dyads were more attuned to the functional leader-follower aspects of their interaction than they were to the affective aspects, such as energy and positivity. It has been suggested that the establishment and communication of leader-follower relationships is an important tool in the creation of dynamic musical collaborations (Reidsma et al., 2014).
Our analysis of HR synchrony within dyads suggested that synchrony did not generally occur for participants in this study. The existence of physiological linkage between co-present individuals is understudied and previous evidence has been found only in resting participants (Reed et al., 2013). In our case, the active nature of the drumming task may have been the dominant factor in determining changes in HR over time.

Study-level
Our study-level analyses of relationships between paired sets of continuous features revealed significant correlations between glance-based features and the mean absolute participant inter-beat lag. These results suggest that glancing at the other participant increases the amount of timing synchrony (decreased lag) between participants. This agrees with existing research, which has shown that visual contact can have a positive influence on timing synchrony between musicians (Vera et al., 2013). We also found a significant negative correlation between the number of RCPs and the mean HR (t(26) = −2.19, p < .05). This supports our suggestion that RCPs occur at points of low physiological arousal (see Section 5.2.1). If this is the case, then we would expect the arousal to increase following rhythmic change. Evidence for this can be seen in the plot in Fig. 8, which shows an average increase in HR in the 7 seconds following rhythmic change.
Our study-level analyses of correlations between continuous features and self report scores revealed that body motion is positively correlated with creativity (t(37) = 3.59, p < .001), engagement (t(31) = 3.01, p < .01), and energy (t(33) = 3.16, p < .01). This aligns with previous research, which found that quantity of movement features can discriminate between low and high arousal emotions (Castellano et al., 2007). The fact that body movement is also correlated with creativity lends support to the dual pathway model discussed in Section 2.2.2, which highlights the importance of arousal in the generation of creative ideas. H-Beta EEG power is significantly correlated with leadership in both the cross-condition (t(32) = 3.15, p < .01) and NV analyses (t(34) = 3.25, p < .01). We also found positive H-Beta correlations with positivity (t(22) = 2.70, p < .05) and negative H-Beta correlations with boredom (t(21) = −2.30, p < .05). Previous studies have associated beta activity with engagement and cognitive challenge (Budzynski et al., 2008). This concurs with our findings, as we would expect people to be more engaged and cognitively aroused when leading the performance. Within the visual condition, the negative correlation between engagement and mean velocity (t(36) = −2.66, p < .05) is interesting, since it suggests that participants hit the drum more softly during periods of high engagement. This might be reflective of participants playing more quietly in order to direct their attention towards the other musician's playing. This result is consistent with descriptions of jazz musicians' multiple levels of attention, incorporating an awareness of both the self and the other (Sawyer, 2003). Also of interest is the correlation between percentage glance time and boredom, which suggests that prolonged glances might be used within the performance to communicate boredom; or that they might be interpreted as indicators of boredom when reviewing the video recordings of the performance. The same can be said for the negative correlation between boredom and the number of SCR extrema, since SCR events have been shown to be concomitant with emotional arousal (Benedek and Kaernbach, 2010).
The lack of significant correlations between creativity and EEG features is interesting because it may be indicative of the use of contrasting thought processes during creative action. This concurs with previous EEG research, which has struggled to show conclusive links between creativity and localised brain activity (Dietrich and Kanso, 2010). Indeed, the literature on creativity often refers to the roles and interplay of different styles of thinking in the development and generation of ideas. For example, the terms associative and analytic have also been used to define two contrasting modes of thought involved in the creative process (Gabora, 2002; Simonton, 1975).
During our study-level analyses of correlations between continuous features and self report scores we drew attention to the fact that the results for the NV condition yielded more, and stronger, correlations than those for the V condition. A potential explanation for this is that participants' non-verbal behaviours and physiological responses are more closely tied to their personal subjective experiences when they are unable to see the other musician. This idea is influenced by the theories of emotional contagion and mimicry (see Section 2.2.1), which suggest that being able to see the other participant might influence changes in behaviour and physiology. More work would need to be done to establish whether or not this is an influencing factor when correlating subjective scores with measures of behaviour and affect.
Our results for study-level effects of the visibility condition indicated that the only significant effects of co-visibility were on self reported creativity (t(9) = −2.66, p < .05). This indicates that creativity was rated more highly in the V condition. However, we should be cautious of the potential influence of ordering effects on this result. Given the small size of our study, we also reported noteworthy results (p < .15). These indicate that self reported engagement was generally lower in the NV condition (t(18) = −1.64), whilst boredom was higher (t(9) = 1.69). With regard to ordering effects, in this case we would have expected boredom to be rated more highly in the second condition (V). These results support the idea that co-visibility is influential to the subjective experience of collaborating musicians. The HR and SCR extrema features are the only physiological features that show noteworthy visibility condition effects. These features represent the number of phasic shifts in a participant's physiological arousal over time. Our results suggest that there are more shifts during the V condition. This may be reflective of the fact that the participants have more to attend to when they are able to see their fellow musician. This could lead to them shifting their attention more frequently, resulting in corresponding shifts in physiological arousal due to psychophysiological linkage.
As the analyses discussed above show, we predominantly investigated relationships between pairs of features and conditions. This approach was chosen due to the exploratory nature of our study and the broad array of measures and features collected. A more comprehensive analysis of specific correlations and effects would require the inclusion of other factors, based upon specific hypotheses. Such a multi-factorial approach could yield results that contrast with those presented in this paper. In addition, the small sample size used in this study means that our findings should be viewed speculatively in relation to existing studies, and as inspiration for future research.

Measures
We assess the suitability of our measures based on three factors: practicality, reliability, and informativeness. Practicality concerns how difficult it is to collect the measurements, given the constraints of live musical performance. Reliability addresses whether the raw measurements are likely to contain reliable information, as opposed to noisy or artifactual data. Finally, informativeness concerns the amount of useful information that we are able to obtain by extracting features and analysing the measurements. These factors were chosen in light of the study aims (see Section 3).
Of the sensor-based measures, ECG and accelerometer data were easy to acquire using wireless sensors, and were not particularly prone to noise. In contrast, the EEG sensor required a lot of preparation and adjustment prior to use, and the data were prone to noise and required manual processing (artefact removal). Regarding the informativeness of sensor-derived features, we found that the extraction of heart rate extrema time points was useful for the analysis of time-based performance events. The bodily quantity of motion feature appeared to be a good indicator of the arousal dimension of subjective experience. Glance related features were acquired by manually annotating video footage; however, it is feasible that modern eye-tracking sensors could perform this task automatically. Glance count and average duration appear to be promising features for the analysis of collaborative interactions where the musicians are visible to one another.
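To illustrate how heart rate extrema time points might be related to time-based performance events, the following Python sketch locates local maxima and minima in a sampled heart rate series and reports the percentage of events (for example, rhythmic change points) that fall close to an extremum. It is a minimal illustration rather than the procedure used in our analyses: the function names, the example values, and in particular the alignment tolerance `tolerance_s` are assumptions introduced here.

```python
import numpy as np

def extrema_times(t, hr):
    """Return the time points of local maxima and minima in a heart rate series."""
    t = np.asarray(t, dtype=float)
    hr = np.asarray(hr, dtype=float)
    sign = np.sign(np.diff(hr))
    # Carry the previous slope direction through flat stretches,
    # so that a plateau is counted as a single turning point.
    for i in range(1, len(sign)):
        if sign[i] == 0:
            sign[i] = sign[i - 1]
    # A change of slope direction marks an extremum at the shared sample.
    idx = np.where(sign[1:] * sign[:-1] < 0)[0] + 1
    return t[idx]

def percent_aligned(event_times, extrema, tolerance_s):
    """Percentage of events lying within tolerance_s seconds of any extremum.

    tolerance_s is a hypothetical alignment window, not a value from the study.
    """
    event_times = np.asarray(event_times, dtype=float)
    extrema = np.asarray(extrema, dtype=float)
    if event_times.size == 0 or extrema.size == 0:
        return 0.0
    # Distance from each event to its nearest extremum.
    nearest = np.min(np.abs(event_times[:, None] - extrema[None, :]), axis=1)
    return 100.0 * float(np.mean(nearest <= tolerance_s))

# Illustrative values only (not study data).
t = np.arange(7.0)                       # seconds
hr = [60, 62, 65, 63, 61, 64, 66]        # beats per minute
ext = extrema_times(t, hr)               # turning points at t = 2 s and t = 4 s
aligned = percent_aligned([2.5, 10.0], ext, tolerance_s=1.0)
```

With these illustrative values, one of the two events lies within one second of an extremum, so `aligned` evaluates to 50.0. In practice the heart rate series would usually be smoothed before extrema are extracted, and the tolerance would need to be justified in light of the expected latency of the cardiac response.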
Performance related features, acquired through MIDI data, assisted in the identification of rhythmic changes, and provided velocity and timing synchrony features. Timing synchrony was extracted from beats that were perceptually synchronous, so it would not have accounted for more pronounced and intentional changes in tempo. Reliable measures of tempo changes are difficult to extract automatically, which is why we chose not to perform tempo analysis. However, this is an important aspect of improvisation and would be worth considering in future studies. One issue is that MIDI data can only be obtained from certain instruments. Alternative methods, such as audio feature-extraction, should be considered for more generalised studies of collaborative music making.
Finally, our self report measures were obtained using a post-performance questionnaire (PPQ). We designed the PPQ to be as concise as possible, because each participant had to complete it between 6 and 10 times. Instead of including multiple questions addressing similar constructs, specific questions were selected and adapted from existing questionnaires used in related studies. Verifying the validity and reliability of questionnaires is a challenge for any researcher attempting to collect subjective measures of collaborative music making. In particular, reliability is difficult to ascertain, given that the unique nature of each collaboration prohibits the collection of repeated measures. Furthermore, the value and general reliability of introspective self report data has been called into question (Pronin, 2009). These issues lend weight to the argument that alternative measures, such as physiology, might be more suitable tools for the measurement of behaviour and affect. The predicament facing researchers is that, in order to interpret and understand how such measurements relate to affective and behavioural phenomena, we must have some existing knowledge about the phenomena we are measuring. This is especially difficult when it comes to investigating collaborative music making, an activity that involves a complex tapestry of contextual and subjective information that changes continuously as the interaction unfolds.
In summary, researchers should be cautious when drawing links between self report and continuous physiological and behavioural data. In particular, it is necessary to carefully consider the causal mechanisms and pathways that might explain such links.

Conclusions
Our exploratory study provides an insight into the use of affective and behavioural sensors as tools for investigating collaborative music making. We have highlighted some of the challenges of working with these sensors and have addressed experimental design considerations that could be relevant to researchers planning to undertake similar studies. Our detailed analyses yielded a range of interesting findings, which we hope will inspire future research in this field. In particular, there is a need to build upon this work by employing appropriate sensors to test specific hypotheses surrounding the behavioural and affective aspects of collaborative music making.

Figure 1: Schematic of the dual pathway to creativity model (reproduced from De Dreu et al., 2008)

Figure 2: Image taken from the overhead camera illustrating the setup and the equipment used in the study

Figure 3: The post performance questionnaire (PPQ) that was used in our study.

Figure 4: A short segment of MIDI data for one of the participants, illustrating how rhythmic change points were identified. The blocks represent individual beats, and their colour denotes the velocity (strength) of the beat. There are two distinct patterns. Pattern 1 consists of soft (blue-green) and evenly spaced beats, and Pattern 2 consists of paired beats, played with more velocity (yellow-red).

Figure 5: Strip plot of individual participant data for a two window (w1 and w2) segment of a single improvisation task (under the visual-contact condition). Rows 1 and 5 show continuous heart rate data (lighter colours represent higher values). Rows 2 and 4 represent discrete rhythmic change points (RCPs). Row 3 shows glance, which is binary data (glancing or not-glancing) shaded according to the participant. Black shading represents a mutual glance.

Figure 6: Plot of a segment of heart rate data showing detected extrema and corresponding timing of RCPs.

Figure 7: Plots of the mean heart rates (HR) during the 7 seconds before and after RCPs for participants with significant RCP-extrema alignments.

Figure 8: Plot of the mean heart rate (HR) for all participants during the 7 seconds before (Pre-RCP) and after (Post-RCP) all of the RCPs from both conditions.

Table 1: A summary of the measures, sensors, and features used in this study.

Table 2: Percentages of rhythmic change points (RCPs) and heart rate extrema that are aligned for individual participants within each condition.

Table 4: Synchrony in continuous heart rate data between participants, within dyads, expressed as the percentage of time where both participants' heart rates were changing in the same direction (positive or negative slope).

Table 5: Linear mixed effects modelling (LMM) estimates of fixed effects between continuous features.

Table 6: Linear mixed effects modelling (LMM) estimates of fixed effects between self report measures and continuous features.

Table 7: Linear mixed effects modelling (LMM) estimates of fixed effects of visibility condition on all self report measures and noteworthy (p < .15) continuous measures.