Thread-based analysis of patterns of collaborative interaction in chat
Murat Cakir, Fatos Xhafa,
Abstract. In this work we present a thread-based approach for analyzing synchronous collaborative math problem solving activities. Thread information is shown to be an important resource for analyzing collaborative activities, especially for conducting sequential analysis of interaction among participants of a small group. We propose a computational model based on thread information which allows us to identify patterns of interaction and their sequential organization in computer-supported collaborative environments. This approach enables us to understand important features of collaborative math problem solving in a chat environment and to envisage several useful implications for educational and design purposes.
1. Introduction
The analysis of fine-grained patterns of interaction in small groups is
important for understanding collaborative learning [1]. In distance education,
collaborative learning is generally supported by asynchronous threaded
discussion forums and by synchronous chat rooms. Techniques of interaction
analysis can be borrowed from the science of conversation analysis (CA),
adapting them for the differences between face-to-face conversation and online
discussion or chat. CA has emphasized the centrality of turn-taking conventions
and of the use of adjacency pairs (such as question-answer or offer-response
interaction patterns). In informal conversation, a given posting normally
responds to the previous posting. In threaded discussion, the response
relationships are made explicit by a note poster, and are displayed
graphically. The situation in chat is more complicated, and tends to create
confusion for both participants and analysts.
In this paper, we present a simple mathematical model of possible response
structures in chat, discuss a program for representing those structures
graphically and for manipulating them, and enumerate several insights into the
structure of chat interactions that are facilitated by this model and tool. In
particular, we show that fine-grained patterns of collaborative interaction in
chat can be revealed through statistical analysis of the output from our tool.
These patterns are related to social, communicative and problem-solving
interactions that are fundamental to collaborative learning group behavior.
Computer-Supported Collaborative Learning (CSCL) research has mainly focused on
analyzing content information. A naïve sequential analysis solely based on the
observed ordering of postings without any claim about their threading could be
misleading due to artificial turn orderings produced by the quasi-synchronous
chat medium [2]. In recent years, we have seen increasing attention to thread
information, yet most of this research is focused on asynchronous settings
([3], [4], [5], [6], [7]). Jeong [8] and Kanselaar et al. [9], for instance,
use sequential analysis to examine group interaction in asynchronous threaded
discussion. In order to do a similar analysis of chat logs, one has to first
take into account the more complex linking structures.
Our approach makes use of the thread information of the collaboration session
to construct a graph that represents the flow of interaction, with each node
holding the complete information about one posting from the recorded
transcript. By traversing the graph, we mine the most frequently occurring dyad
and triad structures, which are analyzed more closely to identify the patterns
of collaboration and the sequential organization of interaction in this
specific setting. The proposed thread-based sequential analysis is robust and
scalable, and thus can be applied to study synchronous or asynchronous
collaboration in different contexts.
The rest of the paper is organized as follows: Section 2 introduces the context
of the research, including a brief introduction of the Virtual Math Teams
project, and the coding scheme on which the thread-based sequential analysis is
based. Section 3 states the research questions we want to investigate. In
Section 4 we introduce our approach. In Section 5 we present our findings and
discuss them to address our research questions and to envisage several useful
implications for educational and design purposes. Section 6 concludes this work
and points to future research.
2. Context of the Research
The VMT Project and Data Collection
The Virtual Math Teams (VMT) project at
Table 1: Description of the coded chat logs.
Coding Scheme
Both quantitative and qualitative approaches are employed in the VMT project to
analyze the transcripts in order to understand the interaction that takes place
during collaboration within this particular setting. A coding scheme has been
developed in the VMT project to quantitatively analyze the sequential
organization of interactions recorded in a chat log. The unit of analysis is
defined as one posting that is produced by a participant at a certain point of
time and displayed as a single posting in the transcript.
The coding scheme includes nine distinct dimensions, each of which is designed
to capture a certain type of information from a different perspective. They can
be grouped into two main categories: one captures the content of the session,
whereas the other keeps track of the threading of the discussion, that is, how
the postings are linked together. Among the content-based dimensions,
conversation and problem solving are two of the most important ones; they code
the conversational and problem solving content of the postings. Related to
these two dimensions are the Conversation Thread and the Problem Solving
Thread, which provide the linking between postings, and thus introduce the
relational structure of the data. The conversation thread also links fragmented
sentences that span multiple postings. The problem solving thread aims to
capture the relationship between postings that relate to each other by means of
their mathematical content or problem solving moves (see Figure 1).
Figure 1: A coded excerpt from Pow2a.
Each dimension has a number of subcategories. Coding is done manually and
independently by 3 coders, after strict training that ensured satisfactory
reliability. This work is based on 4 dimensions only, namely the conversation
thread, the conversation dimension, the problem solving thread, and the problem
solving dimension.
3. Research Questions
In this explorative study we
will address the following research questions:
Research Question 1:
Research Question 2: How can patterns of interaction be used to identify: (a)
each member’s level of participation; (b) the distribution of contributions
among participants; and (c) whether participants are organized into subgroups
through the discussion?
Research Question 3: What are the most frequent patterns related to the main
activities of the math problem solving? How do these patterns sequentially
relate to each other?
Research Question 4: What are the (most frequent) minimal building blocks
observed during “local” interaction? How are these local structures
sequentially related to each other, yielding larger interactional structures?
4. The Computational Model
When a spreadsheet file containing the coded transcript is given as input, the
program generates two graph-based internal representations of the interaction,
based on the conversation and problem solving thread dimensions, respectively.
In this representation each posting is treated as a node object, containing a
list of references pointing to other nodes according to the corresponding
thread. Moreover, each node includes additional information about the
corresponding posting, such as the original statement, the author of the
posting, its timestamp, and the codes assigned in other dimensions. This
representation makes it possible to study various sequential patterns, where
sequential means that postings involved in the pattern are linked according to
the thread, either from the perspective of participants who are producing the
postings or from the perspective of coded information.
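The node representation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the program used in the study: the class name Posting, its field names, the build_graph helper and the three sample postings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Posting:
    """One chat posting, i.e. one node of the graph (field names are hypothetical)."""
    line: int                      # position of the posting in the transcript
    author: str
    text: str
    timestamp: str = ""
    conv_code: str = ""            # code in the conversation dimension
    ps_code: str = ""              # code in the problem solving dimension
    conv_links: list = field(default_factory=list)  # earlier lines linked by the conversation thread
    ps_links: list = field(default_factory=list)    # earlier lines linked by the problem solving thread


def build_graph(rows, thread):
    """Return {line: Posting}, checking that links in `thread` only point to earlier postings."""
    nodes = {row["line"]: Posting(**row) for row in rows}
    for node in nodes.values():
        for target in getattr(node, thread):
            if target not in nodes or target >= node.line:
                raise ValueError(f"posting {node.line} has an invalid link to {target}")
    return nodes


# Invented sample rows standing in for a coded spreadsheet.
rows = [
    {"line": 1, "author": "A", "text": "what should we try?", "conv_code": "Request"},
    {"line": 2, "author": "B", "text": "start with small cases", "conv_code": "Response", "conv_links": [1]},
    {"line": 3, "author": "C", "text": "ok", "conv_code": "Agree", "conv_links": [2]},
]
conversation_graph = build_graph(rows, "conv_links")
print(conversation_graph[3].conv_links)  # lines that posting 3 responds to
```

Calling build_graph a second time with "ps_links" would give the problem solving view of the same postings, which mirrors the two representations mentioned in the text.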
After building a graph-based representation, the model performs traversals over
these structures to identify frequently occurring sub-structures within each
graph, where each sub-structure corresponds to a sequential pattern of
interaction. Sequential patterns having different features in terms of their
size, shape and configuration type are studied. In a generic format, dyads of
type Ci-Cj and triads of type Ci-Cj-Ck, where i < j < k, are examined in an
effort to get information about the local organization of interaction. In this
representation Ci stands for a variable that can be replaced by a code or
author information. The ordering given by i < j < k refers to the ordering of
nodes by means of their relative positions in the transcript. It should be
noted that a posting represented by Cj can only be linked to previous postings,
say Ci where i < j. In this notation the size of a pattern refers to the number
of nodes involved in the pattern (e.g. the size is 2 in the case of Ci-Cj).
Initially the size is limited to dyads and triads since they are more likely to
be observed in a chat environment involving 3 to 5 participants. Nonetheless,
the model can capture patterns of arbitrary size whenever necessary. The shape
of the pattern refers to the different combinations in which the nodes are
related to each other. For instance, in the case of a triad like Ci-Cj-Ck there
are two possible shapes: (a) if Ci is linked to Cj and Cj is linked to Ck, then
we refer to this structure as chain type; (b) if Ci is linked to Cj and Ci is
linked to Ck, then we refer to this structure as star type. The dyadic and
triadic patterns identified this way reveal information about the local
organization of interaction. Thus, these patterns can be considered as the
fundamental building blocks of a group’s discussion, whose combination would
give us further insights on the sequential unfolding of the whole interaction.
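The dyad and triad definitions above can be illustrated with a short Python sketch. It assumes the thread is available as a dictionary mapping each posting's line number to the earlier lines it links to; the function names and the four-posting example are hypothetical and only meant to show the chain and star shapes.

```python
from itertools import combinations


def dyads(links):
    """Yield (i, j) for every posting j that links back to an earlier posting i."""
    for j, targets in sorted(links.items()):
        for i in targets:
            yield (i, j)


def triads(links):
    """Return (chains, stars), each a sorted list of (i, j, k) with i < j < k."""
    replies = {}  # i -> list of later postings linked to i
    for i, j in dyads(links):
        replies.setdefault(i, []).append(j)
    # Chain: Ci is linked to Cj and Cj is linked to Ck.
    chains = [(i, j, k) for i, js in replies.items() for j in js for k in replies.get(j, [])]
    # Star: Ci is linked to Cj and Ci is linked to Ck.
    stars = [(i, j, k) for i, js in replies.items() for j, k in combinations(sorted(js), 2)]
    return sorted(chains), sorted(stars)


# Invented thread: postings 2 and 3 both respond to 1, and posting 4 responds to 3.
links = {2: [1], 3: [1], 4: [3]}
chains, stars = triads(links)
print(list(dyads(links)), chains, stars)
```

In this toy thread, 1-3-4 is the only chain and 1-2-3 the only star.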
The type of the configuration is determined by the information represented by
each variable Ci. A variable Ci can be replaced by the author name, the
conversation code, the problem solving code, or a combination of conversation
and problem solving codes. This flexibility makes it possible to analyze
patterns linking postings by means of their authors, and the codes they receive
from the conversational or problem solving dimension.
As shown in Table 1, the maximum number of chat lines contained in a transcript
in our data repository is about 700 lines, and we analyzed a corpus containing
6 such transcripts for this explorative study. Thus, in this study the emphasis
is on ways of revealing relevant patterns of collaborative interaction from a
given data set. Nonetheless, we pay attention to efficiency while performing
the mining task. Moreover, there exist efficient algorithms designed for mining
frequent substructures in large graphs ([10], [11], [12]), which can be used to
extend our model to process larger data sets.
5. Results and Discussion
In this section we show how the computational model presented in this work
enables us to shed light on the research questions listed in Section 3.
5.1 Local Interaction Patterns
In order to identify the most frequent local interaction patterns of size 2 and
3, our model performs traversals of corresponding lengths and counts the number
of observed dyads and triads. The model can classify these patterns in terms of
their contributors, in terms of conversation or problem solving codes, or by
considering different combinations of these attributes (e.g. patterns of
author-conversation pairs). The model outputs a dyad percentage matrix for each
session in which the (i,j)th entry is the percentage of dyads in which Ci is
followed by Cj during that session. For example, a percentage matrix for dyads
based on conversation codes is shown in Table 2. In addition to this, a
row-based percentage matrix is computed to depict the local percentage of any
dyad Ci-Cj among all dyads beginning with Ci. Table 3 shows a row-based
percentage matrix for the conversation dyads. Similarly, the model also
computes a list of triads and their frequencies for each session.
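The two percentage matrices described above can be computed along the following lines. This is a hedged Python sketch: the postings, the links and the helper names label_dyads and dyad_percentages are invented for illustration and do not reproduce the values in Tables 2 and 3.

```python
from collections import Counter


def label_dyads(postings, links, key):
    """Replace each linked pair (i, j) by the chosen attribute, e.g. author or code."""
    return [(postings[i][key], postings[j][key])
            for j, targets in sorted(links.items()) for i in targets]


def dyad_percentages(labelled):
    """Return (overall, row_based) percentages keyed by (source, reply) labels."""
    counts = Counter(labelled)
    total = sum(counts.values())
    row_totals = Counter()
    for (source, _), n in counts.items():
        row_totals[source] += n
    overall = {pair: 100.0 * n / total for pair, n in counts.items()}
    row_based = {pair: 100.0 * n / row_totals[pair[0]] for pair, n in counts.items()}
    return overall, row_based


# Invented toy session with five postings.
postings = {
    1: {"author": "A", "conv": "Request"},
    2: {"author": "B", "conv": "Response"},
    3: {"author": "C", "conv": "Response"},
    4: {"author": "A", "conv": "State"},
    5: {"author": "B", "conv": "Response"},
}
links = {2: [1], 3: [1], 4: [1], 5: [4]}

overall, row_based = dyad_percentages(label_dyads(postings, links, "conv"))
print(overall[("Request", "Response")], row_based[("Request", "Response")])
```

Passing "author" instead of "conv" as the key gives the author-based matrices with the same code, which reflects the flexibility of the Ci variables.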
5.2 Frequent Conversational Patterns
For the conversational dyads we observed that there are a significant number of
zero-valued entries in all six percentage matrices. This fact indicates that
there are strong causal relationships between certain pairs of conversation
codes. For instance, an Agree statement is very unlikely to be followed by an
Offer statement, since the Agree-Offer pair has a zero value in all 6 matrices.
By the same token, non-zero valued entries corresponding to a pair Ci-Cj
suggest which Ci variables are likely to produce a reply of some sort.
Moreover, Cj variables indicate the most likely replies that a conversational
action Ci will get. This motivated us to call the most frequent Ci-Cj pairs
source-sink pairs, where the source Ci most likely solicits the action Cj as
the next immediate reply.
The most frequent conversational dyads in our sample turned out to be
Request-Response (16%, 7%, 9%, 9%, 10%, 8% for the 6 powwows respectively),
Response-Response (12%, 5%, 2%, 4%, 10%, 11%) and State-Response (8%, 6%, 4%,
2%, 5%, 16%) pairs. In our coding scheme the conversational codes State,
Respond and Request are assigned to those statements that belong to a general
discussion, while codes such as Offer, Elaboration, Follow, Agree, Critique and
Explain are assigned to statements that are specifically related to the problem
solving task. Thus, the computations show that a significant portion of the
conversation is devoted to topics that are not specifically about math problem
solving. In addition to these, dyads of type Setup-X (8%, 14%, 12%, 2%, 3%, 4%)
and X-Extension (14%, 15%, 9%, 7%, 9%, 6%) are also among the most frequent
conversational dyads. By definition, Setup and Extension codes are used for
linking fragmented statements of a single author that span multiple chat lines.
In these cases the fragmented parts make sense only if they are considered
together as a single statement. Thus, only one of the fragments is assigned a
code revealing the conversational action of the whole statement, and the rest
of the fragments are tied to that special fragment by using Setup and Extension
codes. The high percentage of Setup-X and X-Extension dyads shows that some
participants prefer to interact by posting fragmented statements during chat.
The high percentage of fragmented statements strongly affects the distribution
of other types of dyadic patterns. Therefore, a “pruning” option is included in
our model to combine these fragmented statements into a single node to reveal
other source-sink relationships.
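One way such a pruning step could work is sketched below in Python. The merging rule is an assumption made for illustration (a Setup-X or X-Extension dyad is collapsed into the fragment carrying the real code); the paper does not spell out the actual procedure, and the function name and sample data are hypothetical.

```python
FRAGMENT_CODES = {"Setup", "Extension"}


def prune_fragments(codes, links):
    """Merge Setup/Extension fragments into the posting that carries the real code.

    codes: {line: conversation code}; links: {line: [earlier lines it links to]}.
    Returns (codes, links) restricted to the merged nodes.
    """
    parent = {line: line for line in codes}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    # Assumed rule: a dyad Setup-X or X-Extension joins two fragments of one statement.
    for j, targets in links.items():
        for i in targets:
            if codes[i] == "Setup" or codes[j] == "Extension":
                parent[find(j)] = find(i)

    groups = {}
    for line in codes:
        groups.setdefault(find(line), []).append(line)

    # Each group is represented by its fragment with a non-fragment code, if any.
    rep = {}
    for members in groups.values():
        carriers = [m for m in members if codes[m] not in FRAGMENT_CODES]
        carrier = min(carriers) if carriers else min(members)
        for m in members:
            rep[m] = carrier

    new_codes = {line: codes[line] for line in sorted(set(rep.values()))}
    new_links = {}
    for j, targets in links.items():
        for i in targets:
            a, b = rep[i], rep[j]
            if a != b and a not in new_links.setdefault(b, []):
                new_links[b].append(a)
    return new_codes, new_links


# Invented example: lines 2-4 are one fragmented Response to the Request on line 1.
codes = {1: "Request", 2: "Setup", 3: "Response", 4: "Extension", 5: "Agree"}
links = {2: [1], 3: [2], 4: [3], 5: [4]}
pruned_codes, pruned_links = prune_fragments(codes, links)
print(pruned_codes, pruned_links)
```

After pruning, the toy thread reduces to Request-Response and Response-Agree dyads, so the Setup-X and X-Extension pairs no longer dominate the counts.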
5.3 Handle Patterns
Frequent dyadic and triadic patterns based on author information can be very
informative for making assessments about each participant’s level and type of
participation. For instance, Table 4 contrasts two groups that worked on the
same math problem, namely Pow2a and Pow2b (hereafter, group A and B, resp.), in
terms of their author-dyad percentages.
The most striking difference between the two groups, after pruning, is the
difference between the percentage values on the diagonal: 10% for group A and
30% for group B. The percentages of the most frequent triad patterns[1] show a
similar behavior. The percentage of triads having the same author on all 3
nodes (e.g. AVR-AVR-AVR) is 15% for group A, and 42% for group B. The pattern
we see in group B is called an elaboration, where a member takes an extended
turn. The pattern in group A indicates group exploration where the members
collaborate to co-construct knowledge.
Patterns that contain the same author name on all their nodes are important
indicators of individual activity, which typically occurs when a group member
sends repeated postings without referring to any other group member. We call
this elaboration, where one member of the group explains his/her ideas. The
high percentage of these patterns can be considered a sign of separate threads
in the ongoing discussion, which is the case for group B. Moreover, there is an
asymmetry between MCP’s responses to REA’s comments (23%) and REA’s responses
to MCP’s comments (14%). This shows that REA attended less to MCP’s comments
than MCP attended to REA’s messages. In contrast, we observe a more balanced
behavior in group A, especially between AVR-PIN (17%, 18%) and AVR-SUP (13%,
13%). Another interesting pattern for group A is that the balance observed with
respect to AVR does not exist between the pair SUP-PIN. This suggests that AVR
was the dominant figure in group A, who frequently attended to the other two
members of the group. To sum up, this kind of analysis yields results on roles
and prominent actors similar to those addressed by social network analysis.
Table 2: Conversation dyads. The %s are computed over all pairs.
Table 3: Row based distribution of conversation dyads. The %s are computed separately for each row.
Dyadic and triadic patterns can also be useful in determining which member was
most influential in initiating discussion during the session. For a participant
i, the sum of row percentages (i,j) where i ≠ j can be used as a metric to see
who had more initiative as compared to other members. The metric can be
improved further by considering the percentage of triads initiated by user i.
For instance, in group A the row percentages are 31%, 22%, 20% and 2% for AVR,
PIN, SUP and OFF respectively. Moreover, the percentage of triads initiated by
each member is 41%, 29%, 20% and 7% for AVR, PIN, SUP and OFF respectively.
These numbers show that AVR had a significant impact in initiating
conversation. In addition to this, a similar metric for the columns can be
considered for measuring the level of attention a participant exhibited by
posting follow-up messages to other group members.
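The row and column metrics just described amount to off-diagonal sums over the author-dyad percentage matrix. The Python sketch below illustrates this under stated assumptions: the matrix values are invented (they are not the Table 4 figures), the participant names A, B and C are placeholders, and the function names initiative and attention are hypothetical.

```python
def initiative(matrix):
    """Off-diagonal row sums: how often a member's postings are followed up by others."""
    return {src: sum(p for dst, p in row.items() if dst != src)
            for src, row in matrix.items()}


def attention(matrix):
    """Off-diagonal column sums: how often a member follows up on others' postings."""
    totals = {member: 0 for member in matrix}
    for src, row in matrix.items():
        for dst, p in row.items():
            if dst != src:
                totals[dst] += p
    return totals


# Invented author-dyad percentages; matrix[i][j] is the share of dyads where i is followed by j.
matrix = {
    "A": {"A": 10, "B": 20, "C": 15},
    "B": {"A": 25, "B": 5, "C": 5},
    "C": {"A": 10, "B": 5, "C": 5},
}
print(initiative(matrix))
print(attention(matrix))
```

In this toy matrix A has the highest row sum and also the highest column sum, so A would count as both the main initiator and the most attentive member.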
5.4 Problem Solving Patterns
A similar analysis of dyadic and triadic patterns can be used for making
assessments about the local organization of a group’s problem solving actions.
The problem solving data produced by our model for groups A and B will be used
to aid the following discussion in this section. Table 4 displays both groups’
percentage matrices for problem solving dyads.
Before making any comparisons between these groups, we briefly introduce how
the coding categories are related to math problem solving activities. In this
context a problem solving activity refers to a set of successive math problem
solving actions. In our coding scheme, Orientation, Tactic and Strategy codes
refer to the elements of a certain activity in which the group engages in
understanding the problem statement and/or proposes strategies for approaching
it. Next, a combination of Perform and Result codes signals actions that relate
to an execution activity in which previously proposed ideas are applied to the
problem. Summary and Restate codes arise when the group is in the process of
helping a group member to catch up with the rest of the group and/or producing
a reformulation of the problem at hand. Further, Check and Reflect codes
capture moves where group members reflect on the validity of an overall
strategy or on the correctness of a specific calculation. Check and Reflect
codes do not form an activity by themselves; rather they are interposed among
the activities described before.
Table 4: Handle & Problem Solving Dyads for Pow2a and Pow2b

SYS refers to system messages. GER and
Given this description, we use the percentage matrices (see Table 4) to
identify what percent of the overall problem solving effort is devoted to each
activity. For instance, the sum of percentage values of the sub-matrix induced
by the columns and rows of Orientation, Tactic, Strategy, Check and Reflect
codes takes up 20% of all problem solving actions performed by group A, whereas
this value is only 3% for group B. This indicates that group A put more effort
into developing strategies for solving the problem. When we consider the
sub-matrix induced by Perform, Result, Check and Reflect, the corresponding
values are 24% for group A and 49% for group B. This signals that group B spent
more time on executing problem solving steps. Finally, the values of the
corresponding sub-matrix induced by Restate, Summarize, Check, and Reflect
codes add up to 4% for group A and 0% for B, which hints at a change in
orientation of group A’s problem solving activity. The remaining percentage
values excluded by the sub-matrices belong to transition actions in between
different activities.
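The sub-matrix sums used above can be expressed in a few lines of Python. The sketch assumes the dyad percentages are stored in a dictionary keyed by (source code, reply code); the percentages shown are invented for illustration and are not the Table 4 values, and the function name activity_share is hypothetical.

```python
def activity_share(matrix, codes):
    """Sum the entries of the sub-matrix induced by the rows and columns in `codes`."""
    return sum(matrix.get((a, b), 0) for a in codes for b in codes)


# Invented problem solving dyad percentages for one session (they sum to 100).
matrix = {
    ("Orientation", "Strategy"): 12,
    ("Strategy", "Tactic"): 8,
    ("Strategy", "Perform"): 15,   # transition between two activities
    ("Perform", "Result"): 30,
    ("Result", "Check"): 10,
    ("Check", "Reflect"): 5,
    ("Summary", "Restate"): 20,
}

planning = {"Orientation", "Tactic", "Strategy", "Check", "Reflect"}
execution = {"Perform", "Result", "Check", "Reflect"}

print(activity_share(matrix, planning), activity_share(matrix, execution))
```

Because Check and Reflect belong to more than one code set, a dyad such as Check-Reflect is counted in each sub-matrix that contains both codes, while a dyad such as Strategy-Perform falls outside both and counts as a transition.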
5.5 Maximal Patterns
The percentage values presented in the previous section indicate that groups A
and B exhibited significantly different local organizations in terms of their
problem solving activities. In order to make stronger claims about the
differences at a global level one needs to consider the unfolding of these
local events through the whole discussion. Thus, analyzing the sequential
unfolding of local patterns is another interesting focus of investigation which
will ultimately yield a “global” picture of a group’s collaborative problem
solving activity. For instance, given the operational descriptions of problem
solving activities in Subsection 5.4, we observed the following sequence of
local patterns in group A. First, the group engaged in a problem orientation
activity in which they identified a relevant sub-problem to work on. Then, they
performed an execution activity on the agreed strategy by making numerical
calculations to solve their sub-problem. Following this discussion, they
engaged in a reflective activity in which they tried to relate the solution of
the sub-problem to the general problem. During their reflection they realized
they had made a mistake in a formula they used earlier. At that point the
session ended, and the group failed to produce the correct answer to their
problem. On the other hand, the members of group B individually solved the
problem at the beginning of the session without specifying a group strategy.
They spent most of the remaining discussion revealing their solution steps to
each other.
6. Conclusion and Ongoing Research
In this work we have shown how thread information can be used to identify the
most frequent patterns of interaction with respect to various criteria. In
particular, we have discussed how these patterns can be used for making
assessments about the organization of interaction in terms of each
participant’s level of participation, the conversational structure of
discussion as well as the problem solving activities performed by the group.
Our computations are based on an automated program which accepts a coded chat
transcript as input, and performs all necessary computations in an efficient
way.
In our ongoing research we are studying other factors that could influence the
type of the patterns and their frequencies, such as the group size, the type of
the math problem under discussion, etc. Moreover, we are investigating whether
the interaction patterns and the problem solving phases reveal information
about the type of the organization of the interaction, e.g. exploratory vs.
reporting work. Finally, we will be using our data to feed a statistical model
and thus study the research questions from a statistical perspective. We are
also planning to extend the existing computational model to support XML input
in order to make the model independent of the specific features introduced by a
coding scheme.
References
[1] Stahl, G. (2005). Group Cognition: Computer Support for Collaborative Knowledge Building. Cambridge, MA: MIT Press.
[2] Garcia, A. and Jacobs, J.B. (1998). The interactional organization of computer mediated communication in the college classroom. Qualitative Sociology, 21(3), 299-317.
[3] Smith, M.,
[4] Popolov, D., Callaghan, M., and Luker, P. (2000). Conversation Space: Visualising Multi-threaded Conversation. AVI 2000, Italy.
[5] King, F.B., and Mayall, H.J. (2001). Asynchronous Distributed Problem-based Learning. Advanced Learning Technology Conference, IEEE.
[6] Tay, M.H., Hooi, C.M., and Chee, Y.S. (2002). Discourse-based Learning using a Multimedia Discussion Forum. Proceedings of the International Conference on Computers in Education (ICCE’02), IEEE.
[7] Venolia, G.D. and Neustaedter, C. (2003). Understanding Sequence and Reply Relationships within Email Conversations: A Mixed-Model Visualization. CHI’03, USA.
[8] Jeong, A.C. (2003). The Sequential Analysis of Group Interaction and Critical Thinking in Online Threaded Discussion. The American Journal of Distance Education, 17(1), 25-43.
[9] Kanselaar, G., Erkens, G., Andriessen, J., Prangsma, M., Veerman, A., and Jaspers, J. (2003). Designing Argumentation Tools for Collaborative Learning. Book chapter of Visualizing Argumentation: Software Tools for Collaborative and Educational Sense-Making, Kirschner, P.A., et al. eds., Springer.
[10] Inokuchi, A., Washio, T. and Motoda, H. (2000). An apriori-based algorithm for mining frequent substructures from graph data. In 4th European Conference on Principles of Knowledge Discovery and Data Mining.
[11] Kuramochi, M. and Karypis, G. (2001). Frequent subgraph discovery. In 1st IEEE International Conference on Data Mining.
[12] Zaki, M.J. (2002). Efficiently mining frequent trees in a forest. In 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[1] For more results and our coding
scheme refer to http://mathforum.org/wiki/VMT?ThreadAnalResults.