The Turtleback Diagram for Conditional Probability

ABSTRACT

We elaborate on an alternative to the usual tree diagram for representing conditional probability. We term the representation the “turtleback diagram” for its resemblance to the pattern on turtle shells. Adopting the set-theoretic view of events and the sample space, the turtleback diagram uses elements from Venn diagrams (set intersection, complement, and partition) for conditioning, with the additional notion that the area of a set indicates its probability, whereas a ratio of areas indicates conditional probability. Once parts of the diagram are drawn and properly labeled, the calculation of conditional probability involves only simple arithmetic on the areas of the relevant sets. We discuss turtleback diagrams in relation to other visual representations of conditional probability, and detail several scenarios in which turtleback diagrams prove useful. By the equivalence of recursive space partition and the tree, the turtleback diagram is seen to be as expressive as the tree diagram for abstract concepts. We also provide empirical data on the use of turtleback diagrams with undergraduate students in elementary statistics or probability courses.


1. Introduction

Conditional probability [1] [2] [3] [4] is an important concept in probability and statistics. It has been widely acknowledged that the concept of conditional probability, and particularly its application in practical contexts, is difficult for students [5] [6] [7] [8] [9] [10] [11] [12] , especially those without much background or previous training in college-level mathematics.

Let A and B be two events, then the conditional probability of A given B is defined as

$\mathbb{P}\left(A|B\right)=\frac{\mathbb{P}\left(A\cap B\right)}{\mathbb{P}\left(B\right)}.$ (1)

The focus of this article is on productive visual representations for the understanding and application of conditional probability. The significant role of visual representation in mathematics is well-established; see, for example, [13] [14] . While visualization is an important topic in statistics (see, e.g., [15] [16] ), the role of visualization in statistics education or practice is not as well documented. In particular, there is actually not much research into productive visualization of conditional probability [17] [18] ; popular books such as [19] do not dedicate much effort to visual explanations of the Bayes theorem. There has been some research on school student difficulties with conditional probability [6] [8] [10] [11] [12] but much less so for undergraduates. Our aim in discussing turtleback diagrams is to provide a visual tool for the representation of conditional probability that may, additionally, be used in further research on student understanding of conditional probability.

2. Student Difficulties in Understanding Conditional Probability

Tomlinson and Quinn [9] , in discussing their graphic model for representing conditional probability (see Section 3.2.1), state:

Documented student difficulties with conditional probability can be summarized as one of three main types [7] :

1) Interpreting conditionality as causality.

2) Identifying and describing the conditioning event.

3) Confusing $\mathbb{P}\left(A|B\right)$ and $\mathbb{P}\left(B\mathrm{|}A\right)$ .

Tarr and Jones [8] developed a valid and reliable framework for addressing student difficulties with conditional probability, in the context of sampling without replacement. This framework is particularly valuable in carrying out research as to which visual representation of conditional probability is most useful in assisting students and teachers.

3. Visual Representations of Conditional Probability

3.1. Tree Diagrams

Tree diagrams have been used by many to help understand conditional probability. The idea of a tree diagram is to use nodes for events, the splitting of a node for sub-events, and the edges in the tree for conditioning. For example, Figure 1 is an illustration of conditional probability. Node * indicates the sample space $\Omega $ , and we will use them interchangeably throughout. Two possible events, either $B$ or $\bar{B}$ , may happen. This is represented by two tree nodes $B$ and $\bar{B}$ . The splitting of node $B$ into two nodes $A$ and $\bar{A}$ indicates that, given $B$ , two possible events, $A$ and $\bar{A}$ , may occur. The edges $B\to A$ and $B\to \bar{A}$ indicate the conditional probabilities $\mathbb{P}\left(A|B\right)$ and $\mathbb{P}\left(\bar{A}|B\right)$ , respectively. Tree diagrams help many students to understand the concept of conditional probability and apply it to problem solving, but they are less effective for many others, especially less prepared students. Such students find two aspects non-intuitive. One is the representation of events by tree nodes, which usually appear as dots or small circles, whereas events are sets and are more naturally represented by Venn diagram [20] type notation. The other is the representation of conditional probability by tree edges; it is hard to see any straightforward connection between this and formula (1).

Figure 1. The tree diagram approach for conditional probability.

the diagram―the mathematical operation is done directly by the human visual system, instead of having to invoke both the visual system and the brain. On the other hand, for the tree diagram, each of the two ingredients does some job but there is room for improvement.

The turtleback diagram we propose tries to optimize the two steps involved in the design of a graphical tool for conditional probability. In particular, it views events and the sample spaces as sets, and uses elements from Venn diagrams―set intersection, complement and partition―for conditioning, with the additional notion that the area of a set indicates probability whereas the ratio of areas associated with relevant sets indicates conditional probability. Once parts of the diagram are drawn and properly labelled, the calculation of conditional probability involves just simple arithmetic on the area of relevant sets. This makes it particularly easy to understand and use for problem solving.

3.2. Other Visual Representations

There have been several prior attempts to represent conditional probability visually [9] [21] [22] [23] , and we discuss briefly three of these below.

3.2.1. Tomlinson-Quinn Graphical Model

This graphical model, for facilitating a visually moderated understanding of conditional probability, described in [9] , is a modified tree diagram.

Tomlinson and Quinn visualize compound events $A\cap B,A\cap \bar{B}$ as nodes of a tree (see Figure 2 of [9] ), so essentially their idea is still a tree diagram in which they carry out a Venn-diagram-like visualization at each tree node.

3.2.2. Roulette-Wheel Diagrams

Yamagishi [22] introduces roulette-wheel diagrams as a visual representation tool; see Figure 1, p. 98 of [22] . He argues that

“The graphical nature of [roulette-wheel diagrams] take advantage of people’s automatic visual computation in grasping the relationship between the prior and posterior probabilities.” (p. 105).

and provides experimental evidence that use of roulette-wheel diagrams increases understanding of conditional probability beyond that for tree diagrams. In this regard, Sloman et al. [24] state:

“The studies reported support the nested-sets hypothesis over the natural frequency hypothesis. .... The nested-sets hypothesis is the general claim that making nested-set relations transparent will increase the coherence of probability judgment.” (p. 307)

3.2.3. Iconic Diagrams

“Iconicity” is the lowest of Terrence Deacon’s three levels of symbolic interpretation^{1} [25] , as it is for Peirce on whose semiotic work Deacon’s theory is based. An icon is a form of graphical representation that requires no significant depth of interpretation: an icon brings to mind, without any apparent intermediate thought, something that it resembles. For example, the diagram in Figure 2 is universally iconic for human beings. Brase [23] carried out a number of experiments from which he inferred that an iconic representation of a Bayesian probability question is more effective in eliciting correct responses than either no visual aids, or Venn diagrams.

Figure 2. A diagram that is universally iconic for humans.

A modified version of Brase’s question is as follows:

“A new test has been developed for a particular form of cancer found only in women. This new test is not completely accurate. Data from other tests indicate a woman has 7 chances out of 100 of having cancer. The test indicated positively only 5 of these women as having cancer. On the other hand, the test indicated a positive result for 14 of the 93 women without cancer.

Janine is tested for cancer with this new test. Janine has probability―of a positive result from the test, with a probability―of actually having cancer.”

An iconic representation for this problem is shown in Figure 3.

The strength of such iconic representations is that they reduce the calculation of probabilities to simple counting problems and, as Brase [23] demonstrates, are effective in assisting students to get correct answers. A weakness of such iconic representations is that they rely on counting discrete items and so are quite limited in representing more realistic probabilities.

3.3. Turtleback Diagrams

Our focus is on how to represent an event graphically, how to relate it to the sample space, how to express the notion of conditioning such that it would be easy to understand the concept of conditional probability, to gather pieces of information together, and to solve problems accordingly.

We start by treating the sample space (denoted by $\Omega $ ) and events as sets and, graphically, as a region and its sub-regions, as in a Venn diagram. Assume the region representing the original sample space $\Omega $ has an area of 1. To simplify our discussion (at the cost of a slight abuse of notation), we will use a label, say B, to denote the region associated with event B. Note that the label can be either a single letter or several letters; the latter indicates an intersection of events. For example, a label AB indicates the intersection of events A and B, and thus that of regions A and B. Similarly, we can use the union of two regions (viewed as sets) to represent the union of two events. Other operations on events can also be defined accordingly in terms of set operations; we omit the details here. To quantify the chance of an event, we associate it with the area of the relevant region. For example, $\mathbb{P}\left(B\right)$ is indicated by the area of region $B$ .

Figure 3. An iconic representation of the effectiveness of a cancer test.

The centerpiece in “graphing” conditional probability is to express the notion of conditioning. This can be achieved by re-examining the definition of conditional probability as given in (1). It can be interpreted as follows. Let $A$ be the event of interest. Upon conditioning, say, on event $B$ , both the new effective sample space and event $A$ in this new sample space can be viewed as their restriction on $B$ , that is, $\Omega $ becomes $\Omega \cap B=B$ and $A$ becomes $A\cap B$ , respectively. The conditional probability $\mathbb{P}\left(A|B\right)$ can now be interpreted as the proportion of the part of $A$ that is inside $B$ (i.e., $A\cap B$ ) out of region $B$ , that is,

$\mathbb{P}\left(A|B\right)=\frac{\text{area of region } A\cap B}{\text{area of region } B}.$ (2)

Now we can describe how to sketch a turtleback diagram. We start by drawing a circular disk which represents the sample space $\Omega $ . Then we represent events by partitioning the circular disk and the resulting subregions. To facilitate our discussion, we define the partition of a set [26] . $\mathcal{P}=\left\{{S}_{i}\mathrm{|}i\in \mathbb{I}\right\}$ is a partition of set $S$ if $S={\displaystyle {\cup}_{i}{S}_{i}}$ and ${S}_{i}\subseteq S$ , ${S}_{i}\cap {S}_{j}=\varnothing $ for all $i\ne j$ in the index set $\mathbb{I}$ .

We will use Figure 4 to assist our description. To represent the partition $\Omega =\bar{B}\cup B$ , we use a straight line “adc” to split the circular disk into two parts, i.e., the regions enclosed by “abcda” and “adcea”, which stand for events $B$ and $\bar{B}$ , respectively. The regions corresponding to events $B$ and $\bar{B}$ can be further split for a more refined representation involving other events. To represent conditional probability as defined by (1), event $B$ is written as

$B=\left(A\cap B\right)\cup \left(\bar{A}\cap B\right),$ (3)

which can be represented by splitting the region for $B$ , i.e., “abcda”, with a straight line “db”. The conditional probability $\mathbb{P}\left(A|B\right)$ can then be calculated as the ratio of the area for region “bcdb” and that for region “abcda”.

Figure 4. Illustration of the turtleback diagram for conditional probability. The left panel shows the partition of $\Omega $ by $\Omega =B\cup \bar{B}$ , the middle panel shows event B further partitioned by $B=\left(A\cap B\right)\cup \left(\bar{A}\cap B\right)$ , and the right panel is a simplified version of the middle panel where AB stands for $A\cap B$ , and $\bar{A}B$ stands for $\bar{A}\cap B$ . The conditional probability $\mathbb{P}\left(A|B\right)$ is the ratio of the area of region “bcdb” to that of region “abcda”.

The turtleback diagram leads to a partition of the sample space $\Omega $ as follows

$\Omega =\bar{B}\cup B$ (4)

$=\bar{B}\cup \left(A\cap B\right)\cup \left(\bar{A}\cap B\right).$ (5)

Continuing this process, we can define events as complicated as we like in a simple hierarchical (recursive) fashion as a nested sequence of partitions ${\mathcal{P}}_{0}\succ {\mathcal{P}}_{1}\succ {\mathcal{P}}_{2}\succ \cdots $ where ${\mathcal{P}}_{0}=\left\{\Omega \right\}$ , ${\mathcal{P}}_{1}=\left\{B,\bar{B}\right\}$ , and ${\mathcal{P}}_{i+1}$ is a refinement of ${\mathcal{P}}_{i}$ for index $i\ge 0$ in the sense that each element in ${\mathcal{P}}_{i+1}$ is a subset of some element in ${\mathcal{P}}_{i}$ .

We can now assign labels to each of the sub-regions, e.g., the name of the relevant event, to indicate that a particular region is associated with that event. For example, in Figure 4 we assign labels AB and $\bar{A}B$ to regions bcdb and abda, respectively. Here, AB means $A\cap B$ and $\bar{A}B$ means $\bar{A}\cap B$ ; the same convention is used throughout. Accordingly, the turtleback diagram simplifies to the right panel in Figure 4. Note that an event need not be a connected region; it could be a collection of patches (i.e., small regions), each capturing information from a different source. This adds a little to the calculation but costs nothing conceptually or in terms of visualization.

One advantage of such a recursive-partition representation of the sample space $\Omega $ is that the data are now highly organized and we can easily operate on it, for example to find out the probability of a certain event. The idea of organizing the data via recursive space-partition and manipulating by their labels has been explored in CART (classification and regression trees [27] ) and more recently, random projection trees [28] , as well as a recent work of one author and his colleagues [29] . Note that dividing a region into a number of small patches also entails the total probability formula, an important ingredient in conditional probability to which formula (3) is related. We will use the “Lung disease and smoking” example to illustrate the use of turtleback diagrams for conditional probability.

3.4. The Lung Disease and Smoking Example

This example is taken from online sources (see [30] ). It is described as follows.

“According to the Arizona Chapter of the American Lung Association, 6.0% of population have lung disease. Of those having lung disease, 92.0% are smokers; of those not having lung disease, only 24.0% are smokers. Answer the following questions.

1) If a person is randomly selected in the population, what is the chance that she is a smoker having lung disease?

2) If a person is randomly selected in the population, what is the chance that she is a smoker?

3) If a person is randomly selected and is discovered to be a smoker, what is the chance that she has lung disease?”

According to the information given in the problem, we can sketch a diagram as in Figure 5, with labels and area information assigned to each sub-region. Assume the circular disk has an area of 1. Now we can answer the questions quickly as follows.

1) The answer is simply the area of region abda, which is $6\%\times 92\%=0.0552$ .

2) The answer is the area of region edbae, which is $6\%\times 92\%+94\%\times 24\%=0.2808$ . This is, in essence, the total probability formula $\mathbb{P}\left(S\right)=\mathbb{P}\left(L\cap S\right)+\mathbb{P}\left(\bar{L}\cap S\right)$ .

3) This involves conditional probability, and the answer is the ratio of two relevant areas: (area of abda)/(area of edbae) = 0.0552/0.2808 = 0.1966.
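The three answers above can be reproduced with a minimal Python sketch. It assumes each turtleback region is stored as a label-to-area entry; the names `areas` and `area_of` are hypothetical and chosen only for illustration.

```python
# Turtleback regions for the "Lung disease and smoking" example.
# Each area is P(first split) * P(second split | first split).
p_l = 0.06              # P(L): has lung disease
p_s_given_l = 0.92      # P(S | L)
p_s_given_not_l = 0.24  # P(S | not L)

areas = {
    ("L", "S"): p_l * p_s_given_l,
    ("L", "notS"): p_l * (1 - p_s_given_l),
    ("notL", "S"): (1 - p_l) * p_s_given_not_l,
    ("notL", "notS"): (1 - p_l) * (1 - p_s_given_not_l),
}

def area_of(event):
    """Total area of all regions whose label contains `event`."""
    return sum(a for label, a in areas.items() if event in label)

q1 = areas[("L", "S")]  # question 1: P(L and S)
q2 = area_of("S")       # question 2: P(S), the total probability formula
q3 = q1 / q2            # question 3: P(L | S), a ratio of two areas

print(round(q1, 4), round(q2, 4), round(q3, 4))
```

The four areas sum to 1, matching the assumption that the circular disk has unit area.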

3.5. Difficulty with the Venn Diagram

The Venn diagram is the standard graphical tool for set theory. Both the Venn diagram and the turtleback diagram use regions to represent sets. However, there is a major difference. In a turtleback diagram, as illustrated in Figure 4, straight lines, such as “adc” and “db”, are used to split the sample space and regions. In contrast, the Venn diagram represents events by drawing circular disks. Partitioning the sample space $\Omega $ in such a way causes substantial difficulty in handling the complement operation, one crucial ingredient in conditional probability. One has to deal with a setting where the complement of a region surrounds the region itself; for example, in Figure 6, $\bar{S}$ and $\bar{L}$ surround S and L, respectively. This places an extra burden on the human brain or the visual system. We will illustrate with the “Lung disease and smoking” example.

In Figure 6, one would find it tricky to label the region and put area information for $\bar{L}$ (which is 94%) without causing confusion. Moreover, it may require some extra work (versus simply “reading” from the graph) to assign the label $\bar{L}\bar{S}$ , or to calculate the area of this region. The turtleback diagram (cf. Figure 5) introduces straight lines, e.g., “adc”, “ed”, and “db”, which readily avoid the obstacles caused by set intersection or complement in a Venn diagram.

Figure 5. The turtleback diagram for the “Lung disease and smoking” example. The letters “ $L$ ” and “ $\bar{L}$ ” stand for “with lung disease” and “without lung disease”, “ $S$ ” and “ $\bar{S}$ ” for “smoking” and “nonsmoking”, respectively.

Figure 6. The Venn diagram approach to the “Lung disease and smoking” example.

4. Semantic Equivalence of the Turtleback and the Tree Diagram

The way that the turtleback diagram progressively refines the partition over the sample space is essentially a recursive space partition, where the sets involved in the partition are organized as a chain of enclosing sets. For example, in Figure 4, we have

$\left(A\cap B\right)\subseteq B\subseteq \Omega \quad \text{and}\quad \left(\bar{A}\cap B\right)\subseteq B\subseteq \Omega $ (6)

By the equivalence (see, for example, [27] ) between recursive space partition and the tree structure, we can show the “semantic” equivalence between the turtleback diagram and the tree diagram. The remainder of this section is dedicated to this. Let a tree node correspond to a set in a recursive space partition with the following three properties:

1) The root node corresponds to the sample space $\Omega $ ;

2) All the child nodes of a node form a decomposition of this node;

3) Down from the root node, the nodes along any path form a chain of enclosing sets.

Property 2) entails the total probability formula, and property 3) corresponds to a refinement of a partition. This allows one to turn the turtleback diagram in Figure 4 into a tree representation, that is, the left panel of Figure 7. The “chain” property forces a child node to be a restriction of its parent node. We can use this to simplify the labels for the tree nodes, e.g., the left panel becomes the right in Figure 7. Note that in the right panel, really node A corresponds to the set $\Omega \cap B\cap A$ , that is, the intersection of all sets along the path from the root to node A (i.e., the tree path $\ast \to B\to A$ ).

For real world conditional probability problems, often the following formula is used instead of (1), due to availability of information from multiple sources

$\mathbb{P}\left(A|B\right)=\frac{\mathbb{P}\left(A\cap B\right)}{{\displaystyle {\sum}_{i}\mathbb{P}\left(B\cap {A}_{i}\right)}}$ (7)

where ${\sum}_{i}{A}_{i}=\Omega $ . This requires the calculation of probabilities in the form of $\mathbb{P}\left(B\cap {A}_{i}\right)$ , or in other words, the probability of the intersection of multiple events.
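As an illustrative check (not part of the original derivation), formula (7) can be applied to the “Lung disease and smoking” example of Section 3.4, assuming we take $A=L$ , $B=S$ , and the partition $\left\{{A}_{i}\right\}=\left\{L,\bar{L}\right\}$ :

$\mathbb{P}\left(L|S\right)=\frac{\mathbb{P}\left(L\cap S\right)}{\mathbb{P}\left(S\cap L\right)+\mathbb{P}\left(S\cap \bar{L}\right)}=\frac{0.06\times 0.92}{0.06\times 0.92+0.94\times 0.24}=\frac{0.0552}{0.2808}\approx 0.1966.$

The denominator is the total probability $\mathbb{P}\left(S\right)$ , assembled from the two pieces of information given in the problem.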

In Figure 7, by construction, node A, reached through path $\ast \to B\to A$ , has size $\mathbb{P}\left(\Omega \cap B\cap A\right)$ , and node B has size $\mathbb{P}\left(B\right)$ . We can now define the weight of edge $B\to A$ as the proportion of node A (treated as a subset of B) out of B, or equivalently the probability of transition to node A given that one has reached node B from the root. This equals $\mathbb{P}\left(A|B\right)$ . Such a definition is valid as the sizes of nodes $A,\bar{A}$ and B satisfy $\mathbb{P}\left(A\cap B\right)+\mathbb{P}\left(\bar{A}\cap B\right)=\mathbb{P}\left(B\right)$ .

Thus, in Figure 8, the probability that one arrives at a node, say A, along the path $\Omega \to B\to A$ is given by

$\mathbb{P}\left(\Omega BA\right)=\mathbb{P}\left(\Omega \cap B\cap A\right)=\mathbb{P}\left(BA\right)=\mathbb{P}\left(B\right)\cdot \mathbb{P}\left(A\mathrm{|}B\right)\mathrm{,}$ (8)

which is simply the product of edge weights along the path $\Omega \to B\to A$ (the edge weight for $\Omega \to B$ is $\mathbb{P}\left(B\right)$ ). The same reasoning extends to any node in a tree. Thus we have provided a tree-based interpretation of the turtleback diagram for conditional probability. Such an algebraic system on the tree has the following two properties:

1) The probability of arriving at any node equals the product of edge weights along the path from the root to that node.

2) The weight of an edge $H\to L$ is given by $\mathbb{P}\left(L|\ast ,\cdots ,H\right)$ .

Figure 7. The tree diagram representation of the turtleback diagram in Figure 4.

Figure 8. The tree diagram approach illustrated.

This is exactly what a tree diagram represents. The above properties extend readily to a series of events. For example, the probability of a series of events $B\to C\to D$ can be computed as the probability of arriving at node D along the tree path $\star \to B\to C\to D$ (cf. Figure 8):

$\begin{array}{c}\mathbb{P}\left(B\cap C\cap D\right)=\mathbb{P}\left(\ast \to B\right)\cdot \mathbb{P}\left(B\to C\right)\cdot \mathbb{P}\left(C\to D\right)\\ =\mathbb{P}\left(B\right)\cdot \mathbb{P}\left(C|B\right)\mathbb{P}\left(D|B,C\right).\end{array}$ (9)

This approach applies even for non-sequential events, as one can artificially attach an order to the events according to the “arrival” of relevant information. Thus, we have shown the semantic equivalence between the turtleback diagram and the tree diagram. They differ mainly in visual representation, which matters for a visual tool.
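The path-product rule can be sketched in Python as follows. This is only an illustration: the nested-dictionary tree layout and the helper names `path_probability` and `leaf_paths` are assumptions of this sketch, and the edge weights are borrowed from the “Lung disease and smoking” example.

```python
from math import prod, isclose

# Tree as nested dicts: child label -> (edge weight, subtree).
# Each edge weight is the conditional probability of the child given its path.
tree = {
    "L": (0.06, {"S": (0.92, {}), "notS": (0.08, {})}),
    "notL": (0.94, {"S": (0.24, {}), "notS": (0.76, {})}),
}

def path_probability(tree, path):
    """Product of edge weights along `path`, starting from the root."""
    weights = []
    node = tree
    for label in path:
        weight, node = node[label]
        weights.append(weight)
    return prod(weights)

def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path as a tuple of labels."""
    for label, (_, subtree) in tree.items():
        path = prefix + (label,)
        if subtree:
            yield from leaf_paths(subtree, path)
        else:
            yield path

p_ls = path_probability(tree, ("L", "S"))  # P(L and S) = P(L) * P(S | L)
total = sum(path_probability(tree, p) for p in leaf_paths(tree))
print(round(p_ls, 4), isclose(total, 1.0))
```

Because the children of every node decompose that node, the leaf probabilities sum to 1, which mirrors the unit area of the turtleback disk.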

The tree diagram appears to be less intuitive than the turtleback diagram as there is no longer an association between the area of a region and its probability (one may use the thickness of an edge to indicate the probability, but that is less attractive too). However, the tree diagram seems to scale better to large problems.

5. Case Studies

We consider four examples as case studies: the “Lung disease and smoking” example, the “History and war” example, the “Lucky draw” example, and the “urn model” example [1] . In fact, very few students (about 10% - 15%) could solve the “History and war” example completely correctly in an in-class practice after the concept of conditional probability had been explained to them without graphs. That motivated us to adopt the graph-based approach. In the following, we provide the details of the examples.

5.1. The Lung Disease and Smoking Example

With the tree diagram, the answer to question 1) is the probability of reaching node S along the path $\star \to L\to S$ , which is the product of edge weights along this path and is calculated as $6\%\times 92\%=0.0552$ . The answer to question 2) is the sum of products of edge weights along two paths, $\star \to L\to S$ and $\star \to \bar{L}\to S$ , that is, $6\%\times 92\%+94\%\times 24\%=0.2808$ . The answer to question 3) is the ratio of the product of edge weights along the path $\star \to L\to S$ to the sum over the two paths, which is 0.0552/0.2808 = 0.1966 (Figure 9).

5.2. The History and War Example

This example is artificially created so that it has a similar problem structure as the “Lung disease and smoking” example. It is described as follows.

“According to a market research about the preference of movies, 10% of the population like movies related to history. Of those who like movies related to history, 90% also like movies related to wars; of those who do not like movies related to history, only 30% like movies related to wars. Answer the following questions.

(a) If a person is randomly selected in the population, what is the chance that she likes both movies related to wars and movies related to history?

(b) If a person is randomly selected in the population, what is the chance that she likes movies related to wars?

(c) If a person is randomly selected and is discovered to like movies related to wars, what is the chance that she likes movies related to history?”

We can construct a turtleback diagram as the left panel of Figure 10. One can quickly answer the questions as follows. (a) is the area of region abda, which is given by $10\%\times 90\%=0.09$ , (b) is the total area of region ebdae, which is given by $10\%\times 90\%+90\%\times 30\%=0.36$ , and (c) is the ratio of (a) and (b) which is 0.09/0.36 = 0.25.

Similarly, the right panel of Figure 10 is a tree diagram. One can answer the questions as follows. (a) is the product of edge weights along the path $\star \to H\to W$ , which is given by $10\%\times 90\%=0.09$ , (b) is the sum of the product of edge weights along two paths, $\star \to H\to W$ and $\star \to \bar{H}\to W$ , which is given by $10\%\times 90\%+90\%\times 30\%=0.36$ , and (c) is the ratio of (a) and (b) which is 0.09/0.36 = 0.25.
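Since this example shares its structure with the “Lung disease and smoking” example, a single helper can handle both. The sketch below is only illustrative, and the function name `two_stage` is hypothetical.

```python
def two_stage(p_h, p_w_given_h, p_w_given_not_h):
    """Return (P(H and W), P(W), P(H | W)) for a two-level partition."""
    joint = p_h * p_w_given_h                          # area of region HW
    marginal = joint + (1 - p_h) * p_w_given_not_h     # total area labelled W
    return joint, marginal, joint / marginal           # last entry: ratio of areas

# "History and war": answers to (a), (b), (c).
a, b, c = two_stage(0.10, 0.90, 0.30)
print(round(a, 4), round(b, 4), round(c, 4))

# The same helper applied to the "Lung disease and smoking" numbers.
lung = two_stage(0.06, 0.92, 0.24)
```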

5.3. The Lucky Draw Example

The lucky draw example is taken from the popular lucky draw game. This example is especially useful as many sampling without replacement problems can be converted to this and solved easily. Here we take a simplified version with 5 tickets in total, only one of which is the prize ticket. The description is as follows.

“There are 5 tickets in a box with one being the prize ticket. Five people each randomly draw one ticket from the box without returning the drawn ticket to the box. Is this a fair game (i.e., each draws the prize ticket with the same chance)?”

Figure 9. The tree diagram approach for the “Lung disease and smoking” example. The letters “ $L$ ” and “ $\bar{L}$ ” stand for “with lung disease” and “without lung disease”, “ $S$ ” and “ $\bar{S}$ ” for “smoking” and “nonsmoking”, respectively.

Figure 10. Solving the “History and War” example with the turtleback diagram and tree diagram, respectively. The letters “ $H$ ” and “ $\bar{H}$ ” stand for “like movies related to history” and “do not like movies related to history”, “ $W$ ” and “ $\bar{W}$ ” for “like movies related to wars” and “do not like movies related to wars”, respectively.

Figure 11 depicts the process of ticket drawing. Since our interest is in the prize ticket, a tree branch that has already seen the prize ticket does not grow further. Clearly, the probability of getting the prize ticket at the first draw is 1/5. Following Figure 11, the probability of getting the prize ticket at the second draw is the product of edge weights along the path $\star \to N\to P$ , which is $0.8\times 0.25=0.2$ . Similarly, the probability of getting the prize ticket at the third draw is given by $0.8\times 0.75\times 1/3=0.2$ , and so on.

Figure 12 is the turtleback diagram for the “Lucky draw” game. The probability of getting the prize ticket at the first draw is the area of the region labelled as “P”, which is 0.2. Following the figure, the probability of getting the prize ticket at the second draw is the area of the region labelled as “NP”, which is $0.8\times 25\%=0.2$ . Similarly, the probability of getting the prize ticket at the third draw is given by $0.8\times 75\%\times 1/3=0.2$ , and so on.

Figure 11. The tree diagram for the “Lucky draw” game. The letters “P” and “N” denote the prize ticket and non-prize ticket, respectively.

Figure 12. The turtleback diagram for the “Lucky draw” game. The letters in the labels indicate the status of each attempt, “P” for a prize and “N” for a non-prize ticket. For example, “NNP” means getting non-prize tickets for the first two draws and the prize ticket at the third draw. The percentage next to the label indicates the probability of a prize at the last draw, conditional on the outcome of all preceding draws. For example, “25%” next to “NP” means the conditional probability of getting a prize is 25% in the second draw if the first draw is not a prize. In other words, it is the ratio of the area of the slice labelled “NP” to the total area of all slices remaining after the first slice is taken away.
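The “and so on” can be checked for all five draws with a short Python sketch using exact fractions. The function name `prize_at_draw` is hypothetical, and the code assumes exactly one prize ticket drawn without replacement, as in the problem statement.

```python
from fractions import Fraction

def prize_at_draw(k, n_tickets=5):
    """P(the prize is drawn at draw k), one prize among n_tickets, no replacement."""
    prob = Fraction(1)
    remaining = n_tickets
    # The first k-1 draws must all miss the prize (edges labelled N).
    for _ in range(k - 1):
        prob *= Fraction(remaining - 1, remaining)
        remaining -= 1
    # Draw k hits the prize among the tickets still in the box (edge labelled P).
    return prob * Fraction(1, remaining)

probs = [prize_at_draw(k) for k in range(1, 6)]
print(probs)
```

Every entry equals 1/5, so the game is fair; each product is the area of the slice “P”, “NP”, “NNP”, and so on in the turtleback diagram.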

5.4. An Urn Model Example

This can be viewed as an extension of the lucky draw problem in the sense that there are more than one prize tickets here. Note that this example mainly serves to demonstrate that both the tree and the turtleback diagram could be used to solve problems of such a complexity (one can solve this problem quickly by distinguishing the two green balls and apply result of the lucky draw game^{2}). Assume there are 2 greens balls and 3 red balls. The problem is described as follows.

“There are 2 green balls and 3 red balls in an urn. One randomly picks one ball at a time from the urn, five times, without replacement. Will each draw have the same chance of getting a green ball?”

Figure 13 is the tree diagram for the urn model. We do not calculate the probability of getting a green ball for every draw; instead, we only do it for the third draw. The probability of getting a green ball at the third draw is given by the sum of the products of edge weights along the three paths

$\star \to G\to R\to G,\quad \star \to R\to R\to G,\quad \star \to R\to G\to G,$

which is (2/5)(3/4)(1/3) + (3/5)(1/2)(2/3) + (3/5)(1/2)(1/3) = 2/5. One can similarly verify that the probability of getting a green ball at each of the other draws also equals 2/5.
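The path-sum rule for the tree can be checked for every draw with a short recursive walk. The sketch below is an illustration under our own naming (`green_at_draw`, `walk` are hypothetical helpers, not from the paper). Unlike Figure 13, it does not prune branches that have already seen both green balls; such branches simply contribute nothing, since no green ball is left to draw.

```python
from fractions import Fraction

def green_at_draw(k, green=2, red=3):
    """Sum the products of edge weights over all tree paths whose k-th draw (1-based) is green."""
    def walk(depth, g, r, weight):
        total = g + r  # balls still in the urn before this draw
        if depth == k:
            # the final edge on the path must be a green draw
            return weight * Fraction(g, total) if g > 0 else Fraction(0)
        result = Fraction(0)
        if g > 0:  # follow the "G" edge
            result += walk(depth + 1, g - 1, r, weight * Fraction(g, total))
        if r > 0:  # follow the "R" edge
            result += walk(depth + 1, g, r - 1, weight * Fraction(r, total))
        return result
    return walk(1, green, red, Fraction(1))

for k in range(1, 6):
    print(k, green_at_draw(k))
```

For the third draw the walk visits exactly the three paths listed above, and for every draw from the first to the fifth the result is 2/5.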

Figure 14 is the turtleback diagram for the urn model. To calculate the probability that the third draw gets a green ball, we simply sum up the area of all regions with a label such that the third letter is “G”. That is, the total area of regions labelled as

“RGG”, “GRG”, “RRGG”, “RRGRG”,

which is

$\frac{3}{5}\times \frac{1}{2}\times \frac{1}{3}+\frac{2}{5}\times \frac{3}{4}\times \frac{1}{3}+\frac{3}{5}\times \frac{1}{2}\times \frac{2}{3}\times \frac{1}{2}+\frac{3}{5}\times \frac{1}{2}\times \frac{2}{3}\times \frac{1}{2}=0.4.$

The calculation is a little tedious but conceptually very simple, as long as one can follow the way the regions are partitioned.
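To make the partitioning concrete, the following sketch builds the labelled regions recursively and sums the areas of those whose third letter is “G”. It assumes, as in the figure captions, that a region is not partitioned further once both green balls have been drawn; the function name `turtleback_regions` and the dictionary layout are our own illustrative choices.

```python
from fractions import Fraction

def turtleback_regions(green=2, red=3):
    """Return {label: area}, where partitioning stops once both green balls are drawn."""
    areas = {}
    def split(label, g, r, area):
        if g == 0:  # both greens seen: this region is final
            areas[label] = area
            return
        total = g + r
        # slice the current region in proportion to the conditional probabilities
        split(label + "G", g - 1, r, area * Fraction(g, total))
        if r > 0:
            split(label + "R", g, r - 1, area * Fraction(r, total))
    split("", green, red, Fraction(1))
    return areas

areas = turtleback_regions()
third_green = sorted(label for label in areas if len(label) >= 3 and label[2] == "G")
print(third_green)
print(sum(areas[label] for label in third_green))
```

The selected labels are “GRG”, “RGG”, “RRGG” and “RRGRG”, the same four regions listed above, and their areas add up to 2/5.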

6. Empirical Data

We carried out case studies on over 200 students. These included students in the elementary statistics class STAT235 (non-calculus based) at the University of Missouri Kansas City (UMKC) during 2012-2013, and students from the elementary statistics class MTH231 and the elementary probability class MTH331 at the University of Massachusetts Dartmouth (UMassD) during 2015-2017. These three courses had fairly different student populations. For STAT235, about 30% of the students were from engineering, 30% from business, and the rest from such diverse majors as biology, chemistry, psychology, political science, education, etc. For MTH231, about 80% were from mathematics or data science, and the rest from majors such as computer science, electrical engineering, criminal justice, etc. For MTH331, about 75% were from computer science or electrical engineering, 20% from mathematics or data science, and the rest from other engineering majors or economics, physics, etc. Table 1 gives a summary of the students involved in the case studies.

The study was carried out as follows. First, we explained to students the concept of conditional probability with a non-graph based approach. Then we continued with two exercises. In the first exercise, we explained the “Lung disease and smoking” example with both the turtleback and the tree diagram, and had students solve the “History and war” problem, or vice versa (depending on the class we were teaching). In the other exercise, we explained the “Lucky draw” example and had students solve the “Urn model” problem, or vice versa. Due to time constraints on the course schedule, we did not ask students to solve problems using a particular technique followed by its discussion. Rather, we discussed both the turtleback and the tree diagrams, and let students choose one of them for problem solving. Table 2 gives a breakdown of the number of students involved.

Figure 13. The tree-based approach for an urn model with 2 green balls and 3 red balls. We use $\star$ to denote the root node, and “R” and “G” for a red ball and a green ball, respectively. As our interest is in the green balls, tree branches that have seen 2 green balls do not grow any further.

Figure 14. The turtleback diagram for an urn model with 2 green balls and 3 red balls. “R” and “G” denote a red ball and a green ball, respectively. As our interest is in the green balls, sequences that have seen 2 green balls are not extended any further. Each letter, “G” or “R”, indicates the outcome of a particular draw. For example, “RGRG” indicates that the first draw gets a red ball, the second a green ball, the third a red ball and the fourth a green ball.

We collected two types of data from the case studies: one on students’ preference between graph and non-graph based approaches, and the other on students’ preference between the turtleback and the tree diagram. Here, except for the non-graph based approach, by preference we mean that the students actually used the technique for problem solving; in nearly all such cases they applied it correctly in solving the assigned problem, so we use this as a measurement of learning outcome (with the understanding that further experiments may be needed to validate this). The results are reported in Table 3. The data collected are quite encouraging. About 78%-88% of students found a graphical tool helpful. For the “Lucky draw” and the “Urn model” problems, fewer students found it helpful. This is possibly because these two problems appear harder to students, so that even a graphical tool may not help them much. Further experiments are needed to validate or understand this.

In terms of the preference between the two graphical tools, the results show an interesting pattern. For the “Lung disease and smoking” and the “War and history” examples, more students prefer the turtleback diagram to the tree diagram, around 53%-54% vs 33%-34%. The “Lucky draw” and the “Urn model” examples exhibit the opposite pattern: more students prefer the tree diagram to the turtleback diagram, around 46%-48% vs 31%-34%^{3}. This is probably because, in the first two examples, the sample spaces and events involve populations in the usual sense, while the last two examples involve sequential decisions, for which a tree structure that represents the decision dichotomy may be more natural (although in such cases, the concept of conditional probability is not as natural as it is in the turtleback diagram). Further experiments are needed to confirm this. The advantage of the turtleback diagram over the tree diagram appears to decrease as the problem becomes harder, but this is not a serious problem for beginning students, as those who most need help from a graphical representation are precisely those who cannot solve simple problems. Moreover, we do not expect a single graphical tool to help solve all problems; rather, different people may use different tools for a particular problem.

Table 1. Students involved in the case studies.

Table 2. Number of students involved in the empirical study, broken down by course and problem.

Table 3. Data collected in the case studies on whether graphs help students understand the concept of conditional probability, and on the preference between the turtleback diagram and the tree diagram.

7. Potential Research Questions

Many instances of conditional probability occur in sampling without replacement. Tarr and Jones [8] describe a framework for assessing middle school students’ thinking in conditional probability and independence, which is elaborated in [12]. This framework is a levels model with 4 levels (Subjective, Transitional, Informal Quantitative, and Numerical), subject to all the difficulties such a model has as students transition from one level to another.

Research Question 1: Are turtleback diagrams, as compared to tree diagrams, helpful to students, at any or all of the Tarr-Jones framework levels, in understanding conditional probability? If so, how can we measure and assess the utility of turtleback diagrams relative to tree diagrams?

Research Question 2: Related to Research Question 1, specifically, how helpful are turtleback diagrams in helping students understand conditional probability in the context of sampling without replacement?

Conditional probability is increasingly being introduced into middle school in the United States. The Conference Board of the Mathematical Sciences [31] stated:

Of all the mathematical topics now appearing in middle grades curricula, teachers are least prepared to teach statistics and probability. Many prospective teachers have not encountered the fundamental ideas of modern statistics in their own K-12 mathematics courses... Even those who have had a statistics course probably have not seen material appropriate for inclusion in middle grades curricula. (p. 114)

Research Question 3: Are turtleback diagrams helpful to middle school teachers of probability and statistics in (a) enhancing their own understanding of conditional probability and (b) assisting them to better teach conditional probability? If so, how and to what extent?

8. Conclusions

Motivated by difficulties encountered by many undergraduate students new to statistics, we re-examined the definition and representation of conditional probability, and presented a Venn-diagram like approach: the turtleback diagram. We discussed our graphical tool in the context of other graphical models for conditional probability, and carried out case studies on over 200 students of elementary statistics or probability classes. Our case study results are encouraging and the graph-based approaches could potentially lead to significant improvements in both the students’ understanding of conditional probability and problem solving. While the existing tree diagram is preferred to the turtleback diagram on problems that involve a sequential decision, the turtleback diagram is considered more helpful in settings where the underlying population resembles the usual human population; it is exactly in such situations that weaker students are more likely to need help. Though the turtleback diagram appears very different from the tree diagram, we are able to unify them and show their equivalence in terms of semantics.

Our case studies suggest that it is worthwhile to introduce such graphical tools to students whose success would seem to depend on them. We hope that this will benefit our statistics colleagues who are teaching elementary statistics and students who are struggling with the concept of conditional probability and its application to problem solving. The potential savings in time can be huge. As a conservative estimate, assume that about 1.5 million bachelor’s degrees are awarded each year in the US (about 1.67 million were awarded in 2009). Assume that about 200,000 of these students have taken an elementary statistics class, that about 10% of them need help and succeed with our proposed approach, and that the average class size is 40. If each instructor saves 2 hours in each elementary statistics class and each student who benefits from our approach saves 1 hour, then the estimated total amount of time saved is at least 30,000 hours per year in the US alone.
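The arithmetic behind the 30,000-hour figure can be laid out explicitly. The sketch below only restates the assumptions given in the paragraph above; the variable names are ours, and the 10% share is written as a division by 10 to keep the arithmetic in integers.

```python
# Assumptions taken from the estimate in the text
students_in_elementary_stats = 200_000
class_size = 40
hours_saved_per_class = 2      # instructor time saved per class
hours_saved_per_student = 1    # time saved per student who benefits

# 10% of the students are assumed to need help and succeed with the approach
benefiting_students = students_in_elementary_stats // 10

classes = students_in_elementary_stats // class_size
instructor_hours = classes * hours_saved_per_class
student_hours = benefiting_students * hours_saved_per_student
total_hours = instructor_hours + student_hours

print(classes, instructor_hours, student_hours, total_hours)
```

Under these assumptions there are 5,000 classes, giving 10,000 instructor hours and 20,000 student hours, for a total of 30,000 hours.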

Acknowledgements

The authors are grateful to Professor Yong Zeng at UMKC for kindly pointing to the “Lung disease and smoking” example, and for encouragement and support on some of the case studies.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

NOTES

^{1}In Deacon’s framework, there are three levels of referential relationship in a cognitive process, including iconic, indexical, and symbolic reference, where higher levels are built hierarchically upon lower levels.

^{2}Label the two green balls as G_{1} and G_{2}, respectively. Then the probability of getting a green ball at each draw is simply that of getting G_{1} or G_{2}. Either G_{1} or G_{2} can be treated as the only prize ticket in the lucky draw game, so the probability of getting either one at a given draw is 1/5, and hence the probability of getting a green ball at any draw is always 2/5.

^{3}Since in all cases the sample size is large enough and the difference between contrast groups is significant, we did not carry out a hypothesis test using the reported data.

Cite this paper

Yan, D. and Davis, G. (2018) The Turtleback Diagram for Conditional Probability. *Open Journal of Statistics*, **8**, 684-705. doi: 10.4236/ojs.2018.84045.


References

[1] Rice, J. (1995) Mathematical Statistics and Data Analysis. 2nd Edition, Duxbury Press, Duxbury, Massachusetts.

[2] Johnson, R. and Tsui, K. (2003) Statistical Reasoning and Methods. John Wiley, New York.

[3] Ancker, J.S. (2006) The Language of Conditional Probability. Journal of Statistics Education, 14, 1-5.

https://doi.org/10.1080/10691898.2006.11910584

[4] Mann, P. (2003) Introductory Statistics. John Wiley, New York.

[5] Tversky, A. and Kahneman, D. (1980) Causal Schemas in Judgments under Uncertainty. Progress in Social Psychology, 1, 49-72.

[6] Fischbein, E. and Gazit, A. (1984) Does the Teaching of Probability Improve Probabilistic Intuitions? Educational Studies in Mathematics, 15, 1-24.

https://doi.org/10.1007/BF00380436

[7] Falk, R. (1986) Conditional Probabilities: Insights and Difficulties. Proceedings of the Second International Conference on Teaching Statistics, The Second International Committee on Teaching Statistics, Victoria, 292-297.

[8] Tarr, J.E. and Jones, G.A. (1997) A Framework for Assessing Middle School Students’ Thinking in Conditional Probability and Independence. Mathematics Education Research Journal, 9, 39-59.

https://doi.org/10.1007/BF03217301

[9] Tomlinson, S. and Quinn, R. (1997) Understanding Conditional Probability. Teaching Statistics, 19, 2-7.

https://doi.org/10.1111/j.1467-9639.1997.tb00309.x

[10] Tarr, J.E. (2002) The Confounding Effects of the Phrase ‘50-50 Chance’ in Making Conditional Probability Judgments. Focus on Learning Problems in Mathematics, 24, 35-53.

[11] Yánez, G.C. (2002) Some Challenges for the Use of Computer Simulations for Solving Conditional Probability Problems. The 6th International Conference on Teaching Statistics, Cape Town.

[12] Tarr, J.E. and Lannin, J.K. (2005) How Can Teachers Build Notions of Conditional Probability and Independence? In: Jones, G.A., Ed., Exploring Probability in School, Springer, US, New York, 215-238.

https://doi.org/10.1007/0-387-24530-8_10

[13] Arcavi, A. (2003) The Role of Visual Representations in the Learning of Mathematics. Educational Studies in Mathematics, 52, 215-241.

https://doi.org/10.1023/A:1024312321077

[14] Presmeg, N.C. (2006) Research on Visualization in Learning and Teaching Mathematics. In: Gutierrez, A. and Boero, P., Eds., Handbook of Research on the Psychology of Mathematics Education, Sense Publishers, Rotterdam, 205-235.

[15] Tukey, J. (1977) Exploratory Data Analysis. Addison-Wesley, Reading, Mass.

[16] Cleveland, W.S. (1993) Visualizing Data. Hobart Press.

[17] Morris, D. (2016) Bayes’ Theorem Examples: A Visual Introduction for Beginners.

[18] Collins, R. (2017) Bayes Theorem Examples: Visual Book for Beginners.

[19] Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2013) Bayesian Data Analysis. 3rd Edition, Chapman and Hall, London.

[20] Edwards, A.W.F. (2004) Cogwheels of the Mind: The Story of Venn Diagrams. Johns Hopkins University Press, Baltimore.

[21] Gigerenzer, G. and Hoffrage, U. (1995) How to Improve Bayesian Reasoning without Instruction: Frequency Formats. Psychological Review, 102, 684-705.

https://doi.org/10.1037/0033-295X.102.4.684

[22] Yamagishi, K. (2003) Facilitating Normative Judgments of Conditional Probability: Frequency or Nested Sets? Experimental Psychology, 50, 97-106.

https://doi.org/10.1026//1618-3169.50.2.97

[23] Brase, G.L. (2009) Pictorial Representations in Statistical Reasoning. Applied Cognitive Psychology, 23, 369-381.

https://doi.org/10.1002/acp.1460

[24] Sloman, S.A., Over, D., Slovak, L. and Stibel, J.M. (2003) Frequency Illusions and Other Fallacies. Organizational Behavior and Human Decision Processes, 91, 296-309.

https://doi.org/10.1016/S0749-5978(03)00021-9

[25] Deacon, T.W. (1998) The Symbolic Species: The Co-Evolution of Language and the Brain. W.W. Norton & Company, New York.

[26] Chartrand, G. and Zhang, P. (2011) Discrete Mathematics. Waveland Press, Inc., Long Grove.

[27] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees. Chapman & Hall, New York.

[28] Dasgupta, S. and Freund, Y. (2008) Random Projection Trees and Low Dimensional Manifolds. 40th ACM Symposium on Theory of Computing, Victoria, 17-20 May 2008, 537-546.

https://doi.org/10.1145/1374376.1374452

[29] Yan, D., Huang, L. and Jordan, M. (2009) Fast Approximate Spectral Clustering. Proceedings of the 15th International Conference on Knowledge Discovery and Data Mining, Paris, 28 June-1 July 2009, 907-916.

https://doi.org/10.1145/1557019.1557118

[30] Weiss, N.A. (2012) Introductory Statistics. 9th Edition, Addison-Wesley Pearson Inc., Boston, 193.

[31] Conference Board of the Mathematical Sciences (2001) The Mathematical Education of Teachers. American Mathematical Society, Providence.

[32] Tufte, E. (1983) The Visual Display of Quantitative Information. Graphics Press, Cheshire.

[33] Yau, N. (2011) Visualize This: The Flowing Data Guide to Design, Visualization, and Statistics. Wiley, Hoboken.

[1] Rice, J. (1995) Mathematical Statistics and Data Analysis. 2nd Edition, Duxbury Press, Duxbury, Massachusetts.

[2] Johnson, R. and Tsui, K. (2003) Statistical Reasoning and Methods. John Wiley, New York.

[3] Ancker, J.S. (2006) The Language of Conditional Probability. Journal of Statistics Education, 14, 1-5.

https://doi.org/10.1080/10691898.2006.11910584

[4] Mann, P. (2003) Introductory Statistics. John Wiley, New York.

[5] Tversky, A. and Kahneman, D. (1980) Causal Schemas in Judgments under Uncertainty. Progress in Social Psychology, 1, 49-72.

[6] Fischbein, E. and Gazit, A. (1984) Does the Teaching of Probability Improve Probabilistic Intuitions? Educational Studies in Mathematics, 15, 1-24.

https://doi.org/10.1007/BF00380436

[7] Falk, R. (1986) Conditional Probabilities: Insights and Difficulties. Proceedings of the Second International Conference on Teaching Statistics, The Second International Committee on Teaching Statistics, Victoria, 292-297.

[8] Tarr, J.E. and Jones, G.A. (1997) A Framework for Assessing Middle School Students? Thinking in Conditional Probability and Independence. Mathematics Education Research Journal, 9, 39-59.

https://doi.org/10.1007/BF03217301

[9] Tomlinson, S. and Quinn, R. (1997) Understanding Conditional Probability. Teaching Statistics, 19, 2-7.

https://doi.org/10.1111/j.1467-9639.1997.tb00309.x

[10] Tarr, J.E. (2002) The Confounding Effects of the Phrase ‘50-50 Chance’ in Making Conditional Probability Judgments. Focus on Learning Problems in Mathematics, 24, 35-53.

[11] Yánez, G.C. (2002) Some Challenges for the Use of Computer Simulations for Solving Conditional Probability Problems. The 6th International Conference on Teaching Statistics, Cape Town.

[12] Tarr, J.E. and Lannin, J.K. (2005) How Can Teachers Build Notions of Conditional Probability and Independence? In: Jones, G.A., Ed., Exploring Probability in School, Springer, US, New York, 215-238.

https://doi.org/10.1007/0-387-24530-8_10

[13] Arcavi, A. (2003) The Role of Visual Representations in the Learning of Mathematics. Educational Studies in Mathematics, 52, 215-241.

https://doi.org/10.1023/A:1024312321077

[14] Presmeg, N.C. (2006) Research on Visualization in Learning and Teaching Mathematics. In: Gutierrez, A. and Boero, P., Eds., Handbook of Research on the Psychology of Mathematics Education, Sense Publishers, Rotterdam, 205-235.

[15] Tukey, J. (1977) Exploratory Data Analysis. Addison-Wesley, Reading, Mass.

[16] Cleveland, W.S. (1993) Visualizing Data. Hobart Press.

[17] Morris, D. (2016) Bayes’ Theorem Examples: A Visual Introduction for Beginners.

[18] Collins, R. (2017) Bayes Theorem Examples: Visual Book for Beginners.

[19] Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2013) Bayesian Data Analysis. 3rd Edition, Chapman and Hall, London.

[20] Edwards, A.W.F. (2004) Cogwheels of the Mind: The Story of Venn Diagrams. Johns Hopkins University Press, Baltimore.

[21] Gigerenzer, G. and Hoffrage, U. (1995) How to Improve Bayesian Reasoning without Instruction: Frequency Formats. Psychological Review, 102, 684-705.

https://doi.org/10.1037/0033-295X.102.4.684

[22] Yamagishi, K. (2003) Facilitating Normative Judgments of Conditional Probability: Frequency or Nested Sets? Experimental Psychology, 50, 97-106.

https://doi.org/10.1026//1618-3169.50.2.97

[23] Brase, G.L. (2009) Pictorial Representations in Statistical Reasoning. Applied Cognitive Psychology, 23, 369-381.

https://doi.org/10.1002/acp.1460

[24] Sloman, S.A., Over, D., Slovak, L. and Stibel, J.M. (2003) Frequency Illusions and Other Fallacies. Organizational Behavior and Human Decision Processes, 91, 296-309.

https://doi.org/10.1016/S0749-5978(03)00021-9

[25] Deacon, T.W. (1998) The Symbolic Species: The Co-Evolution of Language and the Brain. W.W. Norton & Company, New York.

[26] Chartrand, G. and Zhang, P. (2011) Discrete Mathematics. Waveland Press, Inc., Long Grove.

[27] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees. Chapman & Hall, New York.

[28] Dasgupta, S. and Freund, Y. (2008) Random Projection Trees and Low Dimensional Manifolds. 40th ACM Symposium on Theory of Computing, Victoria, 17-20 May 2008, 537-546.

https://doi.org/10.1145/1374376.1374452

[29] Yan, D., Huang, L. and Jordan, M. (2009) Fast Approximate Spectral Clustering. Proceedings of the 15th International Conference on Knowledge Discovery and Data Mining, Paris, 28 June-1 July 2009, 907-916.

https://doi.org/10.1145/1557019.1557118

[30] Weiss, N.A. (2012) Introductory Statistics. 9th Edition, Addison-Wesley Pearson Inc., Boston, 193.

[31] Conference Board of the Mathematical Sciences (2001) The Mathematical Education of Teachers. American Mathematical Society, Providence.

[32] Tufte, E. (1983) The Visual Display of Quantitative Information. Graphics Press, Cheshire.

[33] Yau, N. (2011) Visualize This: The Flowing Data Guide to Design, Visualization, and Statistics. Wiley, Hoboken.