The importance of displaying information in ways that are understandable has had profound repercussions dating back to the 1854 cholera epidemic in London and to the more recent disastrous 1986 launch of the space shuttle Challenger. The consequences of poor display design were also highlighted by the controversy over the “butterfly” ballot in the 2000 presidential election. As  noted, the fate of the entire election may have hinged on the design of this ballot. This is not a one-time issue. The fate of other elections twenty years later may also hinge on the design of the ballot. If the display design of something as simple as a ballot can have such far-reaching effects, proper design of complex displays for air traffic control, nuclear power plants, and management information systems may be even more critical. There are several excellent books about the efficient and optimal design of visual data. This study investigated several variables that influence the use, interpretation, and exploration of visual displays. The hope is that the results will lead to recommendations that can assist system users (e.g., engineers, managers, web designers, medical professionals) in transforming complex data into information and usable knowledge.
The research background on search processes of data displays and data analysis (i.e., extracting knowledge, information, and insights from data) has focused mostly on the methods and variables with few theories as to the mechanisms . Perhaps the best theoretical treatment of this topic emphasizes the cognitive process of schemas and sensemaking. As  explain, the mind creates cognitive structures that attempt to represent external reality. These structures go by various names (e.g., mental models, frames, scripts, prototypes), but the term “schema” seems to capture the notion the best. In a process known as “sensemaking”  , the user creates a schema and then explores the data in an effort to confirm, revise, or replace the schema.
The following study explores several variables within the framework of sensemaking. In general, we will attempt to address some problems related to the type of data display (e.g., tables, bar graphs, line graphs), the complexity of the display, the nature of the information to be extracted, and the experience level of the user. Many of these variables have been investigated by previous researchers (e.g.,   ). But because of mixed results, more research is needed for a more complete understanding of the search processes. For example, despite the old adage that “a picture is worth a thousand words”, the evidence that graphs lead to better performance than tables is inconclusive    . Whether graphs are superior to tables depends on many factors, including the nature of the task and the types of questions asked    . We hope to supply additional evidence to help resolve this ongoing issue.
Another important variable is user experience, both with a particular data display and with displays in general. For example,  found that users’ experience with a particular data display led to decreased reaction times for tables, line graphs, and bar graphs, with the greatest decrease noted for tables. They also found that, in general, accuracy (getting the right answer) increased with greater experience with a particular display.
Complexity of the display is also an important variable in understanding how information is extracted from data sets . Typically, studies examine the effects of complexity by selecting two or more levels of complexity and maintaining those levels throughout the experiment (e.g.,   ). In many software applications, however, complexity is often under the control of the user (e.g.,   ). Software developers often build in the ability for the user either to increase the amount of information displayed on the screen, or reduce the clutter (e.g.,  ). Other authors  describe what they call the abstract/elaborate technique for user-interaction with visual displays. When users are presented with a complex display, they are often given the tools with which to simplify the display. Likewise, if users are given a simple display, they can add more detail if they desire more information. For example, the word processing software used to create this report allows the user to work with just characters on a blank screen, or they can choose to 1) view all the hidden characters (e.g., paragraph marks, spaces between words, tab markers); 2) show the margins, menus, and a tool bar along the top; and/or 3) see a status bar and more tools at the bottom. The present study will allow users to manipulate the complexity of the data display.
Based on this review of the literature, the following hypotheses are proposed:
Hypothesis 1: Performance (e.g., response time, accuracy) on a task comparing graphs and tables will depend on the difficulty (conceptual level) of the questions asked about the data display. User performance with graphs will be superior to performance with tables for the more difficult questions.
This hypothesis recognizes that there are circumstances in which graphs are not necessarily superior to tables, contrary to conventional wisdom. Several researchers have shown that task characteristics are important in deciding what display to use    . What questions the user is asked is part of the task, and as the questions become more difficult (e.g., as the questions progress from simple point reading to discerning trends and finding general patterns), graphs should be more helpful  . Another way to view this is to suggest that the graphical displays will be more beneficial in the users' attempts to build and edit their mental schemas for difficult problems.
Hypothesis 2: Experienced users of a system will explore more features of that system and make more changes to the display than novice users.
This prediction is consistent with findings that more advanced users, compared to novice users, take more total time to respond when solving complex problems . In other words, more advanced users exert more effort in the sensemaking process than novice users. We predict, therefore, that when users are exposed to complex data displays, the experienced users will change the displays more frequently than the novices. This exploratory behavior reflects a more sophisticated and complete search process for the experienced users .
The study adopted an experimental approach to the investigation. We recruited novice and experienced users of a Navy force-modeling tool, and assigned them at random to a between-subject condition (display complexity). Two other conditions (display format and question difficulty) were within-subject variables. We recorded data on response accuracy, response time, and search behavior.
A total of 64 students participated in the study. Thirty-nine of these participants were students at a moderately large mid-southern university. The remaining 25 participants were students at the Naval Postgraduate School.
Each participant viewed the data presentation on a desktop PC. The data display was either tabular (spreadsheet) or a colored stacked-bar graph (see Figures 1-3). The data were outputs from a Navy force-modeling tool used by Navy personnel managers to project officer personnel inventories. Participants could either add categories of data to, or remove them from, the tabular or the graphical display. The initial data presentation was either a blank table/graph for the simple condition, or a full table/graph for the complex condition. Below the data presentation was a box where multiple-choice questions were displayed. Each participant responded to a total of 30 questions, 15 for the table display format and 15 for the graphical format.
2.4. Research Design
The experimental design was a 2 × 2 × 2 × 3 mixed design. There were two between-subject variables: display complexity (low and high) and user experience (novice and intermediate). There were also two within-subject variables: display format (table and graph) and question difficulty (easy, moderate, and hard). Performance was measured by three dependent variables in the study: 1) time (in seconds) to answer each question, 2) accuracy (the percentage of questions answered correctly), and 3) the frequency with which the user altered the display. Accuracy was not a very useful performance measure due to the high level of correct responses (even for the “difficult questions”) and so will only be reported once in the results section.
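The structure of the mixed design can be made concrete by enumerating its cells. The following is a minimal sketch, not part of the study's software; the dictionary keys and function name are illustrative labels chosen here, while the factor levels are those named in the text.

```python
from itertools import product

# Factor levels as named in the text; the key names are illustrative labels.
BETWEEN = {
    "complexity": ("low", "high"),
    "experience": ("novice", "intermediate"),
}
WITHIN = {
    "display_format": ("table", "graph"),
    "difficulty": ("easy", "moderate", "hard"),
}


def cells(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


between_groups = cells(BETWEEN)           # groups a participant can belong to
within_cells = cells(WITHIN)              # conditions every participant completes
all_cells = cells({**BETWEEN, **WITHIN})  # full 2 x 2 x 2 x 3 layout

print(len(between_groups), len(within_cells), len(all_cells))
```

Each participant belongs to exactly one of the four between-subject groups and contributes data to all six within-subject cells.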
2.4.1. Between-Subject Variables
The first between-subject variable was display complexity. Display complexity was defined as how much data was initially presented to each participant. Participants in the low complexity condition were presented with a table or graph that was not populated with any data. They could add as much data as necessary to answer a particular question by selecting the type of data desired. An example of a low complexity graph is shown in Figure 1.
Participants in the high complexity condition were initially presented with a table or graph fully populated with data. These participants could remove data from the display that they considered irrelevant to answer a particular question. An example of a fully populated, high complexity table is shown in Figure 2. A high complexity graph is shown in Figure 3. The ability to control the display is similar to pan and zoom designs of complex displays  and not unlike the pinch and spread features for Google Maps.
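The two starting conditions and the count of display alterations can be sketched as a small state object. This is an illustrative sketch only: the category names are hypothetical placeholders (the actual data categories of the force-modeling tool are not listed here), and the class is not the software used in the study.

```python
from dataclasses import dataclass, field

# Hypothetical category names; the real tool's categories are not given in the text.
CATEGORIES = ("category_a", "category_b", "category_c")


@dataclass
class DisplayState:
    visible: set = field(default_factory=set)  # categories currently shown
    changes: int = 0                           # number of display alterations

    @classmethod
    def initial(cls, complexity):
        """Low complexity starts empty; high complexity starts fully populated."""
        if complexity == "low":
            return cls(visible=set())
        if complexity == "high":
            return cls(visible=set(CATEGORIES))
        raise ValueError(f"unknown complexity level: {complexity!r}")

    def toggle(self, category):
        """Add a hidden category or remove a visible one, counting the change."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if category in self.visible:
            self.visible.remove(category)
        else:
            self.visible.add(category)
        self.changes += 1


low = DisplayState.initial("low")
low.toggle("category_a")    # a low complexity participant adds data
high = DisplayState.initial("high")
high.toggle("category_b")   # a high complexity participant removes data
```

In both conditions the same set of display states is reachable; only the starting point differs, which is what the complexity manipulation varies.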
Figure 1. Simple (low complexity) graph.
Figure 2. High complexity table (for a low complexity table, imagine the same figure without any values in the body of the spreadsheet).
The second between-subject variable was user experience. The 39 undergraduate participants were considered to be novice users because they had no experience with the actual data displays and minimal, if any, experience with the Navy terminology that was used to describe the displays. The remaining 25 graduate students were considered intermediate users. They had more formal education, had taken more math and statistics courses, had some limited experience with data displays similar to the ones used in this experiment, and were very familiar with Navy terminology. The average novice user had 2.5 years of college and had taken two math or statistics courses. The average intermediate user had 5.5 years of college and had taken 6.5 math or statistics courses. Ideally, expert users would have been recruited rather than intermediate users. Unfortunately, Navy personnel management experts were not available for this study.
2.4.2. Within-Subject Variables
The first within-subject variable was question difficulty. Question difficulty was broken into three levels: easy, moderate, and hard. Multiple-choice questions were developed based on interviews with subject matter experts (personnel familiar with the Navy force-modeling tool used here). Each level of difficulty consisted of five questions for a total of 15 questions. The easy questions consisted of point-reading questions (e.g., “How many total survivors are there with 15 years of service?”). The answer to a point-reading question was contained within a single cell of the table or at a single point on the graph. Moderate and hard questions consisted of trend and comparison questions that required a more “conceptual” understanding of the data and numerical relationships (e.g., “As years of service increase from 6 - 12, the number of survivors tends to increase, decrease, stay the same, or can’t tell?”).
The second within-subject variable was display format (tables versus graphs, see Figure 2 and Figure 3). Two different question sets were used (one for tables, another for graphs). Each subject saw all questions. Display format and question set were counterbalanced to control for order effects and possible differences in the question sets. Question difficulty, however, was not counterbalanced.
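One way to picture the counterbalancing of display format and question set is to list every presentation order. The sketch below assumes a full crossing of format order and question-set pairing; the question-set labels "A" and "B" are hypothetical, and the exact scheme used in the study is not specified beyond the description above.

```python
from itertools import permutations, product

FORMATS = ("table", "graph")
QUESTION_SETS = ("A", "B")  # hypothetical labels for the two question sets


def counterbalanced_orders():
    """List each session order as [(format, question_set), (format, question_set)]."""
    orders = []
    for format_order, set_order in product(
        permutations(FORMATS), permutations(QUESTION_SETS)
    ):
        orders.append(list(zip(format_order, set_order)))
    return orders


orders = counterbalanced_orders()
for order in orders:
    print(order)
```

Across the four orders, each format appears first equally often and is paired with each question set equally often, so order effects and question-set differences are not confounded with display format.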
Figure 3. High complexity graph.
All participants had question sets that started with easy questions and progressed through moderate and hard questions. A counterbalanced design was considered unnecessary in this case because all participants would receive the same, constant conditions and the task would be easier to master if it started out easy and progressed to the more difficult items. Participants were randomly assigned to conditions using a block randomization procedure.
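A block randomization procedure of the kind mentioned above can be sketched as follows. This is an assumption-laden illustration: the block size (one block per full set of conditions), the participant identifiers, and the seed are choices made here, and the text does not state whether assignment was carried out separately within each experience group.

```python
import random


def block_randomize(participant_ids, conditions, seed=None):
    """Assign participants to conditions in shuffled blocks.

    Each block contains every condition once, so group sizes never
    differ by more than one within a single pass over the conditions.
    """
    rng = random.Random(seed)
    block_size = len(conditions)
    assignment = {}
    for start in range(0, len(participant_ids), block_size):
        block = list(conditions)
        rng.shuffle(block)  # random order of conditions within this block
        chunk = participant_ids[start:start + block_size]
        for pid, condition in zip(chunk, block):
            assignment[pid] = condition
    return assignment


# Hypothetical identifiers for a group of 39 participants.
ids = [f"P{i:02d}" for i in range(1, 40)]
assignment = block_randomize(ids, ("low", "high"), seed=1)
```

Compared with simple randomization, blocking keeps the two complexity groups close to equal in size even with a small sample.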
Participants were given printed instructions that explained the various data presentation formats and how to operate the different on-screen functions. The printed instructions also contained definitions for the various labels that were found on the data displays. They were then given an opportunity to answer a practice set of five questions using both data presentation formats (a table and a graph). Participants had to respond correctly to all practice questions before they were allowed to enter the experimental condition. After successfully completing the practice session they began the first experimental session with either the table or graph format (which format presented first was randomly selected). When they had finished the first set of 15 questions, they received feedback on how many questions they had answered correctly. They were then presented with the next experimental condition consisting of either the table or graph and a different set of 15 questions.
Performance was measured by three dependent variables as noted above: 1) time (in seconds) to answer the set of questions, 2) accuracy of the response, and 3) the frequency with which the user altered the display. These measures were recorded by the software used to administer the study and the data were downloaded into a statistical package (SPSS) for analysis. The calculation formulas for the different statistical analyses below were embedded in the statistical software. All data were analyzed using a mixed Analysis of Variance (ANOVA) model to test the effects of two between-subject variables and two within-subject variables. As noted above, the between-subject variables were 1) display complexity (simple versus complex) and 2) user expertise (novice versus intermediate). The within-subject variables were 1) display type (table versus graph) and 2) question difficulty (easy, moderate, and hard). These variables formed a 2 × 2 × 2 × 3 mixed design.
The findings are reported in terms of the hypotheses that were stated above. Hypothesis 1 predicted an interaction between the type of display and the difficulty level of the question. Specifically, differences in performance between graphs and tables would depend on the level of difficulty of the questions. There were 15 questions in the study that progressed in difficulty (the first items were basic point-reading and data-comparison questions, the latter items required a more conceptual understanding of the relationships). For the purpose of analysis, the questions were organized into sets of five questions resulting in a difficulty variable with three levels (easy consisted of items 1 - 5, moderate consisted of items 6 - 10, and hard consisted of items 11 - 15).
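The grouping of the 15 items into three difficulty levels amounts to a simple mapping from item number to level. The sketch below illustrates that mapping together with a per-level mean; the function names are illustrative and no study data are reproduced.

```python
LEVELS = ("easy", "moderate", "hard")


def difficulty_level(item_number):
    """Map item 1-5 to easy, 6-10 to moderate, and 11-15 to hard."""
    if not 1 <= item_number <= 15:
        raise ValueError(f"item number out of range: {item_number}")
    return LEVELS[(item_number - 1) // 5]


def mean_time_by_difficulty(times_by_item):
    """Average response times (seconds) per level from an item -> time mapping."""
    grouped = {level: [] for level in LEVELS}
    for item, seconds in times_by_item.items():
        grouped[difficulty_level(item)].append(seconds)
    return {
        level: sum(values) / len(values)
        for level, values in grouped.items()
        if values
    }
```

For instance, `difficulty_level(6)` returns `"moderate"`, so the sixth item of either question set contributes to the moderate level of the difficulty variable.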
Figure 4 plots performance (response time) as a function of display (table versus graph) and difficulty (easy, moderate, and hard). (Accuracy as a performance measure was not used for reasons noted above.) The figure indicates that the type of display does interact with the difficulty level of the questions as predicted in Hypothesis 1. Closer inspection of the figure indicates that response time for graphs was faster than for tables when the user worked on more difficult questions. The statistical analysis supports the predicted interaction, F (2, 120) = 135.63, p < 0.001, MS = 84,018, Eta Square = 0.69. The results suggest that the users had difficulty with the table displays when the questions became more conceptual. Interestingly, experience was unrelated to difficulty and display type (both F values < 1.00), but the novices performed faster overall (M = 59.37 sec.) than the intermediate users (M = 71.34 sec.), F (1, 60) = 4.39, p < 0.05, MS = 13,447, Eta Square = 0.07. Also of interest is the finding that novices, despite performing faster than intermediate users, were less accurate (Novice M = 74%, Intermediate M = 87%), F (1, 60) = 6.54, p < 0.05, MS = 1.25, Eta Square = 0.10.
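The effect sizes reported above can be checked against the F ratios and degrees of freedom. The sketch below assumes that "Eta Square" refers to partial eta squared, which is the effect size SPSS reports for this type of model; that reading is an assumption, although the reported values are consistent with it. The effect labels are descriptive names added here.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df effect, df error, reported effect size) taken from the text.
reported = [
    ("display x difficulty interaction", 135.63, 2, 120, 0.69),
    ("experience, response time", 4.39, 1, 60, 0.07),
    ("experience, accuracy", 6.54, 1, 60, 0.10),
]

for label, f_value, df_effect, df_error, eta in reported:
    computed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {computed:.2f}, reported {eta:.2f}")
```

Under this assumption the computed values round to the reported ones, which suggests the effect sizes are partial rather than classical eta squared.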
Hypothesis 2 stated that the intermediate users would engage in more display changes than the novice users. The software that recorded time on the task also revealed which screen elements the subject inspected by recording each time the user clicked the mouse and changed the display. Figure 5 plots the number of display (screen) changes for the novice and intermediate users across the different difficulty levels of the questions. The figure reveals that the intermediate users do make more changes than the novices as predicted by Hypothesis 2. In addition, the figure shows that this difference is greatest for the harder questions. An analysis of the mouse clicks (changes in the display screen) revealed that the intermediate users examined significantly more screen elements than the novice users, F (1, 60) = 16.68, p < 0.001, MS = 669.51, Eta Square = 0.22, and that this difference interacted with question difficulty, F (2, 120) = 9.88, p < 0.001, MS = 53.23, Eta Square = 0.14. Additional findings were that there was no main effect for display complexity, and complexity did not interact with experience (both F values < 1.00).
Figure 4. Response time (average number of seconds to answer each question) as a function of type of display (table versus graph) and question difficulty (easy, moderate, and hard).
Figure 5. Number of changes to the display (screen) as a function of question difficulty and user experience (novice versus intermediate).
The present study examined user experience, display complexity, display type, and task difficulty as variables affecting the user’s ability to explore complex management data. Previous research on the superiority of graphs over tables as a display feature had revealed mixed results      . And it is clear that which display type produces the best performance depends on many other factors. One of those factors is the difficulty of the task  . Given a difficult problem to solve that requires the use of data, we hypothesized that graphs would lead to better performance than tables (Hypothesis 1). The results of the study supported this hypothesis. As the questions progressed from simple, point-reading items, to more complex questions regarding trends and general relationships, the graphs allowed the user to answer in less time compared to the table displays.
Hypothesis 2 proposed that the advanced users would engage in more exploratory behavior than the novice users. The results showed that the intermediate users did explore more of the data on the screen than the novice users. This increased exploratory behavior may be because the experienced users have more content knowledge, are wary of “trick” questions, and want to make sure they have all the information they need to answer the question. As  noted, “the study of visual displays of quantitative information is largely the study of visual search” (p. 284), and there are several models that outline the components of these search processes      . Possibly, what our results reveal is that these search processes are more elaborate and extensive for the more experienced users because they have a more fully developed schema for sensemaking.
These results have implications for designing more effective and efficient data displays. The first thing that can be recommended is that graphs should be used rather than tables, at least for complex data sets such as those in the personnel management task employed in this study. There is good evidence from this study that graphs are an effective device for presenting complex data. There was one condition under which graphs were superior to tables (i.e., when answering difficult questions), but in no case were graphs ever significantly worse than tables.
The second recommendation for designing data displays is that the designers should consider the difficulty level of the task and the expertise of the users. Experienced users, compared to novices, make more adjustments to their displays (as revealed by more screen changes), especially for the more difficult tasks. If the problem is easy, the advanced users make few adjustments to the display (but still more than novices). If the problem is difficult, however, advanced users make many more changes to the display (again, compared to novices). This difference in screen changes between novices and advanced users probably explains why novices responded faster than intermediate users. The speed of the response did not, however, coincide with accuracy; both groups were highly accurate. Because many software applications deal with difficult problems, designers should aim to create displays that require few changes. This would benefit both the novice and the expert, but would be especially beneficial to the expert who would not have to spend as much time adjusting the screen and could react faster without a penalty for accuracy.
Author Notes and Acknowledgements
The views expressed in this paper are those of the authors and are not necessarily those of the Department of the Navy or National University.
Send correspondence concerning this article to: Charles Tatum, Ph.D., Department of Psychology, National University, 3429 Yonge Street, San Diego, CA 92106. firstname.lastname@example.org.
We wish to thank the following individuals for their assistance in the data collection and conceptual design of this study: Dr. Priti Shaw (University of Michigan), Julie Filizetti (Navy Postgraduate School), Tim Brogdon (Navy Personnel Research, Studies, and Technology), and Gary Ropp (Navy Personnel Research, Studies, and Technology).
 Zhang, P. and Soergel, D. (2014) Towards a Comprehensive Model of the Cognitive Process and Mechanisms of Individual Sensemaking. Journal of the Association for Information Science and Technology, 65, 1733-1756.
 Powers, M., Lashley, C., Sanchez, P. and Shneiderman, B. (1984) An Experimental Comparison of Tabular and Graphic Data Representations. International Journal of Man-Machine Studies, 20, 545-566.
 Saket, B., Endert, A. and Stasko, A. (2016) Beyond Usability and Performance: A Review of User Experience-Focused Evaluations in Visualization. BELIV Workshop, Baltimore, 24 October 2016, 133-142.
 Yoghourdjian, V., Archambault, D., Diehl, S., Dwyer, T., Klein, K., Purchase, H. and Wu, H.Y. (2018) Exploring the Limits of Complexity: A Survey of Empirical Studies on Graph Visualisation. Visual Informatics, 2, 264-282.
 Casali, J.G. and Gaylin, K.B. (1988) Selected Graph Design Variables in Four Interpretation Tasks: A Microcomputer-Based Pilot Study. Behaviour and Information Technology, 7, 31-49.
 Dwyer, T., Lee, T., Fisher, D., Quinn, K.I., Isenberg, P., Robertson, G. and North, C. (2009) A Comparison of User-Generated and Automatic Graph Layouts. IEEE Transactions on Visualization and Computer Graphics, 15, 961-968.
 Xing, J. (2007) Information Complexity in Air Traffic Control Displays. Civil Aerospace Medical Institute, Federal Aviation Administration.
 Nekrasovski, D., Bodnar, A., McGrenere, J., Guimbretière, F. and Munzner, F. (2006) An Evaluation of Pan and Zoom and Rubber Sheet Navigation with and without an Overview. Human Factors in Computing Systems-Reaching through Technology CHI 2006 Proceedings, Montréal, April 2006, 11-20.
 Yi, J.S., Kang, Y. and Stasko, J.T. (2007) Toward a Deeper Understanding of the Role of Interaction of Information Visualization. IEEE Transactions on Visualization and Computer Graphics, 13, 1224-1231.
 Wickens, C.D. and Andre, A.D. (1990) Proximity Compatibility and Information Display: Effects of Color, Space, and Objectness on Information Integration. Human Factors, 32, 61-77.
 Sparrow, J.A. (1989) Graphical Displays in Information Systems: Some Data Properties Influencing the Effectiveness of Alternative Forms. Behaviour and Information Technology, 8, 43-56.
 Chi, M.T.H., Feltovich, P.J. and Glaser, R. (1982) Categorization and Representation of Physics Problems by Experts and Novices. Cognitive Science, 5, 121-152.
 Cleveland, W.S. and McGill, R. (1984) Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association, 79, 531-554.
 Gillian, D.J. and Neary, M. (1992) A Componential Model of Human Interaction with Graphs: II. Effects of the Distances among Graphical Elements. Proceedings of the Human Factors Society Annual Meeting, 36, 365-368.
 Lohse, J. (1991) A Cognitive Model for the Perception and Understanding of Graphs. Human Factors in Computing Systems-Reaching through Technology, CHI 1991 Conference Proceedings, New York, March 1991, 137-144.
 Simkin, D. and Hastie, R. (1987) An Information-Processing Analysis of Graph Perception. Journal of the American Statistical Association, 82, 454-465.