Over the last few decades, web-based learning has become more and more common and has been recognized as a potentially very effective educational method and resource. Web-based learning systems automatically collect and record a huge amount of data on students’ learning behavior as students use them. To exploit this goldmine of educational data and use it to understand better how students actually proceed with learning, data-mining techniques have begun to be applied to educational data. This active research field is called educational data mining (Romero & Ventura  ; Romero, Ventura, & García  ; Baker & Yacef  ; Romero, Ventura, Pechenizkiy, & Baker  ; Romero & Ventura  ).
As a relatively recent development within this field, text mining techniques have been applied in educational research, allowing researchers to analyze text data such as formal text documents as well as informal ones like e-mails, chat messages, digital diaries, and online questions. Studies that adopt a text mining approach to educational data include Hung (2012), who used cluster analysis to examine the extensive literature on e-learning, and Abdous and He (2011) and He (2013), who analyzed chat messages and online questions using text mining techniques, again including cluster analysis.
2. Materials Used for Text Mining
2.1. Learning Program Posters
2.2. About Students
The authors of the posters were university freshmen enrolled in the Creative Engineering Education Program in the university’s faculty of engineering. Each student belonged to one of two courses, chosen at the entrance examination: “Materials and Energy” (ME hereafter; 62 students) or “Computer and Social Engineering” (CS; 42 students). The specific topic areas covered by each course are listed in Table 1.
3.1. Frequently Appearing Words
Table 1. Specific areas covered by the two courses.
Posters prepared by 104 students, pooled across the two courses, were used for the analysis. A total of 17,983 words were extracted from the posters, 2975 of which were unique. The 100 most frequently appearing words are summarized in Table 2. As seen in the table, the overwhelmingly common words include “development” and “technology,” which seems natural given that the authors of the posters are students in the faculty of engineering. Some words, such as “goal,” “study,” and “learn,” were probably used in a general sense to construct a learning program. It is noteworthy that even though the students were university freshmen only three months into their program, some of them had a good command of technical engineering terms such as “live body,” “sugar chain,” “catalyzer,” “macromolecule,” and “synthesis.”
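The counts reported above (total words, unique words, most frequent words) can be reproduced in spirit with a simple frequency table. The following is a minimal sketch, not the authors' pipeline: the function name `word_frequencies`, the regular-expression tokenizer, the stopword list, and the toy posters are all assumptions made for illustration.

```python
from collections import Counter
import re

def word_frequencies(documents, stopwords=frozenset()):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for text in documents:
        # Assumption: lowercase and keep runs of ASCII letters as tokens.
        # The study's actual tokenizer is not described in the text.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Hypothetical toy posters, not data from the study.
posters = [
    "Development of new technology for energy",
    "My goal is to study technology development",
    "Technology and research for disaster prevention",
]
freq = word_frequencies(
    posters, stopwords={"of", "for", "is", "to", "my", "and", "new"}
)
total_words = sum(freq.values())   # all counted word occurrences
unique_words = len(freq)           # distinct words
top_words = freq.most_common(2)    # the two most frequent words
```

On real posters, `freq.most_common(100)` would give a list comparable in form to Table 2.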
To distinguish general words on the posters from specific words that give information on students’ learning programs and career goals, and to examine how frequently given words were used by the students, a hierarchical cluster analysis was conducted. The analysis took the words that appeared in the posters at least 17 times and grouped them into five clusters, as shown in Table 3. The five clusters can be characterized as follows.
Cluster 1 consists of the following five words: “challenge,” “present situation,” “change,” “value,” and “realization.” These words suggest that the students are highly motivated to create something new and valuable.
Cluster 2 consists of the following four words: “goal,” “career,” “study,” and “plan.” These words are commonly used among the students to construct posters.
Cluster 3 is characterized by the following typical words: “technology,” “development,” “universe,” “research,” “disaster,” and “earthquake.” These words suggest that the students are willing to work in research and development to deal with future risk or unknown territory. In particular, after the Great East Japan Earthquake in March 2011, they may have become more aware of disaster-prevention measures and the role of engineering therein.
Cluster 4 is characterized by the following typical words: “efficiency,” “method,” “power generation,” “nature,” “healthcare,” “cost,” “light,” “live body,” and “use.” These words suggest that the students are interested in improving existing technologies or saving energy and resources, in addition to creating whole new technologies/structures/concepts.
Table 2. Top 100 most frequently appearing words and their frequencies of appearance.
Table 3. Hierarchical cluster analysis of words that appeared at least 17 times.
Cluster 5 is characterized by the following typical words: “solution,” “problem,” “design,” “application,” “think,” “knowledge,” “necessary,” and “possibility.” These words suggest that the students value knowledge and problem-solving thought to overcome present issues.
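The text does not state which distance measure or linkage rule the hierarchical cluster analysis used. As a rough illustration only, the sketch below assumes a Jaccard distance between the sets of posters in which two words occur and average linkage; the names `jaccard_distance` and `cluster_words` and the toy data are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets of document ids."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_words(word_docs, n_clusters):
    """Agglomerative clustering of words with average linkage.

    word_docs maps each word to the set of document ids containing it.
    """
    clusters = [[w] for w in sorted(word_docs)]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [jaccard_distance(word_docs[x], word_docs[y])
                 for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters (i < j always holds).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical occurrence data: word -> ids of posters that contain it.
word_docs = {
    "goal": {1, 2, 3, 4},
    "plan": {1, 2, 3},
    "disaster": {5, 6},
    "earthquake": {5, 6, 7},
}
result = cluster_words(word_docs, n_clusters=2)
```

With this toy input, words that tend to appear in the same posters end up in the same cluster, which is the intuition behind the five clusters described above.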
3.2. Differences between the Two Courses
For the analysis of differences between the two courses, their data were separated. Taking into consideration the frequently used words previously found, the following three coding rules were produced to formulate groups of words used in a similar context.
Human beings: “disaster,” “earthquake,” “human beings,” “robots,” “people.”
Technology: “development,” “technology,” “research,” “efficiency,” “cost.”
Value creation: “challenge,” “present situation,” “change,” “value,” “realization.”
For example, according to the first coding rule, if a sentence in a poster contains at least one word such as “disaster,” “earthquake,” or “human beings,” the code “human beings” is given to the sentence.
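The coding rules can be expressed as a small lookup from code to word list. The sketch below is one possible reading, assuming whole-word, case-insensitive matching on English text; the names `CODING_RULES`, `code_sentence`, and `appearance_ratio` and the example sentences are hypothetical, while the word lists are those given above.

```python
import re

# Word lists taken from the three coding rules in the text.
CODING_RULES = {
    "human beings": ["disaster", "earthquake", "human beings", "robots", "people"],
    "technology": ["development", "technology", "research", "efficiency", "cost"],
    "value creation": ["challenge", "present situation", "change", "value", "realization"],
}

def code_sentence(sentence, rules=CODING_RULES):
    """Return the set of codes for which at least one rule word occurs."""
    text = sentence.lower()
    # Assumption: match whole words only, so "change" does not fire on "exchange".
    return {
        code
        for code, words in rules.items()
        if any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words)
    }

def appearance_ratio(sentences, code, rules=CODING_RULES):
    """Fraction of sentences that receive the given code."""
    hits = sum(1 for s in sentences if code in code_sentence(s, rules))
    return hits / len(sentences)

# Hypothetical sentences, not taken from the posters.
sentences = [
    "I want to develop robots that help people after an earthquake.",
    "Research on low-cost power generation.",
    "My goal is to study hard.",
]
codes = [code_sentence(s) for s in sentences]
ratio = appearance_ratio(sentences, "technology")
```

A sentence can receive more than one code, or none, under this reading.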
Table 4 is a cross-tabulation that compares the appearance ratios of the codes for the two courses. As the table shows, the appearance ratio of the code “human beings” was lower under the ME course than under CS. In contrast, the appearance ratio of the code “technology” was higher under ME than under CS. These results were statistically supported by chi-squared tests; both differences were significant at the 1% level. Overall, the CS course appears to be more human oriented and the ME course more technology oriented, which is natural given the content of the courses. Finally, the appearance ratio of “value creation” did not differ significantly between the two courses, even at the 10% level; that is, students in both courses used words related to value creation with approximately the same frequency.
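For a single code, the comparison between the two courses reduces to a 2×2 table (course by code present or absent). The sketch below computes a Pearson chi-squared statistic for such a table. It assumes no continuity correction, which the text does not specify, and the counts are invented for illustration rather than taken from Table 4; `chi_squared_2x2` is a hypothetical helper name.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and
    no continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = course (ME, CS),
# columns = sentences with / without the code.
stat, p = chi_squared_2x2(30, 170, 60, 140)
significant_at_1_percent = p < 0.01
```

A p-value below 0.01 corresponds to significance at the 1% level, the threshold marked with an asterisk in Table 4.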
4. Concluding Remarks
Table 4. Cross-tabulation of frequency of appearance and appearance ratio of each code for each of the two courses.
Note: * denotes significance at the 1% level.
The present study examined university students’ learning program posters using text mining techniques. It was found that even though the students were university freshmen and only three months had passed between the beginning of their university engineering education and their preparation of the posters, their learning programs and career goals were rather concrete and well adapted to their fields and courses. This result suggests that the majority of the students had thought ahead about their future careers before being admitted to university, rather than only afterward.
The results of the present study could be enriched by the following expansions. First, it might be interesting to apply text mining techniques to the learning programs of students majoring in fields other than engineering and compare the results to the current results. Second, it would be useful to trace how students’ career plans change as their education advances. These issues should be tackled by future research.
1 The term “C-plan” refers to the following three Cs: curriculum, career, and creativity.
 Romero, C., Ventura, S. and García, E. (2008) Data Mining in Course Management Systems: Moodle Case Study and Tutorial. Computers & Education, 51, 368-384.
 Hung, J.-L. (2012) Trends of E-Learning Research from 2000 to 2008: Use of Text Mining and Bibliometrics. British Journal of Educational Technology, 43, 5-16.
 Abdous, M. and He, W. (2011) Using Text Mining to Uncover Students’ Technology-Related Problems in Live Video Streaming. British Journal of Educational Technology, 42, 40-49.
 He, W. (2013) Examining Students’ Online Interaction in a Live Video Streaming Environment Using Data Mining and Text Mining. Computers in Human Behavior, 29, 90-102.