IELTS and TOEFL iBT are two of the most widely available, widely accepted, authentic, objective and thoroughly researched language proficiency tests. They were originally, and are still mainly, designed to certify English proficiency for non-native speakers who plan to study at a higher education institution abroad, where they are required to communicate effectively in the classroom, on campus and in the wider culture. However, an increasing number of domestic universities in non-English-speaking countries have begun to recognize the results of these two international standardized English tests when admitting Ph.D. or master’s candidates. Take China for example: Peking University, one of the country’s top universities, requires its Ph.D. candidates to hold a certificate of Academic IELTS (no less than Band 7) or TOEFL iBT (no less than 95). Otherwise, candidates have to sit the English test administered by the university itself along with the other subject tests. Some candidates choose to take one of these two tests because it gives them more chances to sit the exam. Despite their shared purpose and ambition, the two test batteries differ widely in aspects such as testing approach and criteria. Candidates are therefore faced with a difficult choice: which test to take? When they turn to English teachers for advice, teachers may offer suggestions based on their own experience; when they search for the answer on the Internet, the analysis they find is in most cases simplistic and not evidence-based. For example, Ross simply says that TOEFL iBT is easier than IELTS because of its consistent test format. As a matter of fact, things are much more complicated than that. To provide reliable guidance, this article aims to offer a comprehensive analysis by comparing the reading parts of these two international language tests.
2. Framework for Comparison
Bachman’s framework of test method facets is one of the most influential frameworks consulted when designing and evaluating a test. However, it is formulated in a broad and general sense; that is, it can be applied to speaking, writing, listening and reading tests alike. Meanwhile, Alderson explained at length the variables affecting the nature of reading. To make the comparison framework more specific to reading tests, elements of both systems are taken into consideration and a new comparison framework is formulated. Empirical studies and theoretical literature concerning the essential components of the framework are presented below to lay a foundation for the subsequent analysis and interpretation.
2.1. Characteristics of Input Format: Length of Text
The length of reading passages is an important variable affecting the nature of reading. Bachman and Chastain maintained that the difficulty of comprehending a reading passage increases with its length, because a longer passage places a higher demand on testees’ memory to retain and process a heavier load of information. Empirically, however, the relevant research has yielded mixed results. In a series of studies (Gaite and Newsom; Mehrpour and Riazi; Wang; Jalilehvand), no significant difference was found in students’ performance on two versions of a text, the long one and the shortened one. Nevertheless, some studies do report an effect of length. Commander and Stanwyck focused on the impact of passage length on the illusion of knowledge and on the monitoring strategy, which contributed significantly to comprehension. It turned out that readers of shorter passages were more likely to experience the illusion of knowledge, while those reading the longer version of the passage monitored their comprehension more accurately. As one of the most recent researchers on the topic, Minryoung explored the effect of both text length and question type on learners’ (both college and high school students) test performance and perception. He concluded that significant differences existed between the two types of tests for students at the two learning levels. More specifically, advanced and intermediate-level students performed considerably better on reading comprehension tests built on the longer versions of the texts. Similar relationships between students’ test performance and text length were found by Rothkopf & Billington and Cha.
2.2. Characteristics of Input Language: Lexical Features
Lexical features play a significant part in reading tests. According to Alderson, vocabulary tests are powerful predictors of testees’ performance on reading tests. He believes that measuring testees’ lexical knowledge is the single best predictor of text comprehension. This belief is supported by several studies (Laufer; Graham & Watts; Golkar & Yamini). Bachman identified two facets of vocabulary: frequency and specialization. In the present study, lexical diversity, lexical density and readability are analyzed as well.
In terms of frequency, Bachman declared that the difficulty of a text is negatively correlated with the frequency of its words, which means that passages with more high-frequency words are more accessible than those with more low-frequency words. Several theories of word recognition explain this. Morton proposed the logogen model, in which frequency is an important variable in word recognition: readers need less time and less information to activate high-frequency words, and the opposite holds true for low-frequency words. Forster & Bednall extended the theory with their autonomous search model, which holds that words in the mental lexicon are ordered by frequency, with high-frequency words at the front and low-frequency words at the back. That is why more time and information are required to recognize and activate low-frequency words. Another explanation, put forward by Gough, is that low-frequency words take more steps to process than high-frequency words. According to him, low-frequency words have to be processed through the phonological system before being passed on to comprehension, whereas high-frequency words can bypass phonological mediation and reach the mental lexicon directly with the help of the word’s visual information. In addition, in a study of the variables affecting text difficulty, Viking noticed that the easier the text, the higher the number of important or basic words it contains.
Specialization means that some words belong to a particular technical register. Bachman indicated that the degree of specialization of the vocabulary is positively associated with textual difficulty. In academic settings, more attention has been paid to academic vocabulary. Paribakht and Webb investigated the relationship between academic word coverage and testees’ performance on CanTEST (a standardized language proficiency test used for college admission). The results did not show any correlation between academic word coverage and listening or reading comprehension. Even so, they strongly call for refining the lexical specifications of tests used for university admission purposes, because students cannot afford to neglect academic vocabulary if they are to succeed academically at university. One of the tools used in their study is the Academic Word List, developed by Coxhead, which includes 570 word families derived from academic written texts. However, some researchers are turning to the Academic Vocabulary List, a list that is drawn from a larger and more representative corpus, is based on lemmas instead of word families, and does not build on previous lists such as the General Service List.
2.2.3. Lexical Diversity and Lexical Density
The Type/Token Ratio (TTR) refers to the ratio of the number of types (different words) to the number of tokens (running words) in a text. It is a simple index of lexical diversity. However, the measure has been criticized. Richards analyzed children’s language at different developmental stages with TTR and found that the measure could not distinguish the stages; the ratio could even decrease as the child grew older. Further, McEnery and Wilson suggest that it merely indicates how often a new “word-form” appears in a text. Despite this, Type/Token Ratios are used in the present study for the following reason. Kettunen compared TTR with its more elaborate version, MATTR (the Moving-Average Type-Token Ratio), and with another common tool (Juola complexity figures) used for measuring the morphological complexity of languages. The results showed that TTR ordered the languages in a way significantly similar to MATTR, which confirmed that TTR can be used to show morphological complexity.
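As an illustration only, the two indices can be sketched in a few lines of Python. The tokenizer, the function names and the default window size of 50 are assumptions made for this sketch; they are not the tools used in the present study.

```python
import re


def tokenize(text):
    """Lowercase the text and return its running words (tokens)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("empty text")
    return len(set(tokens)) / len(tokens)


def moving_average_ttr(tokens, window=50):
    """Mean TTR over every run of `window` consecutive tokens (MATTR).

    Texts shorter than the window fall back to the plain TTR
    (an assumption of this sketch).
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


sample = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
# 13 tokens, 8 types
print(f"tokens={len(tokens)} types={len(set(tokens))} TTR={type_token_ratio(tokens):.3f}")
# prints: tokens=13 types=8 TTR=0.615
```

Because the plain TTR tends to fall as a text grows longer, a moving-average variant may be preferable when texts of very different lengths are compared.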
The proportion of content words to the total number of words in a text is termed lexical density, a concept introduced by Halliday. Ure introduced the distinction between content words, which possess lexical properties (verbs, nouns, adjectives and often also adverbs), and function words, which have grammatical-syntactic functions (prepositions, pronouns, interjections, conjunctions and count words). They agreed that a text with a higher lexical density is more difficult to comprehend than one with a lower lexical density. This conclusion is related to the notion of information packaging: when a text contains a higher percentage of content words, it carries more information.
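A minimal sketch of the calculation follows. The short function-word set is a hypothetical stand-in chosen for the example (a real analysis would use a full function-word list or a part-of-speech tagger), and treating articles and auxiliaries as function words is an assumption of this sketch.

```python
import re

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the",                      # articles (assumed to be function words)
    "in", "on", "of", "to", "at", "by",    # prepositions
    "he", "she", "it", "they", "we",       # pronouns
    "and", "but", "or", "because",         # conjunctions
    "is", "are", "was", "were", "be",      # auxiliaries (assumed)
}


def lexical_density(text, function_words=FUNCTION_WORDS):
    """Share of content words among all running words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("empty text")
    content_words = [t for t in tokens if t not in function_words]
    return len(content_words) / len(tokens)


dense = "The committee approved the new budget"   # 4 content words out of 6
sparse = "She put it on the table"                # 2 content words out of 6
print(round(lexical_density(dense), 3))   # 0.667
print(round(lexical_density(sparse), 3))  # 0.333
```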
Readability is defined as the ease with which readers read and understand a particular text. It can be calculated with a readability formula, a method that offers a numerical estimate and predictor of a text’s difficulty. To date, various readability formulas (Gunning Fog, Automated Readability Index, Linsear Write Formula, Flesch Reading Ease score) have been developed by factoring in text-related features such as word difficulty and sentence length. However, this does not mean that they are free of flaws. Standal criticized such formulas by arguing that the text-related factors they use, such as word and sentence length, are not comprehensive enough to measure difficulty. Carrell added that the formulas leave out reader-specific elements such as readers’ purpose and background knowledge. Another issue raised by Carrell is how these formulas relate to readers’ ability. The formulas derive their predictions by comparing texts at different levels with the ability of first-language readers, whereas the potential readers of texts in language proficiency tests are a large population of non-native speakers. To some extent, formulas originally developed for a first-language population therefore hold little significance for some tests. IELTS and TOEFL iBT test takers, however, will be admitted into universities where they are expected to read materials at the local college level in the local native language, in this case English. For this study, therefore, such a formula still holds relevance. Since the Flesch Reading Ease formula is the most widely acknowledged and favored formula in research, it is used in this study. Under FRE (Flesch Reading Ease), the lower the score a text obtains, the more difficult it is to comprehend.
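For reference, the Flesch Reading Ease score is commonly given as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The sketch below applies that formula to counts supplied by the caller; the function name and the example counts are hypothetical, and counting syllables automatically is left out because it requires a heuristic or a pronunciation dictionary.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; lower scores mean harder text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Two hypothetical 100-word excerpts:
easier = flesch_reading_ease(words=100, sentences=5, syllables=150)
harder = flesch_reading_ease(words=100, sentences=4, syllables=180)
print(round(easier, 3))  # 59.635
print(round(harder, 3))  # 29.18
```

The second excerpt has longer sentences and longer words, so it receives the lower score, in line with the interpretation stated above.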
2.3. Characteristics of Input Language
2.3.1. Grammatical Intricacy
Grammatical features can also put obstacles in the way of readers’ comprehension. Givón stated that grammatical complexity is closely related to the use of passives, negatives, imperatives, interrogatives and subordination. In discussing the many linguistic parameters that complicate the comprehension of reading materials, Berman commented that the heaviness and opacity of a sentence’s constituent structure make it difficult for readers to parse the sentence by identifying basic constituents such as subject, predicate and object. To quantify heaviness and opacity, this study adopts Halliday’s measure of grammatical intricacy, which examines how simple clauses are connected within a clause complex at the clausal level. It is calculated as the ratio of the number of ranking clauses to the number of clause complexes in the text. Ranking clauses include hypotactic and paratactic clauses. Paratactic clauses, also known as independent clauses, are finite clauses. Hypotactic clauses, also known as dependent clauses, can be either finite or non-finite. A clause complex is the unit formed by linking two or more clauses together. In its notational conventions, Systemic Functional Linguistics uses ||| to mark the boundary of a clause complex and || to mark the start of a ranking clause. Under this formula, the higher the score, the more intricate the text.
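Assuming a text has already been annotated by hand with the ||| and || markers described above, the ratio can be computed as in the following sketch. The function name and the sample sentence are hypothetical, and the sketch counts every |||-delimited unit as a clause complex, including units that contain a single clause.

```python
def grammatical_intricacy(annotated_text):
    """Ranking clauses per clause complex in a text marked with ||| and ||."""
    # Split on the clause-complex boundary first, ignoring empty edge segments.
    complexes = [seg for seg in annotated_text.split("|||") if seg.strip()]
    if not complexes:
        raise ValueError("no clause complexes found")
    # Within each complex, every non-empty stretch between || markers
    # is counted as one ranking clause.
    ranking_clauses = sum(
        1
        for seg in complexes
        for clause in seg.split("||")
        if clause.strip()
    )
    return ranking_clauses / len(complexes)


sample = (
    "||| When the rain stopped, || we went outside || and played football. "
    "||| The pitch was still wet. |||"
)
# 4 ranking clauses in 2 clause complexes
print(grammatical_intricacy(sample))  # 2.0
```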
2.3.2. Topical Knowledge
Topical knowledge, sometimes called real-world knowledge or knowledge schemata by Bachman and Palmer, is seen as interwoven mental structures in an individual’s long-term memory. They declared that topical knowledge can facilitate reading comprehension. Alderson asserted that familiarity with the subject matter or topic is expected to play a facilitative role in readers’ understanding of a text. When readers process texts in areas they are familiar with, they find them much easier than texts on subjects they have not learned or heard about. Alderson explained this by stating that topic familiarity can compensate for a lack of the linguistic knowledge that is essential in the bottom-up reading model. Other scholars (Cromley and Azevedo; Fisher and Frey) also acknowledged the important role of topical knowledge: the more topical knowledge a reader has of the subject matter, the better he or she will understand the text. Besides, a large body of research (Stahl; Krekeler; Tarchi) on the role of content schemata and topical knowledge in reading comprehension confirmed that background knowledge affects readers’ understanding by influencing what and how much is comprehended. Alderson and Urquhart went further and investigated the impact of students’ academic discipline on their scores on ESP (English for Specific Purposes) reading tests. Their results showed that students from different majors performed significantly better on the reading tests related to their own discipline.
Which subject areas yield texts that are easier for readers to comprehend has also been explored. Alderson stated that, for readers with a similar level of educational background, non-specialist texts in the social sciences and humanities are on the whole easier to comprehend than scientific texts in the natural sciences.
2.3.3. Text Type
Three text types, exposition, argumentation and historical/biographical narrative, dominate reading materials in academic settings, especially in general English for academic purposes. Lengthy expository texts present readers with details about persons, objects, events, concepts, places and other information by means of description, explanation, contrast, comparison and elaboration. The major purpose of exposition is to inform readers, which is why it is the most prevalent text type in the college classroom. In argumentative or persuasive texts, writers present their points of view on a particular topic by providing supporting reasons and evidence and sometimes by analyzing flaws in their opponents’ reasoning. Historical or biographical narratives, also commonplace in class, narrate the history of a particular discipline such as psychology, geology, sociology or botany, and inform readers of the significant outcomes of true events or their influence on individuals and society. Because of their distinctive characteristics, different text types set distinct tasks for readers and pose different challenges. According to Enright, the seemingly least demanding task comes from exposition, which requires readers to understand the basic meaning of the text. By contrast, argumentation and narrative involve more complicated tasks, which call for higher-order thinking such as analyzing, evaluating and inferring. However, this describes only the general complexity of the tasks associated with each text type; it does not necessarily follow that texts of a given type are themselves difficult. In fact, compared with the other two text types, historical or biographical narrative is relatively easy thanks to two prominent features: conventionalized macrostructures related to stories and the potential to induce visualization.
Experimental results indicated that readers of expository texts did not perform as well as readers of narrative texts, and this held for children as well as for older learners.
2.4. Characteristics of Tasks
2.4.1. Test Techniques
Reading comprehension is assessed through a wide range of techniques in the language proficiency testing.
Liu conducted an experiment on Chinese learners of English to investigate the impact of test methods on test performance. The results revealed that the choice of testing method did affect students’ performance and that the extent to which students of different proficiency levels were affected varied with the testing method. Among the methods analyzed in his study, short-answer questions were the most difficult, compared with multiple-choice and dichotomous items. A similar pattern holds for Turkish EFL learners, who performed better on multiple-choice items than on open-ended items such as sentence completion. Nbiria also confirmed the impact of test methods on performance: in his study, the cloze test was more difficult than multiple-choice and short-answer questions. The main finding of Kobayashi is that students with a higher level of proficiency outperformed others on open-ended questions and summary writing. It can therefore be concluded that these two types of test method are able to distinguish students of different levels.
2.4.2. Question Type
Alderson factored in question types when analyzing which variables affect the difficulty of reading test items. Scholars are divided over how question types should be classified.
Gallagher classified question items into ten types according to the content involved: vocabulary, restatement, inference, negative question, referent, organization, support, main idea, author’s attitude, and previous or following topic.
From the perspective of conceptual level, Van Dijk and Kintsch put forward five levels: from the lowest level of the word, through the proposition, local coherence and macrostructure levels, up to the top level of the superstructure.
In terms of where the information is located, Pearson and Johnson categorized questions into three types: textually explicit, textually implicit and script-based questions. Textually explicit questions are those for which the correct answer and the question information are located in the same sentence. Textually implicit questions require test takers to combine information from more than one sentence. Script-based questions require testees to integrate their background knowledge with the text content, because the answer cannot be deduced from the text alone. They asserted that textually explicit questions are easier than textually implicit questions. Meanwhile, Bensoussan et al. identified two types: global and local comprehension. Global comprehension requires respondents to relate the correct answer to the whole passage, whereas local comprehension relates to a specific part of the reading passage.
Based on the literature reviewed in this section, the comparison framework in Table 1 has been formulated.
3. Test Content Comparison Analysis
3.1. The Comparison of the TOEFL iBT and the IELTS Reading Test Rubric
In the framework, three areas are listed under the title of test rubric. They are test organization, time allocation and test instructions. Detailed comparison between the TOEFL iBT and the IELTS reading tests unfolds as follows.
3.1.1. Test Organization
Most language tests are composed of an assemblage of parts whose salience, sequence arrangement and relative importance are believed to impact testees’ performance. These two tests enjoy a high degree of similarity in salience of parts and sequence arrangement. They both contain three reading passages and there are no specific requirements for arranging them in order.
Table 1. Comparison framework for the current study.
However, in terms of relative importance, the two tests diverge. The TOEFL iBT reading section always carries 45 raw points, but the number of test items varies from test to test and from passage to passage. In the most frequent case, there are 14 question items for each passage: each of the first 13 multiple-choice questions is worth one point, and the final summary question is worth two points. In this case, 45 raw points are collected and are then converted to a scaled score out of 30. However, not every passage ends with a summary question. Occasionally, other types of questions worth more than one point (related to classification or procedures) may occur in any of the passages. When such multi-point items occur more often, the total number of questions declines. This explains why there are only 41 questions in TPO 54 (Test Practice Online 54). One feature is unique to TOEFL iBT: in each live administration, one extra reading passage accompanied by 13 or 14 questions is randomly added, and three reading passages whose questions are worth 45 points are randomly selected to be marked for the final score. In IELTS, every question item carries equal weight, namely one point, and there are always 40 questions across the three reading passages. The 40 raw points are then converted to a final band score on a nine-band scale. In Cambridge IELTS 13, apart from Test 1, the remaining three tests share the same item distribution, with the first two passages having 13 questions each and the last one 14 questions. The difference in the number of questions therefore reflects the difference in the weight each passage carries within the whole reading test. The major information is summarized in Table 2.
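The raw-point arithmetic described above can be checked with a short sketch. Only the most frequent item distributions mentioned in the text are modeled; the function names are hypothetical, and the conversion from raw points to the 30-point or nine-band scale is not shown because it is not specified here.

```python
def raw_total(passages):
    """Sum the point values of all items across all passages."""
    return sum(sum(items) for items in passages)


def passage_weights(passages):
    """Share of the total raw points carried by each passage."""
    total = raw_total(passages)
    return [sum(items) / total for items in passages]


# Most frequent TOEFL iBT case: 13 one-point items plus one two-point summary item.
toefl_typical = [[1] * 13 + [2] for _ in range(3)]
# IELTS distribution reported for three of the four tests: 13 + 13 + 14 one-point items.
ielts_typical = [[1] * 13, [1] * 13, [1] * 14]

print(raw_total(toefl_typical))        # 45
print(raw_total(ielts_typical))        # 40
print(passage_weights(ielts_typical))  # [0.325, 0.325, 0.35]
```

In the typical TOEFL iBT case each passage carries one third of the raw points, whereas in this IELTS distribution the third passage weighs slightly more than the first two.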
3.1.2. Time Allocation
The amount of time allocated to test tasks can also influence examinees’ performance. In general, test takers of both exams are given 60 minutes in total to complete all the tasks, provided that no extra reading passage is added to TOEFL iBT; otherwise, the reading test time is extended by another 20 minutes for the extra passage. Specifically, TOEFL iBT simply states in the directions at the beginning of the reading test that test takers will have 60 minutes. IELTS examiners announce that test takers will have 60 minutes in all and remind them to transfer all their answers onto the answer sheet when 10 minutes remain. No extra time is allocated for transferring answers, since the booklet clearly states that examinees should spend about 20 minutes on the questions related to each passage. Details can be seen in Table 3.
3.1.3. Test Instructions
Test instructions come last under the heading of test rubric but can exert a crucial influence on testees’ performance. As can be seen from Table 4, the instructions of the TOEFL iBT and the IELTS reading tests have more differences than similarities.
These two test batteries’ similarity exists in the language used for instructions.
Table 2. Facet of test rubric: test organization of TOEFL iBT and IELTS reading tests.
Table 3. Facet of test rubric: time allocation of TOEFL iBT and IELTS reading tests.
Table 4. Facet of test rubric: instructions of TOEFL iBT and IELTS reading tests.
Both standardized tests give their instructions in the target language, in this case English, rather than in test takers’ native language.
Differences outweigh similarities in the other aspects. First, regarding the channel, TOEFL iBT test takers read the instructions on the computer screen, while IELTS test takers read them in print in the test booklet. In addition, the examiner in each IELTS test room is obliged to read the instructions aloud to examinees before starting the clock for the reading test. Secondly, the two tests contrast considerably at the beginning and at the end of the procedure. In TOEFL iBT, a short passage of directions for the whole reading section, 176 words long, sets out the test time, the number of passages, the scoring and the instructions for technical operation. It appears on screen when test takers sit in front of the computer and begin the test. To start the reading section, they have to click the “continue” button. The directions can be seen in Figure 1.
Notes: This screenshot and the following question items of TOEFL iBT come from one of the most popular online practice test software packages in China, provided by the private institution Xiao Zhan, which is authorized by the official TOEFL test administration.
Figure 1. Reading section directions for TOEFL iBT on the computer screen.
By contrast, such directions for IELTS are read by the examiner in the test room. Since they are not made public, this article will not quote them. Besides the instructions read aloud by the examiner, brief instructions for each passage, as short as 18 words, are printed in the test booklet. These instructions, which state the test time and the number of test items based on the passage, are presented at the very beginning of each IELTS reading passage. The difference continues at the next step. TOEFL iBT takers are not directed to the questions at the start; instead, they are strongly recommended to scan the passage before answering the questions. To show that the passage has been read through once, they have to scroll the bar on the right of the column to the bottom and back to the top. Otherwise, they cannot enter the page on which the questions are placed on the left with the passage on the right. In this regard, IELTS takers are given full freedom: they can either read the passage first and then answer the questions that follow, or take an initial look at the questions and then read the passage with a purpose. The last difference lies in the final step: IELTS takers have to transfer their answers to the answer sheet before the examiner declares the reading section over, while TOEFL iBT takers can submit their answers at the click of a button.
As regards the explicitness of criteria for correctness, the two tests are similar in that both use the multiple-choice format and neither accepts answers written on draft paper. However, the distinctions are revealing. First, because IELTS is paper-based while TOEFL is internet- or computer-based, only answers written on the answer sheet count for IELTS, and only answers submitted to the system count for TOEFL scoring. On closer inspection, the differences become more apparent. Because of the different test methods, the ways of answering questions have specific features of their own. In the IELTS reading section, most questions require examinees to choose the right answer not only from given letters but also from given numbers. Some questions require testees to write words down; in this case, the number of words is limited and the words have to come from the passage. Otherwise, the answer is not counted as right even if the meaning is the same. The last point that calls for attention is the distinction between “YES”/“NO” and “TRUE”/“FALSE”. When the question is about the claims of the writer of the passage, “YES”, “NO” and “Not Given” are used; when it is about the information given in the passage, “TRUE”, “FALSE” and “Not Given” are used. Examinees commonly neglect this point in the test, with serious consequences: they know the answer but put down the wrong word and score nothing. As for TOEFL reading, besides choosing the letters of given options as in IELTS, test takers have to drag the chosen answers to the corresponding columns (Figure 2) or in some cases click a black square inserted in the passage (Figure 3).
Figure 2. Question item from reading passage three in Test Practice Online 54.
Figure 3. Question item from reading passage three in Test Practice Online 54.
Last, another prominent difference lies in whether partial credit is given. In TOEFL, since some question items (prose summary questions, classification questions) are worth more than one point, partial credit is given to examinees when they choose most of the right answers. For example, in a prose summary question, examinees are required to choose three out of the five given options in order to gain the full two points. An examinee who gets two of the three right scores one point, but an examinee who gets only one choice right gains nothing. In IELTS, when a question requires examinees to choose two out of three options, the examinee has to put down the letters of all the right choices. If only one of the choices is right, the examiner marks the answer wrong and the examinee scores nothing.
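The two scoring rules, as they are described in this section, can be contrasted in a small sketch. The function names and option letters are hypothetical, and the rules encoded are only those stated above, not an official scoring specification.

```python
def toefl_summary_points(chosen, key):
    """Prose summary item: 3 correct choices -> 2 points, 2 correct -> 1, otherwise 0."""
    chosen, key = set(chosen), set(key)
    if len(chosen) != 3 or len(key) != 3:
        raise ValueError("exactly three options are chosen and three are keyed")
    hits = len(chosen & key)
    return {3: 2, 2: 1}.get(hits, 0)


def ielts_multi_select_points(chosen, key):
    """All-or-nothing rule as described above: 1 point only if every keyed letter is given."""
    return 1 if set(chosen) == set(key) else 0


print(toefl_summary_points(["A", "B", "C"], key=["A", "B", "C"]))  # 2
print(toefl_summary_points(["A", "B", "D"], key=["A", "B", "C"]))  # 1 (partial credit)
print(toefl_summary_points(["A", "D", "E"], key=["A", "B", "C"]))  # 0
print(ielts_multi_select_points(["A", "C"], key=["A", "B"]))       # 0 (no partial credit)
```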
In a nutshell, the test rubrics of the TOEFL iBT and the IELTS reading tests differ in many respects but also share several similarities. Relative importance of parts and explicitness of criteria for correctness are the most prominent differences, which might reflect an underlying difference in the reading construct.
3.2. The Comparison of the TOEFL iBT and the IELTS Reading Test Input
Two major components are included in this area: format and nature of language.
The two tests are largely similar in test format except for two things: identification of problems and length. In the IELTS reading test, problems are not identified for testees in the reading passages, so they do not know which paragraph the correct response to each question item comes from. In TOEFL iBT, by contrast, when testees are working on most questions, a diamond sign is placed in front of the corresponding paragraph (Figures 2-4) to draw their attention to the particular part of the reading material to be processed and evaluated. For questions that require examinees to insert a sentence at the correct position in the reading passage, four black squares are placed at different points in the paragraph. Furthermore, two types of question items are identified down to the exact word or sentence.
For questions related to vocabulary meaning identification, that particular word is highlighted both in the corresponding text and in the question stems.
For questions about simplifying complex sentences, the targeted sentences are shaded in the context (Figure 5).
As for text length, high-stakes standardized examinations such as IELTS use longer texts rather than short excerpts. TOEFL iBT has also moved to this practice.
Figure 4. Question item from reading passage two in Test Practice Online 13.
From Table 5, IELTS reading texts are on average 26.6% longer than TOEFL iBT reading passages, a difference of about 187 words. In addition, IELTS shows a wider variance in text length within the test than TOEFL iBT. An independent-samples t-test was conducted to see whether the difference between the two tests is significant. The p value is 0.003, smaller than 0.05, which means that TOEFL and IELTS reading texts differ significantly in length.
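To make the procedure concrete, the sketch below computes the relative difference in mean length and the t statistic of an independent-samples t-test. The word counts are hypothetical and are not the values in Table 5; the pooled-variance (Student) form is an assumption, since the variant used is not stated; and the p value itself is not computed because that requires the t distribution, which is not available in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def relative_difference(a, b):
    """How much larger the mean of `a` is than the mean of `b`, in percent."""
    return (mean(a) - mean(b)) / mean(b) * 100


def pooled_t_statistic(a, b):
    """Student's independent-samples t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance assumes the two groups share a common variance.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical word counts for three passages per test.
ielts_lengths = [900, 850, 880]
toefl_lengths = [700, 690, 710]

t_value, df = pooled_t_statistic(ielts_lengths, toefl_lengths)
print(f"relative difference: {relative_difference(ielts_lengths, toefl_lengths):.1f}%")
print(f"t = {t_value:.2f}, df = {df}")
# With df = 4, |t| above 2.776 corresponds to p < 0.05 (two-tailed).
```

Where SciPy is available, a library routine would normally be used instead, since it also returns the p value.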
To compare TOEFL iBT and IELTS reading across tests, the sample size was enlarged. As Table 5 shows, similar to the comparison within one test, IELTS texts are longer than TOEFL iBT texts across tests. This difference is statistically significant: the independent-samples t-test gives a reported p value of 0.00, smaller than 0.05.
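The length comparison above rests on an independent-samples t-test. The following is a minimal sketch of such a test using only the Python standard library. It assumes Welch's unequal-variance variant, which may not be the variant the study used, and it approximates the p value by numerical integration of the t density. The word counts in the example are invented for illustration and are not the values in Table 5.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_norm) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p value, using Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def welch_t_test(a, b):
    """Return (t, df, p) for Welch's independent-samples t-test."""
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df, two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical word counts, NOT taken from Table 5.
    ielts_lengths = [850, 900, 940]
    toefl_lengths = [690, 700, 715]
    t, df, p = welch_t_test(ielts_lengths, toefl_lengths)
    print(f"t = {t:.2f}, df = {df:.2f}, p = {p:.4f}")
```

With samples as small as three texts per test, the result is sensitive to the choice between the pooled and the unequal-variance form, so the variant should be reported alongside the p value.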
Based on the results in Table 6, and leaving aside variables related to individual readers and other factors, it is reasonable to conclude that TOEFL iBT reading tests are comparatively easier on two grounds: first, the reading texts are shorter and therefore easier to process than those in IELTS; second, the identification of the problem is more specific in TOEFL iBT than in IELTS.
Figure 5. Question item from reading passage one in Test Practice Online 13.
Table 5. Comparison of mean facet for length across TOEFL and IELTS reading.
Note: These passages are from the most recent preparation materials containing authentic tests in the past for Chinese mainland test-takers. Three IELTS texts are from Test one in Cambridge English IELTS 13 newly published by Cambridge ESOL and three TOEFL texts are from TOEFL iBT Practice Online 54 recently released by ETS. Twelve texts are from four tests in Cambridge English IELTS 13 and twelve texts are from four tests in TOEFL iBT Practice Online 54, 53, 52, 51.
Table 6. Facet of test input: format of TOEFL iBT and IELTS reading tests.
3.2.2. Language Input
The current study employs the General Service List, comprising the 2000 most frequent word families, and the Academic Word List, derived from a large corpus.
From Table 7, it can be seen that on average IELTS reading tests contain about 2% more K1 words but around 7% fewer K2 words than TOEFL iBT reading passages. In this respect, IELTS takers enjoy a slight advantage. However, according to the independent-samples t-test, these differences are not significant for either K1/Token (p = 0.26 > 0.05) or K2/Token (p = 0.731 > 0.05), so they do not distinguish the two test batteries. The mean AWL/Token ratios of TOEFL iBT and IELTS are close to each other, and the t-test shows no significant difference (p = 0.97 > 0.05). It can be concluded that TOEFL iBT and IELTS reading texts are similar in this aspect. Last, TOEFL iBT and IELTS reading texts have the same mean lexical diversity, while the former is slightly (2%) higher than the latter on average in lexical density. Therefore, as far as lexical density is concerned, TOEFL iBT reading is marginally more difficult than IELTS reading, but the independent-samples t-test shows no significant difference (p = 0.44 > 0.05).
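To make the K1/Token, K2/Token and AWL/Token ratios concrete, the sketch below computes them for a short text. The three word sets are tiny hypothetical stand-ins, not the actual General Service List or Academic Word List, and matching is done on exact word forms rather than word families, which is a simplification. The type-token ratio is included as one common measure of lexical diversity; the study does not state which measure it used.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens: list[str], word_list: set[str]) -> float:
    """Share of running tokens that belong to word_list."""
    if not tokens:
        return 0.0
    return sum(tok in word_list for tok in tokens) / len(tokens)


def lexical_profile(text: str, k1: set[str], k2: set[str], awl: set[str]) -> dict:
    """Token count, band coverage ratios and type-token ratio for one text."""
    tokens = tokenize(text)
    return {
        "tokens": len(tokens),
        "K1/Token": coverage(tokens, k1),
        "K2/Token": coverage(tokens, k2),
        "AWL/Token": coverage(tokens, awl),
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


# Toy word lists for illustration only (hypothetical, not the real lists).
TOY_K1 = {"the", "is", "a", "of", "country", "small"}
TOY_K2 = {"tourist"}
TOY_AWL = {"export", "sector"}

if __name__ == "__main__":
    sample = "The export sector is a small sector of the country."
    for name, value in lexical_profile(sample, TOY_K1, TOY_K2, TOY_AWL).items():
        print(name, value)
```

In practice the full lists would be loaded from files and each token would be mapped to its word family before the lookup.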
From Table 8, the mean Flesch Reading Ease score of IELTS is higher than that of TOEFL iBT. This suggests that IELTS reading texts are, on average, predicted to be less difficult to understand than TOEFL iBT texts. An independent-samples t-test was also conducted, and the results (p = 0.52 > 0.05) show no significant difference.
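For reference, the Flesch Reading Ease score is 206.835 minus 1.015 times the average sentence length in words, minus 84.6 times the average number of syllables per word, so higher scores indicate easier text. The sketch below applies this formula. The syllable counter is a rough vowel-group heuristic introduced here for illustration, so its scores will differ somewhat from those of dedicated readability tools, including whichever tool produced Table 8.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease computed from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease for a text, using the heuristic syllable counter."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return flesch_from_counts(len(words), len(sentences), syllables)


if __name__ == "__main__":
    easy = "The cat sat on the mat. The dog ran to the park."
    hard = "Institutional accountability necessitates comprehensive methodological transparency."
    print(round(flesch_reading_ease(easy), 1), round(flesch_reading_ease(hard), 1))
```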
Grammatical intricacy is calculated in accordance with Halliday’s formula. Take the first paragraph of an IELTS reading passage (Cambridge English IELTS 13, Test one, passage one) as an example: it contains three simple sentences, two compound sentences and one complex sentence. From the perspective of Systemic Functional Linguistics (SFL), there are three clause complexes and eight ranking clauses.
New Zealand is a small country of four million inhabitants, a long-haul flight from all the major tourist-generating markets of the world. ||| Tourism currently makes up 9% of the country’s gross domestic product, || and is the country’s largest export sector. ||| ||| Unlike other export sectors, || which make products and then sell them overseas, tourism brings its customers to New Zealand. ||| The product is the country itself―the people, the places and the experiences. ||| In 1999, Tourism New Zealand launched a campaign || to communicate a new brand position to the world. ||| ||| The campaign focused on New Zealand’s scenic beauty, exhilarating outdoor activities and authentic Maori culture ||, and it made New Zealand one of the strongest national brands in the world |||.
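As an illustration of the calculation, the sketch below assumes the common reading of Halliday's measure, in which grammatical intricacy is the number of ranking clauses divided by the number of clause complexes. The parser for the ||| and || boundary markers is hypothetical and treats every |||-delimited unit as one clause complex, which is a simplification; the toy annotated string is invented and is not the passage above.

```python
def grammatical_intricacy(ranking_clauses: int, clause_complexes: int) -> float:
    """Ranking clauses per clause complex (assumed form of the measure)."""
    if clause_complexes <= 0:
        raise ValueError("clause_complexes must be positive")
    return ranking_clauses / clause_complexes


def intricacy_from_markup(annotated: str) -> float:
    """Compute intricacy from text marked with '|||' (complex) and '||' (clause)."""
    complexes = [c for c in annotated.split("|||") if c.strip()]
    clauses = sum(
        len([cl for cl in c.split("||") if cl.strip()]) for c in complexes
    )
    return grammatical_intricacy(clauses, len(complexes))


if __name__ == "__main__":
    # Invented example: two clause complexes, three ranking clauses.
    toy = (
        "Tourism brings customers to the country. ||| "
        "The campaign focused on scenery, || and it built a strong brand. |||"
    )
    print(intricacy_from_markup(toy))
    # Using the counts reported for the IELTS paragraph (8 clauses, 3 complexes).
    print(round(grammatical_intricacy(8, 3), 2))
```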
Table 9 shows the detailed information concerning the grammatical features of TOEFL iBT and IELTS reading tests.
Table 7. Comparison of mean facet for lexical features across TOEFL and IELTS reading.
Note: Three texts are from TOEFL iBT Practice Online 54 and three texts are from Cambridge English IELTS 13 Test one.
Table 8. Comparison of mean facet for readability across TOEFL and IELTS reading.
Note: Three texts are from TOEFL iBT Practice Online 54 and three texts are from Cambridge English IELTS 13 Test one.
Table 9. Comparison of mean facet for grammatical intricacy across TOEFL iBT and IELTS reading tests.
Note: Three texts are from TOEFL iBT Practice Online 54 and three texts are from Cambridge English IELTS 13 Test one.
The mean grammatical intricacy of the TOEFL iBT and IELTS reading tests was calculated. The data show that the average score of TOEFL iBT (2.12) is lower than that of IELTS (3.03), which suggests that IELTS reading texts are more difficult than TOEFL iBT reading texts when grammatical intricacy is taken as the measure. However, the difference is not significant (p = 0.102 > 0.05).
Other features were also counted to triangulate this result. The ratio of complex sentences to all sentences in the text is shown in Table 10. The means for TOEFL iBT and IELTS reading texts are 42.6% and 53.2% respectively, which indicates that TOEFL iBT reading texts are easier for readers to process. For sentences that combine more than one complex or compound structure, the means for TOEFL iBT and IELTS reading texts are 15.8% and 28.9% respectively, which implies that in IELTS reading texts it is more difficult for readers to recognize the noun-verb-noun relations.
The data used in Table 11 are versions of the real TOEFL iBT and IELTS tests as recalled by test takers in mainland China. Some organizations have done thorough work collecting different versions from various websites in an attempt to reproduce the original tests as closely as possible. The author gathered five versions from the official websites of the five most prestigious institutions (New Oriental School, New Channel, Xiao Zhan, Global Education, Smart Study). Since it is practically impossible to recall the English version of every reading test each time, a majority of the recalled reading texts are simplified based on test takers’ understanding and are written in Chinese. The collected data start from the beginning of January and were updated until the end of May, when the final draft of this article was completed. There were 20 IELTS test administrations and 15 TOEFL iBT test administrations, from which 60 IELTS reading texts and 51 TOEFL iBT reading texts are analyzed.
Table 10. Comparison of grammatical features across TOEFL iBT and IELTS reading.
Table 11. Comparison of text topic across TOEFL iBT and IELTS reading tests.
Note: Reading tests are the recall versions of the real tests of TOEFL iBT and IELTS by test-takers of mainland China from January to May.
From Table 11, 41.2% of TOEFL iBT reading texts fall into the category of natural science and the remaining 58.8% are about social science and humanities, while the proportions for IELTS reading passages are 20% and 80% respectively. Technical texts are clearly more likely to appear in the TOEFL iBT reading test, while the opposite is true of the IELTS reading test. According to Alderson, test takers are more likely to perform better in reading tests related to social science and humanities. It follows that the IELTS reading test is comparatively easier where topical knowledge is concerned. In addition, the two tests share several common topics both in natural science (biology and geology) and in social science and humanities (sociology, history, agriculture, business, culture, arts, geography). However, a striking difference between the tests lies in the specific fields covered under social science and humanities. First, the IELTS reading test involves a wider range of topics than TOEFL iBT, with 13 topics against 10. Moreover, archaeology, industry and economics appear only in TOEFL iBT. By contrast, technology, health, psychology, law, language, and education are included only in IELTS, which appears to have a preference for psychology. Last, within the common set of topics, agriculture dominates the TOEFL iBT reading test while arts occupies the largest proportion in the IELTS reading test. This analysis sheds light on the difference in the topics favored by the two tests.
From Table 12, 80% of TOEFL iBT reading passages are expository texts while the remaining texts are argumentative. Up to the last TOEFL iBT sitting in May, no narratives had appeared. For IELTS, 61.7% of reading texts are expositions, 33.3% are argumentative and only 5% are narrative. Narrative is thus the least used text type in both test batteries, and test takers are more likely to encounter expositions in TOEFL iBT than in IELTS. Conversely, argumentation is more frequent in IELTS than in TOEFL iBT.
3.3. Characteristics of Tasks
3.3.1. Test Techniques
From Table 13, it can be seen that both batteries employ multiple choice, which is also the only technique adopted by TOEFL iBT. Multiple-choice techniques are popular in various tests because they can be marked quickly by computer and test designers can control testees’ answers. Despite these virtues, multiple-choice items are questioned because good distractors are difficult and time-consuming to develop and test takers might get the right answer by guessing. A further criticism is that many test-coaching schools focus on teaching students to be test-wise instead of improving their reading skills.
Besides multiple choice, the IELTS reading part uses other testing techniques. Multiple matching, an objective technique, requires test takers to match two sets of stimuli against each other, such as matching the beginning of a sentence to its ending, opinions to the people who hold them, names of objects to their features, headings to their corresponding paragraphs, or pieces of information to the paragraphs that contain them. Such items are easy to mark but difficult to construct. Dichotomous items present students with statements related to the text and require them to judge whether each is true or false. They are praised for ease of construction, but students stand a high chance of guessing correctly. To counteract this effect, a third option such as “not given” is offered. Unlike the aforementioned techniques, short-answer methods are subjective or semi-objective. Their justification is that test takers’ written responses show whether they have understood the text. However, such items have to be constructed so that all potential answers are taken into account, and other variables such as learners’ writing ability are involved. To limit this interference, sentence completion is designed so that testees only need to write down individual words or phrases instead of grammatically correct whole sentences. The gapped summary is a summary of the target text with some important words removed. To fill the gaps, readers have to read the whole passage and figure out the main idea. Sometimes a word bank is provided, but in most cases the words in it are not taken directly from the text. With this method, students have little chance of guessing the answers, since the items are constructed so that it is impossible to supply the words without reading the right text. Finally, the information transfer technique expects test takers to locate relevant information in the text and then transfer it to another form such as a table or graph.
Table 12. Comparison of text types for TOEFL iBT and IELTS reading texts.
Note: Reading tests are the recall versions of the real tests of TOEFL iBT and IELTS by test-takers of mainland China from January to May.
Table 13. Comparison of test techniques in TOEFL iBT and IELTS reading texts.
From the literature on test techniques reviewed above, it can be concluded that TOEFL iBT reading is easier in this respect.
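The guessing argument can be made concrete with a small calculation. The sketch below compares the expected number of items answered correctly by blind guessing under three formats: four-option multiple choice, true/false, and true/false with a third “not given” option. The option counts and the 12-item set are assumptions made for illustration; completion-type items, where the chance of a blind guess succeeding is close to zero, are left out.

```python
from fractions import Fraction
from math import comb


def expected_correct(num_items: int, num_options: int) -> Fraction:
    """Expected number of items answered correctly by blind guessing."""
    return Fraction(num_items, num_options)


def prob_at_least(num_items: int, num_options: int, threshold: int) -> float:
    """Binomial probability of guessing at least `threshold` items correctly."""
    p = 1 / num_options
    return sum(
        comb(num_items, k) * p ** k * (1 - p) ** (num_items - k)
        for k in range(threshold, num_items + 1)
    )


if __name__ == "__main__":
    items = 12  # hypothetical size of an item set
    formats = {
        "four-option multiple choice": 4,
        "true/false": 2,
        "true/false/not given": 3,
    }
    for name, options in formats.items():
        print(
            name,
            "expected correct:", expected_correct(items, options),
            "P(at least half right):", round(prob_at_least(items, options, items // 2), 3),
        )
```

Adding the third option lowers the chance score of a dichotomous item from one half to one third, which is the effect the “not given” option is meant to achieve.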
3.3.2. Question Type
The data analyzed are thirty-six IELTS reading tests from Cambridge IELTS 4-12 and fifty-five TOEFL reading tests, fifty from TOEFL Practice Online 1-50 and five from the Official Guide. Several points can be drawn from Table 14.
As Table 14 shows, at the conceptual level, the bottom level involving word recognition or meaning identification is missing in the IELTS reading test, while questions at this level occupy a fairly significant share (27.8%) in TOEFL iBT.
At the following levels (proposition and local coherence), the two test batteries share some question types but differ in the proportions of these questions. Both tests give the heaviest weighting to restatement questions, which focus on the comprehension of facts and details in the text. However, the weight in IELTS (80.7%) is about 2.6 times that in TOEFL iBT (30.7%). Inference questions in TOEFL iBT (7%) are ten times as frequent as in IELTS (0.7%). Questions about the author’s attitude do not exist in TOEFL iBT, while negative questions do not exist in IELTS. In terms of local coherence, despite the small percentages, referent questions are considerably more frequent in TOEFL iBT (1.3%) than in IELTS (0.07%). As a matter of fact, such a question has appeared only once in the IELTS reading test.
At the level of macrostructure, the two tests display opposite trends. The IELTS reading test employs main idea questions frequently (15.1%), whereas TOEFL iBT reading rarely does (0.8%). Conversely, 16.5% of questions in TOEFL iBT reading are about organization, while the figure is 0.9% in IELTS. Neither test adopts previous or following questions. At the top level, the two tests have one thing in common: superstructure is not tested in either.
As for the location of the information needed for the correct response to each question, the questions following three texts from TOEFL iBT Practice Online 54 and three texts from Cambridge English IELTS 13 are analyzed.
According to Table 15, the two test batteries have little in common. First, in TOEFL iBT, seventeen out of forty-one questions (17/41) require readers to locate just one sentence and work out the right response, and thirteen of these seventeen are questions about identifying the meaning of words in context. The ratio is much higher in IELTS (24/40), since a majority of questions concern the literal understanding of facts or details in the text. These questions may take the form of sentence completion, summary gap filling, short-answer questions, matching, dichotomous items with a third option, information transfer and some multiple-choice items.
Table 14. Comparison of question types across TOEFL iBT and IELTS reading texts.
Note: TOEFL texts are from 50 tests in TOEFL Practice Online and 5 tests in TOEFL Official Guide and IELTS texts are from Cambridge IELTS Test 4 to 12.
Table 15. Comparison of question types of the TOEFL iBT and the IELTS reading tests.
Second, about half of the questions (21/41) in TOEFL iBT require readers to locate and integrate relevant information across sentences. The figure for IELTS is smaller (16/40).
Last, in TOEFL iBT reading, questions about support and organization require readers to have a good understanding of the whole passage and to locate information across paragraphs. In Table 15 (TPO 54), there are three such questions, and Table 14 shows that their proportion is 23.7%, a combination of 16.5% and 7.2%. By contrast, this type of question does not exist in IELTS. In the data of Table 14, a type of question that requires readers to choose the best title or the main point of the whole passage was counted under main ideas. Since it has appeared only 8 times, the chances of meeting it are quite small.
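The counts quoted from Table 15 can be turned into shares for easier comparison. The sketch below uses only the figures given in the text (17, 21 and 3 out of 41 for TOEFL iBT; 24 and 16 out of 40 for IELTS, with no cross-paragraph questions); the category labels are paraphrases chosen here, not the table's own headings.

```python
LOCATION_COUNTS = {
    "TOEFL iBT": {"within one sentence": 17, "across sentences": 21, "across paragraphs": 3},
    "IELTS": {"within one sentence": 24, "across sentences": 16, "across paragraphs": 0},
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw question counts into proportions of the test total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    for test, counts in LOCATION_COUNTS.items():
        formatted = ", ".join(
            f"{label}: {share * 100:.1f}%" for label, share in shares(counts).items()
        )
        print(f"{test} (n = {sum(counts.values())}): {formatted}")
```

On these counts, single-sentence questions make up about 41.5% of the TOEFL iBT items against 60% of the IELTS items, while questions requiring integration across sentences account for about 51.2% and 40% respectively.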
In conclusion, IELTS reading tests have no vocabulary-related questions at the bottom conceptual level, while there are three to five such questions for each TOEFL iBT reading passage. Moreover, IELTS reading tests have more textually explicit questions. TOEFL iBT not only has more textually implicit questions but also has global questions that involve understanding the whole passage, which IELTS lacks.
From the above comprehensive comparison, it can be seen that the TOEFL iBT and IELTS reading parts have more differences than common ground. Most of the similarities between them concern test rubrics and input format. They differ slightly in scoring method and in the specification of procedures and tasks. Major differences lie in the characteristics of input (identification of the problem, length, lexical and grammatical features), tasks (test techniques, question types) and testing environment (test delivery medium).
With regard to characteristics of input, TOEFL iBT test takers are told clearly which paragraph contains the information needed for the correct response to each question item, which enables them to devote more time to comprehending the text instead of locating the information. In addition, the statistics reveal that the texts in TOEFL iBT are significantly shorter than those in IELTS. Moreover, although the differences in lexical and grammatical features are not significant, on average TOEFL iBT reading texts contain a higher proportion of words in the General Service List, have a lower level of grammatical intricacy, and include fewer complex sentences with more than two ranking clauses. On top of that, compared with IELTS reading texts, a larger portion of TOEFL iBT reading passages are expositions, which are generally less difficult to comprehend. However, IELTS reading texts have a higher average readability score, which means they are less difficult in this sense. Besides, an overwhelming majority of IELTS reading texts deal with topics in social science and humanities, which are easier for readers to process thanks to background knowledge.
In terms of characteristics of tasks, TOEFL iBT loses its edge. Although TOEFL iBT reading is dominated by multiple-choice items, which give test candidates more chances of guessing, readers have to read more words in the question items than in IELTS question items. Besides, a larger number of questions require readers to integrate information across sentences in a paragraph and even across paragraphs.
One factor worth mentioning but not elaborated above is the test delivery medium. TOEFL is computer or internet based, while IELTS is paper and pencil based. Although computer-based testing is arguably more authentic, given that students are often required to read research articles, papers or books on screen, some readers still feel that screen reading slows them down. Since research findings on the effect of test delivery medium are mixed, readers need to judge for themselves.
The last major difference is that one extra reading passage, along with 13 or 14 questions, may or may not be added in the real TOEFL iBT test. This random addition could be a blessing for those who are better at reading than listening, but a curse for those who have difficulties in reading.
These differences have profound implications. According to reports on the IELTS and TOEFL official websites, Chinese students are more likely to achieve higher scores in the reading part than in the other sections. In 2017, the average TOEFL reading score (21) was higher than those for listening (19), speaking (19) and writing (20) (https://www.ets.org/toefl/). The same is true for IELTS, with 6.1, 5.9, 5.3 and 5.4 for reading, listening, writing and speaking respectively (https://www.ielts.org/). Therefore, if learners make informed decisions and take the test most suitable for them, they will be able to achieve their potential and maximize their advantages so as to meet the requirement. Students aiming to further their education abroad can save a large sum in tuition fees, because if they fail to reach the required standard, they have to pay for a language programme lasting from half a year to a whole year before they are admitted to the college of their choice. Learners applying to universities that recognize IELTS and TOEFL iBT scores have more opportunities to take the test. By contrast, they have only one chance to take the entrance English examination, and if they fail it, they will be denied admission no matter how high their scores are in their major subjects. It is therefore important to take these differences into consideration when English learners choose a test, especially when the results could make a difference to their academic and professional prospects. Teachers can provide students with proper instruction on the basis of test features so as to help them achieve their goals. Finally, test designers can investigate further to refine the tests and ensure their validity.
 Commander, N.E. and Stanwyck, D.J. (1997) Illusion of Knowing in Adult Readers: Effects of Reading Skill and Passage Length. Contemporary Educational Psychology, 22, 39-52.
 Rothkopf, E.Z. and Billington, M.J. (1983) Passage Length and Recall with Test Size Held Constant: Effects of Modality, Pacing, and Learning Set. Journal of Verbal Learning and Verbal Behavior, 22, 667-681. https://doi.org/10.1016/S0022-5371(83)90395-X
 Laufer, B. (1989) What Percentage of Text-Lexis Is Essential for Comprehension? In: Lauren, C. and Nordman, M., Eds., Special Language: From Human Thinking to Thinking Machines, Multilingual Matters, Clevedon, 316-323.
 Morton, J. (1979) Facilitation in Word Recognition: Experiments Causing Change in the Logogen Model. Processing of Visible Language. Springer, Berlin.
 Paribakht, T.S. and Webb, S. (2016) The Relationship between Academic Vocabulary Coverage and Scores on a Standardized English Proficiency Test. Journal of English for Academic Purposes, 21, 121-132. https://doi.org/10.1016/j.jeap.2015.05.009
 Cromley, J.G. and Azevedo, R. (2007) Testing and Refining the Direct and Inferential Model of Reading Comprehension. Journal of Educational Psychology, 99, 311-325.
 Stahl, S.A. (1991) Defining the Role of Prior Knowledge and Vocabulary in Reading Comprehension: The Retiring of Number 41. Journal of Literacy Research, 23, 487-508.
 Tarchi, C. (2010) Reading Comprehension of Informative Texts in Secondary School: A Focus on Direct and Indirect Effects of Reader’s Prior Knowledge. Learning and Individual Differences, 20, 415-420. https://doi.org/10.1016/j.lindif.2010.04.002
 Alderson, J.C. and Urquhart, A.H. (1985) The Effect of Students’ Academic Discipline on Their Performance on ESP Reading Tests. Language Testing, 2, 192-204.
 Enright, M.K., Grabe, W., Koda, K., Mosenthal, P., Mulcahy-Ernt, P. and Schedl, M. (2000) Toefl 2000 Reading Framework: A Working Paper. TOEFL Monograph Series, Educational Testing Service, Princeton, i-157.
 Weaver, C.A. and Bryant, D.S. (1995) Monitoring of Comprehension: The Role of Text Difficulty in Metamemory for Narrative and Expository Text. Memory and Cognition, 23, 12-22. https://doi.org/10.3758/BF03210553