The GRE Analytical Measure

Mary Enright and Tim Habick
Educational Testing Service

History and Description

In 1977 the GRE General Test was modified by the addition of an analytical measure. This measure was introduced in order to expand the definition of academic talent to include more aspects of reasoning than those included in the existing verbal and quantitative measures. Originally, the analytical measure included four different types of reasoning items, but two of them were eliminated from the analytical measure in 1981 because they appeared to be affected by short-term practice and special test preparation. The current version of the analytical measure includes two types of items: logical reasoning items and analytical reasoning items.

Logical reasoning items test the ability to understand, analyze, and evaluate arguments. Each logical reasoning item is based on a short argument or a simple graph or table, usually followed by one or two questions. These questions assess any of a variety of critical reasoning skills, such as recognition of assumptions, evaluation of arguments and counterarguments, and analysis of evidence.

Analytical reasoning items test the ability to understand a given structure of arbitrary relationships among fictitious persons, places, or things and to deduce information from the relationships given. For each group of analytical items, a brief scenario together with a group of rules (conditions) is presented and is followed by a set of four to seven questions. The questions, often stated in conditional (if-then) form, assess candidates' skills in grasping and combining rules to arrive at deductions as to what must be true or could be true, given the stated conditions.

Thus, the logical reasoning item type draws on a broader range of reasoning skills than does the analytical reasoning item type, which primarily involves deductive reasoning.
Because the ratio of analytical reasoning items to logical reasoning items is about three to one in the current analytical measure, the measure emphasizes deductive reasoning skills.

Item and Test Development

Test items for the General Test are usually written by ETS item writers who are specialists in the types of skills measured by the General Test. These item writers typically have graduate degrees in relevant areas and may have taught at the college or university level. The procedures for assuring the quality of test items are complex, and each item undergoes a number of different types of review. Items are critiqued by independent reviewers with respect to content, accuracy, clarity, and cultural sensitivity. After items are reviewed and modified as necessary, they are pretested on large samples of examinees in order to gather information about how each item performs statistically. Items may be deemed unacceptable if they are too difficult or too easy, if they fail to differentiate examinees at various levels of ability, or if there are large differences in performance among subgroups of examinees (for example, Hispanic examinees and non-Hispanic white examinees) who have scored similarly on the measure. Items that survive this review and pretest process can then be used in new test forms or item pools.

In the past, when new test forms were created by test developers, similarity between forms was maintained by meeting the same content and statistical specifications for each form. With computer adaptive testing, these specifications are incorporated into the algorithms that select items to be presented to examinees.

Validity

Careful development of items and tests is not enough to establish that the test measures something meaningful and useful. Therefore the GRE tests are also subject to a program of validity research to evaluate the appropriateness, meaningfulness, and usefulness of inferences made from test scores.
The collection of evidence to establish the validity of the analytical measure is an ongoing process. The kinds of evidence collected to evaluate the validity of the analytical measure include studies of the measure's content representativeness, structure, and predictive validity.

Content Representativeness. Initially, the analytical measure was developed based on current knowledge about reasoning in fields such as philosophy, psychology, and education. Subsequently, Powers and Enright (1987) surveyed graduate faculty from six disciplines about the reasoning skills they considered most important for academic success. Five dimensions were found to underlie faculty judgments of the importance of reasoning skills, and the disciplines differed in how important they considered each of these aspects of reasoning. The five dimensions identified in this study were:

1. Critical analysis of arguments (very important in English literature)
2. Generating, supporting, and evaluating conclusions and explanations (very important in all disciplines except computer science)
3. Analysis of formal problems (very important in computer science and engineering)
4. Induction (important in all disciplines)
5. Generating alternative hypotheses and explanations (most important in psychology and education)

Many of the skills that define the first, second, and third dimensions above are assessed by the analytical measure as it is currently constituted, but the skills defining the fourth and fifth dimensions are less well represented.

Construct Validity. A traditional approach to documenting what a test measures is to examine the test's structure as revealed by the correlations of items on the test to each other and to tests or items that are supposed to assess different kinds of ability.
The aim of these studies is to establish that performance on items that are supposed to measure a particular ability is more strongly related to performance on other items measuring the same ability than to performance on items measuring a different ability.

A number of factor-analytic studies of the structure of the GRE General Test have been carried out. The two that are most relevant to understanding the structure of the analytical measure as it is now constituted are by Stricker and Rock (1987) and Schaeffer and Kingston (1988). Stricker and Rock analyzed a test form using confirmatory factor analysis. Separate analyses were conducted for three age groups (20-29, 30-39, 40-49). They found that a three-factor solution consisting of verbal, quantitative, and analytical factors provided a better account of the data for all age groups than either a one-factor model, which accounted for less variance, or a four-factor model separating reading comprehension and vocabulary factors, which proved to be very highly correlated. Schaeffer and Kingston conducted factor analyses on samples of examinees representing different undergraduate majors. A three-factor solution (verbal, analytical, and quantitative) was selected as most appropriate for each group.

Although the results of these factor analyses generally support the interpretation of the GRE General Test as a measure of three distinct abilities, there is also evidence that the analytical measure is not as distinctive as the verbal and quantitative measures are. For example, Schaeffer and Kingston noted that the analytical factor they identified was primarily defined by the analytical reasoning items, while logical reasoning items loaded variously on the verbal, analytical, or quantitative factors, or on no factor. Wilson (1985) noted that logical reasoning items tended to correlate more highly with verbal items, and analytical reasoning items with quantitative items, than the two item types did with each other.
This less-than-desirable convergence between the two types of reasoning items is being addressed in ongoing research on alternative item types that could be added to the measure to broaden the construct being measured and improve its convergence.

Predictive Validity. Predictive or criterion-related validity is supported by evidence that the measure is predictive of success in graduate school. Issues that complicate the assessment of predictive validity include how to measure success in graduate studies, restriction in the range of both test scores and criterion performance among graduate students, and how a measure is best used in conjunction with other information to predict success.

The most commonly used measure of success in studies of predictive validity is the average grade students receive in their first year of graduate study. However, as a criterion, first-year graduate grades have a number of limitations: the range of grades awarded is narrow (typically A's and B's), and grades often do not measure important aspects of graduate student performance such as independent thinking and initiative. Occasionally other criteria, such as completion of a degree, time to complete the degree, or faculty ratings of students, are used.

Another problem with validity studies is that they are conducted after students are admitted to graduate school, when the tests being evaluated have already been used as part of the selection process. Thus the range of test scores among admitted students is very narrow. This has the effect of reducing the size of the correlation between test scores and any criterion that is likely to be used.

Despite these problems that complicate the assessment of a measure's predictive validity, overall the analytical measure has been shown to predict success in graduate studies to the same extent that the verbal and quantitative measures do and only slightly less well than undergraduate grades or subject tests do.
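As a rough illustration of the range-restriction effect described above, the following Python sketch simulates test scores and a criterion (such as first-year grades) with a known correlation, then recomputes the correlation using only the highest-scoring examinees. Everything in it is an assumption chosen for illustration: the function names, the population correlation of 0.5, and the rule of admitting the top 30 percent of scorers are hypothetical and are not taken from the studies cited here.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_range_restriction(true_r=0.5, n=20000, admit_fraction=0.3, seed=0):
    """Return (correlation among all applicants, correlation among admitted).

    true_r and admit_fraction are illustrative assumptions, not empirical values.
    """
    rng = random.Random(seed)
    scores, grades = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)
        # Build a criterion whose population correlation with score is true_r.
        grade = true_r * score + sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        scores.append(score)
        grades.append(grade)

    # "Admit" only the examinees at or above the score cutoff.
    cutoff = sorted(scores)[int(n * (1 - admit_fraction))]
    admitted = [(s, g) for s, g in zip(scores, grades) if s >= cutoff]

    r_all = pearson(scores, grades)
    r_admitted = pearson([s for s, _ in admitted], [g for _, g in admitted])
    return r_all, r_admitted


if __name__ == "__main__":
    r_all, r_admitted = simulate_range_restriction()
    print(f"all applicants: r = {r_all:.2f}")
    print(f"admitted only:  r = {r_admitted:.2f}")
```

In this simulation the correlation among admitted examinees comes out noticeably smaller than the correlation in the full applicant group, even though the underlying relationship between score and criterion is the same for everyone. Real admissions decisions use more than one piece of information, so this sketch simplifies the selection process considerably.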
Schneider and Briel (1990) summarized the results of predictive validity studies for 606 graduate departments conducted between 1984 and 1988. Size-adjusted average correlations of analytical scores with first-year graduate grade average ranged from .20 in engineering to .28 in the social sciences and humanities. These correlations were of the same magnitude as correlations with verbal (V) and quantitative (Q) scores but smaller than those with undergraduate grade point average (UGPA; r = .29 to .39) or with many subject tests (r = .21 to .48).

Future Directions

Presently the paper-and-pencil version of the GRE Test is being phased out, and in the near future the test will be available only on the computer. The initial transition to a computer-based test involved presenting a traditional multiple-choice test in an adaptive format. However, computer presentation offers many alternatives for new kinds of test items, many of which are currently being evaluated. Other research and test development initiatives that may affect the way reasoning is assessed on the GRE in the future include the addition of a writing test that includes a "Critique an Argument" task and the exploration of artificial intelligence tools for scoring open-ended responses.

Examples of Analytical Reasoning and Logical Reasoning Items

Analytical Reasoning

Directions: Each question or group of questions is based on a passage or set of conditions. In answering some of the questions, it may be useful to draw a rough diagram. For each question, select the best answer choice given.

Questions 1-4 are based on the following.

A manager who has exactly four projects (F, G, H, and I) to undertake in a given month has made the following determinations:

F has priority over G.
H has priority over I.

If one project has priority over another, the project with priority must be started earlier than the other one.

1.
Given only the determinations above, each of the following is a possible sequence in which the four projects could be started EXCEPT
(A) F, G, H, I
(B) F, H, G, I
(C) F, H, I, G
(D) H, F, I, G
*(E) H, G, F, I

2. If each of the projects takes equally long to complete, it must be true that
(A) F is completed before H is completed
(B) F is completed before I is completed
(C) G is completed before H is completed
(D) H is completed before G is completed
*(E) H is completed before I is completed

3. Which of the following pairs of additional determinations would NOT conflict with the priorities initially determined?
(A) F has priority over H, and I has priority over F.
*(B) F has priority over I, and H has priority over G.
(C) G has priority over H, and H has priority over F.
(D) G has priority over H, and I has priority over F.
(E) G has priority over I, and I has priority over F.

Logical Reasoning

Nonprescription sunglasses shield the wearer's eyes from damaging ultraviolet sunlight. Squinting, however, provides protection from ultraviolet rays that is at least as good as the protection from nonprescription sunglasses. There is, therefore, no health advantage to be gained by wearing nonprescription sunglasses rather than squinting.

Which of the following, if true, most seriously weakens support for the conclusion above?
(A) Many opticians offer prescription sunglasses that not only screen out ultraviolet sunlight but also provide corrective vision.
(B) Some nonprescription sunglasses provide less protection from ultraviolet sunlight than does squinting.
(C) Squinting strains facial muscles and causes headaches and fatigue.
(D) Many people buy sunglasses because they feel the sunglasses are fashionable.
(E) Some people squint even when they are wearing sunglasses.
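Returning to the analytical reasoning set on projects F, G, H, and I: the "could be true" and "must be true" deductions that these items call for can be checked by brute force, since four projects allow only 24 possible starting orders. The Python sketch below is one way to do that. The function names and the encoding of each determination as a (higher priority, lower priority) pair are assumptions made for this illustration; examinees are of course expected to reason through the conditions, not enumerate them.

```python
from itertools import permutations

PROJECTS = "FGHI"
# Each pair (a, b) encodes "a has priority over b", i.e. a starts before b.
BASE_RULES = [("F", "G"), ("H", "I")]


def valid_orders(rules):
    """Return every starting order of the projects that satisfies all rules."""
    orders = []
    for perm in permutations(PROJECTS):
        position = {project: i for i, project in enumerate(perm)}
        if all(position[a] < position[b] for a, b in rules):
            orders.append("".join(perm))
    return orders


def could_be_true(order, rules=BASE_RULES):
    """A sequence could be true if at least one valid order matches it."""
    return order in valid_orders(rules)


def must_be_true(first, second, rules=BASE_RULES):
    """'first before second' must be true if it holds in every valid order."""
    return all(o.index(first) < o.index(second) for o in valid_orders(rules))


if __name__ == "__main__":
    # The six orders consistent with the two original determinations.
    print(valid_orders(BASE_RULES))
    # Question 1: which listed sequence is NOT possible?
    print(could_be_true("HGFI"))
    # Question 2: with equal durations, completion order equals start order.
    print(must_be_true("H", "I"))
    # Question 3: added determinations conflict when no valid order remains.
    print(bool(valid_orders(BASE_RULES + [("F", "I"), ("H", "G")])))
```

Only six of the 24 orders satisfy both original determinations. A sequence "could be true" if it is among those six, and a statement "must be true" if it holds in all six. A pair of added determinations conflicts with the original priorities exactly when it leaves no valid order at all.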
When school administrators translate educational research into a standardized teaching program and mandate its use by teachers, students learn less and learn less well than they did before, even though the teachers are the same. The translation by the administration of theory into prescribed practice must therefore be flawed.

The argument above is based on which of the following assumptions?
(A) Teachers differ in their ability to teach in accordance with standardized programs.
*(B) The educational research on which the standardized teaching programs are based is sound.
(C) Researchers should be the ones to translate their own research into teaching programs.
(D) The ways in which teachers choose to implement the programs are ineffective.
(E) The level of student learning will vary from state to state.

The claim that learning computer programming is a sure way to a bright future is analogous to the contention, popular a few years ago, that if one wanted a successful career, one should study law. If the analogy continues to hold, however, learning computer programming will not put a student at great advantage, because ______.
(A) there will soon be more jobs for lawyers than there are now
(B) graduating law students and computer programming students will soon be competing with each other for the same jobs
*(C) there are currently more law students graduating than job openings for law school graduates
(D) lawyers are making increasing use of computers in their work
(E) computer programmers will increasingly need the services of lawyers

During her three years in office, the governor has frequently been accused of bias against Middletown. Yet she has filled five of the nineteen vacant high-level positions in her administration with appointees from Middletown, all of whom are still serving. This evidence shows that the governor has no bias against Middletown.

In the argument given, the part that is underlined plays which of the following roles?
(A) Acknowledging an objection that could plausibly be made to the argument
(B) Supporting a claim that is itself used to support the conclusion of the argument
*(C) Introducing the position to be refuted by the argument
(D) Presenting a consequence of the conclusion stated in the argument
(E) Providing an illustrative example that neither strengthens nor weakens the argument

References

Briel, J. B., O'Neill, K., & Scheuneman, J. D. (Eds.). (1993). GRE Technical Manual. Princeton, NJ: Educational Testing Service.

Powers, D. E., & Enright, M. K. (1987). Analytical reasoning skills in graduate study. Journal of Higher Education, 58(6), 658-682.

Schaeffer, G. A., & Kingston, N. M. (1988). Strength of the analytical factor of the GRE General Test in several subgroups: A full-information factor analysis approach (GRE Board Professional Rep. 86-7P). Princeton, NJ: Educational Testing Service.

Schneider, L. M., & Briel, J. B. (1990). Validity of the GRE: 1988-89 Summary Report. Princeton, NJ: Educational Testing Service.

Stricker, L. J., & Rock, D. A. (1987). Factor structure of the GRE General Test in young and middle adulthood. Developmental Psychology, 23, 526-536.

Wilson, K. M. (1985). The relationship of GRE General Test item-type part scores to undergraduate grades (GRE Board Professional Rep. 81-22P, ETS Research Report No. 84-38). Princeton, NJ: Educational Testing Service.