How to Compare & Interpret Different IQ Test Results - More on the Effects of Newly Released Tests
IQ tests are not perfect, but they can be very helpful in figuring out how a child learns and how best to help that child reach his or her potential. The unavoidable anomalies of standardized tests often lead them to overstate or understate a person’s relative position among age-mates and in the wider population. There are several reasons for this. First, very young test-takers who understand the point of the test and who cooperate have an advantage over other members of their same-age norm group. Their scores are therefore comparatively inflated, reflecting cooperation as well as actual knowledge or ability. Older test-takers, by contrast, are at something of a disadvantage on such tests, because there is always someone in the normative sample who can ace a subtest. As a result, even the most brilliant people lose scaled-score points if they miss even one item, which deflates their apparent relative position overall. So while a child’s actual intelligence is not likely to change, his or her scores may drift down a bit over the years, especially on freshly normed ability tests.
Second, the age of the test (the time that has elapsed since it was last normed) greatly affects a person’s score. Normed tests are statistically evaluated to find the means, or averages, for populations such as grade and age groups. I've found myself in the unusual position of seeing first-hand the Flynn Effect at work in the assessment of gifted children. What is the Flynn Effect? James Flynn has studied changing patterns in IQ scores in populations around the world for many years. He found that over the lifetime of a test, from when it is first normed on specific age ranges to when it is renormed on new children in the same age ranges (typically 13 to 15 years later), scores rise by an average of 0.3 points each year. I have found that the rise is even greater for children in the gifted ranges. This has many serious consequences, only some of which I discuss here.
I have been administering the Stanford-Binet 5 for more than 12 years, having been part of its normative sampling team for the gifted population. Over the years, such assessment techniques have proven to be quite reliable, but the gradual increase in scores documented by James Flynn (about 0.3 points a year) makes it important that we always look at the big picture when trying to determine a child’s real “fit” compared with others of the same age or in the same classroom. My own experience shows me that scores in the tails of the distribution curve are more sensitive to this effect and tend to rise further and faster. One simply cannot equate a Full Scale IQ of 146 earned in 2013 with a 146 earned by a same-aged child in 2002. This is not to diminish the exceptionally high intellectual level of some children, but to put it in perspective.
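As a rough illustration of the 146 example, the sketch below deflates a score by Flynn’s average rate for each year between a test’s norming and the date the child took it. It assumes a uniform drift of 0.3 points per year and a norming year of 2002 for the SB5. Because scores in the tails appear to drift faster than that, the adjustment is probably too small for highly gifted children. The function `flynn_adjusted` and the constant `FLYNN_RATE` are hypothetical names for illustration, not a published or clinical correction procedure.

```python
# Hypothetical helper: approximate Flynn-effect deflation of an IQ score.
# Assumes a uniform drift of 0.3 points per year since norming; scores in
# the tails may drift faster, so treat the result as a conservative estimate.

FLYNN_RATE = 0.3  # assumed average IQ points gained per year of test age


def flynn_adjusted(score: float, test_year: int, norm_year: int,
                   rate: float = FLYNN_RATE) -> float:
    """Return the score minus the drift expected to accumulate since norming."""
    years_since_norming = test_year - norm_year
    if years_since_norming < 0:
        raise ValueError("test_year must not precede norm_year")
    return score - rate * years_since_norming


if __name__ == "__main__":
    # The same nominal score of 146, earned on norms gathered around 2002.
    for year in (2002, 2013):
        print(year, round(flynn_adjusted(146, year, 2002), 1))
```

For a score earned in 2013, the sketch gives 146 minus 11 × 0.3, or about 142.7, before any extra allowance for the faster drift in the tails.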
Because I give clients a different IQ test than most examiners in my state of Minnesota do (the SB5 rather than the Wechsler tests, the WPPSI-III and WISC-IV), I had an unusual vantage point last year. When I had the opportunity to compare norming results for the Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition (WPPSI-IV) with the results I was getting for the same children on the 12-year-old Stanford-Binet 5, I saw large drops in scores on the WPPSI-IV. Although I didn’t recognize this effect for what it was back in 2002, when the normative scores for my gifted sample came back, I and a number of other assessment professionals noted that the SB5 seemed to score gifted children lower than earlier tests had. We recommended that school programs allow for that, but none of us realized at the time that the upward drift in scores, the Flynn Effect, would gradually push those scores back up over the life of the test.
We should have known, though, and so should the schools and other programs, because this always happens. The gifted normative sample for the Stanford-Binet, Fifth Edition, made up of children who had previously been identified as gifted at the 98th percentile (FSIQ 130) or higher, averaged only 123.7 on the new test. The same thing has happened with the normative sampling of equally high-IQ children for the Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition: the gifted sample’s average FSIQ on the new test was 127.2. In practical terms, nearly half of the children who had previously tested 2 standard deviations above the mean (98th percentile, FSIQ 130) now tested below the typical “gifted” cut-off score.
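The percentile figures in this paragraph follow from the usual scaling of these tests to a mean of 100 and a standard deviation of 15, which is an assumption about the scales rather than something stated in the text. The short sketch below uses Python’s standard-library `statistics.NormalDist` to convert the 130 cut-off and the two gifted-sample means into z-scores and population percentiles under an idealized normal curve. Real norm tables will differ slightly, and the helper names `iq_to_percentile` and `standard_score_z` are hypothetical.

```python
from statistics import NormalDist

# Idealized IQ distribution (assumed): mean 100, standard deviation 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_percentile(iq: float) -> float:
    """Percentage of the population expected to score below `iq`."""
    return 100 * IQ_SCALE.cdf(iq)


def standard_score_z(iq: float) -> float:
    """Number of standard deviations above (or below) the mean."""
    return (iq - IQ_SCALE.mean) / IQ_SCALE.stdev


if __name__ == "__main__":
    for label, iq in [("gifted cut-off", 130),
                      ("SB5 gifted sample mean", 123.7),
                      ("WPPSI-IV gifted sample mean", 127.2)]:
        print(f"{label}: IQ {iq}, z = {standard_score_z(iq):.2f}, "
              f"percentile = {iq_to_percentile(iq):.1f}")
```

On this idealized curve, an FSIQ of 130 is exactly 2 standard deviations above the mean and sits at about the 97.7th percentile, which is conventionally rounded to the 98th. Both gifted-sample means fall below that line.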
As of December 2012, I am administering the new, official WPPSI-IV and can compare its results with those of the older SB5 by testing the same children on both at no charge, alternating which test each child takes first. The difference in scores is huge! My fear is that most districts that use only the Wechsler tests will simply conclude that fewer truly gifted children are coming through their doors over the next few years as they adopt the new WPPSI-IV and, by next year, the new WISC-V. There is nothing wrong with any of these tests; they are all excellent.
But the Flynn Effect appears to affect the results of children in the tails of the bell curve more than those in the average ranges. Flynn reports an average increase over the life of a typical IQ test of about 0.3 IQ points for each year since the test was normed, and most individually administered tests are updated only every 13 to 15 years. This means that the 5-year-old who helped norm a test eventually serves as the benchmark for a 5-year-old born 13 to 15 years later. The matrix pattern reasoning subtest and one or two others account for most of this inflation, but the real point is that we overestimate the level of giftedness of children who take a test late in its life and underestimate it for children who take a test when it is fairly new. When tests are older, more children test as Profoundly Gifted who may not actually be. Consider how many educational decisions we are making based on possibly misleading information. Consider, too, the children who may indeed be quite highly gifted but whom we missed because they took one of these tests at the wrong time.
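To see why older norms inflate the number of children above a fixed cut-off, the sketch below assumes that children’s scores on brand-new norms are normally distributed with mean 100 and standard deviation 15, and that each year of test age shifts every score up by Flynn’s average 0.3 points. Both assumptions are simplifications: the drift described here is larger in the tails than in the middle, so the real effect near 130 is likely bigger. The function `share_above_cutoff` is a hypothetical illustration, not a published model.

```python
from statistics import NormalDist

FLYNN_RATE = 0.3  # assumed uniform gain in IQ points per year of test age
CUTOFF = 130      # typical "gifted" cut-off score


def share_above_cutoff(test_age_years: float, cutoff: float = CUTOFF,
                       rate: float = FLYNN_RATE) -> float:
    """Fraction of children expected to score at or above `cutoff`
    when the test's norms are `test_age_years` old (uniform-shift model)."""
    drifted = NormalDist(mu=100 + rate * test_age_years, sigma=15)
    return 1 - drifted.cdf(cutoff)


if __name__ == "__main__":
    for age in (0, 5, 10, 15):
        pct = 100 * share_above_cutoff(age)
        print(f"norms {age:>2} years old: {pct:.2f}% score {CUTOFF} or higher")
```

Under these assumptions, the share of children scoring at or above 130 roughly doubles, from about 2.3% on brand-new norms to about 4.5% on norms that are 15 years old, even though no child’s ability has changed.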
The WPPSI-IV Technical Manual also points out what many of us already know: achievement scores often have more to do with exposure and opportunity than with ability, and they don’t necessarily identify intellectually gifted children.
The two best-known group IQ tests, the Cognitive Abilities Test and the Otis-Lennon, are less susceptible to the Flynn Effect because they are renormed every few years. Generally speaking, these tests are quite reliable for understanding groups of students but less reliable for the individual child than individually administered IQ tests are. More information on both is readily available online.
I have more to say and write, but I wanted to get the discussion rolling! By the way, the Ruf Estimates Kids IQ Test is not affected by these fluctuations, because I describe levels of giftedness within a fairly broad estimated IQ band. This is incidental to this discussion, but some people familiar with my work may have wondered. Finally, I invite other assessment professionals who work with gifted children and adults to contact me with their own experiences and test results if they wish to collaborate in any way.