Misfit diagnosis in Rasch models: infit outfit
Tanya and I are writing a paper in which we will report some Rasch analyses. It is nowhere near the sophistication that the field of psychometrics has reached nowadays. What we wish to show is that the data stochastically fit a Guttman scale, i.e., items fall on an underlying difficulty scale, where one has to pass an easy item before passing a hard one, but not the other way around.
Since neither of us does Rasch modeling for a living, we had to be careful about what to report. For model diagnosis, for example, the key is whether the items fall on a linear scale. Following advice on the Winsteps website, we will report Outfit, and perhaps Infit.
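To make the Guttman idea concrete, here is a minimal sketch that counts "Guttman errors" in a single response string. It assumes the items are already sorted from easiest to hardest and scored 0/1; the function name guttman_errors and the example data are made up for illustration and are not part of our analysis.

```python
def guttman_errors(responses):
    """Count Guttman errors in one 0/1 response string.

    Assumes items are ordered from easiest to hardest. An error is any
    pair in which an easier item was failed and a harder item was passed.
    """
    errors = 0
    fails_so_far = 0
    for x in responses:
        if x == 0:
            fails_so_far += 1
        else:
            # Each pass on this item conflicts with every earlier failure.
            errors += fails_so_far
    return errors


perfect = [1, 1, 1, 0, 0]  # passes the easy items, fails the hard ones
noisy = [1, 0, 1, 1, 0]    # fails an easy item but passes two harder ones

print(guttman_errors(perfect))
print(guttman_errors(noisy))
```

A perfect Guttman pattern has zero errors. A "stochastic" Guttman scale tolerates a small number of them, which is what the Rasch model formalizes with probabilities.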
Misfit diagnosis: infit outfit mean-square standardized
Outfit: outlier-sensitive fit statistic. This is based on the conventional chi-square statistic. This is more sensitive to unexpected observations by persons on items that are relatively very easy or very hard for them (and vice-versa).
Infit: inlier-pattern-sensitive fit statistic. This is based on the chi-square statistic with each observation weighted by its statistical information (model variance). This is more sensitive to unexpected patterns of observations by persons on items that are roughly targeted on them (and vice-versa).
Mean-square: this is the chi-square statistic divided by its degrees of freedom. Consequently its expected value is close to 1.0. Values greater than 1.0 (underfit) indicate unmodeled noise or other source of variance in the data - these degrade measurement. Values less than 1.0 (overfit) indicate that the model predicts the data too well - causing summary statistics, such as reliability statistics, to report inflated statistics. See further dichotomous and polytomous mean-square statistics.
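The definitions above can be sketched for a single dichotomous item as follows. This is only an illustration under simplifying assumptions: person abilities (thetas) and the item difficulty (b) are treated as known, whereas Winsteps estimates them from the data, and the names rasch_p and item_fit are my own, not Winsteps functions.

```python
import math


def rasch_p(theta, b):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (outfit_mnsq, infit_mnsq) for one item.

    responses: 0/1 scores, one per person
    thetas:    person abilities in logits (assumed known here)
    b:         item difficulty in logits (assumed known here)
    """
    sq_resid = []   # squared raw residuals (x - p)^2
    variances = []  # model variance p * (1 - p), the statistical information
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))

    # Outfit: plain average of squared standardized residuals.
    z_sq = [r / w for r, w in zip(sq_resid, variances)]
    outfit = sum(z_sq) / len(z_sq)

    # Infit: information-weighted average of the same quantities,
    # i.e. sum(w * z^2) / sum(w), which simplifies to the ratio below.
    infit = sum(sq_resid) / sum(variances)
    return outfit, infit


# Four persons targeted exactly on the item (theta == b).
outfit, infit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], b=0.0)
print(outfit, infit)
```

When every person is exactly on target, each squared standardized residual equals 1, so both mean-squares equal their expected value of 1.0.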
General rules:
First, investigate negative point-measure or point-biserial correlations. Look at the Distractor Tables, 10.3. Remedy miskeys, data entry errors, etc. Then, the general rule is: investigate outfit before infit, mean-square before t standardized, high values before low values.
There is an asymmetry in the implications of out-of-range high and low mean-squares (or positive and negative t-statistics). High mean-squares (or positive t-statistics) are a much greater threat to validity than low mean-squares (or negative t-statistics).
Interpretation of parameter-level mean-square fit statistics:
>2.0 Distorts or degrades the measurement system.
1.5 - 2.0 Unproductive for construction of measurement, but not degrading.
0.5 - 1.5 Productive for measurement.
<0.5 Less productive for measurement, but not degrading. May produce misleadingly good reliabilities and separations.
In general, mean-squares near 1.0 indicate little distortion of the measurement system, regardless of the Zstd value.
Evaluate high mean-squares before low ones, because the average mean-square is usually forced to be near 1.0.
Outfit mean-squares: influenced by outliers. Usually easy to diagnose and remedy. Less threat to measurement.
Infit mean-squares: influenced by response patterns. Usually hard to diagnose and remedy. Greater threat to measurement.
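The interpretation bands listed above can be written as a small lookup. This is a sketch only: interpret_mnsq is a made-up helper name, and because the published ranges share their endpoints, I have assumed that exactly 1.5 counts as "productive" and exactly 2.0 as "unproductive".

```python
def interpret_mnsq(mnsq):
    """Map a mean-square fit value to the rule-of-thumb interpretation.

    Boundary handling is an assumption: the source ranges overlap at
    1.5 and 2.0, so those values are assigned to the lower band here.
    """
    if mnsq > 2.0:
        return "distorts or degrades the measurement system"
    if mnsq > 1.5:
        return "unproductive for construction of measurement, but not degrading"
    if mnsq >= 0.5:
        return "productive for measurement"
    return "less productive for measurement, but not degrading"


# Hypothetical outfit mean-squares for three items.
for name, value in [("item1", 0.8), ("item2", 1.7), ("item3", 2.4)]:
    print(name, value, interpret_mnsq(value))
```

Following the advice above, one would look at the items in the highest band first and only then at the low values.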
What do Outfit and Infit tell about the model?
Poor fit does not mean that the Rasch measures (parameter estimates) aren’t linear. The Rasch model forces its estimates to approximate linearity. Misfit means that the reported estimates, though effectively linear, provide a distorted picture of the data.
High outfit mean-squares may be the result of a few random responses by low performers. If so, drop these performers with PDFILE= when doing item analysis, or use EDFILE= to change those responses to missing.
High infit mean-squares indicate that the items are mis-performing for the people on whom the items are targeted. This is a bigger threat to validity, but more difficult to diagnose than high outfit.
Mean-squares show the size of the randomness, i.e., the amount of distortion of the measurement system. Their expected value is 1.0. Values less than 1.0 indicate observations are too predictable (redundancy, model overfit). Values greater than 1.0 indicate unpredictability (unmodeled noise, model underfit). Mean-squares usually average to 1.0, so if there are high values, there must also be low ones. Examine the high ones first, and temporarily remove them from the analysis if necessary, before investigating the low ones.
Which one to report?
Question: Should I report Outfit or Infit?
A chi-square statistic is the sum of squares of standard normal variables. Outfit is a chi-square statistic. It is the sum of squared standardized residuals (which are modeled to be standard normal variables). So it is a conventional chi-square, familiar to most statisticians. Chi-squares (including outfit) are sensitive to outliers. For ease of interpretation, this chi-square is divided by its degrees of freedom to have a mean-square form and reported as "Outfit". Consequently I recommend that the Outfit be reported unless there is a strong reason for reporting infit.
In the Rasch context, outliers are often lucky guesses and careless mistakes, so these outlying characteristics of respondent behavior can make a "good" item look "bad". Consequently, Infit was devised as a statistic that downweights outliers and focuses more on the response string close to the item difficulty (or person ability). Infit is the sum of (squares of standard normal variables multiplied by their statistical information). For ease of interpretation, Infit is reported in mean-square form by dividing the weighted chi-square by the sum of the weights. This formulation is unfamiliar to most statisticians, so I recommend against reporting it unless the data are heavily contaminated with irrelevant outliers.
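To see why a single lucky guess can make a "good" item look "bad" on Outfit but much less so on Infit, here is a small made-up example. It assumes known abilities and difficulty (Winsteps would estimate these), and the data and the helper name fit_mnsq are invented for illustration.

```python
import math


def fit_mnsq(responses, thetas, b):
    """Return (outfit, infit) mean-squares for one dichotomous item."""
    sq_resid, variances = [], []
    for x, theta in zip(responses, thetas):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # Rasch success probability
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1.0 - p))           # statistical information
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(responses)
    infit = sum(sq_resid) / sum(variances)        # information-weighted form
    return outfit, infit


# Eight hypothetical persons, abilities in logits, item difficulty 0.
thetas = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
clean = [0, 0, 0, 0, 1, 1, 1, 1]    # Guttman-consistent responses
guessed = [1, 0, 0, 0, 1, 1, 1, 1]  # lowest performer guesses correctly

print("clean:  ", fit_mnsq(clean, thetas, 0.0))
print("guessed:", fit_mnsq(guessed, thetas, 0.0))
```

In this toy data set the one off-target success pushes the outfit mean-square above 2.0, while the infit mean-square stays below 1.5, because the guess comes from a person far from the item and therefore carries little weight.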
May 18th, 2007 at 8:05 pm
I am not sure what topic you are writing about, but I remember my adviser used the Rasch model for his working memory research.
If I remember correctly, he used the following software, BILOG, to assess his model.
http://assess.com/xcart/product.php?productid=217
According to my (rather blurry…) memory, this software checks the “part-whole correlation” first. Then, if the coefficient is high enough, it calculates model fit in the next step.
I am not sure whether this software (or its manual) would be helpful for your work, so please check it at your convenience.
May 21st, 2007 at 11:50 am
Hi Nobuyuki,
We are using Ministep, the free version of Winsteps, for this project. We are really more interested in a stochastic version of Guttman’s Scalogram than in the Rasch model in the psychometric sense. The part-whole relationship you refer to is probably checking how individual item scores correlate with the overall sum scores. Since we have only a handful of items and fewer than 30 subjects in each group, we checked these by tabulating the patterns. So far so good.
Thanks for the lead.