Introduction to Logistic Regression
Logit models are subject to many of the same problems as multiple
regression:
i) Omitted variable(s) can result in bias
in the coefficient estimates. To test for omitted variables you
can conduct a likelihood ratio test (a Python sketch of this test appears after this list):
- LR[q] = {[-2LL(constrained model, i = k - q)] - [-2LL(unconstrained model, i = k)]}
- where LR is distributed chi-square with q degrees of freedom,
q being the number of omitted variables being tested
- This test is conducted automatically by SPSS if you specify
"blocks" of independent variables (look for the "block
chi-square" in the SPSS output)
ii) The inclusion of irrelevant variable(s)
can result in poor model fit. You can consult your Wald statistics
or conduct a likelihood ratio test (see above) to identify independent
variables with low explanatory power (see the second sketch after this list).
iii) Errors in functional form can
result in biased coefficient estimates and poor model fit. You
should try different functional forms and consult the Wald statistics
and model chi-square statistics for overall model fit (see the third sketch after this list).
iv) The presence of multicollinearity will
not lead to biased coefficients, but the standard errors of the
coefficients will be inflated. If a variable which you think
should be important (statistically significant) is not,
consult the correlation coefficients among the independent variables.
Any pairwise correlation greater than .4 (.6 - .8 is usually the
troublesome range) may be causing the problem (see the fourth sketch after this list).
v) You may have structural breaks
in your data. Pooling the data imposes the restriction that an
independent variable has the same effect on the dependent variable
for different groups of data when in fact the effects may differ
across groups. You can conduct a likelihood ratio test (the final
sketch after this list works through this test):
LR[i+1] = -2LL(pooled model) - [-2LL(sample 1) + -2LL(sample 2)]
where samples 1 and 2 together make up the pooled sample, and i is the
number of independent variables (the test has i + 1 degrees of freedom:
one for each coefficient plus the constant).