Next, run a logistic regression model in SPSS with the bass.sav data.
Use YES as the dependent variable and include three independent variables:
Model 1: YES = f(COST, CATCH, INCOME)
Here are some advanced exercises:
Conduct hypothesis tests for groups of coefficients. Run another model that adds a "block" of
demographic variables: EMPLOYED, EDUCATIO, MARRIED, SEX, and AGE (in the Logistic Regression box,
click on "Next", then choose the demographic "covariates"). Is the block of variables jointly
statistically significant (look for the "block chi-square" statistic in the output)?
Conduct tests for structural breaks in the data. Do North and South Carolinians behave similarly?
Run 3 versions of Model 1: NC, SC, and pooled (in the Logistic Regression box, click on "select",
then click on NC as your "selection variable", choose NC=1 as the "rule", and run the logit model;
then do the same for NC=0). What is the likelihood ratio test statistic equal to?
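The likelihood ratio statistic for this test can be sketched as follows, assuming the usual construction: the pooled model's -2 log likelihood minus the sum of the NC and SC models' -2 log likelihoods. The degrees of freedom equal the number of coefficients in Model 1 (constant plus three slopes, so 4). The -2LL values are hypothetical placeholders and the function name is made up for this illustration.

```python
def structural_break_lr(neg2ll_pooled, neg2ll_nc, neg2ll_sc):
    """LR statistic for H0: the NC and SC subsamples share the same coefficients."""
    return neg2ll_pooled - (neg2ll_nc + neg2ll_sc)

# Hypothetical -2LL values read from three SPSS runs (pooled, NC=1, NC=0).
lr = structural_break_lr(neg2ll_pooled=520.0, neg2ll_nc=300.0, neg2ll_sc=208.0)

# 5% critical value of the chi-square distribution with 4 degrees of freedom.
CRITICAL_5PCT_DF4 = 9.488
reject_pooling = lr > CRITICAL_5PCT_DF4
print(f"LR = {lr:.2f}, reject equal coefficients at 5%: {reject_pooling}")
```

A statistic above the critical value suggests the two states should not be pooled.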
Is multicollinearity a problem? Run (1) Model 1 with EMPLOYED, (2) Model 1 with EMPLOYED and
without INCOME. What are the effects on the statistical significance of INCOME? What is the
correlation between EMPLOYED and INCOME?
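The correlation asked for here is the ordinary Pearson coefficient. A minimal Python sketch, using made-up values in place of the EMPLOYED and INCOME columns of bass.sav:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: EMPLOYED is a 0/1 dummy, INCOME is in thousands of dollars.
employed = [1, 0, 1, 1, 0, 1, 0, 1]
income = [45, 15, 60, 38, 22, 52, 18, 41]
r = pearson_r(employed, income)
print(f"correlation(EMPLOYED, INCOME) = {r:.3f}")
```

A correlation close to 1 in absolute value would be a warning sign that the two variables carry much the same information.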
Conduct more tests for the appropriate model specification. In Model 1: is there a superior
functional form? In the SPSS data window, select COST and "transform" and "compute" COST into a
new variable: LNCOST=ln(COST). Select INCOME and "transform" and "compute" INCOME into a new
variable: INCOMESQ=INCOME*INCOME.
Run the alternative functional forms:
MODEL 2: YES = f(LNCOST, CATCH, INCOME)
MODEL 3: YES = f(COST, CATCH, INCOME, INCOMESQ)
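For reference, the two computed variables are simple transformations of existing columns. A minimal Python sketch, using a made-up two-row data set in place of bass.sav; storing `None` when COST is not positive is an assumption made here so that the log is never taken of an invalid value.

```python
import math

def add_transforms(rows):
    """Return copies of the rows with LNCOST and INCOMESQ added."""
    out = []
    for row in rows:
        new_row = dict(row)
        # The natural log is undefined for COST <= 0; store None in that case.
        new_row["LNCOST"] = math.log(row["COST"]) if row["COST"] > 0 else None
        new_row["INCOMESQ"] = row["INCOME"] * row["INCOME"]
        out.append(new_row)
    return out

# Hypothetical rows standing in for the bass.sav data.
data = [{"COST": 10.0, "INCOME": 30.0}, {"COST": 0.0, "INCOME": 45.0}]
transformed = add_transforms(data)
print(transformed)
```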
Finally, if you need to be convinced that the logistic regression model is superior
to the linear probability model, here are some things to check:
Test for normality of the dependent variable (choose the "skewness" option when you calculate
"descriptive statistics"; if the t-stat on skewness, the skewness statistic divided by its standard
error, is greater than 2 in absolute value, then the variable is probably non-normal ...).
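The skewness t-stat can be sketched in Python as follows. This assumes the bias-adjusted sample skewness and its exact standard error, which is my understanding of what SPSS reports; for large samples the standard error is roughly sqrt(6/n). The 0/1 values are made up, and the function name is hypothetical.

```python
import math

def skewness_t(values):
    """Return (skewness, standard error, t-stat) for a sample of size n >= 3."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Bias-adjusted sample skewness
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)
    # Exact standard error of skewness under normality
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

# Hypothetical YES responses: eight 1s and two 0s.
yes = [1] * 8 + [0] * 2
skew, se, t_stat = skewness_t(yes)
print(f"skewness = {skew:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
```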
Check the predicted probabilities from the linear probability (LP) model to determine whether any
fall outside the 0 to 1 range (save the "unstandardized" predicted value when you run a
"regression", "linear" in SPSS).
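The range check can be illustrated with a minimal Python sketch: fit a linear probability model by ordinary least squares with a single regressor, then list the fitted values below 0 or above 1. The COST and YES values are made up for the illustration, and a one-variable model is used only to keep the arithmetic short.

```python
def ols_fit(x, y):
    """Intercept and slope of a least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: trip cost and a 0/1 YES response.
cost = [5, 10, 20, 50, 100, 200]
yes = [1, 1, 1, 0, 0, 0]

intercept, slope = ols_fit(cost, yes)
predicted = [intercept + slope * c for c in cost]

# Fitted "probabilities" that fall outside the 0 to 1 range
out_of_range = [(c, p) for c, p in zip(cost, predicted) if p < 0 or p > 1]
print(out_of_range)
```

With these made-up numbers the fitted value at the highest cost is negative, which cannot be a probability. The logistic model avoids this because its predictions always lie strictly between 0 and 1.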