The Tobit model is based on the following latent variable model:
Y* = b'X + U,
The latent variable Y* is only observed if Y* > 0. In particular, the actual dependent variable is:
Y = max(0,Y*)
For example, let Y be the amount of money that an individual spends on tobacco, given his or her characteristics X. Then Y > 0 if the individual is a smoker, and Y = 0 if not. The Tobit model is a convenient way of modeling this type of data.
For the technical details of the Tobit model, see TOBIT.PDF. In this guided tour I will mainly focus on how to estimate a Tobit model with EasyReg.
As explained in TOBIT.PDF, if you ignore the fact that Y comes from a truncated regression model and regress Y on X using only the positive observations on Y, then the OLS estimator of b will be biased, due to the fact that
E[Y | X, Y > 0] = b'X + s*f(b'X/s)/F(b'X/s),
where F and f are the distribution function and the density, respectively, of the standard normal distribution, and s is the standard deviation of U.
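To make the source of the bias concrete, the following minimal sketch evaluates the standard truncated-normal mean, E[Y | X, Y > 0] = b'X + s*f(b'X/s)/F(b'X/s), and compares it with a simulated average. The function name `truncated_mean`, the seed, and the value b'X = 0.5 are illustrative assumptions, not part of EasyReg.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf plays the role of F, pdf the role of f


def truncated_mean(index, s=1.0):
    """Mean of Y given Y > 0 when Y* = index + U and U ~ N(0, s^2)."""
    z = index / s
    return index + s * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


# Monte Carlo check at one hypothetical value of b'X
rng = random.Random(12345)
index = 0.5
draws = [index + rng.gauss(0.0, 1.0) for _ in range(200_000)]
positive = [y for y in draws if y > 0]
simulated = sum(positive) / len(positive)

print(f"formula:   {truncated_mean(index):.4f}")
print(f"simulated: {simulated:.4f}")
```

The extra term s*f(b'X/s)/F(b'X/s) is positive and varies with X, which is why a linear regression on the positive observations alone does not recover b.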
The appropriate method to estimate the Tobit model is maximum likelihood. See TOBIT.PDF for details.
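As a rough illustration of what maximum likelihood maximizes here, the sketch below writes out the usual Tobit log-likelihood for Y = max(0, b'X + U) with U distributed N(0, s^2). It is not EasyReg's code; the function name `tobit_loglik` and the toy data are assumptions made for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(b, s, ys, xs):
    """Tobit log-likelihood for Y = max(0, b'X + U), U ~ N(0, s^2)."""
    total = 0.0
    for y, x in zip(ys, xs):
        index = sum(bk * xk for bk, xk in zip(b, x))
        if y > 0:
            # Uncensored observation: normal density of y centered at b'x
            total += math.log(STD_NORMAL.pdf((y - index) / s) / s)
        else:
            # Censored at zero: P(Y* <= 0 | x) = F(-b'x/s)
            total += math.log(STD_NORMAL.cdf(-index / s))
    return total


# Toy data (hypothetical): each x is (X1, X2, 1), with the intercept last
ys = [0.0, 1.2, 0.0, 2.5]
xs = [(-1.0, 0.2, 1.0), (0.5, 0.4, 1.0), (-0.3, -0.6, 1.0), (1.8, 0.9, 1.0)]
print(tobit_loglik((1.0, 1.0, 0.0), 1.0, ys, xs))
```

A Newton iteration, as used by EasyReg, would search over b and s for the values that make this function as large as possible.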
The data have been generated artificially as follows. The independent variables X1,j and X2,j and the errors Uj for j = 1,...,n, with n = 500, have been drawn independently from the standard normal distribution, and Yj has been generated as:
Yj = max(0, X1,j + X2,j + Uj).
Thus, if an intercept is included in the model, so that the vectors of regressors are
Xj = (X1,j,X2,j,1)',
then the true parameter vector is b = (1,1,0)'.
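The data-generating process described above can be sketched as follows. This is only an illustration with an arbitrary seed, so it will not reproduce the actual EasyReg data set; the variable names are assumptions.

```python
import random

rng = random.Random(2024)  # arbitrary seed, not the one behind the tour's data
n = 500

data = []
for j in range(n):
    x1 = rng.gauss(0.0, 1.0)
    x2 = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)
    y = max(0.0, x1 + x2 + u)  # censoring at zero
    data.append((y, x1, x2))

zeros = sum(1 for y, _, _ in data if y == 0.0)
print(f"{zeros} zero values out of {n} ({100.0 * zeros / n:.2f}%)")
```

Because X1 + X2 + U is symmetric around zero, about half of the observations on Y should be zero.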
Now open "Menu > Single equation models > Tobit models" in the EasyReg main window, select the variables Y, X1 and X2, and keep the default intercept, just as when running an OLS regression with intercept, until you arrive at the following window.
In general there is no need to adjust the stopping rules of the Newton iteration which is used to maximize the likelihood function. Thus, click "Tobit analysis". Then after a few seconds the maximum likelihood estimation results appear:
If you click "Continue", the module NEXTMENU will be activated:
You have seen this window before after running an OLS regression, so no further explanation is necessary.
The output is listed below. Note that I have used the option "Wald test of linear parameter restrictions" to test the joint null hypothesis b(1) = 1, b(2) = 1, b(3) = 0:
y = y* if y* > 0, y = 0 if y* <= 0, where y* = b'x + u
with x the vector of regressors, b the parameter vector,
and u a N(0,s^2) distributed error term.
Y = Y
First observation = 1
Last observation = 500
Number of usable observations: 500
Minimum value: 0.0000000E+000
Maximum value: 5.4575438E+000
Sample mean: 7.2127526E-001
This variable is nonnegative, with 244 zero values.
A Tobit model is therefore suitable
X(1) = X1
X(2) = X2
X(3) = 1
Frequency of Y = 0: 48.80%
(244 out of 500)
Newton iteration succesfully completed after 5 iterations
Last absolute parameter change = 0.0001
Last percentage change of the likelihood = 0.0603
Tobit model: Y = max(Y*,0), with
Y* = b(1)X(1) + b(2)X(2) + b(3)X(3) + u,
where u is distributed N(0,s^2), conditional on the X variables.
Maximum likelihood estimation results:
Variable               ML estimates        (t-value)
x(1)=X1                b(1)=  1.0547731    (17.0084)
x(2)=X2                b(2)=  0.9905518    (15.2253)
x(3)=1                 b(3)= -0.0243418    (-0.3450)
standard error of u    s=     1.0635295    (21.9209)
[The p-values are two-sided and based on the normal approximation]
Log likelihood: -4.74065017126E+002
Pseudo R^2: 0.60984
Sample size (n): 500
If the model is correctly specified then the maximum likelihood
parameter estimators b(1),..,b(3), minus their true values, times the
square root of the sample size n, are (asymptotically) jointly normally
distributed with zero mean vector and variance matrix:
1.92290870E+00 6.77554263E-01 -9.38221607E-01
5.37455447E-01 2.11638376E+00 -9.79444588E-01
-9.81136382E-01 -1.09217153E+00 2.48931672E+00
x(1)=X1 b(1)= 1.0547731 (17.0084)(*)
x(2)=X2 b(2)= 0.9905518 (15.2253)(*)
x(3)=1 b(3)= -0.0243418 (-0.3450)(*)
(*): Parameters to be tested
1.x(1)+0.x(2)+0.x(3) = 1.
0.x(1)+1.x(2)+0.x(3) = 1.
0.x(1)+0.x(2)+1.x(3) = 0.
Null hypothesis in matrix form: Rb = c, where
R =
1. 0. 0.
0. 1. 0.
0. 0. 1.
and c =
1.
1.
0.
Wald test statistic: 0.98
Asymptotic null distribution: Chi-square(3)
p-value = 0.80630
Significance levels: 10% 5%
Critical values: 6.25 7.81
Conclusions: accept accept
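The p-value and critical values of the Wald test can be checked by hand. The sketch below uses the closed-form tail probability of the chi-square distribution with 3 degrees of freedom; the function name `chi2_3_sf` is an assumption, and the formula is valid for 3 degrees of freedom only.

```python
import math


def chi2_3_sf(x):
    """P(W > x) for W distributed chi-square with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


wald = 0.98  # statistic as reported (rounded) in the output above
print(f"p-value for W = {wald}: {chi2_3_sf(wald):.4f}")

# The reported critical values should leave about 10% and 5% in the tail
for critical_value in (6.25, 7.81):
    print(f"P(W > {critical_value}) = {chi2_3_sf(critical_value):.4f}")
```

Since the output rounds the statistic to two decimals, the p-value computed this way agrees with the reported 0.80630 only up to about three decimals.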
(*) See TOBIT.PDF for the definition of pseudo R-square.
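The t-values in the output follow from the diagonal of the reported variance matrix: the standard error of each estimate is the square root of the corresponding diagonal element divided by n. The sketch below recomputes them, together with two-sided p-values from the normal approximation. The helper names are assumptions; the numbers are copied from the output above.

```python
import math
from statistics import NormalDist

n = 500
estimates = {"b(1)": 1.0547731, "b(2)": 0.9905518, "b(3)": -0.0243418}
# Diagonal of the variance matrix of sqrt(n)*(estimate - true value)
asym_var = {"b(1)": 1.92290870, "b(2)": 2.11638376, "b(3)": 2.48931672}


def t_value(estimate, variance, sample_size):
    standard_error = math.sqrt(variance / sample_size)
    return estimate / standard_error


def two_sided_p_value(t):
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


for name in estimates:
    t = t_value(estimates[name], asym_var[name], n)
    print(f"{name}: t = {t:8.4f}, p = {two_sided_p_value(t):.5f}")
```

Only the intercept b(3) has a t-value small enough that it cannot be distinguished from zero, in line with its true value of 0.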
As an example of a case for which EasyReg refuses to conduct Tobit analysis, select the variables Z, X1, X2 and the constant 1 for the intercept, and declare Z the dependent variable. Then you will get stuck here:
The problem is that Z is discrete, because I have generated it as
Z = Int(100*Y)
where the "Int" function truncates its argument to an integer by cutting off all the digits after the decimal symbol (a dot "." in the US, a comma "," in Europe). But the Tobit model assumes that Z has a continuous distribution, conditional on Z > 0 and X1 and X2, so the assumptions of the Tobit model do not hold. Therefore, to prevent you from doing bad econometrics, EasyReg will not allow you to continue.
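The effect of this transformation is easy to see in a short sketch. Python's `int()` truncates toward zero, which matches the description of "Int" above for nonnegative Y; the function name `make_discrete` and the sample values are assumptions.

```python
def make_discrete(y):
    """Z = Int(100*Y): keep only the integer part of 100*Y."""
    return int(100 * y)


sample_y = [0.0, 0.0123, 0.4567, 5.4575438]
sample_z = [make_discrete(y) for y in sample_y]
print(sample_z)

# Z can only take integer values, so it is not continuous given Z > 0
print(all(isinstance(z, int) for z in sample_z))
```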
In view of the queries I have gotten about this issue, the message in this window may not be clear enough. If so, click the "Yes" button, which opens a PDF file:
However, the same explanation, and more, can be found in TOBIT.PDF.
If the observed dependent variable Y is confined to an interval (a,b], where
-∞ < a < b < ∞, with
To create this variable Z, open Menu > Input > Transform variables, and conduct the following transformations:
The new variable Z in step 8 can now be used as dependent variable in a Tobit model. However, keep in mind
that in this case a negative coefficient of an X variable implies a positive effect on the original dependent
variable Y, because
Although needless to say (but I will say it anyhow), if a = 0 and b = 1 then you can skip the steps 1 to 6, and use Y instead of YminA/BminA in step 7.
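The exact transformation steps are not reproduced in this text. One transformation that is consistent with the description (a variable built from YminA/BminA that is nonnegative, equals zero at the upper bound b, and reverses the sign of the effects) is Z = -ln((Y - a)/(b - a)). The sketch below assumes this form; it is a guess for illustration, and the function names are hypothetical.

```python
import math


def interval_to_tobit(y, a, b):
    """Map y in (a, b] to z in [0, inf), with z = 0 exactly when y = b.

    Assumed form: z = -ln((y - a)/(b - a)) = ln((b - a)/(y - a)).
    """
    if not (a < y <= b):
        raise ValueError("y must lie in the interval (a, b]")
    return math.log((b - a) / (y - a))


def tobit_to_interval(z, a, b):
    """Inverse map: y = a + (b - a)*exp(-z), which is decreasing in z."""
    return a + (b - a) * math.exp(-z)


a, b = 2.0, 10.0  # hypothetical bounds
for y in (10.0, 6.0, 2.5):
    z = interval_to_tobit(y, a, b)
    print(f"y = {y:5.2f}  ->  z = {z:.4f}  ->  back to y = {tobit_to_interval(z, a, b):.2f}")
```

Under this assumed form, Y is a decreasing function of Z, which is why a negative coefficient in the model for Z corresponds to a positive effect on Y.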
If Y ∈ [a,b), where
-∞ < a < b < ∞, with
Note that now we model the conditional distribution of Y by
This case cannot be handled by standard Tobit analysis.