## Guided tour on Tobit models

### The model

The Tobit model is based on the following latent variable model:

Y* = b'X + U,

where X is a k-vector of regressors, possibly including 1 for the intercept, and the error term U is N(0,s^2) distributed, conditionally on X.

The latent variable Y* is only observed if Y* > 0. In particular, the actual dependent variable is:

Y = max(0,Y*)

For example, let Y be the amount of money that an individual spends on tobacco, given his or her characteristics X. Then Y > 0 if the individual is a smoker, and Y = 0 if not. The Tobit model is a convenient way of modeling this type of data.

For the technical details of the Tobit model, see TOBIT.PDF. In this guided tour I will mainly focus on how to estimate a Tobit model with EasyReg.

### Truncation bias

As explained in TOBIT.PDF, if you ignore the fact that Y comes from a truncated regression model and regress Y on X using only the positive observations on Y, then the OLS estimator of b will be biased, because

E[Y|X,Y > 0] = b'X + s·f(b'X/s)/F(b'X/s),

where F and f are the distribution function and the density, respectively, of the standard normal distribution.
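The size of the extra term can be checked numerically. The following is a minimal sketch, not part of EasyReg: the function names, the number of draws and the seed are my own choices, and `index` stands for the value of b'X.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf plays the role of f, cdf of F


def truncated_mean(index, s):
    """E[Y | X, Y > 0] = b'X + s*f(b'X/s)/F(b'X/s), with index = b'X."""
    z = index / s
    return index + s * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def simulated_truncated_mean(index, s, draws=200_000, seed=0):
    """Monte Carlo average of Y* = index + U over the draws with Y* > 0."""
    rng = random.Random(seed)
    latent = (index + rng.gauss(0.0, s) for _ in range(draws))
    positives = [y for y in latent if y > 0]
    return sum(positives) / len(positives)


if __name__ == "__main__":
    for index in (-1.0, 0.0, 0.5):
        print(index, truncated_mean(index, 1.0), simulated_truncated_mean(index, 1.0))
```

The gap between `truncated_mean(index, s)` and `index` itself is the term that OLS on the positive observations ignores.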

The appropriate method to estimate the Tobit model is maximum likelihood. See TOBIT.PDF for details.
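To make the likelihood concrete, here is a hedged sketch of the Tobit log-likelihood in Python. It is an illustration under the model assumptions stated above, not EasyReg's implementation; the function name and the data layout (a list of regressor tuples `xs` and a list of observations `ys`) are assumptions of mine. A zero observation contributes P[Y* <= 0|X] = F(-b'X/s), and a positive observation contributes the normal density of Y* at Y.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(b, s, xs, ys):
    """Log-likelihood of the Tobit model Y = max(0, b'X + U), U ~ N(0, s^2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = sum(bi * xi for bi, xi in zip(b, x))  # b'X
        if y > 0:
            # Uncensored: log of (1/s) * f((y - b'X)/s), written out in logs
            z = (y - index) / s
            total += -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * z * z
        else:
            # Censored at zero: log of F(-b'X/s)
            total += math.log(STD_NORMAL.cdf(-index / s))
    return total
```

Maximizing this function over b and s, for example by Newton iteration, gives the maximum likelihood estimates.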

## Tobit modeling with EasyReg

### The data

The data has been generated artificially as follows. The independent variables X1,j and X2,j and the error Uj for j = 1,...,n, with n = 500, have been drawn independently from the standard normal distribution, and Yj has been generated as:

Yj = max(0, X1,j + X2,j + Uj).

Thus, if an intercept is included in the model, so that the vectors of regressors are

Xj = (X1,j,X2,j,1)',

then the true parameter vector is b = (b1,b2,b3)', where

• b1 = 1
• b2 = 1
• b3 = 0

Moreover, the true value of s is

• s = 1

The data file involved is TOBITDATA.TXT, which is in the former EasyReg default format. This data file also contains a variable Z, which I will use and explain later.
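A data set of the same type can be generated with a few lines of Python. This is only a sketch of the design described above: the function name and the seed are arbitrary choices of mine, and the result will not reproduce the numbers in TOBITDATA.TXT.

```python
import random


def simulate_tobit_data(n=500, seed=12345):
    """Return n rows (y, x1, x2) with y = max(0, x1 + x2 + u).

    x1, x2 and u are independent standard normal draws, so the true
    parameters are b1 = 1, b2 = 1, b3 = 0 (intercept) and s = 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        rows.append((max(0.0, x1 + x2 + u), x1, x2))
    return rows


if __name__ == "__main__":
    data = simulate_tobit_data()
    zeros = sum(1 for y, _, _ in data if y == 0.0)
    print(f"{zeros} zero values out of {len(data)}")
```

Because X1 + X2 + U is symmetric around zero, about half of the generated Y values should be zero.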

### How to estimate a Tobit model with EasyReg

Now open "Menu > Single equation models > Tobit models" in the EasyReg main window, select the variables Y, X1 and X2, and keep the default intercept, just as when running an OLS regression with intercept, until you arrive at the following window.

In general there is no need to adjust the stopping rules of the Newton iteration which is used to maximize the likelihood function. Thus, click "Tobit analysis". Then after a few seconds the maximum likelihood estimation results appear:

If you click "Continue", the module NEXTMENU will be activated:

You have seen this window before after running an OLS regression, so no further explanation is necessary.

The output is listed below. Note that I have used the option "Wald test of linear parameter restrictions" to test the joint null hypothesis:

• b1 = 1
• b2 = 1
• b3 = 0

This hypothesis is, of course, not rejected at any reasonable significance level.

### The output

```
Tobit model:
y = y* if y* > 0, y = 0 if y* <= 0, where y* = b'x + u
with x the vector of regressors, b the parameter vector,
and u a N(0,s^2) distributed error term.

Dependent variable:
Y = Y

Characteristics:
Y
First observation = 1
Last observation  = 500
Number of usable observations: 500
Minimum value: 0.0000000E+000
Maximum value: 5.4575438E+000
Sample mean:   7.2127526E-001
This variable is nonnegative, with 244 zero values.
A Tobit model is therefore suitable

X variables:
X(1) = X1
X(2) = X2
X(3) = 1

Frequency of Y = 0: 48.80%
(244 out of 500)
Newton iteration succesfully completed after 5 iterations
Last absolute parameter change = 0.0001
Last percentage change of the likelihood = 0.0603

Tobit model: Y = max(Y*,0), with
Y* = b(1)X(1) + b(2)X(2) + b(3)X(3) + u,
where u is distributed N(0,s^2), conditional on the X variables.

Maximum likelihood estimation results:
Variable                       ML estimates      (t-value)
                                                 [p-value]
x(1)=X1                        b(1)=   1.0547731 (17.0084)
                                                 [0.00000]
x(2)=X2                        b(2)=   0.9905518 (15.2253)
                                                 [0.00000]
x(3)=1                         b(3)=  -0.0243418 (-0.3450)
                                                 [0.73011]
standard error of u            s=      1.0635295 (21.9209)
                                                 [0.00000]
[The p-values are two-sided and based on the normal approximation]

Log likelihood:      -4.74065017126E+002
Pseudo R^2:                      0.60984
Sample size (n):                     500
Information criteria:
Akaike:               1.912260069
Hannan-Quinn:         1.925490511
Schwarz:              1.945976933

If the model is correctly specified then the maximum likelihood
parameter estimators b(1),..,b(3), minus their true values, times the
square root of the sample size n, are (asymptotically) jointly normally
distributed with zero mean vector and variance matrix:

1.92290870E+00  6.77554263E-01 -9.38221607E-01
5.37455447E-01  2.11638376E+00 -9.79444588E-01
-9.81136382E-01 -1.09217153E+00  2.48931672E+00

Wald test:

x(1)=X1                        b(1)=   1.0547731 (17.0084)(*)
x(2)=X2                        b(2)=   0.9905518 (15.2253)(*)
x(3)=1                         b(3)=  -0.0243418 (-0.3450)(*)
(*): Parameters to be tested

Null hypothesis:
1.x(1)+0.x(2)+0.x(3) = 1.
0.x(1)+1.x(2)+0.x(3) = 1.
0.x(1)+0.x(2)+1.x(3) = 0.

Null hypothesis in matrix form: Rb = c, where
R =
1. 0. 0.
0. 1. 0.
0. 0. 1.
and c =
1.
1.
0.
Wald test statistic:                    0.98
Asymptotic null distribution:  Chi-square(3)
p-value = 0.80630
Significance levels:        10%         5%
Critical values:           6.25       7.81
Conclusions:             accept     accept
```
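Two numbers in this output can be checked by hand. The sketch below is my own illustration, not EasyReg code. It assumes that each t-value is the estimate divided by the square root of the corresponding diagonal element of the printed variance matrix over n, and it uses the closed form of the chi-square(3) upper tail probability for the Wald test.

```python
import math

n = 500
estimates = [1.0547731, 0.9905518, -0.0243418]          # b(1), b(2), b(3)
variance_diagonal = [1.92290870, 2.11638376, 2.48931672]  # diagonal of the printed matrix


def t_values(estimates, variance_diagonal, n):
    """t-value = estimate / sqrt(asymptotic variance / n)."""
    return [b / math.sqrt(v / n) for b, v in zip(estimates, variance_diagonal)]


def chi2_3_pvalue(w):
    """P[chi-square(3) > w] = erfc(sqrt(w/2)) + sqrt(2w/pi) * exp(-w/2)."""
    return math.erfc(math.sqrt(w / 2.0)) + math.sqrt(2.0 * w / math.pi) * math.exp(-w / 2.0)


if __name__ == "__main__":
    print([round(t, 4) for t in t_values(estimates, variance_diagonal, n)])
    print(round(chi2_3_pvalue(0.98), 3))
```

The t-values agree with the reported ones up to rounding. Evaluated at the rounded statistic 0.98, the p-value is approximately 0.806, close to the reported 0.80630, which is based on the unrounded statistic.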

See TOBIT.PDF for the definition of the pseudo R^2 reported in the output.

### An inappropriate attempt to conduct Tobit analysis

As an example of a case for which EasyReg refuses to conduct Tobit analysis, select the variables Z, X1, X2 and the constant 1 for the intercept, and declare Z the dependent variable. Then you will get stuck here:

The problem is that Z is discrete, because I have generated it as

Z = Int(100*Y)

where the "Int" function truncates its argument to an integer, by cutting off all the digits after the decimal symbol (a dot "." in the US, a comma "," in Europe). But the Tobit model assumes that Z has a continuous distribution, conditional on Z > 0 and X1 and X2, so that the assumptions of the Tobit model do not hold. Therefore, in order to prevent you from doing bad econometrics, EasyReg will not allow you to continue.
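For illustration, the transformation and a simple discreteness check can be sketched in Python as follows. Note that `looks_discrete` is a hypothetical rule of my own (all positive values are integers); the criterion that EasyReg actually applies is not documented here.

```python
import math


def int_transform(y):
    """Z = Int(100*Y): drop all digits after the decimal symbol."""
    return math.trunc(100 * y)


def looks_discrete(values):
    """Hypothetical check: True if every positive value is an integer."""
    positives = [v for v in values if v > 0]
    return bool(positives) and all(float(v).is_integer() for v in positives)


if __name__ == "__main__":
    y_values = [0.0, 0.7212, 5.4575438]
    z_values = [int_transform(y) for y in y_values]
    print(z_values, looks_discrete(z_values), looks_discrete(y_values))
```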

In view of the queries I have gotten about this issue, the message in this window may not be clear enough. If so, click the "Yes" button, which opens a PDF file:

However, the same explanation, and more, can be found in TOBIT.PDF.

## What to do if the dependent variable Y is confined to a bounded interval?

### The case Y ∈ (a,b]

If the observed dependent variable Y is confined to an interval (a,b], where -∞ < a < b < ∞, with P[Y = b] > 0, it is possible to transform Y to a new dependent variable Z, say, such that Z ∈ [0,∞) and P[Z = 0] = P[Y = b] > 0, namely Z = -ln[(Y - a)/(b - a)]. Next, assume that Z = max(0,Z*), where Z* = β'X + U. Here β denotes the parameter vector written as b in the previous sections; it is renamed to avoid confusion with the interval endpoint b. Then

Y = min(b,a + (b - a)exp(-Z*)) = min(b,a + (b - a)exp(-β'X - U)).

To create this variable Z, open Menu > Input > Transform variables, and conduct the following transformations:

1. Click the "Constant = 1" button. Then a new variable "1" is created, which has the value 1 for all observations.
2. Click the "Linear combination of variables" button, select "1" and use the value of a as coefficient. Then a new variable with name "ax1" is created, which has the value a for all observations. I will assume that you have renamed the variable "ax1" as variable A.
3. Click the "Linear combination of variables" button, select "1" and use the value of b as coefficient. Then a new variable with name "bx1" is created, which has the value b for all observations. I will assume that you have renamed the variable "bx1" as variable B.
4. Click the "Linear combination of variables" button, select the variables Y and A, and create the linear combination Y-A. I will assume that you have renamed Y-A as YminA. Note that now YminA ∈ (0,b-a].
5. Click the "Linear combination of variables" button, select the variables B and A, and create the linear combination B-A. I will assume that you have renamed B-A as BminA. Note that BminA is a constant with value b-a for all observations.
6. Click the "Multiplicative transformation of variables" button, select the variables YminA and BminA and use the powers 1 and -1, respectively, to create the new variable "YminA x BminA^-1". I will assume that you have renamed this new variable as YminA/BminA. Note that YminA/BminA ∈ (0,1].
7. Click the "LOG transformation: x -> ln(x)" button, and select the variable YminA/BminA. Then the new variable LN[YminA/BminA] will be created. Note that LN[YminA/BminA] ∈ (-∞,0].
8. Click the "Linear combination of variables" button, select the variable LN[YminA/BminA], and use the coefficient -1 to create the variable -LN[YminA/BminA]. I will assume that you have renamed this variable as Z. Thus, Z = -LN[YminA/BminA]. Now Z ∈ [0,∞), and P[Z = 0] = P[Y = b] > 0.

The new variable Z in step 8 can now be used as dependent variable in a Tobit model. However, keep in mind that in this case a negative coefficient of an X variable implies a positive effect on the original dependent variable Y, because ∂Z/∂Y = -1/(Y-a) < 0, hence ∂Y/∂Z < 0.
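Outside EasyReg, steps 1 to 8 collapse into a single formula. The following Python sketch shows the transformation and the implied mapping from the latent variable Z* back to Y; the function names are my own and are not EasyReg terminology.

```python
import math


def to_tobit_scale(y, a, b):
    """Z = -ln[(Y - a)/(b - a)] for Y in (a, b]; Z = 0 exactly when Y = b."""
    if not a < y <= b:
        raise ValueError("y must lie in the interval (a, b]")
    return -math.log((y - a) / (b - a))


def from_latent(z_star, a, b):
    """Y = min(b, a + (b - a)*exp(-Z*)), the value of Y implied by Z*."""
    return min(b, a + (b - a) * math.exp(-z_star))


if __name__ == "__main__":
    a, b = 2.0, 5.0
    for y in (2.5, 4.0, 5.0):
        z = to_tobit_scale(y, a, b)
        print(y, z, from_latent(z, a, b))
```

A larger Y gives a smaller Z, which is the sign reversal noted above.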

Although needless to say (but I will say it anyhow), if a = 0 and b = 1 then you can skip the steps 1 to 6, and use Y instead of YminA/BminA in step 7.

### The case Y ∈ [a,b)

If Y ∈ [a,b), where -∞ < a < b < ∞, with P[Y = a] > 0, then Z = -ln[(b - Y)/(b - a)] ∈ [0,∞), with P[Z = 0] = P[Y = a] > 0. This variable Z can be created similarly to the previous steps 1 to 8, and can be used as the new dependent variable in a Tobit model. Since now ∂Y/∂Z > 0, a positive coefficient of an X variable implies a positive effect of this X variable on Y.

Note that now we model the conditional distribution of Y by

Y = max(a,b - (b - a)exp(-Z*)) = max(a,b - (b - a)exp(-β'X - U)),

where β is the parameter vector of the latent regression Z* = β'X + U, not the interval endpoint b.
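The same kind of sketch applies to this case. Again the function names are hypothetical, and the code only restates the two formulas of this section.

```python
import math


def to_tobit_scale_lower(y, a, b):
    """Z = -ln[(b - Y)/(b - a)] for Y in [a, b); Z = 0 exactly when Y = a."""
    if not a <= y < b:
        raise ValueError("y must lie in the interval [a, b)")
    return -math.log((b - y) / (b - a))


def from_latent_lower(z_star, a, b):
    """Y = max(a, b - (b - a)*exp(-Z*)), the value of Y implied by Z*."""
    return max(a, b - (b - a) * math.exp(-z_star))


if __name__ == "__main__":
    a, b = 2.0, 5.0
    for y in (2.0, 3.0, 4.5):
        z = to_tobit_scale_lower(y, a, b)
        print(y, z, from_latent_lower(z, a, b))
```

Here a larger Y gives a larger Z, so the signs of the Tobit coefficients carry over to Y directly.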

### The case Y ∈ [a,b]

This case cannot be handled by standard Tobit analysis.