Hurdle Models for Count Data Regression (2024)

hurdle {pscl}

R Documentation

Description

Fit hurdle regression models for count data via maximum likelihood.

Usage

hurdle(formula, data, subset, na.action, weights, offset,
  dist = c("poisson", "negbin", "geometric"),
  zero.dist = c("binomial", "poisson", "negbin", "geometric"),
  link = c("logit", "probit", "cloglog", "cauchit", "log"),
  control = hurdle.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

Arguments

`formula`	symbolic description of the model, see details.
`data`, `subset`, `na.action`	arguments controlling formula processing via `model.frame`.
`weights`	optional numeric vector of weights.
`offset`	optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets.
`dist`	character specification of count model family.
`zero.dist`	character specification of the zero hurdle model family.
`link`	character specification of link function in the binomial zero hurdle (only used if `zero.dist = "binomial"`).
`control`	a list of control arguments specified via `hurdle.control`.
`model`, `y`, `x`	logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned.
`...`	arguments passed to `hurdle.control` in the default setup.

Details

Hurdle count models are two-component models with a truncated count component for positive counts and a hurdle component that models the zero counts. Thus, unlike zero-inflation models, there are not two sources of zeros: the count model is only employed if the hurdle for modeling the occurrence of zeros is exceeded. The count model is typically a truncated Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1.

For modeling the hurdle, either a binomial model can be employed or a censored count distribution. The outcome of the hurdle component of the model is the occurrence of a non-zero (positive) count. Thus, for most models, positive coefficients in the hurdle component indicate that an increase in the regressor increases the probability of a non-zero count. Binomial logit and censored geometric models as the hurdle part both lead to the same likelihood function and thus to the same coefficient estimates. A censored negative binomial model for the zero hurdle is only identified if there is at least one non-constant regressor with (true) coefficient different from zero (and if all coefficients are close to zero the model can be poorly conditioned).
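The two-component structure described above can be illustrated with a minimal base R sketch on simulated data. This is not the `hurdle()` implementation: it only assumes a logit zero hurdle and a zero-truncated Poisson count part, and all object names (`eta_zero`, `is_pos`, `nll_count`, and so on) are hypothetical. The first step checks numerically that a censored geometric hurdle with log link implies the same probability of a positive count as a binomial logit hurdle; the second step fits the two parts separately.

```r
## Sketch only: base R, simulated data, hypothetical object names.
set.seed(1)
n <- 500
x <- rnorm(n)

## Linear predictor of the zero hurdle
eta_zero <- 0.5 + 0.8 * x

## Censored geometric with log link: mean mu = exp(eta), P(y = 0) = 1 / (1 + mu).
## The implied P(y > 0) coincides with the logistic function of eta.
p_pos_geom <- 1 - dgeom(0, prob = 1 / (1 + exp(eta_zero)))
print(all.equal(p_pos_geom, plogis(eta_zero)))

## Simulate a hurdle Poisson response:
## zero vs. positive from the hurdle, positives from a zero-truncated Poisson
is_pos <- rbinom(n, size = 1, prob = plogis(eta_zero)) == 1
lambda <- exp(0.3 + 0.5 * x)
u <- runif(n, min = ppois(0, lambda), max = 1)  # inverse-CDF draw restricted to y >= 1
y <- ifelse(is_pos, qpois(u, lambda), 0)

## Hurdle part: binomial logit for the occurrence of a positive count
fit_zero <- glm(I(y > 0) ~ x, family = binomial(link = "logit"))

## Count part: zero-truncated Poisson negative log-likelihood, positives only
nll_count <- function(beta) {
  lam <- exp(beta[1] + beta[2] * x[is_pos])
  -sum(dpois(y[is_pos], lam, log = TRUE) -
         ppois(0, lam, lower.tail = FALSE, log.p = TRUE))
}
fit_count <- optim(c(0, 0), nll_count, method = "BFGS")

print(coef(fit_zero))   # compare with the simulated values 0.5 and 0.8
print(fit_count$par)    # compare with the simulated values 0.3 and 0.5
```

Because the likelihood separates into a hurdle part and a truncated count part, the two fits in this sketch do not share parameters and can be maximized independently.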

The `formula` can be used to specify both components of the model: If a `formula` of type `y ~ x1 + x2` is supplied, then the same regressors are employed in both components. This is equivalent to `y ~ x1 + x2 | x1 + x2`. Of course, a different set of regressors could be specified for the zero hurdle component, e.g., `y ~ x1 + x2 | z1 + z2 + z3` giving the count data model `y ~ x1 + x2` conditional on (`|`) the zero hurdle model `y ~ z1 + z2 + z3`.
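As a rough illustration of how the two-part notation decomposes, the following base R sketch splits a two-sided formula at the `|` operator. The helper `split_two_part()` is hypothetical and is not claimed to be how `hurdle()` processes formulas internally; it only mirrors the expansion rule stated above.

```r
## Sketch only: split_two_part() is a hypothetical helper, not part of pscl.
## It assumes a two-sided formula (response on the left-hand side).
split_two_part <- function(f) {
  lhs <- f[[2]]
  rhs <- f[[3]]
  ## TRUE if the right-hand side has the form "count terms | zero terms"
  two_part <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  count_rhs <- if (two_part) rhs[[2]] else rhs
  zero_rhs  <- if (two_part) rhs[[3]] else rhs
  list(
    count = eval(call("~", lhs, count_rhs)),
    zero  = eval(call("~", lhs, zero_rhs))
  )
}

## Different regressors in the two components
parts <- split_two_part(y ~ x1 + x2 | z1 + z2 + z3)
print(parts$count)
print(parts$zero)

## A one-part formula reuses the same regressors in both components
same <- split_two_part(y ~ x1 + x2)
print(same$zero)
```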

Offsets can be specified in both parts of the model pertaining to count and zero hurdle model: `y ~ x1 + offset(x2) | z1 + z2 + offset(z3)`, where `x2` is used as an offset (i.e., with coefficient fixed to 1) in the count part and `z3` analogously in the zero hurdle part. By the rule stated above `y ~ x1 + offset(x2)` is expanded to `y ~ x1 + offset(x2) | x1 + offset(x2)`. Instead of using the `offset()` wrapper within the `formula`, the `offset` argument can also be employed which sets an offset only for the count model. Thus, `formula = y ~ x1` and `offset = x2` is equivalent to `formula = y ~ x1 + offset(x2) | x1`.
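The offset rules above can be tried out with a small sketch on simulated data. The data frame `d` and the variables `x1` and `log_expo` are hypothetical, and the code is skipped if the pscl package is not installed. It fits the count-only offset once through the formula and once through the `offset` argument, then compares the coefficients.

```r
## Sketch only: simulated data; d, x1 and log_expo are hypothetical names.
if (requireNamespace("pscl", quietly = TRUE)) {
  set.seed(42)
  n <- 300
  d <- data.frame(x1 = rnorm(n), log_expo = log(runif(n, 0.5, 2)))
  d$y <- rpois(n, exp(0.2 + 0.5 * d$x1 + d$log_expo))

  ## Offset in the count part only, written inside the formula
  fm_a <- pscl::hurdle(y ~ x1 + offset(log_expo) | x1, data = d)

  ## Same specification via the offset argument (count part only)
  fm_b <- pscl::hurdle(y ~ x1, data = d, offset = log_expo)

  ## According to the rule above, both fits should agree
  print(all.equal(coef(fm_a), coef(fm_b)))

  ## Offset in both parts: a one-part formula is expanded to both components
  fm_c <- pscl::hurdle(y ~ x1 + offset(log_expo), data = d)
  print(coef(fm_c))
}
```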

Value

An object of class "hurdle", i.e., a list with components including

`coefficients`	a list with elements `"count"` and `"zero"` containing the coefficients from the respective models,
`residuals`	a vector of raw residuals (observed - fitted),
`fitted.values`	a vector of fitted means,
`optim`	a list (of lists) with the output(s) from the `optim` call(s) for minimizing the negative log-likelihood(s),
`control`	the control arguments passed to the `optim` call,
`start`	the starting values for the parameters passed to the `optim` call(s),
`weights`	the case weights used,
`offset`	a list with elements `"count"` and `"zero"` containing the offset vectors (if any) from the respective models,
`n`	number of observations (with weights > 0),
`df.null`	residual degrees of freedom for the null model (= `n - 2`),
`df.residual`	residual degrees of freedom for fitted model,
`terms`	a list with elements `"count"`, `"zero"` and `"full"` containing the terms objects for the respective models,
`theta`	estimate of the additional `\theta` parameter of the negative binomial model(s) (if negative binomial component is used),
`SE.logtheta`	standard error(s) for `\log(\theta)`,
`loglik`	log-likelihood of the fitted model,
`vcov`	covariance matrix of all coefficients in the model (derived from the Hessian of the `optim` output(s)),
`dist`	a list with elements `"count"` and `"zero"` with character strings describing the respective distributions used,
`link`	character string describing the link if a binomial zero hurdle model is used,
`linkinv`	the inverse link function corresponding to `link`,
`converged`	logical indicating successful convergence of `optim`,
`call`	the original function call,
`formula`	the original formula,
`levels`	levels of the categorical regressors,
`contrasts`	a list with elements `"count"` and `"zero"` containing the contrasts corresponding to `levels` from the respective models,
`model`	the full model frame (if `model = TRUE`),
`y`	the response count vector (if `y = TRUE`),
`x`	a list with elements `"count"` and `"zero"` containing the model matrices from the respective models (if `x = TRUE`).
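A few of the components listed above can be inspected directly on a fitted object. The following is a minimal sketch that assumes the pscl package and its `bioChemists` data are available (it is skipped otherwise); the object name `fm` is arbitrary.

```r
## Sketch only: requires the pscl package; skipped if it is not installed.
if (requireNamespace("pscl", quietly = TRUE)) {
  data("bioChemists", package = "pscl")
  fm <- pscl::hurdle(art ~ ., data = bioChemists, dist = "negbin")

  print(names(fm$coefficients))   # names of the coefficient list
  print(fm$coefficients$count)    # coefficients of the truncated count model
  print(fm$coefficients$zero)     # coefficients of the zero hurdle model
  print(fm$theta)                 # theta estimate of the negative binomial count part
  print(fm$loglik)                # log-likelihood of the fitted model
  print(fm$converged)             # convergence flag from optim
  print(dim(fm$vcov))             # covariance matrix covers all coefficients
}
```

In regular use, extractor functions such as `coef()` (as in the Examples section) are usually preferable to reaching into the list components directly.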

Author(s)

Achim Zeileis <Achim.Zeileis@R-project.org>

References

Cameron, A. Colin and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. New York: Cambridge University Press.

Cameron, A. Colin and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Mullahy, J. 1986. Specification and Testing of Some Modified Count Data Models. Journal of Econometrics. 33:341–365.

Zeileis, Achim, Christian Kleiber and Simon Jackman. 2008. “Regression Models for Count Data in R.” Journal of Statistical Software, 27(8). URL https://www.jstatsoft.org/v27/i08/.

Examples

## package and data
library("pscl")
data("bioChemists", package = "pscl")

## logit-poisson
## "art ~ ." is the same as "art ~ . | .", i.e.
## "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment"
fm_hp1 <- hurdle(art ~ ., data = bioChemists)
summary(fm_hp1)

## geometric-poisson
fm_hp2 <- hurdle(art ~ ., data = bioChemists, zero.dist = "geometric")
summary(fm_hp2)

## logit and geometric model are equivalent
coef(fm_hp1, model = "zero") - coef(fm_hp2, model = "zero")

## logit-negbin
fm_hnb1 <- hurdle(art ~ ., data = bioChemists, dist = "negbin")
summary(fm_hnb1)

## negbin-negbin
## (poorly conditioned zero hurdle, note the standard errors)
fm_hnb2 <- hurdle(art ~ ., data = bioChemists, dist = "negbin", zero.dist = "negbin")
summary(fm_hnb2)

[Package pscl version 1.5.9 Index]
