Title: | Binary Regression Model |
---|---|
Description: | Fits novel models for the conditional relative risk, risk difference and odds ratio. |
Authors: | Linbo Wang, Mark Clements, Thomas Richardson |
Maintainer: | Mark Clements <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1 |
Built: | 2024-11-01 11:27:40 UTC |
Source: | https://github.com/mclements/brm |
The function brm
in this package provides an alternative to generalized linear models for fitting binary regression models, in which both the response and the primary exposure of interest
are binary. This is especially useful if the interest lies in estimating the association between
and
, and how that association varies as a function of (other) covariates
.
Unlike glm
, which uses a single link function for the outcome, brm
separates the nuisance model from the target model. This separation provides opportunities to choose nuisance models independently of the target model. To see why this is important, we may contrast it with the use of a GLM to model the log relative risk. In this setting one might use a Poisson regression (with interaction term) (though such a model ignores the fact that
is binary); here
and
are subsets of
. Such a Poisson model can be seen as a combination of two parts: a target model
and a nuisance model
. However, this nuisance model is variation dependent of the target model so that predicted probabilities may go outside of
. Furthermore, one cannot solve this problem under a GLM framework as with a GLM, the target model and nuisance model are determined simultaneously through a link function.
More specifically, if the target model is a linear model on the conditional log Relative Risk (log RR) or ('logistically' transformed) conditional Risk Difference (atanh RD), brm
fits a linear nuisance model for the conditional log Odds Product (log OP). If the target model is a linear model on the conditional log Odds Ratio (log OR), brm
fits a linear nuisance model on the conditional logit baseline risk, logit P(y = 1|x = 0, vb). Note in this case the target and nuisance models combine to form a simple logistic regression model (which is fitted using glm
).
brm
fits the three target models described above as they are simple and the parameter space is unconstrained. brm
fits the nuisance models above as they are variation independent of the corresponding target model. This variation independence greatly facilitates parameter estimation and interpretation.
brm
also provides doubly robust fitting as an option such that the estimates for are still consistent and asymptotically normal even when the nuisance model is misspecified, provided that we have a correctly specified logistic model for the exposure probability
. Such doubly robust estimation is only possible for the Relative Risk and Risk Difference, but not the Odds Ratio.
See Richardson et al. (2017) for more details.
Thomas S. Richardson, James M. Robins and Linbo Wang. "On Modeling and Estimation for the Relative Risk and Risk Difference." Journal of the American Statistical Association: Theory and Methods (2017).
brm
is used to estimate the association between two binary variables, and how that varies as a function of other covariates.
brm( y, x, va, vb = NULL, param, est.method = "MLE", vc = NULL, optimal = TRUE, weights = NULL, subset = NULL, max.step = NULL, thres = 1e-08, alpha.start = NULL, beta.start = NULL, message = FALSE )
brm( y, x, va, vb = NULL, param, est.method = "MLE", vc = NULL, optimal = TRUE, weights = NULL, subset = NULL, max.step = NULL, thres = 1e-08, alpha.start = NULL, beta.start = NULL, message = FALSE )
y |
The response vector. Should only take values 0 or 1. |
x |
The exposure vector. Should only take values 0 or 1. |
va |
The covariates matrix for the target model. It can be specified via an object of class " |
vb |
The covariates matrix for the nuisance model. It can be specified via an object of class " |
param |
The measure of association. Can take value 'RD' (risk difference), 'RR' (relative risk) or 'OR' (odds ratio) |
est.method |
The method to be used in fitting the model. Can be 'MLE' (maximum likelihood estimation, the default) or 'DR' (doubly robust estimation). |
vc |
The covariates matrix for the probability of exposure, often called the propensity score. It can be specified via an object of class " |
optimal |
Use the optimal weighting function for the doubly robust estimator? Ignored if the estimation method is 'MLE'. The default is TRUE. |
weights |
An optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector. |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
max.step |
The maximal number of iterations to be passed into the |
thres |
Threshold for judging convergence. The default is 1e-6. |
alpha.start |
Starting values for the parameters in the target model. |
beta.start |
Starting values for the parameters in the nuisance model. |
message |
Show optimization details? Ignored if the estimation method is 'MLE'. The default is FALSE. |
brm
contains two parts: the target model for the dependence measure (RR, RD or OR) and the nuisance model; the latter is required for maximum likelihood estimation.
If param="RR"
then the target model is .
If
param="RD"
then the target model is .
If
param="OR"
then the target model is .
For RR and RD, the nuisance model is for the log Odds Product:
.
For OR, the nuisance model is for the baseline risk:
In each case the nuisance model is variation independent of the target model, which ensures that the predicted probabilities lie in
.
See Richardson et al. (2016+) for more details.
If est.method="DR"
then given a correctly specified logistic regression model for the propensity score , estimation of the RR or RD is consistent, even if the log Odds Product model is misspecified. This estimation method is not available for the OR. See Tchetgen Tchetgen et al. (2014) for more details.
When estimating RR and RD, est.method="DR"
is recommended unless it is known that the log Odds Product model is correctly specified. Optimal weights (optimal=TRUE
) are also recommended to increase efficiency.
For the doubly robust estimation method, MLE is used to obtain preliminary estimates of ,
and
. The estimate of
is then updated by solving a doubly-robust estimating equation. (The estimate for
remains the MLE.)
A list consisting of
param |
the measure of association. |
point.est |
the point estimates. |
se.est |
the standard error estimates. |
cov |
estimate of the covariance matrix for the estimates. |
conf.lower |
the lower limit of the 95% (marginal) confidence interval. |
conf.upper |
the upper limit of the 95% (marginal) confidence interval. |
p.value |
the two sided p-value for testing zero coefficients. |
coefficients |
the matrix summarizing key information: point estimate, 95% confidence interval and p-value. |
param.est |
the fitted RR/RD/OR. |
va |
the matrix of covariates for the target model. |
vb |
the matrix of covariates for the nuisance model. |
converged |
Did the maximization process converge? |
Linbo Wang <[email protected]>, Mark Clements <[email protected]>, Thomas Richardson <[email protected]>
Thomas S. Richardson, James M. Robins and Linbo Wang. "On Modeling and Estimation for the Relative Risk and Risk Difference." Journal of the American Statistical Association: Theory and Methods (2017).
Eric J. Tchetgen Tchetgen, James M. Robins and Andrea Rotnitzky. "On doubly robust estimation in a semiparametric odds ratio model." Biometrika 97.1 (2010): 171-180.
getProbScalarRD
, getProbRD
(vectorised), getProbScalarRR
and getProbRR
(vectorised) for functions calculating risks P(y=1|x=1) and P(y=1|x=0) from (atanh RD, log OP) or (log RR, log OP);
predict.blm
for obtaining fitted probabilities from brm
fits.
set.seed(0) n = 100 alpha.true = c(0,-1) beta.true = c(-0.5,1) gamma.true = c(0.1,-0.5) params.true = list(alpha.true=alpha.true, beta.true=beta.true, gamma.true=gamma.true) v.1 = rep(1,n) # intercept term v.2 = runif(n,-2,2) v = cbind(v.1,v.2) pscore.true = exp(v %*% gamma.true) / (1+exp(v %*% gamma.true)) p0p1.true = getProbRR(v %*% alpha.true,v %*% beta.true) x = rbinom(n, 1, pscore.true) pA.true = p0p1.true[,1] pA.true[x==1] = p0p1.true[x==1,2] y = rbinom(n, 1, pA.true) fit.mle = brm(y,x,v,v,'RR','MLE',v,TRUE) fit.drw = brm(y,x,v,v,'RR','DR',v,TRUE) fit.dru = brm(y,x,v,v,'RR','DR',v,FALSE) fit.mle2 = brm(y,x,~v.2, ~v.2, 'RR','MLE', ~v.2,TRUE) # same as fit.mle
set.seed(0) n = 100 alpha.true = c(0,-1) beta.true = c(-0.5,1) gamma.true = c(0.1,-0.5) params.true = list(alpha.true=alpha.true, beta.true=beta.true, gamma.true=gamma.true) v.1 = rep(1,n) # intercept term v.2 = runif(n,-2,2) v = cbind(v.1,v.2) pscore.true = exp(v %*% gamma.true) / (1+exp(v %*% gamma.true)) p0p1.true = getProbRR(v %*% alpha.true,v %*% beta.true) x = rbinom(n, 1, pscore.true) pA.true = p0p1.true[,1] pA.true[x==1] = p0p1.true[x==1,2] y = rbinom(n, 1, pA.true) fit.mle = brm(y,x,v,v,'RR','MLE',v,TRUE) fit.drw = brm(y,x,v,v,'RR','DR',v,TRUE) fit.dru = brm(y,x,v,v,'RR','DR',v,FALSE) fit.mle2 = brm(y,x,~v.2, ~v.2, 'RR','MLE', ~v.2,TRUE) # same as fit.mle
Calculate risks from arctanh RD and log OP (vectorised)
getProbRD(atanhrd, logop)
getProbRD(atanhrd, logop)
atanhrd |
arctanh of risk difference |
logop |
log of odds product |
The is defined as
.
The inverse hyperbolic tangent function
arctanh
is defined as .
a matrix with two columns
getProbRD(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = getProbRD(logrr, logop) colnames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
getProbRD(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = getProbRD(logrr, logop) colnames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
Calculate risks from log RR and log OP (vectorised)
getProbRR(logrr, logop = NA)
getProbRR(logrr, logop = NA)
logrr |
log of relative risk |
logop |
log of odds product |
The is defined as
.
a matrix with two columns
getProbRR(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = getProbRR(logrr, logop) colnames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
getProbRR(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = getProbRR(logrr, logop) colnames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
Calculate risks from arctanh RD and log OP
getProbScalarRD(atanhrd, logop)
getProbScalarRD(atanhrd, logop)
atanhrd |
arctanh of risk difference |
logop |
log of odds product |
The is defined as
.
The inverse hyperbolic tangent function
arctanh
is defined as .
a vector
getProbScalarRD(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = mapply(getProbScalarRD, logrr, logop) rownames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
getProbScalarRD(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = mapply(getProbScalarRD, logrr, logop) rownames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
Calculate risks from log RR and log OP
getProbScalarRR(logrr, logop = NA)
getProbScalarRR(logrr, logop = NA)
logrr |
log of relative risk |
logop |
log of odds product |
The is defined as
.
a vector
getProbScalarRR(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = mapply(getProbScalarRR, logrr, logop) rownames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
getProbScalarRR(0,0) set.seed(0) logrr = rnorm(10,0,1) logop = rnorm(10,0,1) probs = mapply(getProbScalarRR, logrr, logop) rownames(probs) = c("P(y=1|x=0)","P(y=1|x=1)") probs
brm
fitsCalculate fitted probabilities from a fitted binary regression model object.
## S3 method for class 'brm' predict(object, x.new = NULL, va.new = NULL, vb.new = NULL, ...)
## S3 method for class 'brm' predict(object, x.new = NULL, va.new = NULL, vb.new = NULL, ...)
object |
A fitted object from function |
x.new |
An optional vector of x. |
va.new |
An optional covariate matrix to make predictions with. If omitted, the original matrix va is used. |
vb.new |
An optional covariate matrix to make predictions with. If vb.new is omitted but va.new is not, then vb.new is set to be equal to va.new. If both vb.new and va.new are omitted, then the original matrix vb is used. |
... |
affecting the predictions produced. |
If x.new is omitted, a matrix consisting of fitted probabilities for p0 = P(y=1|x=0,va,vb) and p1 = P(y=1|x=1,va,vb).
If x.new is supplied, a vector consisting of fitted probabilities px = P(y=1|x=x.new,va,vb).
Ancillary function for printing
## S3 method for class 'brm' print(x, ...)
## S3 method for class 'brm' print(x, ...)
x |
a list obtained with the function 'brm' |
... |
additional arguments affecting the output |