In multiparameter problems, information sharing across parameters can be used to improve the power of statistical hypothesis tests, thereby providing smaller p-values and narrower confidence intervals, on average across parameters. The FABInference package provides information sharing in linear and generalized linear regression models using a syntax similar to the built-in R functions lm and glm.
Suppose you want to get FAB p-values for the predictors $x_{i,1}, \ldots, x_{i,p}$ in the linear model
$$y_i = \alpha_0 + \alpha_1 w_{i,1} + \alpha_2 w_{i,2} + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \epsilon_i,$$
where $w_{i,1}$ and $w_{i,2}$ (and potentially other $w_{i,j}$'s) are additional control variables you'd like to have in the model. Then you need to
1. column-bind the x-variables into an n × p matrix X, e.g. X<-cbind(x1,x2,x3);
2. run the command fit<-lmFAB(y~w1+w2,X).
The output is similar to the output of the lm command, so you can type summary(fit) to see the FAB p-values. The FAB p-values and confidence intervals are stored in fit$FABpv and fit$FABci.
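The two steps above can be sketched on simulated data. This is a minimal illustration, not a recommended analysis: the sample size, the number of predictors, and the coefficient values are all made up, and the lmFAB call is guarded so the script still runs when the package is not installed.

```r
set.seed(1)

n <- 60   # number of observations (illustrative)
p <- 10   # number of predictors that get FAB p-values (illustrative)

# Control variables w1, w2 and the n x p matrix X of predictors of interest.
w1 <- rnorm(n)
w2 <- rnorm(n)
X  <- matrix(rnorm(n * p), n, p)

# Hypothetical true coefficients, used only to simulate a response.
beta <- rnorm(p, mean = 0.3, sd = 0.2)
y <- 1 + 0.5 * w1 - 0.5 * w2 + drop(X %*% beta) + rnorm(n)

if (requireNamespace("FABInference", quietly = TRUE)) {
  fit <- FABInference::lmFAB(y ~ w1 + w2, X)
  print(summary(fit))   # lm-style summary with FAB p-values
  print(fit$FABpv)      # FAB p-values
  print(fit$FABci)      # FAB confidence intervals
}
```

Only the call pattern lmFAB(y~w1+w2,X) and the components fit$FABpv and fit$FABci come from the text above; everything else is scaffolding for the simulation.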
If $\beta_1, \ldots, \beta_p$ correspond to p objects about which you have additional covariate information (say, attributes $\{(v_{j,1}, v_{j,2}),\ j = 1, \ldots, p\}$), you might be interested in fitting the model fit<-lmFAB(y~w1+w2,X,~v1+v2), where v1 and v2 are p-dimensional vectors giving the attributes associated with $\beta_1, \ldots, \beta_p$. The additional term specifies a linking model for $\beta_1, \ldots, \beta_p$. Importantly, the linking model doesn't have to be correct in any way for the FAB p-values or confidence intervals to be valid. However, the better the linking model, the smaller the p-values and the narrower the intervals.
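A linking model can be tried out on simulated data as follows. In this sketch the coefficients are generated so that they really do depend on the attributes v1 and v2; that dependence, like every numeric value here, is an assumption made for illustration. The lmFAB call is guarded so the script runs even if the package is missing.

```r
set.seed(2)

n <- 80   # observations (illustrative)
p <- 20   # predictors of interest (illustrative)

# One attribute pair (v1[j], v2[j]) per coefficient beta_j.
v1 <- rnorm(p)
v2 <- rnorm(p)

# Hypothetical truth: beta_j varies with the attributes, plus noise.
beta <- (v1 - v2 + rnorm(p)) / 3

w1 <- rnorm(n)
w2 <- rnorm(n)
X  <- matrix(rnorm(n * p), n, p)
y  <- 1 + 0.5 * w1 - 0.5 * w2 + drop(X %*% beta) + rnorm(n)

if (requireNamespace("FABInference", quietly = TRUE)) {
  # Third argument: linking model for beta_1, ..., beta_p.
  fit_link <- FABInference::lmFAB(y ~ w1 + w2, X, ~ v1 + v2)
  print(fit_link$FABpv)
  print(fit_link$FABci)
}
```

Note that v1 and v2 have length p (one entry per column of X), not length n.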
FAB inference for generalized linear models can be obtained similarly using the command glmFAB. In this case, the p-values and confidence intervals are valid asymptotically (just like the standard p-values and intervals). Fitting a normal linear regression with glmFAB is much faster than using lmFAB because the former uses an asymptotic approximation.
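As a rough sketch, a logistic regression might be fit like this. It assumes that glmFAB takes the same first arguments as lmFAB and forwards a family argument to glm; that interface detail is not stated in this document, so check ?glmFAB before relying on it. The data are simulated and the call is guarded so the script runs without the package.

```r
set.seed(3)

n <- 200  # observations (illustrative)
p <- 10   # predictors of interest (illustrative)

w1 <- rnorm(n)
w2 <- rnorm(n)
X  <- matrix(rnorm(n * p), n, p)

# Hypothetical coefficients, used only to simulate binary outcomes.
beta <- rnorm(p, mean = 0.2, sd = 0.2)
lp   <- -0.5 + 0.5 * w1 - 0.5 * w2 + drop(X %*% beta)
y    <- rbinom(n, size = 1, prob = plogis(lp))

if (requireNamespace("FABInference", quietly = TRUE)) {
  # Assumption: `family` is passed through to glm().
  gfit <- FABInference::glmFAB(y ~ w1 + w2, X, family = binomial)
  print(summary(gfit))
}
```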
In the simplest case of a normally distributed estimator $\hat\theta$ of $\theta$ such that $\hat\theta \sim N(\theta, \sigma^2)$, a standard p-value and confidence interval are based on the test statistic $|\hat\theta|$. A FAB p-value and confidence interval are based on the statistic $|\hat\theta + a|$, where $a$ is determined from indirect information about the sign and magnitude of $\theta$. The functional form of the FAB p-value is extremely simple:
$$p_{\mathrm{FAB}}(\hat\theta, a) = 1 - |\Phi(\hat\theta + 2a) - \Phi(-\hat\theta)|,$$
where $\Phi$ is the standard normal CDF (the formula is written for the standardized case $\sigma = 1$; setting $a = 0$ recovers the usual two-sided p-value). The FAB confidence interval is a bit more complicated. In multiparameter settings, the optimal choice of $a$ for one parameter may be estimated from data on the other parameters, using a linking model that relates the parameters to each other. Importantly, the FAB confidence intervals and p-values have correct frequentist error rates, even if the linking model is incorrect.
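The p-value formula is easy to try out directly in base R. The sketch below assumes a standardized estimate (σ = 1); p_fab and p_std are hypothetical helper names, not functions exported by FABInference, and the values of a are picked by hand rather than estimated from a linking model.

```r
# FAB p-value from the formula above, for a standardized estimate (sigma = 1).
p_fab <- function(theta_hat, a) {
  1 - abs(pnorm(theta_hat + 2 * a) - pnorm(-theta_hat))
}

# Usual two-sided normal p-value, for comparison.
p_std <- function(theta_hat) {
  2 * pnorm(-abs(theta_hat))
}

theta_hat <- 2
print(p_std(theta_hat))       # standard p-value
print(p_fab(theta_hat, 0))    # equals the standard p-value when a = 0
print(p_fab(theta_hat, 1))    # smaller: a has the same sign as theta_hat
print(p_fab(theta_hat, -1))   # larger: a has the opposite sign
```

The comparison shows the trade-off: an a that matches the sign of the estimate lowers the p-value, while an a of the wrong sign raises it.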
# Release version on CRAN
install.packages("FABInference")
# Development version on GitHub
devtools::install_github("pdhoff/FABInference")
“Smaller p-values via indirect information”. P.D. Hoff. arXiv:1907.12589 Journal of the American Statistical Association, to appear.
“Smaller p-values in genomics studies using distilled historical information”. J.G. Bryan and P.D. Hoff. arXiv:2004.07887 Biostatistics, to appear.
“Exact adaptive confidence intervals for small areas”. K. Burris and P.D. Hoff. arXiv:1809.09159 Journal of Survey Statistics and Methodology, 8(2):206–230, 2020.
“Exact adaptive confidence intervals for linear regression coefficients”. P.D. Hoff and C. Yu. arXiv:1705.08331 Electronic Journal of Statistics, 13(1):94–119, 2019.
“Adaptive multigroup confidence intervals with constant coverage”. C. Yu and P.D. Hoff. arXiv:1612.08287 Biometrika, 105(2):319–335, 2018.
Small area estimation: replication file for Hoff (2019)
Hidden Markov model: replication file for Hoff (2019)
Linear model interactions: replication file for Hoff (2019)
Logistic regression: replication file for Hoff (2019)