FABInference

Construction of FAB p-values and confidence intervals

About

In multiparameter problems, information sharing across parameters can be used to improve the power of statistical hypothesis tests, thereby providing smaller p-values and narrower confidence intervals, on average across parameters. The FABInference package provides information sharing in linear and generalized linear regression models using a syntax similar to the built-in R functions lm and glm.

Usage

Suppose you want to get FAB p-values for the predictors $x_{i,1}, \ldots, x_{i,p}$ in the linear model

$$y_i = \alpha_0 + \alpha_1 w_{i,1} + \alpha_2 w_{i,2} + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \epsilon_i,$$

where $w_{i,1}$ and $w_{i,2}$ (and potentially other $w_{i,j}$’s) are additional control variables you’d like to have in the model. Then you need to

  1. column-bind the x-variables into an n × p matrix X, e.g. X<-cbind(x1,x2,x3);

  2. run the command fit<-lmFAB(y~w1+w2,X).

The output is similar to the output of the lm command, so you can type summary(fit) to see the FAB p-values. The FAB p-values and confidence intervals are stored in fit$FABpv and fit$FABci.
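As a minimal sketch of the two steps above, the following simulates a small data set and fits the model. The sample size, the number of predictors, and the coefficient values are assumptions made purely for illustration. The fit is only attempted if the FABInference package is installed.

```r
# Simulated data; all sizes and coefficient values are illustrative assumptions.
set.seed(1)
n <- 100
p <- 3

w1 <- rnorm(n)                      # control variables
w2 <- rnorm(n)
x1 <- rnorm(n)                      # predictors of interest
x2 <- rnorm(n)
x3 <- rnorm(n)

# Step 1: column-bind the x-variables into an n x p matrix.
X <- cbind(x1, x2, x3)

beta <- c(0.5, 0.25, 0)             # hypothetical true coefficients
y <- as.vector(1 + 0.5 * w1 - 0.5 * w2 + X %*% beta + rnorm(n))

# Step 2: fit the model (skipped if the package is not installed).
if (requireNamespace("FABInference", quietly = TRUE)) {
  fit <- FABInference::lmFAB(y ~ w1 + w2, X)
  print(summary(fit))   # lm-style summary that includes the FAB p-values
  print(fit$FABpv)      # FAB p-values
  print(fit$FABci)      # FAB confidence intervals
}
```

Note that the control variables go in the formula, while the predictors whose coefficients receive FAB inference are passed separately as the matrix X.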

If $\beta_1, \ldots, \beta_p$ correspond to $p$ objects about which you have additional covariate information (say, attributes $\{(v_{j,1}, v_{j,2}),\ j = 1, \ldots, p\}$), you might be interested in fitting the model fit<-lmFAB(y~w1+w2,X,~v1+v2), where v1 and v2 are p-dimensional vectors giving the attributes associated with $\beta_1, \ldots, \beta_p$. The additional term specifies a linking model for $\beta_1, \ldots, \beta_p$. Importantly, the linking model doesn’t have to be correct in any way for the FAB p-values or confidence intervals to be valid. However, the better the linking model, the smaller the p-values and the narrower the intervals.
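A hedged sketch of a fit with a linking model follows. The simulated attributes, the way the coefficients are generated from them, and all sizes are assumptions chosen for illustration. Only the call lmFAB(y~w1+w2,X,~v1+v2) comes from the description above, and it is run only if the FABInference package is installed.

```r
# Simulated data; sizes and the generating model are illustrative assumptions.
set.seed(2)
n <- 200
p <- 20

w1 <- rnorm(n)                      # control variables
w2 <- rnorm(n)
X <- matrix(rnorm(n * p), n, p)     # n x p matrix of predictors

# One attribute pair (v1[j], v2[j]) per coefficient beta_j.
v1 <- rnorm(p)
v2 <- rbinom(p, 1, 0.5)

# Hypothetical coefficients that partly depend on the attributes.
beta <- 0.3 * v1 + 0.2 * v2 + rnorm(p, sd = 0.1)
y <- as.vector(w1 + w2 + X %*% beta + rnorm(n))

# The third argument is the linking model for beta_1, ..., beta_p.
if (requireNamespace("FABInference", quietly = TRUE)) {
  fit_link <- FABInference::lmFAB(y ~ w1 + w2, X, ~ v1 + v2)
  print(summary(fit_link))
}
```

The attribute vectors must have one entry per column of X, since each entry describes the coefficient of the corresponding predictor.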

FAB inference for generalized linear models can be obtained similarly using the command glmFAB. In this case, the p-values and confidence intervals are valid asymptotically (just like the standard p-values and intervals). Fitting a normal linear regression with glmFAB is much faster than using lmFAB because the former uses an asymptotic approximation.

Theoretical details

In the simplest case of a normally distributed estimator $\hat\theta$ of $\theta$ such that $\hat\theta \sim N(\theta, \sigma^2)$, a standard p-value and confidence interval are based on the test statistic $|\hat\theta|$. A FAB p-value and confidence interval are based on the statistic $|\hat\theta + a|$, where $a$ is determined from indirect information about the sign and magnitude of $\theta$. The functional form of the FAB p-value is extremely simple:


$$p_{\mathrm{FAB}}(\hat\theta, a) = 1 - \left|\Phi(\hat\theta + 2a) - \Phi(-\hat\theta)\right|,$$

where Φ is the standard normal CDF. The FAB confidence interval is a bit more complicated. In multiparameter settings, the optimal choice for a for one parameter may be estimated from data on the other parameters, using a linking model that relates the parameters to each other. Importantly, the FAB confidence intervals and p-values have correct frequentist error rates, even if the linking model is incorrect.
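The p-value formula above can be checked numerically with a few lines of base R. This is a sketch rather than the package's implementation. It assumes the null hypothesis is θ = 0 and that the estimator has been standardized so that σ = 1, which is what the use of the standard normal CDF implies. The function names p_fab and p_std and the value of the estimate are hypothetical.

```r
# FAB p-value for testing theta = 0, assuming theta_hat ~ N(theta, 1).
p_fab <- function(theta_hat, a) {
  1 - abs(pnorm(theta_hat + 2 * a) - pnorm(-theta_hat))
}

# Usual two-sided p-value based on |theta_hat|, for comparison.
p_std <- function(theta_hat) {
  2 * pnorm(-abs(theta_hat))
}

theta_hat <- 1.8                    # hypothetical standardized estimate

print(p_std(theta_hat))             # standard p-value
print(p_fab(theta_hat, a = 0))      # a = 0 recovers the standard p-value
print(p_fab(theta_hat, a = 0.5))    # a with the same sign as theta_hat
print(p_fab(theta_hat, a = -0.5))   # a with the opposite sign
```

With a = 0 the two p-values coincide. For this estimate, a value of a with the same sign as the estimate gives a smaller p-value than the standard one, and a value with the opposite sign gives a larger one, which is why good indirect information about the sign and magnitude of θ pays off.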

Documentation and citation

“Smaller p-values via indirect information”. P.D. Hoff. arXiv:1907.12589. Journal of the American Statistical Association, to appear.

“Smaller p-values in genomics studies using distilled historical information”. J.G. Bryan and P.D. Hoff. arXiv:2004.07887. Biostatistics, to appear.

“Exact adaptive confidence intervals for small areas”. K. Burris and P.D. Hoff. arXiv:1809.09159. Journal of Survey Statistics and Methodology, 8(2):206–230, 2020.

“Exact adaptive confidence intervals for linear regression coefficients”. P.D. Hoff and C. Yu. arXiv:1705.08331. Electronic Journal of Statistics, 13(1):94–119, 2019.

“Adaptive multigroup confidence intervals with constant coverage”. C. Yu and P.D. Hoff. arXiv:1612.08287. Biometrika, 105(2):319–335, 2018.

Some examples