trtools.associaTR module
- class trtools.associaTR.OLS(endog, exog=None, missing='none', hasconst=None, **kwargs)
Bases:
statsmodels.regression.linear_model.WLS
Ordinary Least Squares
- Parameters
endog (array_like) – A 1-d endogenous response variable. The dependent variable.
exog (array_like) – A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant().
missing (str) – Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no NaN checking is done. If ‘drop’, any observations with NaNs are dropped. If ‘raise’, an error is raised. Default is ‘none’.
hasconst (None or bool) – Indicates whether the RHS includes a user-supplied constant. If True, a constant is not checked for and k_constant is set to 1 and all result statistics are calculated as if a constant is present. If False, a constant is not checked for and k_constant is set to 0.
**kwargs – Extra arguments that are used to set model properties when using the formula interface.
- weights
Has an attribute weights = array(1.0) due to inheritance from WLS.
- Type
scalar
See also
WLS – Fit a linear model using Weighted Least Squares.
GLS – Fit a linear model using Generalized Least Squares.
Notes
No constant is added by the model unless you are using formulas.
Examples
>>> import statsmodels.api as sm
>>> import numpy as np
>>> duncan_prestige = sm.datasets.get_rdataset("Duncan", "carData")
>>> Y = duncan_prestige.data['income']
>>> X = duncan_prestige.data['education']
>>> X = sm.add_constant(X)
>>> model = sm.OLS(Y, X)
>>> results = model.fit()
>>> results.params
const        10.603498
education     0.594859
dtype: float64
>>> results.tvalues
const        2.039813
education    6.892802
dtype: float64
>>> print(results.t_test([1, 0]))
                             Test for Constraints
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0            10.6035      5.198      2.040      0.048       0.120      21.087
==============================================================================
>>> print(results.f_test(np.identity(2)))
<F test: F=array([[159.63031026]]), p=1.2607168903696672e-20, df_denom=43, df_num=2>
- _fit_ridge(alpha)
Fit a linear model using ridge regression.
- Parameters
alpha (scalar or array_like) – The penalty weight. If a scalar, the same penalty weight applies to all variables in the model. If a vector, it must have the same length as params, and contains a penalty weight for each coefficient.
Notes
Equivalent to fit_regularized with L1_wt = 0 (but implemented more efficiently).
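The equivalence noted above can be illustrated with a closed-form solve. The following is only a minimal sketch, assuming the objective 0.5*RSS/n + alpha*|params|_2^2/2 (the L1_wt = 0 case of the function given in the fit_regularized notes). The helper name ridge_closed_form and the simulated data are hypothetical and are not part of trtools or statsmodels; the library's own implementation may differ.

```python
import numpy as np

def ridge_closed_form(exog, endog, alpha):
    """Minimize 0.5*RSS/n + alpha*||b||_2^2/2 (hypothetical helper).

    alpha may be a scalar or a vector with one penalty per coefficient.
    """
    exog = np.asarray(exog, dtype=float)
    endog = np.asarray(endog, dtype=float)
    n, k = exog.shape
    # Broadcast a scalar penalty to one weight per coefficient.
    penalty = np.broadcast_to(np.asarray(alpha, dtype=float), (k,))
    # Setting the gradient to zero gives (X'X + n*diag(alpha)) b = X'y.
    lhs = exog.T @ exog + n * np.diag(penalty)
    rhs = exog.T @ endog
    return np.linalg.solve(lhs, rhs)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=50)

b_ols = ridge_closed_form(X, y, alpha=0.0)    # no penalty: plain least squares
b_ridge = ridge_closed_form(X, y, alpha=0.5)  # coefficients shrink toward zero
```

Note that this sketch penalizes every column, including the constant, because a scalar alpha applies the same weight to all variables.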
- fit_regularized(method='elastic_net', alpha=0.0, L1_wt=1.0, start_params=None, profile_scale=False, refit=False, **kwargs)
Return a regularized fit to a linear regression model.
- Parameters
method (str) – Either ‘elastic_net’ or ‘sqrt_lasso’.
alpha (scalar or array_like) – The penalty weight. If a scalar, the same penalty weight applies to all variables in the model. If a vector, it must have the same length as params, and contains a penalty weight for each coefficient.
L1_wt (scalar) – The fraction of the penalty given to the L1 penalty term. Must be between 0 and 1 (inclusive). If 0, the fit is a ridge fit, if 1 it is a lasso fit.
start_params (array_like) – Starting values for params.
profile_scale (bool) – If True the penalized fit is computed using the profile (concentrated) log-likelihood for the Gaussian model. Otherwise the fit uses the residual sum of squares.
refit (bool) – If True, the model is refit using only the variables that have non-zero coefficients in the regularized fit. The refitted model is not regularized.
**kwargs – Additional keyword arguments that contain information used when constructing a model using the formula interface.
- Returns
The regularized results.
- Return type
statsmodels.base.elastic_net.RegularizedResults
Notes
The elastic net uses a combination of L1 and L2 penalties. The implementation closely follows the glmnet package in R.
The function that is minimized is:
\[0.5 \cdot RSS / n + alpha \cdot \left((1 - L1\_wt) \cdot |params|_2^2 / 2 + L1\_wt \cdot |params|_1\right)\]
where RSS is the usual residual sum of squares, n is the sample size, and \(|\cdot|_1\) and \(|\cdot|_2\) are the L1 and L2 norms.
For WLS and GLS, the RSS is calculated using the whitened endog and exog data.
Post-estimation results are based on the same data used to select variables, hence may be subject to overfitting biases.
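To make the minimized function concrete, here is a small sketch that evaluates it with NumPy for a scalar alpha. The function name elastic_net_objective and the toy data are assumptions made for illustration; this is not the library's code, and it does not perform the minimization itself.

```python
import numpy as np

def elastic_net_objective(params, exog, endog, alpha, L1_wt):
    """Evaluate 0.5*RSS/n + alpha*((1-L1_wt)*|b|_2^2/2 + L1_wt*|b|_1).

    Hypothetical helper; alpha is assumed to be a scalar here.
    """
    params = np.asarray(params, dtype=float)
    resid = np.asarray(endog, dtype=float) - np.asarray(exog, dtype=float) @ params
    n = resid.shape[0]
    rss = np.sum(resid ** 2)
    l2_term = (1.0 - L1_wt) * np.sum(params ** 2) / 2.0
    l1_term = L1_wt * np.sum(np.abs(params))
    return 0.5 * rss / n + alpha * (l2_term + l1_term)

# Toy data, for illustration only.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 4.0])
b = np.array([1.0, 1.0])

ridge_value = elastic_net_objective(b, X, y, alpha=0.1, L1_wt=0.0)  # pure L2 penalty
lasso_value = elastic_net_objective(b, X, y, alpha=0.1, L1_wt=1.0)  # pure L1 penalty
```

Setting L1_wt to 0 leaves only the squared L2 term (ridge), and setting it to 1 leaves only the L1 term (lasso), matching the description of L1_wt in the parameter list.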
The elastic_net method uses the following keyword arguments:
- maxiter (int)
Maximum number of iterations.
- cnvrg_tol (float)
Convergence threshold for line searches.
- zero_tol (float)
Coefficients below this threshold are treated as zero.
The square root lasso approach is a variation of the Lasso that is largely self-tuning (the optimal tuning parameter does not depend on the standard deviation of the regression errors). If the errors are Gaussian, the tuning parameter can be taken to be
alpha = 1.1 * np.sqrt(n) * norm.ppf(1 - 0.05 / (2 * p))
where n is the sample size and p is the number of predictors.
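The tuning-parameter formula above can be computed without extra dependencies. The sketch below assumes that norm.ppf refers to the standard normal quantile function (as in scipy.stats.norm) and substitutes the standard library's statistics.NormalDist().inv_cdf for it. The helper name sqrt_lasso_alpha and its level argument are hypothetical; the source fixes the level at 0.05.

```python
import math
from statistics import NormalDist

def sqrt_lasso_alpha(n, p, level=0.05):
    """Return 1.1 * sqrt(n) * Phi^{-1}(1 - level / (2 * p)).

    Hypothetical helper; n is the sample size and p the number of predictors.
    """
    quantile = NormalDist().inv_cdf(1 - level / (2 * p))
    return 1.1 * math.sqrt(n) * quantile

# Example: 100 observations and 10 predictors.
alpha = sqrt_lasso_alpha(n=100, p=10)
```

The resulting value depends only on n and p, not on the error variance, which is the sense in which the approach is described as largely self-tuning.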
The square root lasso uses the following keyword arguments:
- zero_tol (float)
Coefficients below this threshold are treated as zero.
The cvxopt module is required to estimate a model using the square root lasso.
References
- *
Friedman, Hastie, Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22.
- †
A Belloni, V Chernozhukov, L Wang (2011). Square-root Lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791-806. https://arxiv.org/pdf/1009.5689.pdf
- hessian(params, scale=None)
Evaluate the Hessian function at a given point.
- Parameters
params (array_like) – The parameter vector at which the Hessian is computed.
scale (float or None) – If None, return the profile (concentrated) log likelihood (profiled over the scale parameter), else return the log-likelihood using the given scale value.
- Returns
The Hessian matrix.
- Return type
ndarray
- hessian_factor(params, scale=None, observed=True)
Calculate the weights for the Hessian.
- Parameters
params (ndarray) – The parameter vector at which the Hessian is evaluated.
scale (None or float) – If scale is None, then the default scale will be calculated. Default scale is defined by self.scaletype and set in fit. If scale is not None, then it is used as a fixed scale.
observed (bool) – If True, the observed Hessian is returned. If False, the expected information matrix is returned.
- Returns
A 1d weight vector used in the calculation of the Hessian. The Hessian is obtained by (exog.T * hessian_factor).dot(exog).
- Return type
ndarray
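The expression (exog.T * hessian_factor).dot(exog) relies on NumPy broadcasting. The sketch below shows what that expression computes, using a made-up weight vector in place of a real hessian_factor return value; the arrays and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
exog = rng.normal(size=(6, 3))  # nobs = 6, k = 3 (simulated)

# Hypothetical 1d weight vector standing in for the value
# returned by hessian_factor (one weight per observation).
factor = np.linspace(0.5, 1.5, 6)

# exog.T has shape (k, nobs); multiplying by a length-nobs vector
# scales each observation's column before the final matrix product.
hess = (exog.T * factor).dot(exog)

# The same matrix written with an explicit diagonal weight matrix.
hess_explicit = exog.T @ np.diag(factor) @ exog
```

The broadcast form avoids building the nobs x nobs diagonal matrix, which matters when the number of observations is large.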
- loglike(params, scale=None)
The log-likelihood function for the OLS model.
- Parameters
params (array_like) – The coefficients at which to evaluate the log-likelihood.
scale (float or None) – If None, return the profile (concentrated) log likelihood (profiled over the scale parameter), else return the log-likelihood using the given scale value.
- Returns
The log-likelihood function evaluated at params.
- Return type
float
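The difference between the two scale modes can be sketched with the standard Gaussian log-likelihood. The function below is an illustrative assumption, not the library's code: the name gaussian_loglike and the toy data are hypothetical, and the formulas are the textbook ones for independent normal errors with variance scale.

```python
import numpy as np

def gaussian_loglike(params, exog, endog, scale=None):
    """Gaussian log-likelihood of a linear model (hypothetical helper)."""
    resid = endog - exog @ params
    nobs = resid.shape[0]
    ssr = np.sum(resid ** 2)
    if scale is None:
        # Profile (concentrated) form: the scale is replaced by its
        # maximum-likelihood estimate ssr / nobs.
        return -0.5 * nobs * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1.0)
    # Full form with a fixed, user-supplied scale.
    return -0.5 * nobs * np.log(2 * np.pi * scale) - ssr / (2.0 * scale)

# Toy data, for illustration only.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 4.0])
b = np.array([1.0, 1.0])

profile_value = gaussian_loglike(b, X, y)
fixed_value = gaussian_loglike(b, X, y, scale=1.0)
```

For a given params, the profile value is the largest value the fixed-scale form can take over all choices of scale, so it is never smaller than the value at any particular scale.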
- score(params, scale=None)
Evaluate the score function at a given point.
The score corresponds to the profile (concentrated) log-likelihood in which the scale parameter has been profiled out.
- Parameters
params (array_like) – The parameter vector at which the score function is computed.
scale (float or None) – If None, return the profile (concentrated) log likelihood (profiled over the scale parameter), else return the log-likelihood using the given scale value.
- Returns
The score vector.
- Return type
ndarray
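As a rough illustration of what profiling out the scale means for the score, the sketch below differentiates the standard profile Gaussian log-likelihood by hand. The function names and the toy data are hypothetical, and the formulas are derived from the textbook Gaussian model rather than taken from the library's source.

```python
import numpy as np

def profile_loglike(params, exog, endog):
    """Profile Gaussian log-likelihood (hypothetical helper)."""
    resid = endog - exog @ params
    nobs = resid.shape[0]
    ssr = np.sum(resid ** 2)
    return -0.5 * nobs * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1.0)

def profile_score(params, exog, endog):
    """Gradient of profile_loglike with respect to params."""
    resid = endog - exog @ params
    nobs = resid.shape[0]
    ssr = np.sum(resid ** 2)
    # d/db of -nobs/2 * log(ssr) equals nobs * X'resid / ssr.
    return nobs * (exog.T @ resid) / ssr

# Toy data, for illustration only.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 4.0])
b = np.array([1.0, 1.0])

score_at_b = profile_score(b, X, y)

# At the least-squares solution the residuals are orthogonal to exog,
# so the score vanishes there.
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
score_at_b_hat = profile_score(b_hat, X, y)
```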
- trtools.associaTR.main(args)
- trtools.associaTR.perform_gwas(outfname, tr_vcf, phenotype_name, traits_fnames, vcftype, same_samples, sample_fname, region, non_major_cutoff, beagle_dosages, plotting_phenotype_fname, paired_genotype_plot, plot_phenotype_residuals, plotting_ci_alphas, imputed_ukb_strs_paper_period_check)
- trtools.associaTR.perform_gwas_helper(outfile, all_samples, get_genotype_iter, phenotype_name, trait_fnames, same_samples, sample_fname, beagle_dosages, plotting_phenotype_fname, paired_genotype_plot, plot_phenotype_residuals, plotting_ci_alphas)
- trtools.associaTR.run()