P-values: What They Are and What They Are Not looks in detail at good examples of using p-values and how to interpret them. After reviewing widely understood problems with p-values, attention is drawn to regularly encountered uses of p-values where their correct interpretation is less clear. We then demonstrate why p-values are not meaningful measures of support for specific hypotheses. Guidance on good statistical data analysis is based on a statement on p-values by the American Statistical Association. Finally, a hierarchy of scientific evidence compiled by the Oxford Centre for Evidence-Based Medicine is reviewed to re-emphasize the role of thoughtful statistical analysis in scientific and medical discovery.

The positive and negative predictive values of a diagnostic test depend not only on its sensitivity and specificity, but also on the prevalence of the disease (or the pre-test probability of disease). This interactive display illustrates that inter-dependence.

Move the sensitivity, specificity, and disease prevalence sliders, and watch the positive and negative predictive value sliders change in response. The mosaic plot on the right shows the positive tests (T+, in maroon) and negative tests (T-, in orange) in a population.

Diseased individuals (D+) are in the right column and non-diseased individuals (D-) in the left column.
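The dependence the sliders illustrate follows directly from Bayes' rule. A short R sketch makes it concrete (the function names ppv and npv are mine, not part of the display):

```r
# Positive predictive value: P(D+ | T+) by Bayes' rule
ppv <- function(sens, spec, prev) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}

# Negative predictive value: P(D- | T-)
npv <- function(sens, spec, prev) {
  (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
}

# A test with 90% sensitivity and 90% specificity at 1% prevalence:
ppv(0.90, 0.90, 0.01)  # about 0.083: most positive tests are false positives
npv(0.90, 0.90, 0.01)  # about 0.999
```

Even a seemingly accurate test has a low positive predictive value when the disease is rare, which is exactly what the mosaic plot shows as the prevalence slider moves.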

If you’re an avid SAS user, you’re likely very familiar with SAS macros. SAS macros are a key tool for creating efficient and concise code. Although R has no macros, it offers other features, such as functions and loops, that can perform the same tasks as SAS macros.

Using apply() to loop over variables

In SAS, if we wanted to run multiple linear regressions using different predictor variables, we could use a simple SAS macro to iterate over the independent variables. In R, we can simplify this even more by making use of the apply() function. The apply() function comes from the R base package and is one of many members of the apply() family. The members of the family (which also includes lapply(), sapply(), mapply(), etc.) differ in the data structures of their inputs and outputs.

apply(X, MARGIN, FUN, ...) takes three main arguments.

X is an array or matrix.

MARGIN indicates whether the function should be applied over rows (MARGIN = 1) or columns (MARGIN = 2).

FUN indicates which function should be applied. Any R function can be used, including user-defined functions.
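As a quick illustration of these arguments, here is a minimal call computing column means on the built-in mtcars data:

```r
# MARGIN = 2 applies mean() to each column of the matrix formed
# from the three selected mtcars variables
apply(mtcars[, c("mpg", "cyl", "disp")], 2, mean)
```

The result is a named numeric vector with one mean per column.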

In this example, we will use the R dataset mtcars (first 6 rows shown below).

Back to the question presented earlier, how can we iterate over variables to run multiple regressions with different predictor variables?

The following apply function takes the dataset mtcars and subsets the variables cyl (number of cylinders), disp (displacement), and wt (weight) as the variables we want to apply the function to. We specify the margin as 2 so it iterates over the 3 columns. Finally, we specify a user defined function that takes the independent variable as a parameter and outputs the summary statistics of a linear model where mpg (miles per gallon) is the outcome variable.
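The call itself is not shown above; a version that would produce output of the form below is sketched here (the parameter name ind matches the Call: lines in the output):

```r
# Regress mpg on each of cyl, disp, and wt in turn.
# MARGIN = 2 iterates over the three selected columns; each column
# is passed to the anonymous function as ind.
apply(mtcars[, c("cyl", "disp", "wt")], 2, function(ind) {
  summary(lm(mpg ~ ind, data = mtcars))
})
```

Because ind is not a column of mtcars, lm() finds it in the function's environment, so each iteration fits mpg against a different predictor. The result is a named list of model summaries, one per column.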

## $cyl
##
## Call:
## lm(formula = mpg ~ ind, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9814 -2.1185 0.2217 1.0717 7.5186
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
## ind -2.8758 0.3224 -8.92 6.11e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.206 on 30 degrees of freedom
## Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
## F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
##
##
## $disp
##
## Call:
## lm(formula = mpg ~ ind, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8922 -2.2022 -0.9631 1.6272 7.2305
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.599855 1.229720 24.070 < 2e-16 ***
## ind -0.041215 0.004712 -8.747 9.38e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared: 0.7183, Adjusted R-squared: 0.709
## F-statistic: 76.51 on 1 and 30 DF, p-value: 9.38e-10
##
##
## $wt
##
## Call:
## lm(formula = mpg ~ ind, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## ind -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10

Using a for loop to iterate over variable names

Let’s say we want to examine product sales of 3 products sold over 100 days. Our goal is to have sold 45 units of each product. We can use a for loop to create new dummy variables that indicate if we sold 45 or more units that day.

First, we create a sample data set from randomly generated integers.
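One way to build such a dataset is sketched below; the column names and the 30-60 sales range are my choices for illustration:

```r
set.seed(42)  # for reproducibility

# 100 days of unit sales for three products, drawn uniformly from 30..60
sales <- data.frame(
  product_a = sample(30:60, 100, replace = TRUE),
  product_b = sample(30:60, 100, replace = TRUE),
  product_c = sample(30:60, 100, replace = TRUE)
)
```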

Then we can use a for loop to iterate over the variable names in the dataset. The paste function allows us to create a new variable name containing the old variable name and the condition. We then can assign a 0 or 1 to the new variable depending on if the sales goal of 45 or more was met.

for (p in names(sales)) {
  sales[[paste(p, ">45", sep = "")]] <- as.numeric(sales[[p]] >= 45)
}
print(head(sales))

In general, it is often cleaner, and can be more efficient, to use one of the apply() functions instead of a for loop. For loops in R can be slow for large data sets, especially if you repeatedly grow a data frame with functions like cbind or rbind. It is better to preallocate a matrix or data frame for the loop to fill. By preallocating space, you prevent R from having to copy and expand the object on every iteration.
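The preallocation advice can be sketched like this (a toy computation, squaring the integers, stands in for real work):

```r
n <- 10000

# Slow pattern: growing the vector forces R to copy it each iteration
# slow <- c()
# for (i in 1:n) slow <- c(slow, i^2)

# Better: preallocate the full length, then fill in place
out <- numeric(n)
for (i in seq_len(n)) {
  out[i] <- i^2
}
```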

Ifelse Functions

One way of getting around the for loop in our previous example is the ifelse function. The benefit of ifelse is that it is vectorized, meaning the condition is applied to a whole vector at once rather than to one value at a time.

The ifelse function will read in a vector, check a condition, and then assign one value if the condition is true and a different value if false.
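For a single column, the loop body above could be replaced by a one-line ifelse call; the toy data and column name product_a below are illustrative:

```r
# Toy daily sales for one product
sales <- data.frame(product_a = c(44, 45, 50, 30))

# 1 if the 45-unit goal was met that day, 0 otherwise
sales$product_a_goal <- ifelse(sales$product_a >= 45, 1, 0)
sales$product_a_goal  # 0 1 1 0
```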

This is a short comparison of SAS and R code in the context of generating multiple datasets. In this example, the mpg dataset from R is used to show how one might use a SAS macro to subset the data by car class, followed by the R equivalent. You will see that in SAS the macro is a little complicated; in R, the equivalent code takes only a few lines to accomplish the same task.

First, the SAS Code.

*Get the unique list of car classes;
proc sql;
  create table class as
  select distinct class
  from mpg;
quit;

%macro Subset_Data();
  *Open the list of car classes, and go from the first item to the last in the list;
  *&SYSNOBS. takes on the number of observations in the last dataset processed by SAS;
  %do i = 1 %to &SYSNOBS.;
    data _null_;
      set class;
      if _n_ = &i.;
      *symput saves the value of class as the macro variable &name.;
      call symput('name', class);
    run;

    *Subset the data;
    data dataset_&name.;
      set mpg;
      where class = "&name.";
    run;
  %end;
%mend Subset_Data;
%Subset_Data;

Now the R Code

Here the R equivalent of the SAS %do loop is a for loop. The proc sql step used to generate a unique list of car classes is replaced by simply unique(mpg$class).

library(ggplot2)
data(mpg)
for (var in unique(mpg$class)) {
  assign(paste("dataset", var, sep = "_"), mpg[which(mpg$class == var), ])
}

With the SAS code, the naming of the datasets was handled by loading a dataset and passing the value for a specific observation to the macro. In R, the assign function handles naming the dataset generated by the code mpg[which(mpg$class == var), ].
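As an aside, an alternative that avoids assign() entirely is split(), which returns a named list of data frames, one per class, and keeps them together rather than scattering objects through the global environment:

```r
library(ggplot2)
data(mpg)

# One data frame per car class, accessible by name
datasets <- split(mpg, mpg$class)
names(datasets)       # class names, e.g. "2seater", "compact", ...
head(datasets$compact)
```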

Mild cognitive impairment (MCI) causes difficulty with thinking, remembering, and reasoning that is greater than expected with normal aging. Dementia is a more severe loss of cognitive function that reduces a person’s ability to perform everyday tasks. Hypertension, or high blood pressure, is very common in adults age 50 and older and is a leading risk factor for heart disease, stroke, and kidney failure. A growing body of research has identified hypertension as a potentially modifiable risk factor for MCI and dementia.

Launched in 2010 by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH), SPRINT enrolled more than 9,300 adults age 50 and older with hypertension who were at a high risk for cardiovascular disease. They were recruited from approximately 102 medical centers and clinical practices throughout the United States and Puerto Rico.

Memphis hosts two SPRINT study sites, one at UTHSC and one at the Memphis VA Medical Center. UTHSC’s site, which followed 175 participants, was led by Karen C. Johnson, MD, MPH, principal investigator, College of Medicine Endowed Professor in Women’s Health, and professor of Preventive Medicine at UTHSC, and Catherine Womack, MD, associate professor in the Department of Preventive Medicine and co-chief of the Division of Internal Medicine in the College of Medicine at UTHSC.

The Memphis VA Medical Center, which served as a VA Network hub and clinical site, followed 1,660 participants at 25 VA medical centers within the SPRINT Veterans Affairs Clinical Center Network (CCN). William Cushman, MD, chief of Preventive Medicine at the Memphis VA, and professor of Preventive Medicine, Medicine, and Physiology at UTHSC, serves as the principal investigator for the VA Network. Dr. Barry Wall, also from the Memphis VA, is the co-principal investigator for the SPRINT VA CCN and principal investigator for the VA Memphis SPRINT clinical site, which recruited 80 veterans for SPRINT. Linda Nichols, PhD, professor, and Jennifer Martindale-Adams, EdD, associate professor in the Department of Preventive Medicine at UTHSC, were VA CCN consultants and also co-principal investigators for the SPRINT MIND study.

The SPRINT MIND study, an essential component of the umbrella SPRINT study, aimed to address whether aggressive blood pressure control would also reduce the risk of developing dementia and cognitive impairment. The study results show that treating blood pressure to a goal of less than 120 mm Hg did not significantly reduce the risk of dementia, but did significantly reduce the risk of developing MCI. The authors of the SPRINT MIND study conclude that the inconclusive dementia result may have been due to too few cases of dementia occurring during the study.

“The SPRINT MIND study has shown for the first time that intensive control of blood pressure in older people significantly reduced the risk of developing mild cognitive impairment, a precursor of early dementia,” said Dr. Johnson, who also served as the vice chair of the National Steering Committee for the SPRINT study. “This is a very important finding, as it may reduce concerns that many clinicians had that lower systolic blood pressure in older persons might be harmful to their brain.”

The Alzheimer’s Association has agreed to fund additional follow-up of SPRINT MIND participants in the hope that sufficient dementia cases will accrue, allowing for a more definitive statement on these study outcomes. Dr. Nichols is encouraged by the association’s commitment to help provide more conclusive study results.

“We are thrilled that the Alzheimer’s Association will be working with us to continue to follow SPRINT MIND participants,” Dr. Nichols said. “We may be able to determine if intensive blood pressure control will reduce dementia in addition to mild cognitive impairment.”

In August 2015, the SPRINT trial was stopped earlier than planned when the beneficial effects of intensive blood pressure management on mortality and cardiovascular disease were discovered. The SPRINT MIND findings provide promise that individuals can take steps to lower their risk of mild cognitive impairment and dementia, and it could be as easy as lowering their blood pressure.

“The fact that cognition and dementia were not worsened and there were even some improvements is very encouraging in light of the impressive improvement in cardiovascular outcomes with intensive blood pressure lowering in SPRINT,” Dr. Cushman said.

“These findings provide hope to anyone who is concerned about developing memory problems,” Dr. Martindale-Adams said.

SPRINT study findings have already had a world-wide impact on how people define hypertension and how doctors treat hypertension. The American College of Cardiology and the American Heart Association published new blood pressure guidelines in 2017 based on SPRINT data. The SPRINT MIND study results were reported in the Jan. 28, 2019 edition of the Journal of the American Medical Association.