Anatomy of a Diagnostic Test – an R Shiny Example by Saunak Sen


The positive and negative predictive values of a diagnostic test depend not only on its sensitivity and specificity, but also on the prevalence of the disease (or the pre-test probability of disease).  This interactive display illustrates that inter-dependence.

Move the sensitivity, specificity, and disease prevalence sliders, and watch the positive and negative predictive value sliders change in response.  The mosaic plot on the right shows the positive tests (T+, in maroon) and negative tests (T-, in orange) in a population.

Diseased individuals (D+) are in the right column and non-diseased individuals (D-) in the left column.
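The inter-dependence the display illustrates follows directly from Bayes' rule. Here is a minimal R sketch of the underlying calculation (the function name ppv_npv and the example inputs are illustrative, not taken from the app itself):

```r
# Positive and negative predictive values via Bayes' rule
ppv_npv <- function(sens, spec, prev) {
  ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
  npv <- (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
  c(PPV = ppv, NPV = npv)
}

# At 10% prevalence, even a 90%-sensitive, 90%-specific test
# yields a PPV of only 0.5
round(ppv_npv(sens = 0.9, spec = 0.9, prev = 0.1), 3)
```

Moving the prevalence argument while holding sensitivity and specificity fixed reproduces the behavior of the sliders: PPV falls sharply as the disease becomes rarer.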

Equivalent of SAS Macros in R – Loops and Functions by Courtney Gale


If you’re an avid SAS user, you’re likely very familiar with SAS macros. SAS macros are a key tool for writing efficient and concise code. Although you cannot use macros in R, R offers other features, such as functions and loops, that can perform the same tasks as SAS macros.

Using apply() to loop over variables

In SAS, if we wanted to run multiple linear regressions using different predictor variables, we could use a simple SAS macro to iterate over the independent variables. In R, we can simplify this even further by making use of the apply() function. The apply() function comes from the R base package and is one of many members of the apply() family. The members of this family (which also includes lapply(), sapply(), mapply(), etc.) differ in the data structures of their inputs and outputs.

apply(X, MARGIN, FUN, …) takes three main arguments:

  • X is an array or matrix.
  • MARGIN indicates whether the function should be applied over rows (MARGIN = 1) or columns (MARGIN = 2).
  • FUN is the function to apply. Any R function can be used, including functions created by the user.
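To see the MARGIN argument in action before the regression example, here is a minimal sketch on a small matrix:

```r
# A 2 x 3 matrix filled column by column: rows (1, 3, 5) and (2, 4, 6)
m <- matrix(1:6, nrow = 2)

apply(m, 1, sum)  # MARGIN = 1: row sums, 9 and 12
apply(m, 2, sum)  # MARGIN = 2: column sums, 3, 7, and 11
```

The same matrix, the same function, and only MARGIN changes which dimension sum() is applied over.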

In this example, we will use the R dataset mtcars (first 6 rows shown below).

data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Returning to the question presented earlier: how can we iterate over variables to run multiple regressions with different predictor variables?

The following apply function takes the dataset mtcars and subsets the variables cyl (number of cylinders), disp (displacement), and wt (weight) as the variables we want to apply the function to. We specify the margin as 2 so the function is applied over the 3 columns. Finally, we supply a user-defined function that takes the independent variable as a parameter and outputs the summary of a linear model where mpg (miles per gallon) is the outcome variable.

apply(mtcars[, c("cyl", "disp", "wt")], 2, 
      function(ind) {summary(lm(mpg ~ ind, data = mtcars))})
## $cyl
## 
## Call:
## lm(formula = mpg ~ ind, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9814 -2.1185  0.2217  1.0717  7.5186 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
## ind          -2.8758     0.3224   -8.92 6.11e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.206 on 30 degrees of freedom
## Multiple R-squared:  0.7262, Adjusted R-squared:  0.7171 
## F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10
## 
## 
## $disp
## 
## Call:
## lm(formula = mpg ~ ind, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8922 -2.2022 -0.9631  1.6272  7.2305 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## ind         -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared:  0.7183, Adjusted R-squared:  0.709 
## F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
## 
## 
## $wt
## 
## Call:
## lm(formula = mpg ~ ind, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## ind          -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
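If you only need the slope estimates rather than the full summaries, the same pattern works with sapply(), which simplifies the result to a named vector; a sketch under the same mtcars setup:

```r
data(mtcars)

# Fit mpg ~ predictor for each column and keep only the slope estimate;
# sapply() names the result by column, giving a compact comparison
slopes <- sapply(mtcars[, c("cyl", "disp", "wt")],
                 function(ind) coef(lm(mpg ~ ind, data = mtcars))[["ind"]])
round(slopes, 2)  # cyl: -2.88, disp: -0.04, wt: -5.34
```

The estimates match the ind coefficients in the three summaries above; sapply() simply strips away the surrounding output.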

Using a for loop to iterate over variable names

Let’s say we want to examine product sales of 3 products sold over 100 days. Our goal is to have sold 45 units of each product. We can use a for loop to create new dummy variables that indicate if we sold 45 or more units that day.

First, we create a sample data set from randomly generated integers.

set.seed(22)
product1 <- sample(30:50, 100, replace = TRUE)
product2 <- sample(40:60, 100, replace = TRUE)
product3 <- sample(35:48, 100, replace = TRUE)

sales <- as.data.frame(cbind(product1, product2, product3))
head(sales)
##   product1 product2 product3
## 1       36       43       35
## 2       39       43       41
## 3       50       54       37
## 4       40       52       48
## 5       47       41       36
## 6       45       46       44

Then we can use a for loop to iterate over the variable names in the dataset. The paste function lets us build a new variable name containing the old variable name and the condition. We can then assign a 0 or 1 to the new variable depending on whether the sales goal of 45 or more units was met.

for (p in names(sales)) {
  sales[[paste(p, ">45", sep = "")]] <- as.numeric(sales[[p]] >= 45)
}

print(head(sales))
##   product1 product2 product3 product1>45 product2>45 product3>45
## 1       36       43       35           0           0           0
## 2       39       43       41           0           0           0
## 3       50       54       37           1           1           0
## 4       40       52       48           0           1           1
## 5       47       41       36           1           0           0
## 6       45       46       44           1           1           0

Problems with using for loops in R

In general, it is more efficient to use one of the apply() functions when possible instead of a for loop. for loops in R are generally slower for large data sets, especially if you are repeatedly growing a data frame with functions like cbind(). It is better to preallocate a new matrix or data frame for the loop to fill. By preallocating space, you prevent R from having to copy and expand the object on every iteration.
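A minimal sketch of the two approaches (both produce the same result, but the grown version forces R to copy the vector on every iteration, which becomes costly as n increases):

```r
n <- 10000

# Growing: each c() call copies the whole vector before appending
grow <- numeric(0)
for (i in 1:n) grow <- c(grow, i^2)

# Preallocating: the vector is created once and filled in place
pre <- numeric(n)
for (i in 1:n) pre[i] <- i^2

identical(grow, pre)  # TRUE
```

Timing the two loops with system.time() on a larger n makes the gap obvious; the preallocated version scales linearly while the grown version does not.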

Ifelse Functions

One way of getting around the for loop in our previous example is the ifelse function. The benefit of using ifelse is that it is vectorized, meaning the condition is applied to a whole vector at once rather than one value at a time.

The ifelse function reads in a vector, checks a condition, and then assigns one value where the condition is true and a different value where it is false.

sales$product1Met <- ifelse(sales$product1 >= 45, 1, 0)
sales$product2Met <- ifelse(sales$product2 >= 45, 1, 0)
sales$product3Met <- ifelse(sales$product3 >= 45, 1, 0)

head(sales)
##   product1 product2 product3 product1Met product2Met product3Met
## 1       36       43       35           0           0           0
## 2       39       43       41           0           0           0
## 3       50       54       37           1           1           0
## 4       40       52       48           0           1           1
## 5       47       41       36           1           0           0
## 6       45       46       44           1           1           0

While this can get repetitive if you are creating many new variables, in many cases the ifelse function is a sufficient option.
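One way to keep the vectorized ifelse() while avoiding the repetition is to loop it over the column names with lapply(); a sketch that rebuilds the sales data so the example is self-contained (the Met suffix mirrors the variables above):

```r
# Rebuild the sample sales data
set.seed(22)
sales <- data.frame(product1 = sample(30:50, 100, replace = TRUE),
                    product2 = sample(40:60, 100, replace = TRUE),
                    product3 = sample(35:48, 100, replace = TRUE))

# Apply the same ifelse() rule to every product column in one statement
products <- c("product1", "product2", "product3")
sales[paste0(products, "Met")] <- lapply(sales[products],
                                         function(x) ifelse(x >= 45, 1, 0))
head(sales)
```

Each element of the list returned by lapply() becomes one new column, so adding a fourth product means changing only the products vector.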

A SAS Macro/R Code Comparison: Generating Multiple Datasets by Tristan Hayes


This is a short comparison of SAS and R code in the context of generating multiple datasets. In this example, the mpg dataset from R is used to show how one might use a SAS macro to subset the data by car class, followed by the R equivalent. You will see that the SAS macro is a little complicated; in R, however, the same task takes only a few lines.

First, the SAS Code.

*Get the Unique list of car classes;
proc sql;
 create table class as select distinct class
 from mpg;
 quit;

%macro Subset_Data();
*Open the list of car classes, and go from the first item to the last in the list;
%do i = 1 %to &SYSNOBS.;
 *&SYSNOBS. holds the number of observations in the last dataset processed by SAS;
 data _null_;
  set class;
  if _n_ = &i.;
  call symput('name', class);
  *symput saves the value of class as the macro variable &name.;
  run;

*Subset the data;
 data dataset_&name.;
  set mpg;
  where class = "&name.";
  run;
%end;

%mend Subset_Data;
%Subset_Data;

Now the R Code

Here the R equivalent of the SAS %do loop is a for loop. The proc sql step used to generate a unique list of car classes is replaced simply by unique(mpg$class).

library(ggplot2)
data(mpg)
for (var in unique(mpg$class)) {
  assign(paste("dataset", var, sep = "_"), mpg[which(mpg$class == var), ])
}


With the SAS code, the naming of the datasets was handled by loading a dataset and passing the value for a specific observation to the macro. In R, the assign function handles naming the dataset generated by the code mpg[which(mpg$class == var), ].
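A more idiomatic base-R alternative to assign() is split(), which returns all of the subsets at once as a named list instead of scattering objects through the global environment; a sketch using the built-in mtcars dataset so the example is self-contained:

```r
data(mtcars)

# One data frame per number of cylinders, collected in a single named list
by_cyl <- split(mtcars, mtcars$cyl)

names(by_cyl)          # "4" "6" "8"
sapply(by_cyl, nrow)   # rows in each subset
```

The list form also plays well with lapply(), so a model or summary can be run on every subset in one further line.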

UTHSC and Memphis VA Key Participants in SPRINT MIND Study Showing Link Between Low Blood Pressure and Reduced Risk of Mild Cognitive Impairment

Leaders of the Memphis sites for the national SPRINT MIND clinical trial are, from left, Karen C. Johnson, MD, MPH; William Cushman, MD; Barry Wall, MD; Catherine Womack, MD; Linda Nichols, PhD; and Jennifer Martindale-Adams, EdD. (Photo by Connor Bran/UTHSC)


Researchers at the University of Tennessee Health Science Center (UTHSC) and the Memphis Veterans Affairs (VA) Medical Center were part of the SPRINT MIND (Systolic Blood Pressure Intervention Trial Memory and Cognition IN Decreased Hypertension) multisite clinical trial, which released study findings today showing that intensive lowering of blood pressure reduced the risk of mild cognitive impairment (MCI), a known risk factor for dementia.

Mild cognitive impairment causes difficulty with cognition, thinking, remembering, and reasoning that is greater than expected with normal aging. Dementia is a more severe form of loss in cognitive functions that reduces a person’s ability to perform everyday tasks. Hypertension, or high blood pressure, is very common in adults age 50 and older and is a leading risk factor for heart disease, stroke, and kidney failure. A growing body of research has also identified hypertension as a potentially modifiable risk factor for MCI and dementia.

Launched in 2010 by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH), SPRINT enrolled more than 9,300 adults age 50 and older with hypertension who were at a high risk for cardiovascular disease. They were recruited from approximately 102 medical centers and clinical practices throughout the United States and Puerto Rico.

Memphis hosts two SPRINT study sites, one at UTHSC and one at the Memphis VA Medical Center. UTHSC’s site, which followed 175 participants, was led by Karen C. Johnson, MD, MPH, principal investigator, College of Medicine Endowed Professor in Women’s Health, and professor of Preventive Medicine at UTHSC, and Catherine Womack, MD, associate professor in the Department of Preventive Medicine and co-chief of the Division of Internal Medicine in the College of Medicine at UTHSC.

The Memphis VA Medical Center, which served as a VA Network hub and clinical site, followed 1,660 participants at 25 VA medical centers within the SPRINT Veterans Affairs Clinical Center Network (CCN). William Cushman, MD, chief of Preventive Medicine at the Memphis VA, and professor of Preventive Medicine, Medicine, and Physiology at UTHSC, serves as the principal investigator for the VA Network. Dr. Barry Wall, also from the Memphis VA, is the co-principal investigator for the SPRINT VA Clinical Center Network (CCN) and principal investigator for the VA Memphis SPRINT clinical site, which recruited 80 veterans for SPRINT. Linda Nichols, PhD, professor, and Jennifer Martindale-Adams, EdD, associate professor in the Department of Preventive Medicine at UTHSC, were VA CCN consultants and also co-principal investigators for the SPRINT MIND study.

The SPRINT MIND study, an essential component of the umbrella SPRINT study, aimed to address whether aggressive blood pressure control would also reduce the risk of developing dementia and cognitive impairment. The study results show that treating blood pressure to a goal of less than 120 mm Hg did not significantly reduce the risk of dementia, but did significantly reduce the risk of developing MCI. The authors of the SPRINT MIND study conclude that the dementia result may have been due to fewer cases of dementia than expected occurring during the study.

“The SPRINT MIND study has shown for the first time that intensive control of blood pressure in older people significantly reduced the risk of developing mild cognitive impairment, a precursor of early dementia,” said Dr. Johnson, who also served as the vice chair of the National Steering Committee for the SPRINT study. “This is a very important finding, as it may reduce concerns that many clinicians had that lower systolic blood pressure in older persons might be harmful to their brain.”

The Alzheimer’s Association has agreed to fund additional follow-up of SPRINT MIND participants in the hope that sufficient dementia cases will accrue, allowing for a more definitive statement on these study outcomes. Dr. Nichols is encouraged by the association’s commitment to help provide more conclusive study results.

“We are thrilled that the Alzheimer’s Association will be working with us to continue to follow SPRINT MIND participants,” Dr. Nichols said. “We may be able to determine if intensive blood pressure control will reduce dementia in addition to mild cognitive impairment.”

In August 2015, the SPRINT trial was stopped earlier than planned when the beneficial effects of intensive blood pressure management on mortality and cardiovascular disease were discovered. The SPRINT MIND findings provide promise that individuals can take steps to lower their risk of mild cognitive impairment and dementia, and it could be as easy as lowering their blood pressure.

“The fact that cognition and dementia were not worsened and there were even some improvements is very encouraging in light of the impressive improvement in cardiovascular outcomes with intensive blood pressure lowering in SPRINT,” Dr. Cushman said.

“These findings provide hope to anyone who is concerned about developing memory problems,” Dr. Martindale-Adams said.

SPRINT study findings have already had a worldwide impact on how people define hypertension and how doctors treat it. The American College of Cardiology and the American Heart Association published new blood pressure guidelines in 2017 based on SPRINT data. The SPRINT MIND study results were reported in the Jan. 28, 2019 edition of the Journal of the American Medical Association.

This is a reprint of the original article by Sarah Ashley Fenderson.

Two Teams of Multi-State Investigators Chosen as Collaborative Health Disparities Research Award Winners


The Delta Clinical and Translational Science Consortium has chosen two research teams as winners of the 2018 Collaborative Research Network (CORNET) Awards in Health Disparities Research. The CORNET Awards in Health Disparities Research were created by leaders within the consortium to stimulate innovative, interdisciplinary, team-based health disparities research involving investigators from the University of Tennessee Health Science Center (UTHSC), Tulane University, and the University of Mississippi Medical Center (UMMC). The purpose of the awards is to provide seed funding, up to $75,000 per project, to collaborative research teams working to combat regional health inequities faced by those living in the Delta South.

“Through the work that the Delta Consortium is doing, we know there is an increased interest among our researchers to find solutions to complex health issues faced by people in our region,” said Steven R. Goodman, PhD, vice chancellor for Research at UTHSC. “The high response to the CORNET Awards opportunity and the potential impact the two winning projects will have for our community members is overwhelming. I congratulate all the selected collaborative team members.”

Continue reading…