OR in an OB World: July 2015

Tuesday, July 28, 2015

ORiginals - Videos About Research

ORiginals is a YouTube channel co-hosted by Dr. Banafsheh Behzad (@banafsheh_b) of CSU Long Beach and my colleague Dr. David Morrison (@drmorr0). They present short (five or six minute) videos featuring researchers describing their research to a general (non-expert) audience. Their tag line is "Outstanding research in everyday language", and I think the first two installments have lived up to that mantra.

The first two videos, by Dr. Behzad and the net-biquitous Dr. Laura McLay (@lauramclay) of the University of Wisconsin, fall into the category of operations research. The aim of the channel, however, is more general. Quoting Dr. Behzad:

The goal of ORiginals is to promote science and engineering topics among the general public, using everyday language. We are featuring a diverse selection of scientists doing cutting-edge research. This is the first season of ORiginals and even though we aren't specifically OR/MS-focused, we'll have a slight bias in that direction with our guest selection, as David and I are both OR people.

If you're interested in seeing quality research explained in lay terms, I highly recommend subscribing to the channel. If you're doing scientific/engineering research that has measurable impact (or the potential for measurable impact) in the real world (sorry, boson-chasers), and you'd like to spread the gospel, I suggest you contact one of the co-hosts. (They're millennials, so a DM on Twitter is probably more effective than an email message.)

Saturday, July 25, 2015

Shiny Hack: Vertical Scrollbar

I bumped into a scrolling issue while writing a web-based application in Shiny, using the shinydashboard package. Actually, there were two separate problems.

1. The browser apparently cannot discern page height. In Firefox and Chrome, this resulted in vertical scrollbars that could scroll well beyond the bottom of a page. That's mildly odd, but not a problem as far as I'm concerned. In Internet Exploder, however, the page height was underestimated, and as a result in some cases it was not possible to reach the bottom of the page (at least not with the vertical scrollbar).
2. In Internet Exploder only, the viewport scrollbar, on the right side of the window, behaves erratically. If I click on the "elevator car" (handle) while it is at the top of the bar, it jumps to the bottom of the track, and the spot where I clicked gains a duplicate copy of the up arrow icon that appears just above the handle. If the handle is at the bottom of the bar, it behaves symmetrically. The down arrow icon on the vertical scrollbar lets you scroll downward, but not fully to the bottom of the page.

I have only seen the second problem on one machine, so I don't know if it is specific to a particular version of IE, but the first problem was reported by two different users (and I saw it myself).

As a kludge to get around the first problem, which in my app is triggered by extensive help text (and some input controls) in the sidebar that makes the sidebar taller than the main body, I decided to introduce a separate vertical scrollbar in the sidebar. That turned out to be tricky, or at least I could not find an easy, documented method. I thought I would share the code that ultimately did the job for me. It goes in the ui.R file.

  dashboardSidebar(
    tags$head(
      tags$style(HTML("
                      .sidebar { height: 90vh; overflow-y: auto; }
                      " )
      )
    ),
    ...

Created by Pretty R at inside-R.org

The height: 90vh style attribute sets the height of the sidebar at 90% of the viewport height, so that it adjusts automatically if the user maximizes or resizes the window, opens or closes a tool bar, etc. You need to pick a percentage that works for your particular application. Make it too large and the inability to scroll to the bottom of the sidebar will persist. Make it too small and the sidebar will be noticeably shorter than the main body, leaving a gap at the bottom of the sidebar (and introducing a vertical scrollbar when the entire sidebar is already visible).
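Since the right percentage depends on the application, it can help to keep it in one place. Here is a minimal sketch under my own assumptions: the helper name sidebar_scroll_css is hypothetical (it is not part of shiny or shinydashboard), and the 85 is just an example value. The guarded part at the end builds the sidebar only if both packages are installed.

```r
# Hypothetical helper (not part of shiny or shinydashboard): build the CSS
# rule that caps the sidebar height at a given percentage of the viewport.
sidebar_scroll_css <- function(vh = 90) {
  stopifnot(is.numeric(vh), length(vh) == 1, vh > 0, vh <= 100)
  sprintf(".sidebar { height: %svh; overflow-y: auto; }", format(vh))
}

css <- sidebar_scroll_css(85)  # 85 is an arbitrary example value
print(css)

# In ui.R, the rule would be injected the same way as in the snippet above.
if (requireNamespace("shiny", quietly = TRUE) &&
    requireNamespace("shinydashboard", quietly = TRUE)) {
  sidebar <- shinydashboard::dashboardSidebar(
    shiny::tags$head(shiny::tags$style(shiny::HTML(css)))
    # ... the sidebar's usual contents would follow here
  )
}
```

Changing the single argument then adjusts the sidebar height everywhere the rule is used.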

Three last notes on the scrolling issue:

In my application, the scrolling problem only appeared on pages where the sidebar was taller than the main body (as far as I know).
Although the vertical scrollbar in IE is balky, scrolling via the mouse wheel (if you have one) or the arrow keys seems to work fine.
This is as yet untested on Safari.

Thursday, July 23, 2015

Autocorrupt in R

You know that "autocomplete" feature on your smart phone or tablet that occasionally (or, in my case, frequently) turns into an "autocorrupt" feature? I just ran into it in an R script.

I wrote a web-based application for a colleague that lets students upload data, run a regression, ponder various outputs and, if they wish, export (download) selected results. In the server script, I created an empty list named "export". As users generated various outputs, they would be added to the list for possible download (to avoid having to regenerate them at download time). For instance, if the user generated a histogram of the residuals, then the plot would be stored in export$hist. Similarly, if the user looked at the adjusted R-squared, it would be parked in export$adjr2.

All was well until, in beta testing, I bumped into a bug involving the p-value for the F test of overall fit (you know, the test where failure to reject the null hypothesis would signal that your model contended for the worst regression model in the history of statistics). Rather than getting a single number between 0 and 1, in one test it printed out as a vector of numbers well outside that range. Huh???

I beat my head against an assortment of flat surfaces before I found the bug. The following chunk of demonstration code sums it up.

export <- list()               # create an empty export list
print(export$f)                # result: NULL
export$fitted <- c(2, 3, 1, 7) # (simulated) fitted values
print(export$f)                # result: [1] 2 3 1 7


The intent was to store the p-value of the test of overall fit in export$f, and the fitted values in export$fitted. If the user never checked the F test, I wanted export$f to be NULL, which would signal the export subroutine to skip it. Instead, the export subroutine autocompleted export$f (which did not exist) to export$fitted (which did exist) and spat out the mystery vector. (The R documentation calls this partial matching: the $ operator accepts any unique prefix of an entry's name.) There are multiple ways to avoid the bug, the simplest being to rename export$f to something like export$fprob, where "fprob" is not a prefix of the name of any other entry of export.
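Another way around the bug is to index with double brackets, which match names exactly by default, instead of $. The following is a minimal sketch that reuses the export list from the demonstration code; the variable names via_dollar, via_brackets and warned are mine, purely for illustration. The last part shows the warnPartialMatchDollar option, which asks R to warn whenever $ falls back on a partial match.

```r
export <- list()                 # create an empty export list
export$fitted <- c(2, 3, 1, 7)   # (simulated) fitted values

# `$` matches partially: "f" is a unique prefix of "fitted"
via_dollar <- export$f
# `[[` with a character index matches exactly by default
via_brackets <- export[["f"]]

print(via_dollar)                # the fitted values, not NULL
print(is.null(via_brackets))     # TRUE: there is no entry named exactly "f"

# Optionally ask R to warn whenever `$` uses a partial match
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ export$f; FALSE }, warning = function(w) TRUE)
options(old)                     # restore the previous setting
print(warned)
```

With exact matching, a missing entry comes back as NULL, which is the signal the export subroutine was looking for in the first place.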

I do my R coding inside RStudio, which provides autocompletion suggestions. Somewhere along the line, I think I came across the fact that the R interpreter autocompletes some things. It never occurred to me that this would happen when a script ran. When running commands interactively, I suppose the autocomplete feature saves some keystrokes. That's not generally an issue when running scripts, so I don't know why autocomplete is not turned off when "sourcing" a script.

At any rate, let the betting commence on how long it will take me to forget this (and trip over it again).

Thursday, July 2, 2015

Tabulating Prediction Intervals in R

I just wrapped up (knock on wood!) a coding project using R and Shiny. (Shiny, while way cool, is incidental to this post.) It was a favor for a friend, something she intends to use teaching an online course. Two of the tasks, while fairly mundane, generated code that was just barely obscure enough to be possibly worth sharing. It's straight R code, so you need not use (or have installed) Shiny to use it.

The first task was to output, in tabular form, the coefficients of a linear regression model, along with their respective confidence intervals. The second task was to output, again in tabular form, the fitted values, confidence intervals and prediction intervals for the same model. Here is the function I wrote to do the first task (with Roxygen comments):

#'
#' Summarize a fitted linear model, displaying both coefficient significance
#' and confidence intervals.
#'
#' @param model an instance of class lm
#' @param level the confidence level (default 0.95)
#'
#' @return a matrix combining the coefficient summary and confidence intervals
#'
model.ctable <- function(model, level = 0.95) {
  cbind(summary(model)$coefficients, confint(model, level = level))
}

To demonstrate its operation, I'll generate a small sample with random data and run a linear regression on it.

x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)

I'll generate the coefficient table using confidence level 0.9, rather than the default 0.95, for the coefficients.

model.ctable(m, level = 0.9)

The output is as follows:

             Estimate Std. Error   t value     Pr(>|t|)       5 %      95 %
(Intercept)  6.039951  0.2285568  26.42648 3.022477e-15  5.642352  6.437550
x            3.615331  0.2532292  14.27691 6.763279e-11  3.174812  4.055850
y           -5.442428  0.3072587 -17.71285 2.156161e-12 -5.976937 -4.907918

The code for the second table (fits, confidence intervals and prediction intervals) is a bit longer:

#'
#' Compute a table of fitted values, confidence intervals and
#' prediction intervals from a regression model.
#'
#' @param model a fitted regression model
#' @param level the desired confidence level (default 0.95)
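#' @param names the names to assign to the columns, given in the
#' original column order (fitted, lower c.i., upper c.i., lower p.i.,
#' upper p.i.); they are resequenced along with the columns if necessary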
#' @param order the order in which to list the columns
#' (1 = fitted, 2 = lower c.i. limit, 3 = upper c.i. limit,
#' 4 = lower p.i. limit, 5 = upper p.i. limit)
#'
#' @return a matrix with one row per observation and five
#' columns (fitted value, lower/upper c.i. bounds, lower/upper
#' p.i. bounds) in the order specified by the user
#'
intervals <- function(model,
                      level = 0.95,
                      names = c("Fitted", "CI Low", "CI High",
                                "PI Low", "PI High"),
                      order = c(4, 2, 1, 3, 5)) {
  # generate fits and confidence intervals
  temp <- predict(model,
                  interval = "confidence",
                  level = level)
  # generate fits and prediction intervals (suppressing
  # the warning that predictions on current data refer
  # to future responses)
  temp2 <- suppressWarnings(
    predict(model,
            interval = "prediction",
            level = level)
  )
  # drop the redundant fit column
  temp2 <- temp2[,2:3]
  # merge the tables and reorder the columns
  temp <- cbind(temp, temp2)[, order]
  # rename the columns
  colnames(temp) <- names[order]
  temp
}

Here is the call with default arguments (using head() to limit the amount of output):

head(intervals(m))

The output is this:

      PI Low     CI Low     Fitted   CI High   PI High
1 -0.7928115 0.65769280  1.5196870  2.381681  3.832185
2  7.9056270 9.40123642 10.1928094 10.984382 12.479992
3  4.9125024 6.61897662  7.1149000  7.610823  9.317298
4  7.3386447 8.66123993  9.7406923 10.820145 12.142740
5 -1.4295587 0.05464529  0.8637503  1.672855  3.157059
6  4.1962493 5.84893725  6.4156619  6.982387  8.635074

Finally, I'll run it again, changing the confidence level to 0.9, tweaking the column headings a bit, and reordering them:

head(intervals(m,
               level = 0.9,
               names = c("Fit", "CI_l", "CI_u", "PI_l", "PI_u"),
               order = 1:5
               )
     )

The output is:

         Fit      CI_l      CI_u       PI_l      PI_u
1  1.5196870 0.8089467  2.230427 -0.3870379  3.426412
2 10.1928094 9.5401335 10.845485  8.3069584 12.078660
3  7.1149000 6.7059962  7.523804  5.2989566  8.930843
4  9.7406923 8.8506512 10.630733  7.7601314 11.721253
5  0.8637503 0.1966188  1.530882 -1.0271523  2.754653
6  6.4156619 5.9483803  6.882943  4.5856891  8.245635
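As a sanity check on tables like these, the two predict() calls should return identical fitted values (which is why one copy of the fit column can be dropped), and each prediction interval should strictly contain the corresponding confidence interval. Here is a minimal, self-contained sketch with an arbitrary seed of my choosing, so its data are not the data behind the output above:

```r
set.seed(123)  # arbitrary seed, for illustration only
x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)

# confidence and prediction intervals at the same level
ci <- predict(m, interval = "confidence", level = 0.9)
pr <- suppressWarnings(predict(m, interval = "prediction", level = 0.9))

# the fit column is the same in both matrices
same_fit <- isTRUE(all.equal(ci[, "fit"], pr[, "fit"]))
# every prediction interval is wider than its confidence interval
pi_wider <- all(pr[, "lwr"] < ci[, "lwr"]) && all(pr[, "upr"] > ci[, "upr"])

print(c(same_fit = same_fit, pi_wider = pi_wider))
```

The prediction interval is wider because it adds the residual variance of a new observation to the uncertainty in the estimated mean.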

By the way, all syntax highlighting was created by Pretty R at inside-R.org.