Comment by gjf

4 hours ago

Author here; I think I understand where you might be coming from. I find the functional nature of R combined with pipes incredibly powerful and elegant to work with.

OTOH, in a pipeline you're mutating/summarising/joining a data frame, and it's really difficult to look at it and keep track of what state the data is in. I try my best to write in a way that lets you understand the state of the data (hence the tables I spread throughout the post), but I do acknowledge it can be inscrutable.
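
To make the "what state is the data in" problem concrete, here is a minimal base R sketch that names each intermediate result so it can be inspected. The `scores` and `labels` data frames are hypothetical toy data, not taken from the post, and the summarise-then-join sequence is only an assumed stand-in for a real pipeline.

```r
# Hypothetical toy data, chosen only for illustration
scores <- data.frame(group = c("a", "a", "b", "b"),
                     value = c(1, 3, 5, 7))
labels <- data.frame(group = c("a", "b"),
                     label = c("control", "treated"))

# Step 1: summarise to one row per group
group_means <- aggregate(value ~ group, data = scores, FUN = mean)
str(group_means)  # inspect the shape after summarising

# Step 2: join the group labels onto the summary
labelled <- merge(group_means, labels, by = "group")
str(labelled)     # inspect the shape after joining
```

The same steps could be chained in one pipeline; naming them trades brevity for the ability to check each state with `str()` or `head()`.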

A "pipe" is simply a composition of functions. Tidyverse adds a different syntax for doing function composition, using the pipe operator, which I don't particularly like. My general objection to Tidyverse is that it tries to reinvent everything but the end result is a language that is less practical and less transparent than standard R.
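
As a rough illustration of the "composition of functions" point, the sketch below computes the same value three ways: nested calls, the native `|>` pipe (assumes R 4.1 or later), and an explicit `compose()` helper. The helper and the input vector `x` are hypothetical names made up for this example; they are not part of base R or of the post.

```r
# Hypothetical input, chosen only for illustration
x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Native pipe (assumes R >= 4.1): same calls, read left to right
piped <- x |> sqrt() |> mean() |> round(1)

# Explicit composition: apply each function in turn to the running value
compose <- function(...) {
    fs <- list(...)
    function(v) Reduce(function(acc, f) f(acc), fs, v)
}
sqrt_mean_round <- compose(sqrt, mean, function(v) round(v, 1))
composed <- sqrt_mean_round(x)
```

All three expressions evaluate the same chain of functions; only the surface syntax differs.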

  • Can you rewrite some of those snippets in standard R w/o Tidyverse? Curious what it would look like

    • I didn't rewrite the whole thing. But here's the first part. It uses the lattice package for graphics, which is standard in R.

          library(lattice)  # provides histogram(); ships with R but is not attached by default
          
          population_data <- data.frame(
              uniform = runif(10000, min = -20, max = 20),
              normal = rnorm(10000, mean = 0, sd = 4),
              binomial = rbinom(10000, size = 1, prob = .5),
              beta = rbeta(10000, shape1 = .9, shape2 = .5),
              exponential = rexp(10000, .4),
              chisquare = rchisq(10000, df = 2)
          )
          
          histogram(~ values|ind, stack(population_data),
                    layout = c(6, 1),
                    scales = list(x = list(relation="free")),
                    breaks = NULL)
          
          # Draw one sample from a column; return its mean and standard deviation
          take_random_sample_mean <- function(data, sample_size) {
              x <- sample(data, sample_size)
              c(mean = mean(x), sd = sd(x))
          }
          
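          # 2 x 6 x 20000 array: statistic x distribution x replicate
          sample_statistics <- replicate(20000, sapply(population_data, take_random_sample_mean, 60))
          
          # One row per replicate, one column per distribution
          sample_mean <- as.data.frame(t(sample_statistics[1, , ]))
          sample_sd <- as.data.frame(t(sample_statistics[2, , ]))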
          
          histogram(~values|ind, stack(sample_mean["uniform"]))
          histogram(~values|ind, stack(sample_mean["binomial"]))
          
          histogram(~values|ind, stack(sample_mean), layout = c(6, 1),
                    scales = list(x = list(relation="free")),
                    breaks = NULL)

    • I mean, for the main simulation I would do it like this:

          set.seed(10)
          n <- 10000; samp_size <- 60
          df <- data.frame(
              uniform = runif(n, min = -20, max = 20),
              normal = rnorm(n, mean = 0, sd = 4),
              binomial = rbinom(n, size = 1, prob = .5),
              beta = rbeta(n, shape1 = .9, shape2 = .5),
              exponential = rexp(n, .4),
              chisquare = rchisq(n, df = 2)
          )
          
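          # Sample samp_size rows (same rows for every column); return the column means
          sf <- function(df, samp_size) {
              sdf <- df[sample.int(nrow(df), samp_size), ]
              colMeans(sdf)
          }
          
          # 20000 x 6 matrix: one row per replicate, one column per distribution
          sim <- t(replicate(20000, sf(df, samp_size)))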
      

      I am old, so I do not like tidyverse either -- I concede it is a matter of personal preference, though. (Personally, I do not agree with the lattice vs ggplot comment, for example.)