Tidyverse summary

12/15/2023

The haven labelled class was created as an in-between for importing data from other languages whose data types don't have a one-to-one relationship with R. This is from a tidyverse blog post about the haven labelled class of variables: "The goal of haven is not to provide a labelled vector that you can use everywhere in your analysis. The goal is to provide an intermediate data structure that you can convert into a regular R data frame." For the time being, I recommend converting the variables with value labels to factors with as_factor(df), which can be run on the entire data frame.
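As a minimal sketch of the as_factor() conversion mentioned above: the tibble below and its `sex` column are hypothetical, built by hand with haven::labelled() to stand in for imported data. It assumes the haven and tibble packages are installed.

```r
library(haven)

# Hypothetical data: `sex` is stored as numeric codes with value labels,
# the way a column typically looks after importing from SPSS, Stata or SAS.
df <- tibble::tibble(
  id  = 1:4,
  sex = labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))
)

# as_factor() on a data frame converts the labelled columns to factors
# and leaves the other columns as they are.
df_factors <- as_factor(df)

class(df$sex)           # a haven_labelled vector before conversion
levels(df_factors$sex)  # factor levels taken from the value labels
```

After this step the data frame holds ordinary factors, so it can be passed to summary or modelling functions that do not know about the labelled class.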

I need to update the documentation to be clearer: the statement "Variable label attributes from the data set are automatically printed" does not, in fact, apply the value labels. In the case of a haven_labelled data set (i.e. a data frame with value labels), the class was never meant to be used in analysis or data exploration. Thank you for the thoughtful post.

1) We can use adorn_totals from the janitor package. In janitor the totals normally come after the group that is totaled, but we can use the name '0' in place of Total and sort so that the totals sort first, then replace '0' with the word Total at the end. The filter removes rows that have multiple fields with the word Total.

    # The _at() variants directly support strings:
    starwars %>%
      summarise_at(c("height", "mass"), mean, na.rm = TRUE)
    #> # A tibble: 1 × 2
    #>   height  mass
    #>    <dbl> <dbl>
    #> 1   174.  97.3
    # ->
    starwars %>%
      summarise(across(c("height", "mass"), ~ mean(.x, na.rm = TRUE)))
    #> # A tibble: 1 × 2
    #>   height  mass
    #>    <dbl> <dbl>
    #> 1   174.  97.3

    # You can also supply selection helpers to _at() functions but you have
    # to quote them with vars():
    starwars %>%
      summarise_at(vars(height:mass), mean, na.rm = TRUE)
    #> # A tibble: 1 × 2
    #>   height  mass
    #>    <dbl> <dbl>
    #> 1   174.  97.3
    # ->
    starwars %>%
      summarise(across(height:mass, ~ mean(.x, na.rm = TRUE)))
    #> # A tibble: 1 × 2
    #>   height  mass
    #>    <dbl> <dbl>
    #> 1   174.  97.3

    # The _if() variants apply a predicate function (a function that
    # returns TRUE or FALSE) to determine the relevant subset of
    # columns. Here we apply mean() to the numeric columns:
    starwars %>%
      summarise_if(is.numeric, mean, na.rm = TRUE)
    #> # A tibble: 1 × 3
    #>   height  mass birth_year
    #>    <dbl> <dbl>      <dbl>
    #> 1   174.  97.3       87.6
    starwars %>%
      summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
    #> # A tibble: 1 × 3
    #>   height  mass birth_year
    #>    <dbl> <dbl>      <dbl>
    #> 1   174.  97.3       87.6

    by_species <- iris %>%
      group_by(Species)

    # If you want to apply multiple transformations, pass a list of
    # functions. When there are multiple functions, they create new
    # variables instead of modifying the variables in place:
    by_species %>%
      summarise_all(list(min, max))
    #> # A tibble: 3 × 9
    #>   Species    Sepal.Length_fn1 Sepal.Width_fn1 Petal.Length_fn1
    #>   <fct>                 <dbl>           <dbl>            <dbl>
    #> 1 setosa                  4.3             2.3              1
    #> 2 versicolor              4.9             2                3
    #> 3 virginica               4.9             2.2              4.5
    #> # ℹ 5 more variables: Petal.Width_fn1, Sepal.Length_fn2,
    #> #   Sepal.Width_fn2, Petal.Length_fn2, Petal.Width_fn2
    # ->
    by_species %>%
      summarise(across(everything(), list(min = min, max = max)))
    #> # A tibble: 3 × 9
    #>   Species    Sepal.Length_min Sepal.Length_max Sepal.Width_min
    #>   <fct>                 <dbl>            <dbl>           <dbl>
    #> 1 setosa                  4.3              5.8             2.3
    #> 2 versicolor              4.9              7               2
    #> 3 virginica               4.9              7.9             2.2
    #> # ℹ 5 more variables: Sepal.Width_max, Petal.Length_min,
    #> #   Petal.Length_max, Petal.Width_min, Petal.Width_max

The .funs argument can be a named or unnamed list. If a function is unnamed and the name cannot be derived automatically, a name of the form "fn#" is used. Similarly, vars() accepts named and unnamed arguments. If a variable in .vars is named, a new column by that name will be created. Name collisions in the new columns are disambiguated using a unique suffix.

The names of the new columns are derived from the names of the input variables and the names of the functions. If there is only one unnamed function (i.e. if .funs is an unnamed list of length one), the names of the input variables are used to name the new columns. For _at functions, if there is only one unnamed variable (i.e., if .vars is of the form vars(a_single_column)) and .funs has length greater than one, the names of the functions are used to name the new columns. Otherwise, the new names are created by concatenating the names of the input variables and the names of the functions, separated with an underscore "_".
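The naming rules above can be checked with a small sketch on the built-in iris data. The function labels `lo` and `hi` are arbitrary names chosen for illustration, and the example assumes dplyr is installed.

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# One unnamed function: the input variable names are reused.
one_fn <- by_species %>%
  summarise_at(vars(Sepal.Length, Sepal.Width), mean)
names(one_fn)

# One unnamed variable, several named functions: the function names are used.
one_var <- by_species %>%
  summarise_at(vars(Sepal.Length), list(lo = min, hi = max))
names(one_var)

# Several variables and several named functions: names are joined
# as "<variable>_<function>".
both <- by_species %>%
  summarise_at(vars(Sepal.Length, Sepal.Width), list(lo = min, hi = max))
names(both)
```

Naming the functions in the list is what keeps the "fn#" placeholders out of the result, which is why the across() rewrites above pass list(min = min, max = max).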
