Hi! Welcome to my data journalism R cheat sheet for cleaning and wrangling data. You may have seen other, package-based R cheat sheets, and journalists have put out cheat sheets before, like MaryJo Webster’s R cheat sheet. Hat tip to her and her amazing collection of data journalism resources! This is what I use for quick reference to data cleaning functions.

I use this cheat sheet as a roadmap to functions I will likely encounter and use, so I hope it can save you some time googling around. Most of the functions listed here are ones I collected while cleaning data at the NICAR data library and for the Accountability Project at the Investigative Reporting Workshop.

What’s this for?

Even though I use data cleaning functions regularly, the syntax or function names still get fuzzy every once in a while. So I have organized my most frequently used functions by the goal they serve in data processing, along with pitfalls to watch for when using them. I have also included useful links to discussions of these functions by other people. Thanks, internet!

A word about programming

I always think the most important thing for understanding functions and tapping into their strengths is to understand what you’re dealing with, i.e. data structures. When I can’t write clean code and get it to work, I take out my pen and pad and jot down what I want to achieve and the end products I wish to have. This helps me get a grasp of the objects at hand, break them down into the smallest units possible, and see how those units can form the end products I want. For example, are you trying to modify (rewrite) a column, or add a new column based on an existing one? Columns are vectors zipped together, made up of individual elements (that’s probably why length(df) returns the number of columns). If you’re modifying a column, you can assign a new vector to df$col. If you’re adding a column, you’ll probably use mutate() from the dplyr package on the data frame itself.
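To make the distinction concrete, here is a minimal sketch with a made-up two-column data frame (the names and values are just for illustration):

```r
library(dplyr)

df <- data.frame(name = c("Ada", "Grace"), salary = c(100, 200))

# Modifying a column in place: assign a new vector of the same length
df$salary <- df$salary * 1.1

# Adding a new column derived from an existing one with dplyr::mutate()
df <- df %>% mutate(salary_k = salary / 1000)
```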

There is, of course, more than one way to skin a cat! For example, picture a jumbled field of last name, first name. If you wish to separate the names into different columns, you can either use the separate() function to split up the column on a certain character, or use regular expressions to capture whatever comes before the comma with str_match(x, "^(.+),")[,2] and whatever comes after with str_match(x, ",(.+)$")[,2]. Pick and choose as you wish!
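Both approaches might look like this on a toy vector of “last, first” names (the data here is invented):

```r
library(tidyr)
library(stringr)

names_df <- data.frame(full_name = c("Doe, Jane", "Smith, John"))

# Option 1: split on the comma with tidyr::separate()
separated <- separate(names_df, full_name,
                      into = c("last", "first"), sep = ",\\s*")

# Option 2: capture groups with stringr::str_match()
last  <- str_match(names_df$full_name, "^(.+),")[, 2]
first <- str_trim(str_match(names_df$full_name, ",(.+)$")[, 2])
```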

This cheat sheet doesn’t cover the nuts and bolts of R, for which I highly recommend Andrew Ba Tran’s amazing tutorial Journalism with R.

Something about libraries

Since other people have spent all this time writing functions and packaging them, we use their functions by loading the packages (whose names may take developers a lot of time to come up with) first in the R console or scripts. It’s important to declare the packages you use at the top of your R Markdown or R script files, especially if you need to knit the file (to generate a standalone, readable HTML from the Rmd); otherwise it will throw a bunch of errors. This also assumes that you have the packages installed already. To install a new package, run install.packages("packagename") first, or use the Packages pane (by default, in the lower right-hand pane) to manage your packages. If a package is not on CRAN yet and can’t be installed with install.packages(), install it from GitHub with the remotes package, e.g. remotes::install_github("r-lib/remotes"). The double colons let you run a function from a certain package without loading it: {package_name}::{function_name}. (You have probably figured out that remotes itself is on CRAN, so it can be installed with install.packages('remotes').) The tidyverse package solves most problems related to data cleaning. It includes all these packages:

library(tidyverse)
tidyverse_packages()
#>  [1] "broom"       "cli"         "crayon"      "dplyr"       "dbplyr"     
#>  [6] "forcats"     "ggplot2"     "haven"       "hms"         "httr"       
#> [11] "jsonlite"    "lubridate"   "magrittr"    "modelr"      "purrr"      
#> [16] "readr"       "readxl"      "reprex"      "rlang"       "rstudioapi" 
#> [21] "rvest"       "stringr"     "tibble"      "tidyr"       "xml2"       
#> [26] "tidyverse"

Something about functions

Functions are just objects. They exist in those packages! So if you want to know more about a function itself, call it by its name, without parentheses, in the console. For the documentation, typing ? followed by the function name will lay out its documentation in the Help window. Try ?str_split. You can also see popups in the console as you type a function name; pressing F1 (Fn + F1) will likewise bring up the function’s help page in the Help pane.
To see how a function was written and make sense of how it’s executed, simply type the name of the function, like str_split.
Further Readings: Introduction to the R Language Functions, Berkeley Workshop

Writing functions and loops

Here’s some function help. The basic syntax of a function is

f <- function(<arguments>) {
## Do something interesting
}
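For instance, a tiny cleaning helper (a hypothetical example, not from any package) might look like this:

```r
# A made-up helper: trims whitespace and upper-cases a character vector
clean_name <- function(x) {
  toupper(trimws(x))
}

clean_name("  main st ")  # "MAIN ST"
```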

Trying to write for loops? R for data science has a great chapter on it.

output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[[i]] <- median(df[[i]])      # 3. body
}

Function execution (order, arguments)

The computer reads scripts in a certain order, and it can get confused when you have many arguments. To help organize things and make your code less confusing for both machines and humans to read, you can wrap code chunks you wish to execute first in () or {}.

A common case: you have a numeric variable x = 1, you wish to do some arithmetic on it, like x + 1, and then use : to generate all the integers between x + 1 and 15. x+1:15 won’t give you the result you want; it returns everything between 2 and 16. That’s because : has higher precedence than +, so R builds the vector 1:15 first and then adds x to every element of that vector. You can solve this by writing (x+1):15 or {x+1}:15, so that R knows to evaluate the bracketed expression first.
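You can verify the precedence behavior right in the console:

```r
x <- 1
x + 1:15     # 1 + (1:15), i.e. 2 through 16 -- the colon wins
(x + 1):15   # 2 through 15 -- the addition happens first
{x + 1}:15   # same result; braces work like parentheses here
```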

There are certain operators in R that streamline code execution and save you some typing. For example, in a magrittr pipe, . represents the object being passed into the function. If you wish to wrap a vector’s elements between other strings, you can use the syntax df$col %>% str_c("before", ., "after") to specify where the piped value goes. Learn more about magrittr’s . placeholder here.
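A small sketch (with an invented vector) of where the piped value lands:

```r
library(magrittr)
library(stringr)

cities <- c("Hartford", "New Haven")

# The dot stands in for the piped-in vector, so you control its position
labeled <- cities %>% str_c("City of ", ., ", CT")
labeled  # "City of Hartford, CT" "City of New Haven, CT"
```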

Data import

Very useful references for understanding data type and data structure.
Andrew’s data import/export tutorial shows you how to import and later export most types of data files. RStudio’s tidyverse cheat sheets also offer a comprehensive view of reading data, and of how you can tap into the functionality of tibbles (a type of data frame) for your tables.

You’re now ready to use functions to solve problems like these:

Better printing & Inspection

function package syntax notes references
Convert row index to column tibble wy <- tibble::rowid_to_column(wy, “id”)
print truncated elements in a vertical layout dplyr wy %>% filter() %>% glimpse()

Dealing with NAs

function package syntax notes references
filter out NA rows dplyr, stats filter(!is.na(colname)) complete.cases(df$col) filter(!is.na(colname)) has the same effect as filter(!is.na(colname)==TRUE); this function returns a dataframe. complete.cases() applies directly to one or more vectors and returns a corresponding logical vector after testing whether ALL specified columns are complete. When it is applied to a single vector, the result equals !is.na(). To get a dataframe whose rows are removed when the column “col” evaluates to NA, see below. https://statisticsglobe.com/na-omit-r-example/
filter out NA fields in vectors stats unique(na.omit(ct[[“column_name”]])) na.omit() attaches an na.action attribute (of class “omit”) to the result, recording which elements were removed https://github.com/irworkshop/accountability_datacleaning/blob/campfin/R_campfin/ct/expends/docs/ct_expends_diary.md
drop rows for columns containing NA values tidyr drop_na(col_name) drop_na() with no column argument is column-agnostic, so rows containing NA values in any column will be dropped! The link shows how to drop NAs for a specific column too. df[complete.cases(df$col),] achieves almost the same as df %>% drop_na(col) https://stackoverflow.com/questions/26665319/removing-na-in-dplyr-pipe https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame
turn values into NA dplyr na_if(df$col, y) OR wv <- wv %>% mutate(city_clean = case_when( city_clean %in% c(“WV”,“WEB BASED”,“A”,“PO BOX”,“ANYWHERE USA”,“VARIES”,“COUNTY”) ~ NA_character_, TRUE ~ as.character(city_clean))) The value to be replaced is not a regex. To replace multiple values with NA, try case_when()
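Pulling the table above together, here is a small sketch on an invented data frame (column names and placeholder values are made up):

```r
library(dplyr)
library(tidyr)

# Invented sample: one missing city, one placeholder city, one missing amount
df <- data.frame(
  city   = c("HARTFORD", NA, "VARIES"),
  amount = c(10, 20, NA)
)

no_na_city <- df %>% filter(!is.na(city))   # drops the NA-city row
kept       <- df %>% drop_na(amount)        # drops rows with NA amount only
recoded    <- df %>% mutate(city = na_if(city, "VARIES"))  # "VARIES" -> NA
```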

Modify Strings

function package syntax notes references
change case stringr, base str_to_upper()/toupper() str_to_lower()/tolower()
ignore case stringr fixed(‘toyota’,ignore_case=TRUE)
replace matching strings in a dataframe base, stringr RETA2016_negative <- mutate_if(as_tibble(reta2016_clean), is.character, str_replace_all, pattern = “J”, replacement = “-1”) la_lobby <- la_lobby %>% mutate(lobbyist_name_clean = str_remove(LobbyistName1, “MRS.\\s|MR.\\s|MS.\\s|MISS\\s|DR.\\s”)) gsub() is the base R version. When trying to replace or remove multiple patterns, join them with the regex “|”. Otherwise, the function matches the first element in the pattern vector against the first element of the string vector, and so on. That behavior lets you in effect “subtract” the content of one column from another: df <- df %>% mutate(statezip = str_remove(df$citystatezip, df$city)). str_replace_all() will replace all the matches while str_replace() will only replace the first match.

https://community.rstudio.com/t/understanding-the-use-of-str-replace-all-with-multiple-patterns/1849

https://stackoverflow.com/questions/29036960/remove-multiple-patterns-from-text-vector-r

https://community.rstudio.com/t/understanding-the-use-of-str-replace-all-with-multiple-patterns/1849
concatenate (Concatenate vectors after converting to character.) base

mutate(ZIP=paste0(“0”, as.character(ZIP))) %>%

mutate(location = paste0(ADDRESS, “,”, CITY, “, CT”, ZIP))
paste0() is different from paste() in that paste() takes a default separator of " " (a space) and paste0() takes "" (no space). A way to remember this: the 0 in paste0 stands for zero separator. http://learn.r-journalism.com/en/mapping/geolocating/geolocating/
concatenate stringr str_c(“Letter”, letters, sep=“:”) str_c() concatenates individual strings. If you pass in vectors and want to concatenate the resulting vector into one string, use collapse = "". One way to remember it: collapse = "" executes after a vector is returned and collapses the vector into a single string.
extract the complete match of a string with regex stringr str_extract(text, “\\d{5}(?:-\\d{4})?”)
extract part of a string stringr str_match(strings, phone) str_match() returns a character matrix. First column is the complete match, followed by one column for each capture group. str_match_all returns a list of character matrices. Use the [,n] index for the nth capture group
View HTML rendering of regular expression match stringr str_view(c(“abc”, “def”, “fgh”), “d|e”, match = FALSE)
add a zero to make it (#) digits base fipsst <- mutate(fipsst, STATE = sprintf(“%03d”, STBRDG)) three digits of code
make strings containing executable R code glue glue(“{col_1}, {col_2}”) glue() evaluates the R code inside the curly braces and interpolates the results into the string; the column names here are placeholders. Separate the parts of the template with whatever literal text you need, such as commas.
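A quick sketch of the multiple-pattern trick and the collapse argument described above (the sample strings are invented):

```r
library(stringr)

names_raw <- c("MR. SMITH", "DR. JONES", "MS. LEE")

# Remove several honorifics at once by joining patterns with "|"
cleaned <- str_remove(names_raw, "MR\\.\\s|MS\\.\\s|DR\\.\\s")
cleaned  # "SMITH" "JONES" "LEE"

# str_c() with collapse glues a whole vector into one string
joined <- str_c(c("a", "b", "c"), collapse = "-")
joined   # "a-b-c"
```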

Index & subsetting

It is very important to note that negative indexing does not work the same way as in Python. In R, the minus sign (-) means to exclude an element, so vector[-1] drops the first element rather than returning the last one; use vector[length(vector)] to access the last element of a vector. R for Data Science has a great chapter on vector and list indexing, with a superb pepper-shaker analogy.
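A quick console check of the difference:

```r
v <- c(10, 20, 30, 40)

v[-1]           # drops the FIRST element: 20 30 40 (not the last, as in Python)
v[length(v)]    # last element: 40
v[-length(v)]   # everything except the last: 10 20 30
```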
function package syntax notes references
Extract every nth element of a vector base a <- 1:120 b <- a[seq(1, length(a), 6)]
get the position of a column named “B” in a data frame/vector base grep(“B”,colnames(df)) - containing “B” grep(“^B$”,colnames(df)) - called B OR which(colnames(df)==“B”) returns a vector of indices of the character strings that contains the pattern https://thomasleeper.com/Rcourse/Tutorials/vectorindexing.html
Get the index of a string in a vector base match(“CONTRIBUTOR”, pa_col_names) match() returns a vector of the positions of (first) matches of its first argument in its second. https://stackoverflow.com/questions/27556353/subset-columns-based-on-list-of-column-names-and-bring-the-column-before-it

Making a dataframe

function package syntax notes references
Take a sequence of vector, matrix or data-frame arguments and combine by columns base, dplyr stations <- cbind(stations, geo) OR bind_cols()
string together two dataframes/vectors vertically by binding rows dplyr, base bind_rows() OR rbind(a = 1, b = 1:3) two dataframes need to have the same number of columns, or the longer vector should be a multiple of short vector’s length
create a dataframe from vectors base, dplyr

tibble(), data.frame(),

tibble(col1 = c(1,2,3), name = c(“A”,“B”,“C”))
tibble() never converts strings to factors; in data.frame(), stringsAsFactors defaulted to TRUE before R 4.0. On the left are the column names, and on the right are the vectors that will be assigned to the columns. Data frames are essentially columns (vectors) bound together.
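A minimal sketch of binding rows and columns (toy data):

```r
library(dplyr)

df <- tibble(id = 1:2, name = c("A", "B"))

# Stack another tibble underneath (columns are matched by name)
more    <- tibble(id = 3, name = "C")
stacked <- bind_rows(df, more)

# Glue columns side by side (inputs must have equal row counts)
side <- bind_cols(df, tibble(score = c(0.5, 0.9)))
```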

Columns

function package syntax notes references
select a column whose name is… dplyr df$colname, df[[“colname”]] OR pull(df, col) pull() can help turn a one-column data frame into a vector. To select multiple columns, use df[c(“col1”, “col2”)]
select a column whose index is n df[[n]] OR pull(df,n) without double brackets, the index will slice the tibble into another tibble
select columns of a dataframe based on certain conditions dplyr my_data %>% select(starts_with(“Petal”)) starts_with() is a select helper from the dplyr family that works with select(), or with vars() when used with mutate_at()/summarize_at(). See the batch processing section for usage of mutate_at().
change one column name base MO_offense <- rename(MO_offense, Off.Age = Offender.Age.at.Time.of.Offense)

define column names

Change all column names to lowercases
base

colnames(bridges17) <- names_field_17 (I have the vector needed)

colnames(bridges17) <-tolower(colnames(bridges17))
Clean column names that contain spaces and such, converting them to lowercase names separated with underscores janitor read_csv(df) %>% clean_names() make_clean_names() operates on character vectors and can be used during data import. e.g. wi_principals <- list.files(principals, pattern = “.xls”, full.names = TRUE) %>% map(read_xls, skip = 3, .name_repair = make_clean_names) https://github.com/sfirke/janitor
Specify column types when reading readr

col_types = cols(
  x = col_double(),
  y = col_date(format = ""),
  z = col_character()
)
https://blog.rstudio.com/2015/04/09/readr-0-1-0/
change to date readr, lubridate

parse_usa_date<-function(x,…) {

parse_date(x,format=“%m/%d/%Y”,…)

}

OR

as_date(x, …)
https://github.com/irworkshop/accountability_datacleaning/blob/campfin/R_campfin/ct/expends/docs/ct_expends_diary.md
convert column types base as.numeric(df$col) as.character(df$col)
convert excel numbers to date janitor excel_numeric_to_date(df$col, date_system = “modern”)

Ordering

function package syntax notes references
sort a vector or factor into a descending or ascending order base sort(x, decreasing = FALSE, …)
Sorting column of a data frame with descending order dplyr arrange(desc(total_spent)) arrange() takes a dataframe as argument
reposition columns by index or column names base data <- data[c(“A”, “B”, “C”)]
Get the top n rows ranked by a variable dplyr df %>% top_n(10) top_n() picks the rows with the highest values of a variable (the last column by default), not simply the first n rows; for the first n rows, use head(df, n)
Get the first 10 and bottom 10 elements, quick inspection utils head()/tail()

Reshape data frame

function package syntax notes references
turn wide tables to long tables tidyr gather() Andrew’s tutorial says it all. https://learn.r-journalism.com/en/wrangling/tidyr_joins/tidyr-joins/
turn long tables to wide tables tidyr spread() https://learn.r-journalism.com/en/wrangling/tidyr_joins/tidyr-joins/
separate one column into two tidyr separate(data, col, into, sep = “[^[:alnum:]] +”, remove = TRUE, convert = FALSE, extra = “warn”, fill = “warn”, …) https://rstudio.com/wp-content/uploads/2019/01/Cheatsheets_2019.pdf#page=9
unite two columns into one tidyr unite() https://rstudio.com/wp-content/uploads/2019/01/Cheatsheets_2019.pdf#page=9
separate elements in a column into rows with each individual element tidyr separate_rows()
Make each element of a list-column into its own row tidyr ia_lobby_cl <- ia_lobby_cl %>% mutate(new_lobbyists = str_split(lobbyists, pattern = “,”)) %>% unnest_longer(new_lobbyists)
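The wide-to-long round trip and separate_rows() can be sketched like this (tiny invented table):

```r
library(tidyr)

wide <- data.frame(state = c("CT", "NY"), y2018 = c(1, 2), y2019 = c(3, 4))

# wide -> long: one row per state-year pair
long <- gather(wide, key = "year", value = "count", y2018:y2019)

# long -> wide: the inverse operation
back <- spread(long, key = "year", value = "count")

# one delimited cell -> several rows
rows <- separate_rows(data.frame(id = 1, names = "A,B,C"), names, sep = ",")
```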

Data table summary

function package syntax notes references
Get all the unique/distinct values of a column dplyr, base unique(df$column) OR distinct(df$column)
count the number of distinct values in a column dplyr n_distinct(df$col, na.rm = TRUE) n_distinct() returns how many unique values there are, not per-value frequencies (see count() below). Many functions have na.rm = T/F arguments; here it’s set to TRUE as a demo, but in many cases you would want F, depending on whether you wish to include the NAs.
A glimpse of your data tibble, base glimpse(df) OR str(df) glimpse() usually offers a better and more complete printout.
find out unique values and frequencies of a vector. janitor, base table()/tabyl() basically gives you a frequency table. See count() for application in a data frame
find out unique values and frequencies of a column in a dataframe. dplyr count(df, column, sort = T) When sort = T, will return the list in descending order
min, max, mean, quantiles base summary()
add a column counting the observations of another column dplyr mtcars %>% add_count(cyl) add_count() is a short-hand for group_by() + add_tally() Also you won’t need mutate()
Get the sums (or means) of rows and columns base rowSums()/colSums(), e.g. filter(wi_lobby, rowSums(!is.na(wi_lobby)) >= 3) This syntax keeps rows that contain at least three non-NA values.
pivot table dplyr pivot_table <- df %>% group_by(column) %>% summarize(mean = mean(another_column), count = n()) This is frequently combined with %>% arrange(desc()), referencing the new summary column you created. n() achieves a similar effect to df %>% count(column).
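A minimal pivot-table sketch using the built-in mtcars data:

```r
library(dplyr)

# One row per cylinder count, with group size and mean mpg,
# sorted by the largest group first
summary_tbl <- mtcars %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg)) %>%
  arrange(desc(n))
```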

ggplot2

function package syntax notes references
geom_point() for geolocating with some groupings ggplot2 geom_point(data=stations, aes( x= lon, y = lat, size=staff, color=DESCRIPTION), fill=“white”, shape=1)

shape=1: hollow circles

shape=2: hollow triangles

shape=3: plus signs

shape=4: X

shape=5: hollow diamonds
http://learn.r-journalism.com/en/mapping/geolocating/geolocating/
display integer only on axes ggplot2 scale_y_continuous(breaks=c(1,3,7,10))

https://stackoverflow.com/questions/15622001/how-to-display-only-integer-values-on-an-axis-using-ggplot2
x axis labels too long! Auto wrapping of labels str_wrap + ggplot2/scales

scale_x_discrete(labels = function(x) str_wrap(x, width = 10))

or

scale_x_discrete(labels = wrap_format(10))
https://stackoverflow.com/questions/21878974/auto-wrapping-of-labels-via-labeller-label-wrap-in-ggplot2

File Paths

function package syntax notes references
List all the files under a directory fs, base

zip_files <- dir_ls(raw_dir, glob = "*.zip", regexp = “expends.+”)

OR

contrib_files <- list.files(raw_dir, pattern = “.txt”, recursive = TRUE, full.names = TRUE)

Recursive is very important! It determines if the search will go deeper into sub-directories.

glob is also an important notion. A wildcard aka globbing pattern (e.g. *.csv) passed on to grep() to filter paths

dir_ls() returns “fs_path” objects (a subclass of named “character” vectors), while list.files() returns plain character vectors. The defaults are also different.
https://github.com/irworkshop/accountability_datacleaning/blob/campfin/R_campfin/pa/expends/pa_expends_diary.md
Make a new directory here, fs

raw_dir <- here(“pa”, “contribs”, “data”, “raw”)

dir_create(raw_dir)
dir_create() is powerful when combined with the here package. here() doesn’t really create the directory https://github.com/r-lib/here
Construct the path to a file from components in a platform-independent way base file.path(‘testdir2’, ‘testdir3’) That way you don’t have to care about “\\” versus “/”
Get the file info base file.info(“your_file”)$mtime This syntax gives you the last modified time of a file

Set operations & joins

function package syntax notes references
Test if an element is in a vector base x %in% valid_city passing in two vectors will give you a logical vector. It is useful in specifying conditions for filtering. Similar to the IN statement in SQL.
Test if an element is not in a vector campfin x %out% valid_city equivalent to !(x %in% valid_city)
find out how many elements of a vector are in or out of another vector campfin count_in(x, y, na.rm = T) count_out(x, y, na.rm = F) prop_in(x, y, na.rm = T) prop_out(x, y, na.rm = T) campfin is a package written by my colleague Kiernan Nicholls at IRW, yay! count_in() returns the number, and prop_in() returns the percentage. The package incorporates many data inspection functionalities wrapped in easy functions like this one. It also does a lot of heavy lifting for normalizing address, city, state, zip and so on.
find out elements in x that are not in y base, dplyr setdiff(x, y) OR anti_join(x, y, by = c(“col1”, “col2”)) setdiff() removes duplicates! Order is important!
join two dataframes based on matching column or columns dplyr ia_lobby_cl <- ia_lobby_cl %>%left_join(zipcodes, by = c(“zip_norm” = “zip”, “city_norm” = “city”)) left_join(), right_join(), inner_join(), full_join(). Order is important. if the columns in two dfs have the same name, the by statement can just be by = “state”. http://www.datasciencemadesimple.com/join-in-r-merge-in-r/ https://learn.r-journalism.com/en/wrangling/tidyr_joins/tidyr-joins/
vlookup - to look for corresponding values in a separate vector or data frame qdapTools lookup(df1$term, df2[c(‘term’,‘key’)]) OR df1$term %l% df2[c(‘term’,‘key’)] The lookup()/%l% (percent-lowercase-L-percent) functions have a lot of restrictions, including on the data frame to be matched against (by default, it should have two columns). Watch out for multiple keys in df2 matching a single term in df1. The example in the documentation is more about reassigning and mapping values than looking up matches; see the documentation for more info. https://www.rdocumentation.org/packages/qdapTools/versions/1.3.3/topics/lookup
Join two dataframes based on fuzzy matching fuzzyjoin stringdist_inner_join()

https://cran.r-project.org/web/packages/fuzzyjoin/README.html

https://www.r-bloggers.com/fuzzy-string-matching-a-survival-skill-to-tackle-unstructured-information/
https://github.com/dgrtwo/fuzzyjoin
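A small join sketch on invented zip-code tables (note the deliberately bad zip to show what anti_join() catches):

```r
library(dplyr)

spending <- data.frame(zip = c("06101", "99999"), amount = c(100, 250))
zips     <- data.frame(zip = c("06101", "06510"),
                       city = c("HARTFORD", "NEW HAVEN"))

# Keep every row of `spending`, attaching city where the zip matches
matched <- left_join(spending, zips, by = "zip")

# Rows of x with no match in y -- handy for spotting bad codes
unmatched <- anti_join(spending, zips, by = "zip")
```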

Conditions

function package syntax notes references
For a set of logical vectors, evaluate whether at least one condition is met (i.e. at least one element is TRUE) base locality_position <- lapply(list, unlist, recursive = T) %>% map(str_detect, “locality”) %>% map_lgl(any) Similar to the “OR” operator. This syntax returns locality_position, i.e. which elements of the list contain the string “locality”.
Change columns based on certain conditions (data type) dplyr mutate_if(is.character, str_to_upper) The mutate_if() variants apply a predicate function (a function that returns TRUE or FALSE) to determine the relevant subset of columns. https://stackoverflow.com/questions/42052078/correct-syntax-for-mutate-if
Test if characters are part of a string base grepl(value,chars, fixed=TRUE) https://stackoverflow.com/questions/10128617/test-if-characters-are-in-a-string
count how many cases satisfy a condition dplyr

sum(pa$STATE != “PA”, na.rm = TRUE)

or

pa %>% filter(STATE != “PA”) %>% count()
Counting cases like this usually relies on tallying a logical vector.
find elements in vector X that are not in Y base x[! (x %in% y)] Be especially cautious with NAs! The rows evaluated to NAs would be retained! https://statisticsglobe.com/setdiff-r-function/ https://www.youtube.com/watch?v=8hSYEXIoFO8
Change variable strings based on conditions dplyr mutate(variable = case_when(variable ==, ~)) case_when() works well for recoding cells to set values; if the new value depends on other variables, if_else() or ifelse() is probably the best bet. http://learn.r-journalism.com/en/mapping/static_maps/static-maps/
Modify character strings based on variable conditions dplyr, base wy <- wy %>% mutate(city_clean = if_else(condition = match_distance <=2, true = city_swap, false = city_raw)) OR ifelse()
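A minimal case_when() sketch on invented city values, echoing the NA-recoding pattern above:

```r
library(dplyr)

df <- data.frame(city = c("HARTFORD", "WEB BASED", "NEW HAVEN"))

# Recode placeholder cities to NA; everything else passes through unchanged
cleaned <- df %>%
  mutate(city_clean = case_when(
    city %in% c("WEB BASED", "VARIES") ~ NA_character_,
    TRUE                               ~ city
  ))
```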

Batch processing

function package syntax notes references
Read multiple csv into one master csv vroom vroom::vroom(files)
Mutate multiple columns at once dplyr aklr <- aklr %>% mutate_at( .vars = vars(ends_with(“address”)), .funs = list(norm = normal_address), add_abbs = usps_street, na_rep = TRUE ) mutate_at() lets you deal with multiple columns at the same time, very useful when combined with the vars() function, which has similar semantics to select().
apply a function to multiple elements purrr dir_ls(path = raw_dir, glob = "*.csv") %>% map( read_delim, args) map() returns a list, but you can get typed vectors with map_dbl() or map_chr(). map_depth() is very helpful, especially if you are working with nested lists and wish to flatten them https://r4ds.had.co.nz/iteration.html#mapping-over-multiple-arguments https://rstudio.com/wp-content/uploads/2019/01/Cheatsheets_2019.pdf#page=14
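A small sketch of map() semantics using built-in data; the commented batch-read lines are placeholders, not real paths:

```r
library(purrr)

# A data frame is a list of columns, so map_dbl() returns one median
# per column as a named numeric vector
meds <- map_dbl(mtcars, median)

# A hypothetical batch read (paths and files here are placeholders):
# files    <- fs::dir_ls("data/raw", glob = "*.csv")
# all_data <- purrr::map(files, readr::read_csv)
```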

R Studio Shortcuts

Here’s a comprehensive R Studio IDE cheat sheet.

what_it_does on_the_screen you_will_type
switch between source editor and console ctrl + 1 - source ctrl + 2 - console ctrl+3 - help/viewer/plots/packages/files ctrl+4 - history/environment
access shortcuts cheatsheet in R Studio shift + option/alt + K
pipe %>% shift + cmd/CTRL + M
assignment operator <- option/Alt + (-)
multiline comment

#
CTRL/cmd + SHIFT + C
run current chunk shift + cmd/CTRL + Enter
run selected lines of code cmd/ctrl + Enter. This shortcut moves the cursor to the next line. To execute without moving the cursor, press alt/opt + Enter
stop current command esc
access a list of previous commands in the console CTRL/cmd + uparrow (this also works when you have typed a few words; the shortcut will then search your history)
move the selected code up/down alt/opt + uparrow/downarrow
rename variables in this scope cmd/ctrl + alt/opt + shift + M
replace with search results shift + cmd/ctrl + J
auto fill arguments tab. Placing the cursor after a function name will generate a popup of the function documentation. Pressing F1 (for Mac users, Fn + F1) at that point has the same effect as typing ?function in the console: the documentation shows up in the Help window
search for file or function ctrl + .
search for tab >> in the top right corner of the source pane shift + ctrl+ .
switch between tabs ctrl + tab goes forward. to go backward, + shift
Unfold/fold outlines (the Rmd structure) shift + cmd/CTRL + O
fold comments option/Alt + cmd/CTRL+ L. To uncollapse, + shift
Jump to chunk (start of line) cmd/ctrl + shift + option/alt + J
collapse all headers ## Header 2 <––> option/Alt + cmd/CTRL+ O. To uncollapse, + shift
Next chunk cmd/ctrl+pagedown
Knit shift + cmd/ctrl + K

Good reads

Feel free to use the cheat sheet however you want. It is definitely imperfect, and not remotely all-encompassing or error-free. Please don’t hesitate to point out any mistakes in the explanations. If there’s any R function you wish to add that could be helpful for data cleaning, please fill out this Google form and I’ll add it to this file.

Many thanks to Kiernan Nicholls and Prof. Michael Kearney for teaching me their R skills, and to Yan Wu for helping me with the customized CSS of this webpage.