The Beginner's R Survival Guide

I’m not an expert in R, but while working on my microbial ecology project I often felt overwhelmed by the number of datasets I was juggling. I would forget what each file was for or how I had done certain steps. After losing track once too often, I started collecting tips and best practices to make my RStudio workflow smoother. This blog post is a beginner-friendly guide to help you write better R code, organise your projects, and share your work more effectively.

NOTE: I used examples from microbial ecology, but the approach is similar for any R-based data analysis. Please don’t focus too much on terms like OTU or alpha diversity. You can replace them with general names like sales_data or apple_counts if that makes it easier to follow.


Beginner Essentials: Writing, Running, and Organising R Code

(Focus: Clean workflows, avoiding common pitfalls, setting up projects)
This guide helps new R users establish good habits from the start. Whether you’re setting up a project, writing scripts, or debugging errors, these tips will save you time and frustration.


I. Project Setup & File Management

Problem: Saving scripts randomly (e.g., on your desktop) leads to lost work and confusion.
Fix: Use a consistent folder system and RStudio Projects to keep everything organised and easy to find.

1. Basic Project Structure

A clear folder setup is the foundation of a reproducible project. Here’s a simple structure to follow:

Microbial_Diversity_Project/
├── data/            # Raw data files (never edit these directly!)
│   ├── otu_table.csv
│   └── metadata.txt
├── scripts/         # Your R code, numbered for order
│   ├── 01_data_clean.R
│   └── 02_alpha_diversity.R
├── outputs/         # Results like plots and tables
│   ├── figures/
│   └── tables/
├── backups/         # Optional: Save workspace or script backups here
└── README.txt       # Quick notes about the project

  • data/: Store untouched raw data here to preserve the originals.
  • scripts/: Organise scripts by task with numeric prefixes (e.g., 01_ for cleaning, 02_ for analysis) so they sort in the order you run them.
  • outputs/: Keep results separate for easy access (subfolders like figures/ for plots).
  • backups/: (Optional) Save workspace or script backups for extra safety.
  • README.txt: Write down what the project does and any key details.
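
If you’d rather build this skeleton from code than by hand, here is a minimal sketch in base R. It creates the folders under a temporary directory so it is safe to try; project_root is only a placeholder that you would point at your own location.

```r
# Where the project should live (tempdir() is only a safe placeholder)
project_root <- file.path(tempdir(), "Microbial_Diversity_Project")

# Folders from the structure above
subfolders <- c(
  "data",
  "scripts",
  file.path("outputs", "figures"),
  file.path("outputs", "tables"),
  "backups"
)

# recursive = TRUE also creates parent folders such as outputs/
for (folder in subfolders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Start the README with a one-line description
writeLines(
  "Microbial diversity project: see scripts/ for the analysis steps.",
  file.path(project_root, "README.txt")
)

# Check what was created
list.files(project_root, recursive = TRUE, include.dirs = TRUE)
```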

2. Always Use RStudio Projects (.Rproj)

Why?
RStudio Projects automatically set your working directory to the project folder, so relative paths in your code resolve the same way on any computer.
How?

  • In RStudio: Go to File > New Project > New Directory and create your project in a new folder.
  • This creates a .Rproj file—always open this to work on your project.

3. Set Your Working Directory Safely

Never do this (absolute path, only works on your computer):

setwd("C:/Users/YourName/Desktop/Project")

Do this instead:

  • Best: Use RStudio Projects (recommended, see above).
  • Or: Use the {here} package for robust, portable paths:

    # Install once: install.packages("here")
    library(here)
    otu_data <- read.csv(here("data", "otu_table.csv"))
    
  • Tip: Always use relative paths (e.g., "data/otu_table.csv") so your code works anywhere.
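
If {here} isn’t installed, base R’s file.path() builds the same kind of relative path. The sketch below also shows one way to fail early with a readable message when a file is missing. read_project_csv is a hypothetical helper made up for this example, and the demo data is invented.

```r
# Build a relative path; file.path() uses "/" on every operating system
otu_path <- file.path("data", "otu_table.csv")
otu_path

# Hypothetical helper: stop with a readable message if the file is missing
read_project_csv <- function(path) {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "'. Current working directory: ", getwd())
  }
  read.csv(path)
}

# Demo in a temporary folder so the example runs anywhere
demo_dir <- file.path(tempdir(), "path_demo", "data")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(SampleID = c("S1", "S2"), Abundance = c(120, 45)),
  file.path(demo_dir, "otu_table.csv"),
  row.names = FALSE
)

demo_data <- read_project_csv(file.path(demo_dir, "otu_table.csv"))
nrow(demo_data)
```

The error message includes getwd() because a wrong working directory is the usual reason a relative path cannot be found.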

II. Script Writing Fundamentals

Problem: A messy, giant script is hard to read or fix.
Fix: Structure your script logically and adopt good habits early.

1. Script Structure Template

A well-organised script is like a recipe: it’s clear what each part does. Here’s a starter template:

# ---- HEADER ----
# Project: Microbial Diversity Study
# Author: Your Name
# Date: 2023-08-01
# Description: Cleans OTU table and calculates alpha diversity

# ---- SETUP ----
rm(list = ls())  # Clear memory to avoid old variables causing issues
library(readr)   # Load tools for reading data
library(here)    # For safe file paths

input_path <- here("data", "otu_table.csv")
output_dir <- here("outputs", "figures")

# ---- LOAD DATA ----
otu_data <- read_csv(input_path)

# ---- CLEAN DATA ----
otu_data_clean <- otu_data[complete.cases(otu_data), ]  # Remove rows with missing values

# ---- SAVE RESULTS ----
write_csv(otu_data_clean, here("data", "otu_table_clean.csv"))

  • Header: Notes about the project (who, when, why).
  • Setup: Clear out old data and load packages.
  • Load Data: Bring in your raw data.
  • Clean Data: Make it usable.
  • Save Results: Store the output.

Note: Using rm(list = ls()) at the start of your script does not truly reset your R session. It only removes objects, but loaded packages, options, and the working directory remain unchanged. For truly reproducible scripts, always restart R (e.g., with Ctrl+Shift+F10 in RStudio) and ensure your script loads all needed packages and sets options explicitly.

2. Life-Saving Habits

  • Save often: Hit Ctrl+S in RStudio to avoid losing work.
  • Comment clearly: Explain tricky steps for your future self:
    # Convert sample dates from MM/DD/YY to YYYY-MM-DD for consistency
    metadata$Date <- as.Date(metadata$Date, format = "%m/%d/%y")
    
  • Use section breaks: In RStudio, Ctrl+Shift+R adds collapsible sections (e.g., # ---- LOAD DATA ----) for easy navigation.
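
The date conversion in the comment example is worth a quick check of its own, because as.Date() returns NA for values that don’t match the format instead of stopping. A small sketch with made-up dates:

```r
# Made-up dates; the last one is in a different format on purpose
raw_dates <- c("08/01/23", "12/15/22", "2023-08-01")

# Convert sample dates from MM/DD/YY to Date objects
parsed_dates <- as.Date(raw_dates, format = "%m/%d/%y")

# Values that did not match the format become NA instead of raising an error
unparsed <- raw_dates[is.na(parsed_dates)]
unparsed
```

Anything listed in unparsed needs a closer look before you carry on.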

III. Running Code Without Panic

Problem: Running a huge script all at once creates a mess of errors.
Fix: Take it step-by-step to catch issues early.

1. Safe Execution Workflow

  • Open your script in RStudio.
  • Run lines one-by-one with Ctrl+Enter (or Cmd+Enter on Mac).
  • After each step, peek at the Environment pane (top-right) to see what’s in memory.
  • Fix problems right away before moving on.
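
Looking at the Environment pane works well by hand. As a complement, you could write the same checks into the script with stopifnot(), so a later run stops at the first broken step. This is only a sketch: the toy data and the three checks are assumptions, not rules.

```r
# Toy data standing in for the real OTU table
otu_data <- data.frame(
  SampleID = c("S1", "S2", "S3"),
  Abundance = c(120, NA, 45)
)

# Step 1: remove rows with missing values
otu_data_clean <- otu_data[complete.cases(otu_data), ]

# Step 2: stop right here if the cleaning step did something unexpected
stopifnot(
  nrow(otu_data_clean) > 0,            # something is left
  !anyNA(otu_data_clean),              # no missing values remain
  all(otu_data_clean$Abundance >= 0)   # counts cannot be negative
)

nrow(otu_data_clean)
```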

2. When Things Break

  • Check your data:
    print(head(otu_data))  # Look at the first few rows
    
  • List variables:
    ls()  # See everything in memory
    
  • Test snippets:
    mean(otu_data_clean$Abundance, na.rm = TRUE)  # Try a small piece in the console
    
  • Tip: Restart R (Session > Restart R, or Ctrl+Shift+F10 on Windows and Linux; on macOS the shortcut is Cmd+Shift+0 in recent RStudio versions and Cmd+Shift+F10 in older ones) if you’re stuck. It clears memory and lets you start fresh.
    • Even when nothing is broken, restart R often and re-run the script you are developing from the top.

IV. Saving & Exporting Results

Problem: Manually saving plots or tables isn’t repeatable.
Fix: Use code to save everything automatically.

1. Save Plots Automatically

Programmatic saving ensures consistency:

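png(filename = here("outputs", "figures", "shannon_histogram.png"), width = 800, height = 600)
hist(otu_data_clean$Shannon, main = "Shannon Diversity Index Distribution")
dev.off()  # Closes the file; don’t skip this!

dev.off() is critical: until the device is closed, the PNG file may be empty or incomplete, and any later plots are drawn into that file instead of appearing in the Plots pane.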
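
One way to make sure the device always closes is to wrap the plot in a function and register dev.off() with on.exit(). In the sketch below, save_histogram is a hypothetical helper and the Shannon values are simulated. It uses pdf() because that device is available on every R build; png() can take its place.

```r
# Hypothetical helper: the device is closed even if hist() fails
save_histogram <- function(values, path, title = "") {
  pdf(path, width = 8, height = 6)   # swap in png() for raster output
  on.exit(dev.off(), add = TRUE)     # runs when the function exits, error or not
  hist(values, main = title, xlab = "Shannon index")
  invisible(path)
}

# Simulated values, only for the demo
set.seed(1)
shannon_values <- rnorm(50, mean = 3, sd = 0.5)

out_file <- save_histogram(
  shannon_values,
  file.path(tempdir(), "shannon_histogram.pdf"),
  title = "Shannon Diversity Index Distribution"
)
file.exists(out_file)
```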

2. Save Tables

Export data to CSV for later use:

write.csv(otu_data_clean, here("outputs", "tables", "otu_table_clean.csv"), row.names = FALSE)

row.names = FALSE keeps the file clean by skipping row numbers.
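
write.csv() stops with an error if the target folder doesn’t exist yet, so it can help to create the folder first. A minimal sketch with an invented table, written to a temporary directory:

```r
# Create the output folder if needed (no error if it already exists)
output_dir <- file.path(tempdir(), "outputs", "tables")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

# Invented table standing in for otu_data_clean
toy_table <- data.frame(SampleID = c("S1", "S2"), Abundance = c(120, 45))

out_path <- file.path(output_dir, "otu_table_clean.csv")
write.csv(toy_table, out_path, row.names = FALSE)

# Read it back to confirm there is no extra row-number column
round_trip <- read.csv(out_path)
names(round_trip)
```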

3. Never Lose Work

Save your workspace as a backup:

save.image(here("backups", "workspace_after_cleaning.RData"))
# Reload later with:
load(here("backups", "workspace_after_cleaning.RData"))

V. Avoiding 10 Common Beginner Traps

Here’s how to dodge frequent headaches:

1. Path Errors

Problem: Wrong file paths break your code.

read.csv("otu_table.csv")  # Only works if the file sits in the current working directory

Fix: Use relative paths or here() from your project directory.

read.csv(here("data", "otu_table.csv"))

2. Overwriting Variables

Problem: Reusing names wipes out data.

result <- calculate_diversity(otu_data)
result <- plot_diversity(otu_data)  # Oops, diversity results are gone!

Fix: Use unique, descriptive names.

diversity_result <- calculate_diversity(otu_data)
diversity_plot <- plot_diversity(otu_data)

3. Not Closing Plots

Problem: A plot device left open keeps the file unfinished and silently captures your later plots.

png("my_plot.png")
plot(x, y)  # No dev.off() = trouble

Fix: Always close.

dev.off()

4. Case Sensitivity

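Problem: R is case-sensitive. If the column is called Abundance, otu_data$abundance returns NULL instead of raising an error, and later steps fail in confusing ways.
Fix: Check names with names(otu_data).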
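
A short sketch of what that looks like in practice, using an invented two-row table:

```r
otu_data <- data.frame(SampleID = c("S1", "S2"), Abundance = c(120, 45))

# The real column names
names(otu_data)

# Wrong case: no error, just NULL
is.null(otu_data$abundance)

# Case-insensitive search for the name you meant
matched_name <- grep("abundance", names(otu_data), ignore.case = TRUE, value = TRUE)
matched_name
```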

5. Ignoring Warnings

Problem: Skipping warnings like “NAs introduced by coercion” hides issues.
Fix: Read the warning text in the console (call warnings() to see it again) and investigate before moving on.
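
As an illustration, here is one way to track down what triggered a coercion warning. The count values are made up; "n/a" stands in for the kind of text entry that often hides in a numeric column.

```r
raw_counts <- c("12", "7", "n/a", "30")

# Capture the warning text instead of letting it scroll past
warning_text <- tryCatch(
  {
    as.numeric(raw_counts)
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
warning_text

# Find the values that could not be converted
counts <- suppressWarnings(as.numeric(raw_counts))  # warning already inspected above
bad_values <- raw_counts[is.na(counts)]
bad_values
```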

6. Spaces in File Paths

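Problem: Spaces in file or folder names are easy to mistype and break many command-line tools, even though R itself reads a correctly quoted path.

read.csv("otu table.csv")  # Works in R, but fragile in other tools

Fix: Rename files (otu_table.csv) so nothing needs special quoting.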

7. Missing Packages

Problem: library(vegan) fails if it’s not installed.
Fix: Add a check.

if (!requireNamespace("vegan", quietly = TRUE)) install.packages("vegan")  # Install only when absent
library(vegan)
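
For a script that needs several packages, the same check can run over a vector of names. In this sketch required_packages lists two packages that ship with R so the example runs anywhere; you would replace them with your own, e.g. c("vegan", "here").

```r
# Placeholder list: these two ship with R
required_packages <- c("stats", "utils")

# TRUE for each package that is installed
is_installed <- vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Install these first: ", paste(missing_packages, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Printing the missing names, rather than installing them automatically, leaves the decision to whoever runs the script.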

8. Ruining Raw Data

Problem: Editing raw data directly.

otu_data$Abundance <- otu_data$Abundance * 2

Fix: Copy first.

otu_data_clean <- otu_data
otu_data_clean$Abundance <- otu_data_clean$Abundance * 2

9. Unreadable Code

Problem: Cramming too much into one line.

x <- otu_data[otu_data$Abundance > 100 & !is.na(otu_data$Abundance), c(1, 3, 5)]

Fix: Break it up.

high_abundance <- otu_data[otu_data$Abundance > 100 & !is.na(otu_data$Abundance), ]
selected_cols <- high_abundance[, c("SampleID", "OTU", "Abundance")]

10. No Backups

Fix: Save versions (script_v1.R), use cloud storage, or email yourself.
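
If you go the manual route, a date-stamped copy is a little easier to keep in order than v1, v2, v3. A sketch under that assumption, where the script and backup locations are placeholders inside a temporary directory:

```r
# Placeholder script so the example is self-contained
script_path <- file.path(tempdir(), "01_data_clean.R")
writeLines("# cleaning script", script_path)

# Backup folder, created if needed
backup_dir <- file.path(tempdir(), "backups")
dir.create(backup_dir, recursive = TRUE, showWarnings = FALSE)

# Prefix the copy with today's date
stamp <- format(Sys.Date(), "%Y-%m-%d")
backup_path <- file.path(backup_dir, paste0(stamp, "_", basename(script_path)))

copied <- file.copy(script_path, backup_path, overwrite = TRUE)
copied
```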


VI. Debugging 101: Fix Errors Like a Pro

Errors happen—here’s how to handle them:

1. Read Error Messages

They point you to the problem.

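Example (a misspelled object name; the exact wording varies slightly between R versions):

Error in mean(otu_dta$Abundance) : object 'otu_dta' not found

Steps:

  • Check spelling (otu_data vs. otu_dta).
  • Confirm which objects exist with ls(), and which columns with names(otu_data).
  • Watch out for misspelled columns: otu_data$Abudance gives no error at all. It returns NULL, and mean() then warns and returns NA.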

2. Ultimate Debugging Trick

Add print() to peek inside:

otu_data_clean <- otu_data[otu_data$Abundance > 0, ]
print(nrow(otu_data_clean))  # How many samples left?
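
Here is a small made-up case where that print() pays off. When the column contains NA, a logical filter keeps an extra all-NA row, and the row count is the quickest way to notice.

```r
# Toy data with one zero and one missing abundance
otu_data <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4"),
  Abundance = c(5, 0, NA, 12)
)

# Naive filter: the NA comparison yields NA, which keeps a row of NAs
filtered_naive <- otu_data[otu_data$Abundance > 0, ]
print(nrow(filtered_naive))
print(filtered_naive)

# which() drops the NA comparison, so only real matches remain
filtered_safe <- otu_data[which(otu_data$Abundance > 0), ]
print(nrow(filtered_safe))
```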

3. Isolate the Issue

  • Copy the broken part to a new script.
  • Simplify it (e.g., test with 5 rows).
  • Run it to find the culprit.

Tip: Search errors on Stack Overflow or use ?function_name for help.


VII. Essential Shortcuts & Tools

1. RStudio Shortcuts

  • Ctrl+Enter: Run a line.
  • Ctrl+Alt+R: Run the whole script.
  • Ctrl+Shift+C: Comment/uncomment lines.
  • Tab: Auto-complete names.

2. Handy Functions

View(otu_data)     # Spreadsheet view
str(otu_data)      # Data structure
summary(otu_data)  # Quick stats
dir()              # List files
getwd()            # Current directory

3. Memory Management

rm(temp_data)      # Remove one object
rm(list = ls())    # Clear everything (careful!)
gc()               # Free memory

VIII. Reproducibility Checklist

Before closing R:

  • Save scripts.
  • Save workspace if needed:
    save.image(here("final.RData"))
    
  • Log your setup:
    writeLines(capture.output(sessionInfo()), here("session_info.txt"))
    

Before sharing:

  • Test in a fresh session (Ctrl+Shift+F10).
  • Zip the project folder.
  • Include session_info.txt.

Final Tip: The 30-Second Rule

Ask yourself: “If I reopen this in 6 months, will I get it in 30 seconds?”
Good organisation makes this a “yes”!


Level Up: Pro Tips for Efficient R Coding and Organisation

For Beginners Ready to Level Up

This section is for users comfortable with R basics who want to work smarter, not harder. Learn pro techniques to write cleaner code, speed up tasks, and share your work effectively.


I. Writing R Code Like a Pro

1. Avoid the Global Environment Trap

Problem: Commands typed only in the console are not saved with your project, so the analysis is hard to repeat.
Fix: Use scripts (.R files) and RStudio Projects (.Rproj):

# File > New Project > New Directory
# Keeps files tidy and sets the working directory

2. Meaningful Names & Snake Case

Problem: x or df1 means nothing later.
Fix: Use clear, snake_case names:

soil_otu_table <- read_csv("soil_otu_table_2023.csv")  # Not `d`

3. Comment Strategically

Explain why you’re doing something:

# ---- Data Cleaning ----
# Remove samples with fewer than 1000 reads (low quality)
filtered_otu <- dplyr::filter(soil_otu_table, TotalReads >= 1000)  # dplyr:: because base R has its own filter()

4. Functions Over Repeated Code

Problem: Copy-pasting code is error-prone.
Fix: Write functions:

calculate_shannon <- function(abundance_vector) {
  prop <- abundance_vector / sum(abundance_vector)
  -sum(prop * log(prop), na.rm = TRUE)
}
otu_data$Shannon <- apply(otu_data[, -1], 1, calculate_shannon)
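
For reference, the function computes the Shannon index H = -sum(p * ln(p)), where p is each OTU’s share of the sample total. Below is a slightly more defensive sketch of the same idea. shannon_index is a hypothetical name, the toy table is invented, and dropping zeros and NAs before the calculation is my assumption about the behaviour you would want.

```r
# Hypothetical variant: validates input and drops zeros and NAs explicitly
shannon_index <- function(abundance_vector) {
  stopifnot(is.numeric(abundance_vector), all(abundance_vector >= 0, na.rm = TRUE))
  counts <- abundance_vector[!is.na(abundance_vector) & abundance_vector > 0]
  if (length(counts) == 0) return(NA_real_)  # nothing to measure
  prop <- counts / sum(counts)
  -sum(prop * log(prop))
}

# Invented table: one row per sample, one column per OTU
toy_otu <- data.frame(
  SampleID = c("S1", "S2"),
  OTU_A = c(10, 5),
  OTU_B = c(20, 5),
  OTU_C = c(30, 0)
)

# Apply the function to every row, skipping the SampleID column
toy_otu$Shannon <- apply(toy_otu[, -1], 1, shannon_index)
toy_otu
```

Sample S2 has two equally abundant OTUs, so its index is log(2), which makes a handy sanity check.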

5. Never Hardcode Paths

Problem: Absolute paths fail on other machines.
Fix: Use here::here():

library(here)
metadata <- read_csv(here("data", "metadata.csv"))

II. Executing Code Efficiently

1. Debugging Like a Detective

Pause and inspect with browser():

calculate_richness <- function(abundance) {
  browser()  # Stops here—explore variables
  sum(abundance > 0)
}

2. Speed Up Code

  • Vectorise: Skip loops for faster operations.
    otu_data$RelativeAbundance <- otu_data$Abundance / sum(otu_data$Abundance)
    
  • Use data.table: For big data:
    library(data.table)
    setDT(otu_data)
    otu_data[, TotalAbundance := sum(Abundance), by = SampleID]
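
To see what vectorising buys you, the sketch below computes the same relative abundances with a loop and with one vectorised line, then checks that they agree. It also shows a base R way to get per-sample totals with ave(), in case data.table isn’t installed. The data is invented.

```r
otu_data <- data.frame(
  SampleID = c("S1", "S1", "S2", "S2"),
  Abundance = c(10, 30, 5, 15)
)
total <- sum(otu_data$Abundance)

# Loop version: one element at a time
loop_result <- numeric(nrow(otu_data))
for (i in seq_len(nrow(otu_data))) {
  loop_result[i] <- otu_data$Abundance[i] / total
}

# Vectorised version: the whole column in one operation
otu_data$RelativeAbundance <- otu_data$Abundance / total

# Both approaches give the same numbers
all.equal(loop_result, otu_data$RelativeAbundance)

# Per-sample totals in base R, repeated on every row of that sample
otu_data$TotalAbundance <- ave(otu_data$Abundance, otu_data$SampleID, FUN = sum)
otu_data
```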
    

3. Handle Memory Wisely

rm(unused_otu_table)
gc()  # Garbage collection

III. Changing Code Safely

1. Version Control with Git

Track changes in RStudio:

Tools > Project Options > Git/SVN > Version control system: Git
Commit: Ctrl+Alt+M

If you’re not ready for Git:

  • Save versions of scripts as script_v1.R, script_v2.R, etc.
  • Use cloud storage or email yourself backups.

2. Refactoring with Confidence

Test changes with testthat:

library(testthat)
test_that("calculate_shannon works", {
  test_abund <- c(10, 20, 30)
  expect_equal(round(calculate_shannon(test_abund), 2), 1.01)
})

3. Dependency Management

Use renv for package consistency:

library(renv)
renv::init()
renv::snapshot()  # Save versions

IV. Visualising Data Effectively

1. ggplot2 Shortcuts

Define once, reuse:

library(ggplot2)
base_plot <- ggplot(otu_data, aes(x = SampleType, y = Shannon))
base_plot + geom_boxplot() + labs(title = "Shannon Diversity by Sample Type")

2. Interactive Plots with plotly

Add interactivity:

library(plotly)
static_plot <- ggplot(otu_data, aes(x = SampleType, y = Shannon)) + geom_boxplot()
ggplotly(static_plot)

3. Colorblind-Friendly Palettes

Use viridis:

library(ggplot2)
library(viridis)
ggplot(otu_data, aes(x = SampleType, y = Shannon, fill = SampleType)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE)

V. Sharing & Reproducibility

1. R Markdown/Quarto Reports

Mix code and text for reproducible reports:

---
title: "Microbial Diversity Report"
output: html_document
---

```{r}
library(here)
library(readr)   # For read_csv()
library(dplyr)   # For %>%, group_by(), and summarise()
otu_data <- read_csv(here("data", "otu_table.csv"))
summary_stats <- otu_data %>%
  group_by(SampleType) %>%
  summarise(mean_shannon = mean(Shannon, na.rm = TRUE))
print(summary_stats)
```

2. Share Interactive Apps with Shiny

Quick app:

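library(shiny)
library(dplyr)    # For %>% and filter()
library(ggplot2)  # For ggplot()
# Assumes otu_data (with SampleType, SampleID, and Shannon columns) is already loaded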
ui <- fluidPage(
  selectInput("sample_type", "Sample Type", unique(otu_data$SampleType)),
  plotOutput("shannon_plot")
)
server <- function(input, output) {
  output$shannon_plot <- renderPlot({
    otu_data %>%
      filter(SampleType == input$sample_type) %>%
      ggplot(aes(x = SampleID, y = Shannon)) +
      geom_bar(stat = "identity") +
      labs(title = paste("Shannon Diversity for", input$sample_type))
  })
}
shinyApp(ui, server)

3. Reproducible Sessions

Log your setup:

sessionInfo()  # R version, packages, etc.

VI. Lesser-Known Life Savers

1. Pipe Debugging

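Break pipes with browser(), using magrittr’s tee pipe (%T>%) to pass the data through unchanged:

library(magrittr)  # Provides %T>%; dplyr only re-exports %>%
otu_data %>%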
  filter(SampleType == "Soil") %T>%
  { browser() } %>%
  group_by(SampleID)

2. Fast Row-Wise Operations

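Use purrr::pmap() when a calculation needs several columns per row and has no vectorised form. For simple arithmetic like the ratio below, plain Abundance / TotalReads is faster; the example only shows the syntax: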

library(purrr)
otu_data <- otu_data %>%
  mutate(ratio = pmap_dbl(list(Abundance, TotalReads), ~ ..1 / ..2))

3. Secret Shortcuts

  • Alt + -: Insert <-
  • Ctrl + Shift + F10: Restart R

VII. Final Checklist Before Sharing

  • Remove absolute paths (use here::here()).
  • Test in a fresh session (Ctrl+Shift+F10).
  • Include sessionInfo().

Knowing this will make your life much easier!


FAQ & Troubleshooting

Q: My script can’t find a file!
A: Make sure you’re using relative paths or the here() package, and that you’re working inside an RStudio Project.

Q: I get “object not found” errors.
A: Check your spelling and run code step-by-step to see where things break.

Q: How do I get help?
A: Use ?function_name in R, or search your error message on Stack Overflow or Posit Community (formerly RStudio Community).


If you follow these practices, your R life will be much easier, your projects will be reproducible, and your future self will thank you!



