R For Beginners: R You Ready?

The Background Story

R is a programming language with object-oriented features whose first stable release (version 1.0.0) came out in 2000. Objects package data together with the procedures that operate on that data. For example, you can define a variable as a data frame, a table-like structure containing data, and pair it with instructions (functions) that change the data frame.

Still not getting it? Alright.

You receive a bunch of ingredients and a recipe for a cake. The ingredients are the data and the recipe is the procedure; the grocery bag that packages the two together is the object. The ingredients need to be transformed according to a set procedure (i.e., the recipe). Once you apply these changes, the ingredients turn into your desired end product (i.e., the cake).
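Here is a loose sketch of the cake analogy in R. The data frame plays the ingredients and a function plays the recipe. The names `ingredients`, `double_batch`, and `cake` are made up for this illustration, and the amounts are invented.

```r
# The "ingredients": a data frame, R's table-like structure.
ingredients <- data.frame(
  item  = c("flour", "sugar", "eggs"),
  grams = c(250, 200, 150)
)

# The "recipe": a procedure that transforms the ingredients.
# double_batch() is a hypothetical name used only for this example.
double_batch <- function(df) {
  df$grams <- df$grams * 2
  df
}

# Applying the recipe gives a new data frame; the original is left unchanged.
cake <- double_batch(ingredients)
print(cake)
```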

Who Should Learn R?

TL;DR: Learn R if you like data and work in Academia or Healthcare.

R is primarily used for data analysis, statistics, and machine learning. It stores data in several structures, including vectors, matrices, and data frames, among others. Before learning R, you should be comfortable with statistical concepts, matrices, algebra, and databases.
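To make those structures concrete, here is a minimal sketch that builds one of each using only base R. The variable names and values are invented for the example.

```r
# A vector: an ordered set of values that all share one type.
heights_cm <- c(150, 162, 171)

# A matrix: values of one type arranged in rows and columns.
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: a table whose columns can hold different types.
patients <- data.frame(
  name      = c("Ana", "Ben", "Cy"),
  height_cm = heights_cm
)

str(heights_cm)            # shows the type and the values
dim(m)                     # 2 rows, 3 columns
mean(patients$height_cm)   # average of one column
```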

R is most commonly used in academia and healthcare (Source 1). However, anyone interested in general programming stands to benefit from learning R. Understanding an object-oriented programming language is a valuable skill, because its concepts carry over to other object-oriented languages.

Why Should I Learn R?

TL;DR: R has many advantages over Excel.

  1. Reproducibility: it is easy to rerun analyses or procedures that you want to apply repeatedly (i.e., automation)
  2. Visualization: R can create complex, interactive visualizations through packages such as Shiny
  3. Price: the software is free
  4. Popularity: R ranks as the #3 most popular programming language among data scientists
  5. Community: R is open source, has a dedicated community that continues to develop new packages and post them to the package repository called CRAN, and has plenty of online documentation to help you learn (see the list of resources at the bottom)
  6. Big data: R can handle data at a much larger scale than Excel, where a worksheet tops out at 1,048,576 rows
  7. Statistics: the range of statistical analyses available far outnumbers what is possible in Excel and most other programming languages
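As a rough illustration of the reproducibility and statistics points, the sketch below writes an analysis once as a function and reruns it on different columns, then runs a built-in statistical test. It uses `mtcars`, a dataset that ships with R. `summarise_by_group` is a hypothetical helper written for this example, not a function from any package.

```r
# summarise_by_group() is a made-up helper for this sketch.
# It returns the mean of the `value` column for each level of `group`.
summarise_by_group <- function(df, value, group) {
  aggregate(df[[value]], by = list(group = df[[group]]), FUN = mean)
}

# mtcars ships with R, so nothing needs to be downloaded.
mpg_by_cyl <- summarise_by_group(mtcars, "mpg", "cyl")
print(mpg_by_cyl)

# The same function reruns unchanged on different columns.
hp_by_gear <- summarise_by_group(mtcars, "hp", "gear")
print(hp_by_gear)

# A built-in test: does mpg differ between automatic and manual cars?
result <- t.test(mpg ~ am, data = mtcars)
print(result$p.value)
```

Saving these lines in a script means the whole analysis can be rerun with one command whenever the data changes, which is hard to guarantee with manual spreadsheet steps.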

Where Do I Begin?

TL;DR: Install RStudio.

When it comes to programming, my first suggestion for anyone is to download an Integrated Development Environment (IDE). An IDE is an application that guides you through the process of writing code, much like the spell check and grammar help that Microsoft Word provides. Personally, I use RStudio for my R programming, but I know Jupyter Notebook is fantastic as well. You could install both at the same time by installing Anaconda; however, I would only suggest that if you know you will be pursuing a more data-centric job, such as data analyst or data scientist.

What are Packages and Libraries?

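A package is a collection of R functions that simplifies various procedures. In R, a library is the folder on your computer where installed packages live; you install a package once with install.packages() and load it in each session with library(). I would highly suggest installing the most popular packages before you start coding, assuming they are relevant to your goals. Below are the 6 I would recommend to get you started along your path.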

| Package Name | Purpose |
|--------------|---------|
| dplyr | Data manipulation |
| ggplot2 | Graphics visualization, static plots |
| rmarkdown | Reproducible reporting |
| shiny | Develop interactive web apps |
| car | Calculates analysis of variance (ANOVA) tables |
| caret | Tools for training classification and regression models |
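A minimal sketch of how you might check for and install the packages in the table above. Note that R package names are case-sensitive and all six are lowercase on CRAN. `missing_packages` is a hypothetical helper written for this example, and the install line is left commented out so the sketch runs without downloading anything.

```r
# The six packages from the table, using their exact (lowercase) names.
wanted <- c("dplyr", "ggplot2", "rmarkdown", "shiny", "car", "caret")

# missing_packages() is a made-up helper: it returns the names in `pkgs`
# that are not yet installed in your R library.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to download the missing packages from CRAN
# (needs an internet connection and can take a few minutes):
# install.packages(to_install)

# Load a package into the current session before using its functions.
# The check avoids an error if dplyr is not installed yet.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
}
```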

Ready to Start Coding?

I hope that you now have a clearer understanding of R. I would recommend checking out some of the resources listed below, including my blog about ANOVA and simple linear regression.

Now go start coding!

~ The Data Generalist

Recommended Resources

| Name | Type | Summary | Link/Handle |
|------|------|---------|-------------|
| R Cheatsheets | Cheatsheets | Summary guides for different R topics, including data imports, transformations, and machine learning. | RstudioCheatsheets |
| R Notes for Professionals | Digital Book | Intro to R, including variables, matrices, classes, lists, etc. | RNotesforProfessionals |
| R for Data Science | Digital Book | Importing, transforming, modeling, and visualizing data. | RforDataScience |
| ANOVA and Simple Linear Regression | The Data Generalist Blog | R problem sets walking through ANOVA and simple linear regression. | https://thedatageneralist.com/rtutorial-anova-linear-regression |
| R Functions for Regression Analysis | Cheatsheet | Useful functions for regression analysis, including modeling, variable selection, diagnostics, transformation, trees, etc. | https://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf |
| Hadley Wickham, chief scientist at RStudio | Twitter | Hadley builds tools (computational and cognitive) that make data science easier, faster, and more fun. | https://twitter.com/hadleywickham |
| Garrett Grolemund, data scientist and instructor for RStudio | Twitter | Helpful tweets on learning R with an emphasis on data science. | https://twitter.com/StatGarrett |
| R Cheatsheets | Cheatsheets | Summary guides for different R topics, including data frames, useful functions, etc. | RCheatsheetsGdrive |
| Hands on Machine Learning with R | Digital Book | Hands-on machine learning models, such as regression and clustering. | HandsonMLwithR |
| R Views | Blogs | R community blogs with up-to-date information. | https://rviews.rstudio.com/ |
| A ModernDive into R and the tidyverse | Digital Book | An introduction to R, the tidyverse, and statistical inference via data science. | https://moderndive.com/ |

Sources

  1. https://www.guru99.com/r-programming-introduction-basics.html
  2. https://doc.rust-lang.org/1.24.0/book/second-edition/ch17-01-what-is-oo.html
  3. https://en.wikipedia.org/wiki/R_(programming_language)#History

