R For Beginners: R You Ready?
The Background Story
R is an object-oriented programming language whose first stable release, version 1.0.0, came out in 2000. Objects package data together with the procedures that operate on that data. For example, you can define a variable as a data frame, a table-like structure containing data, and attach instructions that change the data frame.
Still not getting it? Alright.
You receive a bunch of ingredients and a recipe for a cake. The ingredients and the recipe are objects. These two objects are packaged together in your grocery bag. The ingredients need to be transformed according to a set procedure (i.e. the recipe). Once you apply these changes, the ingredients will change into your desired end product (i.e. the cake).
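If you would rather see it in code, here is a minimal sketch of the data frame idea from above. The `cakes` data and the `add_total_weight()` function are made up for illustration; they are not part of R itself.

```r
# The "ingredients": a data frame, which is a table-like object holding data
cakes <- data.frame(
  name    = c("sponge", "carrot", "chocolate"),
  flour_g = c(200, 250, 225),
  sugar_g = c(150, 180, 200)
)

# The "recipe": a procedure (function) that changes the data frame
add_total_weight <- function(df) {
  df$total_g <- df$flour_g + df$sugar_g  # add a new column
  df                                     # return the modified data frame
}

# Apply the recipe to the ingredients
cakes <- add_total_weight(cakes)

class(cakes)  # tells you what kind of object this is
print(cakes)
```

Running this prints the original three rows with an extra `total_g` column.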
Who Should Learn R?
TL;DR: Learn R if you like data and work in Academia or Healthcare.
R is primarily used for data analysis, statistics, and machine learning. It stores data in different structures, including vectors, matrices, and data frames, among several others. Before learning R, you should be comfortable with basic statistical concepts, matrices, algebra, and databases.
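To make those structures concrete, here is a small sketch that builds one of each. The variable names and values are invented for the example, and the list is just one of the "several others" you will run into.

```r
# Vector: an ordered set of values of the same type
heights <- c(150, 160, 170)

# Matrix: a two-dimensional grid of values, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: a table where each column is a vector
patients <- data.frame(id = 1:3, height_cm = heights)

# List: a container that can mix different types
visit <- list(patient_id = 1, notes = "follow-up")

mean(heights)  # average of the vector
dim(m)         # number of rows and columns in the matrix
str(patients)  # compact overview of the data frame
```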
The industries that use R most tend to be Academia and Healthcare (Source 1). However, anyone interested in general programming would stand to benefit from learning R. Understanding any object-oriented programming language is a valuable skill in the 21st century, because the concepts you learn in one object-oriented language carry over to others.
Why Should I Learn R?
TL;DR: R has many advantages over Excel.
- Reproducibility: it is easy to rerun analyses or procedures that you want to apply repeatedly (i.e., automation)
- Visualization: capable of creating complex, interactive visualizations through libraries such as Shiny
- Price: the software is free
- Popularity: ranks as the #3 most popular programming language for data scientists
- Community: R is open source and has a dedicated community that continues to develop new packages and post them to the package repository called CRAN. There is also plenty of documentation online to help you learn (see the list of resources at the bottom)
- Big data: R can handle data at a much larger scale than Excel
- Statistics: the range of statistical analyses available far outnumbers what is possible in Excel and most other programming languages
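As a rough illustration of the reproducibility and statistics points, the sketch below fits a simple linear regression on `mtcars`, a small dataset that ships with R. The choice of dataset and variables is mine, picked only because it needs no downloads. Anyone who runs the same script gets the same numbers.

```r
# mtcars is built into R: fuel economy and specs for 32 cars
head(mtcars)

# Fit a simple linear regression: miles per gallon explained by weight
fit <- lm(mpg ~ wt, data = mtcars)

# Pull out the pieces you usually care about
coefs <- coef(fit)                 # intercept and slope
r2    <- summary(fit)$r.squared    # share of variance explained

print(coefs)
print(r2)
```

Save this as a script and you can rerun the whole analysis with one click whenever the data changes, which is much harder to guarantee with a chain of manual spreadsheet steps.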
Where Do I Begin?
TL;DR: Install RStudio
When it comes to programming, the first suggestion I would have for anyone is to download an Integrated Development Environment (IDE). IDEs are applications that guide you through the process of writing code, much like the spell check and grammar help that Microsoft Word provides. Personally, I use RStudio for my R programming, but I know Jupyter Notebook is fantastic as well. You could install BOTH at the same time if you install Anaconda; however, I would only suggest that if you know you will be pursuing a more data-centric job, such as data analyst or data scientist.
What are Packages and Libraries?
A package is a collection of R functions that simplifies various procedures. I would highly suggest installing the most popular packages before you start coding, assuming they are relevant to your goals. Below are the top 6 I would recommend to get you started along your path.
Package Name | Purpose |
---|---|
dplyr | Data manipulation |
ggplot2 | Graphics visualization, static plots |
rmarkdown | Reproducible reporting |
shiny | Develop interactive web apps |
car | Calculates analysis of variance (ANOVA) tables |
caret | Tools for training classification and regression models |
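Here is one way you might install the packages from the table, sketched under the assumption that you have an internet connection and are working in an interactive session such as RStudio. Package names are case-sensitive, so they need to be typed in lowercase exactly as CRAN lists them. The variable names `wanted` and `missing_pkgs` are just placeholders I chose.

```r
# The six packages from the table, spelled the way CRAN expects
wanted <- c("dplyr", "ggplot2", "rmarkdown", "shiny", "car", "caret")

# Compare against what is already installed on this machine
already_installed <- rownames(installed.packages())
missing_pkgs <- setdiff(wanted, already_installed)

# Download anything missing from CRAN (only when run interactively)
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}

# After installing, load a package at the top of each script, for example:
# library(dplyr)
```

You only need to install a package once, but you call `library()` in every new session where you want to use it.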
Ready to Start Coding?
I hope that you now have a clearer understanding of R. I would recommend checking out some of the resources listed below, including my blog about ANOVA and simple linear regression.
Now go start coding!
Recommended Resources
Name | Type | Summary | Link/Handle |
---|---|---|---|
R Cheatsheets | Cheatsheets | Summary guides for different R topics, including data imports, transformations, and machine learning. | RstudioCheatsheets |
R Notes for Professionals | Digital Book | Intro to R, including variables, matrices, classes, lists, etc. | RNotesforProfessionals |
R for Data Science | Digital Book | Importing, transforming, modeling, and visualizing data. | RforDataScience |
ANOVA and Simple Linear Regression | The Data Generalist Blog | R problem sets walking through ANOVA and simple linear regression. | https://thedatageneralist.com/rtutorial-anova-linear-regression |
R Functions for Regression Analysis | Cheatsheet | Useful functions for regression analysis, including modeling, variable selection, diagnostics, transformation, trees, etc. | https://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf |
Hadley Wickham, chief scientist at RStudio | Twitter | Hadley builds tools (computational and cognitive) that make data science easier, faster, and more fun. | https://twitter.com/hadleywickham |
Garrett Grolemund, data scientist and instructor for RStudio | Twitter | Helpful tweets on learning R with an emphasis on data science. | https://twitter.com/StatGarrett |
R Cheatsheets | Cheatsheets | Summary guides for different R topics, including data frames, useful functions, etc. | RCheatsheetsGdrive |
Hands-On Machine Learning with R | Digital Book | Hands-on machine learning models, such as regression and clustering. | HandsonMLwithR |
R Views | Blogs | R community blogs with up-to-date information. | https://rviews.rstudio.com/ |
A ModernDive into R and the tidyverse | Digital Book | An introduction to R, the tidyverse, and statistical inference via data science. | https://moderndive.com/ |
Sources
1. https://www.guru99.com/r-programming-introduction-basics.html
2. https://doc.rust-lang.org/1.24.0/book/second-edition/ch17-01-what-is-oo.html
3. https://en.wikipedia.org/wiki/R_(programming_language)#History