R Programming (P1008)


Introduction

R is a statistical programming language widely used for data analysis. It helps describe, mine, and test relationships in large data sets. The trainer will use R to model statistical relationships using graphs, calculations, tests, and other analysis tools.

What you will learn

Learn how to enter and modify data; create charts, scatter plots, and histograms; examine outliers; calculate correlations; and compute regressions, bivariate associations, and statistics for three or more variables. Some of the core topics are as follows:

  • Installing R on your computer
  • Using the built-in datasets
  • Importing data
  • Creating charts for association
  • Calculating correlations
  • Creating bar and pie charts for categorical variables
  • Understanding base R graphics
  • Focusing on ggplot2 graphics for R

Course Duration: 3 days

 

1.  Overview

1.1  History of R
1.2  Advantages and disadvantages
1.3  Downloading and installing
1.4  How to find documentation
 

2.  Introduction

2.1    Using the R console
2.2    Getting help
2.3    Learning about the environment
2.4    Writing and executing scripts
2.5    Object-oriented programming
2.6    Introduction to vectorized calculations
2.7    Introduction to data frames
2.8    Installing packages
2.9    Working directory
2.10  Saving your work
 

3.  Variable types and data structures

3.1   Variables and assignment
3.2   Data types
3.3   Numeric, character, logical (boolean), and factor types
3.4   Data structures
3.5   Vectors, matrices, arrays, data frames, lists
3.6   Indexing, subsetting
3.7   Assigning new values
3.8   Viewing data and summaries
3.9   Naming conventions
3.10 Objects
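
As a taste of this module, here is a minimal sketch of vectors, subsetting, assignment, and data frames in base R. All values and names (scores, students) are invented for illustration and are not course material.

```r
# Illustrative values only; the names below are made up for this sketch.
scores <- c(math = 82, physics = 91, art = 67)   # named numeric vector

scores["physics"]                  # index by name
high_scores <- scores[scores > 80] # subset with a logical condition
scores["art"] <- 70                # assign a new value in place

# A small data frame built from vectors of different types
students <- data.frame(
  name   = c("Ana", "Ben", "Chen"),
  passed = c(TRUE, FALSE, TRUE),
  grade  = factor(c("A", "C", "B"), levels = c("A", "B", "C")),
  stringsAsFactors = FALSE         # keep 'name' as character in older R versions
)

passed_names <- students$name[students$passed]   # names where passed is TRUE
str(students)                                    # compact view of the structure
```

Calling summary(students) on the same object gives a per-column summary, which is one way of viewing data as listed in 3.8.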
 

4.  Getting data into the R environment

4.1   Built-in data
4.2  Reading data from structured text files
4.3  Reading data using ODBC

5.  Data frame manipulation with dplyr

5.1  Renaming columns
5.2  Adding new columns
5.3  Binning data (continuous to categorical)
5.4  Combining categorical values
5.5  Transforming variables
5.6  Handling missing data
5.7  Long to wide and back
5.8  Merging datasets together
5.9  Stacking datasets together (concatenation)
 

6.  Handling dates in R

6.1  Date and date-time classes in R
6.2  Formatting dates for modeling
 

7.  Control flow

7.1  Truth testing
7.2  Branching
7.3  Looping
 

8.  Functions in depth

8.1  Parameters
8.2  Return values
8.3  Variable scope
8.4  Exception handling
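
The topics in this module can be sketched in a few lines of base R. The function safe_log below is a hypothetical helper written only for this illustration. It shows a parameter with a default value, an implicit return value, local scope, and error handling with tryCatch.

```r
# safe_log is a hypothetical helper, not part of R or any package.
safe_log <- function(x, base = exp(1)) {    # 'base' has a default value
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x, base = base)                   # last expression is the return value
    },
    error = function(e) {
      message("Returning NA: ", conditionMessage(e))
      NA_real_                              # value returned when an error occurs
    }
  )
}

ok  <- safe_log(100, base = 10)   # normal path
bad <- safe_log("a")              # error path: handler returns NA

# Variable scope: assignment inside a function does not change the global x
x <- 10
change_x <- function() {
  x <- 1      # local copy only
  x
}
local_value <- change_x()
```

After running this, x in the global environment is still 10, because the assignment inside change_x only affects the function's own environment.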
 

9.  Applying functions across dimensions

9.1  The sapply, lapply, and apply functions
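
A minimal sketch of how the three functions differ, using a small made-up matrix and list (the object names are illustrative only):

```r
m <- matrix(1:6, nrow = 2)          # 2 rows, 3 columns, filled column-wise

row_totals <- apply(m, 1, sum)      # MARGIN = 1: apply the function over rows
col_totals <- apply(m, 2, sum)      # MARGIN = 2: apply the function over columns

words <- list(a = "data", b = "frame", c = "R")
as_list   <- lapply(words, nchar)   # always returns a list
as_vector <- sapply(words, nchar)   # simplifies to a named vector when it can
```

In short, apply works across the dimensions of a matrix or array, while lapply and sapply work element by element over a list or vector and differ only in the shape of the result.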
 

10.  Exploratory data analysis (descriptive statistics)

10.1   Continuous data
10.2   Distributions
10.3   Quantiles, mean
10.4   Bi-modal distributions
10.5   Histograms, box-plots
10.6   Categorical data
10.7   Tables
10.8   Barplots
10.9   Group by calculations with dplyr
10.10  Split-apply-combine
10.11  Melting and casting data
 

11.  Inferential statistics

11.1   Bivariate correlation
11.2   t-test and non-parametric equivalents
11.3   Chi-squared test
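
The tests in this module all have base R counterparts. The sketch below runs them on the built-in mtcars data set. The choice of data set and variables is an assumption made for illustration, not necessarily what the course uses.

```r
data(mtcars)

# Bivariate correlation between car weight and fuel economy
r  <- cor(mtcars$wt, mtcars$mpg)             # Pearson correlation by default
ct <- cor.test(mtcars$wt, mtcars$mpg)        # adds a p-value and confidence interval

# Two-sample t-test and a non-parametric counterpart: mpg by transmission type
tt <- t.test(mpg ~ am, data = mtcars)        # Welch t-test by default
wx <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# Chi-squared test of independence between cylinders and transmission
tab <- table(mtcars$cyl, mtcars$am)
cs  <- chisq.test(tab)                       # may warn about small expected counts
```

Each result is a list-like object, so individual pieces such as tt$p.value or cs$statistic can be extracted for reporting.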
 

12.  Base graphics

12.1   Base graphics system in R
12.2   Scatter plots, histograms, bar charts, box-and-whisker plots, dot plots
12.3   Labels, legends, titles, axes
12.4   Exporting graphics to different formats
 

13.  Advanced R graphics: ggplot2

13.1   Understanding the grammar of graphics
13.2   Quick plots (qplot function)
13.3   Building graphics by pieces (ggplot function)
 

14.  Generalized linear regression

14.1   Linear and logistic models
14.2   Regression plots
14.3   Confounding / interaction in regression
14.4   Scoring new data from models (prediction)
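
As a rough sketch of what this module covers, the following fits a linear model, a logistic model, and a model with an interaction term on the built-in mtcars data, then scores new rows. The data set, the formulas, and the new_cars values are assumptions chosen for illustration.

```r
data(mtcars)

# Linear model: fuel economy explained by weight and horsepower
lin <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic model: probability of a manual transmission (am = 1) given weight
logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Interaction: does the effect of weight on mpg differ by transmission type?
inter <- lm(mpg ~ wt * am, data = mtcars)

# Scoring new data: two hypothetical cars
new_cars  <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
pred_mpg  <- predict(lin, newdata = new_cars)                       # predicted mpg
pred_prob <- predict(logit, newdata = new_cars, type = "response")  # probabilities
```

summary(lin) prints coefficients and fit statistics, and plot(lin) produces the standard diagnostic regression plots.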