### R - Notes

These are my notes from studying R, kept mainly as a reference for myself.

Just a few pointers to anyone preparing for R or studying R:

- Take a quick look at your statistical math basics before proceeding
- Before applying any formula to your base data, try to understand what the formula is and how it was derived (this makes the results easier to interpret)
- Use it in tandem with Data Analysis in Excel
- Refer to the cheat sheets available on https://www.rstudio.com/resources/cheatsheets/
- Segregate the workbench for each module
- There are best practices that can be incorporated while programming in R
- Jot down notes whenever and wherever you can
- Refer to existing data-sets embedded in R before jumping into a data.gov file
- Refer to R programs written already in Azure ML

rnorm() by default has mean 0 and standard deviation 1

Printed output (for example from head()) is rounded to 7 significant digits by default

*default settings in R can be modified by the options() function

example:

options(digits = 15)

#will display up to 15 significant digits (max digits for this option --> 22 and min --> 0)

#if > 22 --> Error in options(digits = 30) : invalid 'digits' parameter, allowed 0...22

#Infinity Operations

Inf/0 --> Inf

Inf * 0 --> NaN

Inf + 0 + (0/0) --> NaN

Inf + 0 --> Inf
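The infinity rules above can be checked directly at the console. This is a minimal sketch in base R; the variable names are only for illustration.

```r
# Arithmetic with Inf follows IEEE 754 floating point rules
a  <- Inf / 0          # Inf
b  <- Inf * 0          # NaN: the product is undefined
c1 <- Inf + 0          # Inf
d  <- Inf + 0 + (0/0)  # NaN: 0/0 is NaN and NaN propagates

print(c(a, b, c1, d))

# is.nan() and is.finite() tell the cases apart
is.nan(b)
is.finite(a)
```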

*ls() lists all the variables stored in R memory at a given point in time

*rm() removes variables from that list

*To look up a function in R, type ? followed by the name of the function:

?c

?rnorm

?max

*Functions and data structures

sin()

integrate()

plot()

paste()

*Again: single-valued functions and multi-valued functions

*A factor is a special vector that stores categorical data

gl() --> generate factor levels
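A small sketch of how factors and gl() behave. The labels "low" and "high" and the sample vector are made up for illustration.

```r
# gl(n, k): n levels, each repeated k times
f <- gl(2, 3, labels = c("low", "high"))
print(f)
levels(f)   # the distinct categories
table(f)    # counts per level

# factor() builds a factor from an existing vector
g <- factor(c("b", "a", "b", "c"))
nlevels(g)
```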

*creating a function in R

test <- function(x = 5)

{

return (x*x+(x^2))

}

*for loop in R

*lapply() vs sapply()
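To compare the three approaches, here is a rough sketch that squares the same vector with a for loop, lapply() and sapply(). The variable names are my own.

```r
x <- c(1, 2, 3)

# for loop: fill a pre-allocated vector
squares_for <- numeric(length(x))
for (i in seq_along(x)) {
  squares_for[i] <- x[i]^2
}

# lapply() always returns a list
squares_list <- lapply(x, function(v) v^2)

# sapply() simplifies the result to a vector when it can
squares_vec <- sapply(x, function(v) v^2)

print(squares_for)
str(squares_list)
print(squares_vec)
```

In short, lapply() is the safer choice when the shape of the result must be predictable, while sapply() is convenient for interactive work.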

*Binding elements

rbind() --> binds vectors or matrices together row by row

cbind() --> binds vectors or matrices together column by column
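A quick sketch of the difference, using two made-up vectors:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_row <- rbind(a, b)  # 2 x 3 matrix, each vector becomes a row
by_col <- cbind(a, b)  # 3 x 2 matrix, each vector becomes a column

print(by_row)
print(by_col)
dim(by_row)
dim(by_col)
```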

*Every vector/matrix has a data mode....

logical

numeric

*Can be found using mode()
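A few calls to mode() on small example vectors (the vectors are my own):

```r
mode(c(TRUE, FALSE))  # "logical"
mode(c(1.5, 2))       # "numeric"
mode(1:3)             # "numeric" (integers also report numeric mode)
mode("text")          # "character"

# Mixing types in one vector coerces everything to a common mode
mixed <- c(1, "a", TRUE)
mode(mixed)           # "character"
```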

*dimensions in matrices

dim() defines the number of rows and columns in a matrix

*can be used with dimnames(), rownames(), colnames()
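A minimal sketch of setting dimensions and names on a matrix. The row and column names are invented for the example.

```r
m <- 1:6
dim(m) <- c(2, 3)          # reshape the vector into 2 rows, 3 columns

rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")

print(m)
dimnames(m)                # list holding the row names and column names
m["r2", "b"]               # index by name instead of position
```

R fills the matrix column by column, which is worth remembering when reshaping a vector this way.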

*Navigating through R package libraries is really painful....

*Hmisc --> Harrell Miscellaneous... Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX code, and recoding variables.

*R resolves relative file paths against the R working directory

getwd() --> get working directory

setwd() --> set working directory

*to read in a table format:

testfile <- read.table(filename)

read.csv

read.csv2

read.fwf (fixed width file)

*readLines() --> reads a file line by line into a character vector

scan() --> reads the content of a file into a vector or list
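To try the reading functions without an external file, the sketch below first writes a tiny CSV to a temporary location. The file contents and column names are made up.

```r
# Write a small CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,9.5", "2,7.25"), path)

df <- read.csv(path)                              # header = TRUE, sep = "," by default
tb <- read.table(path, header = TRUE, sep = ",")  # same result, options spelled out

lines <- readLines(path)                          # one string per line
nums  <- scan(path, what = numeric(), sep = ",", skip = 1, quiet = TRUE)

str(df)
print(lines)
print(nums)
```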

*file() can create connections to files for read/write purposes

write.table(line,file="myfile",append=TRUE)

f1 <- file(filename)

close(f1) --> close the file connection
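A rough sketch of the open, write, close cycle on a connection. It assumes a scratch file from tempfile() rather than a real data file.

```r
path <- tempfile(fileext = ".txt")  # hypothetical scratch file

f1 <- file(path, open = "w")        # open a connection for writing
writeLines("first line", f1)
writeLines("second line", f1)       # an open connection keeps appending
close(f1)                           # close the file connection

f2 <- file(path, open = "r")        # reopen the same file for reading
contents <- readLines(f2)
close(f2)

print(contents)
```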

write.table(dataFieldName,filename)

write.csv

write.csv2

sink() --> send R output to a file

dump()

dput() --> save complicated R objects (in ASCII format)

dget() --> inverse of dput()

*use file() with the open="w" option to open a connection for writing

R has its own internal binary format for objects

use save() & load() for binary format
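The text and binary round trips can be compared side by side. This is only a sketch; the object and the temporary file names are invented.

```r
obj <- list(name = "demo", values = c(1.5, 2.5), flag = TRUE)

# Text (ASCII) representation
txt_path <- tempfile(fileext = ".R")
dput(obj, file = txt_path)
obj_txt <- dget(txt_path)

# Binary representation; load() restores the object under its original name
bin_path <- tempfile(fileext = ".RData")
save(obj, file = bin_path)
restore_env <- new.env()
load(bin_path, envir = restore_env)

identical(obj, obj_txt)
identical(obj, restore_env$obj)
```

Loading into a separate environment avoids silently overwriting an object of the same name in the workspace.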

*RODBC Package

Common Functions

odbcDriverConnect(Connection)

sqlQuery()

sqlTables()

sqlFetch()

sqlColumns()

close(Connection)

*in the connection string, specify the TDS protocol version (TDS_Version=8.0) and which port to use (default: 1433).

Ex:

sqlFetch(conn,"Tablename")

query <- "select t1, t2 from test"

sqlQuery(conn,query)

sqlColumns(conn,"Tablename")

sqlColumns(conn,"Tablename")[c("COLUMN_NAME","TYPE_NAME")]

check dimensions of a table using dim()

*summary() --> gives a range of stats on the underlying vector, list or matrix

Which function should you use to display the structure of an R object?

str()

log(dataframe) to investigate the data on a log scale
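These inspection functions are easy to try on a built-in data set. The sketch below uses faithful, which ships with R.

```r
dim(faithful)      # number of rows and columns
str(faithful)      # compact view of the structure
summary(faithful)  # min, quartiles, mean and max per column
head(faithful, 3)  # first three rows
```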

Calculate Groups

tapply()

aggregate()

by()

attach()

detach()

Convert counts to relative frequencies using prop.table()
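A sketch of the grouping functions on the built-in mtcars data set. Grouping mpg by cyl is just my choice of example.

```r
# Mean mpg per cylinder count
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(means)

# Same grouping with a formula; returns a data frame
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(agg)

# by() applies a function to each subset
by(mtcars$mpg, mtcars$cyl, summary)

# Counts per group, then relative frequencies
counts <- table(mtcars$cyl)
freq <- prop.table(counts)
print(freq)
```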

Simulations in R

MCMC (Markov Chain Monte Carlo)

Encryption

Performance Testing

Drawback --> Uncertainty

Pseudo Random Number Generator - The Mersenne Twister (R's default)
Mersenne Prime

set.seed(number)

rnorm(3)
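Because the generator is pseudo random, setting the same seed replays the same numbers. A minimal sketch, with 42 as an arbitrary seed:

```r
set.seed(42)   # 42 is an arbitrary seed chosen for illustration
first <- rnorm(3)

set.seed(42)   # resetting the seed replays the same sequence
second <- rnorm(3)

identical(first, second)

RNGkind()[1]   # name of the generator in use
```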

Uniform distribution - runif(5,min=1,max=2)

Normal distribution - rnorm(5,mean=2,sd=1)

Gamma distribution - rgamma(5,shape=2,rate=1)

Binomial distribution - rbinom(5,size=100,prob=.3)

Multinomial distribution - rmultinom(5,size=100,prob=c(.2,.4,.7)) (prob is normalized internally to sum to 1)
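The generators can be drawn from together to see what shape of result each returns. This is a sketch only; the seed is arbitrary, and the multinomial probabilities here are my own values, chosen so that they already sum to 1.

```r
set.seed(1)  # arbitrary seed so the draws are repeatable

u <- runif(5, min = 1, max = 2)          # all values fall in [1, 2]
n <- rnorm(5, mean = 2, sd = 1)
g <- rgamma(5, shape = 2, rate = 1)      # always positive
b <- rbinom(5, size = 100, prob = 0.3)   # counts between 0 and 100
m <- rmultinom(5, size = 100, prob = c(0.2, 0.3, 0.5))

dim(m)       # one row per category, one column per draw
colSums(m)   # every draw distributes all 100 trials
```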

Regression:

eruption.lm = lm(eruptions ~ waiting, data=faithful)

coeffs = coefficients(eruption.lm)

coeffs

coeffs[1]

coeffs[2]

waiting = 80 # the waiting time

duration = coeffs[1] + coeffs[2]*waiting

duration --> Predicted value
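The same prediction can be obtained with predict() instead of multiplying the coefficients by hand. A sketch assuming the same faithful model; manual, newdata and predicted are names I introduced.

```r
eruption.lm <- lm(eruptions ~ waiting, data = faithful)
coeffs <- coefficients(eruption.lm)

# Manual calculation: intercept + slope * waiting
manual <- coeffs[1] + coeffs[2] * 80

# predict() expects a data frame whose column matches the predictor name
newdata <- data.frame(waiting = 80)
predicted <- predict(eruption.lm, newdata)

print(manual)
print(predicted)
```

predict() scales better once the model has more than one predictor, since the arithmetic no longer has to be written out term by term.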

load ggplot2 using library(ggplot2)

Compare models using ANOVA

X1 <- lm(y ~ x1 + x2 + x3 + x4, data=mydata)

Y1 <- lm(y ~ x1 + x2, data=mydata)

anova(X1, Y1)
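Since mydata is not available here, the sketch below runs the same kind of comparison on the built-in mtcars data set. The choice of predictors (wt, hp) is my own and only illustrates the nested-model pattern.

```r
# Nested models: the second adds one predictor to the first
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

cmp <- anova(small, large)
print(cmp)

# The F-test p-value for the added term sits in the second row
cmp[2, "Pr(>F)"]
```

A small p-value suggests the extra predictor improves the fit beyond what chance would explain; the comparison is only meaningful when one model is nested in the other.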

