R - Notes

The following are my notes from studying R, meant as a reference point for myself.
A few pointers for anyone preparing for or studying R:
  • Take a quick look at your statistical math basics before proceeding
  • Before applying any formula to your base data, try to understand what the formula is and how it was derived (this makes the results easier to understand)
  • Use it in tandem with the Data Analysis in Excel
  • Refer to the cheat sheets available on https://www.rstudio.com/resources/cheatsheets/
  • Segregate the workbench for each module
  • There are best practices that can be incorporated while programming in R
  • Try to jot down notes whenever and wherever you can
  • Refer to the existing data sets embedded in R before jumping into a data.gov file
  • Refer to R programs already written in Azure ML

rnorm() by default has mean 0 and standard deviation 1
head() output is printed with the default display precision (7 significant digits)
*default settings in R can be modified by the options() function
example:
options(digits = 15)
#will display up to 15 significant digits (max digit for option display --> 22 and min digit --> 0)
#Error if > 22 --> Error in options(digits = 30) :
#invalid 'digits' parameter, allowed 0...22
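
A small sketch of changing and then restoring the display precision. It relies only on base R; the variable name `old` is just an illustration.

```r
old <- options(digits = 15)   # options() returns the previous settings
print(pi)                     # 3.14159265358979
options(old)                  # put the previous setting back
print(pi)                     # printed with the restored precision
```

Saving the return value of options() is a convenient way to undo a change without remembering what the default was.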

#Infinity Operations
Inf/0 --> Inf
Inf * 0 --> NaN
Inf + 0 + (0/0) --> NaN
Inf + 0  --> Inf
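
These rules are easy to check at the console. The sketch below uses only base R; the variable names are arbitrary.

```r
# Special values in floating-point arithmetic
a  <- Inf / 0     # Inf
b  <- Inf * 0     # NaN: infinity times zero is undefined
c1 <- Inf + 0     # Inf
d  <- 0 / 0       # NaN
e  <- Inf - Inf   # NaN

print(c(a, b, c1, d, e))
print(is.infinite(a))   # TRUE
print(is.nan(b))        # TRUE
```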

*ls() lists all the variables stored in R memory (the workspace) at a given point in time
*rm() removes the named objects from that list
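
A minimal sketch of ls() and rm(). It works inside a scratch environment (the name `env` is hypothetical) so that running it does not touch your own workspace.

```r
env <- new.env()
assign("x", 1:3, envir = env)
assign("y", "hello", envir = env)

print(ls(env))          # "x" "y"
rm("x", envir = env)    # remove one object by name
print(ls(env))          # "y"
```

At the console you would normally just call ls() and rm(x) without the envir argument.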

*To look up the documentation in R, use ? followed by the name of the function that needs to be leveraged:
?c
?rnorm
?max

*Functions and Datastructures
sin()
integrate()
plot()
paste()
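
A quick sketch of three of the functions listed above (plot() is left out because it needs a graphics device). The name `area` is only for illustration.

```r
print(sin(pi / 2))                         # 1

# integrate() returns a list; the estimate is in $value
area <- integrate(sin, lower = 0, upper = pi)
print(area$value)                          # 2, up to numerical error

print(paste("R", "notes", sep = "-"))      # "R-notes"
```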

*Again, functions can be single-valued (return one value) or multi-valued (return several values)

*A factor is a special vector used for categorical data
gl() --> generate factor levels
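
A short sketch of how a factor stores categories and what gl() produces. The labels ("small", "control" and so on) are made up for the example.

```r
# A factor stores integer codes plus a set of levels
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
print(levels(sizes))       # "small" "medium" "large"
print(as.integer(sizes))   # 1 3 1 2

# gl(n, k): n levels, each repeated k times
g <- gl(2, 3, labels = c("control", "treatment"))
print(g)
```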

*creating a function in R
test <- function(x)
{
  # x*x + x^2 is the same as 2*x^2
  return (x*x+(x^2))
}
test(5)   # 50

*for loop in R
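
A minimal for loop sketch, assuming nothing beyond base R. The variables `total` and `x` are illustrative.

```r
# Sum the squares of 1..5 with an explicit loop
total <- 0
for (i in 1:5) {
  total <- total + i^2
}
print(total)   # 55

# seq_along() is safer than 1:length(x) when x may be empty
x <- c(10, 20, 30)
for (i in seq_along(x)) {
  cat("element", i, "is", x[i], "\n")
}
```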

*lapply() vs sapply()
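
The difference is in the return type. A small sketch with a made-up list `nums`:

```r
nums <- list(a = 1:3, b = 4:6)

as_list <- lapply(nums, sum)   # always returns a list
as_vec  <- sapply(nums, sum)   # simplifies to a named vector when it can

str(as_list)
print(as_vec)                  # a = 6, b = 15
```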

*Binding elements
rbind() --> bind elements in a matrix in a row manner
cbind() --> bind elements in a matrix in a columnar manner
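
A sketch of the two binding directions, using two throwaway vectors:

```r
a <- 1:3
b <- 4:6

r <- rbind(a, b)   # 2 x 3 matrix: each vector becomes a row
k <- cbind(a, b)   # 3 x 2 matrix: each vector becomes a column

print(dim(r))   # 2 3
print(dim(k))   # 3 2
```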

*Every vector/matrix has a data mode, for example:
logical
numeric
character

*The mode can be found using mode()

*dimensions in matrices
=the number of rows and columns in a matrix, returned by dim()

*names can be attached with dimnames(), rownames(), colnames()
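
A small sketch tying mode(), dim() and the naming functions together. The matrix `m` and its labels are invented for the example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
print(mode(m))        # "numeric"
print(mode(m > 3))    # "logical"
print(dim(m))         # 2 3

rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")
print(dimnames(m))    # list of row names and column names
print(m["r2", "b"])   # 4
```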

*Navigating through R package libraries is really awkward....

*Hmisc --> Harrell Miscellaneous. Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX code, and recoding variables.

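*The R working directory is where R looks for files given by a relative path (the search path, shown by search(), is something else: the list of attached packages and environments)

getwd() --> get working directory
setwd() --> set working directory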

*to read in a table format:
testfile <- read.table(filename)
read.csv
read.csv2
read.fwf (fixed width file)

*readLines() --> reads a file into a character vector, one element per line
scan() --> reads the contents of a file into a list or vector
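
A self-contained sketch comparing three of the readers above. It writes a tiny CSV to a temporary file first; the file contents and the names `path`, `df`, `lines` and `vals` are all made up for the example.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,9.5", "2,7.0"), path)

df    <- read.csv(path)      # data frame, first row used as header
lines <- readLines(path)     # character vector, one element per line
vals  <- scan(path, what = "", sep = ",", quiet = TRUE)   # flat vector of fields

str(df)
print(length(lines))   # 3
print(length(vals))    # 6
unlink(path)           # remove the temporary file
```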

*file() can create connections to files for read/write purposes
write.table(line,file="myfile",append=TRUE)
f1 <- file("myfile")
close(f1) --> close the file connection
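
A minimal sketch of the open, write, close cycle with a connection. It uses a temporary file rather than "myfile", and the names `path`, `con` and `back` are hypothetical.

```r
path <- tempfile(fileext = ".txt")

con <- file(path, open = "w")    # open a connection for writing
writeLines(c("first line", "second line"), con)
close(con)                       # always close what you open

con <- file(path, open = "r")    # reopen for reading
back <- readLines(con)
close(con)

print(back)
unlink(path)
```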

write.table(dataFieldName,filename)
write.csv
write.csv2
sink() --> send R output to a file
dump() --> write text representations of R objects to a file
dput() --> save complicated R objects (in ASCII format)
dget() --> inverse of dput()

*file() in conjunction with the open="w" option opens a connection for writing
R has its own internal binary format
use save() & load() for binary format
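
A sketch of both round trips: text via dput()/dget() and binary via save()/load(). The object `obj` and the temporary file names are invented for the example.

```r
obj <- list(name = "demo", values = c(1.5, 2.5))

# Text (ASCII) round trip
txt <- tempfile(fileext = ".R")
dput(obj, file = txt)
from_text <- dget(txt)

# Binary round trip; load() restores objects under their original names
bin <- tempfile(fileext = ".RData")
save(obj, file = bin)
restored <- new.env()
load(bin, envir = restored)

print(identical(obj, from_text))      # TRUE
print(identical(obj, restored$obj))   # TRUE
unlink(c(txt, bin))
```

Loading into a separate environment avoids silently overwriting an object of the same name in the workspace.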

*RODBC Package
Common Functions
odbcDriverConnect(Connection)
sqlQuery()
sqlTables()
sqlFetch()
sqlColumns()
close(Connection)

*In the connection string, specify the version of the driver (TDS_Version=8.0) and which port to use (default: 1433).
Ex:
sqlFetch(conn,"Tablename")
query <- "select t1, t2 from test"
sqlQuery(conn,query)
sqlColumns(conn,"Tablename")
sqlColumns(conn,"Tablename")[c("COLUMN_NAME","TYPE_NAME")]
check dimensions of a table using dim()

*summary() --> gives a range of stats (min, quartiles, mean, max) on the underlying vector, matrix or data frame









Which function should you use to display the structure of an R object?
str()
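
A quick sketch on the built-in faithful data set, which is also used in the regression notes:

```r
str(faithful)              # structure: 272 obs. of 2 variables
print(summary(faithful))   # min, quartiles, mean, max per column
print(dim(faithful))       # 272 2
```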

log(dataframe) takes the natural log of every value in a numeric data frame, which can help when investigating skewed data


Calculate Groups
tapply()
aggregate()
by()
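
A sketch of the same grouped mean computed three ways, using the built-in mtcars data set as an assumed example (mean mpg per number of cylinders). The result names are illustrative.

```r
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)              # named array
by_agg    <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)   # data frame
by_by     <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))   # object of class "by"

print(by_tapply)
print(by_agg)
print(by_by)
```

tapply() is the lightest option for one vector, aggregate() gives a data frame that is easy to merge back, and by() hands each group to the function as a whole data frame.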


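attach() --> add a data frame to the search path so its columns can be referenced by name
detach() --> remove it from the search path again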


Convert counts to relative frequencies (proportions) using prop.table()
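
A small sketch on the built-in mtcars data set (an assumed example): count cars per cylinder class, then turn the counts into proportions.

```r
counts <- table(mtcars$cyl)    # 4: 11, 6: 7, 8: 14
props  <- prop.table(counts)   # each count divided by the total (32)

print(counts)
print(round(props, 3))
print(sum(props))              # 1
```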

Simulations in R
MCMC (Markov Chain Monte Carlo)
Encryption
Performance Testing
Drawback --> Uncertainty

Pseudo Random Number Generator - The Mersenne Twister
Mersenne Prime

set.seed(number)
rnorm(3)
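
A sketch showing why set.seed() matters: the same seed reproduces the same draws. The seed value 42 is arbitrary.

```r
set.seed(42)
first <- rnorm(3)

set.seed(42)
second <- rnorm(3)

print(identical(first, second))   # TRUE: same seed, same stream
print(RNGkind()[1])               # "Mersenne-Twister" unless the default was changed
```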


Uniform distribution - runif(5,min=1,max=2)
Normal distribution - rnorm(5,mean=2,sd=1)
Gamma distribution - rgamma(5,shape=2,rate=1)
Binomial distribution - rbinom(5,size=100,prob=.3)

Multinomial distribution - rmultinom(5,size=100,prob=c(.2,.4,.7))   # prob is normalized internally to sum to 1
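
The multinomial generator returns a matrix, which is easy to misread. A short sketch, assuming three categories with made-up probabilities that sum to 1:

```r
set.seed(1)
draws <- rmultinom(5, size = 100, prob = c(.2, .3, .5))

print(dim(draws))       # 3 5: one row per category, one column per draw
print(colSums(draws))   # every column sums to size (100)
```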

Regression:
eruption.lm = lm(eruptions ~ waiting, data=faithful)
coeffs = coefficients(eruption.lm)
coeffs
coeffs[1]
coeffs[2]
waiting = 80           # the waiting time 
duration = coeffs[1] + coeffs[2]*waiting 
duration               # --> predicted value
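
The same prediction can be obtained with predict() instead of applying the coefficients by hand. A sketch on the same faithful data; `by_hand` and `by_predict` are illustrative names.

```r
eruption.lm <- lm(eruptions ~ waiting, data = faithful)
coeffs <- coefficients(eruption.lm)

# Intercept + slope * waiting time
by_hand <- coeffs[1] + coeffs[2] * 80

# predict() expects new data with the same column name as the predictor
by_predict <- predict(eruption.lm, newdata = data.frame(waiting = 80))

print(unname(by_hand))
print(unname(by_predict))   # same value as by_hand
```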

Load ggplot2 using library(ggplot2) (load() is for saved binary files, not packages)

Compare models using ANOVA
X1 <- lm(y ~ x1 + x2 + x3 + x4, data=mydata)
Y1 <- lm(y ~ x1 + x2, data=mydata)
anova(X1, Y1)
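
Since mydata is not defined in these notes, here is a runnable sketch of the same comparison on the built-in mtcars data set. The choice of predictors (wt, hp, qsec, drat) is only an assumption for illustration.

```r
full    <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)

# F-test: do qsec and drat add explanatory power beyond wt and hp?
cmp <- anova(reduced, full)
print(cmp)
```

The two models must be nested (one uses a subset of the other's predictors) and fitted to the same data for the F-test to make sense.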



