An Interactive Introduction to R
November 2009
Michael E. Driscoll, Ph.D. med@dataspora.com http://www.dataspora.com
Daniel Murphy FCAS, MAAA dmurphy@trinostics.com
January 6, 2009
R is a tool for…
Data Manipulation
  connecting to data sources
  slicing & dicing data
Modeling & Computation
  statistical modeling
  numerical simulation
Data Visualization
  visualizing fit of models
  composing statistical graphics
R is an environment
Its interface is plain
Let’s take a tour of some claim data in R
## load in some Insurance Claim data
library(MASS)
data(Insurance)
Insurance <- edit(Insurance)
head(Insurance)
dim(Insurance)
## plot it nicely using the ggplot2 package
library(ggplot2)
qplot(Group, Claims/Holders,
      data=Insurance, geom="bar",
      stat='identity',
      position="dodge",
      facets=District ~ .,
      fill=Age, ylab="Claim Propensity", xlab="Car Group")
## hypothesize a relationship between Age ~ Claim Propensity
## visualize this hypothesis with a boxplot
x11()
qplot(Age, Claims/Holders,
      data=Insurance, geom="boxplot",
      fill=Age)
## quantify the hypothesis with linear model
m <- lm(Claims/Holders ~ Age + 0, data=Insurance)
summary(m)
R is “an overgrown calculator”
sum(rgamma(rpois(1,lambda=2),shape=49,scale=.2))
R is “an overgrown calculator”
> 2+2
4
> x <- 2+2    ## ‘<-’ is R syntax for ‘=’ or assignment
> x^2
16
> weight <- c(110, 180, 240)      ## three weights
> height <- c(5.5, 6.1, 6.2)      ## three heights
> bmi <- (weight*4.88)/height^2   ## divides element-wise
> bmi
17.7  23.6  30.5
R is “an overgrown calculator”
> mean(weight)
176.7
> sd(weight)
65.1
> sqrt(var(weight))
65.1  # same as sd
union   intersect   setdiff
> pbinom(40, 100, 0.5)   ## probability that a fair coin tossed 100 times
0.028                    ## comes up heads 40 times or fewer
> pshare <- pbirthday(23, 365, coincident=2)
0.507  ## probability that among 23 people, at least two share a birthday
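The set functions named on this slide can be tried on two small vectors. This is a minimal sketch; the vectors a and b are made up for illustration.

```r
# Two illustrative vectors (hypothetical values)
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5)

u <- union(a, b)       # every distinct value found in either vector: 1 2 3 4 5
i <- intersect(a, b)   # values found in both: 3 4
d <- setdiff(a, b)     # values in a that are not in b: 1 2

print(u); print(i); print(d)
```

Note that setdiff is not symmetric: setdiff(b, a) returns 5.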
Try It! #1 Overgrown Calculator
> 2 + 2       [Hit ENTER]
> log(100)    [Hit ENTER]
> 100 * exp(0.05*10) [Hit ENTER]
> year <- (1,2,5,10,25)  [Hit ENTER]   this returns an error.  why?
> year <- c(1,2,5,10,25) [Hit ENTER]
> 100 * exp(0.05*year)   [Hit ENTER]
R is a numerical simulator
Let’s simulate 10,000 trials of 100 coin flips. What’s the distribution of heads?
> heads <- rbinom(10^4,100,0.50)
> hist(heads)
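A reproducible variant of the coin-flip simulation can be checked against binomial theory. This is only a sketch: the seed is arbitrary and the variable trials is introduced here for illustration.

```r
# Simulate 10,000 trials of 100 fair coin flips (arbitrary seed for reproducibility)
set.seed(42)
trials <- 10^4
heads <- rbinom(trials, size = 100, prob = 0.5)

# Binomial theory: mean = n*p, variance = n*p*(1-p)
print(c(simulated = mean(heads), theoretical = 100 * 0.5))
print(c(simulated = var(heads),  theoretical = 100 * 0.5 * 0.5))

# Share of simulated trials with 40 or fewer heads, next to the exact probability
print(c(simulated = mean(heads <= 40), exact = pbinom(40, 100, 0.5)))
```

The simulated values should sit close to the theoretical ones, and they get closer as trials grows.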
Functions for Probability Distributions
> pnorm(0)     0.5
> qnorm(0.9)   1.28
> rnorm(100)   vector of length 100
Functions for Probability Distributions
How to find the functions for lognormal distribution?
1) Use the double question mark ‘??’ to search
> ??lognormal
2) Then identify the package
> ?Lognormal
3) Discover the dist functions
dlnorm, plnorm, qlnorm, rlnorm
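The four lognormal functions follow the same d/p/q/r naming as the normal ones. The sketch below uses meanlog = 0 and sdlog = 1 (the defaults, spelled out here) and an arbitrary seed.

```r
# d = density, p = distribution function, q = quantile function, r = random draws
dens <- dlnorm(1, meanlog = 0, sdlog = 1)   # density at x = 1
p <- plnorm(2, meanlog = 0, sdlog = 1)      # P(X <= 2)
q <- qlnorm(p, meanlog = 0, sdlog = 1)      # inverts plnorm, so q is 2 again

set.seed(1)                                 # arbitrary seed
x <- rlnorm(1000, meanlog = 0, sdlog = 1)

# The log of a lognormal draw is normal, so mean(log(x)) should be near meanlog
print(mean(log(x)))
```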
Try It! #2 Numerical Simulation
> numclaims <- rpois(n, lambda)
(hint: use ?rpois to understand the parameters)
> mean(numclaims)
> var(numclaims)
> hist(numclaims)
Getting Data In
from Files
> Insurance <- read.csv("Insurance.csv",header=TRUE)
from Databases
> con <- dbConnect(driver,user,password,host,dbname)
> Insurance <- dbGetQuery(con, "SELECT * FROM claims")
from the Web
> con <- url('http://labs.dataspora.com/test.txt')
> Insurance <- read.csv(con, header=TRUE)
from R objects
> load('Insurance.RData')
Getting Data Out
to Files
write.csv(Insurance,file="Insurance.csv")
to Databases
con <- dbConnect(dbdriver,user,password,host,dbname)
dbWriteTable(con, "Insurance", Insurance)
to R Objects
save(Insurance, file="Insurance.RData")
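A round trip through a CSV file and an .RData file can be sketched without touching the working directory. The data frame claims and its values are hypothetical stand-ins, and tempfile() is used here only so the example cleans up after itself.

```r
# A small stand-in data frame (hypothetical values, not the MASS Insurance data)
claims <- data.frame(District = c(1, 2, 3),
                     Holders  = c(100, 250, 400),
                     Claims   = c(12, 30, 41))

# Write to CSV and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(claims, file = csv_path, row.names = FALSE)
claims_csv <- read.csv(csv_path, header = TRUE)

# Save as an R object, remove it, and restore it
rdata_path <- tempfile(fileext = ".RData")
save(claims, file = rdata_path)
rm(claims)
load(rdata_path)   # restores the object under its original name, 'claims'

print(claims_csv)
```

Passing row.names=FALSE to write.csv keeps the row numbers from being written as an extra column, which would otherwise reappear on read.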
Navigating within the R environment
listing all variables
> ls()
examining a variable ‘x’
> str(x)
> head(x)
> tail(x)
> class(x)
removing variables
> rm(x)
> rm(list=ls())    # remove everything
Try It! #3 Data Processing
library(MASS)
head(Insurance)  ## the first 6 rows
dim(Insurance)   ## number of rows & columns
write.csv(Insurance,file="Insurance.csv", row.names=FALSE)
getwd()  ## where am I?
remove the first district
Insurance <- read.csv(file="Insurance.csv")
plot(Claims/Holders ~ Age, data=Insurance)
A Swiss-Army Knife for Data
A Swiss-Army Knife for Data
Indexing
Three ways to index into a data frame
array of integer indices
array of character names
array of logical values
Examples:
df[1:3,]
df[c("New York", "Chicago"),]
df[c(TRUE,FALSE,TRUE,TRUE),]
df[df$city == "New York",]
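The three indexing styles can be compared on a toy data frame. Everything in this sketch is hypothetical: the data frame df, its columns city and claims, and the values are invented so that each style selects the same two rows.

```r
# Hypothetical data frame with city names doubling as row names
df <- data.frame(city   = c("New York", "Chicago", "Boston", "Denver"),
                 claims = c(120, 80, 45, 30),
                 stringsAsFactors = FALSE)
rownames(df) <- df$city

by_position  <- df[1:2, ]                          # integer indices
by_name      <- df[c("New York", "Chicago"), ]     # character row names
by_logical   <- df[c(TRUE, TRUE, FALSE, FALSE), ]  # logical vector, one entry per row
by_condition <- df[df$claims > 50, ]               # logical vector computed from a column

print(by_condition)
```

The last form is the most common in practice: the comparison df$claims > 50 produces the logical vector, so the column has to be reached through df$.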
A Swiss-Army Knife for Data
subset(Insurance, District==1)
subset(Insurance, Claims < 20)
transform(Insurance, Propensity=Claims/Holders)
cut – cut a continuous value into groups
cut(Insurance$Claims, breaks=c(-1,100,Inf), labels=c('lo','hi'))
Put it all together: create a new, transformed data frame
transform(subset(Insurance, District==1),
  ClaimLevel=cut(Claims, breaks=c(-1,100,Inf),
      labels=c('lo','hi')))
A Statistical Modeler
R has a powerful modeling syntax
Models are specified with formulae, like
  y ~ x
  growth ~ sun + water
Formulae model relationships between continuous and categorical variables.
Models also guide the visualization of relationships in graphical form
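To see how a formula such as growth ~ sun + water mixes a continuous and a categorical predictor, here is a minimal sketch. The data frame plants and all of its values are invented for illustration.

```r
# Hypothetical data: growth depends on sun (continuous) and water (categorical)
plants <- data.frame(
  growth = c(2.1, 3.9, 6.2, 3.0, 5.1, 7.0),
  sun    = c(1, 2, 3, 1, 2, 3),
  water  = factor(c("low", "low", "low", "high", "high", "high"))
)

m <- lm(growth ~ sun + water, data = plants)

# One slope for sun, plus a shift for water == "low" relative to the
# baseline level "high" (factor levels are sorted alphabetically)
print(coef(m))

# The design matrix shows how the factor was expanded into a 0/1 column
print(model.matrix(m))
```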
A Statistical Modeler
m <- lm(Claims/Holders ~ Age, data=Insurance)
summary(m)
plot(m)
A Statistical Modeler
Logistic model
m <- glm(Age ~ I(Claims/Holders), data=Insurance,
         family=binomial("logit"))
Examine it
summary(m)
Plot it
plot(m)
Try It! #4 Statistical Modeling
m <- lm(Claims/Holders ~ Age + 0, data=Insurance)
summary(m)
plot(m)
Visualization: Multivariate Barplot
library(ggplot2)
qplot(Group, Claims/Holders,
      data=Insurance,
      geom="bar",
      stat='identity',
      position="dodge",
      facets=District ~ .,
      fill=Age)
Visualization: Boxplots
library(ggplot2)
qplot(Age, Claims/Holders,
      data=Insurance,
      geom="boxplot")
library(lattice)
bwplot(Claims/Holders ~ Age,
       data=Insurance)
Visualization: Histograms
library(ggplot2)
qplot(Claims/Holders,
      data=Insurance,
      facets=Age ~ ., geom="density")
library(lattice)
densityplot(~ Claims/Holders | Age, data=Insurance, layout=c(4,1))
Try It! #5 Data Visualization
> x <- 1:10
> y <- x^2
> plot(y ~ x)
> library(lattice)
> boxplot(Claims/Holders ~ Age, data=Insurance)
> abline()
Getting Help with R
Help within R itself
for a function
> help(func)
> ?func
for a topic
> help.search(topic)
> ??topic
search.r-project.org
Google Code Search  www.google.com/codesearch
Stack Overflow  http://stackoverflow.com/tags/R
R-help list  http://www.r-project.org/posting-guide.html
Six Indispensable Books on R Learning R Data Manipulation Visualization Statistical Modeling
Extending R with Packages
Over one thousand user-contributed packages are available on CRAN, the Comprehensive R Archive Network
http://cran.r-project.org
Install a package from the command-line
> install.packages('actuar')
Install a package from the GUI menu
“Packages” --> “Install package(s)”
Final Try It! Simulate a Tweedie
For as many claims as were randomly simulated, simulate a severity from a gamma distribution with shape α=49 and scale θ=0.2 (NB: mean gamma = αθ, variance gamma = αθ²)
Is the total simulated claim amount close to expected?
Calculate the usual parameterization (μ, p, φ) of this Tweedie distribution
Extra credit:
Repeat the above 10000 times.
Does your histogram look like Glenn Meyers’? http://www.casact.org/newsletter/index.cfm?fa=viewart&id=5756
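One possible way to work through this exercise is sketched below. It assumes the claim count is Poisson with λ=2, as in the one-line rpois(1,lambda=2) simulation earlier in the deck; the seed, the helper simulate_total and the other variable names are introduced here for illustration. The (μ, p, φ) formulas are the standard mapping from a Poisson-gamma compound distribution to the Tweedie family, and the last lines check them against the compound Poisson variance.

```r
set.seed(2009)                       # arbitrary seed for reproducibility
lambda <- 2; alpha <- 49; theta <- 0.2

# One simulated total: draw a claim count, then that many gamma severities
simulate_total <- function() {
  n <- rpois(1, lambda)
  sum(rgamma(n, shape = alpha, scale = theta))   # sum over zero draws is 0
}
totals <- replicate(10000, simulate_total())

# Expected total = E[count] * E[severity]
expected_total <- lambda * alpha * theta
print(c(simulated = mean(totals), expected = expected_total))

# Tweedie parameters implied by the Poisson-gamma inputs
mu  <- lambda * alpha * theta
p   <- (alpha + 2) / (alpha + 1)
phi <- lambda^(1 - p) * (alpha * theta)^(2 - p) / (2 - p)

# Check: the Tweedie variance phi * mu^p should equal the compound Poisson
# variance lambda * E[severity^2] = lambda * alpha * theta^2 * (1 + alpha)
print(c(tweedie = phi * mu^p,
        compound = lambda * alpha * theta^2 * (1 + alpha),
        simulated = var(totals)))
```

Calling hist(totals, breaks = 50) afterwards draws the histogram for the extra-credit comparison; the spike at zero comes from trials with no claims.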
Contact Us
Daniel Murphy, FCAS, MAAA
P&C Actuarial Models
Design • Construction
Collaboration • Education
Valuable • Transparent
dmurphy@trinostics.com 925.381.9869
Michael E. Driscoll, Ph.D.
From Data to Decision
Big Data • Analytics • Visualization
www.dataspora.com med@dataspora.com 415.860.4347
Appendices R as a Programming Language Advanced Visualization Embedding R in a Server Environment
R as a Programming Language
fibonacci <- function(n) {
  if (n <= 2) return(1)   # the first two Fibonacci numbers are 1
  fib <- numeric(n)
  fib[1:2] <- 1
  for (i in 3:n) {
    fib[i] <- fib[i-1] + fib[i-2]
  }
  return(fib[n])
}
Image from cover of Abelson & Sussman’s text Structure and Interpretation of Computer Programs
Assignment
x <- c(1,2,6)
x    a variable x
<-   R’s assignment operator, equivalent to ‘=’
c(   a function c which combines its arguments into a vector
y <- c('apples','oranges')
z <- c(TRUE,FALSE)
c(TRUE,FALSE) -> z
These are also valid assignment statements.
Function Calls
output <- function(arg1, arg2, …)
+  -  *  /  ^
x <- x/3
works whether x is a one or many-valued vector
Data Structures in R
vectors
numeric
x <- c(0,2:4)
character
y <- c("alpha", "b", "c3", "4")
logical
z <- c(TRUE, FALSE, TRUE, FALSE)
> class(x)
[1] "numeric"
> x2 <- as.logical(x)
> class(x2)
[1] "logical"
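A vector holds a single type, so c() silently coerces mixed inputs to the most general one. The short sketch below (variable names are illustrative) shows why mixing numbers with TRUE and FALSE yields a numeric vector rather than a logical one.

```r
mixed <- c(1, 0, TRUE, FALSE)
print(class(mixed))            # "numeric": TRUE and FALSE were coerced to 1 and 0

flags <- as.logical(mixed)     # explicit conversion back: zero is FALSE, non-zero is TRUE
print(class(flags))            # "logical"

with_text <- c(1, "a", TRUE)
print(class(with_text))        # "character": everything becomes a string
```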
Data Structures in R
lists
lst <- list(x,y,z)
objects
matrices
M <- matrix(rep(x,3),ncol=3)
data frames*
df <- data.frame(x,y,z)
> class(df)
[1] "data.frame"
Summary of Data Structures ? matrices vectors data frames* lists
Advanced Visualization lattice, ggplot2, and colorspace
ggplot2 = grammar of graphics
Achieving small multiples with “facets”
qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color)
lattice = trellis (source: http://lmdvr.r-forge.r-project.org )
list of lattice functions
densityplot(~ speed | type, data=pitch)
visualizing six dimensions of MLB pitches with lattice
xyplot(x ~ y | type, data=pitch,
       fill.color = pitch$color,
       panel = function(x, y, fill.color, ..., subscripts) {
         fill <- fill.color[subscripts]
         panel.xyplot(x, y, fill=fill, ...)
       })
Beautiful Colors with Colorspace
library("colorspace")
red <- LAB(50,64,64)
blue <- LAB(50,-48,-48)
mixcolor(10, red, blue)
efficient plotting with hexbinplot
library(hexbin)    ## provides hexbinplot
library(ggplot2)   ## provides the diamonds data
hexbinplot(log(price)~log(carat),data=diamonds,xbins=40)

Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 

Recently uploaded (20)

Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 

An Interactive Introduction To R (Programming Language For Statistics)

  • 1. An Interactive Introduction to R November 2009 Michael E. Driscoll, Ph.D. med@dataspora.com http://www.dataspora.com Daniel Murphy FCAS, MAAA dmurphy@trinostics.com
  • 4. R is a tool for…
         Data Manipulation: connecting to data sources; slicing & dicing data
         Modeling & Computation: statistical modeling; numerical simulation
         Data Visualization: visualizing fit of models; composing statistical graphics
  • 5. R is an environment
  • 7. Let’s take a tour of some claim data in R
  • 8. Let’s take a tour of some claim data in R
         ## load in some Insurance Claim data
         library(MASS)
         data(Insurance)
         Insurance <- edit(Insurance)
         head(Insurance)
         dim(Insurance)
         ## plot it nicely using the ggplot2 package
         library(ggplot2)
         qplot(Group, Claims/Holders, data=Insurance, geom="bar", stat='identity',
               position="dodge", facets=District ~ ., fill=Age,
               ylab="Claim Propensity", xlab="Car Group")
         ## hypothesize a relationship between Age ~ Claim Propensity
         ## visualize this hypothesis with a boxplot
         x11()
         library(ggplot2)
         qplot(Age, Claims/Holders, data=Insurance, geom="boxplot", fill=Age)
         ## quantify the hypothesis with linear model
         m <- lm(Claims/Holders ~ Age + 0, data=Insurance)
         summary(m)
  • 9. R is “an overgrown calculator” sum(rgamma(rpois(1,lambda=2),shape=49,scale=.2))
  • 14. Let’s simulate 10,000 trials of 100 coin flips. What’s the distribution of heads?
         > heads <- rbinom(10^4, 100, 0.50)
         > hist(heads)
  • 15. Functions for Probability Distributions
         > pnorm(0)      0.5
         > qnorm(0.9)    1.28
         > rnorm(100)    vector of length 100
  • 16. Functions for Probability Distributions
         How to find the functions for the lognormal distribution?
         1) Use the double question mark ‘??’ to search
            > ??lognormal
         2) Then identify the package
            > ?Lognormal
         3) Discover the dist functions
            dlnorm, plnorm, qlnorm, rlnorm
  • 18. Getting Data In
         from Files
           > Insurance <- read.csv("Insurance.csv", header=TRUE)
         from Databases
           > con <- dbConnect(driver, user, password, host, dbname)
           > Insurance <- dbGetQuery(con, "SELECT * FROM claims")
         from the Web
           > con <- url('http://labs.dataspora.com/test.txt')
           > Insurance <- read.csv(con, header=TRUE)
         from R objects
           > load("Insurance.RData")
  • 19. Getting Data Out
         to Files
           write.csv(Insurance, file="Insurance.csv")
         to Databases
           con <- dbConnect(dbdriver, user, password, host, dbname)
           dbWriteTable(con, "Insurance", Insurance)
         to R Objects
           save(Insurance, file="Insurance.RData")
  • 20. Navigating within the R environment
         listing all variables
           > ls()
         examining a variable ‘x’
           > str(x)
           > head(x)
           > tail(x)
           > class(x)
         removing variables
           > rm(x)
           > rm(list=ls())    # remove everything
  • 22. A Swiss-Army Knife for Data
  • 23. A Swiss-Army Knife for Data: Indexing
         Three ways to index into a data frame:
           array of integer indices
           array of character names
           array of logical Booleans
         Examples:
           df[1:3,]
           df[c("New York", "Chicago"),]
           df[c(TRUE,FALSE,TRUE,TRUE),]
           df[df$city == "New York",]
  • 25. A Statistical Modeler
         R has a powerful modeling syntax.
         Models are specified with formulae, like y ~ x or growth ~ sun + water, which model relationships between continuous and categorical variables.
         Models also guide the visualization of relationships in graphical form.
  • 27. A Statistical Modeler
         Logistic model
           m <- glm(Age ~ I(Claims/Holders), data=Insurance, family=binomial("logit"))
         Examine it
           summary(m)
         Plot it
           plot(m)
  • 29. Visualization: Multivariate Barplot
         library(ggplot2)
         qplot(Group, Claims/Holders, data=Insurance, geom="bar", stat='identity',
               position="dodge", facets=District ~ ., fill=Age)
  • 30. Visualization: Boxplots
         library(ggplot2)
         qplot(Age, Claims/Holders, data=Insurance, geom="boxplot")
         library(lattice)
         bwplot(Claims/Holders ~ Age, data=Insurance)
  • 31. Visualization: Histograms
         library(ggplot2)
         qplot(Claims/Holders, data=Insurance, facets=Age ~ ., geom="density")
         library(lattice)
         densityplot(~ Claims/Holders | Age, data=Insurance, layout=c(4,1))
  • 33. Getting Help with R
         Help within R itself
           for a function:  > help(func)   > ?func
           for a topic:     > help.search(topic)   > ??topic
         search.r-project.org
         Google Code Search: www.google.com/codesearch
         Stack Overflow: http://stackoverflow.com/tags/R
         R-help list: http://www.r-project.org/posting-guide.html
  • 34. Six Indispensable Books on R Learning R Data Manipulation Visualization Statistical Modeling
  • 35. Extending R with Packages
         Over one thousand user-contributed packages are available on CRAN, the Comprehensive R Archive Network: http://cran.r-project.org
         Install a package from the command line
           > install.packages('actuar')
         Install a package from the GUI menu
           "Packages" --> "Install package(s)"
  • 37. For as many claims as were randomly simulated, simulate a severity from a gamma distribution with shape α=49 and scale θ=0.2 (NB: mean gamma = αθ, variance gamma = αθ²)
  • 38. Is the total simulated claim amount close to expected?
  • 39. Calculate the usual parameterization (μ, p, φ) of this Tweedie distribution
  • 41. Repeat the above 10000 times.
  • 43. Contact Us
         P&C Actuarial Models: Design • Construction, Collaboration • Education, Valuable • Transparent. Daniel Murphy, FCAS, MAAA, dmurphy@trinostics.com, 925.381.9869
         From Data to Decision: Big Data • Analytics • Visualization. www.dataspora.com. Michael E. Driscoll, Ph.D., med@dataspora.com, 415.860.4347
  • 44. Appendices R as a Programming Language Advanced Visualization Embedding R in a Server Environment
  • 45. R as a Programming Language
         fibonacci <- function(n) {
           if (n <= 2) return(1)    # the first two Fibonacci numbers are both 1
           fib <- numeric(n)        # preallocate the sequence
           fib[1:2] <- 1
           for (i in 3:n) {
             fib[i] <- fib[i-1] + fib[i-2]
           }
           return(fib[n])
         }
         Image from the cover of Abelson & Sussman’s text Structure and Interpretation of Computer Programs
  • 46. Assignment
         x <- c(1,2,6)
           x     a variable
           <-    R’s assignment operator, equivalent to ‘=’
           c(    a function c which combines its arguments into a vector
         These are also valid assignment statements:
           y <- c('apples','oranges')
           z <- c(TRUE,FALSE)
           c(TRUE,FALSE) -> z
  • 48. Data Structures in R: vectors
         numeric      x <- c(0,2:4)
         character    y <- c("alpha", "b", "c3", "4")
         logical      z <- c(TRUE, FALSE, TRUE, FALSE)
         > class(x)
         [1] "numeric"
         > x2 <- as.logical(x)
         > class(x2)
         [1] "logical"
  • 49. Data Structures in R: objects
         lists           lst <- list(x,y,z)
         matrices        M <- matrix(rep(x,3),ncol=3)
         data frames*    df <- data.frame(x,y,z)
         > class(df)
         [1] "data.frame"
  • 50. Summary of Data Structures ? matrices vectors data frames* lists
  • 51. Advanced Visualization lattice, ggplot2, and colorspace
  • 54. qplot(log(carat), log(price), data = diamonds, alpha=I(1/20)) + facet_grid(. ~ color) Achieving small multiples with “facets”
  • 55. lattice = trellis (source: http://lmdvr.r-forge.r-project.org )
  • 56. list of lattice functions densityplot(~ speed | type, data=pitch)
  • 57. visualizing six dimensions of MLB pitches with lattice
  • 58. xyplot(x ~ y | type, data=pitch, fill.color = pitch$color,
                panel = function(x, y, fill.color, ..., subscripts) {
                  fill <- fill.color[subscripts]
                  panel.xyplot(x, y, fill=fill, ...)
                })
  • 59. Beautiful Colors with Colorspace
         library("colorspace")
         red <- LAB(50,64,64)
         blue <- LAB(50,-48,-48)
         mixcolor(10, red, blue)
  • 60. efficient plotting with hexbinplot hexbinplot(log(price)~log(carat),data=diamonds,xbins=40)
  • 61. Embedding R in a Web Server Using Packages & R in a Server Environment
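
The simulation exercise in slides 9 and 37 to 41 (draw a Poisson number of claims, draw a gamma severity for each, sum them, repeat 10,000 times) can be sketched roughly as follows. This is a minimal illustration using the parameters stated on the slides (lambda = 2, shape = 49, scale = 0.2); the helper name `simulate_total`, the variable names, and the seed are assumptions made here, not taken from the deck.

```r
## Sketch: compound Poisson-gamma simulation of aggregate claims.
## Names and seed below are illustrative assumptions, not from the slides.
set.seed(42)        # arbitrary seed, only so the run is repeatable
lambda <- 2         # expected number of claims per trial
alpha  <- 49        # gamma shape
theta  <- 0.2       # gamma scale

## One trial: draw a claim count, then sum that many gamma severities.
## A count of zero gives an empty vector, whose sum is 0.
simulate_total <- function() {
  n <- rpois(1, lambda = lambda)
  sum(rgamma(n, shape = alpha, scale = theta))
}

## Repeat the trial 10,000 times
totals <- replicate(10000, simulate_total())

## Theoretical moments of the aggregate:
## mean = lambda * E[X], variance = lambda * E[X^2]
expected_mean <- lambda * alpha * theta                             # 19.6
expected_var  <- lambda * (alpha * theta^2 + (alpha * theta)^2)     # 196

c(simulated = mean(totals), expected = expected_mean)
```

The simulated mean should land near the expected value, and `hist(totals)` shows the point mass at zero that comes from trials with no claims.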

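The three indexing styles from slide 23 can be tried on a small data frame. The sketch below assumes a hypothetical `df` with a `city` column and matching row names; the data values are invented purely for illustration.

```r
## Hypothetical data frame, invented for illustration only
df <- data.frame(
  city   = c("New York", "Chicago", "Boston", "Denver"),
  claims = c(12, 7, 5, 3),
  row.names = c("New York", "Chicago", "Boston", "Denver")
)

by_position  <- df[1:3, ]                          # integer indices: rows 1 to 3
by_name      <- df[c("New York", "Chicago"), ]     # character row names
by_mask      <- df[c(TRUE, FALSE, TRUE, TRUE), ]   # logical vector, one entry per row
by_condition <- df[df$city == "New York", ]        # logical vector computed from a column
```

The last form is the one used most in practice: the comparison `df$city == "New York"` produces a logical vector, which then selects rows just as the hand-written mask does.
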
Editor's Notes

  1. These two men can help you. They are Robert Gentleman and Ross Ihaka, the creators of R.
     R is: free and open source; created by statisticians; extensible via packages (over 1000 packages).
     R is an open source programming language for statistical computing, data analysis, and graphical visualization. It has one million users worldwide, and its user base is growing. While most commonly used within academia, in fields such as computational biology and applied statistics, it is gaining currency in commercial areas such as quantitative finance (it is used by Barclays) and business intelligence (both Facebook and Google use R within their firms). It was created by two men at the University of Auckland, pictured in the NYT article on the right.
     Other languages exist that can do some of what R does, but here’s what sets it apart:
     Created by statisticians. Bo Cowgill, who uses R at Google, has said: “the great thing about R is that it was created by statisticians.” By this (I can’t speak for him) I take him to mean that R has unparalleled built-in support for statistics. But he also says “the terrible thing about R is… that it was created by statisticians.” The learning curve can be steep, and the documentation for functions is sometimes sparse.
     Free, open source. The importance of this can’t be overstated. Anyone can contribute improvements to the core language, and in fact a group of a few dozen developers around the world does exactly this. The language is constantly vetted, tweaked, and improved.
     Extensible via packages. This is related to the open source nature of the language. R has a core set of functions, but just as Excel has ‘add-ons’ and Matlab has ‘toolkits’, it is extensible with ‘packages’. This is where R is most powerful: there are over 1000 different packages that have been written for R. If there’s a new statistical technique or method that has been published, there’s a good chance it has been implemented in R.
     Audience survey: How many of you use R regularly? Have ever used R? Have ever heard of R?
  2. These are the three fundamental steps of a data analyst.