Monte Carlo Simulation with R

Stochastic Modeling


A stochastic model is a tool for modeling data where the input is uncertain. When an input has some uncertainty or probability associated with it, the output has a probability associated with it as well. Stochastic modeling therefore helps in predicting the outcome when an input does not have a fixed value. The input to the model can be either a value from a defined range or a draw from a probability distribution. One way of performing stochastic modeling is through Monte Carlo algorithms.

Monte Carlo Methods



Monte Carlo methods are a class of algorithms that use random numbers in numerical calculations. If an input comes from a set of numbers or from a distribution, and there is a numerical formula to evaluate, Monte Carlo methods are a natural fit.
Monte Carlo simulation is one such method. The technique is used by professionals in a wide variety of fields, such as finance, project management, energy, manufacturing, engineering, and research and development. It was first developed by Stanislaw Ulam, while he was working on the atomic bomb, to study nuclear cascades. It was named after the famous Casino de Monte Carlo in Monaco.

Monte Carlo simulation helps the decision maker by providing a range of outcomes along with their probabilities. Because of this, it is helpful for understanding risk and uncertainty in project management, costing, risk analysis and the stock market.

Application of Monte Carlo Simulation

1. Calculating integrals: Some functions are impractical to integrate analytically or on a grid because of high dimensionality. In those scenarios Monte Carlo methods are of great help.

2. Portfolio assessment: In finance, many parameters determine a portfolio's value, and each has some uncertainty associated with it. Monte Carlo methods are well suited to these scenarios.
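
As a rough illustration of the first application, the sketch below estimates a 10-dimensional integral by averaging the integrand at uniformly random points. The integrand (the sum of squared coordinates over the unit hypercube), the sample size and the seed are assumptions chosen for illustration; the exact answer, 10/3, is known here only so that the estimate can be checked.

```r
## Monte Carlo integration sketch (illustrative integrand, base R only)
set.seed(123)                          # fixed seed so the run is reproducible
d <- 10                                # number of dimensions
n <- 100000                            # number of random points
x <- matrix(runif(n * d), ncol = d)    # n points, uniform in [0,1]^d
f <- rowSums(x^2)                      # integrand evaluated at each point
estimate <- mean(f)                    # the hypercube has volume 1, so the mean estimates the integral
std.error <- sd(f) / sqrt(n)           # Monte Carlo standard error of the estimate
print(c(estimate = estimate, exact = d / 3, std.error = std.error))
```

The standard error shrinks with the square root of the number of points regardless of the dimension, which is why the approach stays practical where grid-based integration does not.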

Steps of Monte Carlo Simulation

1. Identify a mathematical model of the process you want to explore.
2. Define the parameters, such as mean and standard deviation, for each factor of your model.
3. Create random data according to those parameters.
4. Simulate and analyze the output of the process.
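
The four steps can be sketched in base R as follows. The model (a project made of two tasks whose durations add up), the normal distributions and their parameters are all hypothetical, chosen only to show where each step appears in code.

```r
## Step 1: mathematical model (hypothetical): total duration = task1 + task2
total.duration <- function(task1, task2) task1 + task2

## Step 2: parameters for each factor (assumed values, in days)
task1.mean <- 10; task1.sd <- 2
task2.mean <- 5;  task2.sd <- 1

## Step 3: create random data according to those parameters
set.seed(1)
n <- 100000
task1 <- rnorm(n, mean = task1.mean, sd = task1.sd)
task2 <- rnorm(n, mean = task2.mean, sd = task2.sd)

## Step 4: simulate and analyze the output
total <- total.duration(task1, task2)
print(summary(total))
print(quantile(total, c(0.05, 0.95)))   # range covering 90% of the simulated outcomes
prob.late <- mean(total > 18)           # estimated probability of taking more than 18 days
print(prob.late)
```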

Example of Monte Carlo Simulation in R

Let us take a simple example of calculating the profit of a new company that makes parts. The following inputs are needed to calculate the profit.

1) Raw material cost per part (rc) = It can take three values (80, 90, 100), and these values follow a discrete uniform distribution, which means all three are equally likely.
2) Labour cost per part (lc) = It can take five values (43, 44, 45, 46, 47) with probabilities (.1, .2, .4, .2, .1). This is a discrete distribution that is symmetric around 45 and roughly bell-shaped, not a true normal distribution.
3) Initial cost = 1000000 is needed to set up the production plant.
4) Selling price of part = 250. Each part is sold for 250.
5) Number of parts manufactured in the first year = 15000

Profit = ((250 - (rc + lc)) * 15000) - 1000000

We now have all the input variables, their probability distributions and the numerical model.
Let us apply Monte Carlo simulation to this model using R.

############################################
## install the package (needed only once)
install.packages("mc2d")
## load the library
library(mc2d)
## first input: labour cost, with its possible values and their probabilities
labourcost = mcstoc(rempiricalD, values = c(43, 44, 45, 46, 47), prob = c(.1, .2, .4, .2, .1))
## second input: raw material cost, three equally likely values
partcost = mcstoc(rempiricalD, values = c(80, 90, 100), prob = c(1/3, 1/3, 1/3))
## put the final numerical formula in another variable
profit = ((250 - (labourcost + partcost)) * 15000) - 1000000
## generate the model with the mc function
MC = mc(labourcost, partcost, profit)
## print summary and plots of the model
print(MC)
summary(MC)
plot(MC)
hist(MC)
############################################
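
As a cross-check that does not depend on mc2d, the same model can be sketched in base R with sample(). The number of draws and the seed are assumptions; the inputs and the formula are the ones listed above. Since the expected raw material cost is 90 and the expected labour cost is 45, the expected profit is (250 - 135) * 15000 - 1000000 = 725000, and the simulated mean should land close to that.

```r
set.seed(2017)
n <- 100000
## labour cost: five values with the stated probabilities
lc <- sample(c(43, 44, 45, 46, 47), n, replace = TRUE, prob = c(.1, .2, .4, .2, .1))
## raw material cost: three equally likely values
rc <- sample(c(80, 90, 100), n, replace = TRUE)
## same formula as in the model description
profit <- ((250 - (rc + lc)) * 15000) - 1000000
print(mean(profit))
print(range(profit))
## share of simulated runs at each distinct profit level
print(round(prop.table(table(profit)), 3))
```

The worst case is 545000 (rc = 100, lc = 47) and the best case is 905000 (rc = 80, lc = 43), so under these inputs the company is profitable in every scenario.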









Connect R with Google Analytics

Analysts use Google Analytics (GA) for various purposes, such as finding out who is accessing their websites and at what time of day, or which keywords are entered most often in a page's search box. It is very helpful for analysts and other professionals to import data from GA directly into R for further analysis.
Method 1

install.packages("RGoogleAnalytics")
require(RGoogleAnalytics)

## Authorization need not be executed in each session, as the token is saved in the working directory of R on your computer

token <- Auth(client.id="client Id",client.secret="Client Secret")
save(token,file="token_file")
## In future sessions it can be loaded as follows
load("./token_file")
ValidateToken(token)
query.list<-Init(start.date="2017-05-30",
                         end.date ="2017-05-31",
                         dimensions = "ga:date,ga:hour",
                         metrics = "ga:sessions,ga:pageviews",
                         max.results=100000,
                         sort = "-ga:date",
                         table.id = "ga:table.id")
## Table ID is in the URL of your Google Analytics page. It is everything past the “p” in the URL. Example:
## https://www.google.com/analytics/web/?hl=en#management/Setting/a48963421w80588688pTABLE_ID_NUMBER

ga.query <- QueryBuilder(query.list)
ga.data <- GetReportData(ga.query, token, split_daywise = TRUE, delay = 5)

The data is saved in the data frame ga.data.
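
Once the data is in a data frame, ordinary R tools apply. The sketch below uses a small made-up data frame in place of real GA output, and assumes the columns come back named date, hour, sessions and pageviews (the requested dimensions and metrics without the ga: prefix). Check names(ga.data) on a real result before relying on that.

```r
## stand-in for ga.data; the numbers are invented for illustration
ga.mock <- data.frame(
  date      = c("20170530", "20170530", "20170531", "20170531"),
  hour      = c("09", "10", "09", "10"),
  sessions  = c(120, 150, 100, 170),
  pageviews = c(300, 420, 260, 510),
  stringsAsFactors = FALSE
)
## total sessions and pageviews per hour of day
by.hour <- aggregate(cbind(sessions, pageviews) ~ hour, data = ga.mock, FUN = sum)
print(by.hour)
## hour of day with the most sessions
busiest <- by.hour$hour[which.max(by.hour$sessions)]
print(busiest)
```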

 
 

R connectivity with Oracle

R can be connected to different databases such as Oracle, Teradata and Netezza.
This section explains connectivity with Oracle.

How to connect R with Oracle


##Step1: Install RJDBC package in R

install.packages('RJDBC')
library(RJDBC)

##Step 2: Download the Oracle JDBC driver.
##Go to http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
##Download the ojdbc6.jar file and place it in a permanent directory.

##Step 3: Create a Driver Object in R. 

jdbcDriver =JDBC("oracle.jdbc.OracleDriver",classPath="/directory/ojdbc6.jar")

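##Step 4: Create a connection to the Oracle database.
##After "@//" give host:port/service_name. To connect by SID instead, use the form jdbc:oracle:thin:@database.hostname.com:port:sid
jdbcConnection =dbConnect(jdbcDriver, "jdbc:oracle:thin:@//database.hostname.com:port/service_name", "username", "password")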

##Step 5: Run an Oracle SQL query.
##dbReadTable: read a table into a data frame

df1=dbReadTable(jdbcConnection,'PC_ITEM')

##dbGetQuery: read the result of a SQL statement into a data frame

df2=dbGetQuery(jdbcConnection,'select * from tabl where to_number(colname)<10')

##dbWriteTable: write a data frame to the schema. It is typically very slow with large tables.

dbWriteTable(jdbcConnection,'TableName',dataframe)

Functions used in R

R has a huge list of functions. Below are some commonly used functions that come up day to day while working in R.
We will use the following objects to test the functions.
 m<-matrix(1:12,6,2)
 a<-array(1:8,c(2,2,2))
 d<-data.frame("Amy",1001,c(78,45,89,78,67))

##1. dim function 

It is used to check the dimensions of an object such as a matrix, array or data frame. A plain vector has no dimensions, so dim returns NULL for it.

dim(m)
[1] 6 2

dim(a)
[1] 2 2 2

dim(d)
[1] 5 3
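
A short sketch of how dim behaves on a plain vector, together with the related base R helpers nrow, ncol and length. The vector v is an example value introduced here.

```r
v <- c(9, 1, 3, -4, 0, -9)
print(dim(v))        # NULL: a plain vector has no dim attribute
print(length(v))     # use length for vectors
m <- matrix(1:12, 6, 2)
print(nrow(m))       # first element of dim(m)
print(ncol(m))       # second element of dim(m)
dim(v) <- c(3, 2)    # assigning dim turns the vector into a 3 x 2 matrix
print(v)
```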

##2. head(obj,n) function 

It is used to print the first n lines of an object such as a matrix, array or data frame. By default n is 6, so head(m) shows the first six lines of the matrix.
head(m, 2)
     [,1] [,2]
[1,]    1    7
[2,]    2    8

##3. tail(obj,n) function 

It is used to print the last n lines of an object such as a matrix, array or data frame. By default n is 6, so tail(m) shows the last six lines of the matrix.
tail(m,2)
     [,1] [,2]
[5,]    5   11
[6,]    6   12
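
The defaults and the effect of a negative n can be checked with a slightly longer matrix. The object m10 is a hypothetical 10-row example, used because m has only six rows.

```r
m10 <- matrix(1:20, 10, 2)
print(nrow(head(m10)))      # default n = 6
print(nrow(tail(m10)))      # default n = 6
print(head(m10, -2))        # negative n: everything except the last 2 rows
print(tail(m10, -2))        # negative n: everything except the first 2 rows
```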

##4. str(object) 

It is used to check the structure of any object. For m, it reports that m is an integer matrix with 6 rows and 2 columns. It also displays the first few values stored in the object.
str(m)
int [1:6, 1:2] 1 2 3 4 5 6 7 8 9 10 ...

##5. sort(object, decreasing=FALSE/TRUE) 

The sort function sorts the data of an object in ascending or descending order.
v<-c(9,1, 3, -4,0,-9)
sort(v)
[1] -9 -4  0  1  3  9

##6. order(object,decreasing=FALSE)

The order function returns the index positions of the elements, arranged in the sequence that would sort the object in ascending or descending order.
order(c(4,2,7,1,3,9,10,16,13))
[1] 4 2 5 1 3 6 7 9 8
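
order is mostly used for indexing. The sketch below shows that indexing a vector by its order gives the same result as sort, and then uses order to sort a data frame by one column. The data frame df is a made-up example.

```r
v <- c(4, 2, 7, 1, 3, 9, 10, 16, 13)
idx <- order(v)
print(v[idx])                          # same values as sort(v)
print(order(v, decreasing = TRUE))     # positions from largest to smallest
## sort a data frame by the score column (example data)
df <- data.frame(name = c("a", "b", "c"), score = c(67, 89, 45))
sorted.df <- df[order(df$score), ]
print(sorted.df)
```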

##7. split(x,f) 

The split function divides the data into groups as defined by f.
## the energy dataset ships with the ISwR package
library(ISwR)
data(energy)
expend stature
  9.21   obese
  7.53    lean
  7.48    lean
  8.08    lean
  8.09    lean
 10.15   obese

split(energy$expend, energy$stature)
$lean
7.53 7.48.....
$obese
9.21 10.15....
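
split is often followed by sapply to compute one summary per group. The sketch below rebuilds only the six rows shown above in a small data frame, so it runs without the full dataset, and computes the mean per group. The name energy6 is introduced here for illustration.

```r
## only the six rows shown above, not the full energy dataset
energy6 <- data.frame(
  expend  = c(9.21, 7.53, 7.48, 8.08, 8.09, 10.15),
  stature = c("obese", "lean", "lean", "lean", "lean", "obese")
)
groups <- split(energy6$expend, energy6$stature)   # a list with one vector per group
print(groups)
group.means <- sapply(groups, mean)                # one mean per group
print(group.means)
```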

##8. unique(object) 

The unique function returns the unique values inside an object.
unique(c(1,1,1,2,2,3,3,3,4,4,4))
[1] 1 2 3 4

##9. paste(vector1, vector2, sep= , collapse=) 

Paste concatenates two vectors element by element. The first element of vector1 is concatenated with the first element of vector2, with the value passed in sep placed between them. All of these joined elements are then collapsed together, with the value of collapse placed between them. When collapse is given, the output of paste is a one-element vector that has all elements concatenated together.
part1<-c("M","na","i", "Te")
part2<-c("y","me","s","st")
paste(part1,part2,sep="" ,collapse=" ")

[1] "My name is Test"
paste(part1,part2,sep="." ,collapse="-")
[1] "M.y-na.me-i.s-Te.st"

part1<-c(1,3,5,7)
part2<-c(2,4,6,8)
paste(part1,part2,sep="" ,collapse="")
[1] "12345678"
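
It can help to look at sep and collapse separately. In the sketch below, the first call uses only sep and returns one string per pair of elements, and the second call uses only collapse to join the elements of a single vector.

```r
part1 <- c("M", "na", "i", "Te")
part2 <- c("y", "me", "s", "st")
words <- paste(part1, part2, sep = "")   # element-wise: a vector of four strings
print(words)
joined <- paste(words, collapse = " ")   # collapse: a single string
print(joined)
print(paste0(part1, part2))              # paste0 is paste with sep = ""
```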



Longitudinal Data Analysis

What is Longitudinal data
Longitudinal data is a collection of a few observations over time from many sources, such as blood pressure measurements taken during a marathon (1 hour) for many people. It differs from time series data in duration and source: time series data is a collection of many observations from one source.

Case Study
install.packages("nlme")
library(nlme)
## We will do the analysis on the Orthodont data. It is a study of 27 children (16 boys and 11 girls). The data is the distance from the centre of the pituitary gland to the pterygomaxillary fissure. There are four measurements per child, at ages 8, 10, 12 and 14.
head(Orthodont,4)
  distance age Subject  Sex
1       26   8     M01 Male
2       25  10     M01 Male
3       29  12     M01 Male
4       31  14     M01 Male

## Questions to answer:
1)  Whether distances over time are larger for boys than for girls.
2)  Whether the rate of change of distance over time is similar for boys and girls.

## Step 1: Plot the grouped data
plot(Orthodont)
## Step 2: Create a scatter plot
plot(distance~age, data=Orthodont,
     ylab="distance",
     xlab="age")
## Step 3: Create a scatter plot with a smoother
with(Orthodont, scatter.smooth(age, distance, col="blue",
     ylab="distance", xlab="age", lpars=list(col="red",lwd=3)))

## Step 4: Fit a separate linear regression for each child
fm1<-lmList(distance ~ age | Subject, Orthodont)
## Step 5: Plot the confidence intervals of the per-child coefficients
plot(intervals(fm1))

## Step 6: Create a box plot
library(lattice)
bwplot(distance~as.factor(age)|Sex, data=Orthodont,
       ylab="Distance",
       xlab="6 year duration - ages 8, 10, 12, 14")


Analysis:
1) The trajectory of distance is approximately a linear function of age.
2) The trajectories vary between children.
3) The distance measurement increases with age.
4) The distance trajectories for boys are higher on average than those for girls.
5) There is a population trend as well as subject specific variation in the data.
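
The two questions can also be checked numerically from the per-child fits. The sketch below is one possible summary, not a formal test: it averages distance by Sex, then averages the per-child slopes from lmList by Sex. It assumes nlme is installed and uses the Subject and Sex columns of Orthodont.

```r
library(nlme)
## Question 1: average distance by sex
mean.distance <- tapply(Orthodont$distance, Orthodont$Sex, mean)
print(mean.distance)
## Question 2: average rate of change (slope of distance on age) by sex
fm1 <- lmList(distance ~ age | Subject, data = Orthodont)
slopes <- coef(fm1)                    # one row per child: intercept and age slope
## look up each child's sex from the row names of the coefficient table
child.sex <- Orthodont$Sex[match(rownames(slopes), as.character(Orthodont$Subject))]
mean.slope <- tapply(slopes$age, child.sex, mean)
print(mean.slope)
```

A larger mean and a steeper average slope for boys would agree with points 3 and 4 of the analysis, although judging whether the difference is statistically meaningful needs a model that includes both age and sex.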





