R Data Frame

WHAT IS A DATA FRAME
A data frame is a two-dimensional data structure in which each column can hold a different type of data: numeric, integer, character, logical (boolean) or complex. Whenever we load a file in R with read.table or one of its variants, the result is a data frame. A data frame gives us a table-like structure in R, similar to a table in a relational database, with observations in rows and variables in columns. Unlike a database, however, R does not enforce key constraints such as primary or foreign keys.
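As a minimal sketch of the point above, the following builds a small data frame with one column per basic type and inspects the result. The names df, n, i, s, b and z and all values are made up for illustration; they are not part of the original examples.

```r
# Hypothetical example data: one column for each basic type
df <- data.frame(
  n = c(1.5, 2.5),        # numeric (double)
  i = c(1L, 2L),          # integer
  s = c("a", "b"),        # character
  b = c(TRUE, FALSE),     # logical
  z = c(1+2i, 3+4i),      # complex
  stringsAsFactors = FALSE
)

col_types <- sapply(df, class)  # class of each column, as a named vector
print(col_types)
print(dim(df))                  # number of rows, then number of columns
```

Every column must have the same length, but the types are free to differ from column to column.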
DATA FRAME CREATION METHODS
## Method 1: By function data.frame
child<-c("Joe","Amy","John")
age<-c(8,9,10)
class<-c(4,5,6)
childdata<-data.frame(child,age,class,stringsAsFactors=FALSE)
childdata
  child age class
1   Joe   8     4
2   Amy   9     5
3  John  10     6


## Method 2: By loading a data file
## Download ozone.csv from the following link and save it as C:/R/ozone.csv. Load the data into R using the following code.
airquality<-read.table("C:/R/ozone.csv",header=TRUE, sep=",")
## This gives a data frame named airquality.
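If ozone.csv is not at hand, the loading step can be tried on a throwaway file. The sketch below writes a few made-up rows to a temporary file and reads them back. The column names follow the ones used in this post, while the values, the variable names path and aq, and the use of read.csv are assumptions made for illustration.

```r
# Made-up file contents, written to a temporary file so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("Ozone,Solar.R,Temp",
             "41,190,67",
             "36,118,72",
             "NA,149,74"), path)

# read.csv is read.table with header=TRUE and sep="," as its defaults
aq <- read.csv(path)

str(aq)          # column names and the types inferred from the file
is.na(aq$Ozone)  # the text NA in the file is read as a missing value
```

read.table(path, header=TRUE, sep=",") would read the same file; read.csv only saves typing the two arguments.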
ACCESSING DATA FRAME
A data frame is indexed like a matrix, using [row, column]. The only difference is that a matrix can store a single type of data, whereas each column of a data frame can have its own type.
airquality[1,1]  ## returns the element in the 1st row and 1st column of the data frame.
airquality[1, ]   ## returns the 1st row of the data frame.
airquality[ ,1]   ## returns the 1st column of the data frame.
airquality[1:2,1:4] ## returns the first two rows and first four columns of the data frame.
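The same indexing forms can be tried without any file on the small childdata frame from Method 1, rebuilt here so the snippet runs on its own. The drop=FALSE form is an addition not covered above; it is a standard argument of data frame indexing.

```r
# Rebuild the small example data frame from Method 1
childdata <- data.frame(child = c("Joe", "Amy", "John"),
                        age   = c(8, 9, 10),
                        class = c(4, 5, 6),
                        stringsAsFactors = FALSE)

childdata[1, 1]                    # single element: "Joe"
childdata[1, ]                     # first row, still a data frame
childdata[, 2]                     # second column, returned as a plain vector
childdata[, 2, drop = FALSE]       # second column, kept as a one-column data frame
childdata[1:2, c("child", "age")]  # rows 1-2, columns selected by name
```

Selecting columns by name rather than by position keeps the code working if the column order of the file changes.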
COLUMN NAMES OF DATA FRAME
When loading data into R, set header=TRUE if the first row of the file contains the column names. Each column of the data frame can then be accessed by name, by placing $ between the data frame name and the column name.
airquality$Ozone  ## returns the Ozone column of the data
airquality$Solar.R  ## returns the Solar.R column of the data
airquality$Temp     ## returns the Temp column of the data
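Besides $, a column can be fetched with [[ ]], which is useful when the column name is stored in a variable. The sketch below uses the childdata frame from Method 1; the variable name wanted is hypothetical.

```r
childdata <- data.frame(child = c("Joe", "Amy", "John"),
                        age   = c(8, 9, 10),
                        class = c(4, 5, 6),
                        stringsAsFactors = FALSE)

names(childdata)       # all column names
childdata$age          # column by name with $
childdata[["age"]]     # the same column with [[ ]]

wanted <- "class"      # column name held in a variable
childdata[[wanted]]    # works with [[ ]]; childdata$wanted would look for a column literally called "wanted"
```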
QUESTIONS-1
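1) Extract the first two rows of the data frame.
airquality[1:2,]
2) How many observations are in this data frame?
nrow(airquality)  ## dim(airquality) returns the number of rows and columns together
3) What is the value of Ozone in the 47th row?
airquality$Ozone[[47]]
4) Extract the rows where the Ozone value is above 31 and the Temp value is above 90.
airquality[which(airquality$Temp>90 & airquality$Ozone>31),1:6]  ## & compares element by element; && is only for single values. which() drops rows where the comparison is NA.
5) Take the mean of Solar.R, using the function mean.
mean(airquality$Solar.R, na.rm=TRUE)  ## na.rm=TRUE skips missing values (NA), if the column has any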
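Row filtering and missing values interact in ways that are easy to miss, so here is a minimal sketch on made-up data. The data frame aq and its values are hypothetical; only the column names Ozone and Temp are taken from this post.

```r
# Made-up data with one missing Ozone value
aq <- data.frame(Ozone = c(40, NA, 97, 20),
                 Temp  = c(92, 95, 93, 80))

# & works element by element and returns one logical value per row
keep <- aq$Temp > 90 & aq$Ozone > 31
print(keep)

aq[keep, ]          # an NA in keep produces a row filled with NA
aq[which(keep), ]   # which() keeps only the positions that are TRUE

mean(aq$Ozone)                 # NA, because one value is missing
mean(aq$Ozone, na.rm = TRUE)   # mean of the non-missing values
```

The operator && is meant for single TRUE/FALSE values, as in an if condition, so it is not suitable for filtering the rows of a data frame.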
QUESTIONS-2
Download another file, crime.csv, from the link.
This file has robbery and murder data for the 50 U.S. states for the year 2005.
crime<-read.table("C:/R/crime.csv",header=TRUE, sep=",")
## R is case-sensitive: the data frame is named crime, so every query below uses crime.
1) Extract those rows where population>5000000
crime[crime$Population>5000000,]
2) Extract the names of the states where murder>6
crime[crime$Murder>6,1]
3) Extract the names of the states where the murder value is strictly between 3 and 6.
crime[crime$Murder>3 & crime$Murder<6,1]
4) Extract the average murder rate of all states where population>5,000,000.
mean(crime[crime$Population>5000000,"Murder"])
5) The name of the state with the maximum number of murders.
crime[crime$Murder==max(crime$Murder),1]
6) The name of the state with the maximum number of robberies.
crime[crime$Robbery==max(crime$Robbery),1]
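The queries in this exercise can be rehearsed without crime.csv on a small made-up data frame. Everything in the sketch below is hypothetical: the name crime_demo, the state labels, the values, and the assumption that the state name column is called State. Only the column names Murder, Robbery and Population come from this post.

```r
# Made-up stand-in for crime.csv
crime_demo <- data.frame(State      = c("A", "B", "C", "D"),
                         Murder     = c(2.0, 4.5, 7.1, 5.0),
                         Robbery    = c(50, 120, 200, 90),
                         Population = c(1e6, 6e6, 8e6, 3e6),
                         stringsAsFactors = FALSE)

# Rows with population above 5 million
big <- crime_demo[crime_demo$Population > 5000000, ]

# States with a murder value strictly between 3 and 6
between <- crime_demo$State[crime_demo$Murder > 3 & crime_demo$Murder < 6]

# Average murder value over the large states
avg <- mean(crime_demo$Murder[crime_demo$Population > 5000000])

# State with the most robberies; which.max returns the position of the first maximum
top <- crime_demo$State[which.max(crime_demo$Robbery)]

print(big); print(between); print(avg); print(top)
```

Note that which.max returns a single position even when several rows tie for the maximum, whereas the == max(...) form used above returns every tied row.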

