WHAT IS A DATA FRAME?
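A data frame is a two-dimensional data structure that can store data of different types. Each column holds values of one type, but different columns can hold different types, such as numeric, integer, character, logical (boolean) or complex. When we read a tabular file such as a CSV into R with read.table() or read.csv(), the result is a data frame. A data frame gives R a table-like structure similar to a table in a relational database, but R does not enforce key constraints such as primary or foreign keys.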
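A minimal sketch of the points above, using small made-up data frames (the column names n, i, s, b, z and id are hypothetical): each column keeps a single type, and a duplicated value in an id-like column is accepted because R enforces no key constraints.

```r
# Hypothetical example: one column per basic type
df <- data.frame(
  n = c(1.5, 2.0),          # numeric (double)
  i = c(1L, 2L),            # integer
  s = c("a", "b"),          # character
  b = c(TRUE, FALSE),       # logical (boolean)
  z = c(1+2i, 3i),          # complex
  stringsAsFactors = FALSE  # keep s as character in older R versions too
)

# Each column reports its own class
col_types <- sapply(df, class)
print(col_types)

# No key constraints: a duplicated "id" is accepted without any error
people <- data.frame(id = c(1, 1), name = c("Joe", "Amy"),
                     stringsAsFactors = FALSE)
print(anyDuplicated(people$id))  # position of the first duplicate
```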
DATA FRAME CREATION METHODS
## Method 1: Using the data.frame() function
child<-c("Joe","Amy","John")
age<-c(8,9,10)
class<-c(4,5,6)
childdata<-data.frame(child,age,class,stringsAsFactors=FALSE)
childdata
  child age class
1   Joe   8     4
2   Amy   9     5
3  John  10     6
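data.frame() also accepts name = value pairs, so the same table can be built in one call without first creating the separate vectors child, age and class in the workspace. This is a sketch of that alternative form; childdata2 is a hypothetical name used only here.

```r
# Build the same table in one call by naming each column directly;
# no separate child/age/class variables are created in the workspace
childdata2 <- data.frame(
  child = c("Joe", "Amy", "John"),
  age   = c(8, 9, 10),
  class = c(4, 5, 6),
  stringsAsFactors = FALSE
)
print(childdata2)
str(childdata2)  # shows 3 obs. of 3 variables and each column's type
```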
## Method 2: By loading a data file.
## Download ozone.csv from the following link and save it as C:/R/ozone.csv. Then load the data into R with the following code.
airquality<-read.table("C:/R/ozone.csv",header=TRUE, sep=",")
## This creates a data frame named airquality.
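read.csv() is a wrapper around read.table() whose defaults include header = TRUE and sep = ",", so either call loads a comma-separated file. To stay self-contained, this sketch does not use C:/R/ozone.csv; instead it writes the first rows of R's built-in datasets::airquality to a temporary file and reads that file back both ways.

```r
# Write a small CSV to a temporary file (stand-in for C:/R/ozone.csv)
tmp <- tempfile(fileext = ".csv")
write.csv(datasets::airquality[1:5, ], tmp, row.names = FALSE)

# Two equivalent ways to load it
a <- read.table(tmp, header = TRUE, sep = ",")
b <- read.csv(tmp)

print(is.data.frame(a))          # both calls return a data frame
print(isTRUE(all.equal(a, b)))   # with the same contents
unlink(tmp)                      # remove the temporary file
```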
ACCESSING A DATA FRAME
A data frame is indexed like a two-dimensional matrix, with [row, column]. The difference is that a matrix can store only one type of data, whereas each column of a data frame can store a different type.
airquality[1,1] ## returns the element in row 1, column 1.
airquality[1, ] ## returns the 1st row of the data frame.
airquality[ ,1] ## returns the 1st column of the data frame (as a vector).
airquality[1:2,1:4] ## returns the first two rows and the first four columns.
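A short sketch of these indexing rules, using R's built-in datasets::airquality as a stand-in for the loaded file (it has the same column names used in this post; whether ozone.csv matches it exactly is an assumption). Selecting a single column with [ , j] returns a plain vector, while drop = FALSE keeps the result as a data frame.

```r
aq <- datasets::airquality

first_cell   <- aq[1, 1]                 # a single value
first_row    <- aq[1, ]                  # a one-row data frame
first_col    <- aq[, 1]                  # a vector (dimensions dropped)
first_col_df <- aq[, 1, drop = FALSE]    # still a one-column data frame
block        <- aq[1:2, 1:4]             # 2 rows x 4 columns

print(first_cell)
print(class(first_row))
print(class(first_col))
print(class(first_col_df))
print(dim(block))
```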
COLUMN NAMES OF A DATA FRAME
When loading data into R, set header=TRUE if the first row of the file contains the column names. Each column of a data frame can then be accessed by writing the data frame name, followed by $ and the column name.
airquality$Ozone ## returns the Ozone column of the data
airquality$Solar.R ## returns the Solar.R column of the data
airquality$Temp ## returns the Temp column of the data
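Besides $, names() lists all column names, and [[ ]] selects a column by a name stored in a variable, which $ cannot do. A sketch, again assuming R's built-in datasets::airquality as a stand-in for the loaded file:

```r
aq <- datasets::airquality

print(names(aq))  # all column names

# $ and [[ ]] return the same column
same <- identical(aq$Ozone, aq[["Ozone"]])
print(same)

# [[ ]] evaluates its argument, so the column name can come from a variable;
# aq$col would instead look for a column literally named "col"
col <- "Temp"
temps <- aq[[col]]
print(head(temps))
```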
QUESTIONS-1
1) Extract the first two rows of the data frame.
airquality[1:2,]
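head() is a convenient alternative that returns the first n rows of a data frame. A sketch with R's built-in datasets::airquality standing in for the loaded file:

```r
aq <- datasets::airquality
first_two <- head(aq, 2)  # first 2 rows, all columns
print(first_two)
```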
2) How many observations are in this data frame?
dim(airquality) ## returns c(rows, columns); the first value is the number of observations
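nrow() and ncol() return the two parts of dim() separately, so nrow() answers "how many observations" directly. A sketch with the built-in datasets::airquality as a stand-in for the loaded file:

```r
aq <- datasets::airquality
d      <- dim(aq)   # c(rows, columns)
n_obs  <- nrow(aq)  # same as d[1]
n_vars <- ncol(aq)  # same as d[2]
print(d)
print(n_obs)
```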
3) What is the value of Ozone in the 47th row?
airquality$Ozone[[47]]
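The same value can be read by indexing the data frame directly with a row number and a column name. A sketch using datasets::airquality as a stand-in; Ozone contains missing values, so for some rows the result is NA.

```r
aq <- datasets::airquality
v1 <- aq$Ozone[[47]]   # element 47 of the Ozone column
v2 <- aq[47, "Ozone"]  # row 47, column "Ozone"
print(v1)
print(identical(v1, v2))
```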
4) Extract the rows where Ozone is above 31 and Temp is above 90.
airquality[which(airquality$Temp>90 & airquality$Ozone>31),1:6] ## & compares element by element; which() drops rows where the condition is NA
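Why & and which() matter here: && compares single values only (R 4.3.0 and later signal an error when either side has length greater than one), while & works element by element. Ozone also contains NA values, and a logical index containing NA produces a row filled with NA; which() keeps only the TRUE positions. A small sketch with a hypothetical four-row data frame:

```r
# Hypothetical toy data with one missing Ozone value
df <- data.frame(Ozone = c(40, NA, 20, 50),
                 Temp  = c(95, 97, 92, 85))

cond <- df$Temp > 90 & df$Ozone > 31  # element-wise: TRUE NA FALSE FALSE
print(cond)

with_na  <- df[cond, ]         # the NA in cond yields an all-NA row
only_hit <- df[which(cond), ]  # which() keeps only the TRUE positions
print(with_na)
print(only_hit)
```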
5) Compute the mean of Solar.R using the function mean().
mean(airquality$Solar.R, na.rm=TRUE) ## na.rm=TRUE skips missing values; otherwise any NA makes the result NA
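mean() returns NA when its input contains any NA, unless na.rm = TRUE is given. A sketch with a hypothetical vector containing one missing value:

```r
x <- c(190, 118, NA, 149)           # hypothetical values, one missing
m_default <- mean(x)                # NA because of the missing value
m_clean   <- mean(x, na.rm = TRUE)  # mean of the three observed values
print(m_default)
print(m_clean)
print(sum(is.na(x)))                # how many values were skipped
```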
QUESTIONS-2
Download another file, crime.csv, from the link.
This file contains robbery and murder data for the 50 U.S. states for the year 2005.
crime<-read.table("C:/R/crime.csv",header=TRUE, sep=",")
1) Extract the rows where Population > 5000000.
crime[crime$Population>5000000,] ## R is case-sensitive: the object is crime, not Crime
2) Extract the names of the states where Murder > 6.
crime[crime$Murder>6,1]
3) Extract the names of the states where Murder is between 3 and 6 (exclusive).
crime[crime$Murder>3 & crime$Murder<6,1]
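The condition above is exclusive: states with exactly 3 or 6 are left out, and >= with <= gives an inclusive range. subset() is an alternative that refers to columns without repeating the data frame name. This sketch does not read crime.csv; it defines a hypothetical toy data frame whose state names and values are invented.

```r
# Hypothetical toy data; not the 2005 figures from crime.csv
crime <- data.frame(State  = c("StateA", "StateB", "StateC", "StateD"),
                    Murder = c(3, 4.5, 6, 7.2),
                    stringsAsFactors = FALSE)

exclusive  <- crime[crime$Murder > 3 & crime$Murder < 6, 1]
inclusive  <- crime[crime$Murder >= 3 & crime$Murder <= 6, 1]
via_subset <- subset(crime, Murder > 3 & Murder < 6)$State

print(exclusive)
print(inclusive)
```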
4) Compute the average murder rate of all states where Population > 5,000,000.
mean(crime[crime$Population>5000000,"Murder"])
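Selecting the column by name rather than by position keeps the calculation correct even if the column order in the file changes. A sketch with a hypothetical toy data frame (invented values, not read from crime.csv):

```r
# Hypothetical toy data; not the 2005 figures from crime.csv
crime <- data.frame(State      = c("StateA", "StateB", "StateC"),
                    Murder     = c(2, 5, 8),
                    Population = c(1e6, 6e6, 9e6),
                    stringsAsFactors = FALSE)

big  <- crime$Population > 5000000   # TRUE for StateB and StateC
avg1 <- mean(crime[big, "Murder"])   # row filter, column chosen by name
avg2 <- mean(crime$Murder[big])      # same result from the column vector
print(avg1)
```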
5) The name of the state with the highest number of murders.
crime[crime$Murder==max(crime$Murder),1]
6) The name of the state with the highest number of robberies.
crime[crime$Robbery==max(crime$Robbery),1]
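Comparing with max() returns every state that shares the highest value, whereas which.max() returns only the first one. If the column had missing values, max() would return NA unless na.rm = TRUE were set. A sketch with a hypothetical toy data frame containing a tie (invented values, not read from crime.csv):

```r
# Hypothetical toy data with a tie for the highest Robbery value
crime <- data.frame(State   = c("StateA", "StateB", "StateC"),
                    Robbery = c(120, 300, 300),
                    stringsAsFactors = FALSE)

all_max   <- crime[crime$Robbery == max(crime$Robbery), 1]  # every tied state
first_max <- crime$State[which.max(crime$Robbery)]          # first one only
print(all_max)
print(first_max)
```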