Tag Archive | Data Science

R Cheatsheet: Reading XLSX files

#Use the xlsx library

#It is not part of base R, so we must install the package first

install.packages("xlsx")
library(xlsx)

#Calling read.xlsx with only the file name causes an error, because we must also say which sheet to read (sheetIndex or sheetName). It is also good practice to state whether the sheet has a header row for each column (header = TRUE is the default)

readexcelfile<-function(){
 library(xlsx)
 localcopy<-"./data/cameras.xlsx"
 cameraXlsx<-read.xlsx(localcopy)
 head(cameraXlsx)
}

#Correct use of read.xlsx

readexcelfile<-function(){
 library(xlsx)
 localcopy<-"./data/cameras.xlsx"
 cameraXlsx<-read.xlsx(localcopy, sheetIndex = 1, header = TRUE)
 head(cameraXlsx)
}
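
As a minimal sketch of the same idea, the helper below wraps read.xlsx so that a missing file or a missing package produces a clear message before any reading is attempted. The name read_first_sheet and its arguments are hypothetical, not part of the xlsx package; rowIndex and colIndex are read.xlsx parameters that restrict the rows and columns read.

```r
# Hypothetical helper (not part of the xlsx package): read one sheet,
# failing early with a clear message if something is missing.
read_first_sheet <- function(path, sheet = 1, rows = NULL, cols = NULL) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  if (!requireNamespace("xlsx", quietly = TRUE)) {
    stop("Package 'xlsx' is not installed")
  }
  # sheetIndex is required; rowIndex/colIndex = NULL means "read everything"
  xlsx::read.xlsx(path, sheetIndex = sheet, header = TRUE,
                  rowIndex = rows, colIndex = cols)
}
```

Assuming the file from the example above exists, read_first_sheet("./data/cameras.xlsx", rows = 1:4, cols = 2:3) would read only a small block of the first sheet.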

R Cheatsheet: Files and Directories

Check whether a directory exists; if it does, nothing happens, otherwise it is created:

if(!file.exists("test")){
 dir.create("test")
}
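
A slightly more defensive variant of this check is sketched below. It uses dir.exists(), which tests specifically for a directory, and recursive = TRUE so that nested paths are created in one call. The helper name ensure_dir and the demo path under tempdir() are assumptions made for illustration.

```r
# Hypothetical helper: create a directory (and its parents) only if needed.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # Report whether the directory is there after the call
  invisible(dir.exists(path))
}

# Demo in a temporary location so nothing in the project is touched
demo_dir <- file.path(tempdir(), "test", "nested")
ensure_dir(demo_dir)
```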

 

Next step: write a function that downloads a CSV file from the internet:

downloadfileurl<-function(){

 #Check whether the directory exists; if it does not, create it
 if(!file.exists("data")){
  dir.create("data")
 }
 
 #Next step, download a file from the internet.
 
 #First we create a variable with the url which contains the data:
 fileURl<-"https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
 
 #The next variable contains the location of the local copy of the downloaded file
 localcopy<-"./data/cameras.csv"
 
 #To fetch a file from the internet we use the download.file() function.
 #On Windows the third parameter (method) works with its default value.
 #On a Mac, older versions of R may need method = "curl" because the file
 #is served over https; recent versions handle https with the default method.
 download.file(fileURl,destfile = localcopy)
 
 #We list the files in that directory
 files<-list.files("./data")
 print(files)
 #Finally we print the date on which we downloaded the file. This is important because
 #online data can change, so you need to be able to keep track of which version you have.
 datedownloaded<-date()
 print(datedownloaded)
}

After executing this function we obtain the following result:

[Screenshot: result-download-file]

Now we check that the new file exists using the File Explorer:

[Screenshot: camerascsv]
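
The download step can also be written so that it skips the download when a local copy already exists. The sketch below is one way to do it, not the function used above: download_if_missing is a hypothetical name, mode = "wb" is chosen to avoid altering the file on Windows, and the file's modification time is returned as a record of when it was fetched.

```r
# Hypothetical helper: download a file only when no local copy exists yet.
download_if_missing <- function(url, destfile) {
  # Make sure the target directory exists
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  }
  # Modification time of the local copy, i.e. when it was downloaded
  file.info(destfile)$mtime
}
```

With the variables from the function above, the call would be download_if_missing(fileURl, localcopy).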

R Cheatsheet: str function

> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-5.48800 -0.02978  1.96200  2.20200  4.37300 11.39000
> str(x)
 num [1:100] 9.02 7.57 5.15 -3.65 -3.89 ...
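
The vector x is not defined in the post, so the sketch below builds a comparable one to try both functions on. The seed and the rnorm parameters are assumptions chosen only to give 100 numbers; the small data frame is likewise made up to show that str also summarises the structure of each column.

```r
# Assumed data: 100 random numbers (not the original x from the post)
set.seed(42)
x <- rnorm(100, mean = 2, sd = 4)

summary(x)  # six-number summary of the values
str(x)      # type, length and the first few values

# str is most useful on larger objects such as data frames
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
str(df)     # one line per column with its type and first values
```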

R Cheatsheet: Create a Matrix

x<-matrix(1:28, nrow = 4, ncol = 7)

[Screenshot: matrix]
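
One detail worth seeing in code is the fill order: matrix() fills column by column unless byrow = TRUE is given. A short sketch, where the second matrix y is added only for comparison:

```r
x <- matrix(1:28, nrow = 4, ncol = 7)
dim(x)    # 4 7
x[1, 2]   # 5, because values run down the first column (1 to 4) first

# Same values, filled row by row instead
y <- matrix(1:28, nrow = 4, ncol = 7, byrow = TRUE)
y[1, 2]   # 2
```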

R Cheatsheet: Installing packages

#By command:

install.packages("ggplot2")

#Or in RStudio (Tools > Install Packages...)

[Screenshot: install_package]
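
In scripts it is often convenient to install a package only when it is not already available. A minimal sketch, in which install_if_missing is a hypothetical helper name:

```r
# Hypothetical helper: install a package only if it cannot be loaded.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE when the package is available after the call
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

For example, install_if_missing("ggplot2") followed by library(ggplot2).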

R Cheatsheet: Dates and Times

#R stores a date as the number of days counted from a "zero day", which is January 1st, 1970. SAS uses the same principle but with a different zero day (January 1st, 1960).

#Dates are represented by the "Date" class

#Times, on the other hand, are stored as the number of seconds since the start date and are represented by either of the following classes:

#POSIXct (a single number representing the time)

#POSIXlt (a list of values representing the time, such as seconds, minutes, hours and day of the month)

#Defining a Date:

bday<-as.Date("1990-01-25")

#If we need to see the underlying value of a Date object we have to unclass it:

unclass(bday)

#The result is 7329, meaning that 7329 days passed between January 1st, 1970 and January 25th, 1990, the date stored in bday. SAS stores dates in a similar way, as a plain number of days.
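
The relationship between a Date and its underlying number can be checked in both directions. A short sketch using the same bday value; the variable name days_alive is made up for the example:

```r
bday <- as.Date("1990-01-25")

unclass(bday)                          # 7329 days since 1970-01-01
as.Date(7329, origin = "1970-01-01")   # back to the Date 1990-01-25

# Dates support arithmetic: number of days between today and bday
days_alive <- as.numeric(Sys.Date() - bday)
```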

#To obtain the current date and time, comparable to DATETIME() in SAS (Sys.Date() is the counterpart of TODAY())

dt<-Sys.time()

[Screenshot: date_and_time]

#We can convert a time to POSIXlt to break it apart and extract a single component of the date, such as the hour or the day of the month.

[Screenshot: posixlt]
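
A minimal sketch of that decomposition follows. The timestamp is an arbitrary example value and the UTC time zone is fixed only to make the result reproducible. Note that POSIXlt counts months from 0 and years from 1900.

```r
# Arbitrary example timestamp, stored as POSIXct (seconds since 1970-01-01 UTC)
dt <- as.POSIXct("2015-02-20 14:30:00", tz = "UTC")
unclass(dt)        # the underlying number of seconds

# Convert to POSIXlt to access the individual components
lt <- as.POSIXlt(dt)
lt$hour            # 14
lt$min             # 30
lt$mday            # 20 (day of the month)
lt$mon + 1         # 2  (months are stored as 0 to 11)
lt$year + 1900     # 2015 (years are stored as years since 1900)
```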

#strptime is a useful function for converting a character string into a date-time: it returns a POSIXlt object, which as.POSIXct can then convert to POSIXct.
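
As an illustration, the sketch below parses two made-up timestamps written as day/month/year. The format string tells strptime how to read each field, and the time zone is fixed to UTC as an assumption so that the result does not depend on the local settings.

```r
# Made-up input strings in day/month/year hour:minute form
raw <- c("10/01/2012 10:40", "09/12/2011 09:10")

parsed <- strptime(raw, format = "%d/%m/%Y %H:%M", tz = "UTC")
class(parsed)            # "POSIXlt" "POSIXt"

# Convert to POSIXct when a single number per time is more convenient
as_ct <- as.POSIXct(parsed)

# Parsed times can be compared and subtracted
parsed[1] > parsed[2]    # TRUE
parsed[1] - parsed[2]    # time difference between the two
```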

R Cheatsheet: Utilities

#Get the Working directory:

getwd()

#Set the working directory

setwd("/folder1/folder2")

#List files of a directory

list.files()
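
These three functions are often used together. The sketch below switches to a temporary directory, lists only the CSV files there, and then restores the previous working directory. The directory name wd_demo and the file names are made up for the example.

```r
# Remember where we are so we can come back
old_wd <- getwd()

# Work in a throwaway directory under tempdir()
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(demo_dir, showWarnings = FALSE)
setwd(demo_dir)

# Create two empty files, then list only those ending in .csv
file.create("notes.txt", "data.csv")
csv_files <- list.files(pattern = "\\.csv$")

# Restore the original working directory
setwd(old_wd)
```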

#Clear workspace

rm(list=ls())