ASSIGNMENT NO. 14
Using MongoDB, create a database of employee performance and employee attendance at the workstation. Perform statistical analysis of the products produced by employees, rated as passed (OK) or damaged (5 samples per batch of size 1000), and of the portion of the syllabus covered in training and the absenteeism of employees during training. Use the R programming language (or R-Python/R-Java), or complete an equivalent assignment using R for big data computing.
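The assignment statement does not show the analysis itself. The following is a minimal sketch in base R. It assumes the records have already been pulled out of MongoDB into data frames (for instance with the mongolite package), and that "5 samples per batch of size 1000" means 5 products are inspected from each 1000-unit batch. All column names and values below are hypothetical example data, not results.

```r
# Hypothetical inspection records: 4 batches, 5 sampled products from each
inspections <- data.frame(
  batch  = rep(1:4, each = 5),
  status = c("passed", "passed", "damaged", "passed", "passed",
             "passed", "passed", "passed", "passed", "passed",
             "damaged", "passed", "damaged", "passed", "passed",
             "passed", "passed", "passed", "damaged", "passed")
)

# Cross-tabulate passed vs damaged for each batch
counts <- table(inspections$batch, inspections$status)

# Proportion of damaged products in each batch sample
damaged_rate <- counts[, "damaged"] / rowSums(counts)

# Assumption: each 5-item sample is representative of its 1000-unit batch
estimated_damaged <- round(damaged_rate * 1000)

# Overall damaged proportion with an exact binomial 95% confidence interval
overall <- binom.test(sum(inspections$status == "damaged"), nrow(inspections))

# Hypothetical training records for five employees
training <- data.frame(
  emp_id          = 1:5,
  portion_covered = c(100, 80, 60, 90, 40),  # percent of training covered
  sessions_absent = c(0, 1, 3, 1, 4)         # training sessions missed
)

# Descriptive statistics for coverage and absenteeism
summary_stats <- c(mean_covered = mean(training$portion_covered),
                   mean_absent  = mean(training$sessions_absent))

# Pearson correlation between absenteeism and portion covered
coverage_vs_absence <- cor(training$sessions_absent, training$portion_covered)

print(counts)
print(estimated_damaged)
print(overall$conf.int)
print(summary_stats)
print(coverage_vs_absence)
```

With real data, the same steps apply once the MongoDB query results are converted to data frames; a negative correlation would suggest that employees who miss more sessions cover less of the training.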
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things, it has an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either directly at the computer or on hardcopy, and a well-developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user-defined recursive functions and input and output facilities. (Indeed, most of the system-supplied functions are themselves written in the S language.)
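The paragraph above mentions array operators, conditionals, loops and user-defined recursive functions. A small illustration of each, using only base R (the names `m`, `mm`, `fact` and `squares` are chosen for this example):

```r
# Matrix arithmetic with built-in operators
m  <- matrix(1:4, nrow = 2)   # filled column-wise: rows are (1, 3) and (2, 4)
mm <- m %*% t(m)              # matrix product of m with its transpose

# A user-defined recursive function containing a conditional
fact <- function(n) {
  if (n <= 1) 1 else n * fact(n - 1)
}

# A loop that collects the squares of 1 to 5
squares <- numeric(0)
for (i in 1:5) {
  squares <- c(squares, i^2)
}

print(mm)
print(fact(5))
print(squares)
```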
R is usually described as an “environment” for statistical computing; the term is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.
R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages. However, most programs written in R are essentially ephemeral, written for a single piece of data analysis.
Understanding different types of files:
Four types of data files are commonly used with R for storing data:
• CSV (comma-separated values)
• TXT (tab-separated values)
• .RData (R’s native format, used for a saved workspace)
• .rda (R’s native format, used for selected objects)
Installing R packages:
To work with the file formats listed above, we don’t need to install any extra R packages; the functions that ship with R are sufficient.
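One way to check this claim is to ask which package defines each function used in this assignment. In the sketch below, the read/write functions should report `utils` and the workspace functions `base`; both packages are loaded in every default R session.

```r
# Import/export functions used in this assignment
io_functions <- c("read.csv", "read.table", "write.csv", "write.table",
                  "save", "save.image", "load")

# Look up the namespace (package) that defines each function
home_package <- vapply(io_functions,
                       function(f) environmentName(environment(get(f))),
                       character(1))

print(home_package)
```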
Importing the data into R:
To perform analytics-related activities, we need to use the following functions to get the data into R:
• CSV: read.csv() reads comma-separated value (CSV) files, in which fields are separated by “,” and the decimal point is “.” (read.csv2() handles files that use “;” as the separator and “,” as the decimal point). The retrieved data is stored in a single R object, a data frame.
Dataframe <- read.csv("data.csv", sep = ",")
• TXT: To retrieve tab-separated values, use the read.table() function with sep = "\t" (and header = TRUE when the first line holds the column names). This function also returns a data frame.
Dataframe <- read.table("data.txt", sep = "\t", header = TRUE)
• .RData: R uses the .RData format to save a snapshot (an “image”) of the whole workspace at a given moment. It stores all of the objects in the workspace, and load() restores them in a later session.
• .rda: This is the same native format, but by convention it stores only the specific objects chosen by the user. It is also read back with load().
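A self-contained sketch of the four import routes. It first writes small example files into `tempdir()` so that it runs anywhere; the file names, the `example` data frame and its columns are hypothetical.

```r
# Create small example files in a temporary directory
csv_path <- file.path(tempdir(), "data.csv")
txt_path <- file.path(tempdir(), "data.txt")
rda_path <- file.path(tempdir(), "scores.rda")

example <- data.frame(emp_id = c(101, 102, 103), score = c(88.5, 92.0, 79.5))
write.csv(example, csv_path, row.names = FALSE)
write.table(example, txt_path, sep = "\t", row.names = FALSE)
save(example, file = rda_path)

# CSV: sep = "," and header = TRUE are already the read.csv() defaults
from_csv <- read.csv(csv_path)

# Tab-separated text: read.table() needs sep and header set explicitly
from_txt <- read.table(txt_path, sep = "\t", header = TRUE)

# .rda / .RData: load() restores objects under their original names
rm(example)
loaded_names <- load(rda_path)   # character vector of restored object names

str(from_csv)
str(from_txt)
print(loaded_names)
```

load() does not return the data itself; it recreates the saved objects in the calling environment and returns their names, which is why `example` exists again after the call.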
Exporting the data from R:
To export existing data objects from R to the required file format, we need to use the following functions:
• CSV: Write the data frame object to a CSV file with the following command (write.csv() always uses “,” as the separator, so a sep argument is ignored with a warning):
write.csv(mydata, "c:/mydata.csv", row.names = FALSE)
• TXT: Write the data with tab delimiters via the following command:
write.table(mydata, "c:/mydata.txt", sep = "\t")
• .RData: To store all of the data objects available in the current R session, use the following command:
save.image(file = "c:/mydata.RData")
• .rda: The save() function stores specific data objects that can be reused later. Use the following code to save them to an .rda file:
# numeric vector a
a <- c(1, 2, 3)
# numeric vector b
b <- c(2, 4, 6)
# saving both vectors in R's native (.rda) data format
save(a, b, file = "data_variables_a_and_b.rda")
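A runnable version of the export commands, writing into `tempdir()` instead of `c:/` so it works on any system. The `mydata` data frame and its `present_days` column are hypothetical. The last step reloads the .rda file into a fresh environment as a simple check that both vectors were saved.

```r
# Hypothetical data frame to export
mydata <- data.frame(emp_id = 1:3, present_days = c(20, 18, 22))

csv_out   <- file.path(tempdir(), "mydata.csv")
txt_out   <- file.path(tempdir(), "mydata.txt")
rda_out   <- file.path(tempdir(), "data_variables_a_and_b.rda")
rdata_out <- file.path(tempdir(), "mydata.RData")

# CSV and tab-delimited text exports
write.csv(mydata, csv_out, row.names = FALSE)
write.table(mydata, txt_out, sep = "\t", row.names = FALSE)

# Selected objects to .rda, then the whole workspace to .RData
a <- c(1, 2, 3)
b <- c(2, 4, 6)
save(a, b, file = rda_out)
save.image(file = rdata_out)

# Reload the .rda file into a separate environment to verify its contents
check_env <- new.env()
load(rda_out, envir = check_env)

print(readLines(txt_out))
print(ls(check_env))
```

Loading into `check_env` rather than the global environment avoids overwriting any existing `a` and `b` during the check.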
Conclusion: This assignment demonstrates basic concepts of R programming, including importing and exporting data in CSV, tab-separated text, .RData and .rda formats.