PL I – ASSIGNMENT 11

Title:

Map reduce operation with suitable example using MongoDB.

Requirements:

  1. Computer System with Windows/Linux/Open Source Operating System.
  2. JDK
  3. JDBC Connector

Theory:

Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB provides the mapReduce database command for map-reduce operations. All map-reduce functions in MongoDB are written in JavaScript and run within the mongod process. A map-reduce operation takes the documents of a single collection as its input and can apply an arbitrary sort and limit before the map stage begins. mapReduce can return its results as a document or write them to a collection. Both the input and the output collections may be sharded.

MapReduce Command:  Following is the syntax of the basic mapReduce command:

> db.collection.mapReduce(
    function() { emit(key, value); },                  // map function
    function(key, values) { return reduceFunction; },  // reduce function
    {
      out: collection,
      query: document,
      sort: document,
      limit: number
    }
  )

The mapReduce command first queries the collection, then maps the resulting documents to emit key-value pairs, which are then reduced for each key that has multiple values.

In the above syntax:

  1. map is a JavaScript function that maps a value to a key and emits a key-value pair
  2. reduce is a JavaScript function that reduces or groups all the values emitted for the same key
  3. out specifies the location of the map-reduce query result
  4. query specifies the optional selection criteria for selecting documents
  5. sort specifies the optional sort criteria
  6. limit specifies the optional maximum number of documents passed to the map function
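
The roles of these options can be sketched in plain JavaScript, outside MongoDB. This is only an illustration of the semantics described above, not how mongod implements the command. The helper name simulateMapReduce is hypothetical, the query option is modelled as a predicate function rather than a query document, and emit is passed to map as an argument because there is no global emit outside the database.

```javascript
// Hypothetical helper: imitates the query -> sort -> limit -> map -> reduce
// flow on an in-memory array. It is not part of MongoDB.
function simulateMapReduce(docs, map, reduce, options = {}) {
  const { query = () => true, sort, limit } = options;

  // 1. query: keep only the documents that match the selection criteria.
  let input = docs.filter(query);
  // 2. sort: optionally order the input documents (comparator function).
  if (sort) input = [...input].sort(sort);
  // 3. limit: optionally cap how many documents reach the map stage.
  if (limit !== undefined) input = input.slice(0, limit);

  // 4. map: each document is bound to `this`; emit() collects values per key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of input) map.call(doc, emit);

  // 5. reduce: only keys that collected more than one value are reduced.
  const out = [];
  for (const [key, values] of groups) {
    const value = values.length > 1 ? reduce(key, values) : values[0];
    out.push({ _id: key, value });
  }
  return out;
}

// Usage with made-up documents: total n for each k.
const sample = [{ k: "a", n: 1 }, { k: "b", n: 2 }, { k: "a", n: 3 }];
const result = simulateMapReduce(
  sample,
  function (emit) { emit(this.k, this.n); },
  (key, values) => values.reduce((total, v) => total + v, 0)
);
console.log(result); // a -> 4, b -> 2
```

Note that key "b" has a single value, so the sketch passes it through without calling reduce, matching the description that only keys with multiple values are reduced.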

Using MapReduce:  Consider the following document structure storing user posts. The document stores the user_name of the user and the status of the post.

{
"post_text": "MongoDB is powerful and easy to use DB",
"user_name": "mark",
"status":"active"
}

Now, a mapReduce function on posts collection is used to select all the active posts, group them on the basis of user_name and then count the number of posts by each user using the following code:

db.posts.mapReduce(
    function() { emit(this.user_name, 1); },                 // map: one count per active post
    function(key, values) { return Array.sum(values); },     // reduce: total the counts per user
    {
      query: { status: "active" },
      out: "post_total"
    }
)

The result reports how many documents matched the query (status: "active"), how many key-value pairs the map function emitted, and how many times the reduce function was called to combine values that share the same key.
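
To make these figures concrete, the following plain-JavaScript sketch walks through the same steps on a small in-memory array and keeps its own counters. The sample posts other than the first are hypothetical, and Array.sum is a mongo shell helper, so a stand-in named sum is defined here. It is a simplified model: on larger inputs MongoDB may call reduce more than once for the same key.

```javascript
// Hypothetical sample data; only the field names come from the example above.
const posts = [
  { post_text: "MongoDB is powerful and easy to use DB", user_name: "mark", status: "active" },
  { post_text: "Second post", user_name: "mark", status: "active" },
  { post_text: "Draft", user_name: "mark", status: "disabled" },
  { post_text: "Hello", user_name: "tom", status: "active" },
];

// Plain-JavaScript stand-in for the mongo shell's Array.sum helper.
const sum = (values) => values.reduce((total, v) => total + v, 0);

const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
const groups = new Map();

// query + map: every active post emits the pair (user_name, 1).
for (const post of posts) {
  if (post.status !== "active") continue;
  counts.input += 1;
  const key = post.user_name;
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(1);
  counts.emit += 1;
}

// reduce: called once here for each key that collected more than one value.
const postTotal = [];
for (const [key, values] of groups) {
  let value = values[0];
  if (values.length > 1) {
    value = sum(values);
    counts.reduce += 1;
  }
  postTotal.push({ _id: key, value });
  counts.output += 1;
}

console.log(counts);    // input 3, emit 3, reduce 1, output 2
console.log(postTotal); // mark -> 2, tom -> 1
```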

To see the result of this mapReduce query, use the find operator:

db.posts.mapReduce(
    function() { emit(this.user_name, 1); },
    function(key, values) { return Array.sum(values); },
    {
      query: { status: "active" },
      out: "post_total"
    }
).find()

In a similar manner, mapReduce queries can be used to construct large, complex aggregation queries. The use of custom JavaScript functions makes mapReduce very flexible and powerful.

Conclusion: This assignment demonstrates the use of map-reduce in MongoDB.

********************** QUERIES ***********************

AIM : Map reduce operation with suitable example using MongoDB.

> use te_a
switched to db te_a
 
> show collections
bank
expt5
sales
system.indexes

> db.bank.insert({"CUSTID":"B1","ACCTYPE":"SB","BRANCH":"PUNE","BALANCE":2300});
WriteResult({ "nInserted" : 1 })

> db.bank.insert({"CUSTID":"A1","ACCTYPE":"SB","BRANCH":"PUNE","BALANCE":8900});
WriteResult({ "nInserted" : 1 })

> db.bank.insert({"CUSTID":"A1","ACCTYPE":"CU","BRANCH":"PUNE","BALANCE":0900});
WriteResult({ "nInserted" : 1 })

> db.bank.insert({"CUSTID":"B1","ACCTYPE":"CU","BRANCH":"PUNE","BALANCE":2900});
WriteResult({ "nInserted" : 1 })

> db.bank.insert({"CUSTID":"B1","ACCTYPE":"CU","BRANCH":"WAGHOLI","BALANCE":2900});
WriteResult({ "nInserted" : 1 })

> db.bank.insert({"CUSTID":"A1","ACCTYPE":"CU","BRANCH":"WAGHOLI","BALANCE":5600});
WriteResult({ "nInserted" : 1 })

> db.bank.find();
{ "_id" : ObjectId("541e535f313dc145de686dfd"), "CUSTID" : "B1", "ACCTYPE" : "SB", "BRANCH" : "PUNE", "BALANCE" : 2300 }
{ "_id" : ObjectId("541e536d313dc145de686dfe"), "CUSTID" : "A1", "ACCTYPE" : "SB", "BRANCH" : "PUNE", "BALANCE" : 8900 }
{ "_id" : ObjectId("541e5379313dc145de686dff"), "CUSTID" : "A1", "ACCTYPE" : "CU", "BRANCH" : "PUNE", "BALANCE" : 900 }
{ "_id" : ObjectId("541e5386313dc145de686e00"), "CUSTID" : "B1", "ACCTYPE" : "CU", "BRANCH" : "PUNE", "BALANCE" : 2900 }
{ "_id" : ObjectId("541e5398313dc145de686e01"), "CUSTID" : "B1", "ACCTYPE" : "CU", "BRANCH" : "WAGHOLI", "BALANCE" : 2900 }
{ "_id" : ObjectId("541e53a3313dc145de686e02"), "CUSTID" : "A1", "ACCTYPE" : "CU", "BRANCH" : "WAGHOLI", "BALANCE" : 5600 }

> db.bank.mapReduce(function(){emit(this.ACCTYPE,1);},function(key,values){return Array.sum(values)},{out:"TOTAL_BALANCE"});
{
     "result" : "TOTAL_BALANCE",
      "timeMillis" : 5,
      "counts" : {
            "input" : 6,
            "emit" : 6,
            "reduce" : 2,
            "output" : 2
      },
      "ok" : 1,
}
 
> db.TOTAL_BALANCE.find();
 
{ "_id" : "CU", "value" : 4 }
{ "_id" : "SB", "value" : 2 }
 
> db.bank.mapReduce(function(){emit(this.CUSTID,1);},function(key,values){return Array.sum(values)},{out:"CUST_DET"});
{
      "result" : "CUST_DET",
      "timeMillis" : 4,
      "counts" : {
            "input" : 6,
            "emit" : 6,
            "reduce" : 2,
            "output" : 2
      },
      "ok" : 1,
}
 
> db.CUST_DET.find();
 
{ "_id" : "A1", "value" : 3 }
{ "_id" : "B1", "value" : 3 }
 
> db.bank.mapReduce(function(){emit(this.CUSTID,1);},function(key,values){return Array.sum(values)},{query:{"BRANCH":"PUNE"},out:"CUST_DET"});
 
{
      "result" : "CUST_DET",
      "timeMillis" : 5,
      "counts" : {
            "input" : 4,
            "emit" : 4,
            "reduce" : 2,
            "output" : 2
      },
      "ok" : 1,
}

> db.CUST_DET.find();
{ "_id" : "A1", "value" : 2 }
{ "_id" : "B1", "value" : 2 }


> db.bank.mapReduce(function(){emit(this.CUSTID,1);},function(key,values){return Array.sum(values)},{query:{"ACCTYPE":"CU"},out:"ACC_DET"});
{
      "result" : "ACC_DET",
      "timeMillis" : 4,
      "counts" : {
            "input" : 4,
            "emit" : 4,
            "reduce" : 2,
            "output" : 2
      },
      "ok" : 1,
}
> db.ACC_DET.find();
{ "_id" : "A1", "value" : 2 }
{ "_id" : "B1", "value" : 2 }
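
Every query in this session emits 1 per document, so the results are document counts. As a variation that is not part of the recorded session, emitting this.BALANCE instead of 1 would total the balances for each key. The plain-JavaScript sketch below shows the expected arithmetic on the six documents inserted above; the helper name totalBy is hypothetical and the code runs outside MongoDB.

```javascript
// The six documents inserted in the session above (without _id).
const bank = [
  { CUSTID: "B1", ACCTYPE: "SB", BRANCH: "PUNE", BALANCE: 2300 },
  { CUSTID: "A1", ACCTYPE: "SB", BRANCH: "PUNE", BALANCE: 8900 },
  { CUSTID: "A1", ACCTYPE: "CU", BRANCH: "PUNE", BALANCE: 900 },
  { CUSTID: "B1", ACCTYPE: "CU", BRANCH: "PUNE", BALANCE: 2900 },
  { CUSTID: "B1", ACCTYPE: "CU", BRANCH: "WAGHOLI", BALANCE: 2900 },
  { CUSTID: "A1", ACCTYPE: "CU", BRANCH: "WAGHOLI", BALANCE: 5600 },
];

// Hypothetical helper: group by keyField and add up valueField,
// which is what emit(this[keyField], this[valueField]) plus a summing
// reduce function would compute.
function totalBy(docs, keyField, valueField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[valueField]);
  }
  // Sort by key so the order resembles the find() listings above.
  return [...totals.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => ({ _id: key, value }));
}

const balanceByCustomer = totalBy(bank, "CUSTID", "BALANCE");
console.log(balanceByCustomer); // A1 -> 15400, B1 -> 8100
```

Calling totalBy(bank, "ACCTYPE", "BALANCE") groups the same documents by account type instead.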
