Monday, July 9, 2012

Asynchronous programming model in Node.JS

Some time back I attended an event conducted by K-MUG, and one of the sessions was 'How to integrate Node.JS with Windows Azure'. Shiju, who is an MVP, presented it really well, and as a result I started learning Node.JS. I am not the right person to talk about the technology or to compare its merits with other existing technologies. But the idea of asynchronous, event-driven programming really interested me, as it is something new compared with the programming approach I am used to. (I have worked with threads, events, callbacks etc. in .Net, but Node.JS seems to fully leverage async.)
Some points I understood about Node.JS:
  • It lets us write javascript on the server side.
  • It starts its own web server, playing the role that IIS or Apache otherwise would.
  • Even IIS can relay requests to NodeJS (node.exe) via the IISNode module.
  • NodeJS can be used to develop web sites as well as web services.
  • How Node will handle different protocols / compete with WCF still confuses me. Maybe there will be extensions for handling that. Don't ask why Node.JS needs to support .Net specific remoting via net.tcp; it is because I don't want to write services twice, once for my external clients and once for my internal clients.
  • Our code always runs in a single thread. IO and DB related work happens outside that thread, either in background threads or through the operating system's non-blocking IO.
  • The above can be understood through an example of 2 concurrent requests. If there are no IO or other parallelizable operations, Node processes the requests one by one. But if there is an IO operation in the first request, Node hands that IO operation off and takes the second request from the event loop. This gives us the feeling that the requests / our code are being executed in parallel. This is achieved with 2 libraries called libev and libeio (newer Node versions wrap this in libuv).
  • Better for simple web site / REST kind of service applications, i.e. only if our code, which executes in the single thread, completes as quickly as possible. If we can route long-running work outside that thread, the way IO, DB and network related operations are performed, it's fine.
  • It's production ready and many big, busy business sites are using it. Check out the NodeJS site for the list of big users.
  • Since it uses the same language (javascript) on the server as in the browser, the training cost is lower and it is easy for new developers, or even designers who know js.
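The single-thread point above can be sketched without any web server. This is only a minimal illustration, not how Node is implemented internally: busyWait and runDemo are names I made up for the sketch. A timer callback is requested with a delay of 0, but it cannot run until the synchronous code has finished, because there is only one thread to run our code.

```javascript
// Sketch: a callback cannot run while synchronous code still occupies the thread.
// busyWait and runDemo are hypothetical helper names used only for this illustration.
function busyWait(ms) {
    var end = Date.now() + ms;
    while (Date.now() < end) { /* block the only thread on purpose */ }
}

function runDemo(blockMs, done) {
    var order = [];
    var start = Date.now();
    // Ask the event loop to call us back as soon as possible.
    setTimeout(function () {
        order.push('timer callback');
        done(order, Date.now() - start);
    }, 0);
    order.push('sync work started');
    busyWait(blockMs);
    order.push('sync work finished');
}

runDemo(200, function (order, elapsedMs) {
    console.log(order.join(' -> '));
    console.log('timer callback ran after about ' + elapsedMs + ' ms');
});
```

The timer callback is always the last entry, and it runs only after the blocking loop is over, even though it asked for no delay at all.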
Ensuring that our code in Node.JS is not running in parallel
This is just a code snippet to show that our code does not run in parallel in Node.JS. The main reason is that Node.JS runs user code on a single thread. Below is a code snippet which puts a delay of 5 seconds into a normal request. If you hit the URL (http://localhost:8000) from your browser, you will see that the response comes back after 5 seconds.
var http = require('http');
http.createServer(function (req, res) {
    var startTime = new Date();
    console.log("Process started at :" + startTime.toString());

    res.writeHead(200, {'Content-Type': 'text/plain'});
    // Busy-wait for 5 seconds on purpose, to keep the single thread occupied.
    // Comparing elapsed milliseconds avoids the getSeconds() wrap-around at 60,
    // which could make the original comparison loop forever.
    while (Date.now() - startTime.getTime() < 5000) {
        // I should have googled for a sleep method.
    }
    res.end('Start time:' + startTime.toLocaleTimeString() + ",EndTime:" + new Date().toLocaleTimeString());
}).listen(8000, "127.0.0.1");
console.log("Server started @ 127.0.0.1:8000");

The response in the browser will be:

Start time:08:46:21,EndTime:08:46:26

Now open 2 tabs in your browser and hit the same URL simultaneously. The result will be:

Start time:08:47:21,EndTime:08:47:26

Start time:08:47:26,EndTime:08:47:31

It clearly shows that Node.JS is single threaded from our application's point of view, and that it needs to finish the current user code before it can take another request. If my delay code had been fetching contents from a file, it would have been processed differently, as the I/O work happens outside our thread. Think about ASP.Net. If you write the same code in ASP.Net and hit it simultaneously from 2 browser tabs, the results would be:

Start time:08:47:51,EndTime:08:47:56

Start time:08:47:52,EndTime:08:47:57

OK. What about seeing Node.JS requests overlap? I am altering the delay code to execute a query in SQL Server. The query is nothing but a WAITFOR DELAY statement.

var http=require('http')
var sql=require('node-sqlserver')
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.write("Process started at :"+new Date().toString()+'<br/>');
    var conn_str = "Driver={SQL Server Native Client 10.0};Server=(local);Database=master;Uid=sa;Pwd=Password!"
    sql.open(conn_str, function (err, conn) { 
        if (err) { 
            res.end('Error in DB opening'+err.toString());
            return; 
        }
        conn.queryRaw("WAITFOR DELAY '000:00:10'", function (err, results) { 
            if (err) { 
                res.end('Error in query execution' + err);
                return; 
            }
            res.end('DB execution completed @'+new Date().toLocaleTimeString());
        }); 
        res.write('DB Execution (WAIT FOR DELAY "000:00:10") started @ '+new Date().toLocaleTimeString() +'<br/>');        
    }); 
}).listen(8000, "127.0.0.1");
console.log("Server started @ 127.0.0.1:8000");

If you are not familiar with how to set up SQL Server with NodeJS, please refer to the link below.
http://weblogs.asp.net/chanderdhall/archive/2012/06/19/microsoft-sql-server-driver-for-nodejs.aspx

After this I tried hitting the URL from different browser tabs and got the output below.

Process started at :Fri Jul 06 2012 20:45:51 GMT+0530 (India Standard Time)
DB Execution (WAIT FOR DELAY "000:00:10") started @ 20:45:51
DB execution completed @20:46:01


Process started at :Fri Jul 06 2012 20:45:53 GMT+0530 (India Standard Time)  
DB Execution (WAIT FOR DELAY "000:00:10") started @ 20:45:53
DB execution completed @20:46:03

Hope you understood the output. The second request was able to start executing only because the first request had entered SQL execution, which happens outside our thread. For more details refer to the link below.

http://www.quora.com/How-does-IO-concurrency-work-in-node-js-despite-the-whole-app-running-in-a-single-thread
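If you don't have SQL Server at hand, the same overlap can be reproduced with timers. The sketch below is only a simulation under my own assumptions: handleRequest and runTwoRequests are made-up names, and setTimeout stands in for the database wait. Both simulated requests start immediately, and both finish after roughly one delay rather than two.

```javascript
// Sketch: two simulated requests whose waiting time overlaps.
// handleRequest and runTwoRequests are hypothetical names; setTimeout stands in for the DB call.
function handleRequest(name, delayMs, log, done) {
    log.push(name + ' started');
    // setTimeout returns immediately, so the single thread is free for the next request.
    setTimeout(function () {
        log.push(name + ' finished');
        done();
    }, delayMs);
}

function runTwoRequests(delayMs, callback) {
    var log = [];
    var pending = 2;
    var start = Date.now();
    function done() {
        pending -= 1;
        if (pending === 0) callback(log, Date.now() - start);
    }
    handleRequest('request 1', delayMs, log, done);
    handleRequest('request 2', delayMs, log, done);
}

runTwoRequests(500, function (log, elapsedMs) {
    console.log(log.join('\n'));
    console.log('both requests finished after about ' + elapsedMs + ' ms');
});
```

Compare this with the busy-wait version earlier, where the second request could not even start until the first one had finished.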

The programming model

All the operations which are to be done after async calls, such as I/O calls, need to be inside the event handlers / callbacks; in simple words, nested callbacks. So let's see how to read a file after a SQL database call, where the file read depends on the result of the first SQL execution.

var http=require('http')
var sql=require('node-sqlserver')
var fs=require('fs')
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.write("Process started at :"+new Date().toString()+'<br/>');
    var conn_str = "Driver={SQL Server Native Client 10.0};Server=(local);Database=master;Uid=sa;Pwd=Password!"
    sql.open(conn_str, function (err, conn) { 
        if (err) throw err;
        conn.queryRaw("WAITFOR DELAY '000:00:10'", function (err, results) { 
            if (err) throw err;
            res.end('DB execution completed @'+new Date().toLocaleTimeString());
            // Pass an encoding so that data is a string rather than a raw Buffer.
            fs.readFile('joy.txt', 'utf8', function (err, data) {
                if (err) throw err;
                console.log(data);
            });
        }); 
        res.write('DB Execution (WAIT FOR DELAY "000:00:10") started @ '+new Date().toLocaleTimeString() +'<br/>');        
    }); 
}).listen(8000, "127.0.0.1");
console.log("Server started @ 127.0.0.1:8000");
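For comparison, newer Node.js versions let us write the same "first this, then that" dependency without nesting, using Promises and async/await. This is only a sketch under assumptions: fakeQuery is a hypothetical stand-in for the SQL call (it simply resolves with a file name after a delay), queryThenRead is a made-up name, and the scratch file is created just for the demo.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical stand-in for the SQL call: resolves with a file name after a delay.
function fakeQuery(fileName, delayMs) {
    return new Promise(function (resolve) {
        setTimeout(function () { resolve(fileName); }, delayMs);
    });
}

// Step 1 yields the file name; step 2 reads that file. No nesting needed.
async function queryThenRead(fileName) {
    var nameFromDb = await fakeQuery(fileName, 100);
    return fs.promises.readFile(nameFromDb, 'utf8');
}

// Demo: create a scratch file, then run the two dependent steps.
var demoFile = path.join(os.tmpdir(), 'joy-demo.txt');
fs.writeFileSync(demoFile, 'hello from joy');
queryThenRead(demoFile).then(function (text) {
    console.log(text);
});
```

The await keyword does not block the thread. The function is simply paused and the event loop keeps serving other callbacks, which is the same behaviour as the nested callbacks.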

Simple, isn't it? Let's consider one more scenario, where the SQL and file operations run in parallel and another operation needs to be performed after both of them.

But this needs an additional check to ensure that both operations are completed, i.e. NodeJS (at the time of writing) doesn't have a beautiful native way of handling multiple async callbacks and doing operations based on them. So we inject our own logic: keep 2 variables to hold the completion state, and check those variables in the post-processing function.

var http=require('http')
var sql=require('node-sqlserver')
var fs=require('fs')
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    var bSQLCompleted=false;
    var bFileCompleted=false;
    var conn_str = "Driver={SQL Server Native Client 10.0};Server=(local);Database=master;Uid=sa;Pwd=Password!"
    sql.open(conn_str, function (err, conn) { 
        if (err) throw err;
        conn.queryRaw("WAITFOR DELAY '000:00:05'", function (err, results) { 
            if (err) throw err;
            res.end('DB execution completed @'+new Date().toLocaleTimeString());
            bSQLCompleted=true;
            postSQLnFileReadOperation(bSQLCompleted,bFileCompleted);
        }); 
    }); 
    fs.readFile('joy.txt', function (err, data) {
        if (err) throw err;
        bFileCompleted =true;        
        postSQLnFileReadOperation(bSQLCompleted,bFileCompleted);
    });
}).listen(8000, "127.0.0.1");
console.log("Server started @ 127.0.0.1:8000");
//Accept the response object (res) as a parameter if you want to do something specific to the output
function postSQLnFileReadOperation(bSQL,bFile){
    if(bSQL && bFile) console.log("Operation after SQL & File read");
}
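In newer Node.js versions, Promise.all can do the bookkeeping that the two boolean flags do above. The sketch below is an illustration only: fakeTask is a hypothetical stand-in for the SQL query and the file read, and runBoth is a made-up name.

```javascript
// Hypothetical stand-in for the SQL query and the file read.
function fakeTask(label, delayMs) {
    return new Promise(function (resolve) {
        setTimeout(function () { resolve(label + ' done'); }, delayMs);
    });
}

function runBoth() {
    // Both tasks start right away; Promise.all resolves once every one of them has resolved.
    return Promise.all([fakeTask('SQL', 200), fakeTask('File', 50)]);
}

runBoth().then(function (results) {
    // Results keep the order of the input array, not the order of completion.
    console.log('Operation after SQL & File read: ' + results.join(', '));
});
```

If any of the tasks fails, the promise returned by Promise.all is rejected, so one catch handler can cover both operations.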


More links below.
http://raynos.github.com/presentation/shower/controlflow.htm
http://stevehanov.ca/blog/index.php?id=127
http://stackoverflow.com/questions/4234619/how-to-avoid-long-nesting-of-asynchronous-functions-in-node-js
http://stackoverflow.com/questions/5172244/idiomatic-way-to-wait-for-multiple-callbacks-in-node-js
