Suppose that you have to count the number of accesses from each IP address in the access.log
file and list the IP addresses that made more than 100,000 accesses.
You want to do it with JavaScript on Node.js. How can you do that?
You might write a script like the following:
const fs = require('fs');
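A minimal sketch of what such a script could look like. It assumes that the IP address is the first space-separated field of each log line; the helper names `countAccessesSync` and `frequentIps` are made up for this illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: counts accesses per IP address.
// Assumption: the IP address is the first space-separated field of each line.
function countAccessesSync(path) {
  // readFileSync loads the entire file into memory before returning.
  const lines = fs.readFileSync(path, 'utf8').split(/\r?\n/);
  const counts = new Map();
  for (const line of lines) {
    if (line === '') continue; // skip blank lines, e.g. after the final newline
    const ip = line.split(' ')[0];
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: returns the IPs whose count is strictly above the threshold.
function frequentIps(counts, threshold) {
  return [...counts].filter(([, n]) => n > threshold).map(([ip]) => ip);
}
```

With these helpers, `frequentIps(countAccessesSync('access.log'), 100000)` would return the list of heavy hitters.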
This code works great for a small file.
But it does not work for a large file.
It needs at least as much memory as the size of the access.log file, because readFileSync() returns only after reading the entire file, and the whole contents end up in the lines variable.
Therefore, if the file is too large to fit in memory, or longer than the maximum string length, the script fails with an error like the following:
Error: Cannot create a string longer than 0x3fffffe7 characters
Using readFile(), which is the asynchronous version of readFileSync(), does not solve the memory problem, because it also buffers the whole file before calling the callback once.
Reading the file as a stream with createReadStream() does solve the memory problem.
But then the 'data' handler is called more than once, and the chunks passed to it are not guaranteed to end on line boundaries.
You need a way to read a file line by line and, if possible, asynchronously. This article presents several ways to process text line by line.
readline: Standard Node.js Module
The standard Node.js way to process text line by line is to use the readline module.
The main purpose of the readline module seems to be making interactive text environments easy to build.
But we can also use its ability to split an input stream into lines, one at a time.
The rewritten script is as follows:
const fs = require('fs');
Note that the line-processing part, which was in the for loop, is now in the ‘line’ event handler. And because the file is read asynchronously, the post-processing part must go in the ‘close’ event handler.
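A minimal sketch of this structure, again assuming that the IP address is the first space-separated field of each line. The function name `countAccesses` and its `done` callback are made up for this illustration, and error handling is left out for brevity.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: calls done(frequentIps) when the whole file has been read.
// Assumption: the IP address is the first space-separated field of each line.
function countAccesses(path, threshold, done) {
  const counts = new Map();
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  // Per-line work goes in the 'line' event handler.
  rl.on('line', (line) => {
    if (line === '') return; // skip blank lines
    const ip = line.split(' ')[0];
    counts.set(ip, (counts.get(ip) || 0) + 1);
  });

  // Post-processing runs in 'close', after the input is exhausted.
  rl.on('close', () => {
    const frequent = [...counts]
      .filter(([, n]) => n > threshold)
      .map(([ip]) => ip);
    done(frequent);
  });
}
```

A call such as `countAccesses('access.log', 100000, console.log)` would print the result once the stream ends. Only one chunk of the file plus the counters is held in memory at a time.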
split: Transform Stream
If you are familiar with Node.js streams, you may have noticed that the event names of the ‘readline’ module differ from the standard stream event names.
If you just want to feed one line at a time to an ordinary stream handler, you can use the ‘split’ module.
const fs = require('fs');
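To show the idea without an extra dependency, here is a rough sketch that uses a hand-written line-splitting Transform stream in place of the ‘split’ package. It is not that package's implementation; `lineSplitter` and `countAccesses` are names made up for this illustration, and the IP address is assumed to be the first space-separated field of each line. With the real package, `.pipe(split())` would take the place of `.pipe(lineSplitter())`.

```javascript
const fs = require('fs');
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical stand-in for a line-splitting transform: emits one string per line.
function lineSplitter() {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact
  let buffered = ''; // partial line carried over between chunks
  return new Transform({
    readableObjectMode: true,
    transform(chunk, encoding, callback) {
      const parts = (buffered + decoder.write(chunk)).split(/\r?\n/);
      buffered = parts.pop(); // the last piece may be an incomplete line
      for (const line of parts) this.push(line);
      callback();
    },
    flush(callback) {
      buffered += decoder.end();
      if (buffered !== '') this.push(buffered); // final line without a newline
      callback();
    },
  });
}

// Hypothetical helper using the standard 'data' and 'end' stream events.
function countAccesses(path, threshold, done) {
  const counts = new Map();
  fs.createReadStream(path)
    .pipe(lineSplitter())
    .on('data', (line) => {
      if (line === '') return; // skip blank lines
      const ip = line.split(' ')[0];
      counts.set(ip, (counts.get(ip) || 0) + 1);
    })
    .on('end', () => {
      done([...counts].filter(([, n]) => n > threshold).map(([ip]) => ip));
    });
}
```

Because the splitter is an ordinary stream, it can be piped into any other stream, and the line handling and post-processing sit in the familiar ‘data’ and ‘end’ handlers.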
readline for async/await
For those who love async/await or generator functions, an asyncIterator interface was added to ‘readline’ as an experimental feature in Node.js v11.4.0. Using this feature, we can rewrite the script as follows:
const fs = require('fs');
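A minimal sketch of the async/await variant, assuming a Node.js version in which the ‘readline’ interface is async-iterable and, as before, that the IP address is the first space-separated field of each line. The function name `countAccesses` is made up for this illustration.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: resolves with the IPs seen more than `threshold` times.
// Assumption: the IP address is the first space-separated field of each line.
async function countAccesses(path, threshold) {
  const counts = new Map();
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  // The interface is async-iterable, so each iteration yields one line.
  for await (const line of rl) {
    if (line === '') continue; // skip blank lines
    const ip = line.split(' ')[0];
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }

  // Post-processing simply follows the loop; no 'close' handler is needed.
  return [...counts].filter(([, n]) => n > threshold).map(([ip]) => ip);
}
```

It could be called as `countAccesses('access.log', 100000).then(console.log)`. The control flow reads like the original synchronous for loop, while the file is still consumed as a stream.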