
Introduction
Node.js uses non-blocking I/O, so it handles files efficiently, even very large ones. PDF, which stands for Portable Document Format, is used to display text and images independently of software and hardware. CSV, or comma-separated values, is a file format that stores tabular data (numbers and text) in plain text.
This article shows you how to read content from PDF and CSV files using Node.js through two end-to-end examples.
The PDF file we’ll use for testing in this tutorial:
https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf
And here’s the CSV:
https://www.kindacode.com/wp-content/uploads/2021/01/kindacode.csv
Working with PDF files

We will use a library named pdf-parse to do the job.
1. Copy the PDF from the link above to the folder where you want your example project to live, then create a file named index.js.
2. Install pdf-parse by running this command:
npm install pdf-parse --save
Our file structure:
.
├── dummy.pdf
├── index.js
├── node_modules
├── package-lock.json
└── package.json
3. Add the following to index.js:
const fs = require('fs');
const pdfParse = require('pdf-parse');

const readPdf = async (uri) => {
  try {
    // Read the whole file into a buffer, then let pdf-parse extract its data
    const buffer = fs.readFileSync(uri);
    const data = await pdfParse(buffer);

    // The content
    console.log('Content: ', data.text);

    // Total pages
    console.log('Total pages: ', data.numpages);

    // File information
    console.log('Info: ', data.info);
  } catch (err) {
    // Report read or parse failures instead of leaving an unhandled rejection
    console.error('Failed to read PDF:', err);
  }
};

// Testing
const DUMMY_PDF = './dummy.pdf';
readPdf(DUMMY_PDF);
4. Run the code and check the output in the console. It should look like this:
Content:
Dummy PDF file
Total pages: 1
Info: {
PDFFormatVersion: '1.4',
IsAcroFormPresent: false,
IsXFAPresent: false,
Author: 'Evangelos Vlachogiannis',
Creator: 'Writer',
Producer: 'OpenOffice.org 2.1',
CreationDate: "D:20070223175637+02'00'"
}
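The CreationDate value in the output above is a PDF date string, not a JavaScript Date. If you need a real Date, a small converter can help. The sketch below is only an illustration: parsePdfDate is a hypothetical helper name, not something pdf-parse provides, and it assumes the common D:YYYYMMDDHHmmSS+HH'mm' layout with optional trailing parts.

```javascript
// Hypothetical helper (not part of pdf-parse): converts a PDF date string
// such as "D:20070223175637+02'00'" into a JavaScript Date.
// Assumes the layout D:YYYYMMDDHHmmSS followed by an optional UTC offset.
const parsePdfDate = (value) => {
  const pattern =
    /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([+\-Z])?(\d{2})?'?(\d{2})?'?$/;
  const match = pattern.exec(value);
  if (!match) return null; // Not in the assumed layout

  // Missing parts fall back to the start of the period and to UTC
  const [
    ,
    year,
    month = '01',
    day = '01',
    hour = '00',
    minute = '00',
    second = '00',
    sign = 'Z',
    offsetHours = '00',
    offsetMinutes = '00',
  ] = match;

  // Interpret the fields as if they were UTC, then subtract the offset
  const asUtc = Date.UTC(+year, +month - 1, +day, +hour, +minute, +second);
  const direction = sign === '-' ? -1 : 1;
  const offset = sign === 'Z' ? 0 : direction * (+offsetHours * 60 + +offsetMinutes);
  return new Date(asUtc - offset * 60 * 1000);
};

// Try it with the value from the sample output
const created = parsePdfDate("D:20070223175637+02'00'");
console.log(created.toISOString()); // 2007-02-23T15:56:37.000Z
```

The helper returns null for anything that does not match the assumed layout, so you can fall back to the raw string in that case.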
Working with CSV files
We’ll use fast-csv to extract data from a CSV file. It’s very lightweight but powerful and works well with both small and very big CSV files.
1. Create a new folder for this example, then create a new file named index.js inside it.
2. Download the CSV file from the link above to the root directory of the project. Its contents are simple:
Id,Name,Age
1,John Doe,40
2,Kindacode,41
3,Voldermort,71
4,Joe Biden,80
5,Ryo Hanamura,35
3. Install fast-csv:
npm i fast-csv
4. Add this code into index.js:
const fs = require('fs');
const path = require('path');
const csv = require('fast-csv');

// This function reads data from a given CSV file
const readCSV = (filePath) => {
  const readStream = fs.createReadStream(filePath);
  const data = [];

  readStream
    .pipe(csv.parse())
    .on('data', (row) => {
      data.push(row);
      console.log('Id:', row[0]);
      console.log('Name:', row[1]);
      console.log('Age:', row[2]);
      console.log('\n');
    })
    .on('end', (rowCount) => {
      console.log(`${rowCount} rows has been parsed!`);
      // Do something with the data you get
      console.log(data);
    })
    .on('error', (error) => console.error(error));
};

// Try it
const myFile = path.resolve(__dirname, 'kindacode.csv');
readCSV(myFile);
5. Run the code and see the output:
Id: Id
Name: Name
Age: Age
Id: 1
Name: John Doe
Age: 40
Id: 2
Name: Kindacode
Age: 41
Id: 3
Name: Voldermort
Age: 71
Id: 4
Name: Joe Biden
Age: 80
Id: 5
Name: Ryo Hanamura
Age: 35
6 rows has been parsed!
[
[ 'Id', 'Name', 'Age' ],
[ '1', 'John Doe', '40' ],
[ '2', 'Kindacode', '41' ],
[ '3', 'Voldermort', '71' ],
[ '4', 'Joe Biden', '80' ],
[ '5', 'Ryo Hanamura', '35' ]
]
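Notice that the header row is returned as ordinary data, and every row is a plain array of strings. If you would rather work with one object per record, you can map the rows yourself. The sketch below is one possible approach: rowsToObjects is a hypothetical helper name, and it assumes the first row holds the column names. fast-csv also documents a headers option for its parser, which may make this step unnecessary.

```javascript
// Hypothetical helper: turns [header, ...records] arrays into objects
// keyed by the header names. Assumes the first row is the header.
const rowsToObjects = (rows) => {
  const [header, ...records] = rows;
  return records.map((record) =>
    Object.fromEntries(header.map((key, index) => [key, record[index]]))
  );
};

// Try it with the first rows parsed from kindacode.csv
const rows = [
  ['Id', 'Name', 'Age'],
  ['1', 'John Doe', '40'],
  ['2', 'Kindacode', '41'],
];

const people = rowsToObjects(rows);
console.log(people[0]); // { Id: '1', Name: 'John Doe', Age: '40' }
```

All values are still strings, so convert fields such as Age with Number() if you need to do arithmetic on them.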
Conclusion
At this point, you should have a better understanding of how to read PDF and CSV files in Node.js and feel more confident working with them. If you would like to learn more about this JavaScript runtime, have a look at the following articles:
- 2 Ways to Set Default Time Zone in Node.js
- Node.js: How to Compress a File using Gzip format
- Node + Mongoose + TypeScript: Defining Schemas and Models
- Using Docker Compose with Node.js and MongoDB
- Using Axios to download files in Node.js
- How to get all Links from a Webpage using Node.js
You can also check out our Node.js category page for the latest tutorials and examples.