Node.js: Reading content from PDF and CSV files

Updated: March 19, 2022 By: A Goodman

Introduction

Node.js uses non-blocking I/O, so it handles files efficiently, even very large ones. PDF (Portable Document Format) is a file format that presents text and images the same way regardless of the software and hardware used to view them. CSV (Comma-Separated Values) is a file format that stores tabular data (numbers and text) as plain text.

This article will show you how to read content from PDF and CSV files using Node.js through 2 end-to-end examples.

The PDF file we’ll use for testing in this tutorial:

https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf

And here’s the CSV:

https://www.kindacode.com/wp-content/uploads/2021/01/kindacode.csv

Working with PDF files

We will use a library named pdf-parse to do the job.
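pdf-parse works on the raw bytes of a file, so pointing it at something that is not a PDF (a wrong path, or an HTML error page saved with a .pdf extension) will typically make it reject with a parsing error. As an optional guard, the sketch below checks the first bytes of the file before parsing. It assumes, as the PDF specification describes, that a PDF begins with the header %PDF- followed by the version number. looksLikePdf and fileLooksLikePdf are hypothetical helpers, not part of pdf-parse.

```javascript
const fs = require('fs');

// Assumption: a PDF file starts with the 5-byte header "%PDF-" (e.g. "%PDF-1.4")
const PDF_MAGIC = Buffer.from('%PDF-', 'ascii');

// Hypothetical helper: true if the buffer starts with the PDF header
const looksLikePdf = (buffer) =>
  buffer.length >= PDF_MAGIC.length &&
  buffer.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC);

// Hypothetical helper: reads only the first few bytes instead of the whole file
const fileLooksLikePdf = async (filePath) => {
  const handle = await fs.promises.open(filePath, 'r');
  try {
    const header = Buffer.alloc(PDF_MAGIC.length);
    const { bytesRead } = await handle.read(header, 0, header.length, 0);
    return looksLikePdf(header.subarray(0, bytesRead));
  } finally {
    // Always release the file descriptor, even if the read fails
    await handle.close();
  }
};

// Usage: run the check before handing the file to pdfParse()
fileLooksLikePdf('./dummy.pdf')
  .then((ok) => console.log(ok ? 'Looks like a PDF' : 'Not a PDF file'))
  .catch((err) => console.error(err.message));
```

Reading five bytes through a file handle keeps the check cheap even for very large files; only files that pass it need to be loaded in full.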

1. Download the PDF from the link above into the folder where your example project will live, then create a file named index.js in that folder.

2. Install pdf-parse by running this command (the code below uses the 1.x API of the library):

npm install pdf-parse@1 --save

Our file structure:

.
├── dummy.pdf
├── index.js
├── node_modules
├── package-lock.json
└── package.json

3. Add the following to index.js:

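const fs = require('fs');
const pdfParse = require('pdf-parse');

const readPdf = async (uri) => {
    try {
        // Read the whole file into a Buffer without blocking the event loop
        const buffer = await fs.promises.readFile(uri);
        const data = await pdfParse(buffer);

        // The extracted text content
        console.log('Content: ', data.text);

        // Total number of pages
        console.log('Total pages: ', data.numpages);

        // Document information (author, creator, creation date, ...)
        console.log('Info: ', data.info);

        return data;
    } catch (err) {
        // Re-throw with the file path added so the failure is easier to trace
        throw new Error(`Failed to read PDF "${uri}": ${err.message}`);
    }
};

// Testing: handle the rejected promise so errors are reported cleanly
const DUMMY_PDF = './dummy.pdf';
readPdf(DUMMY_PDF).catch((err) => console.error(err.message));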

4. Run the code with node index.js and check the output in the console. It should look like this:

Content:  

Dummy PDF file
Total pages:  1
Info:  {
  PDFFormatVersion: '1.4',
  IsAcroFormPresent: false,
  IsXFAPresent: false,
  Author: 'Evangelos Vlachogiannis',
  Creator: 'Writer',
  Producer: 'OpenOffice.org 2.1',
  CreationDate: "D:20070223175637+02'00'"
}
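As the output shows, CreationDate comes back as a raw PDF date string: D:YYYYMMDDHHmmSS followed by a UTC offset written as +HH'mm' (or Z for UTC), as described in the PDF specification. If you need a real Date object, a minimal sketch of a converter is below. parsePdfDate is a hypothetical helper, not part of pdf-parse, and it assumes that a date with no offset should be treated as UTC (the specification leaves that case undefined).

```javascript
// Hypothetical helper: converts a PDF date string such as
// "D:20070223175637+02'00'" into a JavaScript Date.
// Layout: (D:)YYYYMMDDHHmmSSOHH'mm', where O is +, - or Z.
// Every field after the year is optional; the D: prefix is treated as optional too.
const PDF_DATE =
  /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([+\-Z])?(\d{2})?'?(\d{2})?'?$/;

const parsePdfDate = (value) => {
  const match = PDF_DATE.exec(value);
  if (!match) return null;

  // Missing optional groups are undefined, so the defaults below apply
  const [, year, month = '01', day = '01', hour = '00', minute = '00',
    second = '00', sign, offHour = '00', offMinute = '00'] = match;

  // Build the timestamp as if the local time were UTC...
  const asUtc = Date.UTC(
    Number(year), Number(month) - 1, Number(day),
    Number(hour), Number(minute), Number(second),
  );

  // ...then subtract the offset: local time = UTC + offset.
  // 'Z' and a missing offset are both treated as UTC (assumption).
  const offsetMinutes = Number(offHour) * 60 + Number(offMinute);
  const direction = sign === '+' ? 1 : sign === '-' ? -1 : 0;
  return new Date(asUtc - direction * offsetMinutes * 60 * 1000);
};

// Usage with the CreationDate from the output above
console.log(parsePdfDate("D:20070223175637+02'00'").toISOString());
// prints 2007-02-23T15:56:37.000Z
```

In readPdf you could call parsePdfDate(data.info.CreationDate), keeping in mind that not every PDF sets this field.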

Working with CSV files

We’ll use fast-csv to extract data from a CSV file. It is lightweight, parses data as a stream, and works well with both small and very large CSV files.
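Streaming is what keeps memory usage flat on very large files: the data is processed chunk by chunk instead of being loaded all at once. The sketch below shows the same idea with only Node's built-in fs and readline modules by counting the non-empty lines of a file. It is an illustration, not a CSV parser: countLines is a hypothetical helper, and it counts physical lines, so a quoted field containing a line break would be miscounted (fast-csv handles that case).

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: counts non-empty lines without loading the file into memory
const countLines = async (filePath) => {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let lines = 0;
  // The interface is async-iterable: one line is read per iteration
  for await (const line of rl) {
    if (line.trim() !== '') lines += 1;
  }
  return lines;
};

// Usage (a missing file rejects the promise, which is caught here)
countLines('./kindacode.csv')
  .then((n) => console.log(`${n} non-empty lines`))
  .catch((err) => console.error(err.message));
```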

1. Create a new folder for this example then create a new file named index.js inside it.

2. Download the CSV file from the link above into the root directory of the project. Its contents are simple:

Id,Name,Age
1,John Doe,40
2,Kindacode,41
3,Voldermort,71
4,Joe Biden,80
5,Ryo Hanamura,35

3. Install fast-csv:

npm i fast-csv

4. Add this code into index.js:

const fs = require('fs');
const path = require('path');
const csv = require('fast-csv');

// This function reads data from a given CSV file
const readCSV = (filePath) => {
  const readStream = fs.createReadStream(filePath);
  const data = [];

  // .pipe() does not forward errors from the source stream,
  // so handle read errors (e.g. a missing file) separately
  readStream.on('error', (error) => console.error(error));

  readStream
    .pipe(csv.parse())
    .on('data', (row) => {
      // Without options, each row is an array of strings
      data.push(row);
      console.log('Id:', row[0]);
      console.log('Name:', row[1]);
      console.log('Age:', row[2]);
      console.log(); // blank line between rows
    })
    .on('end', (rowCount) => {
      console.log(`${rowCount} rows have been parsed!`);

      // Do something with the data you get
      console.log(data);
    })
    .on('error', (error) => console.error(error));
};

// Try it
const myFile = path.resolve(__dirname, 'kindacode.csv');
readCSV(myFile);

5. Run the code with node index.js and see the output:

Id: Id
Name: Name
Age: Age

Id: 1
Name: John Doe
Age: 40

Id: 2
Name: Kindacode
Age: 41

Id: 3
Name: Voldermort
Age: 71

Id: 4
Name: Joe Biden
Age: 80

Id: 5
Name: Ryo Hanamura
Age: 35

6 rows have been parsed!
[
  [ 'Id', 'Name', 'Age' ],
  [ '1', 'John Doe', '40' ],
  [ '2', 'Kindacode', '41' ],
  [ '3', 'Voldermort', '71' ],
  [ '4', 'Joe Biden', '80' ],
  [ '5', 'Ryo Hanamura', '35' ]
]
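Without options, fast-csv emits every row as an array of strings, and the header line arrives as the first row (which is why the count above is 6). If you would rather work with records, the sketch below converts that array output into objects keyed by the header and casts selected columns to numbers. rowsToRecords is a hypothetical helper, and treating Id and Age as numeric is an assumption that fits this sample file.

```javascript
// Hypothetical helper: turns [[header...], [row...], ...] into an array of
// objects keyed by the header names. Columns listed in numericColumns are
// converted with Number(); everything else stays a string.
const rowsToRecords = (rows, numericColumns = []) => {
  if (rows.length === 0) return [];
  const [header, ...body] = rows;

  return body.map((row) =>
    Object.fromEntries(
      header.map((column, i) => {
        const raw = row[i];
        const value = numericColumns.includes(column) ? Number(raw) : raw;
        return [column, value];
      }),
    ),
  );
};

// Usage with the first rows printed above
const rows = [
  ['Id', 'Name', 'Age'],
  ['1', 'John Doe', '40'],
  ['2', 'Kindacode', '41'],
];
console.log(rowsToRecords(rows, ['Id', 'Age']));
```

fast-csv can also do the header mapping itself: passing { headers: true } to csv.parse() makes each 'data' event deliver an object such as { Id: '1', Name: 'John Doe', Age: '40' }, with the values still as strings.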

Conclusion

At this point, you should feel more confident reading PDF and CSV files with Node.js. To learn more about this JavaScript runtime, check out our Node.js category page for the latest tutorials and examples.
