Node.js: Reading content from PDF and CSV files

Last updated on January 19, 2021 by A Goodman

Introduction

Node.js uses non-blocking I/O, which makes it efficient at working with files, including very large ones. PDF (Portable Document Format) is used to present text and images independently of software and hardware. CSV (comma-separated values) is a file format that stores tabular data (numbers and text) in plain text.
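To make the point about large files concrete, here is a minimal sketch that streams a file in chunks instead of loading it all at once. The helper name countBytes is made up for this illustration and is not used by the examples later in the article.

```javascript
const fs = require('fs');

// Hypothetical helper: counts the bytes in a file by streaming it in chunks,
// so the whole file never has to sit in memory at once.
const countBytes = (filePath) =>
  new Promise((resolve, reject) => {
    let total = 0;
    fs.createReadStream(filePath)
      .on('data', (chunk) => {
        // chunk is a Buffer because no encoding was set on the stream
        total += chunk.length;
      })
      .on('end', () => resolve(total))
      .on('error', reject);
  });

// Usage (assumes a file named dummy.pdf exists next to the script):
// countBytes('./dummy.pdf').then((size) => console.log(`${size} bytes`));
```

While the stream is waiting for the next chunk, the event loop stays free to do other work, which is what "non-blocking" means in practice.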

This article shows you how to read content from PDF and CSV files using Node.js through two end-to-end examples.

The PDF file we’ll use for testing in this tutorial:

https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf

And here’s the CSV:

https://www.kindacode.com/wp-content/uploads/2021/01/kindacode.csv

1. Working with PDF files

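We will use a library named pdf-parse to do the job. The code in this section targets the 1.x releases of pdf-parse, in which the module exports a single function that takes a Buffer and returns a promise.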

1. Copy the PDF from the link above into the folder where you want your example project to live, then create a file named index.js there.

2. Install pdf-parse by running this command:

npm install pdf-parse --save

Our file structure:

.
├── dummy.pdf
├── index.js
├── node_modules
├── package-lock.json
└── package.json

3. Add the following to index.js:

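const fs = require('fs');
const pdfParse = require('pdf-parse');

const readPdf = async (uri) => {
    // Read the whole file into a Buffer without blocking the event loop
    const buffer = await fs.promises.readFile(uri);

    // pdfParse resolves with the extracted text and some metadata
    const data = await pdfParse(buffer);

    // The content
    console.log('Content: ', data.text);

    // Total number of pages
    console.log('Total pages: ', data.numpages);

    // File information
    console.log('Info: ', data.info);
};

// Testing
const DUMMY_PDF = './dummy.pdf';

// Handle read and parse failures in one place, keeping the original error intact
readPdf(DUMMY_PDF).catch((err) => console.error('Could not read the PDF:', err));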

4. Run the code with node index.js and check the output in the console. It should look like this:

Content:  

Dummy PDF file
Total pages:  1
Info:  {
  PDFFormatVersion: '1.4',
  IsAcroFormPresent: false,
  IsXFAPresent: false,
  Author: 'Evangelos Vlachogiannis',
  Creator: 'Writer',
  Producer: 'OpenOffice.org 2.1',
  CreationDate: "D:20070223175637+02'00'"
}
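Two small helpers can make this example more robust. The sketch below is not part of pdf-parse; the names looksLikePdf and parsePdfDate are made up for illustration. The first checks for the "%PDF-" signature that PDF files normally start with, and the second converts a date string in the format shown in CreationDate above into a JavaScript Date. It assumes a date without a time zone offset can be treated as UTC.

```javascript
// Hypothetical helper: true when the buffer starts with the "%PDF-" signature.
const looksLikePdf = (buffer) =>
  Buffer.isBuffer(buffer) &&
  buffer.length >= 5 &&
  buffer.subarray(0, 5).toString('latin1') === '%PDF-';

// Hypothetical helper: converts a PDF date string such as
// "D:20070223175637+02'00'" into a Date, or returns null if it does not match.
const PDF_DATE =
  /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?$/;

const parsePdfDate = (value) => {
  const match = PDF_DATE.exec(value);
  if (!match) return null;

  // Everything after the year is optional, so fall back to sensible defaults
  const [, year, month = '01', day = '01', hour = '00', minute = '00',
    second = '00', sign = 'Z', offHour = '00', offMinute = '00'] = match;

  // Interpret the fields as UTC first, then remove the local offset.
  // Assumption: a missing offset is treated as UTC.
  const asUtc = Date.UTC(+year, +month - 1, +day, +hour, +minute, +second);
  const direction = sign === '-' ? -1 : 1;
  const offsetMinutes = sign === 'Z' ? 0 : direction * (+offHour * 60 + +offMinute);
  return new Date(asUtc - offsetMinutes * 60000);
};

console.log(looksLikePdf(Buffer.from('%PDF-1.4\n'))); // true
console.log(looksLikePdf(Buffer.from('Id,Name,Age\n'))); // false
console.log(parsePdfDate("D:20070223175637+02'00'").toISOString());
// 2007-02-23T15:56:37.000Z
```

In readPdf, looksLikePdf(buffer) could be called right after the file is read, and parsePdfDate(data.info.CreationDate) after parsing.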

2. Working with CSV files

We'll use fast-csv to extract data from a CSV file. It is lightweight and stream-based, so it works well with both small and very large CSV files.

1. Create a new folder for this example, then create a new file named index.js inside it.

2. Download the CSV file from the link above to the root directory of the project. Its content is simple:

Id,Name,Age
1,John Doe,40
2,Kindacode,41
3,Voldermort,71
4,Joe Biden,80
5,Ryo Hanamura,35

3. Install fast-csv:

npm i fast-csv

4. Add this code into index.js:

const fs = require('fs');
const path = require('path');
const csv = require('fast-csv');

// This function reads data from a given CSV file
const readCSV = (filePath) => {
  const data = [];
  fs.createReadStream(filePath)
    // pipe() does not forward errors, so file errors (e.g. a missing file) are handled here
    .on('error', (error) => console.error(error))
    .pipe(csv.parse())
    .on('data', (row) => {
      // Each row is an array of strings; the header line arrives as the first row
      data.push(row);
      console.log('Id:', row[0]);
      console.log('Name:', row[1]);
      console.log('Age:', row[2]);
      console.log('\n');
    })
    .on('end', (rowCount) => {
      console.log(`${rowCount} rows have been parsed!`);

      // Do something with the data you get
      console.log(data);
    })
    // Parsing errors (e.g. malformed rows) are emitted by the parser stream
    .on('error', (error) => console.error(error));
};

// Try it
const myFile = path.resolve(__dirname, 'kindacode.csv');
readCSV(myFile);

5. Run the code with node index.js and see the output:

Id: Id
Name: Name
Age: Age


Id: 1
Name: John Doe
Age: 40


Id: 2
Name: Kindacode
Age: 41


Id: 3
Name: Voldermort
Age: 71


Id: 4
Name: Joe Biden
Age: 80


Id: 5
Name: Ryo Hanamura
Age: 35


6 rows have been parsed!
[
  [ 'Id', 'Name', 'Age' ],
  [ '1', 'John Doe', '40' ],
  [ '2', 'Kindacode', '41' ],
  [ '3', 'Voldermort', '71' ],
  [ '4', 'Joe Biden', '80' ],
  [ '5', 'Ryo Hanamura', '35' ]
]
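Note that the header line is returned as the first row and every value is a string. A minimal sketch of how the parsed rows could be turned into objects with numeric fields is shown below. The helper name rowsToObjects is made up for this illustration, and the sample array is a shortened copy of the output above.

```javascript
// Hypothetical helper: turns [header, ...rows] arrays into plain objects
// keyed by the header names.
const rowsToObjects = (rows) => {
  const [header, ...records] = rows;
  return records.map((record) =>
    Object.fromEntries(header.map((key, i) => [key, record[i]]))
  );
};

// A shortened copy of the data array printed above
const parsed = [
  ['Id', 'Name', 'Age'],
  ['1', 'John Doe', '40'],
  ['2', 'Kindacode', '41'],
];

// Convert the numeric columns from strings to numbers
const people = rowsToObjects(parsed).map((person) => ({
  ...person,
  Id: Number(person.Id),
  Age: Number(person.Age),
}));

console.log(people[0]); // { Id: 1, Name: 'John Doe', Age: 40 }
```

fast-csv can also do the first half of this for you: calling csv.parse({ headers: true }) makes the parser treat the first line as column names and emit each row as an object instead of an array.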

Conclusion

At this point, you should feel more confident working with PDF and CSV files in Node.js: pdf-parse extracts the text, page count, and metadata from a PDF buffer, and fast-csv streams a CSV file row by row. If you would like to learn more about Node.js, check out our Node.js category page for the latest tutorials and examples.
