How to get all links from a webpage using Node.js

February 10, 2020 Goodman

In this article, we will extract all links (both the “href” and “title” attributes) from a webpage using Node.js and two packages: cheerio and request-promise.


npm install cheerio request-promise


const $ = require('cheerio');
const rp = require('request-promise');

const url = '';
// I use Wikipedia for the example, but you can use any other site you like

rp(url).then(html => {
    const linkObjects = $('a', html);
    // this is a cheerio object, not an array

    const total = linkObjects.length;
    // the cheerio object has a property named "length"

    const links = [];
    // we only need the "href" and "title" of each link

    for (let i = 0; i < total; i++) {
        links.push({
            href: linkObjects[i].attribs.href,
            title: linkObjects[i].attribs.title
        });
    }

    console.log(links);
    // do something else here with the links
}).catch(err => {
    console.error(err);
});

Simple as that. From here you are pretty good to go. Now, you can start building more complex web crawlers 🙂
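For a crawler, one common next step is turning the extracted hrefs into absolute URLs and dropping in-page anchors, since pages like Wikipedia link with relative paths. Here is a minimal sketch using Node's built-in URL class; the base URL and sample hrefs below are just assumptions for illustration:

```javascript
// Sketch: normalizing extracted hrefs before crawling further.
// "baseUrl" and the sample hrefs are hypothetical values for illustration.
const baseUrl = 'https://en.wikipedia.org/wiki/Node.js';

const hrefs = ['/wiki/JavaScript', '#History', 'https://nodejs.org/'];

const absoluteLinks = hrefs
    .filter(href => href && !href.startsWith('#')) // skip in-page anchors
    .map(href => new URL(href, baseUrl).href);     // resolve against the base

console.log(absoluteLinks);
// [ 'https://en.wikipedia.org/wiki/JavaScript', 'https://nodejs.org/' ]
```

Feeding these normalized URLs back into the same rp/cheerio routine is the basic loop behind a recursive crawler.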

