How to Build a Web Scraper using JavaScript

Node.js, Async/Await and Headless Browsers

Bret Cameron
6 min read · May 24, 2019
A different kind of scraper… (Image Credit: Jannes Glas / Unsplash)

If you want to collect data from the web, you’ll come across a lot of resources teaching you how to do this using more established back-end tools like Python or PHP. But there’s a lot less guidance out there for the new kid on the block, Node.js.

Thanks to Node.js, JavaScript is a great language to use for a web scraper: not only is Node fast, but you’ll likely end up using a lot of the same methods you’re used to from querying the DOM with front-end JavaScript. Node.js has tools for querying both static and dynamic web pages, and it integrates well with lots of useful APIs, Node modules and more.

In this article, I’ll walk through a powerful way to use JavaScript to build a web scraper. We’ll also explore one of the key concepts useful for writing robust data-fetching code: asynchronous code.

Asynchronous Code

Fetching data is often one of the first times beginners encounter asynchronous code. By default, JavaScript is synchronous, meaning that statements are executed line by line. Whenever a function is called, the program waits until the function has returned before moving on to the next line of code.
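To make that concrete, here is a minimal sketch of synchronous execution. The helper `slowSquare` and the function `runInOrder` are hypothetical names I've made up for illustration, and the busy-wait loop is only a stand-in for slow work, not something you'd want in real code.

```javascript
// Hypothetical helper: blocks the program for roughly `ms` milliseconds,
// then returns the square of `n`.
function slowSquare(n, ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // Busy-wait to simulate slow synchronous work.
  }
  return n * n;
}

// Records the order in which each line runs.
function runInOrder() {
  const log = [];
  log.push('before call');
  // Nothing below this line runs until slowSquare has returned.
  const result = slowSquare(4, 50);
  log.push(`result: ${result}`);
  log.push('after call');
  return log;
}

console.log(runInOrder());
// [ 'before call', 'result: 16', 'after call' ]
```

Because each line waits for the previous one to finish, the entries always appear in the order they were written, no matter how long `slowSquare` takes.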

Bret Cameron

Writer and developer based in London. On Medium, I mainly write about JavaScript, web development and Rust 💻