How to Use Selenium in Node.js

TL;DR

Setup: init Node project, install selenium-webdriver + chromedriver, build a Chrome driver and open a page.
Scrape: getPageSource, select with By.css, iterate titles / ratings.
Techniques: wait (until.elementLocated), infinite scroll, type / click flows.
For scale or less hassle, use Scrapingdog to handle proxies, headless, retries.

If you want to scrape dynamic websites using Node.js and a headless browser, tools like Puppeteer and Selenium are great options. We have already covered web scraping with Puppeteer, and today, we’ll learn how to use Selenium with Node.js for web scraping.

Many of you might already be familiar with Selenium if you’ve used it for web scraping with Python. In this article, however, we’ll explore using Selenium with Node.js for web scraping from scratch. We’ll cover topics such as scraping a website, waiting for specific elements to load, and more.

Setting Up Selenium in Node.js

Before diving into how to use Selenium for web scraping, you need to ensure your environment is ready. Follow these steps to install and set up Selenium with Node.js.

Install Node.js

If Node.js is not yet installed on your machine, download it from the official Node.js website. You can verify the installation with this command.

node -v

Create a new Node.js project

Create a folder with any name you like. We will store all of our .js files inside this folder.

mkdir selenium-nodejs-demo
cd selenium-nodejs-demo

Then initialize a package.json file.

npm init -y

Install Required Packages

To interact with the browser we have to install the selenium-webdriver package.

npm install selenium-webdriver

If you are going to use the Google Chrome browser, you have to install chromedriver as well.

npm install chromedriver

We are done with the installation part. Let’s test our setup.

How to Run Selenium with Node.js

const { Builder } = require('selenium-webdriver');

async function testSetup() {
  let driver = await new Builder().forBrowser('chrome').build();
  await driver.get('https://scrapingdog.com/');
  console.log('Browser launched successfully!');
  await driver.quit();
}

testSetup();

First, the Builder class is imported from the selenium-webdriver package. It is used to create a new WebDriver instance that automates Google Chrome, and the build() method launches the browser. The driver then navigates to scrapingdog.com.

Once the page has loaded, a confirmation message is printed. Finally, we close the browser with the .quit() method.
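The script above opens a visible Chrome window. When scraping on a server you will usually want Chrome to run headless. Here is a minimal sketch of how that could look. The helper names chromeArgs and buildChrome are hypothetical, the window size is an arbitrary choice, and the --headless=new flag assumes a recent Chrome build.

```javascript
// Assemble Chrome command-line flags.
// Assumption: '--headless=new' targets recent Chrome builds; older builds use '--headless'.
function chromeArgs({ headless = true, width = 1280, height = 800 } = {}) {
  const args = [`--window-size=${width},${height}`];
  if (headless) args.push('--headless=new');
  return args;
}

// Build a Chrome driver with those flags.
// selenium-webdriver is required inside the function so chromeArgs() can be used on its own.
async function buildChrome(settings) {
  const { Builder } = require('selenium-webdriver');
  const chrome = require('selenium-webdriver/chrome');
  const options = new chrome.Options().addArguments(...chromeArgs(settings));
  return new Builder().forBrowser('chrome').setChromeOptions(options).build();
}
```

To try it, replace the Builder line in the script with let driver = await buildChrome({ headless: true }); and the rest of the script stays the same.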

Extracting Data with Selenium and Node.js

Let’s take this IMDB page as an example URL for this section.

const { Builder } = require('selenium-webdriver');

async function testSetup() {
  let driver = await new Builder().forBrowser('chrome').build();

  await driver.get('https://www.imdb.com/chart/moviemeter/');

  let html = await driver.getPageSource();

  console.log(html);

  await driver.quit();
}

testSetup();

Using the .getPageSource() method, we extract the raw HTML of the target page. Then, before closing the browser, we print that HTML to the console.

Once you run this code, the full HTML of the page is printed in your terminal.
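A whole page of HTML is hard to read in a terminal, so it can help to write it to a file and inspect it in an editor instead. The following is a small sketch using only Node's built-in fs and path modules. The helper name saveHtml and the output path are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Write an HTML string to disk and return the absolute path of the file.
// Missing parent folders are created first.
function saveHtml(html, filePath) {
  const target = path.resolve(filePath);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, html, 'utf8');
  return target;
}
```

Inside the script you could then call saveHtml(await driver.getPageSource(), 'output/imdb.html') in place of console.log(html).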

Now, if I want to parse the title and rating of the movies on this page, I have to use the By class to search for a particular CSS selector.

If you inspect the page, you can see that the title of each movie is located inside an anchor tag matching the CSS selector .ipc-title--title a

The rating is stored inside a span tag matching the CSS selector .ipc-rating-star--imdb span:nth-child(2)

Let’s parse this data using By.

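const { Builder, By } = require('selenium-webdriver');

async function testSetup() {
  let driver = await new Builder().forBrowser('chrome').build();

  try {
    await driver.get('https://www.imdb.com/chart/moviemeter/');
    // Fixed pause so the dynamic content has time to render
    await driver.sleep(5000);

    let movies = await driver.findElements(By.css('.ipc-title--title a'));
    let ratings = await driver.findElements(By.css('.ipc-rating-star--imdb span:nth-child(2)'));

    console.log(`Found ${movies.length} movies and ${ratings.length} ratings.`);

    // Print each title with the rating at the same index ('N/A' if there is none)
    for (let i = 0; i < movies.length; i++) {
      let title = await movies[i].getText();
      let rating = ratings[i] ? await ratings[i].getText() : 'N/A';
      console.log(`${i + 1}. ${title} - ${rating}`);
    }
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    await driver.quit();
  }
}

testSetup();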

In the above code, I am using .findElements() to search the DOM for elements matching those CSS selectors.

Then, with the help of a for loop, I iterate over all the movies and print their names and ratings. Once you run this code, the titles and ratings are printed to the console.
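One thing to keep in mind: two separate findElements() calls return two independent lists, so if a single movie has no rating, every rating after it is paired with the wrong title. A possible way around this is to locate each row first and search inside it. The sketch below assumes each movie sits in a row matching li.ipc-metadata-list-summary-item, which is my assumption about IMDb's markup and may need adjusting. The names scrapeRows and demoScrapeRows are hypothetical.

```javascript
// Collect one { title, rating } record per row element.
// `locators` holds three locators (for example built with By.css): the row
// container, the title inside a row, and the rating inside a row.
async function scrapeRows(driver, locators) {
  const rows = await driver.findElements(locators.row);
  const records = [];
  for (const row of rows) {
    // Searching from the row keeps each title paired with its own rating.
    const titles = await row.findElements(locators.title);
    const ratings = await row.findElements(locators.rating);
    if (titles.length === 0) continue; // skip rows without a title
    records.push({
      title: await titles[0].getText(),
      rating: ratings.length > 0 ? await ratings[0].getText() : null,
    });
  }
  return records;
}

// Example wiring. The row selector is an assumption, not something confirmed above.
async function demoScrapeRows() {
  const { Builder, By } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://www.imdb.com/chart/moviemeter/');
    await driver.sleep(5000); // same fixed pause as the script above
    const movies = await scrapeRows(driver, {
      row: By.css('li.ipc-metadata-list-summary-item'),
      title: By.css('.ipc-title--title a'),
      rating: By.css('.ipc-rating-star--imdb span:nth-child(2)'),
    });
    console.log(movies);
  } finally {
    await driver.quit();
  }
}
```

Call demoScrapeRows() to run it against the live page. Movies without a rating come back with rating set to null instead of shifting the rest of the list.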

How to do Infinite Scrolling

Many e-commerce websites load more content as you scroll down. To scrape the data at the very bottom of such a page, we have to keep scrolling until no new content appears.

const { Builder } = require('selenium-webdriver');

async function infiniteScrollExample() {
  let driver = await new Builder().forBrowser('chrome').build();

  try {
    // Navigate to the target website
    await driver.get('https://www.imdb.com/chart/top/'); // Replace with your target URL
    console.log('Page loaded.');

    let lastHeight = 0;

    while (true) {
      // Scroll to the end of the page
      await driver.executeScript('window.scrollTo(0, document.body.scrollHeight);');
      console.log('Scrolled to the bottom.');

      // Wait for 3 seconds to allow content to load
      await driver.sleep(3000);

      // Get the current height of the page
      const currentHeight = await driver.executeScript('return document.body.scrollHeight;');

      // Break the loop if no new content is loaded
      if (currentHeight === lastHeight) {
        console.log('No more content to load. Exiting infinite scroll.');
        break;
      }

      // Update lastHeight for the next iteration
      lastHeight = currentHeight;
    }

  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Quit the driver
    await driver.quit();
  }
}

infiniteScrollExample();

There is a while loop in the above code which keeps running until the height of the page no longer changes after scrolling, which indicates that no more content is being loaded.

The loop breaks only once currentHeight equals lastHeight.
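On a feed that never ends, a while (true) loop like this would never exit, so in practice you may want an upper limit on the number of scrolls. Here is a rough sketch of that variation. The helper name scrollToBottom and the default values for maxScrolls and pauseMs are my own choices, not part of Selenium.

```javascript
// Scroll until the page height stops growing or maxScrolls is reached.
// Returns the number of scrolls that were performed.
async function scrollToBottom(driver, { maxScrolls = 20, pauseMs = 3000 } = {}) {
  let lastHeight = await driver.executeScript('return document.body.scrollHeight;');
  for (let scrolls = 1; scrolls <= maxScrolls; scrolls++) {
    await driver.executeScript('window.scrollTo(0, document.body.scrollHeight);');
    await driver.sleep(pauseMs); // give lazy-loaded content time to arrive
    const currentHeight = await driver.executeScript('return document.body.scrollHeight;');
    if (currentHeight === lastHeight) return scrolls; // nothing new was loaded
    lastHeight = currentHeight;
  }
  return maxScrolls; // gave up: the page may still have more content
}
```

Inside the try block you could call await scrollToBottom(driver, { maxScrolls: 10 }) in place of the while loop. Unlike the loop above, this version reads the page height once before the first scroll.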

How to wait for an Element

On dynamic pages, an element often appears a moment after the page itself has loaded. In that case, you have to wait for the element before you begin scraping.

const { Builder, By, until } = require('selenium-webdriver');

async function waitForSearchBar() {
  let driver = await new Builder().forBrowser('chrome').build();
  await driver.get('https://www.imdb.com/chart/top/');

  let searchBar = await driver.wait(
    until.elementLocated(By.css('.ipc-title__text')),
    5000 // Wait for up to 5 seconds
  );

  await driver.quit();
}

waitForSearchBar();

Here we wait up to 5 seconds for the selected element to appear in the DOM. If it does not appear in time, driver.wait() throws a timeout error. You can refer to the official Selenium documentation to learn more about the wait method.
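If the element never shows up, driver.wait() rejects and the script above exits without ever reaching driver.quit(). A possible way to handle this is a small wrapper that turns a timeout into null and always closes the browser. This is only a sketch: waitOrNull and demoWait are hypothetical names, and the check relies on the error's name being 'TimeoutError', which is how I understand selenium-webdriver reports wait timeouts.

```javascript
// Wait for a condition and return its result, or null if the wait times out.
// Any error other than a timeout is rethrown.
async function waitOrNull(driver, condition, timeoutMs = 5000) {
  try {
    return await driver.wait(condition, timeoutMs);
  } catch (err) {
    // Assumption: wait timeouts carry the name 'TimeoutError'
    if (err && err.name === 'TimeoutError') return null;
    throw err;
  }
}

// Example wiring with the same page and selector as above.
async function demoWait() {
  const { Builder, By, until } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://www.imdb.com/chart/top/');
    const heading = await waitOrNull(driver, until.elementLocated(By.css('.ipc-title__text')));
    console.log(heading ? await heading.getText() : 'Element did not appear in time.');
  } finally {
    await driver.quit(); // runs even when the wait fails
  }
}
```

Call demoWait() to try it. The try/finally block is the important part, because it keeps stray Chrome processes from piling up when a wait fails.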

How to type and click

Sometimes, you may need to scrape content that appears after typing or clicking an element. For example, let’s search for a query on Google. First, we will type the query into Google’s input field. Then, we will perform the search by clicking the search button.

const { Builder, By } = require('selenium-webdriver');

async function typeInFieldExample() {
  let driver = await new Builder().forBrowser('chrome').build();

  try {
    // Navigate to a website with an input field
    await driver.get('https://www.google.com');

    // Find the search input field and type a query
    let searchBox = await driver.findElement(By.name('q'));
    await searchBox.sendKeys('Scrapingdog');
    await driver.sleep(3000);

    console.log('Text typed successfully!');
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    await driver.quit();
  }
}

typeInFieldExample();

You can use locators such as By.id, By.name, By.className, By.css, or By.xpath to find an element; here we use By.name. Then, using the .sendKeys() method, we type Scrapingdog into the Google input field. Now, let’s click the search button to run the search.

const { Builder, By } = require('selenium-webdriver');

async function typeInFieldExample() {
  let driver = await new Builder().forBrowser('chrome').build();

  try {
    // Navigate to a website with an input field
    await driver.get('https://www.google.com');

    // Find the search input field and type a query
    let searchBox = await driver.findElement(By.name('q')); // 'q' is the name attribute of Google's search box
    await searchBox.sendKeys('Scrapingdog');
    await driver.sleep(3000);
    let searchButton = await driver.findElement(By.name('btnK'));
    await searchButton.click(); // Click the button
    await driver.sleep(3000);
    console.log('Text typed successfully!');
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    await driver.quit();
  }
}

typeInFieldExample();

Once you run the code, the browser navigates to google.com, types the search query, and clicks the search button on its own. You can read more about sendKeys in the Selenium documentation.
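Clicking the button is one way to submit the search; another is to press Enter in the input field itself, which avoids having to locate the button at all. The sketch below shows that variation. The helper names typeAndSubmit and demoSearch are hypothetical, and the wait on the page title assumes Google puts the query in the title of its results page.

```javascript
// Type text into the located field, then press a key to submit it.
// `submitKey` is expected to be a key constant such as Key.RETURN.
async function typeAndSubmit(driver, locator, text, submitKey) {
  const field = await driver.findElement(locator);
  await field.sendKeys(text, submitKey);
}

// Example wiring against the same Google search box as above.
async function demoSearch() {
  const { Builder, By, Key, until } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://www.google.com');
    await typeAndSubmit(driver, By.name('q'), 'Scrapingdog', Key.RETURN);
    // Assumption: the results page title contains the query
    await driver.wait(until.titleContains('Scrapingdog'), 5000);
    console.log('Search submitted.');
  } finally {
    await driver.quit();
  }
}
```

Call demoSearch() to run it. sendKeys() accepts several arguments in a row, so the text and the Enter key can be sent in one call.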

Here are Some Key Takeaways:

The guide explains how to set up Selenium WebDriver with Node.js to automate real browser actions.
It walks through installing required packages like selenium-webdriver and browser drivers such as ChromeDriver.
A sample script demonstrates launching a browser, navigating to a page, and interacting with elements.
It covers handling asynchronous behavior and using waits to manage dynamic content properly.
The tutorial also covers common interaction patterns: infinite scrolling, typing into input fields, and clicking elements.

Conclusion

In conclusion, Selenium combined with Node.js is a powerful duo for automating web interactions and performing web scraping tasks efficiently. Whether you’re extracting dynamic content, simulating user actions, or navigating through infinite scrolling pages, Selenium provides the flexibility to handle complex scenarios with ease. By following this guide, you’ve learned how to set up Selenium, perform basic scraping, and interact with real websites, including typing, clicking, scrolling, and waiting for elements to load.

Now, if you prefer not to deal with headless browsers, proxies, and retries yourself, it’s recommended to use a web scraping API like Scrapingdog. The API will take care of all these tedious tasks for you, allowing you to focus solely on collecting the data you need.