Run Puppeteer and Headless Chrome in a Docker Container - WindowsTips.net - Windows Tips and Tricks with Geek

Tuesday, August 10, 2021

Run Puppeteer and Headless Chrome in a Docker Container

 

Illustration showing the Puppeteer logo

The Basic Requirements

We’re using a Debian-based image for the purposes of this article. If you’re using a different base, you’ll need to adapt the package manager commands shown here accordingly. The official Node.js image is a suitable starting point because it means you don’t need to install Node manually.

Puppeteer is distributed via npm, the Node.js package manager. Installing it also downloads a recent build of Chromium that’s known to work with that Puppeteer release, so theoretically an npm install puppeteer would get you running. In practice, a clean Docker environment will lack the dependencies you need to run Chrome.

As it’s ordinarily a heavyweight GUI program, Chrome depends on font, graphics, configuration, and window management libraries. These all need to be installed within your Dockerfile.

At the time of writing, the current dependency list looks like this:

FROM node:latest
WORKDIR /puppeteer
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    gconf-service \
    libappindicator1 \
    libasound2 \
    libatk1.0-0 \
    libcairo2 \
    libcups2 \
    libfontconfig1 \
    libgbm-dev \
    libgdk-pixbuf2.0-0 \
    libgtk-3-0 \
    libicu-dev \
    libjpeg-dev \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libpng-dev \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    xdg-utils

The dependencies are being installed manually to facilitate use of the Chromium binary that’s bundled with Puppeteer. Using the bundled binary keeps the browser matched to your Puppeteer release and avoids the possibility of a new Chrome release arriving with incompatibilities that break Puppeteer.

Now run npm install puppeteer in your local working directory. This will create a package.json and package-lock.json for you to use. In your Dockerfile, copy these files into the container and use npm ci to install Puppeteer.

# (above section omitted)
COPY package.json .
COPY package-lock.json .
RUN npm ci

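The final step is to make Puppeteer’s bundled Chromium binary properly executable. Otherwise, you’ll run into permission errors whenever Puppeteer tries to start Chrome. The path below is where Puppeteer stored the browser at the time of writing; check your installed release if the directory doesn’t exist.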

# (above section omitted)
RUN chmod -R o+rwx node_modules/puppeteer/.local-chromium

You might want to manually install a specific Chrome version in customized environments. Setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before you run npm ci will disable Puppeteer’s own browser download during installation. This helps slim down your final image.
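If you skip the bundled download, Puppeteer has to be told where your own browser lives. The following is a minimal sketch of one way to do that, not a definitive setup. It assumes the browser was installed separately and that its location is supplied through the PUPPETEER_EXECUTABLE_PATH environment variable. The helper name buildLaunchOptions and the example path /usr/bin/chromium are illustrative assumptions; executablePath is the launch option Puppeteer reads.

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch().
// Assumes the browser was installed separately and that its location is
// passed in through the PUPPETEER_EXECUTABLE_PATH environment variable.
function buildLaunchOptions(env = process.env) {
    const options = { headless: true };

    // Only override the browser path when one was explicitly provided;
    // otherwise Puppeteer looks for its own downloaded Chromium.
    if (env.PUPPETEER_EXECUTABLE_PATH) {
        options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
    }

    return options;
}

// Example call. The path is an assumption, not a guaranteed location.
console.log(buildLaunchOptions({ PUPPETEER_EXECUTABLE_PATH: "/usr/bin/chromium" }));
```

Inside your script you would then pass the result straight to the launcher, as in puppeteer.launch(buildLaunchOptions()).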

At this point you should be ready to build your image:

docker build . -t puppeteer:latest

This is a fairly large build process which could take several minutes on a slower internet connection.

Using Puppeteer in Docker

Some special considerations apply to launching Chrome when you’re using Puppeteer in a Dockerized environment. Despite installing all the dependencies, the environment still looks different to most regular Chrome installations, so additional launch flags are required.

Here’s a minimal example of using Puppeteer inside your container:

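const puppeteer = require("puppeteer");

// Top-level await isn't available in a CommonJS script,
// so the work is wrapped in an async function.
(async () => {
    const browser = await puppeteer.launch({
        headless: true,
        args: [
            "--disable-gpu",
            "--disable-dev-shm-usage",
            "--disable-setuid-sandbox",
            "--no-sandbox",
        ]
    });

    try {
        const page = await browser.newPage();
        await page.goto("https://example.com");
        // Save the screenshot to the container's filesystem.
        await page.screenshot({path: "/screenshot.png"});
        await page.close();
    } finally {
        // Always close the browser, even if a step above fails.
        await browser.close();
    }
})();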

This demonstrates a simple script that launches a headless Chrome instance, navigates to a URL, and captures a screenshot of the page. The browser is then closed to avoid wasting system resources.
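Because a leaked Chrome process is costly inside a container, it can help to make the close step impossible to forget. The sketch below shows one possible pattern under stated assumptions: withBrowser is a hypothetical helper, not part of Puppeteer, and launcher stands for any object with a launch() method, such as the puppeteer module.

```javascript
// Hypothetical helper: runs `task` against a freshly launched browser and
// guarantees the browser is closed afterwards, even if the task throws.
// `launcher` is anything with a launch() method, such as the puppeteer module.
async function withBrowser(launcher, launchOptions, task) {
    const browser = await launcher.launch(launchOptions);
    try {
        return await task(browser);
    } finally {
        await browser.close();
    }
}

// Usage inside the container (requires the puppeteer package):
//
// const puppeteer = require("puppeteer");
// withBrowser(puppeteer, { headless: true, args: ["--no-sandbox"] }, async (browser) => {
//     const page = await browser.newPage();
//     await page.goto("https://example.com");
//     await page.screenshot({ path: "/screenshot.png" });
// });
```

Taking the launcher as a parameter keeps the helper easy to exercise with a stand-in object when Chrome isn’t available.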

The important section is the arguments list that’s passed to Chromium as part of the launch() call:

  • disable-gpu – The GPU isn’t usually available inside a Docker container, unless you’ve specially configured the host. Setting this flag explicitly instructs Chrome not to try to use GPU-based rendering.
  • no-sandbox and disable-setuid-sandbox – These disable Chrome’s sandboxing, a step which is required when running as the root user (the default in a Docker container). Using these flags could allow malicious web content to escape the browser process and compromise the host. It’s vital you ensure your Docker containers are strongly isolated from your host. If you’re uncomfortable with this, you’ll need to manually configure working Chrome sandboxing, which is a more involved process.
  • disable-dev-shm-usage – This flag is necessary to avoid running into issues with Docker’s default low shared memory space of 64 MB. Chrome will write into /tmp instead.
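The flags above can also be assembled conditionally, so the sandbox is only switched off when the process really is running as root. This is a rough sketch rather than a recommended configuration. The helper name containerChromeArgs is hypothetical, and the sketch assumes that a non-root user has a working sandbox, which depends on how your container is configured.

```javascript
// Hypothetical helper: picks Chrome launch flags for a container.
// `isRoot` defaults to checking the current user ID. process.getuid is
// undefined on Windows, in which case a non-root user is assumed.
function containerChromeArgs(
    isRoot = typeof process.getuid === "function" && process.getuid() === 0
) {
    const args = [
        "--disable-gpu",           // no GPU inside a typical container
        "--disable-dev-shm-usage", // avoid the small default shared memory space
    ];

    // The sandbox is only disabled when running as root, where Chrome
    // won't start with it enabled. This weakens isolation.
    if (isRoot) {
        args.push("--no-sandbox", "--disable-setuid-sandbox");
    }

    return args;
}

// Show the flags that would be used for a root user.
console.log(containerChromeArgs(true));
```

The returned array can be passed as the args property of the launch() options.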

Add your JavaScript to your container with a COPY instruction. You should find Puppeteer executes successfully, provided proper Chrome flags are used.
