PandaWebScraper API

PandaWebScraper is a high-performance web scraping API that captures web pages in several formats (PDF, PNG, JPEG, HTML). It launches real browsers to render each target URL, so it accurately captures dynamic content from modern web applications, such as those built with ReactJS, that often does not render well with traditional scraping methods.

With customizable viewport dimensions, smart retries, and user-agent manipulation, PandaWebScraper is a fast, reliable, and straightforward alternative to tools like Puppeteer and Playwright. Whether you need screenshots for documentation, analysis, or archiving, you can tailor each capture request to your needs while the underlying system manages proxy rotation and request handling.

From e-commerce platforms to social media sites, PandaWebScraper has you covered in capturing high-quality screenshots of various websites.

PandaWebScraper provides two endpoints: one to start a screenshot task, and one to check its status and retrieve the resulting file.

Screenshot API Endpoints

1. Initiate Screenshot Task

This endpoint initiates a new screenshot task with optional parameters.

Request

/scrape

Parameters:

url: (required) The URL of the web page to capture.
width: (optional) Viewport width in pixels (default: 1920).
height: (optional) Viewport height in pixels (default: 1080).
format: (optional) Output format: 'pdf', 'png', 'jpeg', or 'html' (default: 'pdf').
user-agent: (optional) Replace the entire User-Agent string with a custom value.
user-agent-suffix: (optional) Append text to the default User-Agent string.

Note: If both user-agent and user-agent-suffix are provided, user-agent takes precedence.

Response

{
    "taskId": "abc123def456",
    "status": "pending"
}
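
The parameters and response above can be sketched in Python as follows. This is a minimal sketch, not the service's own client: build_scrape_payload and parse_scrape_response are hypothetical helper names, and the helper only returns a plain dict because this document does not state the HTTP method or encoding (JSON body or query string) that /scrape expects.

```python
import json

# Output formats listed in the parameter table above.
ALLOWED_FORMATS = {"pdf", "png", "jpeg", "html"}


def build_scrape_payload(url, width=1920, height=1080, fmt="pdf",
                         user_agent=None, user_agent_suffix=None):
    """Build the parameter dict for a /scrape request (hypothetical helper)."""
    if not url:
        raise ValueError("url is required")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    payload = {"url": url, "width": width, "height": height, "format": fmt}

    # The documentation says user-agent takes precedence when both are given,
    # so this sketch sends only one of the two to avoid ambiguity.
    if user_agent is not None:
        payload["user-agent"] = user_agent
    elif user_agent_suffix is not None:
        payload["user-agent-suffix"] = user_agent_suffix
    return payload


def parse_scrape_response(body):
    """Return (taskId, status) from the JSON text returned by /scrape."""
    data = json.loads(body)
    return data["taskId"], data["status"]
```

Dropping user-agent-suffix on the client side is a design choice of this sketch; sending both and letting the service apply the precedence rule should behave the same according to the note above.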

2. Check Screenshot Status and Get the File

This endpoint retrieves the status of a previously initiated screenshot task and, once the task is completed, the URL of the captured file.

Request

curl -X POST "http://localhost:4003/scrape/status" \
  -H "Content-Type: application/json" \
  -d '{"taskId": "abc123def456"}'

Response when completed:

{
    "status": "completed",
    "url": "https://example.r2.cloudflarestorage.com/rapidapi-screenshot/xxx.xxx?xxx=xxx"
}

Response when pending (for example, after the target returns an HTTP 404 or 502 status code):

{
    "status": "pending"
}

Response on error:

{
    "status": "error",
    "error": "Error message here"
}
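
The three response shapes above suggest a simple polling loop. The following is a minimal sketch using only the Python standard library; the status URL is taken from the curl example in this section, while fetch_status, wait_for_screenshot, the 2-second interval, and the 30-attempt limit are assumptions chosen for illustration, not values the service prescribes.

```python
import json
import time
import urllib.request

# Taken from the curl example above; adjust to your deployment.
STATUS_URL = "http://localhost:4003/scrape/status"


def fetch_status(task_id, status_url=STATUS_URL):
    """POST the taskId as JSON and return the decoded response dict."""
    request = urllib.request.Request(
        status_url,
        data=json.dumps({"taskId": task_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_screenshot(task_id, fetch=fetch_status, interval=2.0,
                        max_attempts=30, sleep=time.sleep):
    """Poll until the task completes and return the file URL.

    Raises RuntimeError on an "error" status and TimeoutError if the task
    is still pending after max_attempts polls (both limits are assumptions).
    """
    for attempt in range(max_attempts):
        result = fetch(task_id)
        status = result.get("status")
        if status == "completed":
            return result["url"]
        if status == "error":
            raise RuntimeError(result.get("error", "unknown error"))
        # Any other status is treated as pending; wait before the next poll.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} polls")
```

The fetch and sleep arguments are injectable so the loop can be exercised without a running server, for example by passing a function that returns canned responses.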

After scraping a target URL, the service saves the resulting file at a new remote URL such as https://example.r2.cloudflarestorage.com/rapidapi-screenshot/xxxxx.xxxxx?XX=xxx&xxx=xxx. This URL expires after 24 hours.
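
Because the file link is short-lived, a client may want to track whether a stored link is still worth using. The sketch below assumes the 24-hour window starts when the client first sees the "completed" status; the document does not state the exact start time, so link_expires_at, is_link_usable, and the 5-minute safety margin are illustrative assumptions only.

```python
from datetime import datetime, timedelta, timezone

# Lifetime stated in the documentation above.
LINK_LIFETIME = timedelta(hours=24)


def link_expires_at(completed_at):
    """Estimate the expiry time, assuming the clock starts at completed_at."""
    return completed_at + LINK_LIFETIME


def is_link_usable(completed_at, now=None, safety_margin=timedelta(minutes=5)):
    """Return True if the link is expected to remain valid past the margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < link_expires_at(completed_at) - safety_margin
```

If the link has expired, the safest option under these assumptions is to submit a new screenshot task rather than retry the old URL.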