Before using the Scraping Browser, a few configuration steps are required. This article guides you through setting up your credentials, configuring the Scraping Browser, running the sample scripts, and working with real-time browser sessions in the Page Operations Console. Follow the instructions below to make efficient use of the Scraping Browser for web scraping.
To start using the Scraping Browser, first obtain your credentials: the username and password you will pass to your web automation tool. The examples below assume you have already installed your preferred web automation tool; if not, install it before continuing.
Sample Code
We have provided several scraping examples to help you get started with the Scraping Browser more quickly. Simply replace the credentials and target URL with your own, then customize the scripts to fit your business scenario.
To run the scripts in your local environment, refer to the following examples. Make sure the required dependencies are installed locally, configure your credentials, and then execute the scripts to obtain the desired data.
If the webpage you are accessing presents CAPTCHAs or other verification challenges, don't worry: we handle them for you seamlessly.
Playwright (Python)

import asyncio
from playwright.async_api import async_playwright

# Enter your credentials - the account name and password
AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@upg-scbr.abcproxy.com'

async def run(pw):
    print('Connecting to Scraping Browser...')
    browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
    try:
        print('Connected! Navigating to target...')
        page = await browser.new_page()
        await page.goto('https://example.com', timeout=2 * 60 * 1000)

        # Screenshot
        print('Taking screenshot of the page...')
        await page.screenshot(path='./remote_screenshot_page.png')

        # HTML content
        print('Scraping page content...')
        html = await page.content()
        print(html)
    finally:
        # Always close the browser once the script has finished
        await browser.close()

async def main():
    async with async_playwright() as playwright:
        await run(playwright)

if __name__ == '__main__':
    asyncio.run(main())
Selenium (Python)

from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.by import By

# Enter your credentials - the account name and password
AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
REMOTE_WEBDRIVER = f'https://{AUTH}@hs-scbr.abcproxy.com'

def main():
    print('Connecting to Scraping Browser...')
    sbr_connection = ChromiumRemoteConnection(REMOTE_WEBDRIVER, 'goog', 'chrome')
    with Remote(sbr_connection, options=ChromeOptions()) as driver:
        # Navigate to the target URL
        print('Connected! Navigating to target...')
        driver.get('https://example.com')

        # Screenshot
        print('Saving screenshot to remote_page.png...')
        driver.get_screenshot_as_file('./remote_page.png')

        # HTML content
        print('Getting page content...')
        html = driver.page_source
        print(html)

if __name__ == '__main__':
    main()
Puppeteer (Node.js)

const puppeteer = require('puppeteer-core');

// Enter your credentials - the account name and password
const AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD';
const SBR_WS_ENDPOINT = `wss://${AUTH}@upg-scbr.abcproxy.com`;

(async () => {
    console.log('Connecting to Scraping Browser...');
    const browser = await puppeteer.connect({
        browserWSEndpoint: SBR_WS_ENDPOINT,
        defaultViewport: { width: 1920, height: 1080 }
    });
    try {
        console.log('Connected! Navigating to target URL...');
        const page = await browser.newPage();
        await page.goto('https://example.com', { timeout: 2 * 60 * 1000 });

        // 1. Screenshot
        console.log('Saving screenshot to remote_screenshot.png...');
        await page.screenshot({ path: 'remote_screenshot.png' });
        console.log('Screenshot saved');

        // 2. Get content
        console.log('Getting page content...');
        const html = await page.content();
        console.log('Source HTML:', html);
    } finally {
        // Always close the browser after the script has finished
        await browser.close();
    }
})();
Playwright (Node.js)

const pw = require('playwright');

// Enter your credentials - the account name and password
const AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD';
const SBR_CDP = `wss://${AUTH}@upg-scbr.abcproxy.com`;

async function main() {
    console.log('Connecting to Scraping Browser...');
    const browser = await pw.chromium.connectOverCDP(SBR_CDP);
    try {
        console.log('Connected! Navigating to target...');
        const page = await browser.newPage();
        // Target URL
        await page.goto('https://www.windows.com', { timeout: 2 * 60 * 1000 });

        // Screenshot
        console.log('Taking screenshot of the page...');
        await page.screenshot({ path: './remote_screenshot_page.png' });

        // HTML content
        console.log('Scraping page content...');
        const html = await page.content();
        console.log(html);
    } finally {
        // Always close the browser after the script has finished
        await browser.close();
    }
}

if (require.main === module) {
    main().catch(err => {
        console.error(err.stack || err);
        process.exit(1);
    });
}
Scraping Browser Initial Navigation and Workflow Management
The Scraping Browser session architecture allows each session to perform only one initial navigation: the first load of the target website from which data will subsequently be extracted. After this initial load, you can navigate the site freely within the same session through clicks, scrolls, and other interactive actions. However, to start a new scraping job from the initial navigation phase, whether targeting the same site or a different one, you must create a new session.
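As an illustration, here is a minimal sketch of this workflow. It assumes the same Playwright endpoint and placeholder credentials as the samples above; the target URLs and the commented-out selector are purely hypothetical.

import asyncio
from playwright.async_api import async_playwright

AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@upg-scbr.abcproxy.com'

async def scrape_job(pw, url):
    # Each connection is one session, and it gets exactly one initial navigation.
    browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
    try:
        page = await browser.new_page()
        await page.goto(url, timeout=2 * 60 * 1000)  # the single initial navigation

        # Further movement inside the same session happens through interactions,
        # e.g. clicking a (hypothetical) "next page" link instead of calling goto() again:
        # await page.click('a.next-page')
        # await page.wait_for_load_state()

        return await page.content()
    finally:
        await browser.close()

async def main():
    async with async_playwright() as pw:
        # A new scraping job - same site or a different one - needs a new session.
        html_a = await scrape_job(pw, 'https://example.com')
        html_b = await scrape_job(pw, 'https://example.com/other')
        print(len(html_a), len(html_b))

if __name__ == '__main__':
    asyncio.run(main())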
Session Time Limits
1. Regardless of how you operate the browser, session timeout limits apply. If a browser session is not explicitly closed in your script, the system automatically terminates it after a maximum of 60 minutes.
2. When using the Scraping Browser via the web console, the system enforces a strict one-active-session-per-account rule. To ensure optimal performance and experience, always explicitly close the browser session in your script, as shown in the sketch below.
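For reference, a minimal sketch of that close-on-exit pattern, again assuming the placeholder Playwright endpoint and credentials used in the samples above:

import asyncio
from playwright.async_api import async_playwright

AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@upg-scbr.abcproxy.com'

async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
        try:
            page = await browser.new_page()
            await page.goto('https://example.com', timeout=2 * 60 * 1000)
            print(await page.title())
        finally:
            # Explicitly close so the session is released immediately rather than
            # lingering until the 60-minute automatic timeout, which also frees the
            # single active session allowed per account on the web console.
            await browser.close()

if __name__ == '__main__':
    asyncio.run(main())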