Before using the Scraping Browser, a few configuration steps are required. This article guides you through setting up your credentials, configuring the Scraping Browser, running the sample scripts, and working with real-time browser sessions in the Page Operations Console. Follow the instructions below to make efficient use of the Scraping Browser for web scraping.
To start using the Scraping Browser, first obtain your credentials: the username and password you will pass to your web automation tool. The examples below assume you have already installed your preferred web automation tool; if not, install it before continuing.
Sample Code
We have provided several scraping examples to help you get started with the Scraping Browser more quickly. Simply replace the credentials and target URL with your own, then customize the scripts to fit your business scenario.
To run the scripts in your local environment, refer to the following examples. Make sure the required dependencies are installed locally, configure your credentials, and then execute the scripts to obtain the desired data.
If the webpage you are accessing presents CAPTCHAs or other verification challenges, don't worry: we handle them for you seamlessly.
Playwright (Python)

import asyncio
from playwright.async_api import async_playwright

# Enter your credentials - the account name and password
AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@upg-scbr.abcproxy.com'

async def run(pw):
    print('Connecting to Scraping Browser...')
    browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
    try:
        print('Connected! Navigating to target...')
        page = await browser.new_page()
        await page.goto('https://example.com', timeout=2 * 60 * 1000)

        # Screenshot
        print('Taking screenshot of the page...')
        await page.screenshot(path='./remote_screenshot_page.png')

        # HTML content
        print('Scraping page content...')
        html = await page.content()
        print(html)
    finally:
        # Always close the browser once the script has finished
        await browser.close()

async def main():
    async with async_playwright() as playwright:
        await run(playwright)

if __name__ == '__main__':
    asyncio.run(main())
Selenium (Python)

from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.by import By

# Enter your credentials - the account name and password
AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
REMOTE_WEBDRIVER = f'https://{AUTH}@hs-scbr.abcproxy.com'

def main():
    print('Connecting to Scraping Browser...')
    sbr_connection = ChromiumRemoteConnection(REMOTE_WEBDRIVER, 'goog', 'chrome')
    with Remote(sbr_connection, options=ChromeOptions()) as driver:
        # Navigate to the target URL
        print('Connected! Navigating to target...')
        driver.get('https://example.com')

        # Screenshot
        print('Saving screenshot to remote_page.png...')
        driver.get_screenshot_as_file('./remote_page.png')

        # HTML content
        print('Getting page content...')
        html = driver.page_source
        print(html)

if __name__ == '__main__':
    main()
Puppeteer (Node.js)

const puppeteer = require('puppeteer-core');

// Enter your credentials - the account name and password
const AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD';
const SBR_WS_ENDPOINT = `wss://${AUTH}@upg-scbr.abcproxy.com`;

(async () => {
    console.log('Connecting to Scraping Browser...');
    const browser = await puppeteer.connect({
        browserWSEndpoint: SBR_WS_ENDPOINT,
        defaultViewport: { width: 1920, height: 1080 }
    });
    try {
        console.log('Connected! Navigating to target URL...');
        const page = await browser.newPage();
        await page.goto('https://example.com', { timeout: 2 * 60 * 1000 });

        // 1. Screenshot
        console.log('Saving screenshot to remote_screenshot.png...');
        await page.screenshot({ path: 'remote_screenshot.png' });
        console.log('Screenshot saved');

        // 2. Get content
        console.log('Getting page content...');
        const html = await page.content();
        console.log('Source HTML:', html);
    } finally {
        // Always close the browser after the script has finished
        await browser.close();
    }
})();
Playwright (Node.js)

const pw = require('playwright');

// Enter your credentials - the account name and password
const AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD';
const SBR_CDP = `wss://${AUTH}@upg-scbr.abcproxy.com`;

async function main() {
    console.log('Connecting to Scraping Browser...');
    const browser = await pw.chromium.connectOverCDP(SBR_CDP);
    try {
        console.log('Connected! Navigating to target...');
        const page = await browser.newPage();
        // Target URL
        await page.goto('https://www.windows.com', { timeout: 2 * 60 * 1000 });

        // Screenshot
        console.log('Taking screenshot of the page...');
        await page.screenshot({ path: './remote_screenshot_page.png' });

        // HTML content
        console.log('Scraping page content...');
        const html = await page.content();
        console.log(html);
    } finally {
        // Always close the browser after the script has finished
        await browser.close();
    }
}

if (require.main === module) {
    main().catch(err => {
        console.error(err.stack || err);
        process.exit(1);
    });
}
Scraping Browser Initial Navigation and Workflow Management
The Scraping Browser session architecture allows each session to perform only one initial navigation: the first load of the target website from which data will subsequently be extracted. After this initial load, you can navigate the site freely within the same session through clicks, scrolls, and other interactive actions. However, to start a new scraping job from the initial navigation phase, whether targeting the same site or a different one, you must create a new session.
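As an illustration, here is a minimal sketch of this workflow. It assumes the same Playwright endpoint and placeholder credentials as the samples above; the target URLs and the commented-out selector are purely hypothetical.

import asyncio
from playwright.async_api import async_playwright

AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@upg-scbr.abcproxy.com'

async def scrape_job(pw, url):
    # Each connection is one session, and it gets exactly one initial navigation.
    browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
    try:
        page = await browser.new_page()
        await page.goto(url, timeout=2 * 60 * 1000)  # the single initial navigation

        # Further movement inside the same session happens through interactions,
        # e.g. clicking a (hypothetical) "next page" link instead of calling goto() again:
        # await page.click('a.next-page')
        # await page.wait_for_load_state()

        return await page.content()
    finally:
        await browser.close()

async def main():
    async with async_playwright() as pw:
        # A new scraping job - same site or a different one - needs a new session.
        html_a = await scrape_job(pw, 'https://example.com')
        html_b = await scrape_job(pw, 'https://example.com/other')
        print(len(html_a), len(html_b))

if __name__ == '__main__':
    asyncio.run(main())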
Session Time Limits
1. Regardless of how you operate the browser, session timeout limits apply. If a browser session is not explicitly closed in your script, the system automatically terminates it after a maximum of 60 minutes.
2. When using the Scraping Browser via the web console, the system enforces a strict one-active-session-per-account rule. To ensure optimal performance and experience, always explicitly close the browser session in your script, as shown in the sketch below.
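For reference, a minimal sketch of that close-on-exit pattern, again assuming the placeholder Playwright endpoint and credentials used in the samples above:

import asyncio
from playwright.async_api import async_playwright

AUTH = 'PROXY-FULL-ACCOUNT:PASSWORD'
SBR_WS_SERVER = f'wss://{AUTH}@upg-scbr.abcproxy.com'

async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.connect_over_cdp(SBR_WS_SERVER)
        try:
            page = await browser.new_page()
            await page.goto('https://example.com', timeout=2 * 60 * 1000)
            print(await page.title())
        finally:
            # Explicitly close so the session is released immediately rather than
            # lingering until the 60-minute automatic timeout, which also frees the
            # single active session allowed per account on the web console.
            await browser.close()

if __name__ == '__main__':
    asyncio.run(main())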