How to Scrape Google Careers: Using Code and No Code Approaches


This article outlines a few methods to scrape Google Careers listings, letting you export job listing data to Excel or other formats for easier access and use.

There are two methods to scrape Google Careers:

  1. Scraping Google Careers in Python or JavaScript
  2. Using the Google Careers Scraper from ScrapeHero Cloud, a no-code tool

If you don't want to code, ScrapeHero Cloud is just right for you!

Skip the hassle of installing software, programming, and maintaining code. Download this data using ScrapeHero Cloud within seconds.

Get Started for Free
Deploy to ScrapeHero Cloud

Building a Google Careers Scraper in Python/JavaScript

In this section, we will guide you through scraping Google Careers using either Python or JavaScript. We will use the browser automation framework Playwright to emulate browser behavior in our code.

One of the key advantages of this approach is its ability to bypass common blocks often put in place to prevent scraping. However, familiarity with the Playwright API is necessary to use it effectively.

You could also use Python Requests, LXML, or BeautifulSoup to build a Google Careers scraper without a browser or a browser automation library. However, bypassing the anti-scraping mechanisms in place can be challenging and is beyond the scope of this article.
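For reference, a scraper built that way generally follows the pattern sketched below. This is only a minimal sketch of the Requests and BeautifulSoup workflow: the URL parameters and the CSS selector are placeholders rather than the actual Google Careers markup, and a plain request like this may be blocked or may return JavaScript-rendered content without the job data.

import requests
from bs4 import BeautifulSoup

# NOTE: placeholder query parameters and selector; the real Google Careers
# markup differs, and the page may require JavaScript rendering.
response = requests.get(
    "https://www.google.com/about/careers/applications/jobs/results",
    params={"q": "Software Engineer", "location": "New York"},
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)
soup = BeautifulSoup(response.text, "html.parser")
for job in soup.select("li.job-listing"):  # hypothetical selector
    print(job.get_text(" ", strip=True))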

Here are the steps to scrape Google Careers listing data using Playwright:

Step 1: Choose Python or JavaScript as your programming language.
Step 2: Install Playwright for your preferred language:

Python

pip install playwright
playwright install

JavaScript

npm install playwright
npx playwright install

Step 3: Write the code. The complete script for scraping Google Careers listings with Playwright is given below in both Python and JavaScript:

Python

import asyncio
import json
from typing import Union
from playwright.async_api import Playwright, async_playwright
search_keyword = "Software Engineer"
search_location = "New York"
pagination_limit = 2
data = []
def save_data():
    """
    Save the globally stored data as JSON.
    """
    with open("google_career_data.json", "w") as outfile:
        json.dump(data, outfile, indent=4)
def clean_data(data: Union[str, list]) -> str:
    """
    Basic string cleaning. If the input is a string, it is cleaned
    and returned. If it is a list, each element is cleaned and the
    elements are joined with a pipe.
    Args:
        data (Union[str, list]): The input can be a string or a list
    Returns:
        str: cleaned string
    """
    if isinstance(data, str):
        return " ".join(data.split()).strip()
    return " | ".join(" ".join(i.split()).strip() for i in data)
async def extract_data(page, job_element) -> None:
    """This function is to extract data from the job listings page
    Args:
        page (playwright page object)
        job_element (Playwright locator object)
    """
    # Initializing necessary xpaths
    xpath_title = "//h2[@class='p1N2lc']"
    xpath_min_qualification = "//h3[text()='Minimum qualifications:']/following-sibling::ul[1]/li"
    xpath_preferred_qualification = "//h3[text()='Preferred qualifications:']/following-sibling::ul[1]/li"
    xpath_about_this_job = "//div[@class='aG5W3']/p"
    xpath_responsibilities = '//div[@class="BDNOWe"]/ul/li'
    xpath_job_url = "../../a"
    # Extracting necessary data
    title = await page.locator(xpath_title).inner_text()
    min_qualification = await page.locator(xpath_min_qualification).all_inner_texts()
    preferred_qualifications = await page.locator(xpath_preferred_qualification).all_inner_texts()
    about_this_job = await page.locator(xpath_about_this_job).all_inner_texts()
    responsibilities = await page.locator(xpath_responsibilities).all_inner_texts()
    job_url = await job_element.locator(xpath_job_url).get_attribute("href")
    # Cleaning
    title = clean_data(title)
    min_qualification = clean_data(min_qualification)
    preferred_qualifications = clean_data(preferred_qualifications)
    about_this_job = clean_data(about_this_job)
    responsibilities = clean_data(responsibilities)
    job_url = clean_data(job_url)
    job_url = f"https://www.google.com/about/careers/applications{job_url}"
    data_to_save = {
        "title": title,
        "min_qualification": min_qualification,
        "preferred_qualifications": preferred_qualifications,
        "about_this_job": about_this_job,
        "responsibilities": responsibilities,
        "job_url": job_url,
    }
    # Appending to a list to save
    data.append(data_to_save)
async def parse_listing_page(page, current_page: int) -> None:
    """This function will go through each jobs listed and click it
    and pass the page object to extract_data function to extract the data.
    This function also handles pagination
    Args:
        page (playwright page object)
        current_page (int): current page number
    """
    xpath_learn_more = "//span[text()='Learn more']/following-sibling::a"
    xpath_jobs = "//li[@class='zE6MFb']//h3"
    xpath_title = "//h2[@class='p1N2lc']"
    xpath_next_page = "//div[@class='bsEDOd']//i[text()='chevron_right']"
    if current_page == 1:
        # Clicking Learn more button (For the first page only)
        learn_more_buttons = page.locator(xpath_learn_more)
        first_learn_more_button = learn_more_buttons.nth(0)
        await first_learn_more_button.click()
    # Locating all listed jobs
    await page.wait_for_selector(xpath_jobs)
    jobs = page.locator(xpath_jobs)
    jobs_count = await jobs.count()
    # Iterating through each job
    for i in range(jobs_count):
        # Clicking each job and waiting for its details to load
        job_element = jobs.nth(i)
        await job_element.click()
        await page.wait_for_selector(xpath_title)
        await extract_data(page, job_element)
    # Pagination
    next_page = page.locator(xpath_next_page)
    if await next_page.count() > 0 and current_page < pagination_limit:
        await next_page.click()
        await page.wait_for_selector('//h3[@class="Ki3IFe"]')
        await page.wait_for_timeout(2000)
        current_page += 1
        await parse_listing_page(page, current_page=current_page)
async def run(playwright: Playwright) -> None:
    """This is the main function to initialize the playwright browser
    and create a page. Then do the initial navigations.
    Args:
        playwright (Playwright)
    """
    # Initializing browser and opening a new page
    browser = await playwright.chromium.launch(headless=False)
    context = await browser.new_context()
    page = await context.new_page()
    # Navigating to homepage and clicking the "jobs" icon
    await page.goto("https://careers.google.com/", wait_until="domcontentloaded")
    await page.get_by_role("link", name="Jobs results page").click()
    # Typing the job name and clicking enter
    job_search_box = page.locator("//input[@id='c3']")
    await job_search_box.click()
    await job_search_box.type(search_keyword)
    await job_search_box.press("Enter")
    # Clicking the location searchbox icon
    await page.locator("//h3[text()='Locations']").click()
    location_filter_box = page.locator('//input[@aria-label="Which location(s) do you prefer working out of?"]')
    await location_filter_box.click()
    await location_filter_box.type(search_location, delay=200)
    await location_filter_box.press("Enter")
    await page.wait_for_load_state()
    await page.wait_for_timeout(2000)
    await parse_listing_page(page, current_page=1)
    save_data()
    await context.close()
    await browser.close()
async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)
asyncio.run(main())

JavaScript


const { chromium } = require('playwright');
const searchKeyword = "Software Engineer";
const searchLocation = "New York";
const paginationLimit = 2;
let data = [];
/**
 * Saves the globally stored data as JSON.
 */
function saveData() {
    const fs = require('fs');
    fs.writeFileSync("google_career_data.json", JSON.stringify(data, null, 4));
}
/**
 * Basic string cleaning function. If the input is a string,
 * it will clean the data and return the cleaned data. If it is a list,
 * it will iterate through each element, clean it, and join them with a pipe.
 * @param {string | string[]} data - The input can be a string or a list
 * @returns {string} - Cleaned string
 */
function cleanData(data) {
    if (typeof data === "string") {
        return data.replace(/\s+/g, " ").trim();
    }
    return data.map((item) => item.replace(/\s+/g, " ").trim()).join(" | ");
}
/**
 * Extracts data from the job details page.
 * @param {Page} page - Playwright page object
 * @param {Locator} jobElement - Playwright locator object for the job listing
 */
async function extractData(page, jobElement) {
    // Initializing necessary xpaths
    const xpathTitle = "//h2[@class='p1N2lc']";
    const xpathMinQualification = "//h3[text()='Minimum qualifications:']/following-sibling::ul[1]/li";
    const xpathPreferredQualification = "//h3[text()='Preferred qualifications:']/following-sibling::ul[1]/li";
    const xpathAboutThisJob = "//div[@class='aG5W3']/p";
    const xpathResponsibilities = '//div[@class="BDNOWe"]/ul/li';
    const xpathJobUrl = "../../a";
    // Extracting necessary data
    const title = await page.locator(xpathTitle).innerText();
    const minQualification = await page.locator(xpathMinQualification).allInnerTexts();
    const preferredQualifications = await page.locator(xpathPreferredQualification).allInnerTexts();
    const aboutThisJob = await page.locator(xpathAboutThisJob).allInnerTexts();
    const responsibilities = await page.locator(xpathResponsibilities).allInnerTexts();
    const jobUrl = await jobElement.locator(xpathJobUrl).getAttribute("href");
    // Cleaning data
    const cleanedTitle = cleanData(title);
    const cleanedMinQualification = cleanData(minQualification);
    const cleanedPreferredQualifications = cleanData(preferredQualifications);
    const cleanedAboutThisJob = cleanData(aboutThisJob);
    const cleanedResponsibilities = cleanData(responsibilities);
    const cleanedJobUrl = `https://www.google.com/about/careers/applications${cleanData(jobUrl)}`;
    const dataToSave = {
        title: cleanedTitle,
        minQualification: cleanedMinQualification,
        preferredQualifications: cleanedPreferredQualifications,
        aboutThisJob: cleanedAboutThisJob,
        responsibilities: cleanedResponsibilities,
        jobUrl: cleanedJobUrl,
    };
    // Appending to a list to save
    data.push(dataToSave);
}
/**
 * Parses each job listing page, clicks on each job, and extracts data from the details page.
 * Also handles pagination.
 * @param {Page} page - playwright page object
 * @param {number} currentPage - current page number
 */
async function parseListingPage(page, currentPage) {
    // Initializing necessary xpaths
    const xpathLearnMore = "//span[text()='Learn more']/following-sibling::a";
    const xpathJobs = "//li[@class='zE6MFb']//h3";
    const xpathTitle = "//h2[@class='p1N2lc']";
    const xpathNextPage = "//div[@class='bsEDOd']//i[text()='chevron_right']";
    if (currentPage === 1) {
        // Clicking Learn more button (For the first page only)
        const learnMoreButtons = page.locator(xpathLearnMore);
        const firstLearnMoreButton = learnMoreButtons.nth(0);
        await firstLearnMoreButton.click();
    }
    // Locating all listed jobs
    await page.waitForSelector(xpathJobs);
    const jobs = page.locator(xpathJobs);
    const jobsCount = await jobs.count();
    // Iterating through each job
    for (let i = 0; i < jobsCount; i++) {
        // Clicking each job and waiting for its details to load
        const jobElement = jobs.nth(i);
        await jobElement.click();
        await page.waitForSelector(xpathTitle);
        await extractData(page, jobElement);
    }
    // Pagination
    const nextPage = page.locator(xpathNextPage);
    if (await nextPage.count() > 0 && currentPage < paginationLimit) {
        await nextPage.click();
        await page.waitForSelector('//h3[@class="Ki3IFe"]');
        await page.waitForTimeout(2000);
        currentPage += 1;
        await parseListingPage(page, currentPage);
    }
}
/**
 * Main function to initialize the playwright browser,
 * create a page, and do the initial navigations.
 */
async function run() {
    const browser = await chromium.launch({headless: false});
    const context = await browser.newContext();
    const page = await context.newPage();
    // Navigating to homepage and clicking the "jobs" icon
    await page.goto("https://careers.google.com/", { waitUntil: "domcontentloaded" });
    await page.getByRole("link", { name: "Jobs results page" }).click();
    // Typing the job name and clicking enter
    const jobSearchBox = page.locator("//input[@id='c3']");
    await jobSearchBox.click();
    await jobSearchBox.type(searchKeyword);
    await jobSearchBox.press("Enter");
    // Clicking the location search box icon
    await page.locator("//h3[text()='Locations']").click();
    const locationFilterBox = page.locator('//input[@aria-label="Which location(s) do you prefer working out of?"]');
    await locationFilterBox.click();
    await locationFilterBox.type(searchLocation, { delay: 200 });
    await locationFilterBox.press("Enter");
    await page.waitForLoadState();
    await page.waitForTimeout(2000);
    await parseListingPage(page, 1);
    saveData();
    await context.close();
    await browser.close();
}
/**
 * Runs the script.
 */
run();


This code shows how to scrape Google Careers using the Playwright library in Python and JavaScript.
The corresponding scripts have two main functions:

  • run function: This function takes a Playwright instance as input and performs the scraping process. It launches a Chromium browser instance, navigates to Google Careers, fills in a search query based on role and location, applies the location filter, and waits for the results to be displayed on the page. The parse_listing_page function then clicks through each listing, and the save_data function stores the extracted data in a google_career_data.json file.
  • extract_data function: This function takes a Playwright page object and the clicked job element as input and collects the details of each job listing, including the role's title, minimum and preferred qualifications, description, responsibilities, and URL.

Finally, the main function uses the async_playwright context manager to execute the run function. Once the script finishes, it creates a JSON file containing the scraped Google Careers listings.

Step 4: Run your code and collect the scraped data from Google Careers.
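If you would rather have the results as a spreadsheet than JSON, the saved file can be converted afterward in a few lines. This is a minimal sketch using pandas, assuming the google_career_data.json file created by the script above and the openpyxl package installed for .xlsx support.

import pandas as pd

# Load the JSON file produced by the scraper into a flat table
df = pd.read_json("google_career_data.json")
# Write the listings to a spreadsheet (requires the openpyxl package)
df.to_excel("google_career_data.xlsx", index=False)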


View Code on GitHub

Using No-Code Google Careers Scraper by ScrapeHero Cloud

The Google Careers Scraper by ScrapeHero Cloud is a convenient method for scraping listings of job openings at Google from Google Careers. It provides an easy, no-code method for scraping data, making it accessible for individuals with limited technical skills.
This section will guide you through the steps to set up and use the Google Careers scraper.

  1. Sign up or log in to your ScrapeHero Cloud account.
  2. Go to the Google Careers Scraper by ScrapeHero Cloud.
  3. Add the scraper to your account. (Don’t forget to verify your email if you haven’t already.)
  4. Add the Google Careers listing page URL to start the scraper. If it's just a single query, enter it in the field provided and choose the number of pages to scrape.
    You can get the careers listing URL from the Google Careers search results page:

      1. Enter the role and location of your choice on the Google Careers search page.
      2. Copy the URL of the resulting job listings page.

  5. To scrape results for multiple queries, switch to Advanced Mode, and in the Input tab, add the listing page URL to the SearchQuery field and save the settings.
  6. To start the scraper, click on the Gather Data button.
  7. The scraper will start fetching data for your queries, and you can track its progress under the Jobs tab.
  8. Once finished, you can view or download the data from the same tab.
  9. You can also export the job listings data into an Excel spreadsheet from here. Click on Download Data, select "Excel," and open the downloaded file using Microsoft Excel.

Use Cases of Google Careers Listings Data

For individuals actively seeking job opportunities at Google, scraping Google Careers can improve their chances of landing a position. Here's how:

Real-Time Job Alerts

Using web scraping techniques on Google Careers, job seekers can set up real-time alerts for specific job roles, locations, or keywords. This ensures that they receive immediate notifications whenever relevant job openings are posted. Staying updated on the latest opportunities gives candidates a competitive advantage and increases their chances of applying early.
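As a rough illustration, an alerting setup can be as simple as comparing each new scrape against the previous one and reporting unseen job URLs. This sketch assumes the google_career_data.json output from the script above; the seen_jobs.json file name is illustrative, and the actual notification (email, Slack, etc.) is left as a placeholder.

import json
from pathlib import Path

SEEN_FILE = Path("seen_jobs.json")

def find_new_jobs(latest_file="google_career_data.json"):
    """Return listings whose URLs were not seen in earlier runs."""
    latest = json.loads(Path(latest_file).read_text())
    seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()
    new_jobs = [job for job in latest if job["job_url"] not in seen]
    # Remember every URL seen so far for the next run
    SEEN_FILE.write_text(json.dumps(sorted(seen | {j["job_url"] for j in latest})))
    return new_jobs

for job in find_new_jobs():
    # Placeholder notification: swap in email, Slack, etc.
    print(f"New listing: {job['title']} -> {job['job_url']}")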

Analyzing Job Requirements

Career data scraping enables candidates to analyze the requirements and qualifications Google seeks for various positions. By studying the skills, experience, and educational background desired by the company, candidates can tailor their resumes and cover letters accordingly, increasing the likelihood of catching the recruiter’s attention.
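For example, a quick way to see which skills come up most often is to count keyword occurrences across the scraped qualification fields. This is a minimal sketch over the google_career_data.json output from the script above; the keyword list is purely illustrative.

import json
from collections import Counter

with open("google_career_data.json") as f:
    jobs = json.load(f)

# Illustrative keyword list: adjust to the skills you care about
keywords = ["python", "java", "c++", "distributed systems", "machine learning"]
counts = Counter()
for job in jobs:
    text = f"{job['min_qualification']} {job['preferred_qualifications']}".lower()
    counts.update(k for k in keywords if k in text)

for keyword, count in counts.most_common():
    print(f"{keyword}: appears in {count} of {len(jobs)} listings")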

Company Insights

Scraping Google Careers listings provides valuable insights into the company’s hiring patterns and trends. Understanding the frequency and types of job openings can help candidates identify recurring opportunities and areas where Google is actively recruiting.

Web scraping can help candidates track Google’s hiring trends. Observing when the company tends to increase hiring or focuses on specific roles can offer a broader perspective on the company’s current priorities and potential upcoming opportunities.
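One simple way to build up that trend data is to record the number of listings returned by each scrape run along with a timestamp; over repeated runs this yields a rough time series of hiring activity. The file names below are illustrative.

import csv
import json
from datetime import date

with open("google_career_data.json") as f:
    jobs = json.load(f)

# Append today's listing count to a running log; repeated runs
# build a simple time series of hiring activity
with open("hiring_trend.csv", "a", newline="") as f:
    csv.writer(f).writerow([date.today().isoformat(), len(jobs)])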

Preparing for Interviews

Reviewing past job descriptions and requirements can assist candidates in preparing for interviews by anticipating potential questions and understanding the company’s expectations.

Frequently Asked Questions

What is Google Careers scraping?

Google Careers scraping refers to extracting job listing data from the pool of openings at Google. This process allows candidates to combine web scraping insights with other job search strategies to maximize their chances of securing a position at Google.

What is the subscription fee for the Google Careers Scraper by ScrapeHero?

To know more about the pricing, visit the pricing page.

Is it legal to scrape Google Careers?

Legality depends on the legal jurisdiction, i.e., laws specific to the country and the locality. Generally, gathering or scraping publicly available information is not illegal.

Please refer to our Legal Page to learn more about the legality of web scraping.
