How to Scrape Google News SERP the Easy Way in 2023
What will be scraped:
Prerequisites
Install libraries:
pip install requests bs4 google-search-results
google-search-results is the SerpApi API package.
Basic knowledge of scraping with CSS selectors
CSS selectors declare which part of the markup a style applies to, which also makes them useful for extracting data from matching tags and attributes.
If you haven’t scraped with CSS selectors, there’s a dedicated blog post of mine about how to use CSS selectors when web scraping that covers what they are, their pros and cons, and why they matter from a web-scraping perspective.
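As a quick illustration of the idea, here's a minimal, self-contained sketch of extracting data with CSS selectors via BeautifulSoup (the HTML snippet and class names are made up for this example, not taken from Google's actual markup):

```python
from bs4 import BeautifulSoup  # pip install bs4

# Toy HTML resembling a single news result
html = """
<div class="news">
  <a class="title" href="https://example.com/article">Example headline</a>
  <span class="source">Example Source</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".news .title" selects the element with class "title"
# nested inside the element with class "news"
title = soup.select_one(".news .title").text
link = soup.select_one(".news .title")["href"]

print(title)  # Example headline
print(link)   # https://example.com/article
```

The same pattern (select_one() for a single match, select() for all matches) is what a DIY Google News parser would use, just with selectors matching Google's markup instead.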
Separate virtual environment
In short, it creates an independent set of installed libraries, including different Python versions, that can coexist with each other in the same system, preventing library or Python version conflicts.
If you haven’t worked with a virtual environment before, have a look at my dedicated Python virtual environments tutorial using Virtualenv and Poetry blog post to get a bit more familiar.
📌Note: this is not a strict requirement for this blog post.
Reduce the chance of being blocked
There’s a chance that a request might be blocked. Have a look at how to reduce the chance of being blocked while web scraping; there are eleven methods to bypass blocks from most websites.
Make sure to pass a User-Agent header, because Google might block your requests eventually, and you'll receive different HTML and thus empty output.
The User-Agent header identifies the browser, its version number, and its host operating system. It represents the visitor (browser) in a web context and lets servers and network peers identify whether the request comes from a bot. By sending a real browser's User-Agent, we're faking a "real" user visit.
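As a sketch, passing the header with requests might look like this (the User-Agent string below is just an example value; substitute a recent one from your own browser, and note the actual request is left commented out since it hits a live endpoint):

```python
import requests  # pip install requests

# Example desktop Chrome User-Agent string; replace with an up-to-date value
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}

# tbm=nws restricts the search to news results
params = {"q": "gta san andreas", "tbm": "nws", "gl": "us"}

# Without the User-Agent header, Google may serve different HTML or block the request:
# html = requests.get("https://www.google.com/search",
#                     params=params, headers=headers, timeout=30).text
```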
Using Google News Result API
The main difference between the API and a DIY solution is that the API is a quicker approach if you don’t want to create a parser from scratch, maintain it over time, or figure out how to scale the number of requests without being blocked.
Basic Hello World example:
from serpapi import GoogleSearch
import json

params = {
    "api_key": "...",        # https://serpapi.com/manage-api-key
    "engine": "google",      # serpapi parsing engine
    "q": "gta san andreas",  # search query
    "gl": "us",              # country from where search comes from
    "tbm": "nws"             # news results
    # other parameters such as language `hl` and number of news results `num`, etc.
}

search = GoogleSearch(params)  # where data extraction happens on the backend
results = search.get_dict()    # JSON -> Python dictionary

for result in results["news_results"]:
    print(json.dumps(result, indent=2))
Outputs:
{
  "position": 1,
  "link": "https://www.sportskeeda.com/gta/5-strange-gta-san-andreas-glitches",
  "title": "5 strange GTA San Andreas glitches",
  "source": "Sportskeeda",
  "date": "9 hours ago",
  "snippet": "GTA San Andreas has a wide assortment of interesting and strange glitches.",
  "thumbnail": "https://serpapi.com/searches/60e71e1f8b7ed2dfbde7629b/images/1394ee64917c752bdbe711e1e56e90b20906b4761045c01a2cefb327f91d40bb.jpeg"
}
Google News Results API with Pagination
If there’s a need to extract all results from all pages, SerpApi has a great Python pagination() method that iterates over all pages under the hood and returns an iterator:
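Building on the Hello World example above, a minimal sketch of pagination() might look like the following (it requires a valid API key, so it makes live requests when run):

```python
from serpapi import GoogleSearch  # pip install google-search-results
import json

params = {
    "api_key": "...",        # https://serpapi.com/manage-api-key
    "engine": "google",      # serpapi parsing engine
    "q": "gta san andreas",  # search query
    "gl": "us",              # country from where search comes from
    "tbm": "nws"             # news results
}

search = GoogleSearch(params)

# pagination() returns an iterator that requests each page of results
# under the hood until there are no more pages left
for page in search.pagination():
    for result in page.get("news_results", []):
        print(json.dumps(result, indent=2))
```

Because each page is fetched lazily as the iterator advances, you don't have to manage the start offset parameter yourself.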
Source of the Article: https://serpapi.com/blog/