Google Search from a Python App

PythonApiGoogle Search-Api

Python Problem Overview


I'm trying to run a google search query from a python app. Is there any python interface out there that would let me do this? If there isn't does anyone know which Google API will enable me to do this. Thanks.

Python Solutions


Solution 1 - Python

There's a simple example here (peculiarly missing some quotes;-). Most of what you'll see on the web is Python interfaces to the old, discontinued SOAP API -- the example I'm pointing to uses the newer and supported AJAX API, that's definitely the one you want!-)

Edit: here's a more complete Python 2.6 example with all the needed quotes &c;-)...:

#!/usr/bin/python
import json
import urllib

def showsome(searchfor):
  query = urllib.urlencode({'q': searchfor})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.urlopen(url)
  search_results = search_response.read()
  results = json.loads(search_results)
  data = results['responseData']
  print 'Total results: %s' % data['cursor']['estimatedResultCount']
  hits = data['results']
  print 'Top %d hits:' % len(hits)
  for h in hits: print ' ', h['url']
  print 'For more results, see %s' % data['cursor']['moreResultsUrl']

showsome('ermanno olmi')

Solution 2 - Python

Here is Alex's answer ported to Python3

#!/usr/bin/python3
import json
import urllib.request, urllib.parse

def showsome(searchfor):
  query = urllib.parse.urlencode({'q': searchfor})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.request.urlopen(url)
  search_results = search_response.read().decode("utf8")
  results = json.loads(search_results)
  data = results['responseData']
  print('Total results: %s' % data['cursor']['estimatedResultCount'])
  hits = data['results']
  print('Top %d hits:' % len(hits))
  for h in hits: print(' ', h['url'])
  print('For more results, see %s' % data['cursor']['moreResultsUrl'])

showsome('ermanno olmi')

Solution 3 - Python

Here's my approach to this: http://breakingcode.wordpress.com/2010/06/29/google-search-python/

A couple code examples:

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from google import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)

    # Get the first 20 hits for "Mariposa botnet" in Google Spain
    from google import search
    for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
        print(url)

Note that this code does NOT use the Google API, and is still working to date (January 2012).

Solution 4 - Python

I am new in python and I was investigating how to do this. None of the provided examples are working properly for me. Some are blocked by google if you make many (few) requests, some are outdated. Parsing the google search html (adding the header in the request) will work until google changes the html structure again. You can use the same logic to search in any other search engine, looking into the html (view-source).

import urllib2

def getgoogleurl(search,siteurl=False):
    if siteurl==False:
        return 'http://www.google.com/search?q='+urllib2.quote(search)
    else:
        return 'http://www.google.com/search?q=site:'+urllib2.quote(siteurl)+'%20'+urllib2.quote(search)

def getgooglelinks(search,siteurl=False):
   #google returns 403 without user agent
   headers = {'User-agent':'Mozilla/11.0'}
   req = urllib2.Request(getgoogleurl(search,siteurl),None,headers)
   site = urllib2.urlopen(req)
   data = site.read()
   site.close()

   #no beatifulsoup because google html is generated with javascript
   start = data.find('<div id="res">')
   end = data.find('<div id="foot">')
   if data[start:end]=='':
      #error, no links to find
      return False
   else:
      links =[]
      data = data[start:end]
      start = 0
      end = 0        
      while start>-1 and end>-1:
          #get only results of the provided site
          if siteurl==False:
            start = data.find('<a href="/url?q=')
          else:
            start = data.find('<a href="/url?q='+str(siteurl))
          data = data[start+len('<a href="/url?q='):]
          end = data.find('&amp;sa=U&amp;ei=')
          if start>-1 and end>-1: 
              link =  urllib2.unquote(data[0:end])
              data = data[end:len(data)]
              if link.find('http')==0:
                  links.append(link)
      return links

Usage:

links = getgooglelinks('python','http://www.stackoverflow.com/')
for link in links:
       print link

(Edit 1: Adding a parameter to narrow the google search to a specific site)

(Edit 2: When I added this answer I was coding a Python script to search subtitles. I recently uploaded it to Github: Subseek)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionResView Question on Stackoverflow
Solution 1 - PythonAlex MartelliView Answer on Stackoverflow
Solution 2 - PythonJohn La RooyView Answer on Stackoverflow
Solution 3 - PythonMario VilasView Answer on Stackoverflow
Solution 4 - PythonFederico Nicolas MottaView Answer on Stackoverflow