Search in html source with GOOGLE?


Search Problem Overview


I have several websites, and I can't remember where I wrote some lines of code. Since my pages are indexed by Google, does Google offer a way to search within the HTML source code/mark-up itself, rather than only within the visible, rendered part of a page?

Thanks
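To make the distinction in the question concrete, here is a minimal Python sketch (standard library only) contrasting a search over the rendered text of a page with a search over its raw HTML source. The helper names are hypothetical, and the text extraction is deliberately rough: it skips `<script>`/`<style>` contents and collapses whitespace, which only approximates what a browser displays.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text a browser would roughly render, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Approximate rendered text: data outside scripts/styles, whitespace collapsed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def found_in_text(html, snippet):
    """What a text-oriented search engine effectively matches against."""
    return snippet in visible_text(html)


def found_in_source(html, snippet):
    """What the question asks for: a match against the raw mark-up."""
    return snippet in html
```

For a page such as `<head><script src="/js/widget.js"></script></head><body><p>Hello <b>world</b></p></body>`, the string `widget.js` is found only in the source, while `Hello world` is found only in the rendered text, because the `<b>` tag sits between the two words in the mark-up.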

Search Solutions


Solution 1 - Search

I've come across the following resources on my travels (some already mentioned above):

HTML Mark-up-focused search engines

I'd also like to throw in the following:

Huge, website crawl data archives

How can we analyze this crawl data?

For an idea of how to begin analyzing this massive data, take a look at Big Data/Map-reduce-type framework(s).
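A minimal, single-machine sketch of the map-reduce shape, assuming the crawl has already been reduced to `(url, raw_html)` pairs (that record format and the function names are hypothetical): the map step emits `(domain, 1)` for each page whose source contains a snippet, a shuffle groups the pairs by key, and the reduce step sums them. A framework such as Spark distributes these same phases across a cluster; in PySpark the equivalent would roughly be an RDD `flatMap` followed by `reduceByKey`.

```python
from collections import defaultdict
from itertools import chain
from urllib.parse import urlsplit


def map_page(record, snippet):
    """Map step: emit (domain, 1) if the page's raw HTML contains the snippet."""
    url, html = record
    if snippet in html:
        yield urlsplit(url).hostname, 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values emitted for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_pages_with_snippet(records, snippet):
    """Count, per domain, how many pages contain the snippet in their source."""
    mapped = chain.from_iterable(map_page(r, snippet) for r in records)
    return reduce_counts(shuffle(mapped))
```

For example, with two `example.com` pages and one `example.org` page that all load `widget.js`, `count_pages_with_snippet(records, "widget.js")` would give `{"example.com": 2, "example.org": 1}`.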

Google lists some ideas on using Apache's Spark project to analyze Common Crawl's dump(s). To understand the file format(s) used by Common Crawl, refer to the following:
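Common Crawl publishes its raw pages as WARC files (alongside derived WAT metadata and WET plain-text files). As a rough illustration of that container format, here is a minimal reader that walks the records: a version line, `Name: value` headers, a blank line, then exactly `Content-Length` bytes of payload. It is a sketch for exploration, not a replacement for a maintained WARC library, and the function names are hypothetical.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream.

    Layout assumed: a version line such as b"WARC/1.0", "Name: value" header
    lines, a blank line, exactly Content-Length bytes of content, then CRLFs
    separating it from the next record.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that terminate the previous record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        body = stream.read(int(headers["content-length"]))
        yield headers, body


def iter_warc_gz(path_or_fileobj):
    """Read a .warc.gz file; gzip.open handles one-gzip-member-per-record files."""
    with gzip.open(path_or_fileobj, "rb") as f:
        yield from iter_warc_records(f)


def parse_warc_bytes(data):
    """Convenience helper for uncompressed WARC data already in memory."""
    return list(iter_warc_records(io.BytesIO(data)))
```

For a downloaded file, `for headers, body in iter_warc_gz("example.warc.gz"):` would iterate its records; filtering on `headers["warc-type"] == "response"` keeps the fetched pages, whose body starts with the HTTP response headers followed by the HTML source.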

The article Accessing-Common-Crawl-Dataset-on-S3 outlines how to access Common Crawl's 250TB+ dump(s) at low cost, without transferring that data outside Amazon's AWS/S3 network. Of course, that assumes you are going to analyze the crawl data with some combination of AWS/EC2/S3 etc.

Finally, Patrick Durusau maintains some interesting Common-Crawl-usage-related blog pages.

Personally, I find this subject intriguing; I suggest we get this crawl data while it's HOT! ;-)

Solution 2 - Search

You can try PublicWWW to search within source/mark-up. It lets you find any HTML, JavaScript, CSS, and plain text in the source code of web pages on 167+ million websites.

With PublicWWW you can:

  • Find related websites through the unique HTML code they share, e.g. widgets & publisher IDs.

  • Identify sites using certain images or badges.

  • Find out who else is using your theme.

  • Identify sites mentioning you.

  • Find your competitor's affiliates.

  • Identify sites where your competitors personally collaborate or interact.

  • Find references showing that a library or platform is in use.

  • Find code examples on the net.

  • Figure out who is using what JS widgets on their sites.

  • ...
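The "widgets & publisher IDs" idea above relies on strings that are unique to a site's mark-up. Here is a minimal sketch (standard library only) for pulling such candidate search terms out of a page you already have: the external `<script src>` URLs and any tokens matching an ID pattern. The Google Analytics `UA-` pattern is only an illustrative assumption, and the extracted strings would then be entered by hand into a source-code search engine such as PublicWWW.

```python
import re
from html.parser import HTMLParser

# Illustrative assumption: classic Google Analytics property IDs (UA-XXXXXX-N).
# Swap in whatever publisher/widget ID format you actually care about.
ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def fingerprint(html):
    """Return candidate search terms: external script URLs and ID-like tokens."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    collector.close()
    ids = sorted(set(ID_PATTERN.findall(html)))
    return {"scripts": collector.sources, "ids": ids}
```

Searching for an exact ID string tends to be far more selective than searching for a widget URL, since many unrelated sites may load the same widget script.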

Of course, it finds every indexed website that uses a given code/mark-up snippet, not just your own.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Entretoize | View Question on Stackoverflow
Solution 1 - Search | Big Rich | View Answer on Stackoverflow
Solution 2 - Search | James Andreenko | View Answer on Stackoverflow