How can I make PhantomJS skip downloading certain kinds of resources?

Phantomjs Problem Overview


PhantomJS has the loadImages setting, but I want more control.

How can I make PhantomJS skip downloading certain kinds of resources, such as CSS?

=====

Good news: this feature has been added.

https://code.google.com/p/phantomjs/issues/detail?id=230

The gist:

page.onResourceRequested = function(requestData, request) {
    if ((/http:\/\/.+?\.css/gi).test(requestData['url']) || requestData.headers['Content-Type'] == 'text/css') {
        console.log('The url of the request is matching. Aborting: ' + requestData['url']);
        request.abort();
    }
};

Phantomjs Solutions


Solution 1 - Phantomjs

UPDATED, Working!

As of PhantomJS 1.9, the earlier answer no longer works. Use this code instead:

var webPage = require('webpage');
var page = webPage.create();

page.onResourceRequested = function(requestData, networkRequest) {
  // Match the script to skip; the dot is escaped so it only matches a literal "."
  var match = requestData.url.match(/wordfamily\.js/);
  if (match !== null) {
    console.log('Request (#' + requestData.id + '): ' + JSON.stringify(requestData));
    networkRequest.cancel(); // or .abort()
  }
};

If you use abort() instead of cancel(), it will trigger onResourceError.

See the PhantomJS docs for details.

Solution 2 - Phantomjs

You can try this: http://github.com/eugenehp/node-crawler

Otherwise, you can still try one of the approaches below with PhantomJS.

The easy way is to load the page, parse it, strip out the unwanted resources, and then load the result into PhantomJS.
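The "strip out" step above could look roughly like the following. This is only a sketch: the helper name stripStylesheets is hypothetical, and regex-based stripping is a rough approximation of real HTML parsing. It is written in plain ES5 so it can be pasted into a PhantomJS script.

```javascript
// Hypothetical helper: remove stylesheet references from an HTML string.
// Regex-based stripping is a rough sketch, not a full HTML parser.
function stripStylesheets(html) {
    return html
        // Remove <link ... rel="stylesheet" ...> tags (any attribute order or quote style).
        .replace(/<link\b[^>]*\brel\s*=\s*["']?stylesheet["']?[^>]*>/gi, '')
        // Remove inline <style>...</style> blocks.
        .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '');
}

// In a PhantomJS script, the cleaned markup could then be loaded with
// page.setContent(stripStylesheets(html), url), assuming `html` was fetched
// beforehand and `url` is the page's original address.
```

Passing the original URL to page.setContent keeps relative links to the remaining resources resolvable.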

Another way is simply to block the hosts in the firewall.

Alternatively, you can use a proxy to block certain URLs and the queries sent to them.

One more option is to load the page and then remove the unwanted resources, but I don't think that's the right approach here.

Solution 3 - Phantomjs

Use page.onResourceRequested, as in the loadurlwithoutcss.js example:

page.onResourceRequested = function(requestData, request) {
    if ((/http:\/\/.+?\.css/gi).test(requestData['url']) || 
            requestData.headers['Content-Type'] == 'text/css') {
        console.log('The url of the request is matching. Aborting: ' + requestData['url']);
        request.abort();
    }
};
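The same hook can be generalised to several resource types. The sketch below is an illustration only and is not part of the loadurlwithoutcss.js example: the helper name shouldSkip and the extension list are hypothetical. It decides from the URL alone, and the wiring into page.onResourceRequested is shown as a comment.

```javascript
// Hypothetical list of extensions to skip; adjust to the resources you want to block.
var SKIPPED_EXTENSIONS = ['css', 'woff', 'ttf'];

// Hypothetical helper: decide from the URL alone whether a request should be skipped.
function shouldSkip(url, extensions) {
    // Drop the query string and fragment so "style.css?v=3" still matches.
    var path = String(url).split(/[?#]/)[0];
    var dot = path.lastIndexOf('.');
    if (dot === -1) {
        return false;
    }
    var ext = path.slice(dot + 1).toLowerCase();
    return extensions.indexOf(ext) !== -1;
}

// Wiring inside a PhantomJS script (assumes `page` came from require('webpage').create()):
// page.onResourceRequested = function(requestData, request) {
//     if (shouldSkip(requestData.url, SKIPPED_EXTENSIONS)) {
//         request.abort();
//     }
// };
```

Matching on the extension rather than a hard-coded http:// prefix also covers https URLs.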

Solution 4 - Phantomjs

As of PhantomJS 1.7 there is no way to do this; it is not supported.

A crude workaround is to route traffic through an HTTP proxy that filters out the requests you don't need.
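A filtering proxy of that kind could be sketched in Node.js as follows. This is an assumption-laden illustration, not something from the original answer: the names isBlocked and createFilteringProxy, the .css rule, and the port are all hypothetical, and it handles plain HTTP only (HTTPS traffic goes through CONNECT tunnels, which this sketch does not filter).

```javascript
var http = require('http');

// Hypothetical rule: refuse any request whose path ends in .css.
function isBlocked(requestUrl) {
    try {
        return /\.css$/i.test(new URL(requestUrl).pathname);
    } catch (e) {
        return false; // not an absolute URL, let it through
    }
}

// Build a forward proxy that answers 403 for blocked URLs and relays everything else.
function createFilteringProxy() {
    return http.createServer(function (clientReq, clientRes) {
        if (isBlocked(clientReq.url)) {
            clientRes.writeHead(403);
            clientRes.end();
            return;
        }
        var target = new URL(clientReq.url);
        var upstream = http.request({
            host: target.hostname,
            port: target.port || 80,
            path: target.pathname + target.search,
            method: clientReq.method,
            headers: clientReq.headers
        }, function (upstreamRes) {
            clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
            upstreamRes.pipe(clientRes);
        });
        upstream.on('error', function () {
            clientRes.writeHead(502);
            clientRes.end();
        });
        clientReq.pipe(upstream);
    });
}

// To start it (the port is an arbitrary choice):
// createFilteringProxy().listen(8080, '127.0.0.1');
// Then point PhantomJS at it:
// phantomjs --proxy=127.0.0.1:8080 yourscript.js
```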

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | atian25 | View Question on Stackoverflow
Solution 1 - Phantomjs | webo80 | View Answer on Stackoverflow
Solution 2 - Phantomjs | Eugene Hauptmann | View Answer on Stackoverflow
Solution 3 - Phantomjs | bain | View Answer on Stackoverflow
Solution 4 - Phantomjs | SHAWN | View Answer on Stackoverflow