Starting with Scrapy 1.1 (released 2016-05-11), the crawler downloads robots.txt before it starts crawling. To change this behavior, set [ROBOTSTXT_OBEY][1] in your `settings.py`:

    ROBOTSTXT_OBEY = False

Here are the [release notes][2].


  [1]: http://doc.scrapy.org/en/1.1/topics/settings.html#robotstxt-obey
  [2]: https://doc.scrapy.org/en/1.1/news.html#id7

First, make sure you change the user agent in the request; otherwise the default user agent will most likely be blocked.
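
In a Scrapy project the user agent can be set project-wide through the `USER_AGENT` setting. A minimal sketch, assuming a hypothetical crawler name and contact URL (both are placeholders, not values from the question):

```python
# settings.py
# Hypothetical identifier and contact URL; replace them with your own.
USER_AGENT = "example-crawler/0.1 (+https://www.example.com/contact)"
```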

Netflix's Terms of Use state:

> You also agree not to circumvent, remove, alter, deactivate, degrade or thwart any of the content protections in the Netflix service; use any robot, spider, scraper or other automated means to access the Netflix service;

Their robots.txt is set up to block web scrapers. If you override the setting in `settings.py` with `ROBOTSTXT_OBEY = False`, you are violating their terms of use, which can result in a lawsuit.
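
Before switching the setting off, it can help to check what a site's robots.txt actually permits. The sketch below uses the standard library's `urllib.robotparser` on a made-up robots.txt; the rules and the `example-crawler` name are assumptions for illustration, not Netflix's real file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a network fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(useragent, url) reports whether the rules allow the request.
public_allowed = parser.can_fetch("example-crawler", "https://www.example.com/public")
private_allowed = parser.can_fetch("example-crawler", "https://www.example.com/private/page")

print(public_allowed, private_allowed)
```

For a live site, `set_url()` followed by `read()` fetches the real file instead of parsing a string.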

I just updated to Android Studio 2.2.1 for Mac and then updated the JDK to version 8. When I tried to start a new project and run it, I got the following error:

    Error:(1, 1) A problem occurred evaluating project ':app'.
    > java.lang.UnsupportedClassVersionError: com/android/build/gradle/AppPlugin : Unsupported major.minor version 52.0

I also tried going back to JDK version 7, since the Google site says that JDK 8 is unstable on Mac, but I still got the same error.








Getting error when trying to run new project in Android Studio 2.2.1

I wonder if there is an open API to access WhatsApp through an internet protocol.
Concretely:

 1. Is there a way to send a message to a list of WhatsApp users from an internet server?
 2. Is there any open concept for authentication?
 3. Or is WhatsApp ultimately a closed system without any open API through an internet protocol?

Does WhatsApp offer an open API?

While crawling a website like https://www.netflix.com, I get `Forbidden by robots.txt: <GET https://www.netflix.com/>`, followed by:

    ERROR: No response downloaded for: https://www.netflix.com/

getting Forbidden by robots.txt: scrapy



Sometimes there is some non-critical asynchronous operation that needs to happen but I don't want to wait for it to complete. In Tornado's coroutine implementation you can "fire & forget" an asynchronous function by simply omitting the `yield` keyword.

I've been trying to figure out how to "fire & forget" with the new `async`/`await` syntax released in Python 3.5. E.g., a simplified code snippet:

<!-- language: lang-python -->

    async def async_foo():
        print("Do some stuff asynchronously here...")

    def bar():
        async_foo()  # fire and forget "async_foo()"

    bar()


What happens, though, is that the body of `async_foo()` never executes, and instead we get a runtime warning:

<!-- language: lang-python -->

    RuntimeWarning: coroutine 'async_foo' was never awaited
      async_foo()  # fire and forget "async_foo()"
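
One way to get this behavior with `async`/`await` is to schedule the coroutine as a task on a running event loop instead of calling it bare. A minimal sketch, assuming Python 3.7+ for `asyncio.run()` and `asyncio.create_task()` (on 3.5/3.6, `asyncio.ensure_future()` plays the same role); the `events` list exists only to make the ordering visible:

```python
import asyncio

events = []

async def async_foo():
    await asyncio.sleep(0.01)  # stand-in for non-critical async work
    events.append("async_foo finished")

async def bar():
    # Schedule async_foo() on the running loop without awaiting it.
    task = asyncio.create_task(async_foo())
    events.append("bar returned")
    return task  # keep a reference so the task is not garbage-collected early

async def main():
    task = await bar()
    # Awaited here only so the demo does not exit before the task completes.
    await task

asyncio.run(main())
print(events)
```

The task only makes progress while the event loop is running, so `main()` awaits it at the end purely to keep the demo alive; a long-running application would not need that step.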



"Fire and forget" python async/await

I have data that is fetched via an HTTP request and sent back by the server in a comma-separated format. I have the following code:

    import urllib2
    from bs4 import BeautifulSoup

    site = 'http://www.example.com'  # urllib2 requires the URL scheme
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    soup = soup.get_text()
    text = str(soup)

The content of text is as follows:

    april,2,5,7
    may,3,5,8
    june,4,7,3
    july,5,6,9

How can I save this data into a CSV file?
I know I can do something along the lines of the following to iterate line by line:

    import StringIO
    s = StringIO.StringIO(text)
    for line in s:

But I'm unsure how to properly write each line to the CSV file.

EDIT: Thanks for the feedback. As suggested, the solution was rather simple and can be seen below.

Solution:

    import StringIO
    s = StringIO.StringIO(text)
    with open('fileName.csv', 'w') as f:
        for line in s:
            f.write(line)
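
On Python 3 the same idea can be expressed with the standard `csv` module, which also handles quoting if a field ever contains a comma. A minimal sketch, assuming `text` holds the response shown above; an in-memory buffer stands in for the output file so the example has no side effects:

```python
import csv
import io

text = "april,2,5,7\nmay,3,5,8\njune,4,7,3\njuly,5,6,9\n"

# Parse the response text into rows of fields.
rows = list(csv.reader(io.StringIO(text)))

# Stand-in for: open("fileName.csv", "w", newline="")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in rows:
    writer.writerow(row)  # one CSV line per row

csv_output = out.getvalue()
```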



How to write to a CSV line by line?

I have a machine learning classification problem with 80% categorical variables. Must I use one-hot encoding if I want to use a classifier? Can I pass the data to a classifier without the encoding?

I am trying to do the following for feature selection:

1. I read the train file:

        num_rows_to_read = 10000
        train_small = pd.read_csv("../../dataset/train.csv", nrows=num_rows_to_read)

2. I change the type of the categorical features to 'category':

        non_categorial_features = ['orig_destination_distance',
                                   'srch_adults_cnt',
                                   'srch_children_cnt',
                                   'srch_rm_cnt',
                                   'cnt']

        for categorical_feature in list(train_small.columns):
            if categorical_feature not in non_categorial_features:
                train_small[categorical_feature] = train_small[categorical_feature].astype('category')

3. I use one hot encoding:

        train_small_with_dummies = pd.get_dummies(train_small, sparse=True)


The problem is that the third step often gets stuck, even though I am using a powerful machine.

Thus, without the one-hot encoding I can't do any feature selection to determine the importance of the features.

What do you recommend?
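
Part of the reason step 3 can stall is that one-hot encoding adds one column per distinct value, so casting nearly every column to `category` can produce a very wide frame when some columns have many distinct values. The plain-Python sketch below illustrates the transformation on toy data; `one_hot`, the `site_name` column, and the rows are made up for illustration, and this is not how pandas implements `get_dummies`:

```python
def one_hot(rows, column):
    """Replace `column` with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row["{}_{}".format(column, category)] = int(row[column] == category)
        encoded.append(new_row)
    return encoded

# Toy data: one categorical column and one numeric column.
rows = [
    {"site_name": "a", "cnt": 3},
    {"site_name": "b", "cnt": 1},
    {"site_name": "a", "cnt": 2},
]

encoded = one_hot(rows, "site_name")
print(encoded[0])
```

Counting the distinct values of each categorical column first gives a rough estimate of how many columns the encoded frame would have.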

How can I one hot encode in Python?

Currently I use the following code:
            
    callbacks = [
        EarlyStopping(monitor='val_loss', patience=2, verbose=0),
        ModelCheckpoint(kfold_weights_path, monitor='val_loss', save_best_only=True, verbose=0),
    ]
    model.fit(X_train.astype('float32'), Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
              shuffle=True, verbose=1, validation_data=(X_valid, Y_valid),
              callbacks=callbacks)

It tells Keras to stop training when the loss hasn't improved for 2 epochs. But I want to stop training once the loss becomes smaller than some constant "THR":

    if val_loss < THR:
        break

I've seen in the documentation that it is possible to write your own callback:
http://keras.io/callbacks/
But I found nothing on how to stop the training process. I need some advice.
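
Keras callbacks can end training by setting `self.model.stop_training = True`. A minimal sketch of a threshold-based callback, assuming the `keras` package is importable; the `StopAtLossThreshold` name and its defaults are hypothetical, and the fallback base class exists only so the sketch runs where Keras is not installed:

```python
try:
    from keras.callbacks import Callback
except ImportError:
    # Stand-in exposing only what this sketch uses; not part of Keras.
    class Callback:
        def set_model(self, model):
            self.model = model


class StopAtLossThreshold(Callback):
    """Stop training once the monitored value drops below `threshold`."""

    def __init__(self, monitor="val_loss", threshold=0.1):
        super().__init__()
        self.monitor = monitor
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is not None and current < self.threshold:
            # Keras checks this flag after each epoch and ends fit() early.
            self.model.stop_training = True
```

It would be passed alongside the existing callbacks, for example `callbacks=[StopAtLossThreshold(threshold=THR), ...]`.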

How to tell Keras stop training based on loss value?

I've run several training sessions with different graphs in TensorFlow. The summaries I set up show interesting results in the training and validation. Now, I'd like to take the data I've saved in the summary logs and perform some statistical analysis and in general plot and look at the summary data in different ways. Is there any existing way to easily access this data?

More specifically, is there any built-in way to read a TFEvent record back into Python?

If there is no simple way to do this, [TensorFlow states that all its file formats are protobuf files](https://www.tensorflow.org/versions/r0.8/how_tos/tool_developers/index.html#protocol-buffers). From my understanding of protobufs (which is limited), I think I'd be able to extract this data if I had the TFEvent protocol specification. Is there an easy way to get hold of it? Thank you very much.

TensorFlow - Importing data from a TensorBoard TFEvent file?

I just started programming in Python. I want to use Scrapy to create a bot, and it showed `TypeError: Object of type 'bytes' is not JSON serializable` when I ran the project.



    import json
    import codecs

    class W3SchoolPipeline(object):

      def __init__(self):
          self.file = codecs.open('w3school_data_utf8.json', 'wb', encoding='utf-8')

      def process_item(self, item, spider):
          line = json.dumps(dict(item)) + '\n'
          # print line

          self.file.write(line.decode("unicode_escape"))
          return item

-------------------------------


    from scrapy.spiders import Spider
    from scrapy.selector import Selector
    from w3school.items import W3schoolItem

    class W3schoolSpider(Spider):

        name = "w3school"
        allowed_domains = ["w3school.com.cn"]

        start_urls = [
            "http://www.w3school.com.cn/xml/xml_syntax.asp"
        ]

        def parse(self, response):
            sel = Selector(response)
            sites = sel.xpath('//div[@id="navsecond"]/div[@id="course"]/ul[1]/li')

            items = []
            for site in sites:
                item = W3schoolItem()
                title = site.xpath('a/text()').extract()
                link = site.xpath('a/@href').extract()
                desc = site.xpath('a/@title').extract()

                # encode() turns each str into bytes under Python 3
                item['title'] = [t.encode('utf-8') for t in title]
                item['link'] = [l.encode('utf-8') for l in link]
                item['desc'] = [d.encode('utf-8') for d in desc]
                items.append(item)
            return items


Traceback:

    TypeError: Object of type 'bytes' is not JSON serializable
    2017-06-23 01:41:15 [scrapy.core.scraper] ERROR: Error processing {'desc': [b'\xe4\xbd\xbf\xe7\x94\xa8 XSLT \xe6\x98\xbe\xe7\xa4\xba XML'],
     'link': [b'/xml/xml_xsl.asp'],
     'title': [b'XML XSLT']}

    Traceback (most recent call last):
      File "c:\users\administrator\appdata\local\programs\python\python36\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "D:\LZZZZB\w3school\w3school\pipelines.py", line 19, in process_item
        line = json.dumps(dict(item)) + '\n'
      File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\__init__.py", line 231, in dumps
        return _default_encoder.encode(obj)
      File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\encoder.py", line 199, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\encoder.py", line 257, in iterencode
        return _iterencode(o, 0)
      File "c:\users\administrator\appdata\local\programs\python\python36\lib\json\encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'bytes' is not JSON serializable
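
The traceback points at `json.dumps`, which cannot serialize `bytes` under Python 3. A minimal sketch of one way around it, assuming the item fields are lists of UTF-8 encoded bytes as in the log above: decode them to `str` before dumping (or skip the `.encode('utf-8')` step in the spider), and pass `ensure_ascii=False` so non-ASCII text is written as-is. The `decode_item` helper is a hypothetical name.

```python
import json

def decode_item(item):
    """Return a copy of `item` with every bytes value decoded as UTF-8."""
    return {
        key: [v.decode("utf-8") if isinstance(v, bytes) else v for v in values]
        for key, values in item.items()
    }

# Same shape as the item in the error log.
item = {
    "desc": [b"\xe4\xbd\xbf\xe7\x94\xa8 XSLT \xe6\x98\xbe\xe7\xa4\xba XML"],
    "link": [b"/xml/xml_xsl.asp"],
    "title": [b"XML XSLT"],
}

# One JSON document per line, with non-ASCII characters kept readable.
line = json.dumps(decode_item(item), ensure_ascii=False) + "\n"
```

With `str` values, `line` can be written directly to a file opened in text mode with `encoding='utf-8'`, so the `line.decode("unicode_escape")` call is no longer needed.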


  

TypeError: Object of type 'bytes' is not JSON serializable

I'm practicing the code from 'Web Scraping with Python', and I keep running into this certificate problem:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re

    pages = set()

    def getLinks(pageUrl):
        global pages
        html = urlopen("http://en.wikipedia.org" + pageUrl)
        # Name the parser explicitly to avoid bs4's "no parser specified" warning
        bsObj = BeautifulSoup(html, "html.parser")
        for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
            if 'href' in link.attrs:
                if link.attrs['href'] not in pages:
                    # We have encountered a new page
                    newPage = link.attrs['href']
                    print(newPage)
                    pages.add(newPage)
                    getLinks(newPage)

    getLinks("")

The error is:

      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1049)>
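
The error means Python could not find a CA certificate to validate the server's certificate chain. A hedged sketch for inspecting where this Python build looks for certificates and for building an explicit verifying context; nothing here disables verification, and the commented-out request is illustrative only:

```python
import ssl
from urllib.request import urlopen

# Where this Python build expects to find CA certificates.
paths = ssl.get_default_verify_paths()
print("cafile:", paths.cafile, "capath:", paths.capath)

# A context that verifies certificates and host names.
# Pass cafile="/path/to/ca-bundle.pem" (hypothetical path) to use a specific bundle.
context = ssl.create_default_context()

# html = urlopen("https://en.wikipedia.org/", context=context)
```

On macOS installs from python.org, this error is often reported when the bundled `Install Certificates.command` script has not been run; treat that as something to check rather than a guaranteed fix.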


By the way, I was also practicing Scrapy but kept getting `command not found: scrapy`. I tried all sorts of solutions found online, but none of them worked, which is really frustrating.

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

tl;dr
-----

Hide email address from bots without using scripts and maintain `mailto:` functionality. Method must also support screen-readers.


----------


Summary
-------


 - Email **obfuscation without** using **scripts** or contact forms

 - Email address needs to be **completely visible** to human viewers and **maintain `mailto:` functionality**

 - Email Address **must not be in image form**. 

 - Email address **must be "completely" hidden from spam-crawlers and spam-bots** and **any other harvester type**



----------





Desired Effect:
---------------







 - **No scripts**, please. There are no scripts used in the project and **I'd like to keep it that way**.

 - Email address is either **displayed on the page** or can be easily displayed after some sort of user interaction, like opening a modal.

 - The **user can click on the email address**, which in turn would trigger the `mailto:` functionality.
 - Clicking the email will open the user's email application.

    *In other words, `mailto:` functionality must work.*

 - The email address is not visible or not identified as an email address to bots **(This includes the page source)**

 - I don't have an inbox that's full of spam



----------





What does *NOT* Work
--------------------





 - Adding a contact form - or anything similar - instead of the email address



 ***I hate contact forms**. I rarely fill out a contact form. If there's no email address, I look for a phone number, and if that's not there, I start looking for an alternative service. I would only fill out a contact form if I absolutely have to.*



 - Replacing the address with an image of the address



 This creates a **HUGE** disadvantage to someone using a screenreader (**please remember the visually impaired in your future projects**)



 It also **removes** the `mailto:` functionality unless you make the image clickable and then add the `mailto:` functionality as the `href` for the link, but that **defeats the purpose** and now the email is visible to bots.

 

----------


What might work:
----------------




 - Clever usage of `pseudo-elements` in `CSS`

 - Solutions that make use of `base64` encoding

 - **Breaking up** the email address and spreading the parts across the document then putting them back together in a modal when the user clicks a button (This will probably involve multiple `CSS` classes and the usage of `anchor tags`)

 - Altering `html` attributes via `CSS`

 @MortezaAsadi graciously brought up the possibility in the comments below. This is the link to the full article, which is from 2012:

 [What if We Could Use CSS to Alter HTML Attributes?][1]

 - Other creative solutions that are beyond my scope of knowledge. 



----------



Similar Questions / Fixes
-------------------------





 - [JavaScript: Protect your email address by Joe Maller][2]



(This is a great fix suggested by Joe Maller. It works well, but it's **script based**. Here's what it looks like:)

<!-- language: lang-html -->

    <SCRIPT TYPE="text/javascript">
      emailE = 'emailserver.com'
      emailE = ('yourname' + '@' + emailE)
      document.write('<A href="mailto:' + emailE + '">' + emailE + '</a>')
    </script>

    <NOSCRIPT>
      Email address protected by JavaScript
    </NOSCRIPT>



 - [Looking for a php only email address obfuscator function][3]



   (A clever solution that uses `PHP` to **reverse** the email address and `CSS` to **reverse it back**.) A very promising solution that works great, but it's **too easy to solve**.



 - [Is it worth obfuscating email addresses on the web these days?][4]



  (Javascript fix)







 - [https://stackoverflow.com/questions/748780/best-way-to-obfuscate-an-e-mail-address-on-a-website/748805#748805][5]



  **The selected answer works**. It actually works really well. It involves encoding the email as `html entities`. Can it be improved?

 Here's what it looks like:

 <!-- language: lang-html -->

    <A HREF="mailto:
    &#121;&#111;&#117;&#114;&#110;&#097;&#109;&#101;&#064;&#100;&#111;&#109;&#097;&#105;&#110;&#046;&#099;&#111;&#109;">
    &#121;&#111;&#117;&#114;&#110;&#097;&#109;&#101;&#064;&#100;&#111;&#109;&#097;&#105;&#110;&#046;&#099;&#111;&#109;
    </A>






 - [Does e-mail address obfuscation actually work?][6] 



  The selected answer to this SuperUser question is great; it presents a study of the amount of spam received when using different obfuscation methods.



  It seems that manipulating the email address with `CSS` to make it `rtl` does work. This is the same method used in the first question I linked to in this section. 



  I am uncertain what effects adding `mailto:` functionality to the fix would have on the results. 





 - There are also many other questions on [SO][7] which all have similar answers. I have not found anything that fits **my desired effect**
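
The HTML-entity approach from the linked answer can be generated rather than typed by hand. A small sketch in Python; the helper names are hypothetical, and it only reproduces the encoding shown above without making any claim about how well that stops harvesters:

```python
import html

def to_html_entities(text):
    """Encode every character as a zero-padded decimal HTML entity."""
    return "".join("&#{:03d};".format(ord(ch)) for ch in text)

def mailto_link(address):
    """Build an anchor whose href and label are both entity-encoded."""
    encoded = to_html_entities(address)
    return '<a href="mailto:{0}">{0}</a>'.format(encoded)

link = mailto_link("yourname@domain.com")
print(link)

# Browsers decode the entities, which html.unescape() mimics here.
decoded = html.unescape(to_html_entities("yourname@domain.com"))
```

Because any HTML parser decodes these entities the same way a browser does, this mainly helps against harvesters that scan the raw page source with simple pattern matching.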



----------



The Question:
-------------





Would it be possible to **increase the efficiency** (i.e., receive as little spam as possible) of the email obfuscation methods above by **combining two or more of the fixes (or even adding new fixes)** while:



 **A- Maintaining `mailto:` functionality; and**

 **B- Supporting screen-readers**


----------
**Edit:** 

Many of the **answers and comments below** pose a very good question while indicating the impossibility of doing this without some sort of `js`.

The question that's *asked/implied* is:

> Why not use `js`?

The answer is that I am allergic to `js`.

Joking aside though,

The three main reasons I asked this question are:

 - Contact forms are becoming more and more accepted as a replacement
   for providing an email address - which they should not.

 - If it **can be done** without scripting then it **should be done** without
   scripting.

 - **Curiosity:** (as I am in fact using one of the `js` fixes currently) I wanted to see *if discussing the matter would lead to a better way of doing it.*

  [1]: http://andydavies.me/blog/2012/08/13/what-if-we-could-use-css-to-manipulate-html-attributes/
  [2]: http://joemaller.com/js-mailer.shtml
  [3]: https://stackoverflow.com/questions/12592363/looking-for-a-php-only-email-address-obfuscator-function
  [4]: https://stackoverflow.com/questions/4098408/is-it-worth-obfuscating-email-addresses-on-the-web-these-days
  [5]: https://stackoverflow.com/questions/748780/best-way-to-obfuscate-an-e-mail-address-on-a-website/748805#748805
  [6]: https://superuser.com/questions/235937/does-e-mail-address-obfuscation-actually-work
  [7]: https://stackoverflow.com/search?tab=newest&q=Email%20Address%20Obfuscation

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | deepak kumar | View Question on Stackoverflow |
| Solution 1 - Python | Rafael Almeida | View Answer on Stackoverflow |
| Solution 2 - Python | Ketan Patel | View Answer on Stackoverflow |
| Solution 3 - Python | CubeOfCheese | View Answer on Stackoverflow |
