The following line is looking for the **exact** NavigableString 'Python':

    >>> soup.body.findAll(text='Python')
    []

Note that the following NavigableString is found:

    >>> soup.body.findAll(text='Python Jobs')
    [u'Python Jobs']

Note this behaviour: anchoring the regex with `^` and `$` forces it to match the whole string, so it behaves like the exact match and finds nothing:

    >>> import re
    >>> soup.body.findAll(text=re.compile('^Python$'))
    []

So your unanchored regexp matches any NavigableString that contains 'Python' (such as 'Python Jobs'), not only one that is exactly 'Python'.
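To reproduce the three results above without BeautifulSoup or a live page, here is a minimal standard-library sketch. `html.parser` collects the non-blank text nodes (roughly what BeautifulSoup exposes as NavigableStrings), the exact filter compares with `==`, and the regex filter uses `re.search`, so it succeeds on any string that merely contains the pattern. The sample markup and the names `TextCollector` and `text_nodes` are hypothetical, and the sketch only mimics the filtering rules rather than using BeautifulSoup's own code.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-blank text node, roughly like NavigableStrings."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        if data.strip():
            self.nodes.append(data)


def text_nodes(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


# Hypothetical markup standing in for the python.org body
html = "<body><a>Python Jobs</a><p>Learn more about Python</p></body>"
nodes = text_nodes(html)

exact = [t for t in nodes if t == "Python"]               # like text='Python'
contains = [t for t in nodes if re.search("Python", t)]   # like text=re.compile('Python')
anchored = [t for t in nodes if re.search("^Python$", t)] # like text=re.compile('^Python$')

print(exact)     # []
print(contains)  # ['Python Jobs', 'Learn more about Python']
print(anchored)  # []
```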



`text='Python'` searches for text nodes that are exactly equal to the text you provided, while a regex matches any text node that contains a match:

    import re
    from BeautifulSoup import BeautifulSoup
    
    html = """<p>exact text</p>
       <p>almost exact text</p>"""
    soup = BeautifulSoup(html)
    print soup(text='exact text')
    print soup(text=re.compile('exact text'))

### Output

    [u'exact text']
    [u'exact text', u'almost exact text']


"To see if the string 'Python' is located on the page http://python.org":

    import urllib2
    html = urllib2.urlopen('http://python.org').read()
    print 'Python' in html # -> True

If you need the position of the substring within the string, you could use `html.find('Python')`, which returns -1 when the substring is absent.
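On Python 3, `urllib2` was merged into `urllib.request`, and `read()` returns bytes that must be decoded before searching with a `str`. A sketch assuming the page is UTF-8; the `fetch` helper is hypothetical and is only defined here, so the substring checks run against a sample string instead of the live page:

```python
import urllib.request


def fetch(url):
    # Assumes a UTF-8 page; on Python 3, read() returns bytes
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# html = fetch("http://python.org")  # network call, left commented out
html = "<title>Welcome to Python.org</title>"

print("Python" in html)       # True
print(html.find("Python"))    # 18, index of the first match
print(html.find("Ruby"))      # -1 when the substring is absent
print(html.count("Python"))   # 1, non-overlapping occurrences
```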

In addition to the [accepted answer][1], you can use a `lambda` instead of a regex:

    from bs4 import BeautifulSoup
    
    html = """<p>test python</p>"""

    soup = BeautifulSoup(html, "html.parser")

    print(soup(text="python"))
    print(soup(text=lambda t: "python" in t))

Output:

    []
    ['test python']
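The callable does not have to be a containment test. Whitespace or capitalization around a text node can make an exact `text='Python'` search come back empty, so a predicate can compare the stripped, case-folded text instead. A sketch of such a predicate run over a hypothetical list of strings; `text_equals` is an illustrative name, and passing `text=text_equals("python")` to `soup(...)` should work like the lambda above, although that part is not exercised here.

```python
def text_equals(needle):
    """Build a predicate: exact match, ignoring case and surrounding whitespace."""
    needle = needle.casefold()
    return lambda text: text.strip().casefold() == needle


# Hypothetical text nodes, as a parser might yield them
nodes = ["test python", "  Python\n", "Jython", "PYTHON Jobs"]

is_python = text_equals("python")
hits = [t for t in nodes if is_python(t)]
print(hits)  # ['  Python\n']
```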


  [1]: https://stackoverflow.com/a/8936235/

I have not used BeautifulSoup, but maybe the following can help in some tiny way.

    import re
    import urllib2
    stuff = urllib2.urlopen(your_url_goes_here).read()  # stuff will contain the *entire* page

    # Replace the string Python with your desired regex
    results = re.findall('(Python)',stuff)

    for i in results:
        print i

I'm not suggesting this is a replacement, but maybe you can glean some value in the concept until a direct answer comes along.
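A Python 3 sketch of the same raw-HTML idea, using `re.finditer` to report where each match starts and `\b` word boundaries so that longer words such as `Pythonic` are not counted. The sample `page` string is hypothetical; in practice it would be the decoded page body.

```python
import re

# Hypothetical page body; in practice, the decoded HTML you fetched
page = "<h1>Python</h1><p>Pythonic code; Python Jobs</p>"

# \b word boundaries skip longer words such as "Pythonic"
positions = [m.start() for m in re.finditer(r"\bPython\b", page)]
print(positions)       # [4, 33]
print(len(positions))  # 2
```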

I must be overlooking something very simple here, but I can't seem to figure out how to render a simple ERB template with values from a hash-map.

I am relatively new to Ruby, coming from Python. I have an ERB template (not HTML), which I need rendered with context that's to be taken from a hash-map, which I receive from an external source.

However, the ERB documentation states that the `ERB.result` method takes a `binding`. I learnt that bindings are something that hold the variable context in Ruby (something like `locals()` and `globals()` in Python, I presume?). But I don't know how I can build a binding object out of my hash-map.

A little (a *lot*, actually) googling gave me this: http://refactormycode.com/codes/281-given-a-hash-of-variables-render-an-erb-template, which uses some Ruby metaprogramming magic that escapes me.

So, isn't there a simple solution to this problem? Or is there a better templating engine (not tied to HTML) better suited for this? (I only chose ERB because it's in the stdlib.)

Render an ERB template with values from a hash

Let's say I am trying to remove elements from array `a = [1,1,1,2,2,3]`. If I perform the following:

    b = a - [1,3]

Then I will get:

    b = [2,2]

However, I want the result to be

    b = [1,1,2,2]

i.e. I only want to remove one instance of each element in the subtracted array, not all occurrences. Is there a simple way in Ruby to do this?

Removing elements from array Ruby

I am using BeautifulSoup to look for user-entered strings on a specific page.
For example, I want to see if the string 'Python' is located on the page: http://python.org

When I used
`find_string = soup.body.findAll(text='Python')`,
`find_string` returned `[]`.

But when I used
`find_string = soup.body.findAll(text=re.compile('Python'), limit=1)`,
`find_string` returned `[u'Python Jobs']` as expected.

What is the difference between these two statements that makes the second statement work when there is more than one instance of the word being searched for?


Using BeautifulSoup to search HTML for string


I was just trying to see how variable scopes work and ran into the following situation (all run from the terminal):

    x = 1
    def inc():
        x += 5

    inc()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in inc
    UnboundLocalError: local variable 'x' referenced before assignment

I was thinking maybe I don't have access to `x` in my function, so I tried:

    def inc():
        print(x)

    inc()
    1

So this works. Now I know I could just do:

    def inc():
        global x
        x += 1

And this would work, but my question is: why does the first example fail? Since `print(x)` worked, I would expect `x` to be visible inside the function, so why does `x += 5` fail?
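The usual explanation: because `inc` assigns to `x` (`x += 5` is an assignment), Python decides at compile time that `x` is local for the whole function body, so the read that `+=` performs happens before any local value exists. `print(x)` alone never assigns, so `x` is looked up in the global scope. A small sketch contrasting the failing version with two fixes; the function names are illustrative.

```python
x = 1


def inc_local():
    x += 5  # the assignment makes x local here, so reading it first fails
    return x


def inc_global():
    global x  # rebind the module-level name instead of creating a local
    x += 5
    return x


def inc_pure(value):
    # Often cleaner: pass the value in and return the new one
    return value + 5


try:
    inc_local()
    caught = None
except UnboundLocalError as exc:
    caught = type(exc).__name__

print(caught)        # UnboundLocalError
print(inc_global())  # 6
print(inc_pure(x))   # 11
```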


Variables declared outside function

I keep getting an error that says


    AttributeError: 'NoneType' object has no attribute 'something'

The code I have is too long to post here. What general scenarios would cause this `AttributeError`, what is `NoneType` supposed to mean, and how can I narrow down what's going on?
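`NoneType` is the type of `None`, so the error means an expression you expected to produce an object evaluated to `None`, and you then accessed an attribute on it. Common sources are in-place methods such as `list.sort()` that return `None`, lookups that return `None` on failure such as `re.match()` or `dict.get()`, and BeautifulSoup's `find()` when nothing matches. The last line of the traceback points at the expression; work backwards to whatever produced that value. A minimal sketch of the pattern and a guard; `parse_price` is a hypothetical helper.

```python
import re


def parse_price(text):
    """Return the dollar amount in text, or None if there isn't one."""
    match = re.match(r"\$(\d+)", text)  # None when the pattern doesn't match
    if match is None:
        return None  # guard instead of calling .group() on None
    return int(match.group(1))


items = [3, 1, 2]
result = items.sort()  # in-place methods return None
print(result)  # None

try:
    re.match(r"\$(\d+)", "free").group(1)  # unguarded: the match is None
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'group'

print(parse_price("$42"), parse_price("free"))  # 42 None
```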


Why do I get AttributeError: 'NoneType' object has no attribute 'something'?

I've been working on a new dev platform using nginx/gunicorn and Flask for my application.

Ops-wise, everything works fine. The issue I'm having is with debugging the Flask layer: when there's an error in my code, I just get a plain 500 error returned to the browser, and nothing shows up on the console or in my logs.

I've tried many different configs/options. I guess I **must** be missing something obvious.

My `gunicorn.conf.py`:

    import os

    bind = '127.0.0.1:8002'
    workers = 3
    backlog = 2048
    worker_class = "sync"
    debug = True
    proc_name = 'gunicorn.proc'
    pidfile = '/tmp/gunicorn.pid'
    logfile = '/var/log/gunicorn/debug.log'
    loglevel = 'debug'

An example of some Flask code that fails, `testserver.py`:

    from flask import Flask
    from flask import render_template_string
    from werkzeug.contrib.fixers import ProxyFix
    
    app = Flask(__name__)
    
    @app.route('/')
    def index():
        n = 1/0
        return "DIV/0 worked!"

And finally, the command to run the flask app in gunicorn:

    gunicorn -c gunicorn.conf.py testserver:app

Thanks y'all
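One thing worth checking is where gunicorn's error log actually goes. A sketch of the same config with the logging settings spelled out, assuming a recent gunicorn release: the config-file setting for the error log is `errorlog`, `errorlog` and `accesslog` accept `"-"` to log to stderr and stdout respectively, and `capture_output` redirects the workers' own stdout/stderr into the error log. Recent Flask versions log unhandled exceptions through `app.logger`, whose default handler writes to the WSGI error stream, so with the error log on the console the `ZeroDivisionError` traceback should become visible. This has not been checked against the gunicorn and Flask versions in the question.

```python
# gunicorn.conf.py: a sketch assuming a recent gunicorn release
bind = "127.0.0.1:8002"
workers = 3
backlog = 2048
worker_class = "sync"
proc_name = "gunicorn.proc"
pidfile = "/tmp/gunicorn.pid"

errorlog = "-"         # "-" sends the error log (including tracebacks) to stderr
accesslog = "-"        # "-" sends the access log to stdout
loglevel = "debug"
capture_output = True  # redirect workers' stdout/stderr into the error log
```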

Debugging a Flask app running in Gunicorn

How do I import a module (a Python file) that resides in the parent directory?

Both directories have an `__init__.py` file in them, but I still cannot import a file from the parent directory.

In this folder layout, Script B is attempting to import Script A:

    Folder A:
       __init__.py
       Script A
       Folder B:
         __init__.py
         Script B (attempting to import Script A)

The following code in Script B doesn't work:

    import ../scriptA.py # I get a compile error saying the "." is invalid
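`import` takes dotted module names, not file paths, so `../scriptA.py` is a syntax error, and module names cannot contain spaces either. The sketch below assumes the files are named `scriptA.py` and `scriptB.py` inside `folderA/` and `folderA/folderB/` (hypothetical names). One option is to put the parent directory on `sys.path` before importing; another is to run Script B as part of a package with `python -m` so that relative imports work. This shows only the `sys.path` route, and `add_parent_dir` is a hypothetical helper.

```python
import os
import sys


def add_parent_dir(script_path):
    """Put the directory above script_path's folder at the front of sys.path."""
    parent = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Inside scriptB.py you would call add_parent_dir(__file__), then `import scriptA`.
parent = add_parent_dir(os.path.join("folderA", "folderB", "scriptB.py"))
print(parent)  # absolute path ending in folderA
```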

Import Script from a Parent Directory

I want to import subfolders as modules. Therefore every subfolder contains a `__init__.py`. My folder structure is like this:

    src\
      main.py
      dirFoo\
        __init__.py
        foofactory.py
        dirFoo1\
          __init__.py
          foo1.py
        dirFoo2\
          __init__.py
          foo2.py

In my main script I import:

    from dirFoo.foofactory import FooFactory

In this factory file I include the submodules:

    from dirFoo1.foo1 import Foo1
    from dirFoo2.foo2 import Foo2

When I call my foofactory, I get an error saying that Python can't import the submodules foo1 and foo2:

    Traceback (most recent call last):
      File "/Users/tmp/src/main.py", line 1, in <module>
        from dirFoo.foofactory import FooFactory
      File "/Users/tmp/src/dirFoo/foofactory.py", line 1, in <module>
        from dirFoo1.foo1 import Foo1
    ImportError: No module named dirFoo1.foo1
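Assuming Python 3 (or Python 2 with absolute imports enabled), the imports inside `foofactory.py` are absolute, so `dirFoo1` is searched for on `sys.path`, which contains `src/`, not next to `foofactory.py`. Either spell out the full path (`from dirFoo.dirFoo1.foo1 import Foo1`) or use explicit relative imports (`from .dirFoo1.foo1 import Foo1`). The sketch below rebuilds the question's layout in a temporary directory and checks that the relative form imports cleanly; the class bodies are placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Rebuild the layout from the question; root plays the role of src/
root = Path(tempfile.mkdtemp())
files = {
    "dirFoo/__init__.py": "",
    "dirFoo/dirFoo1/__init__.py": "",
    "dirFoo/dirFoo1/foo1.py": "class Foo1:\n    pass\n",
    "dirFoo/dirFoo2/__init__.py": "",
    "dirFoo/dirFoo2/foo2.py": "class Foo2:\n    pass\n",
    # Explicit relative imports resolve against the dirFoo package itself
    "dirFoo/foofactory.py": (
        "from .dirFoo1.foo1 import Foo1\n"
        "from .dirFoo2.foo2 import Foo2\n"
        "\n"
        "class FooFactory:\n"
        "    products = (Foo1, Foo2)\n"
    ),
}
for relative_path, source in files.items():
    path = root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source)

sys.path.insert(0, str(root))  # running main.py puts its directory (src/) on sys.path
importlib.invalidate_caches()

# Equivalent to `from dirFoo.foofactory import FooFactory` in main.py
foofactory = importlib.import_module("dirFoo.foofactory")
names = [cls.__name__ for cls in foofactory.FooFactory.products]
print(names)  # ['Foo1', 'Foo2']
```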



Import module from subfolder

I&#39;m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup. 

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a `UnicodeEncodeError`. I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

    agent_telno = agent.find('div', 'agent_contact_number')
    agent_telno = '' if agent_telno is None else agent_telno.contents[0]
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()


Here is a stack trace produced on SOME strings when the snippet above is run:

    Traceback (most recent call last):
      File "foobar.py", line 792, in <module>
        p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)


I suspect that this is because some pages (or, more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption, so there are no issues relating to internationalization or dealing with text written in anything other than English.

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?
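For context: `u'\xa0'` is a non-breaking space, which is common in HTML (`&nbsp;`), so the failures most likely depend on whether a given page contains one rather than on some pages being "encoded". On Python 2, `str()` on a `unicode` value implicitly encodes with the ASCII codec, which fails on any non-ASCII character. A safer pattern is to keep text as Unicode throughout and encode explicitly only at output. A Python 3 sketch of the failure and a few explicit alternatives; `clean` is a hypothetical helper.

```python
import unicodedata

text = "Tel:\xa0012345"  # \xa0 is a non-breaking space, as in HTML's &nbsp;

try:
    text.encode("ascii")  # the implicit step Python 2's str() performs on unicode
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(128)


def clean(value):
    # NFKC folds compatibility characters such as U+00A0 into plain equivalents
    return unicodedata.normalize("NFKC", value)


print(clean(text))                      # Tel: 012345
print(text.encode("utf-8"))             # explicit encoding keeps the character
print(text.encode("ascii", "replace"))  # or substitute '?' for unencodable characters
```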



UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I am currently using Beautiful Soup to parse an HTML file and calling `get_text()`, but I seem to be left with a lot of `\xa0` characters (non-breaking spaces) where there should be ordinary spaces. Is there an efficient way to remove all of them in Python 2.7 and change them into spaces? I guess the more generalized question would be: is there a way to remove Unicode formatting?

I tried using `line = line.replace(u'\xa0',' ')`, as suggested by another thread, but that changed the `\xa0`s to `u`s, so now I have "u"s everywhere instead. ):

EDIT: The problem seems to be resolved by `str.replace(u'\xa0', ' ').encode('utf-8')`, but just doing `.encode('utf-8')` without `replace()` seems to cause it to spit out even weirder characters, `\xc2` for instance. Can anyone explain this?
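The `\xc2` is expected: UTF-8 encodes U+00A0 as the two bytes `\xc2\xa0`, so encoding without replacing first just exposes the UTF-8 byte sequence of the non-breaking space. The stray `u`s are most likely the `repr` of Unicode strings (for example when printing a list), not part of the text itself. A Python 3 sketch of the byte sequence and two ways to turn the non-breaking spaces into ordinary ones; `line` is a sample value.

```python
line = "Price:\xa010\xa0GBP"

# UTF-8 needs two bytes for U+00A0, hence the extra \xc2
print(line.encode("utf-8"))  # b'Price:\xc2\xa010\xc2\xa0GBP'

replaced = line.replace("\xa0", " ")  # swap only the non-breaking spaces
collapsed = " ".join(line.split())    # str.split() treats \xa0 as whitespace too

print(replaced)   # Price: 10 GBP
print(collapsed)  # Price: 10 GBP
```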

How to remove \xa0 from string in Python?

I want to print an attribute value based on its name. Take, for example:

    <META NAME="City" content="Austin">

I want to do something like this:

```python
soup = BeautifulSoup(f)  # f is some HTML containing the above meta tag
for meta_tag in soup("meta"):
    if meta_tag["name"] == "City":
        print(meta_tag["content"])
```

The above code gives a `KeyError: 'name'`. I believe this is because `name` is used by BeautifulSoup, so it can't be used as a keyword argument.
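The `KeyError` most likely comes from a `<meta>` tag that has no `name` attribute at all (for example `<meta charset="utf-8">`): indexing a tag with `[]` raises when the attribute is missing, while `.get("name")` returns `None`. In BeautifulSoup you can also pass the attribute through `attrs`, as in `soup.find("meta", attrs={"name": "City"})`, since the first positional argument of `find` is the tag name. To keep the example runnable without bs4, the sketch below applies the same `.get()` guard with the standard library's `html.parser`; `MetaReader` is a hypothetical class.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Map each meta tag's name attribute to its content, skipping unnamed tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # html.parser lowercases tag and attribute names
            return
        attributes = dict(attrs)
        name = attributes.get("name")  # None instead of KeyError when absent
        if name is not None:
            self.meta[name] = attributes.get("content")


reader = MetaReader()
reader.feed('<meta charset="utf-8"><META NAME="City" content="Austin">')
print(reader.meta.get("City"))  # Austin
```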

Get an attribute value based on the name attribute with BeautifulSoup

I am using BeautifulSoup to scrape a URL, and I had the following code to find the `td` tag whose class is `'empformbody'`:

    import urllib
    import urllib2
    from BeautifulSoup import BeautifulSoup

    url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    the_page = response.read()
    soup = BeautifulSoup(the_page)

    soup.findAll('td',attrs={'class':'empformbody'})

Now in the above code we can use `findAll` to get tags and information related to them, but I want to use XPath. Is it possible to use XPath with BeautifulSoup? If possible, please provide me example code.
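BeautifulSoup does not support XPath queries; the usual route is lxml, e.g. `lxml.html.fromstring(page).xpath("//td[@class='empformbody']")`. If the markup happens to be well-formed, the standard library's `xml.etree.ElementTree` supports a limited XPath subset, including attribute predicates. A sketch under that assumption, with hypothetical sample markup:

```python
import xml.etree.ElementTree as ET

# ElementTree needs well-formed markup; real-world HTML often is not
page = """<html><body><table><tr>
  <td class="empformbody">Alice</td>
  <td class="other">ignored</td>
  <td class="empformbody">Bob</td>
</tr></table></body></html>"""

root = ET.fromstring(page)
cells = root.findall(".//td[@class='empformbody']")
print([td.text for td in cells])  # ['Alice', 'Bob']
```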

can we use XPath with BeautifulSoup?

I'm working in Python and using Flask. When I run my main Python file on my computer, it works perfectly, but when I activate venv and run the Flask Python file in the terminal, it says that my main Python file has "No Module Named bs4." Any comments or advice are greatly appreciated.
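This usually means the venv's interpreter is not the one bs4 was installed into: each venv has its own site-packages, so the package has to be installed inside the activated venv (e.g. `python -m pip install beautifulsoup4`; the distribution name differs from the import name `bs4`). A small diagnostic sketch, meant to be run with the same interpreter that runs Flask, that reports which interpreter is active and whether it can see a module; `can_import` is a hypothetical helper.

```python
import importlib.util
import sys


def can_import(module_name):
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


print(sys.executable)                 # path of the interpreter actually running
print(sys.prefix != sys.base_prefix)  # True inside a venv
print(can_import("bs4"))              # whether bs4 is visible to this interpreter
```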

| Content Type | Original Author |
|---|---|
| Question | kachilous |
| Solution 1 - Python | sgallen |
| Solution 2 - Python | jfs |
| Solution 3 - Python | MendelG |
| Solution 4 - Python | Bit Bucket |
