bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Python, Python 2.7, Beautifulsoup, Lxml

Python Problem Overview


...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above is the output in my Terminal. I am on Mac OS 10.7.x with Python 2.7.1, and followed this tutorial to get Beautiful Soup and lxml, which both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:

from pageCrawler import comparePages

And in the pageCrawler file I have included the following two lines:

from bs4 import BeautifulSoup
from urllib2 import urlopen

Any help in figuring out what the problem is and how it can be solved would be much appreciated.

Python Solutions


Solution 1 - Python

I suspect this is related to the parser that BS uses to read the HTML. It is documented here, but if you're like me (on OS X) you might be stuck with something that requires a bit of work:

You'll notice that the BS4 documentation page above points out that by default BS4 will use the Python built-in HTML parser. Assuming you are on OS X, the Apple-bundled version of Python is 2.7.2, which is not lenient about character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the LXML parser:

pip install lxml

And then try:

soup = BeautifulSoup(html, "lxml")

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
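As a rough sketch of how you might check which parsers are actually importable before passing a name to BeautifulSoup, something like the following could help. The helper name pick_parser and the PARSER_MODULES table are made up for illustration and are not part of bs4; the check only confirms that the backing module can be found by the interpreter running the script.

```python
import importlib.util

# Hypothetical lookup table: Beautiful Soup parser name -> module that backs it.
PARSER_MODULES = {
    "lxml": "lxml",
    "html5lib": "html5lib",
    "html.parser": "html.parser",  # ships with Python 3
}


def pick_parser(preferred=("lxml", "html5lib", "html.parser")):
    """Return the first parser name whose backing module can be found."""
    for name in preferred:
        if importlib.util.find_spec(PARSER_MODULES[name]) is not None:
            return name
    raise RuntimeError("None of the requested parsers is installed")


if __name__ == "__main__":
    # The result depends on what is installed in the current environment.
    print(pick_parser())
```

The returned string can then be passed as the second argument to BeautifulSoup. If this prints "html.parser" even though you installed lxml, the install most likely went into a different interpreter or environment.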

Solution 2 - Python

I'd prefer the built-in Python HTML parser: nothing to install, no dependencies.

soup = BeautifulSoup(s, "html.parser")

Solution 3 - Python

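With plain out-of-the-box Python and bs4 installed, you can process your markup with html5lib (note that html5lib is a separate package and needs pip install html5lib first; only "html.parser" ships with Python):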

soup = BeautifulSoup(html, "html5lib")

If, however, you want to use features="xml", then you need to install lxml:

pip3 install lxml

soup = BeautifulSoup(html, features="xml")

Solution 4 - Python

Run these three commands to make sure that you have all the relevant packages installed:

pip install bs4
pip install html5lib
pip install lxml

Then restart your Python IDE, if needed.

That should take care of anything related to this issue.
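If the error persists after installing, one thing worth checking (an assumption about a possible cause, not a diagnosis) is whether the packages ended up in the same interpreter that runs your script. Here is a minimal sketch using only the standard library on Python 3.8 or newer; the function name installed_versions is hypothetical, and beautifulsoup4 is the distribution name that provides the bs4 module.

```python
import sys
from importlib import metadata


def installed_versions(packages=("beautifulsoup4", "lxml", "html5lib")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The interpreter that is actually running this script.
    print("Interpreter:", sys.executable)
    for package, version in installed_versions().items():
        print(package, "->", version or "not installed")
```

Run it with the same Python (or IDE run configuration) that raises the error. A package reported as "not installed" here was installed somewhere else, if at all.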

Solution 5 - Python

Actually, three of the options mentioned by others work.

soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

pip install lxml

soup_object = BeautifulSoup(markup, "lxml")  # third-party parser with a C dependency

pip install html5lib

soup_object = BeautifulSoup(markup, "html5lib")  # third-party pure-Python parser

Solution 6 - Python

Install the lxml parser in your Python environment.

pip install lxml

Your problem will be resolved. You can also use Python's built-in parser for the same purpose:

soup = BeautifulSoup(s, "html.parser")

Note: The "HTMLParser" module has been renamed to "html.parser" in Python 3.

Solution 7 - Python

I am using Python 3.6 and I had the same error as in the original post. I ran the command:

python3 -m pip install lxml

and it resolved my problem.

Solution 8 - Python

Instead of lxml, use html.parser. You can use this piece of code:

soup = BeautifulSoup(html, 'html.parser')

Solution 9 - Python

BeautifulSoup supports Python's built-in HTML parser by default. If you want to use any other third-party parser (like lxml), you need to install that external parser first.

soup_object= BeautifulSoup(markup, "html.parser") #Python HTML parser

But if you don't specify a parser as a parameter, you will get a warning that no parser was specified.

soup_object = BeautifulSoup(markup)  # Warning: no parser specified

To use any other external parser, you need to install it and then specify it, like this:

pip install lxml

soup_object = BeautifulSoup(markup, 'lxml')  # third-party parser with a C dependency

External parsers have C or Python dependencies, which brings both advantages and disadvantages.
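The install-then-specify pattern above can be combined with a fallback, so a missing external parser does not crash the script. This is only a sketch under the assumption that bs4 is installed; make_soup is a hypothetical helper name, while BeautifulSoup and FeatureNotFound are the names bs4 exports (the latter is the exception shown in the question's traceback).

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Parse with the preferred parser, falling back to the built-in one."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed in this environment.
        return BeautifulSoup(markup, "html.parser")


if __name__ == "__main__":
    soup = make_soup("<p>Hello, <b>world</b></p>")
    print(soup.b.get_text())  # prints "world"
```

Keep in mind that different parsers can build slightly different trees from malformed markup, so a silent fallback trades a hard failure for possibly different results.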

Solution 10 - Python

I encountered the same issue. I found the reason is that I had a slightly-outdated python six package.

>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:

sudo pip install six==1.10.0
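To check whether you are hitting this particular case, you could test for the name that the traceback fails to import. A small sketch, where has_attribute is a made-up helper rather than anything provided by six or html5lib:

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if the module imports cleanly and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    # False here means six is missing or too old to provide viewkeys.
    print(has_attribute("six", "viewkeys"))
```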

Solution 11 - Python

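Some references write the parser name with a hyphen, which is not a valid parser name. Use the second line instead of the first:

soup_object = BeautifulSoup(markup, 'html-parser')  # wrong: raises FeatureNotFound
soup_object = BeautifulSoup(markup, 'html.parser')  # correct: built-in parser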

Solution 12 - Python

The error comes from the parser you are using. In general, if you have an HTML file or HTML code, you should use html5lib (documentation can be found here), and if you have an XML file or XML data, you should use lxml (documentation can be found here). You can also use lxml for HTML, but it sometimes gives an error like the one above. So it is better to choose the package based on the type of data or file. You can also use html.parser, which is a built-in module, but this also sometimes does not work.

For more details regarding when to use which package, you can see the details here.

Solution 13 - Python

Leaving the parser argument blank results in a warning, and the best available parser is used.

soup = BeautifulSoup(html)

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

python --version Python 3.7.7

PyCharm 19.3.4 CE
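To see this behavior for yourself without the message scrolling past, you can capture the warning. This sketch assumes bs4 is installed and uses only the standard warnings module next to it; which parser gets named in the message depends on what is installed on your system.

```python
import warnings

from bs4 import BeautifulSoup

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    BeautifulSoup("<p>hi</p>")       # no parser given on purpose

for warning in caught:
    # Show the warning class and the first line of its message.
    print(warning.category.__name__, "-", str(warning.message).splitlines()[0])
```

Passing a parser name explicitly, for example "html.parser", makes the warning go away and keeps the behavior the same across machines.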

Solution 14 - Python

My solution was to remove lxml from conda and reinstall it with pip.

Solution 15 - Python

In my case I had an outdated version of the lxml package. So I just updated it and this fixed the issue.

sudo python3 -m pip install lxml --upgrade

Solution 16 - Python

I am using python 3.8 in pycharm. I assume that you had not installed "lxml" before you started working. This is what I did:


  1. Go to File -> Settings
  2. Select "Python Interpreter" in the left menu bar of the settings.
  3. Click the "+" icon over the list of packages.
  4. Search for "lxml."
  5. Click "Install Package" on the bottom left of the "Available Package" window.

Solution 17 - Python

This method worked for me. I should mention that I was trying this in a virtual environment. First:

pip install --upgrade bs4

Secondly, I used:

html.parser

instead of

html5lib

Solution 18 - Python

I fixed it with the changes below.

Before the change

soup = BeautifulSoup(r.content, 'html5lib' )
print (soup.prettify())

After the change

soup = BeautifulSoup(r.content, features='html')
print(soup.prettify())

Now my code works properly.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user3773048 | View Question on Stackoverflow
Solution 1 - Python | James Errico | View Answer on Stackoverflow
Solution 2 - Python | Ernst | View Answer on Stackoverflow
Solution 3 - Python | Tim Seed | View Answer on Stackoverflow
Solution 4 - Python | Pikamander2 | View Answer on Stackoverflow
Solution 5 - Python | 33Anika33 | View Answer on Stackoverflow
Solution 6 - Python | Shankar Vishnu | View Answer on Stackoverflow
Solution 7 - Python | Bashar | View Answer on Stackoverflow
Solution 8 - Python | Yogesh | View Answer on Stackoverflow
Solution 9 - Python | Projesh Bhoumik | View Answer on Stackoverflow
Solution 10 - Python | Qiao Yang | View Answer on Stackoverflow
Solution 11 - Python | abhishekPakrashi | View Answer on Stackoverflow
Solution 12 - Python | Pranav Bhendawade | View Answer on Stackoverflow
Solution 13 - Python | user176105 | View Answer on Stackoverflow
Solution 14 - Python | MJimitater | View Answer on Stackoverflow
Solution 15 - Python | blizz | View Answer on Stackoverflow
Solution 16 - Python | Jd_mahmud | View Answer on Stackoverflow
Solution 17 - Python | abbas abaei | View Answer on Stackoverflow
Solution 18 - Python | Shivam B | View Answer on Stackoverflow