Corpora/stopwords not found when importing the nltk library
Python Problem Overview
I am trying to import the nltk package in Python 2.7:
import nltk
stopwords = nltk.corpus.stopwords.words('english')
print(stopwords[:10])
Running this gives me the following error:
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
So I opened my Python terminal and did the following:
import nltk
nltk.download()
Which gives me:
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
However, this does not seem to finish, and running the script again still gives me the same error. Any thoughts on where this goes wrong?
Python Solutions
Solution 1 - Python
You are currently trying to download every item in the NLTK data collection, which can take a long time. You can instead download only the stopwords that you need:
import nltk
nltk.download('stopwords')
Or from command line (thanks to Rafael Valero's answer):
python -m nltk.downloader stopwords
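If the script runs repeatedly, it may be worth checking whether the corpus is already installed before downloading again. The following is only a minimal sketch: the helper name `ensure_corpus` and its `find`/`download` parameters are my own illustration (not part of NLTK), and it assumes `nltk.data.find` raises `LookupError` for a missing resource, as the error in the question suggests.

```python
def ensure_corpus(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` is not already available.

    `find` and `download` default to nltk.data.find and nltk.download.
    They are parameters only so the logic can be tried without network
    access; this helper is hypothetical, not an NLTK function.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested with fakes
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)      # raises LookupError when the resource is missing
        return False             # already installed, nothing downloaded
    except LookupError:
        download(package)        # fetch just this one package
        return True


# Typical use (needs nltk installed and network access on the first run):
# ensure_corpus("corpora/stopwords", "stopwords")
```

After the first successful run, later calls should return without contacting the server.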
Solution 2 - Python
The same as mentioned by Kurt Bourbaki, but from the command line:
python -m nltk.downloader stopwords
Solution 3 - Python
You can run this separately in a console.
It will print the result of the download.
import nltk
nltk.download('stopwords')
I used the Jupyter console when I faced this problem.
Solution 4 - Python
If you get an SSL/certificate error, run the following code.
It works by disabling SSL certificate verification, so use it only on a network you trust!
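import nltk
import ssl

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # This Python version has no unverified-context helper; leave the defaults alone
    pass
else:
    # Make unverified contexts the default for HTTPS requests
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()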
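The snippet above leaves certificate verification disabled for the rest of the process. If you only want it off while the download runs, one option is to wrap the same trick in a context manager that restores the original setting afterwards. This is just a sketch: `unverified_https` is a name I made up, and it relies on the same private `ssl` attributes as the code above, which are not a public API and could change.

```python
import contextlib
import ssl


@contextlib.contextmanager
def unverified_https():
    """Temporarily make unverified HTTPS contexts the default (hypothetical helper)."""
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Restore certificate verification even if the download fails
        ssl._create_default_https_context = original


# Typical use (needs nltk installed):
# import nltk
# with unverified_https():
#     nltk.download('stopwords')
```

Outside the `with` block, HTTPS requests go back to verifying certificates as before.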
Solution 5 - Python
If your PC connects through a proxy, then try this:
import nltk
nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
nltk.download('stopwords')
Solution 6 - Python
Use a GPU runtime; it will not give you any error.
The same code you are using will work:
import nltk
stopwords = nltk.corpus.stopwords.words('english')
print(stopwords[:10])
Solution 7 - Python
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
If you are running this command in a Jupyter notebook, it opens another window titled 'NLTK Downloader'. In that window, you can select the items you want to download and then click the Download button to start downloading.
Until you close the NLTK Downloader window, the Jupyter cell keeps running.
Solution 8 - Python
I know this comment is quite late, but in case it helps:
Although nltk.download('stopwords') will do the job, there may be times when it does not work because of proxy issues, for example if your organization has blocked the download.
I found this GitHub link pretty handy: I can pick up the list of words from it and integrate it manually into my project as a workaround.
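For the manual workaround, something along these lines might do. It is a sketch that assumes Python 3 and that you have saved the word list as a plain text file with one word per line; the file name `stopwords_english.txt` and both function names are placeholders of mine, not anything provided by NLTK.

```python
def load_stopwords(path):
    """Read one stopword per line from a local text file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


# Typical use (the file name is a placeholder):
# stopwords = load_stopwords("stopwords_english.txt")
# print(remove_stopwords("this is a simple test".split(), stopwords))
```

Keeping the words in a set makes each lookup cheap, which matters if you filter a lot of text.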
Solution 9 - Python
Check which error you are getting:
python3 -m nltk.downloader stopwords
Error:
RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1123)>
Use the solution provided by @reshma2k.