NLTK and Stopwords Fail #lookuperror
Python Problem Overview
I am starting a sentiment analysis project and plan to filter out stop words. After some research I found that nltk ships a stopwords corpus, but when I run the command I get an error.
To see which words nltk uses (as shown at http://www.nltk.org/book/ch02.html in section 4.1), I do the following:
from nltk.corpus import stopwords
stopwords.words('english')
But when I press enter I obtain
---------------------------------------------------------------------------
LookupError Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')
C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
66
67 def __getattr__(self, attr):
---> 68 self.__load()
69 # This looks circular, but its not, since __load() changes our
70 # __class__ to something new:
C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
54 except LookupError, e:
55 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56 except LookupError: raise e
57
58 # Load the corpus.
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- 'C:\\Users\\Meru/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\lib\\nltk_data'
- 'C:\\Users\\Meru\\AppData\\Roaming\\nltk_data'
**********************************************************************
Because of this problem, code like the following cannot run either (it raises the same error):
>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print [i for i in sentence.split() if i not in stop]
Do you know what the problem may be? I need to work with Spanish words; would you recommend another method? I also thought about using the Goslate package with datasets in English.
Thanks for reading!
P.S.: I use Anaconda.
Python Solutions
Solution 1 - Python
You don't seem to have the stopwords corpus on your computer.
You need to start the NLTK Downloader and download all the data you need.
Open a Python console and do the following:
>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
In the GUI window that opens simply press the 'Download' button to download all corpora or go to the 'Corpora' tab and only download the ones you need/want.
Solution 2 - Python
I tried this from an Ubuntu terminal and, for some reason, the GUI described in tttthomasssss's answer did not show up. So I followed the comment from KLDavenport and it worked. Here is the summary:
Open your terminal/command line, type python, and then run:
>>> import nltk
>>> nltk.download("stopwords")
This will store the stopwords corpus under nltk_data. In my case it was /home/myusername/nltk_data/corpora/stopwords.
If you need another corpus, visit nltk data and find the ID of that corpus. Then use the ID to download it, as we did for stopwords.
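The download-by-ID step above can be wrapped in a small helper when several resources are needed on a machine without a GUI. This is only a sketch: download_corpora and its download parameter are hypothetical names introduced here, and it assumes nltk.download returns a true value on success and a false value on failure.

```python
def download_corpora(ids, download=None):
    """Download each NLTK resource ID and return the IDs that failed."""
    if download is None:
        # Imported lazily so the helper can be exercised with a stand-in
        # downloader; the real call requires NLTK and network access.
        import nltk
        download = nltk.download

    failed = []
    for resource_id in ids:
        # Assumption: the downloader reports success as a truthy value.
        if not download(resource_id):
            failed.append(resource_id)
    return failed


# Example (needs NLTK installed and a network connection):
# missing = download_corpora(["stopwords", "wordnet"])
```

An empty list back means every ID was fetched; anything else lists the IDs worth checking for typos on the nltk data page.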
Solution 3 - Python
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
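Since the question asks about Spanish, here is a hedged variation on the snippet above. It downloads the corpus only when nltk.data.find cannot locate it, then loads the Spanish list. The helper names load_stopwords and remove_stopwords are made up for illustration, and the lowercase comparison is an assumption about how the text should be matched.

```python
def load_stopwords(language="spanish"):
    """Return the stop words for a language, downloading the corpus if absent."""
    # Imported lazily so remove_stopwords below works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    try:
        # Raises LookupError when the corpus is not in any nltk.data.path entry.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords", quiet=True)
    return set(stopwords.words(language))


def remove_stopwords(sentence, stop):
    """Keep only the whitespace-separated words that are not stop words."""
    return [word for word in sentence.split() if word.lower() not in stop]


# Example (needs NLTK installed):
# stop = load_stopwords("spanish")
# print(remove_stopwords("este es un ejemplo de frase", stop))
```

Using a set rather than a list keeps each membership check fast, which matters once the filter runs over a whole dataset.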
Solution 4 - Python
If you want to install an NLTK corpus manually:
- Go to http://www.nltk.org/nltk_data/ and download the NLTK corpus file you want.
- In a Python shell, check the value of nltk.data.path
- Choose one of the paths that exists on your machine, and unzip the data files into the corpora subdirectory inside it.
- Now you can import the data: from nltk.corpus import stopwords
Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9
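The unzip step of the manual install can also be scripted with the standard library. This is a minimal sketch: install_corpus_zip is a hypothetical helper, and it assumes the downloaded archive (for example stopwords.zip) contains a single top-level folder named after the corpus.

```python
import zipfile
from pathlib import Path


def install_corpus_zip(zip_path, data_dir):
    """Unzip a downloaded corpus archive into <data_dir>/corpora."""
    corpora_dir = Path(data_dir) / "corpora"
    # Create the corpora subdirectory if this nltk_data folder is new.
    corpora_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds a top-level folder such as stopwords/.
        archive.extractall(corpora_dir)
    return corpora_dir


# Example (data_dir should be one of the entries in nltk.data.path):
# install_corpus_zip("stopwords.zip", "/home/myusername/nltk_data")
```

After this, the corpus files sit under corpora inside the chosen directory, which is where the error message in the question shows NLTK searching.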
Solution 5 - Python
import nltk
nltk.download()
- A GUI pops up; go to the Corpora section and select the required corpus.
- Verified Result
Solution 6 - Python
You can use the following commands
import nltk
nltk.download()
After hitting enter, a popup will open up, from where you can download all the required corpora and other nltk tools as well.
Solution 7 - Python
import nltk
nltk.download()
Click the Download button when the GUI appears. It worked for me (nltk.download('stopwords') doesn't work for me).