How to configure the NLTK data directory from code?


Python Problem Overview


How can I configure the NLTK data directory from code?

Python Solutions


Solution 1 - Python

Just change the items of nltk.data.path; it is a plain Python list of directories.
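Because the directories in that list are checked in order, where an entry sits matters. A minimal sketch, using a hypothetical helper named prefer_data_dir and made-up directory names, of putting one directory first so that it takes precedence:

```python
def prefer_data_dir(search_path, directory):
    """Move or add `directory` to the front of `search_path`, in place."""
    if directory in search_path:
        search_path.remove(directory)  # avoid a duplicate entry
    search_path.insert(0, directory)   # earlier entries are checked first
    return search_path


# A stand-in list; with NLTK installed you would pass nltk.data.path instead.
paths = ["/usr/share/nltk_data", "/usr/local/share/nltk_data"]
prefer_data_dir(paths, "/opt/project/nltk_data")
print(paths)
```

With NLTK installed, the call would be prefer_data_dir(nltk.data.path, "/opt/project/nltk_data").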

Solution 2 - Python

From the code, http://www.nltk.org/_modules/nltk/data.html:

> nltk:path: Specifies the file stored in the NLTK data package at path. NLTK will search for these files in the directories specified by nltk.data.path.

Then within the code:

######################################################################
# Search Path
######################################################################

path = []
"""A list of directories where the NLTK data package might reside.
   These directories will be checked in order when looking for a
   resource in the data package.  Note that this allows users to
   substitute in their own versions of resources, if they have them
   (e.g., in their home directory under ~/nltk_data)."""

# User-specified locations:
path += [d for d in os.environ.get('NLTK_DATA', str('')).split(os.pathsep) if d]
if os.path.expanduser('~/') != '~/':
    path.append(os.path.expanduser(str('~/nltk_data')))

if sys.platform.startswith('win'):
    # Common locations on Windows:
    path += [
        str(r'C:\nltk_data'), str(r'D:\nltk_data'), str(r'E:\nltk_data'),
        os.path.join(sys.prefix, str('nltk_data')),
        os.path.join(sys.prefix, str('lib'), str('nltk_data')),
        os.path.join(os.environ.get(str('APPDATA'), str('C:\\')), str('nltk_data'))
    ]
else:
    # Common locations on UNIX & OS X:
    path += [
        str('/usr/share/nltk_data'),
        str('/usr/local/share/nltk_data'),
        str('/usr/lib/nltk_data'),
        str('/usr/local/lib/nltk_data')
    ]

To modify the path, simply append to the list of possible paths:

import nltk
nltk.data.path.append("/home/yourusername/whateverpath/")

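Or on Windows (use a raw string so that backslash sequences such as \f and \a are not treated as escape characters):

import nltk
nltk.data.path.append(r"C:\somewhere\farfar\away\path")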
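Backslashes in an ordinary Python string literal can start escape sequences, which silently changes a Windows path. A small sketch with a made-up path showing the difference between a plain literal, a raw string, and doubled backslashes:

```python
# In a plain literal, \f becomes a form feed and \a becomes a bell character.
broken = "C:\farfar\away"

# A raw string or doubled backslashes keep the backslashes literal.
raw = r"C:\farfar\away"
doubled = "C:\\farfar\\away"

print(repr(broken))           # shows the control characters that crept in
print(raw == doubled)         # True
print(len(broken), len(raw))  # the broken string is two characters shorter
```

Either the raw or the doubled form is safe to pass to nltk.data.path.append.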

Solution 3 - Python

I use append, for example:

nltk.data.path.append('/libs/nltk_data/')

Solution 4 - Python

Instead of adding nltk.data.path.append('your/path/to/nltk_data') to every script, you can set the NLTK_DATA environment variable, which NLTK reads when it builds its search path.

Open ~/.bashrc (or ~/.profile) with a text editor (e.g. nano, vim, gedit) and add the following line:

export NLTK_DATA="your/path/to/nltk_data"

Run source to load the environment variable:

source ~/.bashrc


Test

Open Python and execute the following lines:

import nltk
nltk.data.path

You should see your NLTK data path already in the list.
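NLTK_DATA can hold more than one directory, separated by the platform's path separator, as the "User-specified locations" line quoted in Solution 2 shows. A minimal sketch that mirrors that line with a hypothetical helper, nltk_data_dirs, and made-up directory names:

```python
import os


def nltk_data_dirs(env=None):
    """Split NLTK_DATA into directories, skipping empty entries."""
    env = os.environ if env is None else env
    return [d for d in env.get("NLTK_DATA", "").split(os.pathsep) if d]


# os.pathsep is ':' on Linux and macOS and ';' on Windows.
example_env = {
    "NLTK_DATA": os.pathsep.join(["/opt/nltk_data", "/srv/shared/nltk_data"])
}
print(nltk_data_dirs(example_env))
```

Since the quoted code runs when nltk is imported, setting os.environ["NLTK_DATA"] inside a script should only take effect if it happens before the first import of nltk.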

Reference: @alvations's answer on nltk/nltk #1997

Solution 5 - Python

For those using uwsgi:

I was having trouble because I wanted a uwsgi app (running as a different user than myself) to have access to nltk data that I had previously downloaded. What worked for me was adding the following line to myapp_uwsgi.ini:

env = NLTK_DATA=/home/myuser/nltk_data/

This sets the environment variable NLTK_DATA, as suggested by @schemacs.
You may need to restart your uwsgi process after making this change.

Solution 6 - Python

Using fnjn's advice above on printing out the path:

print(nltk.data.path)

On Windows, I saw path strings in this format:

C:\\Users\\my_user_name\\AppData\\Roaming\\SPB_Data

So I switched my path from Python-style forward slashes ('/') to double backslashes ('\\') when calling path.append:

nltk.data.path.append("C:\\workspace\\my_project\\data\\nltk_books")

The exception went away.
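If a path is already written with forward slashes elsewhere in a project, the standard library can produce the backslash form instead of retyping it. A small sketch, reusing the directory name from the example above; ntpath is the Windows flavour of os.path and can be imported on any platform:

```python
import ntpath

forward = "C:/workspace/my_project/data/nltk_books"

# normpath on the Windows path module turns forward slashes into backslashes.
windows_style = ntpath.normpath(forward)
print(windows_style)
```

The resulting string can then be passed to nltk.data.path.append(windows_style).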

Solution 7 - Python

Another solution is to get ahead of the problem. Try:

import nltk
nltk.download()

When the window pops up asking whether you want to download the corpus, you can specify the directory it should be downloaded to.
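For scripts that run without a GUI, the same choice can be made in code, since nltk.download accepts a download_dir argument. A hedged sketch: the helper name download_to is hypothetical, and its download and search_path parameters exist only so the helper can be exercised without NLTK or a network connection.

```python
import os


def download_to(package, directory, download=None, search_path=None):
    """Download `package` into `directory` and make that directory searchable."""
    if download is None or search_path is None:
        import nltk  # imported lazily so the helper can be defined without NLTK

        download = download or nltk.download
        search_path = nltk.data.path if search_path is None else search_path

    os.makedirs(directory, exist_ok=True)  # make sure the target exists
    ok = download(package, download_dir=directory)

    # Downloading elsewhere does not help unless that directory is also searched.
    if directory not in search_path:
        search_path.append(directory)
    return ok
```

With NLTK installed, a call such as download_to("punkt", "/opt/project/nltk_data") would fetch that package into the given directory; both the package id and the directory are illustrative.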

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Juanjo Conti | View Question on Stackoverflow
Solution 1 - Python | Tim McNamara | View Answer on Stackoverflow
Solution 2 - Python | alvas | View Answer on Stackoverflow
Solution 3 - Python | bahlum | View Answer on Stackoverflow
Solution 4 - Python | fnjn | View Answer on Stackoverflow
Solution 5 - Python | danyamachine | View Answer on Stackoverflow
Solution 6 - Python | user5099519 | View Answer on Stackoverflow
Solution 7 - Python | Steve | View Answer on Stackoverflow