Access data in package subdirectory

Python, Package

Python Problem Overview


I am writing a Python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where the package is installed on the user's system.

I've tried a variety of methods, but so far I have had no luck. It seems that most of the "current directory" commands return the directory of the system's Python interpreter, and not the directory of the module.

This seems like it ought to be a trivial, common problem. Yet I can't seem to figure it out. Part of the problem is that my data files are not .py files, so I can't use import functions and the like.

Any suggestions?

Right now my package directory looks like:

/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt

I am trying to access data.txt from module*.py!

Python Solutions


Solution 1 - Python

The standard way to do this is with setuptools packages and pkg_resources.

You can lay out your package as shown in the question and configure the package setup file to point to your data resources, as per this link:

http://docs.python.org/distutils/setupscript.html#installing-package-data
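As a rough sketch of that configuration, the keyword arguments below would tell setuptools to ship the text files under data/ with the package. The package name "mypackage" and the version are placeholders, not taken from the question, and the dictionary is only an illustration of what a setup.py could pass to setup().

```python
# Sketch of the arguments a setup.py could pass to setuptools.setup().
# "mypackage" and the version number are placeholder values (assumptions).
SETUP_KWARGS = {
    "name": "mypackage",
    "version": "0.1",
    "packages": ["mypackage"],
    # Glob patterns are relative to the package directory, so this
    # pattern matches mypackage/data/data.txt.
    "package_data": {"mypackage": ["data/*.txt"]},
}

# In setup.py this would be used as:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Keeping the patterns relative to the package directory is what lets the same 'data/...' names be used later when looking the files up.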

You can then locate and use those files with pkg_resources, as per this link:

http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

import pkg_resources

DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')

Solution 2 - Python

You can use __file__ to get the path to the package, like this:

import os

# Directory containing this module, wherever the package is installed
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
with open(DATA_PATH) as f:
    print(f.read())
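The same idea can be sketched with pathlib. The helper name data_file is hypothetical, and the temporary directory only stands in for an installed package so the example can run on its own; inside a real module you would pass __file__ instead of the fake module path.

```python
import tempfile
from pathlib import Path


def data_file(module_file, *parts):
    """Return the path of a file under the data/ directory next to module_file."""
    return Path(module_file).resolve().parent.joinpath("data", *parts)


# Demonstration on a throwaway directory that mimics the package layout.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp)
    (package_dir / "data").mkdir()
    (package_dir / "data" / "data.txt").write_text("hello\n", encoding="utf-8")
    fake_module = package_dir / "module1.py"  # stands in for __file__

    contents = data_file(fake_module, "data.txt").read_text(encoding="utf-8")

print(contents, end="")
```

In module1.py the call would look like data_file(__file__, "data.txt"), which does not depend on the current working directory.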

Solution 3 - Python

There is often no point in posting an answer that details code that does not work as is, but I believe this to be an exception. Python 3.7 added importlib.resources, which is supposed to replace pkg_resources. It works for accessing files within packages as long as the resource name contains no slashes, i.e.

foo/
    __init__.py
    module1.py
    module2.py
    data/   
       data.txt
    data2.txt

Here you could access data2.txt inside package foo with, for example,

importlib.resources.open_binary('foo', 'data2.txt')

but it would fail with an exception for

>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data.txt' must be only a file name

With this API, the only workaround is to place an __init__.py in data and then use it as a package:

importlib.resources.open_binary('foo.data', 'data.txt')

The reason for this behaviour is "it is by design"; but the design might change...
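For readers on newer interpreters: assuming Python 3.9 or later, importlib.resources also offers files(), which returns a traversable object and can reach into a subdirectory that has no __init__.py. The sketch below builds a throwaway package (the name "demo_pkg" is made up) so that it runs on its own; it is not part of the original answer.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # assumes Python 3.9 or later
from pathlib import Path

# Build a throwaway package on disk; "demo_pkg" is a made-up name.
tmp = tempfile.TemporaryDirectory()
package_dir = Path(tmp.name) / "demo_pkg"
(package_dir / "data").mkdir(parents=True)
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "data" / "data.txt").write_bytes(b"hello\n")

# Make the new package importable.
sys.path.insert(0, tmp.name)
importlib.invalidate_caches()

# data/ has no __init__.py, yet the file can be reached by traversal.
resource = files("demo_pkg") / "data" / "data.txt"
payload = resource.read_bytes()
print(payload)

# Clean up the temporary package.
sys.path.remove(tmp.name)
tmp.cleanup()
```

With the layout from this answer, the equivalent lookup would be files('foo') / 'data' / 'data.txt'.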

Solution 4 - Python

Here is a solution that works today. Definitely use this API rather than reinventing all those wheels.

If a true filesystem filename is needed, use resource_filename. Zipped eggs will be extracted to a cache directory:

from pkg_resources import resource_filename, Requirement

path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

resource_stream returns a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.

from pkg_resources import resource_stream, Requirement

vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

Reference: Package Discovery and Resource Access using pkg_resources

Solution 5 - Python

You need a name for your whole package; the directory tree you gave doesn't list that detail. For me this worked:

import pkg_resources
print(    
    pkg_resources.resource_filename(__name__, 'data/data.txt')
)

Notably, setuptools does not appear to resolve files based on a name match with packed data files, so you will have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt') if you need alternate directory separators, though I generally find no compatibility problems with hard-coded Unix-style directory separators.

Solution 6 - Python

I think I hunted down an answer.

I made a module, data_path.py, which I import into my other modules. It contains:

import os
data_path = os.path.join(os.path.dirname(__file__), 'data')

And then I open all my files with

open(os.path.join(data_path,'filename'), <param>)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Jacob Lyles | View Question on Stackoverflow
Solution 1 - Python | elliot42 | View Answer on Stackoverflow
Solution 2 - Python | RichieHindle | View Answer on Stackoverflow
Solution 3 - Python | Antti Haapala -- Слава Україні | View Answer on Stackoverflow
Solution 4 - Python | Sascha Gottfried | View Answer on Stackoverflow
Solution 5 - Python | ThorSummoner | View Answer on Stackoverflow
Solution 6 - Python | Jacob Lyles | View Answer on Stackoverflow