How to add package data recursively in Python setup.py?
Python Problem Overview
I have a new library that has to include a lot of subfolders of small data files, and I'm trying to add them as package data. Imagine I have my library like so:
library
    - foo.py
    - bar.py
    data
        subfolderA
            subfolderA1
            subfolderA2
        subfolderB
            subfolderB1
        ...
I want to add all of the data in all of the subfolders through setup.py, but it seems I have to go into every single subfolder manually (there are 100 or so) and add an __init__.py file. Furthermore, will setup.py find these files recursively, or do I need to list every subfolder in setup.py, like this:
package_data={
    'mypackage.data.folderA': ['*'],
    'mypackage.data.folderA.subfolderA1': ['*'],
    'mypackage.data.folderA.subfolderA2': ['*']
},
I can do this with a script, but that seems like a super pain. How can I achieve this in setup.py?
PS: the hierarchy of these folders is important. This is a database of material files, and we want the file tree to be preserved when we present it to the user in a GUI, so it would be to our advantage to keep this file structure intact.
Python Solutions
Solution 1 - Python
The problem with the glob answer is that it only does so much, i.e. it's not fully recursive. The problem with the copy_tree answer is that the files that are copied will be left behind on an uninstall.
The proper solution is a recursive one which will let you set the package_data parameter in the setup call.
I've written this small function to do this:
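import os
from setuptools import setup

def package_files(directory):
    # Walk the tree and collect every file path. The '..' prefix steps out of
    # the package directory, so `directory` is given relative to setup.py
    # (assuming the package folder sits next to setup.py).
    paths = []
    for (path, directories, filenames) in os.walk(directory):
        for filename in filenames:
            paths.append(os.path.join('..', path, filename))
    return paths

extra_files = package_files('path_to/extra_files_dir')

setup(
    ...
    packages=['package_name'],
    package_data={'': extra_files},
    ...
)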
When you run pip uninstall package_name, you'll see your additional files listed (as tracked with the package).
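As a variation (a sketch, not part of the original answer), the paths can also be computed relative to the package directory with os.path.relpath, which avoids the '..' prefix. The name package_files_relative and the demo tree below are made up for illustration:

```python
import os
import tempfile

def package_files_relative(package_dir, data_subdir):
    """Collect every file under package_dir/data_subdir, relative to package_dir."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(package_dir, data_subdir)):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            paths.append(os.path.relpath(full_path, package_dir))
    return sorted(paths)

# Demo on a throwaway tree (hypothetical layout)
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "package_name")
    os.makedirs(os.path.join(pkg, "data", "subfolderA", "subfolderA1"))
    for parts in (("data", "top.txt"),
                  ("data", "subfolderA", "subfolderA1", "deep.txt")):
        with open(os.path.join(pkg, *parts), "w") as fh:
            fh.write("x")
    found = package_files_relative(pkg, "data")

print(found)
```

The resulting list could then be passed as package_data={'package_name': found}, since package_data patterns are interpreted relative to the package directory.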
Solution 2 - Python
- Use Setuptools instead of distutils.
- Use data files instead of package data. These do not require __init__.py.
- Generate the lists of files and directories using standard Python code, instead of writing it literally:

import glob

data_files = []
directories = glob.glob('data/subfolder?/subfolder??/')
for directory in directories:
    files = glob.glob(directory + '*')
    data_files.append((directory, files))
# then pass data_files to setup()
Solution 3 - Python
To add all the subfolders using package_data in setup.py, add one * entry per level of your subdirectory structure:
package_data={
    'mypackage.data.folderA': ['*', '*/*', '*/*/*'],
}
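If the depth of the tree isn't known up front, the pattern list can be generated instead of typed out. This is only a sketch: tree_depth and star_patterns are hypothetical helper names, and the temporary tree stands in for a real data folder:

```python
import os
import tempfile

def tree_depth(root):
    """Deepest level (files directly in root count as level 1) at which a file appears."""
    depth = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if filenames:
            rel = os.path.relpath(dirpath, root)
            level = 1 if rel == os.curdir else rel.count(os.sep) + 2
            depth = max(depth, level)
    return depth

def star_patterns(depth):
    """Build ['*', '*/*', ...] with one pattern per level up to depth."""
    return ["/".join(["*"] * level) for level in range(1, depth + 1)]

# Demo: one file two folders below the root needs three patterns
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "subfolderA1", "deeper"))
    with open(os.path.join(tmp, "subfolderA1", "deeper", "material.dat"), "w") as fh:
        fh.write("x")
    patterns = star_patterns(tree_depth(tmp))

print(patterns)
```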
Solution 4 - Python
Use a glob pattern to select all subfolders in your setup.py:
...
packages=['your_package'],
package_data={'your_package': ['data/**/*']},
...
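For comparison, here is how the standard library's glob module treats **. Whether setuptools interprets the same pattern recursively in package_data may depend on its version, so this only illustrates the pattern semantics, using a made-up temporary tree:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", "a", "b"))
    for parts in (("data", "top.txt"),
                  ("data", "a", "mid.txt"),
                  ("data", "a", "b", "deep.txt")):
        with open(os.path.join(tmp, *parts), "w") as fh:
            fh.write("x")

    pattern = os.path.join(tmp, "data", "**", "*")

    def files_matching(recursive):
        # Keep regular files only and normalise to '/' for display
        hits = glob.glob(pattern, recursive=recursive)
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/")
                      for p in hits if os.path.isfile(p))

    shallow = files_matching(recursive=False)  # '**' behaves like a single '*'
    deep = files_matching(recursive=True)      # '**' spans zero or more directories

print(shallow)
print(deep)
```

Without recursive=True only the files exactly one folder below data are found; with it, files at every depth are returned.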
Solution 5 - Python
@gbonetti's answer, using a recursive glob pattern, i.e. **, would be perfect.
However, as commented by @daniel-himmelstein, that does not work yet in setuptools package_data.
So, for the time being, I like to use the following workaround, based on pathlib's Path.glob():
from pathlib import Path

def glob_fix(package_name, glob):
    # this assumes setup.py lives in the folder that contains the package
    package_path = Path(f'./{package_name}').resolve()
    return [str(path.relative_to(package_path))
            for path in package_path.glob(glob)]
This returns a list of path strings relative to the package path, as required.
Here's one way to use this:
setuptools.setup(
    ...
    package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'),
                                 'my_other_dir/some.file', ...], ...},
    ...
)
The glob_fix() can be removed as soon as setuptools supports ** in package_data.
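One thing to be aware of: Path.glob() with a **/* pattern also yields the directories themselves, not just files. A hedged variant that keeps only regular files and returns forward-slash paths might look like this (glob_files is a hypothetical name, and the demo tree is invented):

```python
import tempfile
from pathlib import Path

def glob_files(package_dir, pattern):
    """Like glob_fix, but keep regular files only and use '/' separators."""
    package_path = Path(package_dir).resolve()
    return sorted(path.relative_to(package_path).as_posix()
                  for path in package_path.glob(pattern)
                  if path.is_file())

# Demo on a throwaway package layout
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_package"
    (pkg / "my_data_dir" / "sub").mkdir(parents=True)
    (pkg / "my_data_dir" / "a.dat").write_text("x")
    (pkg / "my_data_dir" / "sub" / "b.dat").write_text("x")
    data_patterns = glob_files(pkg, "my_data_dir/**/*")

print(data_patterns)
```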
Solution 6 - Python
If you don't mind getting your setup.py code dirty, use distutils.dir_util.copy_tree.
The whole problem is how to exclude files from it.
Here's some code:
import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup

__packagename__ = 'x'

setup(
    name = __packagename__,
    packages = [__packagename__],
)

# After setup() has run, copy the whole package tree into site-packages
destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)
dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)
Some notes:
You can keep the same setup(...) call, but use copy_tree() to extend the directory you want into the path of installation.
Solution 7 - Python
I can suggest a little code to add data_files in setup():
import os

data_files = []

# collect every file under <package>/static, grouped by directory
start_point = os.path.join(__pkgname__, 'static')
for root, dirs, files in os.walk(start_point):
    root_files = [os.path.join(root, i) for i in files]
    data_files.append((root, root_files))

# same again for <package>/templates
start_point = os.path.join(__pkgname__, 'templates')
for root, dirs, files in os.walk(start_point):
    root_files = [os.path.join(root, i) for i in files]
    data_files.append((root, root_files))

setup(
    name = __pkgname__,
    description = __description__,
    version = __version__,
    long_description = README,
    ...
    data_files = data_files,
)
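Since the two loops are identical apart from the start point, they could be folded into one helper. This is a sketch under my own assumptions: collect_data_files is a hypothetical name, and unlike the loops above it skips directories that contain no files:

```python
import os
import tempfile

def collect_data_files(*start_points):
    """Return (directory, [files]) tuples for every directory that holds files."""
    data_files = []
    for start_point in start_points:
        for root, _dirs, files in os.walk(start_point):
            if files:  # skip directories that contain no files
                data_files.append(
                    (root, [os.path.join(root, name) for name in sorted(files)]))
    return data_files

# Demo on a throwaway tree standing in for <package>/static and <package>/templates
with tempfile.TemporaryDirectory() as tmp:
    static = os.path.join(tmp, "static")
    templates = os.path.join(tmp, "templates")
    os.makedirs(os.path.join(static, "css"))
    os.makedirs(templates)
    for path in (os.path.join(static, "css", "site.css"),
                 os.path.join(templates, "base.html")):
        with open(path, "w") as fh:
            fh.write("x")
    collected = collect_data_files(static, templates)
    # Show directories relative to the temporary root for readability
    summary = [(os.path.relpath(d, tmp), [os.path.basename(f) for f in fs])
               for d, fs in collected]

print(summary)
```

In setup.py the call would then be something like data_files=collect_data_files(os.path.join(__pkgname__, 'static'), os.path.join(__pkgname__, 'templates')).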
Solution 8 - Python
> I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?
Here is a reusable, simple way:
Add the following function to your setup.py, and call it as per the Usage instructions. This is essentially the generic version of the accepted answer.
from glob import glob
from itertools import chain

# flatten is not defined in the original snippet; chain.from_iterable is assumed here
flatten = chain.from_iterable

def find_package_data(specs):
    """recursively find package data as per the folders given

    Usage:
        # in setup.py
        setup(...
              include_package_data=True,
              package_data=find_package_data({
                  'package': ('resources', 'static')
              }))

    Args:
        specs (dict): package => list of folder names to include files from

    Returns:
        dict of list of file names
    """
    return {
        package: list(''.join(n.split('/', 1)[1:]) for n in
                      flatten(glob('{}/{}/**/*'.format(package, f), recursive=True) for f in folders))
        for package, folders in specs.items()}
Solution 9 - Python
I'm going to throw my solution in here in case anyone is looking for a clean way to include their compiled sphinx docs as data_files.
setup.py:
from setuptools import setup
import pathlib
import os

here = pathlib.Path(__file__).parent.resolve()

# Get documentation files from the docs/build/html directory
documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if doc.is_file()]
data_docs = {}
for doc in documentation:
    doc_path = os.path.join("your_top_data_dir", "docs")
    path_parts = doc.parts[3:-1]  # remove "docs/build/html", ignore filename
    if path_parts:
        doc_path = os.path.join(doc_path, *path_parts)
    # create all appropriate subfolders and append relative doc path
    data_docs.setdefault(doc_path, []).append(str(doc))

setup(
    ...
    include_package_data=True,
    # <sys.prefix>/your_top_data_dir
    data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *list(data_docs.items())]
)
With the above solution, once you install your package you'll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs"). So, if you wanted to serve the now-static docs using nginx, you could add the following to your nginx file:
location /docs {
    # handle static files directly, without forwarding to the application
    alias /www/your_app_name/venv/your_top_data_dir/docs;
    expires 30d;
}
Once you've done that, you should be able to visit {your-domain.com}/docs and see your Sphinx documentation.
Solution 10 - Python
If you don't want to add custom code to iterate through the directory contents, you can use the pbr library, which extends setuptools. See the pbr documentation for how to use it to copy an entire directory while preserving the directory structure.
Solution 11 - Python
You need to write a function that returns all files and their paths. You can use the following:
import os

def sherinfind():
    # Add all folders that contain files or other subdirectories
    pathlist = ['templates/', 'scripts/']
    data = {}
    for path in pathlist:
        for root, d_names, f_names in os.walk(path, topdown=True, onerror=None, followlinks=False):
            data[root] = list()
            for f in f_names:
                data[root].append(os.path.join(root, f))
    # data_files expects a list of (directory, [files]) tuples
    fn = [(k, v) for k, v in data.items()]
    return fn
Now change data_files in setup() as follows:
data_files=sherinfind()