How to add package data recursively in Python setup.py?

PythonDistutilssetup.py

Python Problem Overview


I have a new library that has to include a lot of subfolders of small datafiles, and I'm trying to add them as package data. Imagine I have my library as so:

 library
    - foo.py
    - bar.py
 data
   subfolderA
      subfolderA1
      subfolderA2
   subfolderB
      subfolderB1 
      ...

I want to add all of the data in all of the subfolders through setup.py, but it seems like I manually have to go into every single subfolder (there are 100 or so) and add an init.py file. Furthermore, will setup.py find these files recursively, or do I need to manually add all of these in setup.py like:

package_data={
  'mypackage.data.folderA': ['*'],
  'mypackage.data.folderA.subfolderA1': ['*'],
  'mypackage.data.folderA.subfolderA2': ['*']
   },

I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?

PS, the hierarchy of these folders is important because this is a database of material files and we want the file tree to be preserved when we present them in a GUI to the user, so it would be to our advantage to keep this file structure intact.

Python Solutions


Solution 1 - Python

The problem with the glob answer is that it only does so much. I.e. it's not fully recursive. The problem with the copy_tree answer is that the files that are copied will be left behind on an uninstall.

The proper solution is a recursive one which will let you set the package_data parameter in the setup call.

I've written this small method to do this:

import os

def package_files(directory):
    paths = []
    for (path, directories, filenames) in os.walk(directory):
        for filename in filenames:
            paths.append(os.path.join('..', path, filename))
    return paths

extra_files = package_files('path_to/extra_files_dir')

setup(
    ...
    packages = ['package_name'],
    package_data={'': extra_files},
    ....
)

You'll notice that when you do a pip uninstall package_name, that you'll see your additional files being listed (as tracked with the package).

Solution 2 - Python

  1. Use Setuptools instead of distutils.

  2. Use data files instead of package data. These do not require __init__.py.

  3. Generate the lists of files and directories using standard Python code, instead of writing it literally:

     data_files = []
     directories = glob.glob('data/subfolder?/subfolder??/')
     for directory in directories:
         files = glob.glob(directory+'*')
         data_files.append((directory, files))
     # then pass data_files to setup()
    

Solution 3 - Python

To add all the subfolders using package_data in setup.py: add the number of * entries based on you subdirectory structure

package_data={
  'mypackage.data.folderA': ['*','*/*','*/*/*'],
}

Solution 4 - Python

Use glob to select all subfolders in your setup.py

...
packages=['your_package'],
package_data={'your_package': ['data/**/*']},
...

Solution 5 - Python

@gbonetti's answer, using a recursive glob pattern, i.e. **, would be perfect.

However, as commented by @daniel-himmelstein, that does not work yet in setuptools package_data.

So, for the time being, I like to use the following workaround, based on pathlib's Path.glob():

def glob_fix(package_name, glob):
    # this assumes setup.py lives in the folder that contains the package
    package_path = Path(f'./{package_name}').resolve()
    return [str(path.relative_to(package_path)) 
            for path in package_path.glob(glob)]

This returns a list of path strings relative to the package path, as required.

Here's one way to use this:

setuptools.setup(
    ...
    package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'), 
                                 'my_other_dir/some.file', ...], ...},
    ...
)

The glob_fix() can be removed as soon as setuptools supports ** in package_data.

Solution 6 - Python

If you don't have any problem with getting your setup.py code dirty use distutils.dir_util.copy_tree.
The whole problem is how to exclude files from it.
Heres some the code:

import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup

__packagename__ = 'x' 
setup(
    name = __packagename__,
    packages = [__packagename__],
)

destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)

dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)

Some Notes:

  • This code recursively copy the source code into the destination path.
  • You can just use the same setup(...) but use copy_tree() to extend the directory you want into the path of installation.
  • The default paths of distutil installation can be found in it's API.
  • More information about copy_tree() module of distutils can be found here.

  • Solution 7 - Python

    I can suggest a little code to add data_files in setup():

    data_files = []
    
    start_point = os.path.join(__pkgname__, 'static')
    for root, dirs, files in os.walk(start_point):
        root_files = [os.path.join(root, i) for i in files]
        data_files.append((root, root_files))
    
    start_point = os.path.join(__pkgname__, 'templates')
    for root, dirs, files in os.walk(start_point):
        root_files = [os.path.join(root, i) for i in files]
        data_files.append((root, root_files))
    
    setup(
        name = __pkgname__,
        description = __description__,
        version = __version__,
        long_description = README,
        ...
        data_files = data_files,
    )
    

    Solution 8 - Python

    > I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?

    Here is a reusable, simple way:

    Add the following function in your setup.py, and call it as per the Usage instructions. This is essentially the generic version of the accepted answer.

    def find_package_data(specs):
        """recursively find package data as per the folders given
    
        Usage:
            # in setup.py
            setup(...
                  include_package_data=True,
                  package_data=find_package_data({
                     'package': ('resources', 'static')
                  }))
    
        Args:
            specs (dict): package => list of folder names to include files from
    
        Returns:
            dict of list of file names
        """
        return {
            package: list(''.join(n.split('/', 1)[1:]) for n in
                          flatten(glob('{}/{}/**/*'.format(package, f), recursive=True) for f in folders))
            for package, folders in specs.items()}
    
    

    Solution 9 - Python

    I'm going to throw my solution in here in case anyone is looking for a clean way to include their compiled sphinx docs as data_files.

    setup.py

    from setuptools import setup
    import pathlib
    import os
    
    here = pathlib.Path(__file__).parent.resolve()
    
    # Get documentation files from the docs/build/html directory
    documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if pathlib.Path.is_file(doc)]
    data_docs = {}
    for doc in documentation:
        doc_path = os.path.join("your_top_data_dir", "docs")
        path_parts = doc.parts[3:-1]  # remove "docs/build/html", ignore filename
        if path_parts:
            doc_path = os.path.join(doc_path, *path_parts)
        # create all appropriate subfolders and append relative doc path
        data_docs.setdefault(doc_path, []).append(str(doc))
    
    setup(
        ...
        include_package_data=True,
        # <sys.prefix>/your_top_data_dir
        data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *list(data_docs.items())]
    )
    

    With the above solution, once you install your package you'll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs"). So, if you wanted to serve the now-static docs using nginx you could add the following to your nginx file:

    location /docs {
        # handle static files directly, without forwarding to the application
        alias /www/your_app_name/venv/your_top_data_dir/docs;
        expires 30d;
    }
    

    Once you've done that, you should be able to visit {your-domain.com}/docs and see your Sphinx documentation.

    Solution 10 - Python

    If you don't want to add custom code to iterate through the directory contents, you can use pbr library, which extends setuptools. See here for documentation on how to use it to copy an entire directory, preserving the directory structure:

    https://docs.openstack.org/pbr/latest/user/using.html#files

    Solution 11 - Python

    You need to write a function to return all files and its paths , you can use the following

    def sherinfind():
        # Add all folders contain files or other sub directories 
        pathlist=['templates/','scripts/']
        data={}        
        for path in pathlist:
            for root,d_names,f_names in os.walk(path,topdown=True, onerror=None, followlinks=False):
                data[root]=list()
                for f in f_names:
                    data[root].append(os.path.join(root, f))                
        
        fn=[(k,v) for k,v in data.items()]    
        return fn
    

    Now change the data_files in setup() as follows,

    data_files=sherinfind()
    

    Attributions

    All content for this solution is sourced from the original question on Stackoverflow.

    The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

    Content TypeOriginal AuthorOriginal Content on Stackoverflow
    QuestionDashing Adam HughesView Question on Stackoverflow
    Solution 1 - PythonSandy ChapmanView Answer on Stackoverflow
    Solution 2 - PythonKevinView Answer on Stackoverflow
    Solution 3 - PythonmaheshView Answer on Stackoverflow
    Solution 4 - PythongbonettiView Answer on Stackoverflow
    Solution 5 - PythondjvgView Answer on Stackoverflow
    Solution 6 - PythonHeartagramirView Answer on Stackoverflow
    Solution 7 - PythonStanView Answer on Stackoverflow
    Solution 8 - PythonmiraculixxView Answer on Stackoverflow
    Solution 9 - PythonCaffeinatedMikeView Answer on Stackoverflow
    Solution 10 - PythonokahilakView Answer on Stackoverflow
    Solution 11 - PythonsherinView Answer on Stackoverflow