Download all files in a path on Jupyter notebook server

Tags: Wget, Jupyter Notebook, Jupyter

Wget Problem Overview


As a user in a class that runs Jupyter notebooks for assignments, I have access to the assignments via the web interface. I assume the assignments are stored somewhere in my personal space on the server, and so I should be able to download them. How can I download all files that are in my personal user space? (e.g., wget)

Here's the path structure:

https://urltoserver/user/username

There are several directories: assignments, data, etc.

https://urltoserver/user/username/assignments

https://urltoserver/user/username/data

...

I want to download all the folders (recursively). Just enough that I can launch whatever I see online locally. If there are some forbidden folders, then ok, skip those and download the rest.

Please specify the command exactly, as I couldn't figure it out myself (I tried wget).

Wget Solutions


Solution 1 - Wget

Try running this as a separate cell in one of your notebooks:

!tar chvfz notebook.tar.gz *

If you want to include folders further up the tree, prepend ../ to the * once for every level up. The file notebook.tar.gz will be saved in the same folder as your notebook.
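The same archiving step can also be done without a shell escape, using only the Python standard library. This is an illustrative sketch, not part of the original answer: archive_dir is a hypothetical helper that mirrors `tar chvfz` (gzip compression, follow symlinks) while skipping the output file itself, so the archive never ends up inside itself:

```python
import os
import tarfile

def archive_dir(src_dir, out_path):
    """Create a gzipped tar of src_dir, skipping the output archive itself."""
    out_abs = os.path.abspath(out_path)
    # dereference=True follows symlinks, like the `h` flag in `tar chvfz`
    with tarfile.open(out_path, "w:gz", dereference=True) as tar:
        for name in sorted(os.listdir(src_dir)):
            full = os.path.join(src_dir, name)
            if os.path.abspath(full) == out_abs:
                continue  # don't put the archive inside itself
            tar.add(full, arcname=name)

# Usage in a notebook cell: archive_dir(".", "notebook.tar.gz")
```

The resulting notebook.tar.gz can then be downloaded through the Jupyter file browser like any other file.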

Solution 2 - Wget

I am taking Prof. Andrew Ng's Deeplearning.ai program via Coursera. The curriculum uses Jupyter Notebooks online. Along with the notebooks are folders with large files. Here's what I used to successfully download all assignments with the associated files and folders to my local Windows 10 PC.

Start with the following line of code, as suggested in the post by Serzhan Akhmetov above:

!tar cvfz allfiles.tar.gz *

This produces a tarball which, if small enough, can be downloaded from the Jupyter notebook itself and unpacked with 7-Zip. However, this course has individual files hundreds of MB in size and folders with hundreds of sample images, so the resulting tarball is too large to download via the browser.

So add one more line of code to split files into manageable chunk sizes as follows:

!split -b 50m allfiles.tar.gz allfiles.tar.gz.part.

This splits the archive into parts of 50 MB each (or your preferred size setting). Each part gets an extension like allfiles.tar.gz.part.xx. Download each part as before.

The final task is to untar the multi-part archive. This is very simple with 7-Zip. Just select the first file in the series for extraction with 7-Zip. This is the file named allfiles.tar.gz.part.aa for the example used. It will pull all the necessary parts together as long as they are in the same folder.
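If 7-Zip isn't available, the parts can also be reassembled with `cat allfiles.tar.gz.part.* > allfiles.tar.gz` on Linux/macOS, or with a few lines of Python on any platform. This is a sketch, not part of the original answer; join_parts is a hypothetical helper, and the prefix argument must match the one passed to split above:

```python
import glob

def join_parts(prefix, out_path):
    """Reassemble split(1) output (prefix.aa, prefix.ab, ...) into one file."""
    parts = sorted(glob.glob(prefix + "*"))  # split names parts in sort order
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                out.write(f.read())

# Usage: join_parts("allfiles.tar.gz.part.", "allfiles.tar.gz")
```

After joining, extract the archive as usual (e.g. `tar xvfz allfiles.tar.gz`).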

Hope this helps add to Serzhan's excellent answer above.

Solution 3 - Wget

You can create a new terminal from the "New" menu and call the command described on https://stackoverflow.com/a/47355754/8554972:

tar cvfz notebook.tar.gz *

The file notebook.tar.gz will be saved in the same folder as your notebook.

Solution 4 - Wget

The easiest way is to archive all content using tar, but there is also an API for downloading files:

GET	/files/_FILE_PATH_

To list all files in a folder you can use:

GET	/api/contents/work

Example:

curl "https://server/api/contents?token=your_token"
curl "https://server/files/path/to/file.txt?token=your_token" --output some.file

Source: Jupyter Docs
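Putting the two endpoints together, a recursive downloader can be sketched with only the standard library. This is an illustrative sketch, not from the original answer: BASE and TOKEN are placeholders you must fill in, api_url and download_all are hypothetical names, and it assumes token authentication plus the standard contents-API response fields (type, path, content):

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

BASE = "https://urltoserver/user/username"  # placeholder: your server URL
TOKEN = "your_token"                        # placeholder: token from the Jupyter UI

def api_url(base, path, token):
    """Build a contents-API URL for the given path."""
    return f"{base}/api/contents/{urllib.parse.quote(path)}?token={token}"

def download_all(path=""):
    """Recursively mirror everything under `path` into the current directory."""
    with urllib.request.urlopen(api_url(BASE, path, TOKEN)) as r:
        listing = json.load(r)
    for item in listing["content"]:
        if item["type"] == "directory":
            download_all(item["path"])
        else:
            url = f"{BASE}/files/{urllib.parse.quote(item['path'])}?token={TOKEN}"
            os.makedirs(os.path.dirname(item["path"]) or ".", exist_ok=True)
            try:
                with urllib.request.urlopen(url) as r, open(item["path"], "wb") as f:
                    f.write(r.read())
            except urllib.error.HTTPError:
                pass  # skip forbidden files and keep going
```

Forbidden folders simply raise an HTTPError and are skipped, matching the behaviour the question asks for.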

Solution 5 - Wget

Try first to get the directory by:

import os
os.getcwd()

Then use the snippet from https://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory to zip that directory, so you can download the complete directory as a single file. Good luck!
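The linked snippet boils down to the standard library's shutil.make_archive. A minimal sketch (zip_dir is a hypothetical wrapper, not from the original answer):

```python
import shutil

def zip_dir(src_dir, out_base):
    """Create <out_base>.zip containing everything under src_dir; returns the archive path."""
    return shutil.make_archive(out_base, "zip", root_dir=src_dir)

# Usage in a notebook cell:
#   import os
#   zip_dir(os.getcwd(), "everything")   # creates everything.zip next to the notebook
```

Writing the archive outside the directory being zipped (or under a different name) avoids including a partial copy of the archive in itself.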

Solution 6 - Wget

I don't think this is possible with wget, even with the wget -r option. You may have to download the files individually, using the Download option in the dashboard view (available only for single, non-directory, non-running notebook items), if that option is available to you.

However, it is likely that you are not able to download them at all: if your teacher is using grading software such as nbgrader, giving students direct access to the notebook files is undesirable, since the notebooks can contain information about the answers.

Solution 7 - Wget

from google.colab import files

files.download("/content/data.txt")

These lines work if you are in Google Colab (the google.colab module is not available in a plain Jupyter notebook).

The first line imports the files helper; the second downloads your file, e.g. "data.txt" (your file name), located inside the /content folder.

Solution 8 - Wget

I've made a slight update to @Sun Bee's solution: it adds a timestamp suffix, so you can keep multiple backups.

!tar cvfz allfiles-`date +"%Y%m%d-%H%M"`.tar.gz *

Solution 9 - Wget

You just need to run:

zip -r filename.zip folder_name

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
--- | --- | ---
Question | Ali | View Question on Stackoverflow
Solution 1 - Wget | Serzhan Akhmetov | View Answer on Stackoverflow
Solution 2 - Wget | Sun Bee | View Answer on Stackoverflow
Solution 3 - Wget | Rafael Miquelino | View Answer on Stackoverflow
Solution 4 - Wget | Alexander | View Answer on Stackoverflow
Solution 5 - Wget | Eddmik | View Answer on Stackoverflow
Solution 6 - Wget | Louise Davies | View Answer on Stackoverflow
Solution 7 - Wget | Santiago Torres | View Answer on Stackoverflow
Solution 8 - Wget | eric | View Answer on Stackoverflow
Solution 9 - Wget | Meet Patel | View Answer on Stackoverflow