Download all files in a path on Jupyter notebook server
Problem Overview
As a user in a class that runs Jupyter notebooks for assignments, I have access to the assignments via the web interface. I assume the assignments are stored somewhere in my personal space on the server, so I should be able to download them. How can I download all files in my personal user space (e.g., with wget)?
Here's the path structure:
https://urltoserver/user/username
There are several directories: assignments, data, etc.
https://urltoserver/user/username/assignments
https://urltoserver/user/username/data
...
I want to download all the folders (recursively). Just enough that I can launch whatever I see online locally. If there are some forbidden folders, then ok, skip those and download the rest.
Please specify the command exactly, as I couldn't figure it out myself (I tried wget).
Wget Solutions
Solution 1 - Wget
Try running this as a separate cell in one of your notebooks:
!tar chvfz notebook.tar.gz *
If you want to cover more folders up the tree, prepend ../ to the * once for every step up the directory. The file notebook.tar.gz will be saved in the same folder as your notebook.
Solution 2 - Wget
I am taking Prof. Andrew Ng's Deeplearning.ai program via Coursera. The curriculum uses Jupyter Notebooks online. Along with the notebooks are folders with large files. Here's what I used to successfully download all assignments with the associated files and folders to my local Windows 10 PC.
Start with the following line of code as suggested in the post by Serzan Akhmetov above:
!tar cvfz allfiles.tar.gz *
This produces a tarball which, if small enough, can be downloaded from the Jupyter notebook itself and extracted with 7-Zip. However, this course has individual files hundreds of MB in size and folders with hundreds of sample images, so the resulting tarball is too large to download via the browser.
So add one more line of code to split files into manageable chunk sizes as follows:
!split -b 50m allfiles.tar.gz allfiles.tar.gz.part.
This will split the archive into multiple parts, each of size 50 MB (or your preferred size setting). Each part will have an extension like allfiles.tar.gz.part.xx. Download each part as before.
The final task is to untar the multi-part archive. This is very simple with 7-Zip. Just select the first file in the series for extraction with 7-Zip. This is the file named allfiles.tar.gz.part.aa
for the example used. It will pull all the necessary parts together as long as they are in the same folder.
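If you are not on Windows, the parts can also be rejoined with cat allfiles.tar.gz.part.* > allfiles.tar.gz, or with a short Python sketch like the one below (the file names are just the ones from this example; adjust to match yours):

```python
import glob
import shutil

def reassemble(pattern, out_path):
    """Concatenate split parts (sorted: .aa, .ab, ...) back into one file."""
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(pattern)):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

# Example: rebuild the tarball from its parts, then extract it as usual.
# reassemble("allfiles.tar.gz.part.*", "allfiles.tar.gz")
```

Sorting the glob matches lexicographically is what makes this work: split names its chunks .aa, .ab, .ac, ... precisely so that alphabetical order is the original order.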
Hope this helps add to Serzan's excellent answer above.
Solution 3 - Wget
You can create a new terminal from the "New" menu and run the command described in https://stackoverflow.com/a/47355754/8554972:
tar cvfz notebook.tar.gz *
The file notebook.tar.gz will be saved in the same folder as your notebook.
Solution 4 - Wget
The easiest way is to archive all content using tar, but Jupyter also provides an HTTP API for downloading files:
GET /files/_FILE_PATH_
To list all files in a folder, you can use:
GET /api/contents/work
Example:
curl "https://server/api/contents?token=your_token"
curl "https://server/files/path/to/file.txt?token=your_token" --output some.file
(The URLs are quoted so the shell doesn't interpret the ? as a glob character.)
Source: Jupyter Docs
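For many files, the listing and download steps can be scripted. Below is a minimal sketch: the server URL and token are placeholders, and it assumes the standard /api/contents JSON shape (entries with "type", "path", and, for directories, a "content" list). The fetcher is passed in as a callable so the tree walk itself stays easy to test:

```python
import json
import urllib.parse
import urllib.request

def walk_contents(fetch, path=""):
    """Recursively yield file paths from a Jupyter /api/contents listing.

    `fetch(path)` must return the parsed JSON for that path.
    Forbidden folders could be skipped here by catching fetch errors."""
    entry = fetch(path)
    if entry["type"] == "directory":
        for child in entry["content"]:
            yield from walk_contents(fetch, child["path"])
    else:
        yield entry["path"]

def make_fetch(base_url, token):
    """Build a fetcher for a real server (base_url and token are placeholders)."""
    def fetch(path):
        url = f"{base_url}/api/contents/{urllib.parse.quote(path)}?token={token}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch

# Usage sketch (not run here): each yielded path can then be fetched
# from /files/<path>?token=... as shown in the curl examples above.
# for p in walk_contents(make_fetch("https://server", "your_token")):
#     print(p)
```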
Solution 5 - Wget
First, get the current working directory:
import os
os.getcwd()
Then use the snippet from https://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory: you can download the complete directory by zipping it first. Good luck!
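As a sketch of that approach, the standard library can build the zip in one call; the function and archive names below are just for illustration:

```python
import shutil

def zip_directory(src_dir, archive_name):
    """Zip src_dir into <archive_name>.zip and return the archive's path.

    Put archive_name outside src_dir so the zip doesn't include itself.
    """
    return shutil.make_archive(archive_name, "zip", src_dir)

# Example: pack the notebook's working directory for download.
# import os
# zip_directory(os.getcwd(), "/tmp/notebook_backup")
```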
Solution 6 - Wget
I don't think this is possible with wget, even with the wget -r option. You may have to download the files individually, using the Download option in the dashboard view (which is only available for single, non-directory, non-running notebook items), if that is available to you.
However, it is likely that you will not be able to download them at all: if your teacher uses grading software such as nbgrader, then students having access to the notebooks themselves is undesirable, since the notebooks can contain information about the answers.
Solution 7 - Wget
from google.colab import files
files.download("/content/data.txt")
These lines work if you are in Google Colab (the google.colab module is not available in a plain Jupyter notebook). The first line imports the files library; the second downloads your file, e.g. "data.txt", located inside the content folder.
Solution 8 - Wget
I've made a slight update based on @Sun Bee's solution; it will let you create multiple backups with a timestamp suffix.
!tar cvfz allfiles-`date +"%Y%m%d-%H%M"`.tar.gz *
Solution 9 - Wget
You just need to run:
zip -r filename.zip folder_name