How does HTTP file upload work?

HttpFile Upload

Http Problem Overview


When I submit a simple form like this with a file attached:

<form enctype="multipart/form-data" action="http://localhost:3000/upload?upload_progress_id=12344" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="100000" />
Choose a file to upload: <input name="uploadedfile" type="file" /><br />
<input type="submit" value="Upload File" />
</form>

How does it send the file internally? Is the file sent as part of the HTTP body as data? In the headers of this request, I don't see anything related to the name of the file.

I just would like the know the internal workings of the HTTP when sending a file.

Http Solutions


Solution 1 - Http

Let's take a look at what happens when you select a file and submit your form (I've truncated the headers for brevity):

POST /upload?upload_progress_id=12344 HTTP/1.1
Host: localhost:3000
Content-Length: 1325
Origin: http://localhost:3000
... other headers ...
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L

------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="MAX_FILE_SIZE"

100000
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
Content-Type: application/x-object

... contents of file goes here ...
------WebKitFormBoundaryePkpFF7tjBAqx29L--

NOTE: each boundary string must be prefixed with an extra --, just like in the end of the last boundary string. The example above already includes this, but it can be easy to miss. See comment by @Andreas below.

Instead of URL encoding the form parameters, the form parameters (including the file data) are sent as sections in a multipart document in the body of the request.

In the example above, you can see the input MAX_FILE_SIZE with the value set in the form, as well as a section containing the file data. The file name is part of the Content-Disposition header.

The full details are here.

Solution 2 - Http

> How does it send the file internally?

The format is called multipart/form-data, as asked at: https://stackoverflow.com/questions/4526273/what-does-enctype-multipart-form-data-mean

I'm going to:

  • add some more HTML5 references
  • explain why he is right with a form submit example

HTML5 references

There are three possibilities for enctype:

How to generate the examples

Once you see an example of each method, it becomes obvious how they work, and when you should use each one.

You can produce examples using:

Save the form to a minimal .html file:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8"/>
  <title>upload</title>
</head>
<body>
  <form action="http://localhost:8000" method="post" enctype="multipart/form-data">
  <p><input type="text" name="text1" value="text default">
  <p><input type="text" name="text2" value="a&#x03C9;b">
  <p><input type="file" name="file1">
  <p><input type="file" name="file2">
  <p><input type="file" name="file3">
  <p><button type="submit">Submit</button>
</form>
</body>
</html>

We set the default text value to a&#x03C9;b, which means aωb because ω is U+03C9, which are the bytes 61 CF 89 62 in UTF-8.

Create files to upload:

echo 'Content of a.txt.' > a.txt

echo '<!DOCTYPE html><title>Content of a.html.</title>' > a.html

# Binary file containing 4 bytes: 'a', 1, 2 and 'b'.
printf 'a\xCF\x89b' > binary

Run our little echo server:

while true; do printf '' | nc -l 8000 localhost; done

Open the HTML on your browser, select the files and click on submit and check the terminal.

nc prints the request received.

Tested on: Ubuntu 14.04.3, nc BSD 1.105, Firefox 40.

multipart/form-data

Firefox sent:

POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150
Content-Length: 834

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text1"

text default
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text2"

aωb
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file1"; filename="a.txt"
Content-Type: text/plain

Content of a.txt.

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file2"; filename="a.html"
Content-Type: text/html

<!DOCTYPE html><title>Content of a.html.</title>

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file3"; filename="binary"
Content-Type: application/octet-stream

aωb
-----------------------------735323031399963166993862150--

For the binary file and text field, the bytes 61 CF 89 62 (aωb in UTF-8) are sent literally. You could verify that with nc -l localhost 8000 | hd, which says that the bytes:

61 CF 89 62

were sent (61 == 'a' and 62 == 'b').

Therefore it is clear that:

  • Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.

    But note that the:

    boundary=---------------------------735323031399963166993862150
    

    has two less dadhes -- than the actual barrier

    -----------------------------735323031399963166993862150
    

    This is because the standard requires the boundary to start with two dashes --. The other dashes appear to be just how Firefox chose to implement the arbitrary boundary. RFC 7578 clearly mentions that those two leading dashes -- are required:

> 4.1. "Boundary" Parameter of multipart/form-data

> As with other multipart types, the parts are delimited with a boundary delimiter, constructed using CRLF, "--", and the value of the "boundary" parameter.

application/x-www-form-urlencoded

Now change the enctype to application/x-www-form-urlencoded, reload the browser, and resubmit.

Firefox sent:

POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: application/x-www-form-urlencoded
Content-Length: 51

text1=text+default&text2=a%CF%89b&file1=a.txt&file2=a.html&file3=binary

Clearly the file data was not sent, only the basenames. So this cannot be used for files.

As for the text field, we see that usual printable characters like a and b were sent in one byte, while non-printable ones like 0xCF and 0x89 took up 3 bytes each: %CF%89!

Comparison

File uploads often contain lots of non-printable characters (e.g. images), while text forms almost never do.

From the examples we have seen that:

  • multipart/form-data: adds a few bytes of boundary overhead to the message, and must spend some time calculating it, but sends each byte in one byte.

  • application/x-www-form-urlencoded: has a single byte boundary per field (&), but adds a linear overhead factor of 3x for every non-printable character.

Therefore, even if we could send files with application/x-www-form-urlencoded, we wouldn't want to, because it is so inefficient.

But for printable characters found in text fields, it does not matter and generates less overhead, so we just use it.

Solution 3 - Http

Send file as binary content (upload without form or FormData)

In the given answers/examples the file is (most likely) uploaded with a HTML form or using the FormData API. The file is only a part of the data sent in the request, hence the multipart/form-data Content-Type header.

If you want to send the file as the only content then you can directly add it as the request body and you set the Content-Type header to the MIME type of the file you are sending. The file name can be added in the Content-Disposition header. You can upload like this:

var xmlHttpRequest = new XMLHttpRequest();

var file = ...file handle...
var fileName = ...file name...
var target = ...target...
var mimeType = ...mime type...

xmlHttpRequest.open('POST', target, true);
xmlHttpRequest.setRequestHeader('Content-Type', mimeType);
xmlHttpRequest.setRequestHeader('Content-Disposition', 'attachment; filename="' + fileName + '"');
xmlHttpRequest.send(file);

If you don't (want to) use forms and you are only interested in uploading one single file this is the easiest way to include your file in the request.

Solution 4 - Http

I have this sample Java Code:

import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class TestClass {
    public static void main(String[] args) throws IOException {
        ServerSocket socket = new ServerSocket(8081);
        Socket accept = socket.accept();
        InputStream inputStream = accept.getInputStream();
        
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
        char readChar;
        while ((readChar = (char) inputStreamReader.read()) != -1) {
            System.out.print(readChar);
        }
        
        inputStream.close();
        accept.close();
        System.exit(1);
    }
}

and I have this test.html file:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>File Upload!</title>
</head>
<body>
<form method="post" action="http://localhost:8081" enctype="multipart/form-data">
    <input type="file" name="file" id="file">
    <input type="submit">
</form>
</body>
</html>

and finally the file I will be using for testing purposes, named a.dat has the following content:

0x39 0x69 0x65

if you interpret the bytes above as ASCII or UTF-8 characters, they will actually will be representing:

9ie

So let 's run our Java Code, open up test.html in our favorite browser, upload a.dat and submit the form and see what our server receives:

POST / HTTP/1.1
Host: localhost:8081
Connection: keep-alive
Content-Length: 196
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: null
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary06f6g54NVbSieT6y
DNT: 1
Accept-Encoding: gzip, deflate
Accept-Language: en,en-US;q=0.8,tr;q=0.6
Cookie: JSESSIONID=27D0A0637A0449CF65B3CB20F40048AF

------WebKitFormBoundary06f6g54NVbSieT6y
Content-Disposition: form-data; name="file"; filename="a.dat"
Content-Type: application/octet-stream

9ie
------WebKitFormBoundary06f6g54NVbSieT6y--

Well I am not surprised to see the characters 9ie because we told Java to print them treating them as UTF-8 characters. You may as well choose to read them as raw bytes..

Cookie: JSESSIONID=27D0A0637A0449CF65B3CB20F40048AF 

is actually the last HTTP Header here. After that comes the HTTP Body, where meta and contents of the file we uploaded actually can be seen.

Solution 5 - Http

> An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

http://www.tutorialspoint.com/http/http_messages.htm

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Question0xSinaView Question on Stackoverflow
Solution 1 - HttptoddsundstedView Answer on Stackoverflow
Solution 2 - HttpCiro Santilli Путлер Капут 六四事View Answer on Stackoverflow
Solution 3 - HttpWiltView Answer on Stackoverflow
Solution 4 - HttpKoray TugayView Answer on Stackoverflow
Solution 5 - Httpflagg19View Answer on Stackoverflow