What does IFS= do in this bash loop: `cat file | while IFS= read -r line; do ... done`


Bash Problem Overview


I'm learning bash and I saw this construction:

cat file | while IFS= read -r line;
do
    ...
done

Can anyone explain what IFS= does? I know it's input field separator, but why is it being set to nothing?

Bash Solutions


Solution 1 - Bash

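IFS does many things, but you are asking about that particular loop.

The effect in that loop is to preserve leading and trailing white space in line. By default, read strips any leading and trailing characters that appear in IFS (space, tab and newline); with IFS set to the empty string, there is nothing to strip. To illustrate, first observe with IFS set to nothing: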

$ echo " this   is a test " | while IFS= read -r line; do echo "=$line=" ; done
= this   is a test =

The line variable contains all the white space it received on its stdin. Now, consider the same statement with the default IFS:

$ echo " this   is a test " | while read -r line; do echo "=$line=" ; done
=this   is a test=

In this version, the white space internal to the line is still preserved, but the leading and trailing white space has been removed.
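The same comparison can be run without a pipeline, which also makes it easy to check that the IFS= prefix only applies to the read command it precedes. This is a minimal sketch; the variable names (input, kept, trimmed) are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample line with three spaces on each side.
input='   padded value   '

# IFS is emptied only for this one read command.
IFS= read -r kept <<< "$input"

# Default IFS: leading and trailing white space is stripped.
read -r trimmed <<< "$input"

printf 'kept=[%s]\n' "$kept"        # kept=[   padded value   ]
printf 'trimmed=[%s]\n' "$trimmed"  # trimmed=[padded value]

# The shell's own IFS still holds its default value (space, tab, newline).
[[ $IFS == $' \t\n' ]] && echo "IFS unchanged"
```

Because the assignment is written as a prefix to read, there is no need to save and restore IFS around the loop.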

What does -r do in read -r?

The -r option prevents read from treating backslash as a special character.

To illustrate, we use two echo commands that supply two lines to the while loop. Observe what happens with -r:

$ { echo 'this \\ line is \' ; echo 'continued'; } | while IFS= read -r line; do echo "=$line=" ; done
=this \\ line is \=
=continued=

Now, observe what happens without -r:

$ { echo 'this \\ line is \' ; echo 'continued'; } | while IFS= read line; do echo "=$line=" ; done
=this \ line is continued=

Without -r, two changes happened. First, the double-backslash was converted to a single backslash. Second, the backslash on the end of the first line was interpreted as a line-continuation character and the two lines were merged into one.

In sum, if you want backslashes in the input to have special meaning, don't use -r. If you want backslashes in the input to be taken as plain characters, then use -r.
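A case where this matters in practice is input that contains literal backslashes, such as a Windows-style path. The sketch below uses a made-up path and made-up variable names (path_line, raw, cooked) to show the difference.

```shell
#!/usr/bin/env bash
# Hypothetical input line containing literal backslashes.
path_line='C:\temp\new'

# With -r, backslashes are kept as ordinary characters.
IFS= read -r raw <<< "$path_line"

# Without -r, each backslash is removed and the character after it is kept.
IFS= read cooked <<< "$path_line"

printf '%s\n' "$raw"     # C:\temp\new
printf '%s\n' "$cooked"  # C:tempnew
```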

Multiple lines of input

Since read takes input one line at a time, IFS affects each line of multiple-line input in the same way that it affects single-line input. -r behaves similarly, with the exception that, without -r, multiple lines can be combined into one line using the trailing backslash as shown above.

The behavior with multiple line input, however, can be changed drastically using read's -d flag. -d changes the delimiter character that read uses to mark the end of an input line. For example, we can terminate lines with a tab character:

$ echo $'line one \n line\t two \n line three\t ends here'
line one 
 line    two 
 line three      ends here
$ echo $'line one \n line\t two \n line three\t ends here' | while IFS= read -r -d$'\t' line; do echo "=$line=" ; done
=line one 
 line=
= two 
 line three=

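Here, the $'...' construct was used to enter special characters such as newline (\n) and tab (\t). Observe that with -d$'\t', read divides its input into "lines" based on tab characters. The text after the final tab (" ends here") is never printed: read reaches the end of input without finding a delimiter and returns a non-zero status, which ends the loop before the body runs for that last piece.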
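The text after the final delimiter never reaches the loop body because read returns a non-zero status at end of input, even though it has already stored that text in the variable. If the last piece should be processed too, a common idiom is to add a test for a non-empty variable to the loop condition. A minimal sketch, with made-up names (chunk, chunks) and made-up input:

```shell
#!/usr/bin/env bash
chunks=()

# read fails on the unterminated last piece but still fills chunk,
# so the extra [[ -n $chunk ]] test lets the body run one more time.
while IFS= read -r -d $'\t' chunk || [[ -n $chunk ]]; do
    chunks+=("$chunk")
done < <(printf 'a\tb\tc')

printf '%s\n' "${#chunks[@]}"   # 3
```

Without the || [[ -n $chunk ]] part, the array would hold only a and b.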

How to handle the most difficult file names

The most important use of the features described above is to process difficult file names. Since the one character that cannot appear in path/filenames is the null character, the null character can be used to separate a list of file names. As an example:

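# In bash, $'\0' expands to an empty string, so -d $'\0' is the same as -d ''.
# An empty delimiter tells read to stop at each null character.
while IFS= read -r -d $'\0' file
do
    # do something to each file
done < <(find ~/music -type f -print0)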
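To try the loop without touching real data, the sketch below creates a throwaway directory holding a few awkward file names (a leading space, an embedded newline) and counts what the loop sees. The directory, the file names, and the count variable are invented for this illustration, and it assumes a find that supports -print0.

```shell
#!/usr/bin/env bash
# Scratch directory with deliberately awkward file names.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" \
      "$tmpdir/ leading space.txt" \
      "$tmpdir/"$'two\nlines.txt'

count=0
# -d '' makes read stop at each null character emitted by -print0.
while IFS= read -r -d '' file; do
    count=$((count + 1))
    printf 'found: %q\n' "$file"   # %q prints the name in a shell-quoted form
done < <(find "$tmpdir" -type f -print0)

printf 'count=%d\n' "$count"   # count=3

rm -rf "$tmpdir"
```

Feeding the loop through process substitution rather than a pipe keeps it in the current shell, so count is still set after the loop ends.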

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | bodacydo | View Question on Stackoverflow
Solution 1 - Bash | John1024 | View Answer on Stackoverflow