How to split a delimited string into an array in awk?

Awk Problem Overview


How do I split a string that contains pipe symbols (|)? I want to split it into an array.

I tried

echo "12:23:11" | awk '{split($0,a,":"); print a[3] a[2] a[1]}'

Which works fine. If my string is like "12|23|11" then how do I split them into an array?

Awk Solutions


Solution 1 - Awk

Have you tried:

echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'
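One caveat, sketched here as an aside beyond the original question: a one-character separator such as "|" is taken literally by split(), but a longer separator string is interpreted as a regular expression, where | means alternation. The input a||b||c below is made up for illustration, and the bracket expression is just one way to match the pipes literally.

```shell
# Hypothetical input that uses a two-character "||" delimiter.
# "[|][|]" matches two literal pipes; a bare "||" would be read as a regex alternation.
fields=$(printf '%s\n' 'a||b||c' | awk '{
    n = split($0, parts, "[|][|]")   # n is the number of pieces
    print n, parts[1], parts[2], parts[3]
}')
echo "$fields"
```

Here n is 3 and the pieces are a, b and c.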

Solution 2 - Awk

To split a string to an array in awk we use the function split():

awk '{split($0, array, ":")}'
#           \/  \___/  \_/
#           |     |     |
#       string    |     delimiter
#                 |
#               array to store the pieces

If no separator is given, it uses the FS, which defaults to the space:

$ awk '{split($0, array); print array[2]}' <<< "a:b c:d e"
c:d
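With the default FS, split() behaves like default field splitting: a run of blanks counts as one separator, and leading and trailing blanks are ignored. A minimal sketch, using a made-up space-padded input:

```shell
# Made-up input with leading, trailing and repeated spaces.
summary=$(printf '%s\n' '  a:b   c:d  e ' | awk '{
    n = split($0, array)             # no third argument, so FS (a single space) is used
    print n, array[1], array[3]
}')
echo "$summary"
```

The padding produces no empty elements: split() returns 3, with a:b as the first piece and e as the third.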

We can give a separator, for example ":":

$ awk '{split($0, array, ":"); print array[2]}' <<< "a:b c:d e"
b c

Which is equivalent to setting it through the FS:

$ awk -F: '{split($0, array); print array[2]}' <<< "a:b c:d e"
b c

The separator can also be given as a regexp; this is not specific to GNU Awk:

$ awk '{split($0, array, ":+"); print array[2]}' <<< "a:::b c::d e"   # note the repeated colons
b c

In GNU Awk you can even see what the delimiter was at every step by using the fourth parameter of split():

$ awk '{split($0, array, ":+", sep); print array[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::

Let's quote the man page of GNU awk:

> split(string, array [, fieldsep [, seps ] ])
>
> Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
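Because split() returns the number of elements it created, the reversal from the question does not have to hard-code three indexes. A minimal sketch, with a hypothetical four-field input and illustrative variable names (parts, out):

```shell
# Hypothetical input with four fields instead of three.
reversed=$(printf '%s\n' '12|23|11|07' | awk '{
    n = split($0, parts, "|")        # n = number of pieces found
    out = ""
    for (i = n; i >= 1; i--)         # walk the array from the last piece to the first
        out = out parts[i]
    print out
}')
echo "$reversed"
```

The same loop works unchanged for any number of fields.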

Solution 3 - Awk

Please be more specific! What do you mean by "it doesn't work"? Post the exact output (or error message), your OS and awk version:

% awk -F\| '{
  for (i = 0; ++i <= NF;)
    print i, $i
  }' <<<'12|23|11'
1 12
2 23
3 11

Or, using split:

% awk '{
  n = split($0, t, "|")
  for (i = 0; ++i <= n;)
    print i, t[i]
  }' <<<'12|23|11'
1 12
2 23
3 11

Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.

Solution 4 - Awk

I do not like the echo "..." | awk ... solution, as it makes unnecessary fork and exec system calls.

I prefer Dimitre's solution with a little twist:

awk -F\| '{print $3 $2 $1}' <<<'12|23|11'

Or a bit shorter version:

awk -F\| '$0=$3 $2 $1' <<<'12|23|11'

Here the assignment rebuilds the record as the concatenation of $3, $2 and $1. The assigned value, 112312, is non-empty and non-zero, so the pattern is true and the record gets printed.
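The pattern-only form relies on the rebuilt record being a true value, so a record that ends up empty is silently dropped. The sketch below compares it with a more explicit variant on made-up input that contains one empty line; the trailing pattern 1 is the usual "always print" idiom.

```shell
# Shorthand: the rebuilt record doubles as the pattern, so an empty result is not printed.
short=$(printf '%s\n' '12|23|11' '' '7|8|9' |
    awk -F'|' '$0 = $3 $2 $1' | awk 'END { print NR }')

# Explicit form: rebuild the record in an action, then print every record via the pattern 1.
full=$(printf '%s\n' '12|23|11' '' '7|8|9' |
    awk -F'|' '{ $0 = $3 $2 $1 } 1' | awk 'END { print NR }')

echo "shorthand printed $short lines, explicit form printed $full lines"
```

Which behaviour is wanted depends on whether empty records should survive; the explicit form simply makes that choice visible.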

In this specific case the stdin redirection can be avoided by passing the string in as an awk variable:

awk -v T='12|23|11' 'BEGIN{split(T,a,"|");print a[3] a[2] a[1]}'

I used ksh for quite a while, but in bash this can also be done with the shell's own string manipulation. In the first case the original string is split at the separator using parameter expansion. In the second case it is assumed that the string always consists of two-digit pairs separated by a one-character separator.

T='12|23|11';echo -n ${T##*|};T=${T%|*};echo ${T#*|}${T%|*}
T='12|23|11';echo ${T:6}${T:3:2}${T:0:2}
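A third shell-only variant, sketched here as an alternative rather than as part of the original answer, lets the shell split the string on IFS into the positional parameters. It should work in any POSIX shell; old_ifs and result are illustrative names.

```shell
T='12|23|11'
old_ifs=$IFS        # remember the current field separator
IFS='|'
set -f              # disable globbing while the unquoted expansion is split
set -- $T           # positional parameters become 12, 23 and 11
set +f
IFS=$old_ifs        # restore the previous field separator
result="$3$2$1"
echo "$result"
```

Note that set -- overwrites the script's own positional parameters, so inside a larger script this is best done in a function or subshell.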

The result in all cases is

112312

Solution 5 - Awk

Actually, awk has a built-in input field separator variable, FS ([link][1]). It does not create an array; instead the pieces land in the field variables $1, $2 and so on. For splitting a simple string this is easier:

echo "12|23|11" | awk 'BEGIN {FS="|";} { print $1, $2, $3 }'
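FS has an output counterpart, OFS, which print inserts between comma-separated items and which awk uses when it rebuilds a record. A small sketch that converts the pipes to commas; the choice of "," is arbitrary, and the $1 = $1 assignment is the common idiom for forcing the record to be rebuilt with OFS:

```shell
joined=$(printf '%s\n' '12|23|11' | awk '
    BEGIN { FS = "|"; OFS = "," }    # split on pipes, join with commas
    { $1 = $1; print }               # touching a field rebuilds $0 using OFS
')
echo "$joined"
```

This yields 12,23,11 without naming the fields individually, so it is independent of the number of fields.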

[1]: http://www.grymoire.com/Unix/Awk.html#toc-uh-15 "Tutorial"

Solution 6 - Awk

I know this is a fairly old question, but I thought someone might like my trick, especially since this solution is not limited to a specific number of items.

# Convert to an array
_ITEMS=($(echo "12|23|11" | tr '|' '\n'))

# Output array items
for _ITEM in "${_ITEMS[@]}"; do
  echo "Item: ${_ITEM}"
done

The output will be:

Item: 12
Item: 23
Item: 11

Solution 7 - Awk

Joke? :)

How about echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'

This is my output:

p2> echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
112312

so I guess it's working after all..

Solution 8 - Awk

echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'

should work.

Solution 9 - Awk

echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'

Solution 10 - Awk

The challenge: split strings on spaces and store the pieces in variables.

Solution: the best and simplest choice is to convert each string into an array and then assign its elements to variables by index. Here's an example of how to convert a line and access the array.

Example: parse disk space statistics on each line:

sudo df -k | awk 'NR>1' | while read -r line; do
   #convert into array:
   array=($line)

   #variables:
   filesystem="${array[0]}"
   size="${array[1]}"
   capacity="${array[4]}"
   mountpoint="${array[5]}"
   echo "filesystem:$filesystem|size:$size|usage:$capacity|mountpoint:$mountpoint"
done

#output:
filesystem:/dev/dsk/c0t0d0s1|size:4000|usage:40%|mountpoint:/
filesystem:/dev/dsk/c0t0d0s2|size:5000|usage:50%|mountpoint:/usr
filesystem:/proc|size:0|usage:0%|mountpoint:/proc
filesystem:mnttab|size:0|usage:0%|mountpoint:/etc/mnttab
filesystem:fd|size:1000|usage:10%|mountpoint:/dev/fd
filesystem:swap|size:9000|usage:9%|mountpoint:/var/run
filesystem:swap|size:1500|usage:15%|mountpoint:/tmp
filesystem:/dev/dsk/c0t0d0s3|size:8000|usage:80%|mountpoint:/export
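Since this page is about awk, the same report can also be sketched without the shell loop, because awk has already split each line into $1, $2 and so on. The sample text below stands in for real df -k output; its header and its used/available figures are invented for illustration.

```shell
# Hypothetical df -k style output; only the columns used below mirror the example above.
sample='Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s1 4000 1600 2400 40% /
/dev/dsk/c0t0d0s2 5000 2500 2500 50% /usr'

# NR > 1 skips the header; fields 1, 2, 5 and 6 are filesystem, size, usage and mount point.
report=$(printf '%s\n' "$sample" | awk 'NR > 1 {
    printf "filesystem:%s|size:%s|usage:%s|mountpoint:%s\n", $1, $2, $5, $6
}')
echo "$report"
```

With real data, the printf pipeline would be replaced by df -k. Long device names can make some df implementations wrap a line, which shifts the field numbers; adding the POSIX -P option is one way to avoid that.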

Solution 11 - Awk

awk -F'[|]' '{print $1"\t"$2"\t"$3}' <<<'12|23|11'

Solution 12 - Awk

code

awk -F"|" '{split($0,a); print a[1],a[2],a[3]}' <<< '12|23|11'

output

12 23 11

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Mohamed Saligh | View Question on Stackoverflow |
| Solution 1 - Awk | Calin Paul Alexandru | View Answer on Stackoverflow |
| Solution 2 - Awk | fedorqui | View Answer on Stackoverflow |
| Solution 3 - Awk | Dimitre Radoulov | View Answer on Stackoverflow |
| Solution 4 - Awk | TrueY | View Answer on Stackoverflow |
| Solution 5 - Awk | Sven | View Answer on Stackoverflow |
| Solution 6 - Awk | Qorbani | View Answer on Stackoverflow |
| Solution 7 - Awk | duedl0r | View Answer on Stackoverflow |
| Solution 8 - Awk | codaddict | View Answer on Stackoverflow |
| Solution 9 - Awk | Schildmeijer | View Answer on Stackoverflow |
| Solution 10 - Awk | avivamg | View Answer on Stackoverflow |
| Solution 11 - Awk | Mark | View Answer on Stackoverflow |
| Solution 12 - Awk | vcatafesta | View Answer on Stackoverflow |