How can I get the length of an array in awk?

Awk

Awk Problem Overview


This command

echo "hello world" | awk '{split($0, array, " ")} END{print length(array) }'

does not work for me and gives this error message

awk: line 1: illegal reference to array array

Why?

Awk Solutions


Solution 1 - Awk

When you split a string into an array, split() returns the number of elements, so you can say:

echo "hello world" | awk '{n=split($0, array, " ")} END{print n }'
# ------------------------^^^--------------------------------^^

Output is:

2
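split() fills the array with indices 1 through n, so the returned count can also serve as the loop bound when walking the pieces in order. Here is a minimal sketch, assuming a POSIX awk. The shell function name print_words is hypothetical and chosen only for this illustration:

```shell
#!/bin/sh
# print_words is a hypothetical helper name used only for this example.
# For each input line it splits on blanks and prints every piece
# together with its index.
print_words() {
    awk '{
        n = split($0, array, " ")   # n = number of elements created
        for (i = 1; i <= n; i++)    # split() uses indices 1 through n
            print i ": " array[i]
    }'
}

echo "hello world" | print_words
```

This prints "1: hello" and "2: world". The loop only works because split() numbers its elements contiguously from 1. An array built by hand with arbitrary indices needs a count taken with for (i in a) instead.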

Solution 2 - Awk

Mr. Ventimiglia's function needs a little adjustment to work (note the semicolon after the for statement):

function alen(a, i) {
    for(i in a);
    return i
}

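But it doesn't work in all cases. The reason is how awk stores and traverses array indices: arrays are associative and not necessarily contiguous (unlike arrays in C), and for (i in a) visits the indices in an unspecified order. So i is not necessarily the "last" index. Even when it is, the last index need not equal the number of elements (for example, indices 1, 2 and 23).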

To resolve it, you need to count:

function alen(a, i, k) {
    k = 0
    for(i in a) k++
    return k
}

Counting this way also handles the other index types of "one-dimensional" arrays, where an index may be a string. Please see: http://docstore.mik.ua/orelly/unix/sedawk/ch08_04.htm. For "multidimensional" and arbitrary arrays, see http://www.gnu.org/software/gawk/manual/html_node/Walking-Arrays.html#Walking-Arrays.
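As a quick check of the counting version, the sketch below fills an array with sparse numeric and string indices, counts it, deletes one element, and counts again. The wrapper name alen_demo and the sample indices are made up for illustration. The alen() body is the counting function from this answer:

```shell
#!/bin/sh
# alen_demo is a hypothetical wrapper; alen() is the counting version above.
alen_demo() {
    awk '
    function alen(a,    i, k) {
        k = 0
        for (i in a) k++      # one increment per element, in any order
        return k
    }
    BEGIN {
        # sparse numeric indices plus string indices
        a[1] = "x"; a[23] = "y"; a["foo"] = "z"; a["bar"] = ""
        print alen(a)         # 4: an empty value still counts as an element
        delete a["foo"]
        print alen(a)         # 3
    }'
}

alen_demo
```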

Solution 3 - Awk

I don't think the person is asking, "How do I split a string and get the length of the resulting array?" I think the command they provide is just an example of the situation where it arose. In particular, I think the person is asking 1) Why does length(array) provoke an error, and 2) How can I get the length of an array in awk?

The answer to the first question is that the length function does not operate on arrays in POSIX standard awk, though it does in GNU awk (gawk) and a few other variations. The answer to the second question is (if we want a solution that works in all variations of awk) to do a linear scan.

For example, a function like this:

function alen(a,     i, n) {
    n = 0
    for (i in a) n++    # count every index; returning i alone would give only the last index visited
    return n
}

NOTE: The second parameter i warrants some explanation.

In awk, local variables are introduced as extra function parameters, and the convention is to mark them by adding extra spaces before those parameters. This is discussed in the GNU Awk manual.
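To illustrate the convention, the sketch below calls a counting variant of alen() with only the array argument. The extra parameters i and n then behave as locals, so a global variable that happens to be named i keeps its value. The wrapper name locals_demo and the values used are hypothetical:

```shell
#!/bin/sh
# locals_demo is a hypothetical wrapper name for this illustration.
locals_demo() {
    awk '
    # Only "a" is a real argument; i and n, set off by extra spaces,
    # are extra parameters that the caller leaves out, so they act as locals.
    function alen(a,     i, n) {
        n = 0
        for (i in a) n++
        return n
    }
    BEGIN {
        i = "global"                          # a global that is also named i
        arr["x"] = 1; arr["y"] = 2; arr["z"] = 3
        print alen(arr)                       # 3
        print i                               # still prints: global
    }'
}

locals_demo
```

Without the extra parameters, the loop inside alen() would overwrite the caller's global i.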

Solution 4 - Awk

In gawk you can use the function length():

$ gawk 'BEGIN{a[1]=1; a[2]=2; a[23]=45; print length(a)}'
3

$ gawk 'BEGIN{a[1]=1; a[2]=2; print length(a); a[23]=45; print length(a)}'
2
3

From The GNU Awk user's guide:

> With gawk and several other awk implementations, when given an array argument, the length() function returns the number of elements in the array. (c.e.) This is less useful than it might seem at first, as the array is not guaranteed to be indexed from one to the number of elements in it. If --lint is provided on the command line (see Options), gawk warns that passing an array argument is not portable. If --posix is supplied, using an array argument is a fatal error (see Arrays).

Solution 5 - Awk

Just want to point out that:

  • You don't need to store the result of split() in order to print it.

  • If no separator is supplied to split(), the value of FS is used (by default a single space, which splits on runs of blanks).

  • The END block is unnecessary here.

      echo 'hello world' | awk '{print split($0, a)}'
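To see what the default separator does, compare it with an explicit one-space bracket expression on a line that has leading and repeated blanks. The sample line is made up for illustration, and a POSIX awk is assumed:

```shell
#!/bin/sh
line='  hello   world'    # two leading spaces, three spaces in the middle

# No separator: split() falls back to FS (default " "), which ignores
# leading blanks and treats each run of blanks as a single separator.
echo "$line" | awk '{ print split($0, a) }'            # 2

# A bracket expression splits at every single space instead,
# so the leading and repeated blanks produce empty elements.
echo "$line" | awk '{ print split($0, a, "[ ]") }'     # 6

# With no END block, a count is printed for every input line.
printf 'a b\nc d e\n' | awk '{ print split($0, a) }'   # 2, then 3
```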
    

Solution 6 - Awk

Sample on Mac OS X Lion that shows the ports in use (the local address in the 4th column looks like 192.168.111.130.49704 or ::1.49704):

   netstat -a -n -p tcp | awk '/\.[0-9]+ / {n=split($4,a,"."); print a[n]}'

This prints the last element of the array split from the 4th column, e.g. "49704".
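netstat output differs between systems, so the sketch below feeds the same awk program two made-up netstat-style lines instead: one with an IPv4 local address in the 4th column and one with an IPv6 address. That way the result can be checked anywhere. It relies on a single-character separator such as "." being matched literally rather than as a regular expression:

```shell
#!/bin/sh
# Two made-up netstat-style lines standing in for real
# "netstat -a -n -p tcp" output; the 4th column is the local address.
sample='tcp4  0  0  192.168.111.130.49704  203.0.113.5.443  ESTABLISHED
tcp6  0  0  ::1.49704  ::1.631  ESTABLISHED'

# "." as a single-character separator is matched literally, so a[n]
# is the text after the last dot: the port number.
echo "$sample" | awk '/\.[0-9]+ / { n = split($4, a, "."); print a[n] }'
```

Both lines print 49704. The address has four pieces in one case and two in the other, but n always points at the last one.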

Solution 7 - Awk

Try this if you are not using gawk.

awk 'BEGIN{test="aaa bbb ccc";a=split(test, ff, " "); print ff[1]; print a; print ff[a]}'

Output:

aaa
3
ccc

8.4.4 Using split() to Create Arrays http://docstore.mik.ua/orelly/unix/sedawk/ch08_04.htm

Solution 8 - Awk

Here's a quick way to get the length of an array that also initializes it to zero length if it doesn't exist yet, without overwriting an existing one or accidentally adding extra elements:

# works in gawk and mawk
function arrayinit(ar,    x) { for (x in ar) { break }; return length(ar) }

The loop body runs at most once, since it breaks at the first existing element regardless of traversal order. My old way was either to test or to split an empty string. This way saves the split step, since the for loop implicitly establishes ar as an array.

This also works for pseudo-multidimensional arrays like arr[x,y], or gawk's arrays of arrays like arr[x][y], without having to worry whether "x" is a subarray in the gawk sense.

Solution 9 - Awk

echo "hello world" | awk '{lng=split($0, array, " ")} END{print lng}'

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | softghost | View Question on Stackoverflow
Solution 1 - Awk | shellter | View Answer on Stackoverflow
Solution 2 - Awk | 0zkr PM | View Answer on Stackoverflow
Solution 3 - Awk | David A. Ventimiglia | View Answer on Stackoverflow
Solution 4 - Awk | fedorqui | View Answer on Stackoverflow
Solution 5 - Awk | Juan Diego Godoy Robles | View Answer on Stackoverflow
Solution 6 - Awk | Tanguy | View Answer on Stackoverflow
Solution 7 - Awk | Weichao Liu | View Answer on Stackoverflow
Solution 8 - Awk | RARE Kpop Manifesto | View Answer on Stackoverflow
Solution 9 - Awk | elrado | View Answer on Stackoverflow