How can I get the length of an array in awk?
Awk Problem Overview
This command
echo "hello world" | awk '{split($0, array, " ")} END{print length(array) }'
does not work for me and gives this error message
>awk: line 1: illegal reference to array array
Why?
Awk Solutions
Solution 1 - Awk
When you split a string with split(), the number of elements is returned, so you can say:
echo "hello world" | awk '{n=split($0, array, " ")} END{print n }'
# ------------------------^^^--------------------------------^^
Output is:
2
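Because split() numbers the elements 1 through n, the returned count also works as a loop bound. A minimal sketch, assuming a three-word sample string; the names words and out are illustrative and not from the question:

```shell
# Capture split()'s return value and use it to walk the resulting array.
out=$(echo "hello big world" | awk '{
    n = split($0, words, " ")      # n is the element count returned by split()
    for (i = 1; i <= n; i++)       # split() stores the pieces at indexes 1..n
        print i ": " words[i]
    print "count=" n
}')
echo "$out"
```

This only applies to arrays filled by split(); arrays built in other ways may not be indexed 1..n.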
Solution 2 - Awk
Mr. Ventimiglia's function needs a small adjustment to work (note the semicolon after the for statement):
function alen(a, i) {
for(i in a);
return i
}
But it does not work in all cases. That is because of the way awk stores and "sees" array indexes: they are associative and not necessarily contiguous (as they are in C). So i does not necessarily hold the "last" element.
To resolve it, you need to count:
function alen(a, i, k) {
k = 0
for(i in a) k++
return k
}
This way it also handles other index types of "one-dimensional" arrays, where the index may be a string. Please see: http://docstore.mik.ua/orelly/unix/sedawk/ch08_04.htm. For "multidimensional" and arbitrary arrays, see http://www.gnu.org/software/gawk/manual/html_node/Walking-Arrays.html#Walking-Arrays.
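As a rough check of the counting approach, the sketch below runs the counting function on one array with non-contiguous numeric indexes and one with string indexes. The array names sparse and names and the shell variable result are made up for illustration:

```shell
result=$(awk '
# Counting helper as described above; i and k are locals.
function alen(a,    i, k) {
    k = 0
    for (i in a) k++
    return k
}
BEGIN {
    sparse[1] = "x"; sparse[2] = "y"; sparse[23] = "z"   # non-contiguous numeric indexes
    names["alice"] = 1; names["bob"] = 2                 # string indexes
    print alen(sparse), alen(names)
}')
echo "$result"   # prints: 3 2
```

It uses only for (i in a), so it should not depend on length() accepting an array.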
Solution 3 - Awk
I don't think the person is asking, "How do I split a string and get the length of the resulting array?" I think the command they provide is just an example of the situation where it arose. In particular, I think the person is asking 1) Why does length(array) provoke an error, and 2) How can I get the length of an array in awk?
The answer to the first question is that the length function does not operate on arrays in POSIX standard awk, though it does in GNU awk (gawk) and a few other variations. The answer to the second question is (if we want a solution that works in all variations of awk) to do a linear scan.
For example, a function like this:
function alen(a,    i) {
    for (i in a);
    return i
}
NOTE: The second parameter i warrants some explanation.
The way you introduce local variables in awk is as extra function parameters, and the convention is to indicate this by adding extra spaces before these parameters. This is discussed in the GNU Awk manual.
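To see what the extra parameter buys you, here is a small sketch comparing two hypothetical helpers (count_local and count_global are names invented for this example). Only the first declares i as an extra parameter:

```shell
result=$(awk '
# "i" is declared as an extra parameter, so it is local to the function.
function count_local(a,    i, k) { k = 0; for (i in a) k++; return k }
# Here "i" is not a parameter, so the loop writes to the global "i".
function count_global(a,    k) { k = 0; for (i in a) k++; return k }
BEGIN {
    arr["only"] = 1
    i = "untouched"
    count_local(arr);  print "local: " i
    count_global(arr); print "global: " i
}')
echo "$result"
```

The first line printed should be "local: untouched" and the second "global: only", since the second helper overwrites the caller's i.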
Solution 4 - Awk
In gawk you can use the function length():
$ gawk 'BEGIN{a[1]=1; a[2]=2; a[23]=45; print length(a)}'
3
$ gawk 'BEGIN{a[1]=1; a[2]=2; print length(a); a[23]=45; print length(a)}'
2
3
From The GNU Awk user's guide:
> With gawk and several other awk implementations, when given an array
> argument, the length() function returns the number of elements in the
> array. (c.e.) This is less useful than it might seem at first, as
> the array is not guaranteed to be indexed from one to the number of
> elements in it. If --lint is provided on the command line (see
> Options), gawk warns that passing an array argument is not portable.
> If --posix is supplied, using an array argument is a fatal error (see
> Arrays).
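The caveat in the quote can be illustrated with the same three-element array. This sketch counts elements with a portable loop rather than length(a), on the assumption that not every awk accepts an array argument there, and then shows that a 1..n loop does not reach every element:

```shell
result=$(awk '
BEGIN {
    a[1] = 1; a[2] = 2; a[23] = 45
    n = 0; for (k in a) n++          # portable count; gawk length(a) reports the same 3
    hits = 0
    for (i = 1; i <= n; i++)         # assumes indexes 1..n, which is wrong here
        if (i in a) hits++
    print "elements=" n, "reached=" hits
}')
echo "$result"   # prints: elements=3 reached=2
```

Index 23 is never visited by the numeric loop, so for (k in a) is the safer way to walk such an array.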
Solution 5 - Awk
Just want to point out that:
- You don't need to store the result of the split function in order to print it.
- If no separator is supplied to split, the default FS (blank space) is used.
- The END part is useless here.
echo 'hello world' | awk '{print split($0, a)}'
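On the default-separator point: with the default FS, runs of blanks count as one separator and leading or trailing blanks are ignored. A small sketch contrasting that with an explicit single-space regular expression (the padded sample string and the variable names are assumptions for illustration):

```shell
# Default FS: runs of blanks collapse, leading/trailing blanks are ignored.
default_count=$(echo '  hello   big   world  ' | awk '{ print split($0, a) }')
# "[ ]" is a regular expression matching exactly one space, so every
# space is a separator and empty elements are kept.
regex_count=$(echo '  hello   big   world  ' | awk '{ print split($0, a, "[ ]") }')
echo "$default_count $regex_count"   # prints: 3 11
```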
Solution 6 - Awk
An example on Mac OS X Lion that shows the ports in use (the address column can look like 192.168.111.130.49704 or ::1.49704):
netstat -a -n -p tcp | awk '/\.[0-9]+ / {n=split($4,a,"."); print a[n]}'
This prints the last array element of the 4th column, for example "49704".
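The same "last element" trick can be tried without netstat. This sketch feeds the two sample addresses mentioned above straight into awk, so it reads $1 instead of $4; the variable name ports is illustrative:

```shell
# split() returns the element count n, so a[n] is the last piece.
ports=$(printf '%s\n' '192.168.111.130.49704' '::1.49704' |
    awk '{ n = split($1, a, "."); print a[n] }')
echo "$ports"
```

Both input lines should yield 49704.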
Solution 7 - Awk
Try this if you are not using gawk.
awk 'BEGIN{test="aaa bbb ccc";a=split(test, ff, " "); print ff[1]; print a; print ff[a]}'
Output:
aaa
3
ccc
See: 8.4.4 Using split() to Create Arrays, http://docstore.mik.ua/orelly/unix/sedawk/ch08_04.htm
Solution 8 - Awk
Here's a quick way (in gawk or mawk) to get the length of an array, initializing it to zero length if it does not exist, without overwriting existing elements or accidentally adding extra ones:
function arrayinit(ar, x) { for (x in ar) { break }; return length(ar) }
The for loop is effectively O(1), since it exits on the first existing element, regardless of sort order. My old approach was to either test or split an empty string. This way saves the split step, since the for loop performs that function implicitly.
This also works for pseudo multi-dimensional arrays like arr[x,y] or gawk arr[x][y] ones, without having to worry whether "x" is a sub-array in the gawk sense.
Solution 9 - Awk
echo "hello world" | awk '{lng=split($0, array, " ")} END{print lng}'