Use awk to find average of a column

BashAwk

Bash Problem Overview


I'm attempting to find the average of the second column of data using awk for a class. This is my current code, with the framework my instructor provided:

#!/bin/awk

### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.

# This block of code is executed for each line in the file
{
x=sum
read name
        awk 'BEGIN{sum+=$2}'
        # The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
        # NR is a variable equal to the number of rows in the file
        print "Average: " sum/ NR
        # Change this to print the Average instead of just the number of rows
}

and I'm getting an error that says:

awk: avg.awk:11:        awk 'BEGIN{sum+=$2}' $name
awk: avg.awk:11:            ^ invalid char ''' in expression

I think I'm close but I really have no idea where to go from here. The code shouldn't be incredibly complex as everything we've seen in class has been fairly basic. Please let me know.

Bash Solutions


Solution 1 - Bash

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

Add the numbers in $2 (second column) in sum (variables are auto-initialized to zero by awk) and increment the number of rows (which could also be handled via built-in variable NR). At the end, if there was at least one value read, print the average.

awk '{ sum += $2 } END { if (NR > 0) print sum / NR }'

If you want to use the shebang notation, you could write:

#!/bin/awk

{ sum += $2 }
END { if (NR > 0) print sum / NR }

You can also control the format of the average with printf() and a suitable format ("%13.6e\n", for example).

You can also generalize the code to average the Nth column (with N=2 in this sample) using:

awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }'

Solution 2 - Bash

Your specific error is with line 11:

awk 'BEGIN{sum+=$2}'

This is a line where awk is invoked, and its BEGIN block is specified - but you are already within a awk script, so you do not need to specify awk. Also you want to run sum+=$2 on each line of input, so you do not want it within a BEGIN block. Hence the line should simply read:

sum+=$2

You also do not need the lines:

x=sum
read name

the first just creates a synonym to sum named x and I'm not sure what the second does, but neither are needed.

This would make your awk script:

#!/bin/awk

### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.

# This block of code is executed for each line in the file
{
    sum+=$2
    # The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
    # NR is a variable equal to the number of rows in the file
    print "Average: " sum/ NR
    # Change this to print the Average instead of just the number of rows
}

Jonathan Leffler's answer gives the awk one liner which represents the same fixed code, with the addition of checking that there are at least 1 lines of input (this stops any divide by zero error). If

Solution 3 - Bash

Try this:

ls -l  | awk -F : '{sum+=$5} END {print "AVG=",sum/NR}'

NR is an AWK builtin variable to count the no. of records

Solution 4 - Bash

awk 's+=$2{print s/NR}' table | tail -1

I am using tail -1 to print the last line which should have the average number...

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBen ZifkinView Question on Stackoverflow
Solution 1 - BashJonathan LefflerView Answer on Stackoverflow
Solution 2 - Bashimp25View Answer on Stackoverflow
Solution 3 - BashPradiptaView Answer on Stackoverflow
Solution 4 - BashiamauserView Answer on Stackoverflow