A semantics for Bash scripts?

Bash Problem Overview


More than any other language I know, I've "learned" Bash by Googling every time I need some little thing. Consequently, I can patchwork together little scripts that appear to work. However, I don't really know what's going on, and I was hoping for a more formal introduction to Bash as a programming language. For example: What is the evaluation order? What are the scoping rules? What is the typing discipline, e.g. is everything a string? What is the state of the program: is it a key-value assignment of strings to variable names, or is there more than that, e.g. the stack? Is there a heap? And so on.

I thought to consult the GNU Bash manual for this kind of insight, but it doesn't seem to be what I want; it's more of a laundry list of syntactic sugar rather than an explanation of the core semantic model. The million-and-one "bash tutorials" online are only worse. Perhaps I should first study sh, and understand Bash as a syntactic sugar on top of this? I don't know if this is an accurate model, though.

Any suggestions?

EDIT: I've been asked to provide examples of what ideally I'm looking for. A rather extreme example of what I would consider a "formal semantics" is this paper on "the essence of JavaScript". Perhaps a slightly less formal example is the Haskell 2010 report.

Bash Solutions


Solution 1 - Bash

A shell is an interface for the operating system. It is usually a more-or-less robust programming language in its own right, but with features designed to make it easy to interact specifically with the operating system and filesystem. The POSIX shell's (hereafter referred to just as "the shell") semantics are a bit of a mutt, combining some features of LISP (s-expressions have a lot in common with shell word splitting) and C (much of the shell's arithmetic syntax and semantics comes from C).

The other root of the shell's syntax comes from its upbringing as a mishmash of individual UNIX utilities. Most of what are often builtins in the shell can actually be implemented as external commands. It throws many shell neophytes for a loop when they realize that /bin/[ exists on many systems.

$ if '/bin/[' -f '/bin/['; then echo t; fi # Tested as-is on OS X, without the `]`
t

wat?

This makes a lot more sense if you look at how a shell is implemented. Here's an implementation I did as an exercise. It's in Python, but I hope that's not a hangup for anyone. It's not terribly robust, but it is instructive:

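#!/usr/bin/env python
'''Hacky barebones shell.'''

from __future__ import print_function
import os, sys

try:
  input = raw_input  # Python 2 compatibility
except NameError:
  pass

def main():
  while True:
    try:
      cmd = input('prompt> ')
    except EOFError:
      break  # Ctrl-D ends the shell
    args = cmd.split()  # Crude stand-in for word splitting
    if not args:
      continue
    cpid = os.fork()
    if cpid == 0:
      # We're in a child process
      try:
        # No PATH lookup: args[0] must be a full path such as /bin/ls
        os.execl(args[0], *args)
      except OSError as e:
        print(e, file=sys.stderr)
        os._exit(127)  # Don't let a failed child fall back into the loop
    else:
      os.waitpid(cpid, 0)

if __name__ == '__main__':
  main()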

I hope the above makes it clear that the execution model of a shell is pretty much:

1. Expand words.
2. Assume the first word is a command.
3. Execute that command with the following words as arguments.

Expansion, command resolution, execution. All of the shell's semantics are bound up in one of these three things, although they're far richer than the implementation I wrote above.

Not all commands fork. In fact, there are a handful of commands that don't make a ton of sense implemented as externals (such that they would have to fork), but even those are often available as externals for strict POSIX compliance.
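
To see command resolution at work, bash's type builtin reports how a word would be resolved. The following is only a sketch, assuming it is run as a bash script; the variable names are made up for illustration. It also shows why cd cannot usefully be an external command: a child process cannot change the directory of its parent.

```shell
#!/usr/bin/env bash
# `type -t` reports how bash would resolve a word used as a command name.
kind_cd=$(type -t cd)        # builtin: it has to run inside the shell itself
kind_test=$(type -t '[')     # builtin, even though /bin/[ often exists as well
kind_if=$(type -t if)        # keyword: part of the grammar, not a command
printf '%s\n' "cd: $kind_cd" "[: $kind_test" "if: $kind_if"

# A builtin changes the shell's own state; a forked child cannot.
start=$PWD
( cd / )                     # subshell: the parent's directory is unaffected
after_subshell=$PWD
cd / && after_builtin=$PWD   # builtin: the current shell really moves
cd "$start" || exit
```

Running type -a [ in an interactive shell will usually list both the builtin and the external file, if the system ships one.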

Bash builds upon this base by adding new features and keywords to enhance the POSIX shell. It is nearly compatible with sh, and bash is so ubiquitous that some script authors go years without realizing that a script may not actually work on a POSIXly strict system. (I also wonder how people can care so much about the semantics and style of one programming language, and so little for the semantics and style of the shell, but I digress.)

Order of evaluation

This is a bit of a trick question: Bash interprets expressions in its primary syntax from left to right, but in its arithmetic syntax it follows C precedence. Expressions differ from expansions, though. From the EXPANSION section of the bash manual:

> The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.
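
A quick way to convince yourself of this ordering is to put a variable inside a brace sequence. This is a small sketch, assuming bash; the variable names are illustrative only. Because brace expansion runs before parameter expansion, the braces never see a number, whereas arithmetic follows C precedence as described above.

```shell
#!/usr/bin/env bash
n=3
# Brace expansion runs first and sees the literal text "$n", which is not a
# valid sequence, so the braces are left alone; only then is $n expanded.
brace_result=$(echo {1..$n})
literal_result=$(echo {1..3})     # a literal sequence expands as expected
arith_result=$(( 2 + 3 * 4 ))     # C precedence: multiplication binds tighter
echo "$brace_result | $literal_result | $arith_result"
# prints: {1..3} | 1 2 3 | 14
```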

If you understand word splitting, pathname expansion and parameter expansion, you are well on your way to understanding most of what bash does. Note that pathname expansion coming after word splitting is critical, because it ensures that a file with whitespace in its name can still be matched by a glob: the results of pathname expansion are not split again. This is why good use of glob expansions is, in general, better than parsing the output of commands such as ls.
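
Here is a sketch of that difference, assuming bash, a working mktemp -d, and the default IFS; the file names are invented for the demonstration. The glob yields one word per file, while the output of ls goes through word splitting and breaks the name containing a space in two.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt"

# Pathname expansion happens after word splitting, so each match stays whole.
glob_count=0
for f in "$workdir"/*.txt; do
  glob_count=$((glob_count + 1))
done

# Command substitution output is word-split, so "with space.txt" becomes two words.
ls_count=0
for f in $(ls "$workdir"); do
  ls_count=$((ls_count + 1))
done

echo "glob: $glob_count, ls: $ls_count"   # prints: glob: 2, ls: 3
rm -rf "$workdir"
```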

Scope

Function scope

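Much like old ECMAScript, names in the shell are global unless you explicitly declare them local within a function. Names declared with local are dynamically scoped: they are visible to the function that declares them and to anything that function calls.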

$ foo() { echo $x; }
$ bar() { local x; echo $x; }
$ foo

$ bar

$ x=123
$ foo
123
$ bar

$ 
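
The session above shows a local hiding a global. The dynamic part of the scoping is easier to see when one function calls another, as in this sketch (assuming bash; show_x and caller_with_local are hypothetical names). The callee sees its caller's local, and the global is untouched afterwards.

```shell
#!/usr/bin/env bash
show_x() { echo "x is ${x:-unset}"; }

caller_with_local() {
  local x='local to caller'
  show_x            # the callee sees the caller's local: dynamic scope
}

x='global'
from_global=$(show_x)             # x is global
from_caller=$(caller_with_local)  # x is local to caller
echo "$from_global"
echo "$from_caller"
echo "afterwards: $x"             # afterwards: global
```
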

Environment and process "scope"

Subshells inherit the variables of their parent shells, but other kinds of processes don't inherit unexported names.

$ x=123
$ ( echo $x )
123
$ bash -c 'echo $x'

$ export x
$ bash -c 'echo $x'
123
$ y=123 bash -c 'echo $y' # another way to transiently export a name
123

You can combine these scoping rules:

$ foo() {
>   local -x bar=123 # Export bar, but only in this scope
>   bash -c 'echo $bar'
> }
$ foo
123
$ echo $bar

$

Typing discipline

Um, types. Yeah. Bash really doesn't have types, and everything expands to a string (or perhaps "a word" would be more appropriate). But let's examine the different types of expansions.

Strings

Pretty much anything can be treated as a string. Barewords in bash are strings whose meaning depends entirely on the expansion applied to them.

No expansion

It may be worthwhile to demonstrate that a bare word really is just a word, and that quotes change nothing about that.

$ echo foo
foo
$ 'echo' foo
foo
$ "echo" foo
foo

Substring expansion

$ fail='echoes'
$ set -x # So we can see what's going on
$ "${fail:0:-2}" Hello World
+ echo Hello World
Hello World

For more on expansions, read the Parameter Expansion section of the manual. It's quite powerful.
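
A few of the other parameter expansions, as a rough sketch (assuming bash; the path and variable names are only examples):

```shell
#!/usr/bin/env bash
path='/var/www/webroot/index.html'

base=${path##*/}         # strip longest prefix matching */  -> index.html
dir=${path%/*}           # strip shortest suffix matching /* -> /var/www/webroot
ext=${path##*.}          # strip everything up to the last dot -> html
swapped=${path/www/srv}  # replace first match -> /var/srv/webroot/index.html

unset maybe
fallback=${maybe:-default}   # use a default when unset or empty -> default

printf '%s\n' "$base" "$dir" "$ext" "$swapped" "$fallback"
```

Every one of these still produces a plain string.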

Integers and arithmetic expressions

You can imbue names with the integer attribute to tell the shell to treat the right hand side of assignment expressions as arithmetic. Then, whenever a value is assigned to the name, it is evaluated as integer math and the result is stored as … a string.

$ foo=10+10
$ echo $foo
10+10
$ declare -i foo
$ foo=$foo # Must re-evaluate the assignment
$ echo $foo
20
$ echo "${foo:0:1}" # Still just a string
2

Arrays

Arguments and Positional Parameters

Before talking about arrays it might be worth discussing positional parameters. The arguments to a shell script can be accessed using numbered parameters, $1, $2, $3, etc. You can access all these parameters at once using "$@", whose expansion has many things in common with arrays. You can set and change the positional parameters using the set or shift builtins, or simply by invoking the shell or a shell function with these parameters:

$ bash -c 'for ((i=1;i<=$#;i++)); do
>   printf "\$%d => %s\n" "$i" "${@:i:1}"
> done' -- foo bar baz
$1 => foo
$2 => bar
$3 => baz
$ showpp() {
>   local i
>   for ((i=1;i<=$#;i++)); do
>     printf '$%d => %s\n' "$i" "${@:i:1}"
>   done
> }
$ showpp foo bar baz
$1 => foo
$2 => bar
$3 => baz
$ showshift() {
>   shift 3
>   showpp "$@"
> }
$ showshift foo bar baz biz quux xyzzy
$1 => biz
$2 => quux
$3 => xyzzy

The bash manual also sometimes refers to $0 as a positional parameter. I find this confusing, because it doesn't include it in the argument count $#, but it is a numbered parameter, so meh. $0 is the name of the shell or the current shell script.
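
The sessions above use shift but not set. Here is a sketch of set -- replacing the positional parameters, and of the difference between quoted "$@" and unquoted $* (assuming bash and the default IFS; the variable names are illustrative):

```shell
#!/usr/bin/env bash
set -- alpha 'beta gamma' delta   # replace the positional parameters
count_before=$#                   # 3
shift                             # drop $1; the rest move down by one
count_after=$#                    # 2
first_after_shift=$1              # 'beta gamma', still a single word

# "$@" keeps each parameter as one word; unquoted $* is word-split again.
quoted=0;   for word in "$@"; do quoted=$((quoted + 1)); done
unquoted=0; for word in $*;   do unquoted=$((unquoted + 1)); done
echo "quoted: $quoted, unquoted: $unquoted"   # prints: quoted: 2, unquoted: 3
```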

Arrays

The syntax of arrays is modeled after positional parameters, so it's mostly healthy to think of arrays as a named kind of "external positional parameters", if you like. Arrays can be declared using the following approaches:

$ foo=( element0 element1 element2 )
$ bar[3]=element3
$ baz=( [12]=element12 [0]=element0 )

You can access array elements by index:

$ echo "${foo[1]}"
element1

You can slice arrays:

$ printf '"%s"\n' "${foo[@]:1}"
"element1"
"element2"

If you treat an array as a normal parameter, you'll get the zeroth index.

$ echo "$baz"
element0
$ echo "$bar" # Even if the zeroth index isn't set

$ 

If you use quotes or backslashes to prevent word splitting, each array element keeps exactly the whitespace you specified:

$ foo=( 'elementa b c' 'd e f' )
$ echo "${#foo[@]}"
2

The main differences between arrays and positional parameters are:

  1. Positional parameters are not sparse. If $12 is set, you can be sure $11 is set, too. (It could be set to the empty string, but $# will not be smaller than 12.) If "${arr[12]}" is set, there's no guarantee that "${arr[11]}" is set, and the length of the array could be as small as 1.
  2. The zeroth element of an array is unambiguously the zeroth element of that array. In positional parameters, the zeroth element is not the first argument, but the name of the shell or shell script.
  3. To shift an array, you have to slice and reassign it, like arr=( "${arr[@]:1}" ). You could also do unset 'arr[0]', but that would leave the first element at index 1.
  4. Arrays can be shared implicitly between shell functions as globals, but you have to explicitly pass positional parameters to a shell function for it to see those.
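
Points 1 and 3 can be checked by listing an array's indices with "${!name[*]}". This is just a sketch, assuming bash; the array names are made up.

```shell
#!/usr/bin/env bash
# Sparse: two elements, but the highest index is 12.
sparse=( [12]=element12 [0]=element0 )
sparse_len=${#sparse[@]}            # 2
sparse_keys="${!sparse[*]}"         # 0 12

# "Shifting" by slicing and reassigning renumbers from zero.
arr=( a b c )
arr=( "${arr[@]:1}" )
sliced_keys="${!arr[*]}"            # 0 1
sliced_first=${arr[0]}              # b

# unset removes the element but leaves the other indices where they were.
arr2=( a b c )
unset 'arr2[0]'
unset_keys="${!arr2[*]}"            # 1 2

echo "$sparse_len [$sparse_keys] [$sliced_keys] [$unset_keys]"
```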

It's often convenient to use pathname expansions to create arrays of filenames:

$ dirs=( */ )

Commands

Commands are key, but the manual covers them in more depth than I can here. Read the SHELL GRAMMAR section. The different kinds of commands are:

  1. Simple Commands (e.g. $ startx)
  2. Pipelines (e.g. $ yes | make config) (lol)
  3. Lists (e.g. $ grep -qF foo file && sed 's/foo/bar/' file > newfile)
  4. Compound Commands (e.g. $ ( cd -P /var/www/webroot && echo "webroot is $PWD" ))
  5. Coprocesses (Complex, no example)
  6. Functions (A named compound command that can be treated as a simple command)
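
To illustrate the last point, here is a sketch (assuming bash; shout and is_even are hypothetical helpers) of functions standing wherever a simple command could: as a pipeline stage, after a command group, and inside a list.

```shell
#!/usr/bin/env bash
shout() { tr '[:lower:]' '[:upper:]'; }       # a function that filters stdin
is_even() { (( $1 % 2 == 0 )); }              # its exit status is the result

piped=$(echo 'hello' | shout)                 # function as a pipeline stage
grouped=$( { echo one; echo two; } | shout )  # { ...; } group feeding a pipeline
if is_even 4 && ! is_even 7; then             # functions combined in a list
  verdict='4 is even, 7 is odd'
fi
echo "$piped / $verdict"   # prints: HELLO / 4 is even, 7 is odd
```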

Execution Model

The execution model of course involves both a heap and a stack. This is common to all UNIX programs. Bash also has a call stack for shell functions, visible via nested use of the caller builtin.
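
A minimal sketch of inspecting that call stack, assuming bash and using the FUNCNAME array alongside caller (inner and outer are made-up function names):

```shell
#!/usr/bin/env bash
inner() {
  # FUNCNAME[0] is the current function, FUNCNAME[1] its caller, and so on.
  echo "${FUNCNAME[0]} called by ${FUNCNAME[1]}"
  caller 0   # prints: the calling line number, the calling function, the source file
}
outer() { inner; }

trace=$(outer)
first_line=${trace%%$'\n'*}    # inner called by outer
second_line=${trace#*$'\n'}    # e.g. "<line> outer <file>"
echo "$trace"
```

The line number and file name in the second line depend on how the script is run, so they are not shown here.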

References:

  1. The SHELL GRAMMAR section of the bash manual
  2. The XCU Shell Command Language documentation
  3. The Bash Guide on Greycat's wiki.
  4. Advanced Programming in the UNIX Environment

Please make comments if you want me to expand further in a specific direction.

Solution 2 - Bash

In answer to your question "What is the typing discipline, e.g. is everything a string?": Bash variables are character strings. However, Bash permits arithmetic operations and comparisons on variables whose values are integers. The exception to the rule that Bash variables are character strings is when they are given other attributes with typeset or declare.

$ A=10/2
$ echo "A = $A"           # Variable A acting like a String.
A = 10/2

$ B=1
$ let B="$B+1"            # let is a bash builtin.
$ echo "B = $B"           # One was added to B, which behaved as an integer.
B = 2

$ A=1024                  # A defaults to a string.
$ B=${A/24/STRING01}      # Substitute "24" with "STRING01".
$ echo "B = $B"           # B is a string.
B = 10STRING01

$ B=${A/24/STRING01}      # Substitute "24"  with "STRING01".
$ declare -i B
$ echo "B = $B"           # Declaring a variable with non-integers in it doesn't change the contents.
B = 10STRING01

$ B=${B/STRING01/24}      # Substitute "STRING01"  with "24".
$ echo "B = $B"
B = 1024

$ declare -i B=10/2       # Declare B and assign it an integer value.
$ echo "B = $B"           # Variable B behaving as an integer.
B = 5

Declare option meanings:

  • -a Variable is an array.
  • -f Use function names only.
  • -i The variable is to be treated as an integer; arithmetic evaluation is performed when the variable is assigned a value.
  • -p Display the attributes and values of each variable. When -p is used, additional options are ignored.
  • -r Make variables read-only. These variables cannot then be assigned values by subsequent assignment statements, nor can they be unset.
  • -t Give each variable the trace attribute.
  • -x Mark each variable for export to subsequent commands via the environment.
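
Several of these attributes can be seen together in one short script. This is a sketch, assuming bash is available on the PATH; all variable names are invented for the example.

```shell
#!/usr/bin/env bash
declare -i total=10/2          # -i: the right-hand side is evaluated arithmetically
total+=3                       # with -i, += adds numerically instead of appending

declare -r label='fixed'       # -r: read-only
# Attempt the assignment in a subshell so the script itself carries on.
if ( label='changed' ) 2>/dev/null; then
  readonly_result='assignment succeeded'
else
  readonly_result='assignment rejected'
fi

declare -x exported_name='visible'               # -x: exported to child processes
child_view=$(bash -c 'echo "$exported_name"')

attrs=$(declare -p total)      # -p: show attributes and value
echo "$total / $readonly_result / $child_view / $attrs"
# prints: 8 / assignment rejected / visible / declare -i total="8"
```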

Solution 3 - Bash

The bash manpage has quite a bit more info than most manpages, and includes some of what you're asking for. My assumption after more than a decade of scripting bash is that, due to its history as an extension of sh, it has some funky syntax (to maintain backward compatibility with sh).

FWIW, my experience has been like yours; although the various books (e.g., O'Reilly "Learning the Bash Shell" and similar) do help with the syntax, there are lots of strange ways of solving various problems, and some of them are not in the book and must be googled.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | jameshfisher | View Question on Stack Overflow
Solution 1 - Bash | kojiro | View Answer on Stack Overflow
Solution 2 - Bash | Keith Reynolds | View Answer on Stack Overflow
Solution 3 - Bash | philwalk | View Answer on Stack Overflow