Where and how is the _ (underscore) variable specified?

Ruby

Ruby Problem Overview


Most people are aware of _'s special meaning in IRB as a holder for the last return value, but that is not what I'm asking about here.

Instead, I'm asking about _ when it is used as a variable name in plain old Ruby code. Here it appears to have special behavior, akin to a "don't care" variable (à la Prolog). Here are some examples illustrating its unique behavior:

lambda { |x, x| 42 }            # SyntaxError: duplicated argument name
lambda { |_, _| 42 }.call(4, 2) # => 42
lambda { |_, _| 42 }.call(_, _) # NameError: undefined local variable or method `_'
lambda { |_| _ + 1 }.call(42)   # => 43
lambda { |_, _| _ }.call(4, 2)  # 1.8.7: => 2
                                # 1.9.3: => 4
_ = 42
_ * 100         # => 4200
_, _ = 4, 2; _  # => 2

These were all run in plain Ruby (with puts calls added in), not IRB, to avoid conflicts with IRB's additional functionality.
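For anyone who wants to reproduce these experiments from a single file, here is a minimal sketch. It wraps the SyntaxError case in eval (a syntax error in the file itself would stop the whole script from parsing) and rescues the NameError; the experiments method name is illustrative only. The |_, _| _ case is printed but not relied on, since the results above show it differs between 1.8.7 and 1.9.3.

```ruby
# Reproduce the experiments above in one runnable script.
def experiments
  results = {}

  # A duplicated ordinary parameter name is rejected by the parser.
  # eval defers parsing to run time so the SyntaxError can be rescued.
  begin
    eval("lambda { |x, x| 42 }")
    results[:dup_x] = :accepted
  rescue SyntaxError
    results[:dup_x] = :syntax_error
  end

  # Repeating _ as a parameter name is allowed.
  results[:dup_underscore] = lambda { |_, _| 42 }.call(4, 2)

  # No local _ has been assigned yet, so _ parses as a method call and fails.
  begin
    lambda { |_, _| 42 }.call(_, _)
  rescue NameError
    results[:undefined_underscore] = :name_error
  end

  # Inside a block, _ can be read like any other parameter.
  results[:increment] = lambda { |_| _ + 1 }.call(42)

  # Which duplicate wins is version-dependent (2 on 1.8.7, 4 on 1.9.3).
  results[:which_wins] = lambda { |_, _| _ }.call(4, 2)

  # Ordinary assignment and multiple assignment.
  _ = 42
  results[:times_100] = _ * 100
  _, _ = 4, 2
  results[:multi_assign] = _

  results
end

experiments.each { |key, value| puts "#{key}: #{value.inspect}" }
```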

This is all the result of my own experimentation, though, as I cannot find any documentation on this behavior anywhere (admittedly, it's not the easiest thing to search for). Ultimately, I'm curious how all of this works internally so that I can better understand exactly what is special about _. So I'm asking for references to documentation and, preferably, to the Ruby source code (and perhaps RubySpec) that reveal how _ behaves in Ruby.

Note: most of this arose out of a discussion with @Niklas B.

Ruby Solutions


Solution 1 - Ruby

There is some special handling in the source to suppress the "duplicated argument name" error. The error message only appears in shadowing_lvar_gen inside parse.y; the 1.9.3 version looks like this:

static ID
shadowing_lvar_gen(struct parser_params *parser, ID name)
{
    if (idUScore == name) return name;
    /* ... */

and idUScore is defined in id.c like this:

REGISTER_SYMID(idUScore, "_");
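To make the effect of that early return concrete, here is a rough Ruby model of the duplicate-parameter check. It is not MRI code: check_block_params is a hypothetical helper that captures only the idUScore == name short-circuit, not the rest of shadowing_lvar_gen.

```ruby
# Hypothetical model (not MRI source) of the duplicate-argument check in
# shadowing_lvar_gen: a name equal to _ returns early and is never
# recorded, so repeating _ can never trigger the error.
def check_block_params(names)
  seen = []
  names.each do |name|
    next if name == :_ # mirrors: if (idUScore == name) return name;
    raise SyntaxError, "duplicated argument name" if seen.include?(name)
    seen << name
  end
  names
end

p check_block_params([:_, :_])
begin
  check_block_params([:x, :x])
rescue SyntaxError => e
  puts "rejected: #{e.message}"
end
```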

You'll see similar special handling in warn_unused_var:

static void
warn_unused_var(struct parser_params *parser, struct local_vars *local)
{
    /* ... */
    for (i = 0; i < cnt; ++i) {
        if (!v[i] || (u[i] & LVAR_USED)) continue;
        if (idUScore == v[i]) continue;
        rb_compile_warn(ruby_sourcefile, (int)u[i], "assigned but unused variable - %s", rb_id2name(v[i]));
    }
}

You'll notice that the warning is suppressed on the second line of the for loop.

The only special handling of _ that I could find in the 1.9.3 source is above: the duplicate name error is suppressed and the unused variable warning is suppressed. Other than those two things, _ is just a plain old variable like any other. I don't know of any documentation about the (minor) specialness of _.
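One way to check that, outside those two parser special cases, _ behaves like any other local is to inspect it with the standard reflection methods local_variables and Binding#local_variable_get (the latter needs Ruby 2.1 or later). The method name plain_underscore is illustrative.

```ruby
# _ is recorded in the local variable table like any other name.
def plain_underscore
  _ = 10
  _ += 5                  # compound assignment works as usual
  captured = -> { _ * 2 } # closures capture it like any other local
  {
    locals: local_variables,
    via_binding: binding.local_variable_get(:_),
    from_closure: captured.call
  }
end

p plain_underscore
```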

In Ruby 2.0, the idUScore == v[i] test in warn_unused_var is replaced with a call to is_private_local_id:

if (is_private_local_id(v[i])) continue;
rb_warn4S(ruby_sourcefile, (int)u[i], "assigned but unused variable - %s", rb_id2name(v[i]));

and is_private_local_id suppresses warnings for variables that begin with _:

if (name == idUScore) return 1;
/* ... */
return RSTRING_PTR(s)[0] == '_';

rather than just _ itself. So 2.0 loosens things up a bit.
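The difference can be observed by running a small script under ruby -w. The sketch below writes a throwaway file and captures its stderr with Open3.capture3; the variable names x, _ and _tmp are arbitrary. On 2.0 and later only x should be reported, while by the 1.9.3 code above, 1.9.3 would also report _tmp. The exact warning wording may vary between Ruby versions and parsers, so treat the printed text as indicative. The helper itself uses _stdout and _status for values it ignores, relying on the same convention.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

SCRIPT = <<~RUBY
  def demo
    x = 1
    _ = 2
    _tmp = 3
    nil
  end
  demo
RUBY

# Run the snippet under `ruby -w` and return whatever it printed to stderr.
def unused_variable_warnings(source)
  Tempfile.create(["underscore_demo", ".rb"]) do |file|
    file.write(source)
    file.flush
    _stdout, stderr, _status = Open3.capture3(RbConfig.ruby, "-w", file.path)
    stderr
  end
end

puts unused_variable_warnings(SCRIPT)
```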

Solution 2 - Ruby

_ is a valid identifier. Identifiers can not only contain underscores; they can also consist of a single underscore.

_ = o = Object.new
_.object_id == o.object_id
# => true

You can also use it as a method name:

def o._; :_ end
o._
# => :_

Of course, it is not exactly a readable name, nor does it convey any information to the reader about what the variable refers to or what the method does.
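Because _ is an ordinary identifier, Ruby's usual rule for choosing between a local variable and a method call applies to it as well: a bare _ is treated as a local only after the parser has seen an assignment to it in the current scope. The same rule explains why the question's .call(_, _) raises NameError. The Demo class below is purely illustrative.

```ruby
class Demo
  # A public method that happens to be named _.
  def _
    :method_result
  end

  def lookup
    before = _          # no local _ has been assigned yet: this calls the method
    _ = :local_value    # from here on, a bare _ refers to the local variable
    [before, _, self._] # self._ is an explicit call, so it still reaches the method
  end
end

p Demo.new.lookup # => [:method_result, :local_value, :method_result]
```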

IRB, in particular, sets _ to the value of the last expression:

$ irb
> 'asd'
# => "asd"
> _
# => "asd"

In IRB's source code, it simply assigns the last value to _:

@workspace.evaluate self, "_ = IRB.CurrentContext.last_value"
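The same idea can be sketched outside IRB: evaluate each input in one shared Binding and, after every evaluation, store the result in _. This is a simplification rather than IRB's implementation; fresh_binding and mini_repl are hypothetical helpers, and Binding#local_variable_set stands in for IRB's string evaluation of "_ = ...".

```ruby
# A method with no locals of its own, so its binding starts out empty.
def fresh_binding
  binding
end

# Hypothetical helper: evaluate each input in one shared binding and, like
# IRB, store the result of every evaluation in the local variable _.
def mini_repl(inputs)
  workspace = fresh_binding
  inputs.map do |input|
    value = workspace.eval(input)
    workspace.local_variable_set(:_, value) # stands in for IRB's "_ = ..." eval
    value
  end
end

p mini_repl(["'asd'", "_", "_.upcase"])
```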

I did some repository exploring. Here's what I found:

On the last lines of the file id.c, there is the call:

REGISTER_SYMID(idUScore, "_");

Grepping the source for idUScore gave me two seemingly relevant results:

shadowing_lvar_gen seems to be the mechanism through which the formal parameter of a block shadows a variable of the same name that exists in an outer scope. It appears to be the function that raises the "duplicated argument name" SyntaxError and emits the "shadowing outer local variable" warning.

After grepping the source for shadowing_lvar_gen, I found the following entry in the ChangeLog for Ruby 1.9.3:

> Tue Dec 11 01:21:21 2007  Yukihiro Matsumoto
>
> * parse.y (shadowing_lvar_gen): no duplicate error for "_".

This is likely the origin of this line:

if (idUScore == name) return name;

From this, I deduce that in a situation such as proc { |_, _| :x }.call :a, :b, one _ variable simply shadows the other.
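Whether the repeated _ is dropped or simply shadowed is an implementation detail, but two things can be checked directly: the block still declares two parameters (so a lambda demands two arguments), and only one of the two values is reachable through the name _. Which one is version-dependent (the question reports 2 on 1.8.7 and 4 on 1.9.3), so this sketch prints that value rather than predicting it.

```ruby
dup_params = lambda { |_, _| _ }

# Both positions still count as parameters.
p dup_params.arity # => 2

# Only one of the two arguments can be read back through the name _.
p dup_params.call(4, 2)

begin
  dup_params.call(4) # a lambda is strict about argument count
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```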


Here's the commit in question. It basically introduced these two lines:

if (!uscore) uscore = rb_intern("_");
if (uscore == name) return;

Apparently, this dates from a time when idUScore did not even exist.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author
Question | Andrew Marshall
Solution 1 - Ruby | mu is too short
Solution 2 - Ruby | Matheus Moreira