How does Array#sort work when a block is passed?


Arrays Problem Overview


I am having trouble understanding exactly how array.sort { |x,y| block } works, and therefore how to use it.

An example from Ruby documentation:

   a = [ "d", "a", "e", "c", "b" ]
   a.sort                     #=> ["a", "b", "c", "d", "e"]
   a.sort { |x,y| y <=> x }   #=> ["e", "d", "c", "b", "a"]

Arrays Solutions


Solution 1 - Arrays

In your example

a.sort

is equivalent to

a.sort { |x, y| x <=> y }
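A quick check of that equivalence, using the array from the question:

```ruby
a = ["d", "a", "e", "c", "b"]

# Default sort: elements are compared with their own <=> method
default_sorted = a.sort

# Explicit block performing the same comparison
block_sorted = a.sort { |x, y| x <=> y }

p default_sorted                  # prints ["a", "b", "c", "d", "e"]
p block_sorted                    # prints ["a", "b", "c", "d", "e"]
p default_sorted == block_sorted  # prints true
```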

As you know, to sort an array, you need to be able to compare its elements (if you doubt that, just try to implement any sort algorithm without using any comparison, no <, >, <= or >=).

The block you provide is really a function that the sort algorithm calls to compare two items. That is, x and y will always be elements of the input array, chosen by the sort algorithm as it runs.
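To make the "function called by the sort algorithm" idea concrete, here is a minimal sketch of a sort that delegates every comparison to a block. The method name insertion_sort is hypothetical, and insertion sort is used only because it is short; it is not the algorithm Ruby's own Array#sort uses internally.

```ruby
# Hypothetical helper for illustration only; NOT how Array#sort is implemented.
# It shows the block's role: the algorithm picks two elements, the block compares them.
def insertion_sort(items)
  result = items.dup
  (1...result.length).each do |i|
    current = result[i]
    j = i - 1
    # A positive block result means result[j] belongs after current,
    # so shift result[j] one slot to the right.
    while j >= 0 && yield(result[j], current) > 0
      result[j + 1] = result[j]
      j -= 1
    end
    result[j + 1] = current
  end
  result
end

a = ["d", "a", "e", "c", "b"]
p insertion_sort(a) { |x, y| x <=> y }  # prints ["a", "b", "c", "d", "e"]
p insertion_sort(a) { |x, y| y <=> x }  # prints ["e", "d", "c", "b", "a"]
```

The algorithm itself never looks at what the elements are; swapping the block is the only thing that changes the resulting order.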

The sort algorithm assumes that this comparison function/block meets the same contract as the <=> method:

  • return a negative integer (conventionally -1) if x < y
  • return 0 if x == y
  • return a positive integer (conventionally 1) if x > y
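Only the sign of the block's result matters, so any negative or positive Integer works, not just -1 and 1. For example, sorting some made-up words by length with a plain subtraction:

```ruby
words = ["banana", "kiwi", "apple", "fig"]

# x.length - y.length is negative, zero or positive, which is all sort needs
by_length = words.sort { |x, y| x.length - y.length }

p by_length  # prints ["fig", "kiwi", "apple", "banana"]
```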

Failure to provide an adequate comparison function/block will result in an array whose order is undefined.
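One kind of inadequate block is caught by Ruby itself: if the block returns nil instead of an Integer, sort raises an ArgumentError rather than returning an array. A small check (the exact error message varies between Ruby versions, so it is not shown):

```ruby
a = [3, 1, 2]

error = begin
  # The block returns nil, which is not a valid comparison result
  a.sort { |x, y| nil }
  nil
rescue ArgumentError => e
  e
end

p error.class  # prints ArgumentError
```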

You should now understand why

a.sort { |x, y| x <=> y }

and

a.sort { |x, y| y <=> x }

return the same elements in opposite orders.
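Swapping x and y flips the sign of every comparison, which is why the order reverses. With the question's array (all elements distinct), the descending sort is exactly the ascending sort reversed:

```ruby
a = ["d", "a", "e", "c", "b"]

ascending  = a.sort { |x, y| x <=> y }
descending = a.sort { |x, y| y <=> x }  # arguments swapped: every result changes sign

p ascending                        # prints ["a", "b", "c", "d", "e"]
p descending                       # prints ["e", "d", "c", "b", "a"]
p descending == ascending.reverse  # prints true
```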


To elaborate on what Tate Johnson added, if you implement the comparison method <=> on any of your classes, you gain the following:

  1. You may include the Comparable module in your class, which will automatically define the following methods for you: between?, ==, >=, <, <= and > (plus clamp since Ruby 2.4).
  2. Instances of your class can now be sorted with the default invocation of sort (i.e. without a block).
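A minimal sketch of both points, using a hypothetical Person class that is ordered by age:

```ruby
# Hypothetical class used only for illustration
class Person
  include Comparable

  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  # Order people by age; Comparable builds <, <=, ==, >, >= and between? on top of this
  def <=>(other)
    return nil unless other.is_a?(Person)
    age <=> other.age
  end
end

bob  = Person.new("Bob", 42)
mike = Person.new("Mike", 30)
ann  = Person.new("Ann", 35)

names = [bob, mike, ann].sort.map(&:name)  # plain sort, no block needed

p bob > mike               # prints true
p names                    # prints ["Mike", "Ann", "Bob"]
p ann.between?(mike, bob)  # prints true
```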

Note that the <=> method is already provided wherever it makes sense in Ruby's standard library (Integer, which replaced Fixnum and Bignum in Ruby 2.4, Array, File::Stat, String, Time, etc.).

Solution 2 - Arrays

When you have an array of, let's say, integers to sort, it's pretty straightforward for the sort method to order the elements properly: smaller numbers first, bigger ones at the end. That's when you use the ordinary sort, with no block.

But when you are sorting other objects, you may need to provide a way to compare any two of them. Let's say you have an array of objects of class Person. You probably can't tell whether object bob is greater than object mike (i.e. class Person doesn't implement a meaningful <=> method). In that case you need to provide some code that tells the sort method in which order you want these objects sorted. That's where the block form kicks in.

people.sort{|p1,p2| p1.age <=> p2.age}
people.sort{|p1,p2| p1.children.count <=> p2.children.count}
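A runnable version of those two lines, assuming a hypothetical Person built with Struct (which gives it no meaningful ordering of its own) and some made-up data:

```ruby
# Hypothetical Person for illustration; Struct is used only for brevity
Person = Struct.new(:name, :age, :children)

people = [
  Person.new("bob", 42, ["tim"]),
  Person.new("mike", 30, []),
  Person.new("ann", 35, ["joe", "sue"])
]

# Same sort method, two different comparison rules
by_age      = people.sort { |p1, p2| p1.age <=> p2.age }
by_children = people.sort { |p1, p2| p1.children.count <=> p2.children.count }

p by_age.map(&:name)       # prints ["mike", "ann", "bob"]
p by_children.map(&:name)  # prints ["mike", "bob", "ann"]
```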

And so on. In all these cases the sort method works the same way, using the same algorithm; only the comparison logic differs.

Solution 3 - Arrays

@OscarRyz's reply cleared up a lot for me about how sort works, especially

 { |x, y| y <=> x }

Based on my understanding, here is what the state of the array would be after each comparison made with the block above.

Note: I got the idea of printing the values of the block parameters e1 and e2 from ruby-forum.

1.9.3dev :001 > a = %w(d e a w f k)
1.9.3dev :003 > a.sort { |e1, e2| p [e2, e1]; e2 <=> e1 }
["w", "d"]
["k", "w"]
["k", "d"]
["k", "e"]
["k", "f"]
["k", "a"]
["f", "a"]
["d", "f"]
["d", "a"]
["d", "e"]
["e", "f"]
 => ["w", "k", "f", "e", "d", "a"]
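The same experiment can record the pairs instead of printing them. Keep in mind that which pairs are compared, and in what order, is an internal detail of the Ruby implementation, so other versions may not reproduce the 1.9.3 sequence above; only the final sorted result is guaranteed:

```ruby
a = %w(d e a w f k)
comparisons = []

result = a.sort do |e1, e2|
  comparisons << [e2, e1]  # record each pair the algorithm asks about
  e2 <=> e1                # descending order
end

p result              # prints ["w", "k", "f", "e", "d", "a"]
p comparisons.length  # depends on the Ruby version's internal sort
```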

A guess at the array state at runtime after each comparison:

 [e2, e1]    Comparison Result       Array State
["w", "d"]      1                   ["w", "e", "a", "d", "f", "k"]
["k", "w"]     -1                   ["w", "e", "a", "d", "f", "k"]
["k", "d"]      1                   ["w", "e", "a", "k", "f", "d"]
["k", "e"]      1                   ["w", "k", "a", "e", "f", "d"]  
["k", "f"]      1                   ["w", "k", "a", "e", "f", "d"]    
["k", "a"]      1                   ["w", "k", "a", "e", "f", "d"]  
["f", "a"]      1                   ["w", "k", "f", "e", "a", "d"]  
["d", "f"]     -1                   ["w", "k", "f", "e", "a", "d"]  
["d", "a"]      1                   ["w", "k", "f", "e", "d", "a"]  
["d", "e"]     -1                   ["w", "k", "f", "e", "d", "a"]  
["e", "f"]     -1                   ["w", "k", "f", "e", "d", "a"] (Result)

Thanks,

Jignesh

Solution 4 - Arrays

<=> is a method in Ruby; self <=> argument (that is, self.<=>(argument)) returns:

  • -1 if self < argument
  • 0 if self == argument
  • 1 if self > argument
  • nil if the two objects cannot be compared
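A few built-in examples of those return values, including the nil case for objects that cannot be compared:

```ruby
less     = 1 <=> 2      # -1
equal    = 2 <=> 2      # 0
greater  = "b" <=> "a"  # 1
mismatch = 1 <=> "a"    # nil: an Integer and a String cannot be compared

p [less, equal, greater, mismatch]  # prints [-1, 0, 1, nil]
```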

x and y are items of the array. If no block is provided, sort compares them with x <=> y; otherwise the block's result says whether x should come before y.

array.sort{|x, y| some_very_complicated_method(x, y) }

Here, if some_very_complicated_method(x, y) returns something less than 0, x is considered less than y and goes first; 0 means the two are equivalent, and a positive result means x goes after y.
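A sketch of what such a method might look like. The body of some_very_complicated_method here is invented for illustration (shorter strings first, ties broken alphabetically); the only requirement is that it returns a negative, zero or positive Integer:

```ruby
# Hypothetical comparison: shorter strings first, ties broken alphabetically
def some_very_complicated_method(x, y)
  by_length = x.length <=> y.length
  by_length.zero? ? x <=> y : by_length
end

array = ["pear", "fig", "apple", "kiwi", "date"]
sorted = array.sort { |x, y| some_very_complicated_method(x, y) }

p sorted  # prints ["fig", "date", "kiwi", "pear", "apple"]
```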

Solution 5 - Arrays

Some miscellaneous points:

  • x and y are called block parameters. The sort method basically says: "I'll give you x and y, you determine whether x or y should come first, and I'll look after the boring stuff with regard to sorting."
  • <=> is called the spaceship operator.

Solution 6 - Arrays

In:

a.sort {|x,y| y <=> x }   #=> ["e", "d", "c", "b", "a"]

> What are x and y?

x and y are the elements being compared by the sorting algorithm.

This is useful for defining, in custom classes, which element should come before the other.

For basic data (numbers, strings, dates, etc.) the natural order is predefined, but for custom elements (e.g. Employee) you define which one goes before the other in a comparison. This block gives you the chance to define that.
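For example, with a hypothetical Employee type (fields and data invented for illustration), a block can sort by department A to Z and then by highest salary first. Comparing two-element arrays relies on Array#<=>, which compares the elements position by position:

```ruby
# Hypothetical Employee type, for illustration only
Employee = Struct.new(:name, :department, :salary)

staff = [
  Employee.new("Carol", "Sales", 50_000),
  Employee.new("Alice", "IT", 70_000),
  Employee.new("Bob", "Sales", 60_000),
  Employee.new("Dave", "IT", 65_000)
]

# Department ascending; within a department, x and y are swapped
# for the salary so that the highest salary comes first
sorted = staff.sort do |x, y|
  [x.department, y.salary] <=> [y.department, x.salary]
end

p sorted.map(&:name)  # prints ["Alice", "Dave", "Bob", "Carol"]
```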

> And what happens at y <=> x?

There, they are comparing the elements in descending order (those with a "higher" value go first) rather than in the natural order (x <=> y).

The <=> method plays the role of Java's compareTo: it returns 0 if the elements are equivalent, a value < 0 if x goes before y, or a value > 0 if x goes after y.

Solution 7 - Arrays

I believe |x,y| y<=>x compares two elements at a time in descending order (see http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-3C-3D-3E). Say that with [ "d", "a", "e", "c", "b" ], "d" and "a" appear to be compared first. Since the order is descending, both remain in the same order, because the block evaluates "a" <=> "d", which is -1. Then "d" and "e" are compared, and "e" is moved to "d"'s position. Without knowing the internal workings of the C code it is not possible to know where "d" is moved to, but I figure this process continues until all elements are sorted. Note that the linked entry documents Array#<=>, the comparison used when the two things being compared are themselves arrays, rather than the sorting code itself. Its C implementation:

VALUE
rb_ary_cmp(VALUE ary1, VALUE ary2)
{
    long len;
    VALUE v;

    ary2 = rb_check_array_type(ary2);
    if (NIL_P(ary2)) return Qnil;
    if (ary1 == ary2) return INT2FIX(0);
    v = rb_exec_recursive_paired(recursive_cmp, ary1, ary2, ary2);
    if (v != Qundef) return v;
    len = RARRAY_LEN(ary1) - RARRAY_LEN(ary2);
    if (len == 0) return INT2FIX(0);
    if (len > 0) return INT2FIX(1);
    return INT2FIX(-1);
}
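Seen from Ruby, that function compares two arrays element by element; if one is a prefix of the other, the longer array is greater, and comparing with a non-array gives nil:

```ruby
first_diff = [1, 2] <=> [1, 3]     # -1: first difference is 2 vs 3
longer     = [1, 2, 3] <=> [1, 2]  # 1: equal prefix, longer array is greater
same       = [1, 2] <=> [1, 2]     # 0
not_array  = [1, 2] <=> "x"        # nil: "x" is not an Array

p [first_diff, longer, same, not_array]  # prints [-1, 1, 0, nil]
```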

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ibrahim Hussein | View Question on Stackoverflow
Solution 1 - Arrays | bltxd | View Answer on Stackoverflow
Solution 2 - Arrays | Mladen Jablanović | View Answer on Stackoverflow
Solution 3 - Arrays | Jignesh Gohel | View Answer on Stackoverflow
Solution 4 - Arrays | Draco Ater | View Answer on Stackoverflow
Solution 5 - Arrays | Andrew Grimm | View Answer on Stackoverflow
Solution 6 - Arrays | OscarRyz | View Answer on Stackoverflow
Solution 7 - Arrays | Michael F | View Answer on Stackoverflow