How to count identical string elements in a Ruby array
Problem Overview
I have the following Array = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
How do I produce a count for each identical element, such that:
"Jason" = 2, "Judah" = 3, "Allison" = 1, "Teresa" = 1, "Michelle" = 1
or produce a hash where:
hash = { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }
Ruby Solutions
Solution 1 - Ruby
Ruby v2.7+ (latest)
As of Ruby v2.7.0 (released December 2019), the core language now includes Enumerable#tally, a new method designed specifically for this problem:
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
names.tally
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
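As a small follow-up sketch (the variable names here are illustrative, not from the original answer), the hash returned by tally can be queried or sorted like any other hash. Note that a name that never appears is simply absent, so looking it up gives nil rather than 0:

```ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

counts = names.tally

# Look up a single count; a missing name returns nil, not 0.
judah_count = counts["Judah"]   # 3
missing     = counts["Nobody"]  # nil

# Sort the pairs by descending count and keep the two most frequent names.
most_common = counts.sort_by { |_name, count| -count }.first(2)
# [["Judah", 3], ["Jason", 2]]
```

If a default of 0 is preferred for missing names, counts.fetch("Nobody", 0) is one option.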
Ruby v2.4+ (currently supported, but older)
The following code was not possible in standard Ruby when this question was first asked (February 2011), as it uses:
- Object#itself, which was added to Ruby v2.2.0 (released December 2014).
- Hash#transform_values, which was added to Ruby v2.4.0 (released December 2016).
These modern additions to Ruby enable the following implementation:
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
names.group_by(&:itself).transform_values(&:count)
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
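To make the one-liner easier to follow, here is the same chain split into its two steps. This is only an illustrative breakdown; the intermediate variable names are assumptions, not part of the original answer:

```ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

# Step 1: group equal elements together. Each key maps to an array of
# every occurrence of that element.
grouped = names.group_by(&:itself)
judah_group = grouped["Judah"]  # ["Judah", "Judah", "Judah"]

# Step 2: replace each array of occurrences with its size.
counts = grouped.transform_values(&:count)
```

The intermediate hash holds every element a second time, so this approach uses more memory than a running counter, which may matter for very large arrays.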
Ruby v2.2+ (deprecated)
If using an older Ruby version, without access to the above-mentioned Hash#transform_values method, you could instead use Array#to_h, which was added to Ruby v2.1.0 (released December 2013):
names.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
For even older Ruby versions (<= 2.1), there are several ways to solve this, but (in my opinion) there is no clear-cut "best" way. See the other answers to this post.
Solution 2 - Ruby
names.inject(Hash.new(0)) { |total, e| total[e] += 1; total }
gives you
{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
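Two details make this work, and the sketch below tries to show both (variable names are illustrative). Hash.new(0) returns 0 for any key that has not been set yet, so += 1 works on the first occurrence, and inject passes the block's return value to the next iteration, which is why the block ends with total:

```ruby
plain     = {}
defaulted = Hash.new(0)

plain_lookup     = plain["x"]      # nil, so plain["x"] += 1 would raise NoMethodError
defaulted_lookup = defaulted["x"]  # 0, and the key is not stored by the lookup

defaulted["x"] += 1                # now the key exists with value 1

names = ["Jason", "Jason", "Teresa"]
# The trailing `total` hands the hash back to inject for the next element.
counts = names.inject(Hash.new(0)) { |total, e| total[e] += 1; total }
```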
Solution 3 - Ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = Hash.new(0)
names.each { |name| counts[name] += 1 }
counts
# => {"Jason" => 2, "Teresa" => 1, ....
Solution 4 - Ruby
Now using Ruby 2.2.0 you can leverage the itself method.
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = {}
names.group_by(&:itself).each { |k,v| counts[k] = v.length }
# counts => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
Solution 5 - Ruby
Ruby 2.7+
Ruby 2.7 introduced Enumerable#tally for this exact purpose. There's a good summary here.
In this use case:
array.tally
# => { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }
Docs on the features being released are here.
Solution 6 - Ruby
There's actually a data structure which does this: Multiset.
Unfortunately, there is no Multiset implementation in the Ruby core library or standard library, but there are a couple of implementations floating around the web.
This is a great example of how the choice of a data structure can simplify an algorithm. In fact, in this particular example, the algorithm even completely goes away. It's literally just:
Multiset.new(*names)
And that's it. Example, using https://GitHub.Com/Josh/Multimap/:
require 'multiset'
names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]
histogram = Multiset.new(*names)
# => #<Multiset: {"Jason", "Jason", "Teresa", "Judah", "Judah", "Judah", "Michelle", "Allison"}>
histogram.multiplicity('Judah')
# => 3
Example, using http://maraigue.hhiro.net/multiset/index-en.php:
require 'multiset'
names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]
histogram = Multiset[*names]
# => #<Multiset:#2 'Jason', #1 'Teresa', #3 'Judah', #1 'Michelle', #1 'Allison'>
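If neither gem is available, the idea can be approximated in a few lines. The class below is a hypothetical minimal sketch (SimpleMultiset and its methods are made up for illustration and do not mirror either library's API); it only stores a count per element:

```ruby
# Hypothetical minimal multiset: wraps a hash of element => count.
class SimpleMultiset
  def initialize(items = [])
    @counts = Hash.new(0)
    items.each { |item| add(item) }
  end

  # Record one more occurrence of item.
  def add(item)
    @counts[item] += 1
    self
  end

  # How many times item has been added (0 if never).
  def multiplicity(item)
    @counts.fetch(item, 0)
  end

  # Expose the counts as a hash copy.
  def to_h
    @counts.dup
  end
end

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]
histogram = SimpleMultiset.new(names)
judah = histogram.multiplicity("Judah")    # 3
nobody = histogram.multiplicity("Nobody")  # 0
```

A real multiset library typically adds set operations such as union and intersection, which this sketch leaves out.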
Solution 7 - Ruby
Enumerable#each_with_object saves you from returning the final hash.
names.each_with_object(Hash.new(0)) { |name, hash| hash[name] += 1 }
Returns:
=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
Solution 8 - Ruby
This works.
arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
result = {}
arr.uniq.each{|element| result[element] = arr.count(element)}
Solution 9 - Ruby
The following is a slightly more functional programming style:
array_with_lower_case_a = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
hash_grouped_by_name = array_with_lower_case_a.group_by {|name| name}
hash_grouped_by_name.map{|name, names| [name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]
One advantage of group_by is that you can use it to group equivalent but not exactly identical items:
another_array_with_lower_case_a = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]
hash_grouped_by_first_name = another_array_with_lower_case_a.group_by {|name| name.split(" ").first.capitalize}
hash_grouped_by_first_name.map{|first_name, names| [first_name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]
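If a hash is wanted instead of an array of pairs, the same normalizing block can be combined with Hash#transform_values (Ruby 2.4+). This is a sketch under that version assumption, with an illustrative variable name:

```ruby
names = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]

# Group by a normalized first name, then replace each group with its size.
counts_by_first_name = names
  .group_by { |name| name.split(" ").first.capitalize }
  .transform_values(&:length)
# {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
```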
Solution 10 - Ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
Hash[names.group_by{|i| i }.map{|k,v| [k,v.size]}]
# => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
Solution 11 - Ruby
a = [1, 2, 3, 2, 5, 6, 7, 5, 5]
a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }
# => {1=>1, 2=>2, 3=>1, 5=>3, 6=>1, 7=>1}
Credit Frank Wambutt
Solution 12 - Ruby
Lots of great implementations here.
But as a beginner, I would consider this the easiest to read and implement:
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
name_frequency_hash = {}
names.each do |name|
  count = names.count(name)
  name_frequency_hash[name] = count
end
name_frequency_hash
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
The steps we took:
- we created the hash
- we looped over the names array
- we counted how many times each name appeared in the names array
- we created a key using the name and a value using the count
It may be slightly more verbose (and performance-wise you will be doing some unnecessary work by overwriting keys), but in my opinion it is easier to read and understand for what you want to achieve.
Solution 13 - Ruby
This is more a comment than an answer, but a comment wouldn't do it justice. If you do Array = foo, you crash at least one implementation of IRB:
C:\Documents and Settings\a.grimm>irb
irb(main):001:0> Array = nil
(irb):1: warning: already initialized constant Array
=> nil
C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3177:in `rl_redisplay': undefined method `new' for nil:NilClass (NoMethodError)
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3873:in `readline_internal_setup'
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4704:in `readline_internal'
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
from C:/Ruby19/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
from C:/Ruby19/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:139:in `block (2 levels) in eval_input'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:271:in `signal_status'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:138:in `block in eval_input'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:153:in `eval_input'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:70:in `block in start'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `catch'
from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `start'
from C:/Ruby19/bin/irb:12:in `<main>'
C:\Documents and Settings\a.grimm>
That's because Array is a class.
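A short illustration of the point, using a lowercase local variable instead of reassigning the constant (the variable names are just examples):

```ruby
# Array is a constant that refers to the built-in class.
array_is_class = Array.is_a?(Class)  # true

# A lowercase local variable holds the data without touching the constant.
names = ["Jason", "Jason", "Teresa"]
still_an_array = names.is_a?(Array)  # true
```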
Solution 14 - Ruby
arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
arr.uniq.inject({}) {|a, e| a.merge({e => arr.count(e)})}
Time elapsed 0.028 milliseconds
Interestingly, stupidgeek's implementation benchmarked:
Time elapsed 0.041 milliseconds
and the winning answer:
Time elapsed 0.011 milliseconds
:)
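The timing code itself was not posted, so the following is only a sketch of how such a comparison might be reproduced. The time_ms helper is hypothetical, the input is repeated 1,000 times to make the difference measurable, and the numbers will vary by machine and Ruby version:

```ruby
# Hypothetical helper: runs the block and returns elapsed milliseconds.
def time_ms
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
end

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"] * 1_000

uniq_count  = nil
single_pass = nil

# Rescans the whole array once per unique element.
uniq_ms = time_ms do
  uniq_count = names.uniq.inject({}) { |a, e| a.merge(e => names.count(e)) }
end

# Visits each element exactly once.
pass_ms = time_ms do
  single_pass = names.each_with_object(Hash.new(0)) { |name, h| h[name] += 1 }
end

puts format("uniq + count: %.3f ms", uniq_ms)
puts format("single pass:  %.3f ms", pass_ms)
```

Both approaches produce the same hash; only the amount of work differs.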
Solution 15 - Ruby
With ruby 2.6 you can do:
names.to_h{ |name| [name, names.count(name)] }
gives you:
{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
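One thing to be aware of is that the block runs once per element, duplicates included, and each call to count scans the whole array. A possible variation (a sketch, still assuming Ruby 2.6+ for to_h with a block) calls uniq first so count runs once per distinct name:

```ruby
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

# uniq keeps first-occurrence order, so the keys come out in the same order.
counts = names.uniq.to_h { |name| [name, names.count(name)] }
```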