has_and_belongs_to_many, avoiding dupes in the join table

Ruby on Rails, ActiveRecord

Ruby on-Rails Problem Overview


I have a pretty simple HABTM set of models:

class Tag < ActiveRecord::Base
   has_and_belongs_to_many :posts
end

class Post < ActiveRecord::Base
   has_and_belongs_to_many :tags

   def tags=(tag_list)
      self.tags.clear
      tag_list.strip.split(' ').each do |tag|
        self.tags.build(:name => tag)
      end
   end
end

Now it all works alright, except that I get a ton of duplicates in the tags table.

What do I need to do to avoid duplicates (based on name) in the tags table?

Ruby on-Rails Solutions


Solution 1 - Ruby on-Rails

Prevent duplicates in the view only (Lazy solution)

The following does not prevent writing duplicate relationships to the database, it only ensures find methods ignore duplicates.

In Rails 5:

has_and_belongs_to_many :tags, -> { distinct }

Note: Relation#uniq was deprecated in Rails 5.

In Rails 4:

has_and_belongs_to_many :tags, -> { uniq }

Prevent duplicate data from being saved (best solution)

Option 1: Prevent duplicates from the controller:

post.tags << tag unless post.tags.include?(tag)

However, multiple requests could evaluate post.tags.include?(tag) at the same time, so this check is subject to race conditions.
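
To illustrate why the include? check alone is not enough, here is a minimal plain-Ruby sketch of the interleaving. It involves no database or Rails: an in-memory array stands in for the posts_tags join table, and the method name simulate_check_then_insert_race is hypothetical.

```ruby
# Simulates two requests that both run the include? check before either one
# writes. The array is a stand-in for the posts_tags join table.
def simulate_check_then_insert_race
  join_rows = []
  pair = [1, 2] # [post_id, tag_id]

  # Both requests check first; neither sees an existing row.
  request_a_found = join_rows.include?(pair)
  request_b_found = join_rows.include?(pair)

  # Each request then inserts based on its own, now stale, check.
  join_rows << pair unless request_a_found
  join_rows << pair unless request_b_found

  join_rows
end

rows = simulate_check_then_insert_race
puts "rows for post 1 / tag 2: #{rows.count([1, 2])}"
```

With this ordering the table ends up holding two identical rows, which is the outcome a database-level unique index is meant to rule out.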

For robustness you can also add this to the Post model (post.rb)

def tag=(tag)
  tags << tag unless tags.include?(tag)
end

Option 2: Create a unique index

The most foolproof way of preventing duplicates is to enforce uniqueness at the database layer. This can be achieved by adding a unique index on the join table itself.

rails g migration add_index_to_posts
# migration file
add_index :posts_tags, [:post_id, :tag_id], :unique => true
add_index :posts_tags, :tag_id

Once you have the unique index, attempting to add a duplicate record will raise an ActiveRecord::RecordNotUnique error. Handling it is outside the scope of this question, but one option is a rescue_from handler in the controller:

rescue_from ActiveRecord::RecordNotUnique, :with => :some_method
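
As a rough illustration of the "unique index plus rescue" idea, the following plain-Ruby sketch models a join table that rejects duplicate pairs and a caller that treats the rejection as "already linked". It does not use ActiveRecord: DuplicateRowError is a hypothetical stand-in for ActiveRecord::RecordNotUnique, and UniqueJoinTable and link_once are illustrative names only.

```ruby
require "set"

# Hypothetical stand-in for ActiveRecord::RecordNotUnique.
class DuplicateRowError < StandardError; end

# Plain-Ruby model of a join table with a unique index on [post_id, tag_id].
class UniqueJoinTable
  def initialize
    @rows = Set.new
  end

  # Mimics an INSERT against a unique index: Set#add? returns nil when the
  # pair is already present, which is turned into an error here.
  def insert(post_id, tag_id)
    unless @rows.add?([post_id, tag_id])
      raise DuplicateRowError, "duplicate (#{post_id}, #{tag_id})"
    end
    self
  end

  def size
    @rows.size
  end
end

# Treats "already linked" as a normal outcome, the way a rescue handler might.
def link_once(table, post_id, tag_id)
  table.insert(post_id, tag_id)
  :created
rescue DuplicateRowError
  :already_linked
end
```

Calling link_once twice with the same pair leaves a single row in the table. The second call reports that the link already existed instead of failing the request.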

Solution 2 - Ruby on-Rails

In addition to the suggestions above:

  1. adding :uniq to the has_and_belongs_to_many association
  2. adding a unique index on the join table

I would do an explicit check to determine if the relationship already exists. For instance:

post = Post.find(1)
tag = Tag.find(2)
post.tags << tag unless post.tags.include?(tag)

Solution 3 - Ruby on-Rails

You can pass the :uniq option as described in the documentation. Note that the :uniq option doesn't prevent the creation of duplicate relationships; it only ensures that accessor/find methods return each record once.

If you want to prevent duplicates in the association table, you should create a unique index and handle the exception. Note also that validates_uniqueness_of doesn't work reliably here, because a second request can write to the database between the time the first request checks for duplicates and the time it writes.

Solution 4 - Ruby on-Rails

In Rails 4:

class Post < ActiveRecord::Base
  has_and_belongs_to_many :tags, -> { uniq }
end

(beware, the -> { uniq } must be directly after the relation name, before other params)


Solution 5 - Ruby on-Rails

Set the uniq option (Rails 3 and earlier; Rails 4 and later use the scope form instead):

class Tag < ActiveRecord::Base
   has_and_belongs_to_many :posts, :uniq => true
end

class Post < ActiveRecord::Base
   has_and_belongs_to_many :tags, :uniq => true
end

Solution 6 - Ruby on-Rails

I would prefer to adjust the model and create the classes this way:

class Tag < ActiveRecord::Base 
   has_many :taggings
   has_many :posts, :through => :taggings
end 

class Post < ActiveRecord::Base 
   has_many :taggings
   has_many :tags, :through => :taggings
end

class Tagging < ActiveRecord::Base 
   belongs_to :tag
   belongs_to :post
end

Then I would wrap the creation in logic so that an existing Tag record is reused if one already exists. I'd probably even put a unique constraint on the tag name to enforce it. That makes it more efficient to search either way, since you can just use the indexes on the join table (to find all posts for a particular tag, and all tags for a particular post).

The only catch is that you can't allow renaming of tags since changing the tag name would affect all uses of that tag. Make the user delete the tag and create a new one instead.

Solution 7 - Ruby on-Rails

I worked around this by creating a before_save filter that fixes stuff up.

class Post < ActiveRecord::Base
   has_and_belongs_to_many :tags
   before_save :fix_tags

   def tag_list=(tag_list)
      self.tags.clear
      tag_list.strip.split(' ').each do |tag|
        self.tags.build(:name => tag)
      end
   end

   # Swap each newly built tag for an existing record with the same name.
   def fix_tags
      if self.tags.loaded?
        new_tags = self.tags.map do |tag|
          Tag.find_by_name(tag.name) || tag
        end

        self.tags = new_tags
      end
   end
end

It could be slightly optimised to process the tags in batches, and it may also need better transactional support.
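
One way the batching could look, sketched in plain Ruby so that it runs without Rails: parse the tag string into unique names first, then resolve all of them against a single name-to-tag lookup instead of querying per tag. TagStub, parse_tag_names, and resolve_tags are hypothetical names, and the idea that the lookup comes from one query is an assumption rather than part of the original answer.

```ruby
# Hypothetical stand-in for the Tag model; only the name matters here.
TagStub = Struct.new(:name)

# Splits a raw tag string into unique names, preserving first-seen order.
def parse_tag_names(tag_list)
  tag_list.strip.split(/\s+/).uniq
end

# Reuses an existing tag where the name matches and builds a new one otherwise.
# existing_by_name is a Hash of name => tag. In a Rails app it could be built
# with one query, e.g. Tag.where(name: names).index_by(&:name) (assumption).
def resolve_tags(tag_list, existing_by_name)
  parse_tag_names(tag_list).map do |name|
    existing_by_name.fetch(name) { TagStub.new(name) }
  end
end

ruby_tag = TagStub.new("ruby")
tags = resolve_tags(" ruby rails ruby ", { "ruby" => ruby_tag })
puts tags.map(&:name).inspect
```

Deduplicating the names before the lookup also covers the case where the same tag is typed twice in one submission, which the per-tag version would otherwise build twice.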

Solution 8 - Ruby on-Rails

What worked for me:

  1. adding a unique index on the join table

  2. overriding the << method in the relation

     has_and_belongs_to_many :groups do
       def << (group)
         group -= self if group.respond_to?(:to_a)
         super group unless include?(group)
       end
     end
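
The same idea can be shown outside Rails. The sketch below is a plain-Ruby collection whose << skips items that are already present, whether it is given a single item or an array. UniqueCollection is a hypothetical class and not the association proxy itself, so it only illustrates the intent of the override.

```ruby
# Plain-Ruby stand-in for an association collection (illustrative only).
class UniqueCollection
  include Enumerable

  def initialize
    @items = []
  end

  def each(&block)
    @items.each(&block)
  end

  # Accepts a single item or an array of items and appends only those that
  # are not already in the collection. Returns self so calls can be chained.
  def <<(item_or_items)
    items = item_or_items.is_a?(Array) ? item_or_items : [item_or_items]
    items.each do |item|
      @items << item unless @items.include?(item)
    end
    self
  end
end

groups = UniqueCollection.new
groups << :admins << :admins
groups << [:admins, :editors, :editors]
puts groups.to_a.inspect
```

Like the association override, this only guards a single process. The unique index from step 1 is still what protects against concurrent writers.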
    

Solution 9 - Ruby on-Rails

This is really old but I thought I'd share my way of doing this.

class Tag < ActiveRecord::Base 
    has_and_belongs_to_many :posts
end 

class Post < ActiveRecord::Base 
    has_and_belongs_to_many :tags
end

In the code where I need to add tags to a post, I do something like:

new_tag = Tag.find_by(name: 'cool')
post.tag_ids = (post.tag_ids + [new_tag.id]).uniq

Assigning tag_ids has the effect of automatically adding or removing tags as necessary, or doing nothing if the list is unchanged.
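
The deduplication here is plain array arithmetic, so it can be checked without Rails. A small sketch, using a hypothetical helper named with_tag_id:

```ruby
# Returns the id list with new_id included at most once.
def with_tag_id(tag_ids, new_id)
  (tag_ids + [new_id]).uniq
end

puts with_tag_id([1, 2], 3).inspect # id 3 is appended
puts with_tag_id([1, 2], 2).inspect # id 2 is already present, list unchanged
```

Array union (tag_ids | [new_id]) gives the same result for these inputs. Note that Tag.find_by returns nil when no tag matches, so the id lookup needs a guard if the tag might not exist.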

Solution 10 - Ruby on-Rails

Extract the tag name for security. Check whether or not the tag exists in your tags table, then create it if it doesn't:

name = params[:tag][:name]
@new_tag = Tag.where(name: name).first_or_create

Then check whether it exists within this specific collection, and push it if it doesn't:

@taggable.tags << @new_tag unless @taggable.tags.exists?(@new_tag.id)

Solution 11 - Ruby on-Rails

You should add an index on the tag's name column and then use a find-or-create finder (for example Tag.find_or_create_by(name: ...) in Rails 4 and later) in the Tags#create action.


Solution 12 - Ruby on-Rails

Just add a check in your controller before adding the record. If the record is already associated, do nothing; if it isn't, add it:

u = current_user
a = @article
u.articles << a unless u.articles.exists?(a.id)

More: "4.4.1.14 collection.exists?(...)" http://edgeguides.rubyonrails.org/association_basics.html#scopes-for-has-and-belongs-to-many

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Sam Saffron | View Question on Stackoverflow
Solution 1 - Ruby on-Rails | Jeremy Lynch | View Answer on Stackoverflow
Solution 2 - Ruby on-Rails | spyle | View Answer on Stackoverflow
Solution 3 - Ruby on-Rails | Simone Carletti | View Answer on Stackoverflow
Solution 4 - Ruby on-Rails | cyrilchampier | View Answer on Stackoverflow
Solution 5 - Ruby on-Rails | Joshua Cheek | View Answer on Stackoverflow
Solution 6 - Ruby on-Rails | Jeff Whitmire | View Answer on Stackoverflow
Solution 7 - Ruby on-Rails | Sam Saffron | View Answer on Stackoverflow
Solution 8 - Ruby on-Rails | Jose Fuentes Delgado | View Answer on Stackoverflow
Solution 9 - Ruby on-Rails | Javeed | View Answer on Stackoverflow
Solution 10 - Ruby on-Rails | dav1dhunt | View Answer on Stackoverflow
Solution 11 - Ruby on-Rails | ajbraus | View Answer on Stackoverflow
Solution 12 - Ruby on-Rails | Matthew Bennett | View Answer on Stackoverflow