Convert a Nokogiri document to a Ruby Hash


Xml Problem Overview


Is there an easy way to convert a Nokogiri XML document to a Hash?

Something like Rails' Hash.from_xml.

Xml Solutions


Solution 1 - Xml

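If you want to convert a Nokogiri XML document to a hash, serialize it back to a string and pass it to ActiveSupport's Hash.from_xml. This needs the activesupport gem, but not the rest of Rails:

require 'active_support'
require 'active_support/core_ext/hash/conversions'
hash = Hash.from_xml(nokogiri_document.to_s)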

Solution 2 - Xml

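Here's a shorter approach that builds a detailed Hash, including namespace information for both elements and attributes:

require 'nokogiri'

class Nokogiri::XML::Node
  # Map Nokogiri's numeric node_type values to readable names.
  TYPENAMES = { 1 => 'element', 2 => 'attribute', 3 => 'text', 4 => 'cdata', 8 => 'comment' }.freeze

  def to_hash
    { kind: TYPENAMES[node_type], name: name }.tap do |h|
      # Namespace URI and prefix, only when the node has a namespace.
      h.merge! nshref: namespace.href, nsprefix: namespace.prefix if namespace
      # All descendant text, concatenated.
      h.merge! text: text
      # Elements also carry their attributes and child nodes, recursively.
      h.merge! attr: attribute_nodes.map(&:to_hash) if element?
      h.merge! kids: children.map(&:to_hash) if element?
    end
  end
end

class Nokogiri::XML::Document
  # A document converts via its root element.
  def to_hash; root.to_hash; end
end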

Seen in action:

xml = '<r a="b" xmlns:z="foo"><z:a>Hello <b z:m="n" x="y">World</b>!</z:a></r>'
doc = Nokogiri::XML(xml)
p doc.to_hash
#=> {
#=>   :kind=>"element",
#=>   :name=>"r",
#=>   :text=>"Hello World!",
#=>   :attr=>[
#=>     {
#=>       :kind=>"attribute",
#=>       :name=>"a", 
#=>       :text=>"b"
#=>     }
#=>   ], 
#=>   :kids=>[
#=>     {
#=>       :kind=>"element", 
#=>       :name=>"a", 
#=>       :nshref=>"foo", 
#=>       :nsprefix=>"z", 
#=>       :text=>"Hello World!", 
#=>       :attr=>[], 
#=>       :kids=>[
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"Hello "
#=>         },
#=>         {
#=>           :kind=>"element", 
#=>           :name=>"b", 
#=>           :text=>"World", 
#=>           :attr=>[
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"m", 
#=>               :nshref=>"foo", 
#=>               :nsprefix=>"z", 
#=>               :text=>"n"
#=>             },
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"x", 
#=>               :text=>"y"
#=>             }
#=>           ], 
#=>           :kids=>[
#=>             {
#=>               :kind=>"text", 
#=>               :name=>"text", 
#=>               :text=>"World"
#=>             }
#=>           ]
#=>         },
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"!"
#=>         }
#=>       ]
#=>     }
#=>   ]
#=> }
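Because every node in this tree has the same shape (:kind, :name, :text, plus :attr and :kids on elements), one small recursive function can query any converted document. The sketch below is illustrative only: find_elements is a hypothetical helper, not part of Nokogiri, and the tree is a hand-built excerpt of the output above so the example runs without Nokogiri installed.

```ruby
# Hypothetical helper (not part of Nokogiri): depth-first search of a tree
# produced by the to_hash patch above, returning every element node whose
# :name matches. Attribute and text nodes are never returned.
def find_elements(tree, name, found = [])
  found << tree if tree[:kind] == "element" && tree[:name] == name
  # Text and attribute nodes have no :kids key, so default to an empty list.
  tree.fetch(:kids, []).each { |kid| find_elements(kid, name, found) }
  found
end

# Hand-built excerpt of the output above (the z:m attribute is omitted).
tree = {
  kind: "element", name: "r", text: "Hello World!",
  attr: [{ kind: "attribute", name: "a", text: "b" }],
  kids: [
    { kind: "element", name: "a", nshref: "foo", nsprefix: "z",
      text: "Hello World!", attr: [],
      kids: [
        { kind: "text", name: "text", text: "Hello " },
        { kind: "element", name: "b", text: "World",
          attr: [{ kind: "attribute", name: "x", text: "y" }],
          kids: [{ kind: "text", name: "text", text: "World" }] },
        { kind: "text", name: "text", text: "!" }
      ] }
  ]
}

b = find_elements(tree, "b").first
puts b[:text] # prints World

# Collapse the attribute list into a simple name => value hash.
attrs = b[:attr].map { |a| [a[:name], a[:text]] }.to_h
puts attrs["x"] # prints y
```

Note that a lookup by :name alone ignores namespaces; matching on :nshref as well would distinguish z:a from a plain a.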

Solution 3 - Xml

I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself; note that it does not use libxml-ruby, though both are built on the same C library, libxml2. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree), which maps XML elements to Ruby objects; it is built atop libxml.

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0

class Hash
  class << self
    # Parse an XML string and return { root_name => converted_root }.
    def from_libxml(xml, strict = true)
      XML.default_load_external_dtd = false
      XML.default_pedantic_parser = strict
      result = XML::Parser.string(xml).parse
      { result.root.name.to_s => xml_node_to_hash(result.root) }
    rescue StandardError
      # Raise your custom exception here; re-raising avoids silently
      # returning nil on malformed XML.
      raise
    end

    # Convert an element recursively: child elements become symbol keys,
    # repeated child names are collected into arrays, and an element whose
    # only child is a text node becomes that text.
    def xml_node_to_hash(node)
      # Non-element nodes (text, CDATA, ...) are returned as plain strings.
      return node.content.to_s unless node.element?
      # Empty elements map to nil.
      return nil unless node.children?

      result_hash = {}
      node.each_child do |child|
        result = xml_node_to_hash(child)

        if child.name == "text"
          # A lone text child means this element is a leaf: return its text.
          return result if !child.next? && !child.prev?
          # Text mixed with elements (e.g. indentation) is ignored.
        else
          key = child.name.to_sym
          if result_hash.key?(key)
            # Repeated tag name: collect all values into an array.
            result_hash[key] = [result_hash[key]] unless result_hash[key].is_a?(Array)
            result_hash[key] << result
          else
            result_hash[key] = result
          end
        end
      end
      result_hash
    end
  end
end
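If you would rather avoid a native dependency, the same array-collapsing algorithm can be sketched with REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). This is a port under that assumption, not the original author's code, and hash_from_rexml and rexml_node_to_hash are hypothetical names. It follows the same rules: leaf elements become their text, empty elements become nil, repeated tags become arrays, and text mixed in with child elements is ignored.

```ruby
require "rexml/document"

# Sketch: Solution 3's conversion rules ported from libxml-ruby to REXML.
def rexml_node_to_hash(element)
  child_elements = element.elements.to_a
  # Leaf element: return its text (nil when the element is empty).
  return element.text if child_elements.empty?

  child_elements.each_with_object({}) do |child, hash|
    key = child.name.to_sym
    value = rexml_node_to_hash(child)
    if hash.key?(key)
      # Repeated tag name: collect values into an array.
      hash[key] = [hash[key]] unless hash[key].is_a?(Array)
      hash[key] << value
    else
      hash[key] = value
    end
  end
end

# Like Hash.from_libxml above: a string key for the root, symbols below it.
def hash_from_rexml(xml)
  root = REXML::Document.new(xml).root
  { root.name => rexml_node_to_hash(root) }
end

p hash_from_rexml("<order><id>7</id><item>pen</item><item>ink</item></order>")
```

REXML is pure Ruby and noticeably slower than libxml2-based parsers on large documents, so this is mainly useful when installing a native extension is not an option.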

Solution 4 - Xml

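Parse the XML response with Nokogiri, then convert it with ActiveSupport's Hash.from_xml. Keep in mind that Hash.from_xml re-parses the serialized string with the ActiveSupport::XmlMini backend (REXML unless configured otherwise), so its speed depends on that backend setting.

require 'nokogiri'
require 'active_support'
require 'active_support/core_ext/hash/conversions'

doc = Nokogiri::XML(response_body)
hash = Hash.from_xml(doc.to_s)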

Solution 5 - Xml

I found this while trying to simply convert XML to a Hash (not in Rails). I was thinking I would use Nokogiri, but ended up going with Nori.

Then my code was trivial:

response_hash = Nori.parse(response)

Other users have pointed out that this no longer works: since Nori 2.0, parse is an instance method rather than a class method. My code above worked with older versions; the current equivalent is:

response_hash = Nori.new.parse(response)

Solution 6 - Xml

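Set ActiveSupport's XML backend to Nokogiri, for example in your configuration:

ActiveSupport::XmlMini.backend = 'Nokogiri'

This mixes a conversions module into Nokogiri::XML::Document and Nokogiri::XML::Node, which gives both a to_hash method. It also makes Hash.from_xml parse with Nokogiri.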

Solution 7 - Xml

Have a look at the simple mix-in I made for the Nokogiri XML Node.

http://github.com/kuroir/Nokogiri-to-Hash

Here's a usage example:

require 'nokogiri'
require 'nokogiri_to_hash'
html = '
  <div id="hello" class="container">
    <p>Hello! visit my site <a href="http://kuroir.com">Kuroir.com</a></p>
  </div>
'
p Nokogiri.HTML(html).to_hash
#=> [{:div=>{:class=>["container"], :children=>[{:p=>{:children=>[{:a=>{:href=>["http://kuroir.com"], :children=>[]}}]}}], :id=>["hello"]}}]

Solution 8 - Xml

If you've selected a single element in Nokogiri, you can turn its attributes into a hash by zipping the attribute names (keys) with their values, like so:

  require 'nokogiri'
  @doc ||= Nokogiri::XML(File.read("myxmldoc.xml"))
  @node = @doc.at('#uniqueID') # at returns only the first match (or nil), so use a unique selector
  node_hash = Hash[*@node.keys.zip(@node.values).flatten]

See http://www.ruby-forum.com/topic/125944 for more info on Ruby array merging.
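The Hash[*pairs.flatten] idiom works for attributes because their values are plain strings. A pure-Ruby sketch comparing it with Array#to_h (Ruby 2.1+) follows; the two arrays are made-up stand-ins for node.keys and node.values, so no Nokogiri is needed to run it.

```ruby
# Made-up stand-ins for node.keys and node.values (attribute names and values).
keys   = ["id", "class", "href"]
values = ["uniqueID", "link", "/home"]

# Splat idiom: flatten the [key, value] pairs into one alternating list.
splatted = Hash[*keys.zip(values).flatten]

# Equivalent and simpler: Array#to_h on the [key, value] pairs.
pairs = keys.zip(values).to_h
puts splatted == pairs # prints true

# flatten is recursive, so the splat idiom breaks when a value is an Array:
# the pair is split apart and Hash[] receives an odd number of arguments.
begin
  Hash[*["tags"].zip([%w[a b]]).flatten]
rescue ArgumentError => e
  puts "splat idiom failed: #{e.message}"
end

# to_h keeps the Array value intact.
safe = ["tags"].zip([%w[a b]]).to_h
puts safe["tags"].inspect # prints ["a", "b"]
```

For Nokogiri attributes either form gives the same result; to_h simply avoids the flatten pitfall if you reuse the pattern on other data.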

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ivan | View Question on Stackoverflow
Solution 1 - Xml | Guillaume Roderick | View Answer on Stackoverflow
Solution 2 - Xml | Phrogz | View Answer on Stackoverflow
Solution 3 - Xml | A.Ali | View Answer on Stackoverflow
Solution 4 - Xml | PythonDev | View Answer on Stackoverflow
Solution 5 - Xml | John Hinnegan | View Answer on Stackoverflow
Solution 6 - Xml | Pierre Schambacher | View Answer on Stackoverflow
Solution 7 - Xml | MarioRicalde | View Answer on Stackoverflow
Solution 8 - Xml | juanfe | View Answer on Stackoverflow