Natural Language Processing in Ruby

Ruby, Artificial Intelligence, NLP

Ruby Problem Overview


I'm looking to do some sentence analysis (mostly for twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby?

Similar to https://stackoverflow.com/questions/870460/java-is-there-a-good-natural-language-processing-library but for Ruby. I'd prefer something very general, but any leads are appreciated!

Ruby Solutions


Solution 1 - Ruby

Three excellent and mature NLP packages are Stanford CoreNLP, OpenNLP and LingPipe. There are Ruby bindings to the Stanford CoreNLP tools (GPL license) as well as the OpenNLP tools (Apache License).

On the more experimental side of things, I maintain a Text Retrieval, Extraction and Annotation Toolkit (Treat), released under the GPL, that provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's features can also serve as a good reference in terms of stable natural language processing gems compatible with Ruby 1.9.

  • Text segmenters and tokenizers (punkt-segmenter, tactful_tokenizer, srx-english, scalpel)
  • Natural language parsers for English, French and German and named entity extraction for English (stanford-core-nlp).
  • Word inflection and conjugation (linguistics), stemming (ruby-stemmer, uea-stemmer, lingua, etc.)
  • WordNet interface (rwordnet), POS taggers (rbtagger, engtagger, etc.)
  • Language (whatlanguage), date/time (chronic, kronic, nickel), keyword (lda-ruby) extraction.
  • Text retrieval with indexation and full-text search (ferret).
  • Named entity extraction (stanford-core-nlp).
  • Basic machine learning with decision trees (decisiontree), MLPs (ruby-fann), SVMs (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
  • Text similarity metrics (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).

Not included in Treat, but relevant to NLP: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf, .pdf), graph-rank (an implementation of GraphRank).
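To make the "string distance" and "text similarity" entries above concrete, here is a minimal pure-Ruby sketch of Levenshtein (edit) distance. It is not the API of levenshtein-ffi, hotwater or any other gem listed here; the method names `levenshtein` and `similarity` are made up for illustration, and the gems will be considerably faster on long strings.

```ruby
# Pure-Ruby Levenshtein (edit) distance, keeping only two rows of the
# dynamic-programming table in memory at a time.
# Illustrative only: this is NOT the API of any gem mentioned above.
def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  return b_chars.length if a_chars.empty?
  return a_chars.length if b_chars.empty?

  # previous[j] = distance between the empty prefix of a and b[0, j]
  previous = (0..b_chars.length).to_a

  a_chars.each_with_index do |a_char, i|
    current = [i + 1]
    b_chars.each_with_index do |b_char, j|
      cost = a_char == b_char ? 0 : 1
      current << [
        previous[j + 1] + 1, # delete a_char
        current[j] + 1,      # insert b_char
        previous[j] + cost   # substitute (or keep, when cost is 0)
      ].min
    end
    previous = current
  end

  previous.last
end

# Hypothetical helper: scale the distance to a 0.0..1.0 similarity score.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

puts levenshtein("kitten", "sitting") # prints 3
puts similarity("ruby", "ruby")       # prints 1.0
```

For short texts such as tweets this is usually fast enough; for bulk comparisons one of the native-extension gems above is probably the better choice.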

Solution 2 - Ruby

There are some things at Ruby Linguistics and some links therefrom, though it doesn't seem anywhere close to what NLTK is for Python, yet.

Solution 3 - Ruby

You can always use JRuby and use the Java libraries.

EDIT: The ability to run Ruby natively on the JVM and easily leverage Java libraries is a big plus for Rubyists. This is a good option that should be considered in a situation like this.

https://stackoverflow.com/questions/895893/which-nlp-toolkit-to-use-in-java
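As a rough sketch of what calling Java from JRuby looks like, the script below splits text into sentences with `java.text.BreakIterator`, which ships with the JDK, so no extra jars are needed. A full NLP toolkit would be loaded the same way once its jar is on the classpath. The method name `split_sentences` and the regex fallback for non-JRuby interpreters are my own additions, included only so the script runs anywhere.

```ruby
# Sentence splitting via the JDK when running under JRuby.
# The fallback branch is a naive regex so the script also runs under
# other Ruby implementations; it is an assumption, not part of the answer.
if RUBY_ENGINE == 'jruby'
  require 'java'
  java_import 'java.text.BreakIterator'
  java_import 'java.util.Locale'

  def split_sentences(text)
    iterator = BreakIterator.getSentenceInstance(Locale::US)
    iterator.setText(text)

    sentences = []
    start = iterator.first
    # BreakIterator#next returns DONE once the end of the text is reached.
    # Offsets are Java (UTF-16) indices, which match Ruby character
    # indices for plain ASCII text like the sample below.
    while (stop = iterator.next) != BreakIterator::DONE
      sentences << text[start...stop].strip
      start = stop
    end
    sentences
  end
else
  def split_sentences(text)
    # Split after ., ! or ? when followed by whitespace.
    text.split(/(?<=[.!?])\s+/).map(&:strip)
  end
end

puts split_sentences("JRuby runs on the JVM. It can call Java classes directly!")
```

Run it with `jruby script.rb` to exercise the Java branch; under plain `ruby` only the fallback is used.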

Solution 4 - Ruby

I found an excellent article detailing some NLP algorithms in Ruby here. It covers stemmers, date/time parsers and grammar parsers.

Solution 5 - Ruby

TREAT – the Text REtrieval and Annotation Toolkit – is the most comprehensive toolkit I know of for Ruby: https://github.com/louismullie/treat/wiki/

Solution 6 - Ruby

Also consider using SaaS APIs like MonkeyLearn. You can easily train text classifiers with machine learning and integrate via an API. There's a Ruby SDK available.

Besides creating your own classifiers, you can pick pre-created modules for sentiment analysis, topic classification, language detection and more. We also have extractors like keyword extraction and entities, and we'll keep adding more public modules.

Other nice features:

  • You have a GUI to create/test algorithms.
  • Algorithms run really fast in our cloud computing platform.
  • You can integrate with Ruby or any other programming language.

Solution 7 - Ruby

I maintain a list of Ruby Natural Language Processing resources (libraries, APIs, and presentations) on GitHub that covers the libraries listed in the other answers here as well as some additional libraries.

Solution 8 - Ruby

Try this one

https://github.com/louismullie/stanford-core-nlp

About stanford-core-nlp gem

This gem provides high-level Ruby bindings to the Stanford Core NLP package, a set of natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English.

Project page: http://nlp.stanford.edu/software/corenlp.shtml
Demo page: http://nlp.stanford.edu:8080/corenlp/

Solution 9 - Ruby

You need to be much more specific about what these "general characteristics" are.

In NLP, "general characteristics" of a sentence can mean a million different things: sentiment analysis (i.e., the attitude of the speaker), basic part-of-speech tagging, use of personal pronouns, whether the sentence contains active or passive verbs, what the tense and voice of the verbs are...

I don't mind if you're vague about describing it, but if we don't know what you're asking it's highly unlikely we can be specific in helping you.

My general suggestion, especially for NLP, is you should get the tool best designed for the job instead of limiting yourself to a specific language. Limiting yourself to a specific language is fine for some tasks where the general tools are implemented everywhere, but NLP is not one of those.

The other issue in working with Twitter is that a great deal of the sentences there will be half-baked or compressed in strange and wonderful ways, which most NLP tools aren't trained for. To help there, the NUS SMS Corpus consists of "about 10,000 SMS messages collected by students". Due to the similar restrictions and usage, analysing that may be helpful in your explorations with Twitter.

If you're more specific I'll try and list some tools that will help.
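To show one possible reading of "general characteristics", here is a small pure-Ruby sketch that pulls a few surface features out of a tweet: word count, hashtags, mentions, personal pronoun usage, and whether it looks like a question. The feature set, the pronoun list and the `tweet_features` name are all assumptions chosen for illustration; anything deeper (sentiment, tense, voice) needs a real tagger or parser.

```ruby
# Hypothetical surface-feature extractor for short, informal text.
# The pronoun list is deliberately small and English-only.
PERSONAL_PRONOUNS = %w[
  i me my mine we us our ours you your yours
  he him his she her hers they them their theirs it its
].freeze

def tweet_features(text)
  # Tokens are runs of word characters or apostrophes, optionally
  # prefixed with @ (mention) or # (hashtag). URLs are not handled.
  tokens = text.scan(/[@#]?[\w']+/)
  words = tokens.reject { |t| t.start_with?('@', '#') }.map(&:downcase)

  {
    word_count: words.length,
    hashtags: tokens.select { |t| t.start_with?('#') },
    mentions: tokens.select { |t| t.start_with?('@') },
    personal_pronouns: words.count { |w| PERSONAL_PRONOUNS.include?(w) },
    question: text.include?('?'),
    exclamation: text.include?('!')
  }
end

p tweet_features("@joey I love #ruby, do you?")
```

Counting features like these is crude, but it tolerates the compressed spelling found on Twitter better than tools trained on edited prose.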

Solution 10 - Ruby

I would check out Mark Watson's free book Practical Semantic Web and Linked Data Applications, Java, Scala, Clojure, and JRuby Edition. He has chapters on NLP using Java, Clojure, Ruby, and Scala. He also provides links to the resources you need.

Solution 11 - Ruby

For people looking for something more lightweight and simple to implement, this option worked well for me.

https://github.com/yohasebe/engtagger
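A minimal sketch of how engtagger is typically used, based on the calls shown in its README (`EngTagger.new`, `add_tags`, `get_readable`, `get_nouns`). It assumes the gem is installed (`gem install engtagger`); the `tag_tweet` wrapper and the `LoadError` guard are my own additions so that the script degrades gracefully when the gem is missing.

```ruby
# Part-of-speech tagging with the engtagger gem.
# tag_tweet is a hypothetical wrapper, not part of the gem.
begin
  require 'engtagger'
  ENGTAGGER_AVAILABLE = true
rescue LoadError
  ENGTAGGER_AVAILABLE = false
end

def tag_tweet(text)
  return nil unless ENGTAGGER_AVAILABLE

  tagger = EngTagger.new
  tagged = tagger.add_tags(text) # text with inline POS tags
  {
    readable: tagger.get_readable(text), # word/TAG pairs as one string
    nouns: tagger.get_nouns(tagged)      # hash of noun => occurrence count
  }
end

result = tag_tweet("Alice chased the big fat cat.")
if result
  puts result[:readable]
  p result[:nouns]
else
  puts "engtagger is not installed; run `gem install engtagger` first"
end
```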

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Joey Robert | View Question on Stack Overflow
Solution 1 - Ruby | user2398029 | View Answer on Stack Overflow
Solution 2 - Ruby | Alex Martelli | View Answer on Stack Overflow
Solution 3 - Ruby | jshen | View Answer on Stack Overflow
Solution 4 - Ruby | Joey Robert | View Answer on Stack Overflow
Solution 5 - Ruby | zanbri | View Answer on Stack Overflow
Solution 6 - Ruby | Raul Garreta | View Answer on Stack Overflow
Solution 7 - Ruby | diasks2 | View Answer on Stack Overflow
Solution 8 - Ruby | Lohith MV | View Answer on Stack Overflow
Solution 9 - Ruby | Smerity | View Answer on Stack Overflow
Solution 10 - Ruby | Adam D | View Answer on Stack Overflow
Solution 11 - Ruby | JohnSalzarulo | View Answer on Stack Overflow