How to train the Stanford Parser with Genia Corpus?

Java, Nlp, Stanford Nlp

Java Problem Overview


I am having trouble creating a new model for the Stanford Parser.

I have downloaded the latest version from Stanford: http://nlp.stanford.edu/software/lex-parser.shtml

The Genia Corpus is available in two formats, XML and PTB (Penn Treebank).

The Stanford Parser can be trained on PTB files, so I downloaded the Genia Corpus in that format, because I want to work with biomedical text:

http://categorizer.tmit.bme.hu/~illes/genia_ptb/ (link no longer available) (genia_ptb.tar.gz)

Then I wrote a short Main class to get the dependency representation of a biomedical sentence:

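    // Directory containing the Genia treebank in PTB format
    String treebankPath = "/stanford-parser-2012-05-22/genia_ptb/GENIA_treebank_v1/ptb";

    // op is assumed to be an Options instance (edu.stanford.nlp.parser.lexparser.Options) created earlier
    Treebank tr = op.tlpParams.diskTreebank();
    tr.loadPath(treebankPath);
    // Train a new parser model from the loaded trees; this is the line that fails
    LexicalizedParser lpc = LexicalizedParser.trainFromTreebank(tr, op);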

I have tried several different approaches, but I always get the same result.

The error occurs on the last line. This is my output:

Currently Fri Jun 01 15:02:57 CEST 2012
Options parameters:
useUnknownWordSignatures 2
smoothInUnknownsThreshold 100
smartMutation false
useUnicodeType false
unknownSuffixSize 1
unknownPrefixSize 1
flexiTag true
useSignatureForKnownSmoothing false
parserParams edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams
forceCNF false
doPCFG true
doDep false
freeDependencies false
directional true
genStop true
distance true
coarseDistance false
dcTags false
nPrune false
Train parameters: smooth=false PA=true GPA=false selSplit=true (400.0; deleting [VP^SQ, VP^VP, VP^SINV, VP^NP]) mUnary=1 mUnaryTags=false sPPT=false tagPA=true tagSelSplit=false (0.0) rightRec=true leftRec=false collinsPunc=false markov=true mOrd=2 hSelSplit=true (10) compactGrammar=3 postPA=false postGPA=false selPSplit=false (0.0) tagSelPSplit=false (0.0) postSplitWithBase=false fractionBeforeUnseenCounting=0.5 openClassTypesThreshold=50 preTransformer=null taggedFiles=null
Using EnglishTreebankParserParams splitIN=4 sPercent=true sNNP=0 sQuotes=false sSFP=false rbGPA=false j#=false jJJ=false jNounTags=false sPPJJ=false sTRJJ=false sJJCOMP=false sMoreLess=false unaryDT=true unaryRB=true unaryPRP=false reflPRP=false unaryIN=false sCC=1 sNT=false sRB=false sAux=2 vpSubCat=false mDTV=2 sVP=3 sVPNPAgr=false sSTag=0 mVP=false sNP%=0 sNPPRP=false dominatesV=1 dominatesI=false dominatesC=false mCC=0 sSGapped=4 numNP=false sPoss=1 baseNP=1 sNPNNP=0 sTMP=1 sNPADV=1 cTags=true rightPhrasal=false gpaRootVP=false splitSbar=0 mPPTOiIN=0
Binarizing trees...done. Time elapsed: 141 ms
Extracting PCFG...done. Time elapsed: 56 ms
Compiling grammar...done Time elapsed: 1 ms
Extracting Lexicon...Exception in thread "main" edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer
	at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:39)
	at edu.stanford.nlp.parser.lexparser.BaseLexicon.initializeTraining(BaseLexicon.java:335)
	at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromTreebank(LexicalizedParser.java:800)
	at edu.stanford.nlp.parser.lexparser.LexicalizedParser.trainFromTreebank(LexicalizedParser.java:226)
	at edu.stanford.nlp.parser.lexparser.LexicalizedParser.trainFromTreebank(LexicalizedParser.java:237)
	at ABravoDemo.main(ABravoDemo.java:35)
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer
	at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:353)
	at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:370)
	at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:37)
	... 5 more
Caused by: java.lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:303)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:316)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:169)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.construct(MetaClass.java:119)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:192)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:53)
	at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:349)
	... 7 more

How can I create a new model with this corpus?

Java Solutions


Solution 1 - Java

As andrucz stated in his comment, the real cause of your problem seems to stem from a missing class.

Check that you imported the library correctly, and make sure that it contains the class EnglishUnknownWordModelTrainer in edu.stanford.nlp.parser.lexparser.

(If you're using Maven, verify that you added the dependency correctly. A quick Google search brought this up: Stanford Parser Maven Repo.)
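The check described above can be done from code. The following is a minimal sketch, assuming you run it with the same classpath as your training program; the class name `ClasspathCheck` and the helper `isOnClasspath` are hypothetical names used only for illustration. It asks the class loader for the class named in the stack trace and reports whether it can be found.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class the stack trace reports as missing
        String name = "edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found on classpath"));
    }
}
```

If this reports that the class is not found, the jar on your classpath is probably incomplete or is not the one you expected.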

Solution 2 - Java

Did the NLP library install correctly? Check the logs to verify that there are no errors. Most of the time, this issue occurs when the Stanford NLP library has not been installed correctly.

A quick way to check is to run the GUI and try out the parser. If it runs successfully, the library is installed correctly; if it throws errors, you know the installation is faulty.

The Stanford website also mentions this:

If you're new to parsing, you can start by running the GUI to try out the parser. Scripts are included for linux (lexparser-gui.sh) and Windows (lexparser-gui.bat). Take a look at the Javadoc lexparser package documentation and LexicalizedParser class documentation. (Point your web browser at the index.html file in the included javadoc directory and navigate to those items.) Look at the parser FAQ for answers to common questions. If none of that helps, please see our email guidelines for instructions on how to reach us for further assistance.

Solution 3 - Java

Check whether you have imported the library correctly and make sure that it contains the class EnglishUnknownWordModelTrainer. Also make sure that the version you downloaded works properly with the Genia Corpus.
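To check a specific jar file rather than the running classpath, you can look for the class entry inside the jar. This is a minimal sketch using only the JDK; the class name `JarClassCheck` and the helper `jarContainsClass` are hypothetical, and the jar path is passed as an argument because the exact file name depends on your download.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarClassCheck {

    /** Returns true if the jar at jarPath has an entry for the fully qualified class name. */
    static boolean jarContainsClass(String jarPath, String className) throws IOException {
        // a.b.C is stored in a jar as a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassCheck <path-to-parser-jar>");
            return;
        }
        String name = "edu.stanford.nlp.parser.lexparser.EnglishUnknownWordModelTrainer";
        System.out.println(jarContainsClass(args[0], name)
                ? "The jar contains " + name
                : "The jar does NOT contain " + name);
    }
}
```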

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | nathan | View Question on Stackoverflow |
| Solution 1 - Java | Maximilian Schirm | View Answer on Stackoverflow |
| Solution 2 - Java | Binny Peza | View Answer on Stackoverflow |
| Solution 3 - Java | Divya Mishra | View Answer on Stackoverflow |