hypwsd: October 2006

Documentation for NaiveBayes Model

Usage:

Feature Extraction:
perl featureExtract.pl --mode=[test2/test3/train] --pos --col

Numerate Training File:
perl numericTranslate.pl --file=[inputfile] --mode=[test/train]

Cross-Training File Generation:
perl ctFileGenerator.pl

Training:
perl naiveBayesTrain.pl --file=[inputfile] --option=[option]

Testing:
perl naiveBayesTesting.pl --file=[inputfile] --mode=[ct/t] --option=[option]

How to use STModified

This post records how to use all the scripts in STModified.

Extract Features
--featureExtract.pl --mode=[train/test2/test3] --pos --col
Numerate Feature File
--numericTranslate.pl --file=[input] --mode=[train/test]
NaiveBayesTrain
--naiveBayesTrain.pl --file=[input] --option=[option]
SenseTopicTrain
--naiveBayesTrain.pl --file=[input] --option=[option]
--cSenseTopicTrain.c [input] [option] [numtopics]
SenseTopicModifiedTrain
--naiveBayesTrain.pl --file=[input] --option=[option]
--cSTMTrain.c [input] [option] [numtopics] [nbweight]
NaiveBayesTest
--naiveBayesTest.pl --file=[input] --mode=[ct/t] --option=[option]
SenseTopicTest
--senseTopicTest.pl --file=[input] --mode=[ct/t] --option=[option]
SenseTopicModifiedTest
--senseTopicTestM.pl --file=[input] --mode=[ct/t] --option=[option]
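The nbweight argument to cSTMTrain.c suggests that SenseTopicModified mixes NaiveBayes evidence into SenseTopic, but this post does not say how. The following is only a hypothetical sketch, assuming nbweight is a weight in [0, 1] that linearly interpolates the two models' per-sense log-scores when choosing a sense; combine_scores and its parameters are illustrative names, not taken from cSTMTrain.c.

```c
#include <stddef.h>

/* Hypothetical: pick the sense with the highest interpolated score
 *   nb_weight * nb_log[s] + (1 - nb_weight) * st_log[s],
 * where nb_log and st_log hold per-sense log-scores from NaiveBayes and
 * SenseTopic. On ties, the lower sense index wins. */
int combine_scores(const double *nb_log, const double *st_log,
                   size_t n_senses, double nb_weight) {
    size_t best = 0;
    double best_score = 0.0;
    for (size_t s = 0; s < n_senses; s++) {
        double score = nb_weight * nb_log[s] + (1.0 - nb_weight) * st_log[s];
        if (s == 0 || score > best_score) {
            best_score = score;
            best = s;
        }
    }
    return (int)best;
}
```

Under this assumption, nbweight = 1 reduces to NaiveBayes alone and nbweight = 0 to SenseTopic alone; intermediate values trade the two off, and the weight could be tuned in ct mode.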

Bug Found In EM Code

I found a bug in my rewritten EM Code.

When updating u, I used the wrong denominator. I should not use the total number of features there, because u depends on the position of the feature; I should use the number of distinct features at that position instead, so that the values of u at each position sum to 1.

Retesting with 20 topics on Senseval2 gives 62.2% (up from 61.9% with the buggy code).

I still need to turn the POS and word features on and off to see their effect.
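One reading of this fix, assuming u is a per-position distribution over feature values estimated from expected counts with add-one smoothing: the smoothing term in the denominator has to be the number of distinct features at that position, not the global feature count, or the values at a position cannot sum to 1. The function names below are hypothetical, not those in the rewritten EM code.

```c
#include <stddef.h>

/* Assumption: counts[f] is the expected count (from the E-step) of the f-th
 * distinct feature seen at one position p, and v_p is the number of distinct
 * features at p. Writes an add-one smoothed u for that position. */
void update_u_position(const double *counts, size_t v_p, double *u) {
    double total = 0.0;
    for (size_t f = 0; f < v_p; f++)
        total += counts[f];
    /* Denominator uses v_p, so sum_f u[f] = (total + v_p) / (total + v_p) = 1. */
    for (size_t f = 0; f < v_p; f++)
        u[f] = (counts[f] + 1.0) / (total + (double)v_p);
}

/* The buggy variant: smoothing with the global number of features n_all
 * (n_all > v_p) makes the values at position p sum to less than 1. */
void update_u_position_buggy(const double *counts, size_t v_p, size_t n_all,
                             double *u) {
    double total = 0.0;
    for (size_t f = 0; f < v_p; f++)
        total += counts[f];
    for (size_t f = 0; f < v_p; f++)
        u[f] = (counts[f] + 1.0) / (total + (double)n_all);
}
```

For example, with expected counts {2.0, 0.5, 1.5} at a position that has three distinct features, the corrected update sums to 1, while the buggy one with 10 features overall sums to only 0.5.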

HYP Status Update

Code:

1) Finished porting to tembusu
2) Combined SenseTopic with NaiveBayes

ToDo:

1) Analyse p(sense|topic) and p(feature|topic)
2) Build a model combining NaiveBayes and SenseTopic
3) Build a model using Sentence Topic
4) Build a WordNet Sense1 model as another baseline
5) Take a look at the DSO Corpus and the possibility of merging it with SemCor

Senseval2 Results (official scorer):
Most Frequent: 61.7%
NaiveBayes: 64.1%
SenseTopic: 61.9%

SenseTopic (lcontex and lpos both on) Cross-Training Results:
Most Frequent: 65.73%
2 topics: 67.18%
8 topics: 68.20%
10 topics: 68.26%
17 topics: 68.38%
19 topics: 68.39%
20 topics: 69.23%
21 topics: 68.41%
23 topics: 68.42%
25 topics: 69.21%
30 topics: 68.45%
40 topics: 69.20%
60 topics: 68.49%
80 topics: 68.50%

Post dates:
Monday, October 30, 2006: Documentation for NaiveBayes Model
Friday, October 20, 2006: How to use STModified
Thursday, October 19, 2006: Bug Found In EM Code
Wednesday, October 18, 2006: HYP Status Update