hypwsd

Location: Singapore

Monday, October 30, 2006

Documentation for NaiveBayes Model

Usage:

Feature Extraction:
perl featureExtract.pl --mode=[test2/test3/train] --pos --col

Numerate Training File:
perl numericTranslate.pl --file=[inputfile] --mode=[test/train]

CrossTraining File Generation:
perl ctFileGenerator.pl

Training:
perl naiveBayesTrain.pl --file=[inputfile] --option=[option]

Testing:
perl naiveBayesTesting.pl --file=[inputfile] --mode=[ct/t] --option=[option]

Friday, October 20, 2006

How to use STModified

This post records how to use all the scripts in STModified.
  1. Extract Features
    --featureExtract.pl --mode=[train/test2/test3] --pos --col
  2. Numerate Feature File
    --numericTranslate.pl --file=[input] --mode=[train/test]
  3. NaiveBayesTrain
    --naiveBayesTrain.pl --file=[input] --option=[option]
  4. SenseTopicTrain
    --naiveBayesTrain.pl --file=[input] --option=[option]
    --cSenseTopicTrain.c [input] [option] [numtopics]
  5. SenseTopicModifiedTrain
    --naiveBayesTrain.pl --file=[input] --option=[option]
    --cSTMTrain.c [input] [option] [numtopics] [nbweight]
  6. NaiveBayesTest
    --naiveBayesTest.pl --file=[input] --mode=[ct/t] --option=[option]
  7. SenseTopicTest
    --senseTopicTest.pl --file=[input] --mode=[ct/t] --option=[option]
  8. SenseTopicModifiedTest
    --senseTopicTestM.pl --file=[input] --mode=[ct/t] --option=[option]

Thursday, October 19, 2006

Bug Found In EM Code

I found a bug in my rewritten EM code.

When I was updating u, the denominator was wrong. I shouldn't use the total number of features in it, because u depends on the position of the feature. I should use the number of distinct features at that position instead, so that the values of u at each position sum to 1.
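The fix can be illustrated with a minimal Python sketch. This is not the actual EM code: it assumes that u is a smoothed distribution over feature values for each (topic, position) pair, and the names `update_u`, `expected_counts` and `alpha` are hypothetical. The point it shows is only the denominator: the smoothing term is multiplied by the number of distinct features at that position, not by the number of features overall.

```python
from collections import defaultdict


def update_u(expected_counts, alpha=1.0):
    """Re-estimate u from expected counts (hypothetical M-step sketch).

    expected_counts maps (topic, position) -> {feature value: expected count}.
    alpha is an assumed additive smoothing constant.
    """
    # Collect the distinct feature values observed at each position.
    values_at = defaultdict(set)
    for (_topic, pos), counts in expected_counts.items():
        values_at[pos].update(counts)

    u = {}
    for (topic, pos), counts in expected_counts.items():
        vocab = values_at[pos]
        # Denominator uses the number of distinct values at THIS position,
        # so each distribution sums to 1.
        denom = sum(counts.values()) + alpha * len(vocab)
        u[(topic, pos)] = {
            v: (counts.get(v, 0.0) + alpha) / denom for v in vocab
        }
    return u
```

With the total number of features in the denominator instead of `len(vocab)`, each distribution would sum to less than 1 whenever a position has fewer distinct values than the whole feature set.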

Retested with topic number = 20 on Senseval2, the result is 62.2% (higher than the 61.9% from the buggy code).

I still need to turn pos/word on and off to see the effect.

Wednesday, October 18, 2006

HYP Status Update

Code:

1) Finished porting to tembusu
2) Combined SenseTopic with NaiveBayes

ToDo:

1) Analyse p(sense|topic), p(feature|topic)
2) Build a model combining NaiveBayes and SenseTopic
3) Build a model using Sentence Topic
4) Build a WordNet Sense1 model as another baseline
5) Take a look at the DSO Corpus and the possibility of merging it with SemCor

Senseval2 Result (official scorer):
Most Frequent: 61.7%
NaiveBayes: 64.1%
SenseTopic: 61.9%

SenseTopic (lcontex, lpos both on) Crosstraining Result:
Most Frequent: 65.73%
2 topics: 67.18%
8 topics: 68.20%
10 topics: 68.26%
17 topics: 68.38%
19 topics: 68.39%
20 topics: 69.23%
21 topics: 68.41%
23 topics: 68.42%
25 topics: 69.21%
30 topics: 68.45%
40 topics: 69.20%
60 topics: 68.49%
80 topics: 68.50%