I have followed the documentation from the NLTK book (chapters 6 and 7) and other ideas to train my own model for named entity recognition. I built a feature function and a ClassifierBasedTagger like this:
    class NamedEntityChunker(ChunkParserI):
        def __init__(self, train_sents, feature_detector=features, **kwargs):
            assert isinstance(train_sents, Iterable)
            tagged_sents = [[((w, t), c) for (w, t, c) in tree2conlltags(sent)]
                            for sent in train_sents]
            # other possible option: self.feature_detector = features
            self.tagger = ClassifierBasedTagger(tagged_sents,
                                                feature_detector=feature_detector,
                                                **kwargs)

        def parse(self, tagged_sent):
            chunks = self.tagger.tag(tagged_sent)
            iob_triplets = [(w, t, c) for ((w, t), c) in chunks]
            # Transform the list of triplets to nltk.Tree format
            return conlltags2tree(iob_triplets)
I am having problems when calling the classifier tagger from another script, where I load my training and test data. For testing purposes I call the classifier with a portion of my training data:
    chunker = NamedEntityChunker(training_samples[:500])
No matter what I change in my classifier, I keep getting the error:
    self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)
    TypeError: __init__() got multiple values for argument 'feature_detector'
What am I doing wrong here? I assumed the feature function was working fine and that I don't have to pass anything else when calling NamedEntityChunker().
My second question: is there a way to save the trained model and reuse it later? How should I approach this? This is a follow-up to my last question on training data.
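For context, what I had in mind for saving the model was pickling the whole chunker object, along these lines. This is only a sketch and I am not sure it is the right approach; the dict here is just a stand-in for the trained NamedEntityChunker so the snippet runs on its own:

```python
import pickle

# Stand-in for the trained chunker; in my script this would be
# the NamedEntityChunker instance returned after training.
chunker = {"model": "trained NamedEntityChunker placeholder"}

# Save once after training...
with open("chunker.pickle", "wb") as f:
    pickle.dump(chunker, f)

# ...then reload in any later script without retraining.
with open("chunker.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored == chunker)  # the round-trip should give back an equal object
```

Would something like this work for an NLTK tagger, or is there a recommended way to persist trained models?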
Thanks for any advice.