pylucene

Finding a single field's terms with Lucene (PyLucene)

I'm fairly new to Lucene's term vectors and want to make sure my term gathering is as efficient as possible. I'm getting the unique terms and then retrieving the docFreq() of each term to perform faceting. I'm gathering all documents' terms from the index using:

    lindex = SimpleFSDirectory(File(indexdir))
    ireader = IndexReader.open(lindex, True)
    terms = ireader.terms()  # Returns TermEnum

This works fine, but is there a way to return terms only for specific fields (across all documents)? Wouldn't that be more efficient? Something like:

    ireader.terms(Field="country")
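As background to the question, here is a minimal pure-Python sketch of why restricting enumeration to one field can be cheaper than walking every term. It does not call PyLucene. `DOC_FREQ`, `KEYS`, and `field_terms` are hypothetical names, and the data is made up for illustration. The sketch models a term dictionary sorted by field and then by term text (the order Lucene 3.x uses), so a single field's terms form one contiguous run that can be reached with a seek. In Lucene 3.x, `IndexReader.terms(Term)` positions the enumeration at the first term at or after the given one, so a call such as `ireader.terms(Term("country", ""))` is expected to behave analogously; treat that correspondence as an assumption to verify against your Lucene version.

```python
from bisect import bisect_left

# Hypothetical toy term dictionary: (field, text) -> document frequency.
DOC_FREQ = {
    ("city", "berlin"): 1,
    ("country", "de"): 2,
    ("country", "fr"): 1,
    ("title", "hello"): 3,
}

# Keys sorted by field first, then by term text.
KEYS = sorted(DOC_FREQ)


def field_terms(field):
    """Yield (text, doc_freq) for one field by seeking to its first term."""
    # Binary search for the first key at or after (field, "").
    start = bisect_left(KEYS, (field, ""))
    for i in range(start, len(KEYS)):
        key_field, text = KEYS[i]
        if key_field != field:
            # Terms are grouped by field, so the run for this field is over.
            break
        yield text, DOC_FREQ[(key_field, text)]


facets = dict(field_terms("country"))
print(facets)
```

The loop stops as soon as the field changes, so the cost depends on the number of terms in the requested field rather than on the size of the whole dictionary.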

2021-06-10 23:59:35    Category: Q&A    python   lucene   facet   pylucene

Writing a custom analyzer in pylucene/inheritance using jcc?

I want to write a custom analyzer in pylucene. Usually in Java Lucene, when you write an analyzer class, your class inherits from Lucene's Analyzer class. But pylucene uses JCC, the Java to C++/Python compiler. So how do you let a Python class inherit from a Java class using JCC, and especially, how do you write a custom pylucene analyzer? Thanks.

2021-06-10 02:55:48    Category: Q&A    python   pylucene   jcc

DelimitedPayloadFilter in PyLucene?

Question: I am trying to implement a Python version of the Java code from http://searchhub.org/2010/04/18/refresh-getting-started-with-payloads/ using pylucene. My analyzer raises a lucene.InvalidArgsError on the init call to the DelimitedTokenFilter. The class is below, and any help is greatly appreciated. The Java version compiled with the JAR files from the pylucene 3.6 build works fine.

    import lucene

    class PayloadAnalyzer(lucene.PythonAnalyzer):
        encoder = None

        def __init__(self, encoder):
            lucene.PythonAnalyzer.__init__(self)
            self.encoder = encoder

        def tokenStream(self, fieldName, reader):
            result = lucene.WhitespaceTokenizer(lucene.Version.LUCENE_CURRENT, reader)
            result = lucene.LowerCaseFilter(lucene.Version.LUCENE_CURRENT, result)

2021-06-04 23:01:08    Category: Tech sharing    analyzer   payload   pylucene

Building PyLucene on Ubuntu 14.04 (Trusty Tahr)

As per the installation instructions, JCC built successfully. The dependencies installed were: ant, openjdk-7-jdk, python-setuptools, python-dev. Then, proceeding to make pylucene, in the Makefile I chose the specs corresponding to Ubuntu 11:

    # Linux (Ubuntu 11.10 64-bit, Python 2.7.2, OpenJDK 1.7, setuptools 0.6.16)
    # Be sure to also set JDK['linux2'] in jcc's setup.py to the JAVA_HOME value
    # used below for ANT (and rebuild jcc after changing it).
    PREFIX_PYTHON=/usr
    ANT=JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 /usr/bin/ant
    PYTHON=$(PREFIX_PYTHON)/bin/python
    JCC=$(PYTHON) -m jcc --shared
    NUM

2021-06-03 06:18:01    Category: Q&A    python   ubuntu   lucene   ubuntu-14.04   pylucene

DelimitedPayloadFilter in PyLucene?

I am trying to implement a Python version of the Java code from http://searchhub.org/2010/04/18/refresh-getting-started-with-payloads/ using pylucene. My analyzer raises a lucene.InvalidArgsError on the init call to the DelimitedTokenFilter. The class is below, and any help is greatly appreciated. The Java version compiled with the JAR files from the pylucene 3.6 build works fine.

    import lucene

    class PayloadAnalyzer(lucene.PythonAnalyzer):
        encoder = None

        def __init__(self, encoder):
            lucene.PythonAnalyzer.__init__(self)
            self.encoder = encoder

        def tokenStream(self, fieldName, reader):
            result =

2021-05-11 22:55:05    Category: Q&A    analyzer   payload   pylucene