update 2021.10: minor fixes. Tested OK with PyLucene 8.9
update 2019.10: add support for PyLucene 8.1.1
update 2019.03: add support for PyLucene 7.7.1+ and clarify some steps
update 2017.07: add support for Python 3
Prerequisites:
Install Python 2.7.x or Python 3.4+ or Anaconda (Python 2.7.x or Python 3.4+)
Install Python packages: numpy, scipy and gensim
Install JDK 1.8 (64 bit) and set environment variables
*The latest JDK 1.8.x is recommended
*JDK 10+ may be incompatible
*Anaconda is recommended since some people seem to have difficulties with raw Python
*If you use raw Python instead of Anaconda, you may need to add environment variables such as PYTHONHOME and PYTHONPATH so that python can be called from the command prompt. Anaconda users should use the Anaconda prompt.
- JAVA_HOME=C:\jdk1.8.0_06 (or other path)
- add %JAVA_HOME%\bin to PATH
- CLASSPATH=.;%JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar
- add CLASSPATH as an environment variable
- add %JAVA_HOME%\jre\bin\server to PATH
- Install one of the following libraries subject to your environment:
- Visual C++ library for Python 2.7
- (www.microsoft.com/en-us/download/details.aspx?id=44266)
- Visual C++ 2015 Build Tools for Python
- (http://landinghub.visualstudio.com/visual-cpp-build-tools)
- Visual Studio 2015+ (the Community edition is enough; in the VS installer, select the C++ related tools and all versions of the SDK)
- (https://visualstudio.microsoft.com)
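The JDK settings listed above can be checked before going further. The following is only a minimal sketch in Python 3: the function name check_java_env is made up for this example, and it assumes a Windows-style PATH separated by ';'. It reports which of the JAVA_HOME, PATH and CLASSPATH settings from the list are missing.

```python
import os


def check_java_env(env):
    """Return a list of problems with the JDK settings found in env (a dict)."""
    problems = []
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems

    def norm(path):
        # Windows paths are case-insensitive; also ignore trailing separators.
        return path.strip().rstrip("\\/").lower()

    path_entries = {norm(p) for p in env.get("PATH", "").split(";") if p.strip()}
    # The two PATH entries requested in the prerequisites.
    for sub in ("bin", "jre\\bin\\server"):
        wanted = java_home.rstrip("\\/") + "\\" + sub
        if norm(wanted) not in path_entries:
            problems.append("PATH is missing " + wanted)
    if not env.get("CLASSPATH"):
        problems.append("CLASSPATH is not set")
    return problems


if __name__ == "__main__":
    for problem in check_java_env(os.environ):
        print(problem)
```

Run it from a newly opened prompt, since changes to environment variables are not visible in prompts that were already open.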
Step 1. Install Apache Ant and set environment variables
1. specify ANT_HOME and add ANT_HOME as an environment variable
2. add %ANT_HOME%\bin to PATH
Step 2. Install cygwin 64 and set environment variables
0. When installing cygwin, switch the “Devel” category from “Default” to “Install”
*installing every package in the “Devel” category takes a lot of disk space (30~70 GB). Leaving the category at “Default” and selecting only the basic gcc/g++ tools, libraries and make/cmake utilities is already enough for the PyLucene installation.
*switching the “Debug” category from “Install” to “Uninstall” or “Default” saves a lot of space.
1. specify CYGWIN_HOME and add CYGWIN_HOME as an environment variable
2. add %CYGWIN_HOME%\bin to PATH
restart your computer
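After the restart, it may help to confirm that the tools from the prerequisites and Steps 1 and 2 are reachable from the prompt. This is a rough sketch for Python 3 (shutil.which does not exist in Python 2.7). The tool list is an assumption based on the programs mentioned above, and missing_tools is a name chosen for this example.

```python
import shutil

# Assumed list: the JDK, Ant and the cygwin build tools named in this guide.
REQUIRED_TOOLS = ("java", "javac", "ant", "make", "gcc", "g++")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

If a tool is reported as missing, recheck the corresponding PATH entry (%JAVA_HOME%\bin, %ANT_HOME%\bin or %CYGWIN_HOME%\bin).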
Step 3. Download the PyLucene source code and extract it to obtain a directory named ‘PyLucene-6.5’
* For PyLucene 7 users, the latest PyLucene 7.7.1 is recommended because earlier PyLucene 7.x releases may encounter errors during the installation process.
Step 4. Open the Anaconda prompt and run the following commands in the JCC directory (the jcc folder inside the PyLucene directory)
1. python setup.py build
2. python setup.py install
restart your computer
For Linux users: edit line 71 of setup.py to specify your Java home (for example, ‘linux’: JAVAHOME), or simply set a JCC_JDK environment variable to the same value as JAVA_HOME.
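The two commands of Step 4 and the JCC_JDK hint can be combined in a small helper. This is only an illustration for Python 3: the function names are hypothetical, and the sketch simply copies JAVA_HOME into JCC_JDK when JCC_JDK is not already set, as described in the Linux note.

```python
import os
import subprocess
import sys


def jcc_build_env(env):
    """Copy env and set JCC_JDK to JAVA_HOME when JCC_JDK is absent."""
    result = dict(env)
    if not result.get("JCC_JDK"):
        if not result.get("JAVA_HOME"):
            raise ValueError("set JAVA_HOME or JCC_JDK before building JCC")
        result["JCC_JDK"] = result["JAVA_HOME"]
    return result


def build_and_install_jcc(jcc_dir):
    """Run 'setup.py build' and then 'setup.py install' inside jcc_dir."""
    env = jcc_build_env(os.environ)
    for command in ("build", "install"):
        subprocess.check_call(
            [sys.executable, "setup.py", command], cwd=jcc_dir, env=env
        )
```

Calling build_and_install_jcc with the path of the jcc directory runs the same two commands as above with the interpreter that executes the script, so start it from the Anaconda prompt (or environment) you want JCC installed into.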
Step 5. Edit PyLucene-6.5/Makefile
First, comment out the default configuration (by adding ‘#’ at the start of each line), as follows
# Mac OS X 10.12 (64-bit Python 2.7, Java 1.8)
#PREFIX_PYTHON=/Users/vajda/apache/pylucene/_install
#ANT=/Users/vajda/tmp/apache-ant-1.9.3/bin/ant
#PYTHON=$(PREFIX_PYTHON)/bin/python
#JCC=$(PYTHON) -m jcc.__main__ --shared --arch x86_64
#NUM_FILES=8
And then insert the following configuration (the following are just examples)
PREFIX_PYTHON=D:/Progra~2/Anaconda2
ANT=D:/apache-ant-1.9.7/bin/ant
JAVA_HOME=C:/Progra~1/Java/jdk1.8.0_101
PYTHON=$(PREFIX_PYTHON)/python.exe
JCC=$(PYTHON) -m jcc
NUM_FILES=8
*for a PyLucene 8 installation, set NUM_FILES=10. If you encounter [WinError 267] when executing ‘make install’, execute ‘make clean’ and then ‘make install’ again.
*if a path contains spaces, replace it with its DOS short path, such as ‘C:/PROGRA~1’
*if you created an Anaconda environment, change PREFIX_PYTHON to the root directory of that environment (e.g., Anaconda2/envs/ENVIRONMENT_NAME)
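The Makefile settings of Step 5 can also be generated, which makes it easy to catch a path that still contains a space. The helper below is a sketch only; makefile_settings is a name invented for this example. It produces the same six lines as the example configuration above, including the Windows-specific python.exe line.

```python
def makefile_settings(prefix_python, ant, java_home, num_files=8):
    """Return the Makefile lines from Step 5 for the given locations."""
    values = [
        ("PREFIX_PYTHON", prefix_python),
        ("ANT", ant),
        ("JAVA_HOME", java_home),
    ]
    lines = []
    for name, value in values:
        # The Makefile examples use forward slashes, also on Windows.
        value = value.replace("\\", "/")
        if " " in value:
            raise ValueError(
                name + " contains a space; use the short path, e.g. C:/PROGRA~1"
            )
        lines.append(name + "=" + value)
    lines.append("PYTHON=$(PREFIX_PYTHON)/python.exe")
    lines.append("JCC=$(PYTHON) -m jcc")
    lines.append("NUM_FILES=" + str(num_files))
    return "\n".join(lines)


if __name__ == "__main__":
    print(makefile_settings("D:/Progra~2/Anaconda2",
                            "D:/apache-ant-1.9.7/bin/ant",
                            "C:/Progra~1/Java/jdk1.8.0_101"))
```

Pass num_files=10 for PyLucene 8, and paste the printed lines into the Makefile in place of the commented default configuration.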
Step 6. Run ‘make’ in the ‘PyLucene-6.5’ directory to build the whole project
*use “make -j2” or “make -j4” to speed up the build
Step 7. Run ‘make install’ in the ‘PyLucene-6.5’ directory
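Once ‘make install’ has finished, a quick import test shows whether the installation works. The sketch below uses lucene.initVM() and lucene.VERSION from PyLucene; the wrapper function lucene_version is just a convenience added for this example.

```python
def lucene_version():
    """Return the installed PyLucene version, or None if it cannot be imported."""
    try:
        import lucene
    except ImportError:
        return None
    # initVM() starts the embedded Java VM; it has to be called before
    # any other Lucene class is used in the process.
    lucene.initVM()
    return lucene.VERSION


if __name__ == "__main__":
    version = lucene_version()
    print("PyLucene " + version if version else "PyLucene is not importable")
```

If the import fails, the section below lists the most common causes.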
Frequently Encountered Problems
1. Python (actually ‘setuptools’ package) cannot call Visual C++ (MSVC) to compile the project
solution: Rewrite PYTHON_PATH\Lib\distutils\distutils.cfg like the following
[build]
compiler=msvc
[build_ext]
compiler=msvc
2. It fails when compiling JCC
solution: open setup.py and
find line 'win32': ["/EHsc", "/D_CRT_SECURE_NO_WARNINGS"],
replace this line by
'win32': ["/EHsc", "/D_CRT_SECURE_NO_WARNINGS", "/bigobj"],
3. [WinError 267] the directory name is invalid
solution: replace NUM_FILES=8 by NUM_FILES=10 in the Makefile,
execute ‘make clean’, and then execute ‘make’ to compile the whole project again
4. JCC_JDK not found (may occur on Linux)
solution: set “JCC_JDK” as an environment variable with the same value as JAVA_HOME
5. “dynamic module does not define module export function” appears when running “import lucene” in the Python prompt (may occur on Linux)
solution: remove the lucene and jcc packages by executing “pip uninstall lucene” and “pip uninstall jcc”,
edit pylucene-6.5/jcc/setup.py and replace line 71 by ‘linux’:JAVAHOME,
remove the jcc/build folder and execute “make clean” in the pylucene-6.5 directory,
then install jcc and pylucene again
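For problem 1 above, the distutils.cfg file can also be written from Python rather than by hand. This is a sketch for Python 3 using the standard configparser module. The function names are made up for this example, and the target path is whatever PYTHON_PATH\Lib\distutils\distutils.cfg is on your machine.

```python
import configparser
import io


def msvc_distutils_cfg_text():
    """Return the contents of a distutils.cfg that selects the MSVC compiler."""
    config = configparser.ConfigParser()
    config["build"] = {"compiler": "msvc"}
    config["build_ext"] = {"compiler": "msvc"}
    buffer = io.StringIO()
    # Write 'compiler=msvc' without spaces, as in the listing for problem 1.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


def write_msvc_distutils_cfg(cfg_path):
    """Overwrite cfg_path with the MSVC settings."""
    with open(cfg_path, "w") as handle:
        handle.write(msvc_distutils_cfg_text())
```

Note that write_msvc_distutils_cfg overwrites the file, so keep a copy of any existing distutils.cfg first.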