Writing a Perl XS swig interface to the CLucene C++ text search engine


1 Writing a Perl XS swig interface to the CLucene C++ text search engine
Peter Edwards Perl XS and SWIG interface to CLucene C++ text search engine 19/02/2019

2 Introduction Peter Edwards ~ background
Subject ~ writing a Perl XS swig interface to the CLucene C++ text search engine

3 Aims
Give an idea of the process involved in selecting and using an external library from Perl
Introduction to extending Perl using XS, SWIG and GNU autotools
Entertainment
Audience: what is your background and interest?

4 Topics
Understanding the Problem
The Answer (at a high level)
Technical Options
Investigating Options
Writing a Perl / C++ Interface
Layers and Components
Lessons Learned: Process, Extending Perl

5 Terms
Perl ~ Pathologically Eclectic Rubbish Lister
$_ = "wftedskaebjgdpjgidbsmnjgc"; tr/a-z/oh, turtleneck Phrase Jar!/; print;
Perl XS ~ eXternal Subroutine; allows a Perl program to call a C language subroutine
XS is also the “glue” language specifying the calling interface; it contains complex “perlguts” stuff that will destroy your sanity
SWIG ~ Simplified Wrapper and Interface Generator; makes it easy to call a C/C++ library from many languages (Perl, Python, Ruby, PHP…)
C++ ~ object-oriented version of the C programming language
text search ~ boolean searching of stemmed words, wildcards
CLucene ~ C++ text search engine based on Java Lucene
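The tr/// line is a "Just another Perl hacker" puzzle. tr/a-z/…/ maps each lower-case letter, position by position, onto the 26 characters of the replacement list. A minimal C sketch of that transliteration follows. The name tr_lower is made up for illustration, and the code assumes ASCII, where a to z are contiguous.

```c
#include <assert.h>
#include <string.h>

/* Rough C equivalent of Perl's tr/a-z/REPL/ for a 26-character REPL:
 * every lower-case ASCII letter in s is replaced, in place, by the
 * character at the same position in repl; other characters are kept. */
void tr_lower(char *s, const char *repl)
{
    assert(strlen(repl) == 26);   /* one replacement per letter a..z */
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z')
            *s = repl[*s - 'a'];
    }
}
```

Calling tr_lower on a writable copy of "wftedskaebjgdpjgidbsmnjgc" with "oh, turtleneck Phrase Jar!" leaves "Just another Perl hacker," in the buffer. Like tr, it makes a single pass, so a replaced character is never translated again.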

6 Understanding the Problem
Recruitment software written in Perl
20,000+ candidate Word CVs/resumes
Boolean searching using words or partial words and wildcards, e.g. (“BA” or “MA”) and “literature”
Combined with SQL searching, e.g. geographic area, skill profile codes, pay rate
Speed < 2 seconds
Old system used dtSearch proprietary software
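Partial words and trailing wildcards are usually handled by expanding the pattern against the sorted term dictionary before any documents are touched. This is broadly how Lucene's prefix queries work. The C sketch below expands a pattern such as lit* with a binary search. It assumes the index's terms are available as an array of strings sorted in strcmp order. expand_prefix and lower_bound are illustrative names, not CLucene functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first term that is >= key (binary-search lower bound).
 * terms must be sorted in strcmp order. */
static size_t lower_bound(const char *terms[], size_t n, const char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(terms[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Expand a trailing wildcard ("lit*" -> prefix "lit"). Every term that
 * starts with the prefix sorts at or after the prefix itself, and such
 * terms are adjacent, so the matches are terms[*first .. *first+count-1]. */
size_t expand_prefix(const char *terms[], size_t n, const char *prefix, size_t *first)
{
    size_t len = strlen(prefix);
    size_t i = lower_bound(terms, n, prefix);
    assert(first != NULL);
    *first = i;
    while (i < n && strncmp(terms[i], prefix, len) == 0)
        i++;
    return i - *first;
}
```

Take the dictionary {"ba", "java", "literal", "literature", "litigation", "ma", "perl"}. Here expand_prefix(dict, 7, "lit", &first) sets first to 2 and returns 3 (literal, literature, litigation). Each matching term can then be searched as an ordinary word and the results ORed together.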

7 The Answer (at a high level)
Load
Convert candidate CVs from Word to text using the wvWare (OpenOffice) converter
Index text against candidate no.
Search
Search text -> candidate nos -> SQL temp table
Normal SQL search on other criteria
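Once text search yields candidate numbers, a boolean query reduces to set operations on sorted lists of those numbers (posting lists), one list per term. The C sketch below implements OR as a merge and AND as an intersection. It assumes every list is sorted and free of duplicates. The function names and the sample numbers in the note after the code are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Posting lists here are sorted, duplicate-free arrays of candidate numbers. */

/* OR: merge a and b into out, which must hold na + nb entries.
 * Returns the number of entries written. */
size_t postings_or(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j])
            out[k++] = a[i++];
        else if (b[j] < a[i])
            out[k++] = b[j++];
        else {                      /* in both lists: keep one copy */
            out[k++] = a[i++];
            j++;
        }
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}

/* AND: write the entries present in both a and b to out, which must
 * hold min(na, nb) entries. Returns the number written. */
size_t postings_and(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j])
            i++;
        else if (b[j] < a[i])
            j++;
        else {
            out[k++] = a[i++];
            j++;
        }
    }
    return k;
}
```

Consider ("BA" or "MA") and "literature" with made-up postings BA = {3, 8, 15}, MA = {8, 21} and literature = {2, 8, 15, 30}. postings_or gives {3, 8, 15, 21}, and intersecting that with literature gives {8, 15}. That final list is what would be loaded into the SQL temp table. Both operations are linear in the list lengths.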

8 Technical Options (at 2003/4)
Proprietary dtSearch ~ cost; hard to get candidate nos out; Windows interface when the Perl app is web-based
Open Source Java Lucene ~ slow, but a good API and powerful
C++ CLucene ~ alpha-quality port of Lucene to Visual C++, written as a degree project by Ben van Klinken
Perl CPAN (PLucene etc.) ~ see below

9 Investigating Perl Options
Wrote a test harness to load 1000 CVs, then run some searches
Tried about 5 CPAN modules
PLucene: search speed okay for small volumes, but insert time rose steeply as the index grew, to over 60 seconds per insert
Why? It tokenises each doc, applies multi-lingual word stemming, and adds the doc id to a reverse lookup index for each stem token
Other modules were faster, but their search options were weak
Need to look further
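The indexing work described above can be sketched in C. This toy version assumes fixed-size tables and splits text on non-alphanumeric characters. It lower-cases each token and records the document id once per term. It does no stemming and no multi-lingual handling. struct term_index, index_document and index_find are made-up names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TERMS 64
#define MAX_TERM_LEN 32          /* longer tokens are truncated */
#define MAX_DOCS_PER_TERM 16

/* One entry of a toy reverse (inverted) index: a term and the ids of
 * the documents containing it, in the order they were indexed. */
struct posting {
    char term[MAX_TERM_LEN];
    int docs[MAX_DOCS_PER_TERM];
    int ndocs;
};

struct term_index {
    struct posting terms[MAX_TERMS];
    int nterms;
};

/* Linear lookup: fine for a toy; a real engine sorts or hashes terms. */
const struct posting *index_find(const struct term_index *ix, const char *term)
{
    for (int i = 0; i < ix->nterms; i++)
        if (strcmp(ix->terms[i].term, term) == 0)
            return &ix->terms[i];
    return NULL;
}

/* Record that doc_id contains term; silently drops it if a table is full. */
static void index_add(struct term_index *ix, const char *term, int doc_id)
{
    struct posting *p = NULL;
    for (int i = 0; i < ix->nterms && p == NULL; i++)
        if (strcmp(ix->terms[i].term, term) == 0)
            p = &ix->terms[i];
    if (p == NULL) {                     /* first time we see this term */
        if (ix->nterms == MAX_TERMS)
            return;
        assert(strlen(term) < MAX_TERM_LEN);
        p = &ix->terms[ix->nterms++];
        strcpy(p->term, term);
        p->ndocs = 0;
    }
    /* Documents are indexed one at a time, so a repeat of the same term
     * in the same document always matches the last id stored. */
    if (p->ndocs > 0 && p->docs[p->ndocs - 1] == doc_id)
        return;
    if (p->ndocs < MAX_DOCS_PER_TERM)
        p->docs[p->ndocs++] = doc_id;
}

/* Split text into lower-cased alphanumeric tokens and index each one. */
void index_document(struct term_index *ix, const char *text, int doc_id)
{
    char tok[MAX_TERM_LEN];
    size_t len = 0;
    for (const char *c = text; ; c++) {
        if (*c != '\0' && isalnum((unsigned char)*c)) {
            if (len < MAX_TERM_LEN - 1)
                tok[len++] = (char)tolower((unsigned char)*c);
        } else if (len > 0) {            /* end of a token */
            tok[len] = '\0';
            index_add(ix, tok, doc_id);
            len = 0;
        }
        if (*c == '\0')
            break;
    }
}
```

Every token of every CV passes through index_add, so the cost of adding a document depends directly on how term lookup scales. The linear scan here is only acceptable for a toy.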

10 Investigating CLucene
Wrote a similar C++ test harness
Speed good: search 20,000 CVs in < 1 second; load 3 CVs per second (mostly Word->text conversion)
Code written as a VC++ degree project and registered at SourceForge
Jimmy Pritts changed the layout and added GNU autoconf files (configure.ac, Makefile.in) to let it build cross-platform on Windows, Cygwin and Linux
Had a C DLL interface used by a PHP wrapper
Decided to write a Perl wrapper

11 Interfacing Perl to C++
When I wrote this wrapper, interfacing Perl to C++ via XS or SWIG was tricky, and despite the optimism expressed at … I had difficulties mapping the CLucene API to XS
Reasons: C++ name mangling (including namespaces); mapping objects and methods; C++ memory management versus Perl's garbage collection
So I decided to go via the C DLL wrapper to hide this complexity
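Going through a C layer also means C++ objects never cross into Perl directly. The C DLL functions (slide 17) take a `const int resource`, which suggests a table mapping small integers to objects. Below is a minimal C sketch of that idea. handle_put, handle_get and handle_release are hypothetical names, and `void *` stands in for real C++ object pointers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 16

/* Objects created behind the C interface (for a C++ library, pointers
 * to C++ objects) live in this table; the scripting side only ever
 * sees the integer handle. Slot i holds handle i + 1. */
static void *handle_table[MAX_HANDLES];

/* Store obj and return its handle (>= 1), or 0 if the table is full. */
int handle_put(void *obj)
{
    assert(obj != NULL);
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (handle_table[i] == NULL) {
            handle_table[i] = obj;
            return i + 1;   /* 0 is reserved to mean "failed" */
        }
    }
    return 0;
}

/* Map a handle back to its object; NULL for unknown or released handles. */
void *handle_get(int h)
{
    if (h < 1 || h > MAX_HANDLES)
        return NULL;
    return handle_table[h - 1];
}

/* Forget a handle. Deleting the object itself is the caller's job. */
void handle_release(int h)
{
    if (h >= 1 && h <= MAX_HANDLES)
        handle_table[h - 1] = NULL;
}
```

Handles start at 1 so that 0 can signal failure. That lets a Perl caller write `CL_OPEN(...) or confess`, as CLucene.pm does. Whether the real DLL numbers its resources this way is an assumption.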

12 Perl XS
Always start with the h2xs utility
Code is C with macro extensions
Write C code (XSUBs)
Call internal Perl routines (perlguts) to create variables, allocate arrays…
newSViv(IV), sv_setiv(SV*, IV) ~ scalar integer variable
Complicated
Nyarlathotep / “Crawling Chaos”

13 Enter SWIG
Creates XS for you from a .i definition file
Parses C/C++ .h header files to get types and function prototypes
Allows for inline C/XS code

14 Swig XS Sample From argv.i
// Creates a new Perl array and places a NULL-terminated char ** into it
%typemap(out) char ** {
    AV *myav;
    SV **svs;
    int i = 0, len = 0;
    /* Figure out how many elements we have */
    while ($1[len])
        len++;
    svs = (SV **) malloc(len * sizeof(SV *));
    for (i = 0; i < len; i++) {
        svs[i] = sv_newmortal();
        sv_setpv((SV*)svs[i], $1[i]);
    }
    myav = av_make(len, svs);    /* av_make copies the SVs into the new array */
    free(svs);
    /* newRV_noinc: the reference takes over myav's refcount (newRV would leak the array) */
    $result = newRV_noinc((SV*)myav);
    sv_2mortal($result);
    argvi++;
}
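Without the perlguts calls, the typemap does two things. It counts the entries of a NULL-terminated `char **`, then copies each string into a new value. The plain C sketch below performs the same two steps. argv_len, argv_copy and argv_free are hypothetical names, and an ordinary heap array stands in for the Perl AV.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count entries up to the terminating NULL, the same loop as the
 * typemap's "while ($1[len]) len++;". */
size_t argv_len(char **v)
{
    size_t len = 0;
    assert(v != NULL);
    while (v[len])
        len++;
    return len;
}

/* Deep-copy a NULL-terminated char ** so the result owns its strings,
 * as the typemap copies each C string into a new Perl scalar.
 * Returns NULL if any allocation fails (nothing is leaked). */
char **argv_copy(char **v)
{
    size_t len = argv_len(v);
    char **out = malloc((len + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        size_t n = strlen(v[i]) + 1;
        out[i] = malloc(n);
        if (out[i] == NULL) {          /* undo the partial copy */
            while (i > 0)
                free(out[--i]);
            free(out);
            return NULL;
        }
        memcpy(out[i], v[i], n);
    }
    out[len] = NULL;
    return out;
}

/* Free a vector produced by argv_copy. */
void argv_free(char **v)
{
    if (v == NULL)
        return;
    for (size_t i = 0; v[i]; i++)
        free(v[i]);
    free(v);
}
```

The copy matters for the same reason it does in the typemap. The C library may free or reuse its strings after the call returns, so the result must own its data.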

15 Diagram of Layers
Perl OO Wrapper ~ CLucene.pm
Low Level Perl ~ CLuceneWrap.pm (SWIG generated)
SWIG XS C Code ~ clucene_wrap.c (SWIG generated)
C DLL Interface ~ clucene_dll.o
CLucene C++ Library ~ clucene.so

16 CLucene C++ Interface
src/CLucene/search/SearchHeader.h:
#include "CLucene/StdHeader.h"
#ifndef _lucene_search_SearchHeader_
#define _lucene_search_SearchHeader_

#include "CLucene/index/IndexReader.h"
using namespace lucene::index;

namespace lucene{ namespace search{
    //predefine classes
    class Searcher;
    class Query;
    class Hits;

    class HitDoc {
    public:
        float_t score;
        int_t id;
        lucene::document::Document* doc;
        HitDoc* next; // in doubly-linked cache
        HitDoc* prev; // in doubly-linked cache
        HitDoc(const float_t s, const int_t i);
        ~HitDoc();
    };
    // ...

17 CLucene C DLL Interface
src/wrappers/dll/clucene_dll.h:
#ifndef _DLL_CLUCENE
#define _DLL_CLUCENE
#include "CLucene/CLConfig.h"

#ifdef _UNICODE //unicode methods
# define CL_UNLOCK        CL_U_Unlock
# define CL_OPEN          CL_U_Open
# define CL_DOCUMENT_INFO CL_U_Document_Info
# define CL_ADD_FILE      CL_U_Add_File
CLUCENEDLL_API int CL_U_Unlock(const wchar_t* dir);
CLUCENEDLL_API int CL_U_Delete(const int resource, const wchar_t* query, const wchar_t* field);
CLUCENEDLL_API int CL_U_Add_Field(const int resource, const wchar_t* field, const wchar_t* value,
                                  const int value_length, const int store, const int index, const int token);
// ...
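The _UNICODE block binds each public CL_ name to a wide-character (CL_U_) implementation at compile time. The clucene.i comment on the next slide implies narrow CL_N_ versions for the other case. The self-contained C sketch below shows the pattern using hypothetical demo_ functions. A DEMO_TEXT macro, modelled on the Windows TEXT()/_T() idiom, gives string literals the matching character type.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical narrow (char) and wide (wchar_t) versions of one call.
 * Each just reports whether it was given a non-empty directory name. */
int demo_N_Open(const char *dir)    { return dir != NULL && dir[0] != '\0'; }
int demo_U_Open(const wchar_t *dir) { return dir != NULL && dir[0] != L'\0'; }

/* Pick the public name at compile time, as clucene_dll.h does for
 * CL_OPEN. DEMO_TEXT gives string literals the matching character type. */
#ifdef _UNICODE
#  define DEMO_OPEN    demo_U_Open
#  define DEMO_TEXT(s) L##s
#else
#  define DEMO_OPEN    demo_N_Open
#  define DEMO_TEXT(s) s
#endif
```

Built without _UNICODE, `DEMO_OPEN(DEMO_TEXT("/tmp/index"))` calls demo_N_Open with a plain `char` string. Defining _UNICODE switches both the function and the literal to `wchar_t` without touching the call site.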

18 SWIG Definition File clucene.i
%module "FulltextSearch::CLuceneWrap"
%{
#include "clucene_dllp.h"
%}

// our definitions for CLucene variables and functions
%include "clucene_perl.h"
//%include "clucene_dll.h" // could use this but then would need to call CL_N_Search not CL_SEARCH etc.
%include typemaps.i
%include argv.i

// helper functions where pointers to result buffers are expected
// would be better done with a %typemap(out) if I knew enough about perlguts
%inline %{
int val_len;
char * val;
int CL_GetField1(int resource, char * field) {
    return CL_GETFIELD(resource,field,&val,&val_len);
}
// ...
%}
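CL_GetField1 adapts a C call that returns its results through pointer arguments so a script can call it with plain scalars. The status comes back as the return value, and the field text lands in the globals `val` and `val_len`, which SWIG exposes to Perl as variables. The self-contained C sketch below shows the same adapter. demo_get_field is a made-up stand-in for CL_GETFIELD, and "Jane Smith" is dummy data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C API in the style of CL_GETFIELD: status is the return
 * value, the results come back through pointer arguments. */
int demo_get_field(int resource, const char *field, char **value, int *value_len)
{
    static char stored[] = "Jane Smith";   /* dummy stored field */
    if (resource <= 0 || strcmp(field, "name") != 0)
        return 0;                          /* not found */
    *value = stored;
    *value_len = (int)strlen(stored);
    return 1;
}

/* Globals that SWIG would expose to Perl as variables, mirroring
 * val and val_len in clucene.i. */
char *val = NULL;
int val_len = 0;

/* Script-friendly wrapper: scalar arguments in, status out; the
 * out-parameters land in the globals above. */
int get_field1(int resource, const char *field)
{
    assert(field != NULL);
    return demo_get_field(resource, field, &val, &val_len);
}
```

Keeping results in globals is simple but not reentrant. A second call overwrites `val` before the first result has been read, and the approach is unsafe with threads. SWIG's argout typemaps exist for returning such out-parameters directly, which is roughly what the comment about a typemap is reaching for.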

19 SWIG-Generated XS CLuceneWrap.pm
# This file was automatically generated by SWIG
package FulltextSearch::CLuceneWrap;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
package FulltextSearch::CLuceneWrapc;
bootstrap FulltextSearch::CLuceneWrap;
@EXPORT = qw( );

# BASE METHODS
sub TIEHASH {
    my ($classname,$obj) = @_;
    return bless $obj, $classname;
}
sub CLEAR { }

# FUNCTION WRAPPERS
package FulltextSearch::CLuceneWrap;
*CL_OPEN = *FulltextSearch::CLuceneWrapc::CL_OPEN;
*CL_CLOSE = *FulltextSearch::CLuceneWrapc::CL_CLOSE;

# VARIABLE STUBS
*clucene_perl = *FulltextSearch::CLuceneWrapc::clucene_perl;
*NULL = *FulltextSearch::CLuceneWrapc::NULL;
*val_len = *FulltextSearch::CLuceneWrapc::val_len;
*val = *FulltextSearch::CLuceneWrapc::val;
*errstr = *FulltextSearch::CLuceneWrapc::errstr;

20 SWIG-Generated XS clucene_wrap.c
#ifdef __cplusplus
extern "C" {
#endif
XS(_wrap_CL_OPEN) {
    {
        char *arg1 ;
        int arg2 = (int) 1 ;
        int result;
        int argvi = 0;
        dXSARGS;

        if ((items < 1) || (items > 2)) {
            SWIG_croak("Usage: CL_OPEN(path,create);");
        }
        if (!SvOK((SV*) ST(0))) arg1 = 0;
        else arg1 = (char *) SvPV(ST(0), PL_na);
        if (items > 1) {
            arg2 = (int) SvIV(ST(1));
        }
        result = (int)CL_OPEN(arg1,arg2);

        ST(argvi) = sv_newmortal();
        sv_setiv(ST(argvi++), (IV) result);
        XSRETURN(argvi);
        fail:
        ;
    }
    croak(Nullch);
}

21 CLucene.pm Perl OO Wrapper
Back into the realms of sanity
Normal OO package with methods
Calls XS wrapper functions
sub open {
    my $this = shift;
    my %arg = @_;
    my $path = $arg{path} || $this->{path} || confess "path undefined";
    # anyof(): helper defined elsewhere in CLucene.pm (assumed to return its first defined argument)
    my $create = anyof( $arg{create}, $this->{create}, 0 );
    $this->{resource} = FulltextSearch::CLuceneWrap::CL_OPEN( $path, $create )
        or confess "Failed to CL_OPEN $path create $create errstr " . $this->errstrglobal();
    $this->{path} = $path;
    $this;
}

22 Build Environment
Uses GNU autotools and the m4 macro processor
Definition files
configure.ac ~ top-level build definitions
Makefile.am ~ per-directory build targets, sources and flags
Programs
libtool ~ portable (shared) library building
aclocal ~ builds aclocal.m4 from the macros used in configure.ac
autoconf ~ reads configure.ac to create the configure script
autoheader ~ creates the config.h.in template of C #defines that configure fills in
automake ~ creates Makefile.in from Makefile.am
autoreconf ~ re-runs the above tools as needed to remake the whole tree of GNU build files

23 Bootstrap shell script
#!/bin/sh
# Bootstrap the CLucene installation.
mkdir -p ./build/gcc/config
set -x
libtoolize --force --copy --ltdl --automake
aclocal
autoconf
autoheader
automake -a --copy --foreign

24 Autoconf configure.ac file
dnl Process this file with autoconf to produce a configure script.
dnl Written by Jimmy Pritts.

dnl initialize autoconf and automake
AC_INIT([clucene], [1])
AC_PREREQ([2.54])
AC_CONFIG_SRCDIR([src/CLucene.h])
AC_CONFIG_AUX_DIR([./build/gcc/config])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE

dnl Check for existence of a C and C++ compilers.
AC_PROG_CC
AC_PROG_CXX

dnl Check for headers
AC_HEADER_DIRENT

dnl Configure libtool.
AC_PROG_LIBTOOL

dnl option to use UTF-8 as internal 8-bit charset to support characters in Unicode™
AC_ARG_ENABLE(utf8,
  AC_HELP_STRING([--enable-utf8],[UTF-8 as internal 8-bit charset to support characters in Unicode™ (default=no)]),
  [AC_DEFINE([UTF8],[],[use UTF-8 as internal 8-bit charset to support characters in Unicode™])],
  enable_utf8=no)
AM_CONDITIONAL(USEUTF8, test x$enable_utf8 = xyes)

AC_CONFIG_FILES([Makefile src/Makefile examples/Makefile examples/demo/Makefile
                 examples/tests/Makefile examples/util/Makefile wrappers/Makefile
                 wrappers/dll/Makefile wrappers/dll/dlltest/Makefile])
AC_OUTPUT
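AC_CONFIG_HEADERS([config.h]) makes configure write the AC_DEFINE results into config.h. The generated Makefiles then pass -DHAVE_CONFIG_H, so sources know to include that file. The minimal C sketch below shows how library code could react to --enable-utf8. internal_charset is a hypothetical function, not part of CLucene.

```c
#ifdef HAVE_CONFIG_H
#  include "config.h"   /* generated by configure from config.h.in */
#endif
#include <assert.h>

/* configure --enable-utf8 makes AC_DEFINE([UTF8]) write "#define UTF8"
 * into config.h; when the option is not given, UTF8 stays undefined. */
const char *internal_charset(void)
{
#ifdef UTF8
    return "UTF-8";
#else
    return "8-bit (default)";
#endif
}
```

Compiled on its own, without config.h, the function takes the default branch. One caveat applies when reusing this configure.ac: AC_ARG_ENABLE runs its action-if-given for --disable-utf8 as well as for --enable-utf8. As written, --disable-utf8 would therefore still define UTF8. Testing $enableval inside that action avoids this.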

25 Makefile.am files
src/Makefile.am:
AUTOMAKE_OPTIONS = 1.6
include_HEADERS = CLucene.h
lsrcdir = $(top_srcdir)/src/CLucene
lib_LTLIBRARIES = libclucene.la
libclucene_la_SOURCES =
include CLucene/analysis/Makefile.am
include CLucene/analysis/standard/Makefile.am
include CLucene/debug/Makefile.am
include CLucene/document/Makefile.am
include CLucene/index/Makefile.am
include CLucene/queryParser/Makefile.am
include CLucene/search/Makefile.am
include CLucene/store/Makefile.am
include CLucene/util/Makefile.am
include CLucene/Makefile.am

./Makefile.am:
## Makefile.am -- Process this file with automake to produce Makefile.in
INCLUDES = -I$(top_srcdir)
SUBDIRS = src wrappers examples .

src/CLucene/document/Makefile.am:
documentdir = $(lsrcdir)/document
dochdir = $(includedir)/CLucene/document
libclucene_la_SOURCES += $(documentdir)/DateField.cpp
libclucene_la_SOURCES += $(documentdir)/Document.cpp
libclucene_la_SOURCES += $(documentdir)/Field.cpp
doch_HEADERS = $(documentdir)/*.h

26 Recap
We saw how and why I selected an external library to use from Perl
We looked at GNU autotools to provide a cross-platform build environment
We investigated the layers of code needed to interface Perl to a C++ library ~ SWIG, C, XS inline helpers, low- and high-level Perl modules

27 Lessons Learned
When starting a new external library, use GNU autotools and keep in mind that the API should be easy to use through SWIG
Use SWIG, not hand-written XS, to wrap a C/C++ library
Always use h2xs to start a Perl extension
Open Source feedback and testing are more valuable than you expect (2 … this week alone)

28 Where to Get More Information
Perl XS
C++ / XS
SWIG
Lucene
CLucene
Autoconf
Book: “Extending and Embedding Perl”, Jenness & Cozens (Manning, 2002)
Any Questions?
These slides are at

