Quantum Leap Pattern Matching: A New High-Performance Quick Search-Style Algorithm
Bruce W. Watson, Derrick Kourie, Loek Cleophas
Stellenbosch University
Aim and contents
- Problem
- Solution sketch and code
- Examples
- z choices
- Benchmarking
- Conclusions
- Future work
Problem
- Single keyword exact pattern matching
- Given text t and pattern p (lengths n and m respectively), find all occurrences of p in t
- Recall Sunday's Quick Search (QS): the shift is bounded above by m+1
- This is a family of algorithms: the base could have been Horspool instead
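As a reminder of the baseline, here is a minimal C sketch of Sunday's Quick Search. The function name `quick_search`, the `out`/`cap` reporting interface, and the 256-symbol byte alphabet are assumptions made for illustration; they are not taken from the slides.

```c
#include <stddef.h>
#include <string.h>

#define SIGMA 256  /* assumed alphabet: all byte values */

/* Sunday's Quick Search. Returns the number of matches of p in t and
   stores the first `cap` match positions in `out`. */
size_t quick_search(const unsigned char *t, size_t n,
                    const unsigned char *p, size_t m,
                    size_t *out, size_t cap)
{
    size_t shift[SIGMA];
    size_t count = 0, j = 0;

    if (m == 0 || m > n) return 0;

    /* A character absent from p allows the maximal shift of m+1. */
    for (size_t c = 0; c < SIGMA; c++) shift[c] = m + 1;
    /* Otherwise align the rightmost occurrence of the character in p. */
    for (size_t i = 0; i < m; i++) shift[p[i]] = m - i;

    while (j + m <= n) {
        if (memcmp(t + j, p, m) == 0) {
            if (count < cap) out[count] = j;
            count++;
        }
        if (j + m >= n) break;      /* no character after the window */
        j += shift[t[j + m]];       /* shift is between 1 and m+1 */
    }
    return count;
}
```

The shift is indexed by the text character immediately to the right of the current window, which is why it can never exceed m+1.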
Ensuring z is worthwhile
Possible values for z
Consequences of z choices
Benchmarking
- 17-inch MacBook Pro, Intel Core i7 quad-core
- C code, g++ (LLVM), -O3
- Bible and Ecoli (each approx. 4 MB) from SMART
- Random p taken from t
- Per m = 1, ..., 32, 256, 1024:
  - 30 randomly selected patterns
  - 5 runs over the same data
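The protocol above (patterns sampled from the text, 30 patterns per length, 5 runs each) might be driven by a harness along these lines. This is a sketch only: the `matcher_fn` signature, the helper names, the use of `rand()` and `clock()`, and the naive baseline matcher are all assumptions, not the authors' benchmark code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical matcher signature: returns the number of matches. */
typedef size_t (*matcher_fn)(const unsigned char *t, size_t n,
                             const unsigned char *p, size_t m);

/* Naive reference matcher, included only so the harness is runnable. */
size_t naive_count(const unsigned char *t, size_t n,
                   const unsigned char *p, size_t m)
{
    size_t count = 0;
    for (size_t j = 0; j + m <= n; j++)
        if (memcmp(t + j, p, m) == 0) count++;
    return count;
}

/* Random start offset such that t[off .. off+m-1] lies inside the text.
   Requires 1 <= m <= n. Note that rand() may cover only a small range
   on some platforms; a real benchmark would want a wider generator. */
size_t random_pattern_offset(size_t n, size_t m)
{
    return (size_t)rand() % (n - m + 1);
}

/* Mean CPU seconds per search for one pattern length m, over
   `patterns` sampled patterns and `runs` repetitions of each. */
double time_matcher(matcher_fn f, const unsigned char *t, size_t n,
                    size_t m, int patterns, int runs,
                    size_t *total_matches)
{
    clock_t ticks = 0;
    *total_matches = 0;
    for (int k = 0; k < patterns; k++) {
        const unsigned char *p = t + random_pattern_offset(n, m);
        for (int r = 0; r < runs; r++) {
            clock_t start = clock();
            *total_matches += f(t, n, p, m);
            ticks += clock() - start;
        }
    }
    return (double)ticks / CLOCKS_PER_SEC / ((double)patterns * runs);
}
```

Because every pattern is cut out of the text itself, each search finds at least one match, which gives a cheap sanity check on the matcher under test.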
Best case QLQS versus QS
Conclusions
- QLQS outperforms QS in most cases, given an appropriate choice of z
- QLQS outperforms QS significantly when the alphabets of p and t are disjoint
- Large z choices appear to violate the m+1 principle, but QLQS does the same table lookups as QS
  - Significant instruction-level parallelism
- QLQS is as simple as QS
  - Shift tables are easily computed
- First left-to-right algorithm using backward shifts?
- QLQS is speculative execution (take a Quantum Leap shift, then check whether it was valid)
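The "leap, then check" idea can be illustrated with a small C sketch layered on Quick Search. This is a reconstruction under stated assumptions, not the authors' code: the second table `back` (leftmost occurrence of each character in p, plus one), the probe position t[j+z-1], and the validity test `s + back[c] > z` are one possible way to justify a leap of z, chosen here because they can be checked by hand. The name `qlqs_sketch` and the reporting interface are likewise hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SIGMA 256  /* assumed alphabet: all byte values */

/* Quick Search with a speculative leap of z (illustrative sketch).
   Returns the match count; stores the first `cap` positions in `out`. */
size_t qlqs_sketch(const unsigned char *t, size_t n,
                   const unsigned char *p, size_t m, size_t z,
                   size_t *out, size_t cap)
{
    size_t fwd[SIGMA], back[SIGMA];
    size_t count = 0, j = 0;

    if (m == 0 || m > n) return 0;

    for (size_t c = 0; c < SIGMA; c++) { fwd[c] = m + 1; back[c] = m + 1; }
    /* fwd: ordinary QS shift (rightmost occurrence wins). */
    for (size_t i = 0; i < m; i++) fwd[p[i]] = m - i;
    /* back: leftmost occurrence index plus one (leftmost wins). */
    for (size_t i = m; i-- > 0; ) back[p[i]] = i + 1;

    while (j + m <= n) {
        if (memcmp(t + j, p, m) == 0) {
            if (count < cap) out[count] = j;
            count++;
        }
        if (j + m >= n) break;
        size_t s = fwd[t[j + m]];   /* rules out shifts 1 .. s-1 */
        /* The character at t[j+z-1] rules out shifts
           z-back+1 .. z-1. If the two ranges together cover
           1 .. z-1, no match can start before j+z: leap. */
        if (z > s && j + z - 1 < n && s + back[t[j + z - 1]] > z)
            s = z;
        j += s;
    }
    return count;
}
```

In this sketch the inner loop still performs one lookup per table and replaces the plain shift assignment with a conditional one. With z = 2m+1 the leap fires only when both probed characters are absent from p, which is the situation that dominates when the alphabets of p and t are disjoint.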
Future work
- Probabilistic QLQS: the validity of a z shift is not checked
- Coarse-grained parallelism
- Benchmark QLQS using two-dimensional shift tables (ZT and BR)
- Characterize QLQS on CPUs with little ILP
- Use the Quantum Leap principle in other Boyer-Moore style algorithms
  - Multiple keyword, regex, tree, …
- Shift tables in QLQS formally derived in a correctness-by-construction formalism
Thanks!