R Meets Julia
Dr. Chris Musselle – Consultant
Outline
- Julia – What, So What, When?
- Julia – Where it's currently at
- Julia and R
- Case Study: Calculating String Similarity
julialang.org
- A flexible dynamic language appropriate for scientific and numerical computing.
- Arrived Feb 2012 after 2 years' development at MIT.
- Julia released Aug
- Free and open source (MIT licensed).
Language Features
- Performance comparable to compiled languages.
- Designed with distributed computing in mind.
- Dynamic typing, optional type declarations, multiple dispatch.
- Libraries written in Julia, git-based package management.
- Direct calling of C and Fortran libraries.
- Interactive REPL ("Read-Eval-Print Loop").
The Vision
“We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as MATLAB, as good at gluing programs together as the shell. … something that provides the distributed power of Hadoop - without the kilobytes of boilerplate Java and XML”
--- Julia’s Authors
Source:
Too Good to be True?
- Scientific computing, though it requires high performance, has shifted towards dynamic languages.
- They are more productive: human time is more expensive than CPU time.
- There have been many advancements in compiler techniques and language design over the years, e.g. JIT compilation.
- These can now greatly mitigate the performance trade-off associated with a dynamic language.
- But this has required building the language from the ground up.
So How Fast is Fast?
Source:
Where’s Julia at now?
Standard Library
- Core Syntax, Collections and Data Structures
- Linear Algebra, BLAS, Sparse Matrices
- Package Manager
- Graphics
- Unit and Functional Testing
- Profiling
External Packages
- Total of 384 external packages written by 138 primary authors.
Who Uses it?
- JuliaLang – The Core language
- JuliaStats – Statistics
- JuliaOpt – Numerical Optimization Library
- JuliaSparse – Sparse Matrix Solvers
- JuliaDiff – Differentiation Tools
- JuliaWeb – Web stack tools
- JuliaGPU – GPU computing
- JuliaQuant – Financial Analysis Libraries
- JuliaAstro / JuliaQuantum – Astronomy/Physics/Chemistry
When to Use it?
- Julia allows fast prototyping of code that is also fast to execute.
- Best used to code up bespoke algorithms.
- The Julia ecosystem is in its infancy; the majority of packages focus on numerical computation.
- You may need to re-implement ‘tools’ from scratch, e.g. parsers, data structures, algorithms.
Julia and R?
- Calling R from Julia:
- Calling Julia from R:
- System calls – New session each time
Case Study: String Similarity (Edit Distance)
The number of “edit” operations between two strings, where an edit is:
- An insertion
- A deletion
- A substitution
E.g. the edits that turn “kitten” into “sitting”:
- Substitute “s” for “k” at position 1
- Substitute “i” for “e” at position 5
- Insert “g” at the end (after position 6)
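The three edits in the example can be replayed step by step. This is only an illustrative sketch in Python: the helper names `substitute` and `insert` are hypothetical and do not come from the talk, and positions are treated as 1-based to match the slide.

```python
def substitute(text, position, char):
    """Replace the character at a 1-based position (hypothetical helper)."""
    index = position - 1
    return text[:index] + char + text[index + 1:]


def insert(text, position, char):
    """Insert a character so it lands at the given 1-based position (hypothetical helper)."""
    index = position - 1
    return text[:index] + char + text[index:]


word = "kitten"
step1 = substitute(word, 1, "s")   # "k" -> "s" at position 1
step2 = substitute(step1, 5, "i")  # "e" -> "i" at position 5
step3 = insert(step2, 7, "g")      # "g" added at the end, as the 7th character

# Three edits were applied, so the edit distance is at most 3.
edit_count = 3
print(word, "->", step1, "->", step2, "->", step3)
```

Replaying the edits only shows that three operations are enough; showing that no shorter sequence exists needs the full algorithm.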
Case Study: String Similarity (Edit Distance)
- This particular formulation is known as the Levenshtein distance.
- Used the optimised “dynamic programming” approach.
- Pseudocode available at
Applications
- Spell checking
- Computational Biology
- Natural Language Processing
- Speech Recognition
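For reference, the dynamic programming formulation of the Levenshtein distance can be sketched as follows. This is a minimal Python version of the standard two-row algorithm, written for illustration; it is not the code benchmarked in the talk, and the function name `levenshtein` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using two rows of the dynamic programming table."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # Distance between a[:i] and the empty string is i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are held in memory at a time, so memory use grows with the length of the shorter string, while the running time is still proportional to the product of the two lengths.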
Case Study: String Similarity (Edit Distance)
Compared 5 different approaches:
- R_lev – Written purely in R
- R_adist – Using the built-in adist function in R
- Julia – Written purely in Julia
- Python_np_lev – Written in Python (using numpy)
- Python_c_lev – Python wrapper to a C function
Results
Results (minus R_lev)
Key Results
- The pure R implementation was over 10 times slower than adist and Python, and 33 times slower than Julia.
- Found Julia 2.5 to 3 times faster than Python and R
- Reading line by line <<< Reading in all at once
- Python + numpy ~ R’s built-in adist
Summary
- Julia – Certainly has great potential
- Strengths – Numerical computation in a dynamic “REPL” language with clean syntax
- Weaknesses – Playing catch-up with tools and libraries. Early days for integration with other languages. Julia → other language is good, though.
- Don’t prototype your next algorithm in R if speed matters!
- Found Julia 2.5 to 3 times faster than Python and R
Thank You For Your Attention
Any Questions?
- julialang.org
- Calling R from Julia:
- Calling Julia from R:
- Edit distance:
What’s Next?
Accepted GSoC projects 2014
- Libgit2 support
- Linear algebra for generic types
- Julia + Light Table – IDE development
- IJulia Interactive Widgets
- 3D Visualization Package for Julia