Presentation is loading. Please wait.

Presentation is loading. Please wait.

Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin.

Similar presentations


Presentation on theme: "Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin."— Presentation transcript:

1 Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin

2 Who am I? Engineer at Goldman Sachs Technical lead for GS Collections – A replacement for the Java Collections Framework

3 Goals Scala programs ought to perform as well as Java programs, but Scala is a little slower

4 Goals From RedBlackTree.scala /* * Forcing direct fields access using the @inline annotation * helps speed up various operations (especially * smallest/greatest and update/delete). * * Unfortunately the direct field access is not guaranteed to * work (but works on the current implementation of the Scala * compiler). * * An alternative is to implement the these classes using plain * old Java code... */

5 Goals Scala programs ought to perform as well as Java programs, but Scala is a little slower Highlight a few performance problems that matter to me Suggest fixes, borrowing ideas and technology from GS Collections

6 Goals GS Collections and Scala’s Collections are similar Mutable and immutable interfaces with common parent Similar iteration patterns at hierarchy root Traversable and RichIterable Lazy evaluation ( view, asLazy ) Parallel-lazy evaluation ( par, asParallel )

7 Agenda GS Collections and Scala’s Collections are different Persistent data structures Hash tables Primitive specialization Fork-join

8 Persistent Data Structures

9 Mutable collections are similar in GS Collections and Scala – though we’ll look at some differences Immutable collections are quite different Scala’s immutable collections are persistent GS Collections’ immutable collections are not

10 Persistent Data Structures Wikipedia: In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified.

11 Persistent Data Structures Wikipedia: In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Operations yield new updated structures when they are “modified”

12 Persistent Data Structures Typical examples: List and Sorted Set

13 Persistent Data Structures Typical examples: List and Sorted Set “Add” by constructing O(log n) new nodes from the leaf to a new root. Scala sorted set  binary tree GSC immutable sorted set  covered later

14 Persistent Data Structures Wikipedia:

15 Persistent Data Structures Typical examples: List and Sorted Set “Prepend” by constructing a new head in O(1) time. LIFO structure. Scala immutable list  linked list or stack GSC immutable list  array backed

16 Persistent Data Structures Extremely important in purely functional languages All collections are immutable Must have good runtime complexity No one seems to miss them in GS Collections

17 Persistent Data Structures Proposal: Mutable, Persistent Immutable, and Plain Immutable Mutable: Same as always Persistent: Use when you want structural sharing (Plain) Immutable: Use when you’re done mutating or when the data never mutates

18 Persistent Data Structures Plain old immutable data structures Not persistent (no structural sharing) “Adding” is much slower: O(n) Speed benefits for everything else Huge memory benefits

19 Persistent Data Structures Immutable array-backed list Immutable  trimmed array No nodes  1/6 memory Array backed  cache locality, should parallelize well

20 Persistent Data Structures Measured with Java 8 64-bit and compressed oops

21 Persistent Data Structures Performance assumptions: Iterating through an array should be faster than iterating through a linked list Linked lists won’t parallelize well with.par No surprises in the results – so we’ll skip github.com/goldmansachs/gs-collections/tree/master/jmh-tests github.com/goldmansachs/gs-collections/tree/master/memory-tests

22 Persistent Data Structures Immutable array-backed sorted set Immutable  trimmed, sorted array No nodes  ~1/8 memory Array backed  cache locality

23 Persistent Data Structures

24 Assumptions about contains: May be faster when it’s a binary search in an array (good cache locality) Will be about the same in mutable / immutable tree (all nodes have same structure)

25 Persistent Data Structures

26 Iteration

27 Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) def serial_immutable_scala(): Unit = { val count: Int = this.scalaImmutable.view.filter(each => each % 10000 != 0).map(String.valueOf).map(Integer.valueOf).count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } }

28 Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) def serial_immutable_scala(): Unit = { val count: Int = this.scalaImmutable.view.filter(each => each % 10000 != 0).map(String.valueOf).map(Integer.valueOf).count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } } Lazy evaluation so we don’t create intermediate collections

29 Persistent Data Structures val scalaImmutable: immutable.TreeSet[Int] = immutable.TreeSet.empty[Int] ++ (0 to SIZE) def serial_immutable_scala(): Unit = { val count: Int = this.scalaImmutable.view.filter(each => each % 10000 != 0).map(String.valueOf).map(Integer.valueOf).count(each => (each + 1) % 10000 != 0) if (count != 999800) { throw new AssertionError } } Lazy evaluation so we don’t create intermediate collections Terminating in.toList or.toBuffer would be dominated by collection creation

30 Persistent Data Structures private final ImmutableSortedSet gscImmutable = SortedSets.immutable.withAll(Interval.zeroTo(SIZE)); @Benchmark public void serial_immutable_gsc() { int count = this.gscImmutable.asLazy().select(each -> each % 10_000 != 0).collect(String::valueOf).collect(Integer::valueOf).count(each -> (each + 1) % 10_000 != 0); if (count != 999_800) { throw new AssertionError(); } }

31 Persistent Data Structures Assumption: Iterating over an array is somewhat faster

32 Persistent Data Structures

33 Next up, parallel-lazy evaluation this.scalaImmutable.view  this.scalaImmutable.par this.gscImmutable.asLazy()  this.gscImmutable.asParallel(this.executorService, BATCH_SIZE)

34 Persistent Data Structures Assumption: Scala’s tree should parallelize moderately well Assumption: GSC’s array should parallelize very well

35 Persistent Data Structures Measured on an 8 core Linux VM Intel Xeon E5-2697 v2 8x

36 Persistent Data Structures Scala’s immutable.TreeSet doesn’t override.par, so parallel is slower than serial Some tree operations (like filter ) are hard to parallelize TreeSet.count() should be easy to parallelize using fork/join with some work New assumption: Mutable trees won’t parallelize well either

37 Persistent Data Structures Measured on an 8 core Linux VM Intel Xeon E5-2697 v2 8x

38 Persistent Data Structures GS Collections follow up: – TreeSortedSet delegates to java.util.TreeSet. Write our own tree and override.asParallel() Scala follow up: – Override.par

39 Persistent Data Structures Proposal: Mutable, Persistent, and Immutable All three are useful in different cases How common is it to share structure, really?

40 Hash Tables

41 Scala’s immutable.HashMap is a hash array mapped trie (persistent structure) Wikipedia: “achieves almost hash table-like speed while using memory much more economically” Let’s see

42 Hash Tables Scala’s mutable.HashMap is backed by an array of Entry pairs java.util.HashMap.Entry caches the hashcode UnifiedMap is backed by Object[], flattened, alternating K and V ImmutableUnifiedMap is backed by UnifiedMap

43 Hash Tables

44 @Benchmark public void get() { int localSize = this.size; String[] localElements = this.elements; Map localScalaMap = this.scalaMap; for (int i = 0; i < localSize; i++) { if (!localScalaMap.get(localElements[i]).isDefined()) { throw new AssertionError(i); } } }

45 Hash Tables

46

47 @Benchmark public mutable.HashMap mutableScalaPut() { int localSize = this.size; String[] localElements = this.elements; mutable.HashMap map = new PresizableHashMap<>(localSize); for (int i = 0; i < localSize; i++) { map.put(localElements[i], "dummy"); } return map; }

48 Hash Tables HashMap ought to have a constructor for pre-sizing class PresizableHashMap[K, V](val _initialSize: Int) extends scala.collection.mutable.HashMap[K, V] { private def initialCapacity = if (_initialSize == 0) 1 else smallestPowerOfTwoGreaterThan( (_initialSize.toLong * 1000 / _loadFactor).asInstanceOf[Int]) private def smallestPowerOfTwoGreaterThan(n: Int): Int = if (n > 1) Integer.highestOneBit(n - 1) << 1 else 1 table = new Array(initialCapacity) threshold = ((initialCapacity.toLong * _loadFactor) / 1000).toInt }

49 Hash Tables scala.collection.mutable.HashSet is backed by an array using open-addressing java.util.HashSet is implemented by delegating to a HashMap. UnifiedSet is backed by Object[], either elements or arrays of collisions ImmutableUnifiedSet is backed by UnifiedSet

50 Hash Tables

51 Primitive Specialization

52 Boxing is expensive – Reference + Header + alignment Scala has specialization, but none of the collections are specialized If you cannot afford wrappers you can: – Use primitive arrays (filter, map, etc. will auto-box) – Use a Java Collections library

53 Primitive Specialization

54 Proposal: Primitive lists, sets, and maps in Scala Not Traversable – outside the collections hierarchy Fix Specialization on Functions (lambdas) – Want for-comprehensions to work well

55 Fork/Join

56 Parallel / Lazy / Scala val list = this.integers.par.filter(each => each % 10000 != 0).map(String.valueOf).map(Integer.valueOf).filter(each => (each + 1) % 10000 != 0).toBuffer Assert.assertEquals(999800, list.size)

57 Parallel / Lazy / GSC MutableList list = this.integersGSC.asParallel(this.executorService, BATCH_SIZE).select(each -> each % 10_000 != 0).collect(String::valueOf).collect(Integer::valueOf).select(each -> (each + 1) % 10_000 != 0).toList(); Verify.assertSize(999_800, list);

58 Parallel / Lazy / JDK List list = this.integersJDK.parallelStream().filter(each -> each % 10_000 != 0).map(String::valueOf).map(Integer::valueOf).filter(each -> (each + 1) % 10_000 != 0).collect(Collectors.toList()); Verify.assertSize(999_800, list);

59 Stacked computation ops/s (higher is better) 8x

60 Fork-Join Merge Intermediate results are merged in a tree Merging is O(n log n) work and garbage

61 Fork-Join Merge Amount of work done by last thread is O(n)

62 Parallel / Lazy / GSC MutableList list = this.integersGSC.asParallel(this.executorService, BATCH_SIZE).select(each -> each % 10_000 != 0).collect(String::valueOf).collect(Integer::valueOf).select(each -> (each + 1) % 10_000 != 0).toList(); Verify.assertSize(999_800, list); ParallelIterable.toList() returns a CompositeFastList, a List with O(1) implementation of addAll()

63 Parallel / Lazy / GSC public final class CompositeFastList { private final FastList > lists = FastList.newList(); public boolean addAll(Collection collection) { FastList collectionToAdd = collection instanceof FastList ? (FastList ) collection : new FastList (collection); this.lists.add(collectionToAdd); return true; }... }

64 CompositeFastList Merge Merging is O(1) work per batch CFL

65 Fork/Join Fork-join is general purpose but always requires merge work We can get better performance through specialized data structures meant for combining

66 Summary GS Collections and Scala’s Collections are different Persistent data structures Hash tables Primitive specialization Fork-join

67 More Information @motlin @GoldmanSachs stackoverflow.com/questions/tagged/gs-collections github.com/goldmansachs/gs-collections/tree/master/jmh-tests github.com/goldmansachs/gs-collections/tree/master/memory-tests github.com/goldmansachs/gs-collections-kata infoq.com/presentations/java-streams-scala-parallel-collections

68 Learn more at GS.com/Engineering © 2015 Goldman Sachs. This presentation should not be relied upon or considered investment advice. Goldman Sachs does not warrant or guarantee to anyone the accuracy, completeness or efficacy of this presentation, and recipients should not rely on it except at their own risk. This presentation may not be forwarded or disclosed except with this disclaimer intact.

69 Q&A


Download ppt "Scala Collections Performance GS.com/Engineering March, 2015 Craig Motlin."

Similar presentations


Ads by Google