uPortal Performance & Memory Issues Scott Battaglia Rutgers, the State University of New Jersey
Description of Problem
- Amount of memory consumed by uPortal grows consistently
- Continues to consume memory until there is no memory left
- Application stops working properly and hangs
- Consistent with definition of a memory leak
Background
- Launched myRutgers on uPortal 2.3
- Issue was not seen in our QA
- Seeing issue in production since November 2004
Background
- Also seen in production by:
  - Yale University
  - University of Louisiana at Lafayette
  - University of California at Irvine
  - Cornell University
Temporary Workaround
- Monitor memory usage of uPortal
- When free memory drops below 5%, bounce the JVM
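
A minimal sketch of the check behind this workaround, assuming the monitor runs inside the JVM and reads the heap through `Runtime`. The class name `HeapWatch`, its methods, and the way the threshold is passed in are illustrative assumptions, not uPortal code or the monitoring script actually used.

```java
// Hypothetical helper: names and threshold handling are illustrative only.
final class HeapWatch {
    private HeapWatch() {}

    /** Fraction (0.0 to 1.0) of the maximum heap that is still available. */
    static double freeFraction(long maxBytes, long totalBytes, long freeBytes) {
        // Heap in use = heap currently allocated minus the unused part of it.
        long used = totalBytes - freeBytes;
        return (double) (maxBytes - used) / maxBytes;
    }

    /** True when the available fraction has dropped below the threshold (e.g. 0.05). */
    static boolean shouldRestart(double freeFraction, double threshold) {
        return freeFraction < threshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double free = freeFraction(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
        System.out.printf("free heap: %.1f%%, restart: %b%n",
                free * 100, shouldRestart(free, 0.05));
    }
}
```

The free fraction is measured against the maximum heap rather than the currently allocated heap, since the JVM may still grow the heap before it is truly out of memory.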
Issues with Workaround
- May be too aggressive
  - In some cases, the JVM may still be able to garbage collect
- Causes users on that JVM to lose their sessions
- If we miss the window of opportunity to restart, it can take down Apache as well
Issues with Workaround Ultimately, does nothing to resolve memory issue. Just makes it barely livable
History of Fixes
- Removed caching of IPersons from PersonDirectory
- CError and CSecureInfo now pass events to wrapped channels
- Restricted access to ChannelFactory’s channel cache and synchronized the instantiateChannel method
- Guest sessions created on time out
- AbstractMultithreadedChannels were not cleaning out their channel state maps (2 of them)
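
The last fix above concerns per-session state maps that were never cleaned out. A minimal sketch of that pattern follows, assuming state is keyed by a session identifier. `SessionStateRegistry` and `sessionDone` are hypothetical names, not the uPortal classes, and the sketch uses generics for brevity even though the uPortal code of that era predates them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a multithreaded channel that keeps per-session state.
final class SessionStateRegistry<S> {
    private final Map<String, S> stateBySession = new ConcurrentHashMap<>();

    void put(String sessionId, S state) {
        stateBySession.put(sessionId, state);
    }

    S get(String sessionId) {
        return stateBySession.get(sessionId);
    }

    /** Must be called when the session ends; otherwise the entry stays reachable forever. */
    void sessionDone(String sessionId) {
        stateBySession.remove(sessionId);
    }

    int size() {
        return stateBySession.size();
    }
}
```

If `sessionDone` is never called, every session that ever touched the channel leaves an entry behind, which matches the steady growth described earlier.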
But…. 3 Months later, issue still exists. Previous steps solved memory leaks but still more exist. The search continues…
What’s Happening Today
- Renewed effort to search for memory leaks
- Initial steps taken:
  - Retooling of load tests
  - Production snapshots
  - Incremental updates
  - Re-affirming that load test system matches production system
Retooling of Load Tests
- Attempt to mimic more closely what a user does in production
  - More custom layouts
  - Fewer people logging out
  - Hitting more popular channels more aggressively
Retooling of Load Tests
- Attempt to achieve the same throughput
  - Determine average user session length
  - Determine the rate at which users access the system
Retooling of Load Tests
- Bought test system with same specs/setup as production systems
- Ensure database optimizations are the same
- Ensure uPortal configuration is the same (i.e. StatsRecorder)
Production Snapshots
- Only seeing issue in production
- Need to capture production snapshots
- JVM heap size initially set at 2 GB
Production Snapshots
- Lowered JVM heap size to 128 MB on a machine
  - Allows us to compare snapshots
- When free memory reaches 10%, take the machine out of load balancing rotation
- Garbage collect
Production Snapshots
- Capture snapshot
- Wait past session timeout
  - Currently set at 15 minutes
- Garbage collect again
- Take new snapshot
- Analyze snapshot
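
The "garbage collect, then measure" steps can be approximated from inside the JVM with the standard `java.lang.management` API. This is only a sketch under stated assumptions: it reads heap totals and does not capture a profiler snapshot, `System.gc()` is a request the JVM may ignore, and the class name `HeapReading` and the one-second wait are placeholders.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper; a real snapshot would be taken with the profiler instead.
final class HeapReading {
    private HeapReading() {}

    /** Requests a collection (only a hint to the JVM), then reports heap bytes in use. */
    static long usedHeapAfterGcRequest() {
        System.gc();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        long first = usedHeapAfterGcRequest();
        Thread.sleep(1000L); // placeholder for "wait past the session timeout"
        long second = usedHeapAfterGcRequest();
        System.out.println("used before wait: " + first + ", after wait: " + second);
    }
}
```

If the second reading stays high after all sessions have timed out, something other than live sessions is holding the memory.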
Production Snapshots
- What do they tell us?
  - Which objects are still in memory
  - How much memory those objects are using
  - How much memory the items they reference are using
Understanding the Snapshots
- Use YourKit Java Profiler to capture memory snapshots
- YourKit consists of two parts:
  - Component that runs on server
  - Local application to open memory snapshots
Understanding the Snapshots
- YourKit tells us:
  - Incoming and outgoing references
  - Totals for objects of each type
  - How much memory they consume
- Allows us to compare snapshots, showing the deltas of each object type
- uPortal community has about 20 licenses for YourKit
Understanding the Snapshots
[Screenshot: snapshot object list with columns Name, Objects, Shallow Size, Retained Size]
Understanding the Snapshots
- Trace the path from an object to a garbage collector root
- Option of seeing first path or multiple paths
- In screenshot, we see first five
Understanding the Snapshots Example of object from “Retained Size” Only reason this object still exists is because XRTreeFrag has not been GCed.
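
To make "retained size" concrete, here is a small sketch that computes it on a toy object graph: the retained size of an object is the total shallow size of everything that would become unreachable if that object went away. The graph representation, the class `RetainedSizeDemo`, and the node names are illustrative assumptions; this is not how the profiler is implemented.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RetainedSizeDemo {
    private RetainedSizeDemo() {}

    /** Nodes reachable from the roots without ever entering the skip node (may be null). */
    static Set<String> reachable(Map<String, List<String>> refs,
                                 Collection<String> roots, String skip) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String node = todo.pop();
            if (node.equals(skip) || !seen.add(node)) {
                continue; // skipped node, or already visited
            }
            todo.addAll(refs.getOrDefault(node, Collections.<String>emptyList()));
        }
        return seen;
    }

    /** Sum of shallow sizes of the nodes that are reachable only through the target. */
    static long retainedSize(Map<String, List<String>> refs, Map<String, Long> shallow,
                             Collection<String> roots, String target) {
        Set<String> freed = reachable(refs, roots, null);
        freed.removeAll(reachable(refs, roots, target));
        long total = 0;
        for (String node : freed) {
            total += shallow.get(node);
        }
        return total;
    }
}
```

For example, with `root -> frag, shared` and `frag -> nodeList, shared`, and shallow sizes of 24 for `frag`, 400 for `nodeList`, and 100 for `shared`, the retained size of `frag` is 424: `nodeList` exists only because of `frag`, while `shared` is still reachable from `root`.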
Understanding the Snapshots Comparison of two snapshots (users vs. no users) See that XRTreeFrag retains number of objects
Understanding the Snapshots Also comparison of (users vs. no users) See that UserInstance gets garbage collected, as does ChannelStaticData, etc.
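
The snapshot comparison boils down to a per-class delta of instance counts. A minimal sketch, assuming each snapshot has been reduced to a map from class name to instance count; `SnapshotDiff` is a hypothetical helper and not part of the profiler.

```java
import java.util.Map;
import java.util.TreeMap;

final class SnapshotDiff {
    private SnapshotDiff() {}

    /** Per-class change in instance count between two snapshots; unchanged classes are omitted. */
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long d = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (d != 0) {
                out.put(e.getKey(), d);
            }
        }
        // Classes that disappeared entirely show up as a negative delta.
        for (Map.Entry<String, Long> e : before.entrySet()) {
            if (!after.containsKey(e.getKey()) && e.getValue() != 0) {
                out.put(e.getKey(), -e.getValue());
            }
        }
        return out;
    }
}
```

A class whose count drops to zero once users are gone is being collected as expected; a class whose count barely moves is a candidate for the leak.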
Incremental Updates In order to determine the impact of changes to the uPortal framework, we’ve adopted an incremental update approach. We apply one “fix” at a time, and monitor its impact.
Incremental Updates
- Currently in production…
  - Threadpool switch from homegrown to Backport Concurrent
  - Finalizer in UBC_Webmail
- In the queue…
  - Update to AuthorizationImpl
What’s Happening Today
- Recently, a flurry of activity on the JASIG-DEV list about memory issues:
  - Backport Concurrent Threadpool
  - AuthorizationImpl
  - Finalizers in UBC_Webmail
What’s Happening Today
- Backport Concurrent Thread Library
  - Issues with current threadpool:
    - Potential for deadlock or infinite loop
    - Potential for cleanup to fail in thread workers
    - UnboundedThreadpool that extends BoundedThreadpool
What’s Happening Today
- Backport Concurrent Thread Library (cont.)
  - Action item:
    - Aaron wrote a patch against HEAD to replace the thread library
    - Rutgers manually applied the patch and placed it into production
  - Result: undetermined
    - Most students were on Spring Break
    - Preliminary results indicate it may offer a performance benefit rather than a memory leak fix
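
For readers unfamiliar with the replacement library: Backport Concurrent provides the `java.util.concurrent` API for older JVMs. The sketch below uses the standard `java.util.concurrent` classes to show a bounded pool with a bounded queue. The pool sizes, the rejection policy, and the class name `BoundedPoolDemo` are illustrative assumptions, not the configuration in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolDemo {
    private BoundedPoolDemo() {}

    /** Fixed number of workers and a bounded queue; overflow runs on the submitting thread. */
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 30L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits the given number of trivial tasks and returns how many completed. */
    static int runTasks(int count) {
        ThreadPoolExecutor pool = newPool(4, 16);
        try {
            Callable<Integer> task = () -> 1;
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                results.add(pool.submit(task));
            }
            int done = 0;
            for (Future<Integer> f : results) {
                done += f.get();
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }
}
```

Bounding both the worker count and the queue keeps a burst of requests from piling up unbounded work in memory.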
What’s Happening Today
- AuthorizationImpl: current issues
  - Retaining references to principals
  - No explicit removal of principal from cache
  - Copying of map on each newPrincipal call that results in a new principal
What’s Happening Today
- AuthorizationImpl: action item
  - Rutgers volunteered to provide fix for HEAD
  - Fix consists of replacing current newPrincipal method and replacing HashMap with a cache
  - Patch is scheduled to be load tested and placed into production
  - Patch is scheduled to be committed to uPortal HEAD on successful test and deployment
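
To illustrate why swapping a plain HashMap for a cache helps, here is a minimal bounded cache built on the JDK's `LinkedHashMap`, which evicts the least recently used entry once a size limit is exceeded. This is a stand-in technique for illustration, not the cache used in the actual patch, and `BoundedCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Not thread-safe on its own; wrap with Collections.synchronizedMap for shared use.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order: the least recently used entry is the eldest
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

Unlike an unbounded map of principals, the number of retained entries can never exceed `maxEntries`, whatever the number of users who log in.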
What’s Happening Today
- AuthorizationImpl: consequences of changes
  - Introduced a CacheFactory
  - Not specific to any one part of uPortal
  - CacheFactory is an interface (plug your own in!)
  - Default CacheFactory uses WhirlyCache
  - Allows for declaring cache settings and policy in XML
  - Allows for fine-grained caching strategies for each part of uPortal
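
The "plug your own in" idea can be sketched as an interface that hands out named caches. The method name `getCache`, its signature, and the in-memory implementation below are assumptions made for illustration; they are not the interface committed to uPortal and do not use WhirlyCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of a pluggable cache factory.
interface CacheFactory {
    /** Returns the cache registered under the given name, creating it on first use. */
    Map<String, Object> getCache(String cacheName);
}

// Trivial implementation for illustration; a real one would apply per-cache size and expiry policy.
final class InMemoryCacheFactory implements CacheFactory {
    private final Map<String, Map<String, Object>> caches = new ConcurrentHashMap<>();

    @Override
    public Map<String, Object> getCache(String cacheName) {
        return caches.computeIfAbsent(cacheName, n -> new ConcurrentHashMap<String, Object>());
    }
}
```

Callers depend only on the interface, so each part of the portal can ask for its own named cache while the policy behind it is swapped in one place.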
What’s Happening Today
- UBC_Webmail
  - Issue: finalizers are not properly cleaning up
  - Action item: Rutgers has volunteered to refactor the finalizers
Continuing the Search… Rutgers, and other members of the uPortal community continue to search for the answer to the memory leaks
What can we do to help?
- Finalizers should be a last resort
- If a viable open source project exists that fills the requirements, consider using that
- Be aware of proper caching (where it’s needed vs. where it’s not needed, weak & soft references, etc.)
- Avoid circular references wherever possible
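
As an illustration of the first point, resources can be released explicitly in a `finally` block instead of waiting for a finalizer that may run late or never. `MailConnection` below is a hypothetical resource used only for this sketch; it is not the UBC_Webmail code.

```java
// Hypothetical resource that must be released when the caller is done with it.
final class MailConnection {
    private boolean open = true;
    private int sent = 0;

    void send(String message) {
        if (!open) {
            throw new IllegalStateException("connection already closed");
        }
        sent++;
    }

    boolean isOpen() {
        return open;
    }

    int sentCount() {
        return sent;
    }

    /** Releases the resource; safe to call more than once. */
    void close() {
        open = false;
    }
}

final class ExplicitCleanupDemo {
    private ExplicitCleanupDemo() {}

    static MailConnection useAndRelease() {
        MailConnection conn = new MailConnection();
        try {
            conn.send("hello");
        } finally {
            conn.close(); // runs even if send() throws, so nothing waits on a finalizer
        }
        return conn;
    }
}
```

With explicit cleanup, the moment a resource is released is decided by the code rather than by the garbage collector's schedule.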
The End (finally!) Any questions, comments, concerns?