OCLC Online Computer Library Center Parallel Text Searching on a Beowulf Cluster using SRW Ralph LeVan OCLC Research.


1 OCLC Online Computer Library Center Parallel Text Searching on a Beowulf Cluster using SRW Ralph LeVan OCLC Research

2 Goal
Demonstrate 100 searches/second on our 50-million-record WorldCat database, residing on a small Beowulf cluster

3 Beowulf Cluster
24 nodes
– 2 × 2.8 GHz Xeon CPUs
– 4 GB of memory
80 GB of disk on 23 application nodes
130 GB of disk on root node

4 Database
50 million records
69 partitions (~700,000 records each)
– 3 partitions per application node
Partitioned by popularity
Searched using OCLC Research's open-source Gwen and Pears toolkits
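The partition counts on this slide fit together as simple arithmetic. A minimal Python sketch follows; it assumes (the slides do not say) that partitions are assigned to nodes in contiguous groups of three, and the function name `node_for_partition` is hypothetical.

```python
TOTAL_RECORDS = 50_000_000   # from the slides
PARTITIONS = 69              # from the slides
APP_NODES = 23               # from the slides

# 69 partitions spread over 23 application nodes gives 3 per node.
PARTITIONS_PER_NODE = PARTITIONS // APP_NODES


def node_for_partition(partition: int) -> int:
    """Hypothetical layout: partitions 0-2 on node 0, 3-5 on node 1, and so on."""
    if not 0 <= partition < PARTITIONS:
        raise ValueError("partition out of range")
    return partition // PARTITIONS_PER_NODE


# Average partition size; the slide rounds this to ~700,000.
records_per_partition = TOTAL_RECORDS / PARTITIONS
```

Under this assumed layout, the last partition (index 68) lands on the last application node (index 22).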

5 Architecture
1 Tomcat on each application node
3 SRW/U databases configured for each Tomcat
1 client application on the root node

6 Trial #1
SRW client searching 69 databases
Result: 2 searches/second (437 ms/search)
Ganglia Cluster Report shows the root node glowing red and the application nodes a peaceful blue

7 Trial #2
SRU client with scanned response searching 69 databases
Result: 25 searches/second (40 ms/search)
Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
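One plausible reading of "SRU client with scanned response" is a client that issues a plain HTTP GET and scans the response text for the hit count instead of running a full XML parse. The sketch below illustrates that reading in Python; it is an assumption, not the client used in the trial. The request parameters and the `numberOfRecords` element come from the SRU 1.1 specification; the function names and the example base URL are hypothetical.

```python
import re
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str, max_records: int = 0) -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Matches <numberOfRecords>N</numberOfRecords> with or without a namespace prefix.
_HIT_COUNT = re.compile(
    r"<(?:\w+:)?numberOfRecords>\s*(\d+)\s*</(?:\w+:)?numberOfRecords>"
)


def scan_hit_count(response_text: str) -> int:
    """Pull the hit count out of the raw response text without building a DOM."""
    match = _HIT_COUNT.search(response_text)
    if match is None:
        raise ValueError("no numberOfRecords element found")
    return int(match.group(1))


# Hypothetical endpoint, for illustration only.
url = build_sru_url("http://node01:8080/SRW/search/part1", 'dc.title = "beowulf"')
```

Scanning trades robustness for speed: it avoids per-response parser setup on the client, which is where the Ganglia report located the bottleneck.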

8 Trial #3
SRW client with hand-built XML and scanned response searching 69 databases
Result: 21 searches/second (46 ms/search)
Ganglia Cluster Report still shows the root node glowing red and the application nodes a peaceful blue
SRW dropped

9 Rearchitecture
Problem: Ganglia reports indicate that the client is the bottleneck
Solution: Put a 3-way federator on each Tomcat (a virtual database for the client) and have the client search 23 databases instead of 69
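The federator idea can be sketched as a function that presents several back-end databases as one. This Python sketch assumes each back end is a callable returning a hit count and that results are merged by summing; the slides do not describe how the real federator merges results, and `make_federator` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A search function takes a query string and returns a hit count.
SearchFn = Callable[[str], int]


def make_federator(backends: Sequence[SearchFn]) -> SearchFn:
    """Wrap several back-end databases so they look like one virtual database."""
    if not backends:
        raise ValueError("at least one backend is required")

    def federated_search(query: str) -> int:
        # Fan the query out to every back end in parallel, then sum the counts.
        with ThreadPoolExecutor(max_workers=len(backends)) as pool:
            return sum(pool.map(lambda search: search(query), backends))

    return federated_search


# Stub back ends standing in for the three local partitions on one node.
three_way = make_federator([lambda q: 10, lambda q: 20, lambda q: 12])
total_hits = three_way("beowulf")
```

With one such federator per application node, the client issues 23 requests per search instead of 69. A production version would likely reuse one thread pool rather than create one per query.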

10 Result
SRU client: 71 searches/second (14 ms)
Hand-built SRW client: 33 searches/second (30 ms)
Original SRW client: 6 searches/second (164 ms)
Ganglia cluster report still shows root node red, but application nodes are now green and yellow
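The throughput and latency figures on this slide are two views of the same measurement if the client issues one search at a time. A short Python check, assuming a single sequential client and that all three latencies are in milliseconds:

```python
def searches_per_second(ms_per_search: float) -> float:
    """Throughput of a client that runs searches strictly one after another."""
    return 1000.0 / ms_per_search


# Latencies reported on the slide, in milliseconds per search.
reported_ms = {
    "SRU client": 14,
    "hand-built SRW client": 30,
    "original SRW client": 164,
}

# Rounded throughput for each client under the sequential-client assumption.
throughput = {name: round(searches_per_second(ms)) for name, ms in reported_ms.items()}
```

Under those assumptions the rounded results agree with the slide's 71, 33, and 6 searches/second.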

11 Rearchitecture
Create a virtual 23-way database on each Tomcat that will federate searches from the 23 virtual 3-way databases
Put one of these on each Tomcat
Create a new client that sends searches on threads to each available 23-way database
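A threaded client of this kind can be sketched with a thread pool that spreads queries across the available 23-way databases. In this Python sketch the round-robin assignment is an assumption (the slides only say searches are sent "on threads"), and `run_load` and the `search` callable are hypothetical names.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(queries, endpoints, search, threads=23):
    """Send each query to an endpoint chosen round-robin.

    `search(endpoint, query)` performs one search and returns its result.
    Returns (results in query order, measured searches per second).
    """
    targets = itertools.cycle(endpoints)
    jobs = [(next(targets), query) for query in queries]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda job: search(job[0], job[1]), jobs))
    elapsed = time.perf_counter() - start

    rate = len(jobs) / elapsed if elapsed > 0 else float("inf")
    return results, rate
```

Because every Tomcat hosts its own 23-way federator, any node can accept any query, so the client's only job is to keep all of its threads busy.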

12 Result
With 23 threads, 172 searches/second
– Average response time of 170 ms
The Ganglia report showed all nodes running red

