Finding cacheable areas in your Web Site using Python and Selenium. David Elfi, Intel.

1 Finding cacheable areas in your Web Site using Python and Selenium
David Elfi, Intel

2 What does this session talk about?
• Python
• Performance
• Web applications
• Hands-on session

3 Caching
• Hot topic in web applications because it gives:
  - Better response time across geographically distributed users
  - Better scalability
• Difficult to focus on at development time
• Goal: help developers improve response time

4 Source: Steve Souders, "Cache is King!"

5 What to do
• Find text areas repeated in a web resource (page, JSON response, other dynamic resources) in order to split them into separate responses
• Use the Cache-Control, Expires and ETag HTTP headers to control caching
• Identify all the dependencies of a given URL
  - Even AJAX calls
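As an illustration of the three headers named above, here is a minimal sketch that builds them for one response fragment. The function name `caching_headers`, the 16-character truncated SHA-256 ETag and the sample body are assumptions made for this example; they are not part of the talk.

```python
import hashlib
import time
from email.utils import formatdate


def caching_headers(body, max_age, public=True, now=None):
    """Build Cache-Control, Expires and ETag headers for a body given as bytes.

    Hypothetical helper: the ETag scheme (truncated SHA-256 of the body)
    is one possible choice, not something the slides prescribe.
    """
    now = time.time() if now is None else now
    scope = "public" if public else "private"
    return {
        "Cache-Control": "%s, max-age=%d" % (scope, max_age),
        # HTTP dates are always expressed in GMT
        "Expires": formatdate(now + max_age, usegmt=True),
        # A strong ETag is an opaque quoted string
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest()[:16],
    }


# A shared fragment that may be cached by anyone for one hour
headers = caching_headers(b"<nav>shared menu</nav>", max_age=3600, now=0)
```

Passing `now` explicitly keeps the function deterministic, which helps when comparing headers across snapshots.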

6 Proposed Solution
• Take snapshots at different points in time
  - Use Selenium to download ALL the content
  - Selenium is needed to run the JS code behind AJAX calls
• Compare the snapshots looking for similarities
  - Split the similar text into separate HTTP responses

7 Solution – Snapshots
• Selenium through a forward proxy
[Diagram labels: Twisted proxy, web server, data store, content]

8 Running Selenium – Snapshots
• Call Selenium from Python
• Use of WebDriver

    >>> from selenium import webdriver
    >>> br = webdriver.Firefox()
    >>> br.get("http://www.intel.com")
    >>> br.close()

9 Twisted Proxy – Snapshots

    from twisted.web import http, proxy

    class CacheProxyClient(proxy.ProxyClient):
        def connectionMade(self):
            # Connection made. Prepare object properties.
            proxy.ProxyClient.connectionMade(self)

        def handleHeader(self, key, value):
            # Save response header.
            proxy.ProxyClient.handleHeader(self, key, value)

        def handleResponsePart(self, buf):
            # Store response data.
            proxy.ProxyClient.handleResponsePart(self, buf)

        def handleResponseEnd(self):
            # Finished response transmission. Store it.
            proxy.ProxyClient.handleResponseEnd(self)

    class CacheProxyClientFactory(proxy.ProxyClientFactory):
        protocol = CacheProxyClient

    class CacheProxyRequest(proxy.ProxyRequest):
        # Bytes key: Twisted on Python 3 looks the scheme up as b"http"
        protocols = {b"http": CacheProxyClientFactory}

    class CacheProxy(proxy.Proxy):
        requestFactory = CacheProxyRequest

    class CacheProxyFactory(http.HTTPFactory):
        protocol = CacheProxy
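The callbacks in `CacheProxyClient` only say what to store, not how. One possible way to do the storing is sketched below as a plain class that the callbacks could delegate to. `ResponseRecorder`, its method names and the dictionary layout are hypothetical and do not come from the talk.

```python
class ResponseRecorder(object):
    """Hypothetical helper: collects one response and files it under its URL."""

    def __init__(self, url, store):
        self.url = url
        self.store = store   # dict mapping URL -> list of (headers, body) snapshots
        self.headers = {}
        self.parts = []

    def header(self, key, value):
        # Would be called from handleHeader
        self.headers[key] = value

    def part(self, buf):
        # Would be called from handleResponsePart; bodies arrive in chunks
        self.parts.append(buf)

    def end(self):
        # Would be called from handleResponseEnd; join chunks and file the snapshot
        body = b"".join(self.parts)
        self.store.setdefault(self.url, []).append((self.headers, body))


store = {}
rec = ResponseRecorder("http://example.com/", store)
rec.header(b"Content-Type", b"text/html")
rec.part(b"<html>")
rec.part(b"</html>")
rec.end()
```

Keeping a list per URL means repeated visits to the same page accumulate as successive snapshots, ready for comparison.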

10 Selenium + Twisted – Snapshots
• Run Selenium using Proxy

    >>> from selenium import webdriver
    >>> fp = webdriver.FirefoxProfile()
    >>> fp.set_preference("network.proxy.type", 1)
    >>> fp.set_preference("network.proxy.http", "localhost")
    >>> fp.set_preference("network.proxy.http_port", 8080)
    >>> br = webdriver.Firefox(firefox_profile=fp)
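The `firefox_profile` keyword belongs to older Selenium releases. Newer Selenium 4 releases set the same preferences through an `Options` object instead. The sketch below assumes Selenium 4 with geckodriver installed; the helper names `proxy_preferences` and `make_driver` are made up for this example.

```python
def proxy_preferences(host, port):
    """Firefox preferences that route HTTP traffic through a manual proxy."""
    return {
        "network.proxy.type": 1,           # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
    }


def make_driver(host="localhost", port=8080):
    """Start Firefox behind the proxy (assumes Selenium 4 and geckodriver)."""
    # Imported here so the preference helper stays usable without Selenium
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in proxy_preferences(host, port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

The preference names are the same ones used on the slide; only the way they reach the browser changes.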

11 Selenium + Twisted – Snapshots
• Configure Twisted and run Selenium in an internal Twisted thread

    from twisted.internet import endpoints, reactor

    endpoint = endpoints.serverFromString(
        reactor, "tcp:%d:interface=%s" % (8080, "localhost"))
    d = endpoint.listen(CacheProxyFactory())
    # runSelenium and url_str are not shown in this transcript
    reactor.callInThread(runSelenium, url_str)
    reactor.run()

12 All together running

13 [Diagram: snapshots 1, 2, 3, ..., n are fed to the comparison method, which produces the output]

14 Comparison

    '''Equal sequence searcher'''
    from difflib import SequenceMatcher

    def matchingString(s1, s2):
        '''Compare two strings and return their matching blocks concatenated.'''
        matcher = SequenceMatcher(None, s1, s2)
        output = ""
        for (i, _, n) in matcher.get_matching_blocks():
            output += s1[i:i+n]
        return output

    def matchingStringSequence(seq):
        '''Compare pairwise, folding each snapshot into the running match.'''
        try:
            matching = seq[0]
            for s in seq[1:]:
                matching = matchingString(matching, s)
            return matching
        except (TypeError, IndexError):
            # seq is None, empty, or holds non-string items
            return ""
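`matchingString` concatenates every matching block, including matches only a few characters long, which can blur where one cacheable area ends and the next begins. A possible variant, sketched here under the assumption that only blocks above a minimum size are worth caching, keeps the blocks separate. `common_blocks`, the `min_size` threshold and the sample snapshots are illustrative and not from the talk.

```python
from difflib import SequenceMatcher


def common_blocks(s1, s2, min_size=20):
    """Return the substrings shared by s1 and s2 that are at least min_size long."""
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters in long inputs, which would otherwise hide matches in HTML
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    return [s1[b.a:b.a + b.size]
            for b in matcher.get_matching_blocks()
            if b.size >= min_size]


snap1 = "<div id='menu'>Home | Products | Support</div><p>Visitor 1021</p>"
snap2 = "<div id='menu'>Home | Products | Support</div><p>Visitor 1377</p>"
blocks = common_blocks(snap1, snap2)
```

Here the menu markup survives as one block, while the short `</p>` match and the differing visitor counter are filtered out.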

15 Next Steps
• Split similar texts into separate HTTP responses
• Set Cache-Control
  - public
  - private
  - no-cache
• Set Expires
  - Depending on how long the response should stay cached
• Set ETag
  - If the response is big and does not change too often
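The ETag case pays off through revalidation: the client sends the tag back in `If-None-Match`, and an unchanged body is answered with an empty 304 instead of the full payload. The sketch below shows that exchange in a simplified form. `make_etag` and `respond` are hypothetical helpers, and real `If-None-Match` handling also has to cope with lists of tags and weak validators, which this example ignores.

```python
import hashlib


def make_etag(body):
    """Hypothetical ETag scheme: truncated SHA-256 of the body, quoted."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    # no-cache allows storing the response but requires revalidation before reuse
    headers = {"ETag": etag, "Cache-Control": "private, no-cache"}
    if if_none_match == etag:
        return 304, headers, b""      # unchanged: send no body
    return 200, headers, body


body = b"x" * 100000
status, headers, payload = respond(body)                               # first visit
status2, _, payload2 = respond(body, if_none_match=headers["ETag"])   # revalidation
```

For a large response that rarely changes, the second request costs only the headers.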

16 Advanced Features to be done
• Detect cache invalidation time from snapshots
• SSL support
• Wait for all AJAX calls
• Selenium scripting
  - Authenticated URLs
  - Full feature sequence
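For the first item, one rough approach is to timestamp each snapshot and measure how long a version of the content stayed unchanged. This is a sketch only: `shortest_stable_interval` and the sample data are assumptions, and the result can be no more precise than the sampling interval between snapshots.

```python
def shortest_stable_interval(snapshots):
    """Estimate how long content stays valid.

    snapshots: list of (timestamp_seconds, content) sorted by timestamp.
    Returns the shortest observed gap between two consecutive content
    changes, or None if fewer than two changes were seen.
    """
    # Timestamps at which the content differs from the previous snapshot
    change_times = [
        snapshots[i][0]
        for i in range(1, len(snapshots))
        if snapshots[i][1] != snapshots[i - 1][1]
    ]
    gaps = [later - earlier
            for earlier, later in zip(change_times, change_times[1:])]
    return min(gaps) if gaps else None


# One snapshot per minute; the content changes at t=120 and again at t=300
samples = [(0, "A"), (60, "A"), (120, "B"), (180, "B"), (240, "B"), (300, "C")]
interval = shortest_stable_interval(samples)
```

A value like this could serve as a conservative starting point for `max-age`, to be refined with more frequent snapshots.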

17 Summary
• If cacheable areas were not identified before development, this code could save time and effort in finding them
• Cacheable areas need to be analyzed to choose the best caching method (server cache, CDN, browser cache)
• Refactoring to maximize cacheable data is the next step

19 Thank you! david.r.elfi@intel.com @elfoTech

