Industrial Project (234313) Final Presentation
“App Analyzer”: Deliver the right apps users want! (VMware)
Students: Edward Khachatryan & Elina Zharikov
Supervisors: Yoel Calderon, Yan Aksenfeld
The Problem
◦ The IT administrator doesn’t know which applications need to be managed
◦ Apps not installed by Mirage
[Diagram: Mirage layers (user profile, user data, machine identity, drivers, base layer, application layer(s)) with network-optimized synchronization & streaming to Mirage servers & single-instance stores]
Goals
◦ Find the optimal combination of Base and App layers for a given organization
◦ Produce reports for the administrator
[Diagram: HR, IT and Finance desktops with their HR, IT and Finance app layers, on top of a single base layer: Windows 7, antivirus, common apps]
Methodology
◦ Research clustering algorithms
◦ Connect to the Mirage database on SQL Server
◦ Parse UTF-encoded XML data
◦ Process and analyze the data
◦ Build custom reports
Methodology
Research and choose the right set of tools:
◦ Python libraries:
    scikit-learn for clustering algorithms
    lxml for parsing UTF-encoded XML
    SQLAlchemy for SQL interaction
    pandas for gluing it all together
◦ Microsoft SQL Report Builder for custom reports
◦ VMware Mirage web interface for the GUI
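The parsing step can be illustrated with a minimal sketch. It is not the project's code: the XML layout (`Device`/`App` elements and their attributes) is hypothetical, since the slides do not show the Mirage schema, and the standard-library `xml.etree.ElementTree` is used as a stand-in for lxml so the example runs without extra packages. The sketch parses one UTF-8 record and turns a device inventory into the Boolean vectors that clustering works on.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml in this sketch

# Hypothetical payload: the real Mirage schema is not shown in the slides.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Device name="PC-01">
  <App name="Notepad++" productId="npp-1"/>
  <App name="7-Zip" productId="7z-1"/>
</Device>""".encode("utf-8")


def parse_installed_apps(xml_bytes):
    """Return (device name, sorted list of app names) from one XML record."""
    # Passing bytes lets the parser honor the encoding declared in the header.
    root = ET.fromstring(xml_bytes)
    apps = sorted(app.get("name") for app in root.iter("App"))
    return root.get("name"), apps


def to_boolean_rows(inventory):
    """Turn {device: [apps]} into (app columns, {device: 0/1 vector})."""
    columns = sorted({app for apps in inventory.values() for app in apps})
    rows = {
        device: [1 if col in apps else 0 for col in columns]
        for device, apps in inventory.items()
    }
    return columns, rows


device, apps = parse_installed_apps(SAMPLE_XML)
columns, rows = to_boolean_rows({device: apps, "PC-02": ["7-Zip"]})
```

Each device becomes one row with a 1 for every installed application, which is the representation the Boolean distance metrics expect.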
Achievements
◦ Quick and efficient data analysis: the desired results can be generated in just a few minutes
◦ User-friendly experience: a variety of reports can be produced in a few clicks
◦ Integration with the existing VMware Mirage platform
◦ A variety of parameters to customize the output
Examples
Examples Live demonstration…
Conclusions
◦ DBSCAN is a fast clustering algorithm. It scales to large datasets and works well with Boolean vector data.
◦ Instead of the usual Euclidean distance, it’s better to work with metrics intended for Boolean-valued vector spaces, such as Jaccard, Sokal-Sneath or Dice.
◦ Using open source libraries saves a lot of valuable time.
◦ Microsoft SQL Report Builder is a great WYSIWYG tool for building custom reports.
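To make the first two conclusions concrete, here is a minimal pure-Python sketch of DBSCAN driven by the Jaccard dissimilarity. It is an illustration only, not the project's implementation (which used scikit-learn, whose `DBSCAN` class accepts `metric="jaccard"`). The machine inventories, `eps` and `min_samples` values are made up for the example.

```python
def jaccard_distance(a, b):
    """Jaccard dissimilarity between two sets of installed apps."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_samples, distance):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Hypothetical inventories: three office-style machines, three developer
# machines, and one machine that resembles neither group.
machines = [
    {"Office", "Chrome", "SAP"},
    {"Office", "Chrome", "SAP"},
    {"Office", "Chrome"},
    {"Visual Studio", "Git", "Python"},
    {"Visual Studio", "Git"},
    {"Visual Studio", "Git", "Python"},
    {"Office", "Python"},
]
labels = dbscan(machines, eps=0.4, min_samples=2, distance=jaccard_distance)
```

With these values the first three machines form one cluster, the next three another, and the last machine is labelled as noise, which is the kind of grouping an app layer recommendation could start from.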
Progress Recap
31.3 – Kickoff meeting
– Research period: reading materials on clustering algorithms
– Installing Microsoft SQL Server, restoring a VMware Mirage database, querying and parsing the data from the database
– Creating a filtering module to clean up the raw application list: uniting applications by their name, product ID or upgrade code, filtering out unimportant applications. Finalizing the criteria for Base Layer apps.
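The "uniting applications" step of the filtering module can be sketched as follows. This is an assumption about how such a step might work, not the project's code: records that share a name, product ID or upgrade code are merged into one group with a small union-find structure. The field names (`name`, `product_id`, `upgrade_code`) and sample records are hypothetical, and the criteria for "unimportant" applications are not given in the slides, so that part is left out.

```python
def unite_applications(records):
    """Group records that share a name, product ID, or upgrade code."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (field, normalized value) -> index of first record with it
    for idx, rec in enumerate(records):
        for field in ("name", "product_id", "upgrade_code"):
            value = rec.get(field)
            if not value:
                continue  # a missing identifier cannot link records
            key = (field, value.strip().lower())
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical raw application list.
raw = [
    {"name": "Acrobat Reader", "product_id": "P1", "upgrade_code": "U1"},
    {"name": "Adobe Acrobat Reader DC", "product_id": "P2", "upgrade_code": "U1"},
    {"name": "acrobat reader ", "product_id": "P3", "upgrade_code": None},
    {"name": "7-Zip", "product_id": "P4", "upgrade_code": "U2"},
]
groups = unite_applications(raw)
```

Here the first three records collapse into one application (two are linked by upgrade code, two by normalized name), while the fourth stays on its own.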
Progress Recap
– Focusing on 4 clustering algorithms (K-Means, Agglomerative, DBSCAN, Birch), testing various parameters and metrics on different databases
– Midway meeting
– Continuing the aforementioned tests, focusing strictly on DBSCAN
– Setting up and configuring a virtual machine running Windows Server with VMware Mirage and Microsoft SQL Server Reporting Services.
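As a reference for the metric tests, the three Boolean dissimilarities named in the conclusions can be written in terms of the same three counts. The sketch below assumes the definitions used by SciPy's `jaccard`, `dice` and `sokalsneath` functions (several coefficients carry the Sokal-Sneath name in the literature). The sample vectors are made up.

```python
def match_counts(u, v):
    """Count positions that are (1,1), (1,0) and (0,1) in two 0/1 vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    only_u = sum(1 for a, b in zip(u, v) if a and not b)
    only_v = sum(1 for a, b in zip(u, v) if b and not a)
    return both, only_u, only_v


def jaccard(u, v):
    both, only_u, only_v = match_counts(u, v)
    mismatches = only_u + only_v
    total = both + mismatches
    return mismatches / total if total else 0.0


def dice(u, v):
    both, only_u, only_v = match_counts(u, v)
    mismatches = only_u + only_v
    total = 2 * both + mismatches  # shared apps count twice
    return mismatches / total if total else 0.0


def sokal_sneath(u, v):
    both, only_u, only_v = match_counts(u, v)
    mismatches = only_u + only_v
    total = both + 2 * mismatches  # mismatches count twice
    return 2 * mismatches / total if total else 0.0


# Two hypothetical machines sharing two apps and differing in two.
u = [1, 1, 1, 0]
v = [1, 1, 0, 1]
distances = (jaccard(u, v), dice(u, v), sokal_sneath(u, v))
```

For the same pair of machines, Dice is the most lenient of the three and Sokal-Sneath the strictest, so the `eps` threshold has to be tuned per metric. None of them is affected by applications that both machines lack, unlike Euclidean distance on long, sparse vectors.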
Progress Recap
◦ Learning to use Microsoft SSRS, the Report Builder tool and the Mirage web interface.
◦ Moving the Python IDE and SQL databases to the virtual machine.
◦ Exporting our results to SQL instead of CSV and text files.
◦ Building a sample report
– Building custom reports according to the given guidelines
– Improving the reports’ appearance, fixing bugs, parameterizing the Python code.
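The switch from CSV files to SQL output can be sketched in a few lines. This is a stand-in rather than the project's code: it uses the standard-library `sqlite3` module with an in-memory database instead of SQLAlchemy and SQL Server, and the table and column names are hypothetical. The idea is that once cluster assignments live in a table, a reporting tool can query them directly.

```python
import sqlite3


def export_clusters(conn, assignments):
    """Write {device: cluster_id} into a table a report can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cluster_assignment ("
        "device TEXT PRIMARY KEY, cluster_id INTEGER NOT NULL)"
    )
    # INSERT OR REPLACE keeps a re-run from failing on existing devices.
    conn.executemany(
        "INSERT OR REPLACE INTO cluster_assignment (device, cluster_id) "
        "VALUES (?, ?)",
        sorted(assignments.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
export_clusters(conn, {"PC-01": 0, "PC-02": 0, "PC-03": -1})

# The kind of aggregate a report might show: devices per cluster.
rows = conn.execute(
    "SELECT cluster_id, COUNT(*) FROM cluster_assignment "
    "GROUP BY cluster_id ORDER BY cluster_id"
).fetchall()
```

Parameterized queries (`?` placeholders) keep device names from being interpreted as SQL, which matters once the input comes from real inventory data.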