Connected Components in Software Networks Miloš Savić, Mirjana Ivanović, Miloš Radovanović Department of Mathematics and Informatics Faculty of Science.


1 Connected Components in Software Networks Miloš Savić, Mirjana Ivanović, Miloš Radovanović Department of Mathematics and Informatics Faculty of Science University of Novi Sad

2 Content
- Introduction
- Data collection
- Experiments and results
- Conclusions

3 Introduction - software networks -
Two levels of software complexity:
- internal complexity of software entities (classes, functions...)
- structural complexity of dependencies between entities
Class collaboration networks:
- nodes: classes/interfaces
- links: OO relationships
Static call graphs:
- nodes: functions/procedures
- links: call-return relationships

4 Introduction - connected components -
- Connected component: set of mutually reachable nodes
- Giant connected component: contains the vast majority of nodes
- Directed networks:
  - strongly connected components (mutual reachability along link direction)
  - weakly connected components (mutual reachability when link direction is ignored)
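The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function name, the toy class names and the toy links are hypothetical. It treats every link as undirected, collects the weakly connected components by breadth-first search, and reports the two largest as a percentage of network size.

```python
from collections import defaultdict, deque

def weakly_connected_components(nodes, edges):
    """Return the weakly connected components of a directed graph, largest first."""
    # Ignore link direction: build an undirected adjacency map.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy dependency graph (not taken from the studied systems).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "C"), ("E", "F")]
wccs = weakly_connected_components(nodes, edges)
for comp in wccs[:2]:
    print(f"{len(comp)} nodes = {100 * len(comp) / len(nodes):.2f}% of the network")
```

In a network with a giant weakly connected component, the first percentage is close to 100 and the second is small.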

5 Introduction - theory of complex networks -
Random graphs:
- Poisson degree distribution
- ER model (static + uniform attachment)
Scale-free networks:
- power-law degree distribution
- BA model (growth + preferential attachment)
Exponential networks:
- exponential degree distribution
- Model A (growth + uniform attachment)

6 Introduction - motivations -
- Model A: test complementary cumulative in/out/total degree distributions of giant weakly connected components against a power-law and an exponential distribution
- “robust yet fragile”: investigate topological stability of giant weakly connected components
- “hierarchical small-worlds, scale-free networks from optimal design”: determine size of strongly connected components

7 Data collection
Class collaboration networks:
- Ant, Tomcat, Lucene, JavaCC, JDK
- extractor – Yaccne
Static call graphs:
- gcc, the kernel component of the Linux kernel
- extractor – Doxygen + our .dot aggregator

8 Experiments and results - giant weakly connected components -
Network   lwcc-size [%]   wcc2-size [%]
Linux     93.57           0.29
Gcc       99.67           0.33
Jdk       98.40           0.05
Ant       97.81           0.25
Tomcat    94.17           0.86
Lucene    96.89           0.56
Javacc    97.46           1.26
Comparable networks sampled by ER, BA and Model A contain GWCC.

9 Experiments and results - degree distribution of GWCCs -

10 Experiments and results - Implications -
Theoretical implications:
- model that can reproduce the connectivity pattern characteristic of software systems
Related to software engineering:
- in-degree = degree of class/function reuse
- out-degree = degree of class/function aggregation

11 Experiments and results - theoretical implications - Superposition model (growth + preferential attachment for out-going links + uniform attachment for in-coming links)
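A rough simulation sketch of such a growth process follows. The slide gives only the three ingredients, so the details here are assumptions: each new node sends `d_out` out-going links to existing nodes chosen preferentially by in-degree (with a "+ 1" offset so that nodes without in-coming links can be chosen), and receives `d_in` in-coming links from existing nodes chosen uniformly at random. The function name, the seed configuration and the default parameter values are hypothetical.

```python
import random
from statistics import mean, pvariance

def superposition_model(n, d_in=2, d_out=2, seed=0):
    """Grow a directed network; returns (in_degree, out_degree, edges)."""
    rng = random.Random(seed)
    m0 = max(d_in, d_out)  # assumption: start from m0 unlinked seed nodes
    in_deg, out_deg, edges = [0] * n, [0] * n, []
    # Each node appears in the pool (in-degree + 1) times, so a uniform draw
    # from the pool is preferential in in-degree. The "+ 1" is an assumption
    # that keeps nodes with zero in-degree selectable.
    pool = list(range(m0))
    for new in range(m0, n):
        # Out-going links of the new node: preferential choice of targets.
        targets = set()
        while len(targets) < d_out:
            targets.add(rng.choice(pool))
        # In-coming links of the new node: uniform choice of sources.
        sources = rng.sample(range(new), d_in)
        new_edges = [(new, t) for t in sorted(targets)] + [(s, new) for s in sources]
        for u, v in new_edges:
            out_deg[u] += 1
            in_deg[v] += 1
            pool.append(v)  # v gained one in-coming link
        pool.append(new)    # the "+ 1" entry of the new node
        edges.extend(new_edges)
    return in_deg, out_deg, edges

in_deg, out_deg, edges = superposition_model(2000)
print("mean in/out degree:", mean(in_deg), mean(out_deg))
print("variance in/out degree:", pvariance(in_deg), pvariance(out_deg))
```

The two means always coincide because every link contributes once to each; how the two variances compare depends on the parameters and on the assumptions listed above.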

12 Experiments and results - Analytical solution of the superposition model -
Continuum approach: “Mean field theory for scale-free random networks” (Barabási et al., ’99)
D_in/D_out – number of in-coming/out-going links introduced by each node

13 Experiments and results - Implications related to SE -
First combinatorial principle of graph theory: Avg(reuse) = Avg(aggregation)
But:
- Dispersion(reuse) → ∞ as N → ∞
- Dispersion(aggregation) ~ Avg(aggregation)^2
Conclusions:
1. Software systems exhibit a characteristic scale of code aggregation, but there is no characteristic scale of code reuse.
2. Highly reused entities tend to be more reused.
3. Predictability of code reuse and unpredictability of code aggregation as software systems evolve.
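The equality of the two averages can be written out explicitly. The notation is introduced here for illustration and is not from the slides: k_i^in and k_i^out denote the in-degree and out-degree of node i, and L denotes the total number of links. Every link leaves exactly one node and enters exactly one node, so both degree sums count each link once:

```latex
\sum_{i=1}^{N} k_i^{\mathrm{in}} \;=\; \sum_{i=1}^{N} k_i^{\mathrm{out}} \;=\; L
\quad\Longrightarrow\quad
\mathrm{Avg}(\text{reuse})
  = \frac{1}{N}\sum_{i=1}^{N} k_i^{\mathrm{in}}
  = \frac{L}{N}
  = \frac{1}{N}\sum_{i=1}^{N} k_i^{\mathrm{out}}
  = \mathrm{Avg}(\text{aggregation})
```

The identity constrains only the means; the two distributions can still differ in every higher moment, which is what the dispersion statements on this slide are about.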

14 Experiments and results - Topological stability of GWCCs -
Experiments:
- removal of one node: to check the existence of articulation points
- successive removal of preferential (highest-degree) nodes: to check the fragility
- successive removal of nodes at random: to check the robustness
After each removal, the size of the largest weakly connected component is measured.
fc-pref/fc-rnd: critical fraction of nodes that need to be removed in order to destroy the giant weakly connected component when the preferential/random node removal scheme is applied
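The two successive-removal experiments can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, the preferential scheme is taken to mean "remove the node with the most remaining neighbours, recomputed after every removal", and the network is a toy star graph. The slides do not state when a giant component counts as destroyed, so the sketch records only the curve from which a critical fraction could be read off.

```python
import random

def largest_wcc_fraction(adj, alive, n_total):
    """Largest weakly connected component among `alive` nodes, as a fraction of n_total."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / n_total

def removal_curve(nodes, edges, preferential, seed=0):
    """Remove nodes one by one; return the largest-WCC fraction after each removal."""
    # Link direction is ignored, as required for weak connectivity.
    adj = {u: set() for u in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    alive, rng, curve = set(nodes), random.Random(seed), []
    while alive:
        candidates = sorted(alive)
        if preferential:
            # Assumption: degree is recomputed on the remaining network.
            victim = max(candidates, key=lambda u: len(adj[u] & alive))
        else:
            victim = rng.choice(candidates)
        alive.remove(victim)
        curve.append(largest_wcc_fraction(adj, alive, len(nodes)))
    return curve

# Hypothetical toy network: a hub (node 0) linked to five leaves.
star_nodes = list(range(6))
star_edges = [(0, leaf) for leaf in range(1, 6)]
pref = removal_curve(star_nodes, star_edges, preferential=True)
rnd = removal_curve(star_nodes, star_edges, preferential=False, seed=1)
print("preferential:", [round(x, 2) for x in pref])
print("random:      ", [round(x, 2) for x in rnd])
```

On the star graph the preferential scheme removes the hub first, so the largest component collapses to a single node after one removal.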

15 Experiments and results - Articulation points -
Software networks contain APs: [2.91% - 15.50%] of network size
BA model:
- D_total – number of links introduced by each node
- D_total = 1 → num(AP) in the range [31% - 35.4%]
- D_total > 1 → num(AP) = 0
BAU model:
- D_total is not a constant value but a random variable such that P{D_total = 1} > 0
- the modification does not affect the scale-free properties of degree distributions and produces APs
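An articulation point is a node whose removal disconnects the component it belongs to. A minimal sketch of how such nodes can be found is given below, using the standard depth-first search with low-link values on the undirected view of the network. The function name and the toy graph are hypothetical, and the recursive form is only suitable for small graphs; large networks would need an iterative version.

```python
def articulation_points(nodes, edges):
    """Return the set of articulation points, ignoring link direction."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    disc, low, aps = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for w in adj[u]:
            if w not in disc:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # A non-root node u is an AP if some child subtree
                # cannot reach above u without passing through u.
                if parent is not None and low[w] >= disc[u]:
                    aps.add(u)
            elif w != parent:
                low[u] = min(low[u], disc[w])
        # The DFS root is an AP only if it has more than one DFS child.
        if parent is None and children > 1:
            aps.add(u)

    for u in nodes:
        if u not in disc:
            dfs(u, None)
    return aps

# Hypothetical toy graph: a chain A-B-C attached to a triangle C-D-E.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "C")]
aps = articulation_points(toy_nodes, toy_edges)
print(sorted(aps), f"{100 * len(aps) / len(toy_nodes):.1f}% of network size")
```

In a tree every node with more than one neighbour is an articulation point, which is why a growth model that adds exactly one link per node produces many of them.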

16 Experiments and results - preferential node removal -
Software networks are extremely vulnerable: f_c(software network) < f_c(BAU) < f_c(EXP) < f_c(RND)

17 Experiments and results - random node removal -
- Software networks (except Linux) never lose GWCCs
- The same holds for comparable networks generated by theoretical models
- The Linux static call graph is a scale-free network that is sensitive to random errors: f_c(Linux) < f_c(RND) < f_c(EXP) < f_c(BAU)
- Large real-world networks: f_c(RND) < f_c(rw-net)

18 Experiments and results - strongly connected components -
- Linux: SCCs as a minor effect
- Other networks: no GSCC, but have relatively large SCCs
- topological sort cannot be made → there is no elegant systematic testing strategy
Network   NUM   MAX1 [%]    MAX2 [%]    MAX3 [%]
Linux     2     0.088633    0.044316    -
GCC       8     12.62242    0.652884    0.435256
JDK       48    19.80519    6.926407    2.32684
Ant       21    18.79106    1.839685    1.576873
Tomcat    33    7.918782    2.030457    1.725888
Lucene    8     13.99417    1.166181    0.874636
JavaCC    1     10.38961    -           -
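The link between cyclic dependencies and the missing topological order can be illustrated with a small sketch. The function names, class names and links below are hypothetical and do not come from the analysed systems. The first function finds strongly connected components with Tarjan's algorithm (recursive form, suitable for small graphs only); the second attempts a topological sort with Kahn's algorithm and returns `None` when a cycle makes it impossible.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm: return a list of SCCs (each a list of nodes)."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(u):
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for w in succ[u]:
            if w not in index:
                visit(w)
                low[u] = min(low[u], low[w])
            elif w in on_stack:
                low[u] = min(low[u], index[w])
        if low[u] == index[u]:  # u is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            sccs.append(comp)

    for u in nodes:
        if u not in index:
            visit(u)
    return sccs

def topological_order(nodes, edges):
    """Kahn's algorithm: return an order, or None if the graph has a cycle."""
    succ = {u: [] for u in nodes}
    indeg = {u: 0 for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [u for u in nodes if indeg[u] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order if len(order) == len(nodes) else None

# Hypothetical class dependencies; the last link closes a cycle.
classes = ["Main", "Parser", "Lexer", "Token"]
deps = [("Main", "Parser"), ("Parser", "Lexer"), ("Lexer", "Token"), ("Token", "Parser")]
cyclic = [c for c in strongly_connected_components(classes, deps) if len(c) > 1]
print("cyclic SCCs:", cyclic)
print("order with cycle:", topological_order(classes, deps))
print("order without the last link:", topological_order(classes, deps[:-1]))
```

Without the cycle, the order lists each class before the classes it depends on, so a bottom-up testing sequence can be read off in reverse; inside a non-trivial SCC no such sequence exists.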

19 Largest strongly connected component in GCC’s giant weakly connected component containing 116 mutually reachable nodes

20 Conclusions
- Out-degree sequences of software networks can be better modeled with an exponential distribution than a power-law
- Scale-free software networks contain articulation points
- Software networks are extremely vulnerable to the removal of highest-degree nodes, and (except Linux) share the same level of robustness as comparable networks generated by theoretical models

21 Conclusions
- The Linux static call graph is an interesting and intriguing example of a scale-free network which does not display tolerance against random errors
- Software networks contain relatively large cyclic dependencies: substructures that do not reflect optimal design and hierarchical small-worldliness

22 Connected Components in Software Networks Miloš Savić, Mirjana Ivanović, Miloš Radovanović Department of Mathematics and Informatics Faculty of Science University of Novi Sad

