College Scorecard & Software Defect Prediction


1 College Scorecard & Software Defect Prediction
Prepared by: Meetkumar Patel, Srivats Srinivasan
Guidance by: Prof. Meiliu Lu

2 Agenda
Data Warehousing Project: Background, Introduction, Technologies Explored, Implementation Steps, Future Scope
Data Mining Project: Objective, Algorithm Applied, Demo, Learning Experience, References

3 Background
Source website:
Two datasets: College Scorecard and Software Defect Prediction
College Scorecard dataset: 17 attributes, 37,835 entries
Software Defect Prediction dataset: 22 attributes, 1,100 entries

4 Introduction
The primary objective of our project is to design a data mart, which we built using a star schema. The data mart answers questions about US universities, and its primary users would be high school students.

5 Technologies Explored
Data Preprocessing: Microsoft Excel spreadsheets, MySQL Server
Data Mart: Microsoft SQL Server, Java
OLAP Operations: SQL Server queries

6 Implementation Steps
Data Cleaning and Preprocessing
Data Mart
Querying Tool

7 Data Cleaning and Preprocessing
The original data had 80,000 rows and 1,700 columns. We trimmed it to the relevant rows and 17 related columns. Missing values were filled in using an SQL script. Because the data covers five years, we added a year column to separate the records by year.
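The cleaning steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch built on several assumptions. The table name `scorecard`, the columns `unitid`, `instnm`, and `sat_avg`, and the two sample years are all hypothetical. Filling gaps with the same-year column mean is also an assumption. None of this is the project's actual SQL script.

```python
import sqlite3

# Hypothetical per-year extracts: (unitid, instnm, sat_avg); None marks a missing value.
YEARLY_ROWS = {
    2011: [(1, "Alpha University", 1200), (2, "Beta College", None)],
    2012: [(1, "Alpha University", 1220), (2, "Beta College", 1100)],
}


def build_scorecard(conn: sqlite3.Connection) -> None:
    """Load every year into one table, tag each row with its year, then fill gaps."""
    conn.execute(
        "CREATE TABLE scorecard (unitid INTEGER, instnm TEXT, sat_avg REAL, year INTEGER)"
    )
    for year, rows in YEARLY_ROWS.items():
        conn.executemany(
            "INSERT INTO scorecard (unitid, instnm, sat_avg, year) VALUES (?, ?, ?, ?)",
            [(unitid, name, sat, year) for unitid, name, sat in rows],
        )
    # Replace missing SAT averages with the mean of the same year (AVG skips NULLs).
    conn.execute(
        """
        UPDATE scorecard
        SET sat_avg = (SELECT AVG(s2.sat_avg) FROM scorecard AS s2
                       WHERE s2.year = scorecard.year)
        WHERE sat_avg IS NULL
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_scorecard(conn)
    for row in conn.execute("SELECT * FROM scorecard ORDER BY year, unitid"):
        print(row)
```

A per-year mean keeps one year's values from leaking into another, which is one practical use of the added year column. A per-institution mean, or leaving the gap in place, would be equally plausible choices.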

8 Data Mart
The data mart is implemented on a star schema.
It provides users with university details based on the following attributes: University ID, Programs, Type of Degree, SAT & ACT scores, Region, State.

9 Star Schema
Fact Table: University_ID, State_ID, Degree_ID, PDegree_ID, Region_ID, Program_ID, Scores
Dimension tables:
University: University_ID, University_name, Zip, Website
Highest Degree: Degree_ID, Degree_Name
Predominant Degree: PDegree_ID, PDegree_Name
State: State_ID, State_Name
Region: Region_ID, Region_Name
Program: Program_ID, Program_Name
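A reduced version of this star schema can be tried out in SQLite. The sketch keeps only the University, State, and Region dimensions and the Scores measure of the fact table. The sample rows, ZIP codes, and websites are made up. The project itself used SQL Server rather than SQLite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE University (University_ID INTEGER PRIMARY KEY, University_name TEXT,
                         Zip TEXT, Website TEXT);
CREATE TABLE State  (State_ID INTEGER PRIMARY KEY, State_Name TEXT);
CREATE TABLE Region (Region_ID INTEGER PRIMARY KEY, Region_Name TEXT);
-- Fact table: one row per university, pointing at each dimension by key.
CREATE TABLE Fact (University_ID INTEGER REFERENCES University,
                   State_ID INTEGER REFERENCES State,
                   Region_ID INTEGER REFERENCES Region,
                   Scores REAL);
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Create the tables and insert hypothetical sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO University VALUES (?, ?, ?, ?)", [
        (1, "Alpha University", "00001", "alpha.example.org"),
        (2, "Beta College", "00002", "beta.example.org"),
        (3, "Gamma Institute", "00003", "gamma.example.org"),
    ])
    conn.executemany("INSERT INTO State VALUES (?, ?)", [(1, "California"), (2, "New York")])
    conn.executemany("INSERT INTO Region VALUES (?, ?)", [(1, "Far West"), (2, "Mid East")])
    conn.executemany("INSERT INTO Fact VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 1200), (2, 1, 1, 1300), (3, 2, 2, 1100),
    ])


def average_score_by_region(conn: sqlite3.Connection) -> list:
    """Roll up the fact table along the Region dimension."""
    return conn.execute("""
        SELECT r.Region_Name, AVG(f.Scores)
        FROM Fact f JOIN Region r ON f.Region_ID = r.Region_ID
        GROUP BY r.Region_Name ORDER BY r.Region_Name
    """).fetchall()


def universities_in_state(conn: sqlite3.Connection, state_name: str) -> list:
    """Slice the data mart on a single State value."""
    return [name for (name,) in conn.execute("""
        SELECT u.University_name
        FROM Fact f
        JOIN University u ON f.University_ID = u.University_ID
        JOIN State s ON f.State_ID = s.State_ID
        WHERE s.State_Name = ? ORDER BY u.University_name
    """, (state_name,))]
```

`average_score_by_region` is a roll-up along the Region dimension, and `universities_in_state` is a slice on State. The remaining dimensions (Highest Degree, Predominant Degree, Program) would join the same way through their `_ID` keys.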

10

11 Future Scope
Allow privileged users to insert new records
Integrate Google Maps for locations and directions
Develop web-based and mobile-based environments

12 Objective
Mine the available data to extract knowledge.
Analyze the behavior of different data mining tools.
This project focuses on high-performance fault/error predictors based on data mining techniques such as Random Forests, and on algorithms based on a new computational intelligence approach.

13 Data Mining
Tools used: Weka, RapidMiner
Classification algorithms: J48, Random Tree, Logistic
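As a rough illustration of the Logistic classifier listed above, here is plain logistic regression trained by batch gradient descent on the log-loss. It is not how Weka's `Logistic` classifier is implemented internally. The two features (cyclomatic complexity scaled by 10 and line count scaled by 100), the toy data, the learning rate, and the epoch count are all hypothetical.

```python
import math

# Hypothetical toy data: [cyclomatic complexity / 10, line count / 100], label 1 = defect.
TOY_X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [1.5, 1.2], [1.8, 1.6], [2.0, 1.9]]
TOY_Y = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability that x is defective."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr: float = 0.1, epochs: int = 2000):
    """Minimize the mean log-loss with batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(linear score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    w, b = train_logistic(TOY_X, TOY_Y)
    print(f"P(defect | [1.9, 1.7]) = {predict_proba(w, b, [1.9, 1.7]):.3f}")
```

The model outputs a probability rather than just a label. That is what makes this family of classifiers useful for estimating the probability of a defect.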

14 Data Mining
We use attributes such as cyclomatic complexity, essential complexity, design complexity, total number of operators, total number of operands, volume, program length, difficulty, intelligence, effort, and line count. We mine these attributes to study how they affect the quality of the software produced. The final goal is to use these attributes to predict whether or not there is a defect (class label {true, false}).
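Most of the attributes above are McCabe and Halstead software metrics. The sketch below computes the textbook Halstead measures from operator and operand counts, plus McCabe's cyclomatic complexity from a control-flow graph. The slides do not say how the dataset's tooling counted operators and operands, so the counts are assumed to be supplied by the caller.

```python
import math
from dataclasses import dataclass


@dataclass
class HalsteadCounts:
    n1: int  # number of distinct operators
    n2: int  # number of distinct operands
    N1: int  # total occurrences of operators
    N2: int  # total occurrences of operands


def halstead(c: HalsteadCounts) -> dict[str, float]:
    """Standard Halstead measures derived from the four basic counts."""
    vocabulary = c.n1 + c.n2                      # n
    length = c.N1 + c.N2                          # N = N1 + N2 (program length)
    volume = length * math.log2(vocabulary)       # V = N * log2(n)
    difficulty = (c.n1 / 2) * (c.N2 / c.n2)       # D = (n1 / 2) * (N2 / n2)
    return {
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "intelligence": volume / difficulty,      # I = V / D
        "effort": difficulty * volume,            # E = D * V
    }


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's v(G) = e - n + 2p for a control-flow graph."""
    return edges - nodes + 2 * components
```

Consider a module with 4 distinct operators, 4 distinct operands, and 8 occurrences of each. It has N = 16, V = 48, D = 4, I = 12, and E = 192.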

15 Data Mining: Preprocessing Data
The collected data were noisy, inconsistent, and missing useful information. The first step was data preparation: checking the data distribution and outliers, handling empty or missing values, enriching the data, and transforming it into analyzable formats. These steps improve data quality and thus enable effective data mining.
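Two of the preparation steps above can be sketched in plain Python: handling missing values and checking for outliers. Mean imputation and the 1.5 * IQR outlier rule are assumptions chosen for illustration. The slides do not say which methods the project used.

```python
import statistics
from typing import Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> list[float]:
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> list[int]:
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

Flagged rows can then be inspected rather than dropped automatically. In defect data, an unusually large module may be exactly the one that is defective.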

16 Data Mining Algorithm Implementation
First, each algorithm was run in WEKA to obtain its root mean squared error (RMSE). RapidMiner was then used to produce the graphical output. The lower the RMSE, the better the algorithm performs on this particular dataset.
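For a two-class problem such as {true, false}, one common way to get an RMSE from a classifier is to compare the predicted probability of `true` with a 0/1 label. The sketch below assumes that formulation; Weka's own averaging over class probabilities may differ in detail. The algorithm names in the usage line come from the slides, but the RMSE values there are hypothetical, not the project's results.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[int], predicted_prob: Sequence[float]) -> float:
    """RMSE between 0/1 defect labels and predicted probabilities of 'true'."""
    if len(actual) != len(predicted_prob) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - y) ** 2 for y, p in zip(actual, predicted_prob))
    return math.sqrt(squared / len(actual))


def rank_by_rmse(results: dict[str, float]) -> list[str]:
    """Algorithm names ordered from lowest (best) to highest RMSE."""
    return sorted(results, key=results.get)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(rank_by_rmse({"J48": 0.40, "Random Tree": 0.50, "Logistic": 0.35}))
```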

17 Data Mining (WEKA) J48
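J48 is Weka's implementation of the C4.5 decision tree. C4.5 picks each split by gain ratio: information gain divided by the entropy of the split itself. The sketch below computes that criterion for a nominal attribute, and the attribute values in it are hypothetical. Real C4.5 also handles numeric attributes (such as the metrics in this dataset) by searching for threshold splits, which this sketch omits.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio of splitting `labels` on a nominal `feature`."""
    total = len(labels)
    groups: dict = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature)  # entropy of the partition itself
    return info_gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.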

18 Data Mining (WEKA) Naïve Bayesian
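The Naïve Bayesian classifier assumes that features are conditionally independent given the class. Below is a textbook Gaussian Naive Bayes sketch that models each numeric attribute with a normal distribution per class. It is not Weka's `NaiveBayes` code. The toy data (scaled cyclomatic complexity and line count) is hypothetical, as is the small variance floor `1e-9`.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance

# Hypothetical toy data: [cyclomatic complexity / 10, line count / 100], label 1 = defect.
TOY_X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [1.5, 1.2], [1.8, 1.6], [2.0, 1.9]]
TOY_Y = [0, 0, 0, 1, 1, 1]


def fit_gaussian_nb(X, y):
    """Per class: prior probability, feature means, and feature variances."""
    rows_by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        rows_by_class[yi].append(xi)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),                           # prior P(class)
            [fmean(col) for col in columns],              # mean of each feature
            [pvariance(col) + 1e-9 for col in columns],   # variance, nudged away from 0
        )
    return model


def predict_nb(model, x):
    """Return the class with the highest log posterior, assuming independent features."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)
```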

19 Data Mining (WEKA) Random Tree

20 Data Mining (RapidMiner)

21 Data Mining

22 Learning Experience
Analytical processing
Learned different data mining tools such as Weka and RapidMiner
Learned about real-world applications of different data mining algorithms

23 DEMO

24 Conclusion
Weka reported the root mean squared error (RMSE), and we used it to shortlist a few algorithms. However, Weka could not present the results graphically in a clear, readable way. We therefore turned to RapidMiner, which let us simplify the graphs and predict the probability of a defect with ease.

25 References

