Text Mining SEC Filings for Fraud Detection Fletcher Glancy ISQS 7342.

Presentation transcript:

1 Text Mining SEC Filings for Fraud Detection Fletcher Glancy ISQS 7342

2 Research Issues 1. Can fraud be detected from SEC filings? 2. Can text mining provide a methodology for detecting potential fraud? 3. If text mining can provide an indication of potential fraud, which algorithm gives the best performance? Fletcher Glancy 12/2/2008

3 Brief Background Corporate governance fraud has been a major concern, e.g., Enron, WorldCom, HealthSouth. Detection has come only after many years of abuse. Most detection techniques involve ratio analysis. Churyk et al. used Context Analysis to detect fraud in the MD&A of 10-K filings.

4 Potential Strengths of Text Mining Text mining (TM) can be automated. The results can be used for further data mining. TM eliminates the researcher bias that is potentially present in Context Analysis.

5 Potential Problems/Weaknesses There is no context in text mining, only statistics. It is difficult to understand the relationships in a document-term matrix. Text mining is unable to handle negatives or punctuation.

6 Narrow the Focus - Negatives Antonyms: word opposites. Negatives: “not good” = “bad”. Interference by articles: “not a good day”. Interference by modifiers: “not highly motivated”.

7 Possible Data Preparation Options Preprocess to remove articles. Convert punctuation to text: replace ‘;’ with “semicolon”. Combine the following noun with “not”: “not highly motivated” becomes “highly not_motivated”. Create not_noun and replace it with its antonym: “not_dead” is replaced with “alive”.
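
The four preparation options on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the function names, the article list, the modifier list, the punctuation names, and the antonym table are all hypothetical placeholders, and the choice to apply punctuation conversion last is an assumption made so that "not" is never joined to a punctuation name.

```python
import re

# Hypothetical lookup tables for illustration only; a real study would need curated lists.
ARTICLES = {"a", "an", "the"}
MODIFIERS = {"highly", "very", "really", "so", "too"}
PUNCTUATION_NAMES = {";": "semicolon", ",": "comma", ":": "colon",
                     ".": "period", "?": "question_mark", "!": "exclamation_mark"}
ANTONYMS = {"not_dead": "alive", "not_good": "bad"}


def tokenize(text):
    """Lowercase the text and split it into words and single punctuation marks."""
    return re.findall(r"[a-z0-9_']+|[;,:.?!]", text.lower())


def remove_articles(tokens):
    """Option 1: drop articles so they cannot sit between 'not' and its target."""
    return [t for t in tokens if t not in ARTICLES]


def combine_not(tokens):
    """Option 3: join 'not' to the next non-modifier word, keeping modifiers in place."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "not":
            j = i + 1
            while j < len(tokens) and tokens[j] in MODIFIERS:
                j += 1
            if j < len(tokens) and tokens[j] not in PUNCTUATION_NAMES:
                out.extend(tokens[i + 1:j])      # modifiers stay where they were
                out.append("not_" + tokens[j])   # e.g. not_motivated
                i = j + 1
                continue
        out.append(tokens[i])
        i += 1
    return out


def replace_antonyms(tokens):
    """Option 4: swap a not_word token for its antonym when one is listed."""
    return [ANTONYMS.get(t, t) for t in tokens]


def punctuation_to_text(tokens):
    """Option 2: replace punctuation marks with word tokens such as 'semicolon'."""
    return [PUNCTUATION_NAMES.get(t, t) for t in tokens]


def preprocess(text):
    """Apply all four options cumulatively (ordering is an assumption)."""
    tokens = tokenize(text)
    for step in (remove_articles, combine_not, replace_antonyms, punctuation_to_text):
        tokens = step(tokens)
    return " ".join(tokens)


print(preprocess("Not a good day."))
print(preprocess("Not dead; not highly motivated"))
```

Because each option is its own function, the alternatives can be run individually or chained, which is the comparison the later slides propose.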

8 Testing Data Preparation Options Select/create a text database: 10-K notes and MD&A; firms that have received an AAER. Preprocess with each alternative individually and cumulatively. Create the document-term matrix and its SVD.
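
As a rough illustration of the document-term matrix step, the sketch below counts whitespace-separated tokens per document using only the standard library. The function name `document_term_matrix` and the two sample strings are hypothetical; they stand in for preprocessed filing text and are not taken from any real filing.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Made-up sample strings, standing in for already-preprocessed text.
docs = ["highly not_motivated staff", "bad day bad"]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Raw counts are used here for simplicity; whether the study would weight the terms (for example with tf-idf) is not stated in the slides.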

9 Testing Data Preparation Options Calculate the variance of the document set using SVD. Create a logistic regression using the set's SVD and calculate variance. Test for predictability using a validation set.
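
One possible reading of these steps, sketched with NumPy: take the SVD of the mean-centered document-term matrix, report each singular value's share of the total variance, project documents onto the leading component, fit a logistic regression on that projection, and check predictions on held-out documents. Everything here is an assumption for illustration: the tiny matrices are synthetic, the labels are invented (1 standing for a firm with an AAER), the plain gradient-descent fit is a stand-in for whatever regression tool the study would use, and the helper names are hypothetical.

```python
import numpy as np


def svd_variance_share(matrix):
    """SVD of the mean-centered matrix plus each component's share of total variance."""
    mean = matrix.mean(axis=0)
    _, s, vt = np.linalg.svd(matrix - mean, full_matrices=False)
    share = s ** 2 / np.sum(s ** 2)
    return mean, vt, share


def project(matrix, mean, vt, k):
    """Coordinates of each document on the first k right singular vectors."""
    return (matrix - mean) @ vt[:k].T


def fit_logistic(features, labels, lr=0.5, steps=2000):
    """Logistic regression fitted by plain gradient descent (illustrative stand-in)."""
    X = np.hstack([np.ones((features.shape[0], 1)), features])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - labels) / len(labels)
    return w


def predict(w, features):
    """Predict class 1 when the fitted probability is at least 0.5."""
    X = np.hstack([np.ones((features.shape[0], 1)), features])
    return (1.0 / (1.0 + np.exp(-X @ w)) >= 0.5).astype(int)


# Synthetic term counts (rows = documents, columns = terms); not real filing data.
train = np.array([[3, 0, 1], [2, 0, 1], [0, 3, 1], [0, 2, 1]], dtype=float)
labels = np.array([1, 1, 0, 0], dtype=float)   # invented labels: 1 = AAER firm
validation = np.array([[4, 0, 1], [0, 4, 1]], dtype=float)

mean, vt, share = svd_variance_share(train)
w = fit_logistic(project(train, mean, vt, k=1), labels)
val_pred = predict(w, project(validation, mean, vt, k=1))

print("variance share per component:", np.round(share, 3))
print("validation predictions:", val_pred)
```

The validation documents are projected with the mean and singular vectors learned from the training set only, so no information from the validation set leaks into the fit.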

10 Questions? Welcome to my potential dissertation topic!

