Big Data and Accounting

1 Big Data and Accounting
A Resource for the Florida Association of Accounting Educators

2

3 How big is Big Data?

4 How Large is Large?

5 Google processed 20 petabytes per day
2008: Google processed 20 petabytes per day. 2016: using Moore's Law, an estimated 150 petabytes per day of Google data. Moore's Law describes exponential growth: Gordon Moore, co-founder of Intel, observed that the number of transistors on a chip doubles at a steady rate, commonly quoted as every 18 months to two years.
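The growth arithmetic behind this slide can be sketched in a few lines of Python. This is a minimal sketch, and the function names are hypothetical. It treats the doubling period as an input rather than a fixed fact. Under a strict 18-month doubling, 20 petabytes per day in 2008 would project to roughly 806 petabytes per day by 2016. The slide's 150-petabyte estimate corresponds to a slower doubling period of about 2.75 years.

```python
import math

def project_volume(start: float, years: float, doubling_years: float) -> float:
    """Project a quantity that doubles every `doubling_years` years."""
    return start * 2 ** (years / doubling_years)

def implied_doubling_period(start: float, end: float, years: float) -> float:
    """Solve start * 2**(years / T) == end for the doubling period T."""
    return years / math.log2(end / start)

# 20 PB/day in 2008, projected 8 years forward to 2016.
print(round(project_volume(20, 8, 1.5)))              # 806 (18-month doubling)
print(round(project_volume(20, 8, 2.0)))              # 320 (two-year doubling)
print(round(implied_doubling_period(20, 150, 8), 2))  # 2.75 (years, implied by 150 PB/day)
```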

6 Unknown Hypothesis Decision
We use data to turn the unknown into discoveries, moving from hypothesis to decision and bridging the gap between a grey area and a decision. (Definition of accounting.)

7 Statistical analysis offers many opportunities to draw correlations and conclusions from data
[Insert weird correlation here]
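Spurious correlations are easy to find. This is a minimal sketch using two invented, unrelated series. The numbers are hypothetical, not real data, and both series simply grow over eight periods. The Pearson coefficient is computed by hand, so no third-party library is needed, and it comes out close to 1 even though neither series drives the other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Two invented series with no causal link; both simply rise over eight periods.
series_a = [10, 12, 13, 15, 16, 18, 19, 21]
series_b = [200, 205, 212, 215, 223, 226, 232, 238]
r = pearson(series_a, series_b)
print(f"r = {r:.2f}")  # close to 1, yet neither series explains the other
```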

8 Every item of data, every reaction, every click, every recorded wink in a shopping aisle, every second on an assembly line could help managers make new and different decisions and branch off into new opportunities.

9 How do we define Big Data?

10 Sources: Labs, Forbes http://www. forbes
Volume – large amounts of data make for better models. Hadoop is a widely used place to store such data; Facebook uses Hadoop.
Velocity – how fast can the information be stored and delivered? Would you cross the road if the traffic picture you were looking at was 5 minutes old?
Variety – taking unstructured data and creating ordered meaning from it. We will speak of this later.
Veracity – accuracy.

11 How do we find the part of the Big Data that is most useful?

12 Star Trek actor George Takei recently held a contest on Facebook, asking entrants to describe the hardest thing to explain about life today to a time traveler from the 1950s. The winner: "I possess a device in my pocket that is capable of accessing the entirety of information known to man. I use it to look at pictures of cats and get in arguments with strangers."

13 Noise in Big Data is meaningless data
Noise in Big Data is meaningless data: junk within a litany of information. A signal in Big Data is information that is useful in making a business decision, or an important objective factor in analytics: the nuggets of information within our cat pictures.

14 The Signal and the Noise by Nate Silver
Website: fivethirtyeight.com

15 There are a lot of exciting ways to help us make connections, find correlations, and produce discoveries. While making these connections, however, we must be vigilant in separating signals from noise. A good place to begin is with simple classification methods, and a popular starting point is Bayes' Theorem. (Think Sherlock Holmes.)

16 How do we quantify Big Data?

17

18 To sift through mountains of data, Alan Turing combined code-breaking machines with Bayes' Theorem to decode Nazi German communications. A similar approach can be used to uncover trends and make useful business decisions based on data. Bayes' Theorem is a great introduction to data analytics and can be incorporated in early classes.

19 Bayes Theorem is useful in understanding how data items can relate to each other.
It can also be used to understand statistical outcomes. When one outcome is affected by or related to another, Bayesian inference uses Bayes' Theorem to update the probability of the first outcome once the second is observed.
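For reference, here is the standard statement of Bayes' Theorem. A is the outcome of interest and B is the evidence observed. The denominator is expanded with the law of total probability, which is the form the budget audit example on the next slide needs.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
            = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
```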

20 Bayes' Theorem Audit Example:
A company budget audit reveals that 4% of department budgets contain errors. A program is developed to analyze budgets. It flags 98% of the budgets that contain errors, but it also flags 5% of the budgets that do not. If a budget is flagged as containing an error, what is the probability that it actually has one?

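21 Only about 45% of the budgets flagged as having an error will actually have one: (0.98 × 0.04) / (0.98 × 0.04 + 0.05 × 0.96) = 0.0392 / 0.0872 ≈ 0.45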
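The same calculation can be written as a short Python function, which makes it easy for students to vary the inputs. This is a minimal sketch: the function and parameter names are illustrative, and the inputs are the figures from slide 20.

```python
def posterior(prior: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """P(error | flagged) via Bayes' Theorem."""
    p_flag_and_error = true_positive_rate * prior          # flagged and truly in error
    p_flag_and_clean = false_positive_rate * (1 - prior)   # flagged but actually clean
    p_flagged = p_flag_and_error + p_flag_and_clean        # law of total probability
    return p_flag_and_error / p_flagged

# Slide 20: 4% of budgets have errors; the test flags 98% of those and 5% of clean ones.
p = posterior(prior=0.04, true_positive_rate=0.98, false_positive_rate=0.05)
print(f"P(error | flagged) = {p:.1%}")  # P(error | flagged) = 45.0%
```

Lowering the false positive rate moves the result far more than raising the detection rate, because clean budgets vastly outnumber budgets with errors.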

22 How has Data Analysis evolved?

23 The limitation of a decision tree is that many decisions depend on more variables than a tree can easily handle, and many of those variables are not classified or structured. A neural network, loosely patterned after the brain, expands the number of variables that can be weighed in decision making.
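To make the contrast concrete, here is a minimal sketch with hypothetical variables, thresholds, and weights, none of which come from the presentation. A hand-built decision tree tests one variable per branch. A single artificial neuron weighs every input at once and outputs a score between 0 and 1. A real neural network stacks many such neurons and learns the weights from data rather than having them set by hand.

```python
import math
from typing import Sequence

def tree_decision(income: float, has_prior_default: bool) -> str:
    """Hand-built decision tree: each branch tests one variable (hypothetical thresholds)."""
    if has_prior_default:
        return "deny"
    if income >= 40_000:
        return "approve"
    return "review"

def neuron_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One artificial neuron: weighted sum of all inputs passed through a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))

print(tree_decision(52_000, False))  # approve
# Hypothetical scaled inputs: income, prior default flag, years employed, time on terms page.
print(round(neuron_score([0.52, 0.0, 0.3, 0.1], [2.0, -3.0, 1.0, 0.5], -1.0), 3))
```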

24 Example: Does the banking loan customer read Terms and Conditions
Example: Does the banking loan customer read Terms and Conditions? Or just check the box?

25 Big Data allows you to “slice and dice” consumer behavior
Big Data allows you to “slice and dice” consumer behavior. For example, when consumers fill out an online loan application, software can track the number of seconds spent completing it. Software could potentially use a failure to read the terms and conditions as a credit-granting factor: perhaps a consumer who does not take the time to read the terms and conditions, and simply checks the box, is too cavalier and therefore a higher credit risk. Article Source:
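A minimal sketch of how such a behavioral flag might be derived. Every number and name here is an assumption for illustration. The reading speed is about 4 words per second (roughly 240 words per minute) and the cutoff is 20% of the expected reading time. Both function names are hypothetical.

```python
def expected_read_seconds(word_count: int, words_per_second: float = 4.0) -> float:
    """Rough time a careful reader needs; 4 words/sec (~240 wpm) is an assumed reading speed."""
    return word_count / words_per_second

def skipped_terms(seconds_on_page: float, word_count: int, cutoff: float = 0.2) -> bool:
    """Hypothetical flag: True if the applicant spent under `cutoff` of the expected reading time."""
    return seconds_on_page < cutoff * expected_read_seconds(word_count)

# A 2,400-word terms page needs about 600 s to read; 20% of that is 120 s.
print(skipped_terms(seconds_on_page=8, word_count=2400))    # True: box checked almost instantly
print(skipped_terms(seconds_on_page=300, word_count=2400))  # False: plausible reading time
```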

26 How do Accountants use Big Data?

27 Accountants have traditionally harvested structured data
Accountants have traditionally harvested structured data. It is ordered and classified, and it can be neatly tagged into a relational database through fields and tables. Structured data is what we are used to quantifying in everyday accounting: analyzing financial statement information, XBRL reporting (more later), and ERP reporting.
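A small illustration of what "fields and tables" buys the accountant. This sketch uses Python's built-in sqlite3 module with an in-memory database. The table layout and journal entries are hypothetical. Once every value sits in a named field, a trial balance becomes a single query.

```python
import sqlite3

# In-memory database with one structured table: every value sits in a named, typed field.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE journal (
        entry_id INTEGER PRIMARY KEY,
        account  TEXT NOT NULL,
        debit    REAL NOT NULL DEFAULT 0,
        credit   REAL NOT NULL DEFAULT 0
    )
""")
# Hypothetical sample entries: a cash sale and a rent payment.
conn.executemany(
    "INSERT INTO journal (account, debit, credit) VALUES (?, ?, ?)",
    [("Cash", 500.0, 0.0), ("Sales Revenue", 0.0, 500.0),
     ("Rent Expense", 200.0, 0.0), ("Cash", 0.0, 200.0)],
)
# Because the data is structured, the trial balance is one GROUP BY query.
rows = conn.execute(
    "SELECT account, SUM(debit) - SUM(credit) AS balance "
    "FROM journal GROUP BY account ORDER BY account"
).fetchall()
for account, balance in rows:
    print(f"{account:<15}{balance:>10.2f}")
```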

28 Unstructured data is harder to quantify
Unstructured data is harder to quantify. It cannot be neatly tagged into a relational database through ordered fields and tables. Unstructured data is not what we are used to quantifying in everyday accounting: social media, the MD&A (Management's Discussion and Analysis) accompanying financial statements, observations from sensors, text, video, and other feedback items on mobile devices.

29 “Unstructured data represent the largest proportion of existing data and the greatest opportunity for exploiting Big Data,” Journal of Information Systems, American Accounting Association Ref:

30 Unstructured = unclassified, untagged, ungrouped

31 Time out for a Pop Quiz: Can you name these icons?

32 YouTube, Facebook, LinkedIn, Twitter, Instagram, Google Chrome, WhatsApp, Yelp, Snapchat

33 Every click, every view, every sign up is unstructured data.
The internet has 3.17 billion users, the average user has 5.54 social media accounts, and Google processes 40,000 search queries per second. Every click, every view, every sign-up is unstructured data. Statistics Source:

34 One way unstructured data can be mined and quantified is through TF-IDF
TF-IDF stands for “term frequency-inverse document frequency.” Image Credit: TF-IDF Research Paper:

35 TF-IDF Query Example
Text: “The brown fox jumps over the log. Then the brown fox lies down on the grass.”
TF: the most frequent term is “the” (4 instances); a query for “brown fox” finds 2 instances.
IDF: lower the weight of the most frequent word, “the,” because it appears in almost every document and so says little about any one of them.
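The query example can be reproduced in a few lines of Python. IDF only makes sense across several documents, so this sketch treats the slide's text as one document and adds two hypothetical documents to compare against. It uses raw counts for TF and the natural log of N / document frequency for IDF. That is one of several common TF-IDF variants; some libraries smooth the IDF term. The helper names are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: tf * idf} dict per document.

    tf  = raw count of the term in the document
    idf = ln(N / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: c * math.log(n_docs / doc_freq[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]

corpus = [
    "The brown fox jumps over the log. Then the brown fox lies down on the grass",  # slide 35
    "The auditor reviewed the budget",       # hypothetical extra document
    "The ledger shows the closing balance",  # hypothetical extra document
]
weights = tf_idf(corpus)[0]
print(Counter(tokenize(corpus[0]))["the"])        # 4: "the" is the most frequent term
print(weights["the"], round(weights["brown"], 3))  # 0.0 2.197: "the" appears everywhere
query_score = sum(weights.get(t, 0.0) for t in tokenize("brown fox"))
print(round(query_score, 2))                       # about 4.39 for the query "brown fox"
```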

36 Data that comes from structured sources like financial statements and accounting programs can lose its structure if it is not in a form that can be sorted, quantified, or interpreted across more than one system. Image Credit:

37 Accountants and information technology staff must assign weights and make scaling and translation decisions when software systems cannot speak to each other, and when working with unstructured data. These judgment calls sacrifice some of the objectivity that ultimately drives correlations and decisions. This decision making should therefore pair accounting professionals' knowledge with data analysis and information system structuring. It cannot be solely in the hands of the IT department. Ref:
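When two systems cannot speak to each other, someone has to write the translation rules. This sketch is hypothetical. The account codes and the mapping table are invented, and so is the assumption that the legacy system reports amounts in thousands. It shows where those human judgment calls live in code. Anything the rules cannot map is routed to a person instead of being guessed.

```python
# Hypothetical mapping from a legacy system's account codes to a new chart of accounts.
ACCOUNT_MAP = {"100": "1000-Cash", "400": "4000-Revenue", "610": "6100-Rent"}
LEGACY_SCALE = 1_000  # assumption: the legacy system reports amounts in thousands

def translate(records):
    """Split legacy (code, amount) records into translated rows and exceptions for review."""
    translated, exceptions = [], []
    for code, amount in records:
        target = ACCOUNT_MAP.get(code)
        if target is None:
            exceptions.append((code, amount))  # no rule: a person decides, not the software
        else:
            translated.append((target, amount * LEGACY_SCALE))
    return translated, exceptions

rows, review = translate([("100", 12.5), ("400", 40.0), ("999", 3.0)])
print(rows)    # [('1000-Cash', 12500.0), ('4000-Revenue', 40000.0)]
print(review)  # [('999', 3.0)]
```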

38 Don’t Discount Unstructured Data

39 Example of Unstructured Data and Fraud Detection:
An executive chef is shorting customer portions at a restaurant: every third night, he takes some of the inventory home. Customers take an online survey, but the survey does not ask about portion sizes. Customers complain about portion sizes in the comments section, but that text is not tracked by the software, which only gathers the survey's Likert-scale ratings (Very Good, Somewhat Good, Average, etc.) and cannot make any connections. The theft is not caught in the inventory count, because the shortfall comes out of the customers' portions. There are no obvious whistleblowers or clues, which is how this fraud would otherwise have been detected. Six months later, after sales have declined and a video system is installed, the fraud is discovered. Additional Example of shorting fraud:

40 …Or the restaurant monitors its unstructured data on social media and catches the fraud the next evening from a single tweet. Whistle blown. Image Credit:

41 Link to monitoring tools:
Monitoring social media is known as “social listening.” Social listening can be a good indicator of both fraud and business opportunity. Inexpensive tools are available, so small businesses do not need to hire a social media manager to engage with and monitor social media. Link to monitoring tools: Ref:
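The missing step in the restaurant story is reading the free-text comments. This is a minimal keyword-based sketch. The comments, the keyword list, and the alert threshold are all hypothetical. Commercial social-listening tools use far richer text analysis. Still, even simple counting turns unstructured comments into a signal.

```python
import re

# Hypothetical terms that suggest portion complaints; a real list would be built with the business.
PORTION_TERMS = {"small", "tiny", "skimpy", "portion", "portions", "shrunk"}
ALERT_THRESHOLD = 3  # assumption: alert once 3 or more comments mention portions

def mentions_portions(comment: str) -> bool:
    """True if any word in the comment matches a portion-related term."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return bool(words & PORTION_TERMS)

def portion_alert(comments):
    """Collect portion-related comments and report whether the alert threshold is reached."""
    hits = [c for c in comments if mentions_portions(c)]
    return len(hits) >= ALERT_THRESHOLD, hits

comments = [  # hypothetical survey comments
    "Great service, but the steak was tiny.",
    "Portions have shrunk since last month!",
    "Lovely atmosphere.",
    "Pasta portion was skimpy for the price.",
]
alert, flagged = portion_alert(comments)
print(alert, len(flagged))  # True 3
```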

42 Time to Circle Back to Structured Data
Let’s talk about XBRL

43

44 You can learn more about XBRL at https://www.xbrl.org/
There is also an XBRL expert at the University of Delaware who has a complete XBRL textbook and curriculum you can use in your AIS curriculum or in continuing education. Students certified in XBRL can demand higher salaries at graduation and compete in a wider job pool; demand exceeds supply. Ref:
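XBRL is structured data at its most explicit. Every reported number is tagged with a concept, a context (entity and period), and a unit. This minimal sketch reads such tags with Python's standard-library ElementTree. The instance is a simplified, hypothetical example. A valid filing also needs a schema reference and uses a published taxonomy, while the "ex" namespace here is invented. extract_facts is an illustrative helper, not a real XBRL processor.

```python
import xml.etree.ElementTree as ET

# Simplified XBRL-style instance with invented values. The xbrli and iso4217 namespaces
# are the standard XBRL 2.1 ones; the "ex" taxonomy namespace is hypothetical.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:ex="http://example.com/hypothetical-taxonomy">
  <xbrli:context id="FY2016">
    <xbrli:entity><xbrli:identifier scheme="http://example.com/ids">DEMO</xbrli:identifier></xbrli:entity>
    <xbrli:period><xbrli:startDate>2016-01-01</xbrli:startDate><xbrli:endDate>2016-12-31</xbrli:endDate></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Revenues contextRef="FY2016" unitRef="USD" decimals="0">1250000</ex:Revenues>
  <ex:NetIncomeLoss contextRef="FY2016" unitRef="USD" decimals="0">180000</ex:NetIncomeLoss>
</xbrli:xbrl>"""

def extract_facts(xml_text):
    """Return {concept: (value, context, unit)} for each tagged numeric fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for elem in root:
        context = elem.get("contextRef")
        if context is None:
            continue  # contexts and units describe facts; they are not facts themselves
        concept = elem.tag.split("}", 1)[1]  # ElementTree tags look like "{namespace}Revenues"
        facts[concept] = (float(elem.text), context, elem.get("unitRef"))
    return facts

facts = extract_facts(INSTANCE)
print(facts["Revenues"])  # (1250000.0, 'FY2016', 'USD')
net_margin = facts["NetIncomeLoss"][0] / facts["Revenues"][0]
print(f"Net margin: {net_margin:.1%}")  # Net margin: 14.4%
```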

45 What kind of software and programs do Accountants Use for Big Data?

46 The majority of Big Data solutions are delivered:
Software-only, as an appliance (like Oracle), or in the cloud. Source:

47 Traditional Accounting Software
Excel, Access
Business Intelligence Software:
Large market: Oracle, SAP, Microsoft, IBM, SAS
Small market: Tableau, Qlik, Alteryx, Actuate
Internet of Things, Deep Learning, Predictive Sensor & Control Technology
FOB in

48 Common Infrastructure, Location Independence, Online Accessibility
Let's talk about common infrastructure, location independence, online accessibility, utility pricing, and on-demand resources.

49 Using SaaS (Software as a Service) cloud accounting vendors lets businesses take advantage of reduced infrastructure, employee, and hardware costs. Reputable cloud accounting systems are excellent for small businesses.

50 Business startups are often unaware of the threats surrounding the data they collect.
The cloud can protect them from hard drive failure and natural disasters…However it cannot protect them from themselves.

51 What can Accountants do to help protect and maintain the security of Big Data?

52 Time out for a Pop Quiz: How many of you recognize this movie?

53 In The Social Network (pictured on the previous slide), a social media entrepreneur is portrayed as having stolen an idea from classmates at Harvard University. A recent lawsuit against Fitbit was dismissed after a court found that Fitbit was not liable for information it received from ex-employees of Jawbone who brought over private files. A t-shirt seller running on autopilot used a data mining program to find popular phrases and automatically post t-shirts on Amazon with variations based on mined word counts. Starting from the phrase “Keep Calm and Carry On,” it accidentally posted t-shirts bearing an offensive variation. Source:

54 The cloud does not protect businesses from data piracy, hacking, human error and unauthorized use and outside sharing.

55 You cannot run on autopilot
Businesses that rely on simplified algorithms run the risk of embarrassing situations and of overlooking potential fraud. You cannot run on autopilot.

56 Important Note: Managers should be data-informed
Important Note: Managers should be data-informed. You cannot run on autopilot and let the data make your decisions. Front-line situations vary, and each situation is unique.

57 Cybersecurity in big data involves building good gates through tagging and identifying potential threats, outliers, and opportunities before they happen. It is proactive.

58 Organized, classified, properly tagged information is GOOD security: it can help you quickly find outliers. OUTLIERS MATTER. Link Reference:

59 To ensure cybersecurity, large quantities of data can be also used to build classifiers.
A classifier is a gate. A positive classification means that an item is malware; a negative classification means that it is a normal communication or data item. Programmers build classifiers by having access to, and an understanding of, what is good and what is bad. This is best managed when programmers work side-by-side with accountants and auditors: auditors design the gate, programmers build it. Placing locks and controls and organizing digital information is imperative to securing and protecting information. A reference page (link:) for those interested in learning how to remove outliers from data includes this Mathematica example, which generates a noisy data set and then injects a few outliers:
data1 = Table[PDF[NormalDistribution[3.5, .8], i], {i, -5, 15, .01}] + RandomReal[{100, 500}]; (* bell curve plus a random offset *)
noise = RandomReal[{-0.2, .2}, Length[data1]]; (* uniform noise, one value per point *)
data2 = data1 + noise;
n = RandomInteger[{1, Length[data2]}, RandomInteger[{2, 10}]]; (* 2 to 10 random positions *)
data2[[n]] = data2[[n]]*1.01; (* scale those points up by 1% to create outliers *)
ListPlot[{data2}, PlotRange -> All]
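The Mathematica example on this slide only creates outliers. This is a minimal Python sketch of the detection side. It uses the median-based "modified z-score", a common robust rule. The 3.5 cutoff is a conventional choice, not one taken from this presentation. The daily expense figures are hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Medians keep extreme values from masking themselves.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        return []  # no spread to measure against
    return [x for x in values if abs(0.6745 * (x - med) / mad) > threshold]

daily_expenses = [410, 395, 402, 420, 398, 405, 4_150, 415, 400, 392]  # hypothetical; one keying error
print(mad_outliers(daily_expenses))  # [4150]
```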

60

61 Diagram: Observation → Classifier → Spam/Violation/Outlier (positive) or Keep (negative)
An observation comes through office communication, and data analysis checks it against certain classifiers. If it passes through the classifier gate, it is kept; if it does not, it is classified as an outlier and checked to see whether it is malicious.
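A minimal sketch of such a gate. The rules are hypothetical examples of what an auditor might specify and a programmer might implement: an approved sender domain and a list of risky attachment types. A production malware classifier would be trained on labeled good and bad examples rather than hand-written.

```python
from dataclasses import dataclass

# Hypothetical gate rules an auditor might design and a programmer implement.
APPROVED_DOMAINS = {"example.com"}
RISKY_EXTENSIONS = {".exe", ".js", ".scr"}

@dataclass
class Message:
    sender: str
    attachment: str = ""

def classify(msg: Message) -> str:
    """Return 'positive' (flag as possible spam/violation) or 'negative' (keep)."""
    domain = msg.sender.rsplit("@", 1)[-1].lower()
    risky_attachment = any(msg.attachment.lower().endswith(ext) for ext in RISKY_EXTENSIONS)
    if domain not in APPROVED_DOMAINS or risky_attachment:
        return "positive"
    return "negative"

inbox = [Message("cfo@example.com", "q3_budget.xlsx"),
         Message("billing@examp1e.com", "invoice.pdf"),  # look-alike domain
         Message("it@example.com", "update.exe")]
for m in inbox:
    print(m.sender, classify(m))
```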

62

63 A traditional audit checks once a year to see whether the gates and internal controls are working properly. A continuous audit checks gates and controls daily through data analysis, seeking opportunities and threats and looking for outliers in the data. Auditors handle the exceptions and alarms (constant vigilance). Source: Dr. Vasarhelyi also presented at the AISE conference in Colorado Springs, 2016
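A continuous audit replaces the once-a-year look with checks that run every day. This is a minimal sketch that assumes two hypothetical controls: an approval limit of 10,000 and a duplicate-invoice check. The sample payments are invented. The function returns the exceptions an auditor would then handle.

```python
from collections import Counter

APPROVAL_LIMIT = 10_000  # assumption: payments above this amount need a named approver

def daily_exceptions(payments):
    """Run two simple control checks over one day's payments; return exception messages."""
    exceptions = []
    for p in payments:
        if p["amount"] > APPROVAL_LIMIT and not p.get("approver"):
            exceptions.append(f"{p['invoice']}: {p['amount']:,} exceeds the approval limit without an approver")
    for invoice, count in Counter(p["invoice"] for p in payments).items():
        if count > 1:
            exceptions.append(f"{invoice}: paid {count} times (possible duplicate)")
    return exceptions

todays_payments = [  # hypothetical sample data
    {"invoice": "INV-101", "amount": 2_500, "approver": "jlee"},
    {"invoice": "INV-102", "amount": 18_000, "approver": ""},
    {"invoice": "INV-103", "amount": 900},
    {"invoice": "INV-103", "amount": 900},
]
for message in daily_exceptions(todays_payments):
    print(message)  # each exception goes to an auditor for follow-up
```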

64 How do we prepare students for Big Data?

65 The AICPA lists 8 Tips for Teaching Big Data:
1. Follow the trendsetters (and they aren't who you think)
2. Do your homework
3. Study up (Coursera)
4. Familiarize yourself with software (Tableau, Apache, Hadoop)
5. Start small (start with a few rows of data to practice the concepts)
6. Incorporate real-life examples
7. Encourage visualization
8. This is not optional. Big Data is here to stay.

66 The AICPA lists this skillset to perform continuous audits:
1. Knowledge of business processes, controls, and inherent risks
2. Internal audit experience
3. Familiarity with audit planning, audit processes, and forensic accounting
4. An understanding of data extraction tools (IDEA, ACL)
5. Data analytics background (regression, ANOVA, data mining, SQL, probabilities)
6. Knowledge of statistics
7. Technical skills (ERP, programming)
8. Professional skepticism and judgment
Source:

67 Big Data Degree Offerings that pair accounting programs and data experts
Universities leading the charge: West Virginia University, University of Arkansas, San Diego State

68 Source Material: CURRICULUM PLANNING (Mostly) Free E-Learning
Online: Big Data University, Teradata University, Coursera, XBRL, Excel Power Pivot Add-in

69 Source material: PRACTICE datasets

70 Source Material: on your phone
Big Data University, Learn SQL/Python

71 Questions? Contact: Amber Gribbins 941-518-8828; gribbins@gmail.com
State College of Florida; Saint Leo University; Doctor of Business Administration (Accounting) candidate, Argosy University. Follow me on Quora: Amber Gribbins

