Big Data ---a statistician’s perspective Ming Ji, PhD College of Nursing USF.

Presentation transcript:

Big Data ---a statistician’s perspective Ming Ji, PhD College of Nursing USF

Disclaimer  I am not an expert in big data and cannot cover all the developments in big data

Big Data is Here  What is Big Data?  Data that are too big to handle  Data that challenge existing technology and methods to store, process and analyze.

Examples of Big Data  Science Data (CERN)  National Survey Data (NHANES, NHIS, ACS, CPS, NHGIS)  Genomic Data (microarray, DNA sequencing, GWAS, microbiome)  Clinical Data (EHR)  Sensor Data (mHealth)  Social Media (Facebook, Twitter, LinkedIn, websites, blogs)  Climate Data (NOAA)  Financial Data (stock trading, banking, insurance, mortgage, credit cards)

Characteristics of Big Data  Volume  Velocity  Variety  Veracity

Generation of Big Data  Employee generated  User generated  Machine generated

Volume --- Big Data is Big  2.7 zettabytes of data exist in the digital universe today.  Facebook stores, accesses, and analyzes 30+ petabytes of user-generated data.  In 2008, Google was processing 20,000 terabytes (20 petabytes) of data a day.  100 terabytes of data are uploaded to Facebook daily.  Data production will be 44 times greater in 2020 than it was in  In the last 5 years, more scientific data were generated than in all of previous human history.

Velocity  High speed of streaming data.
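
To make the velocity point concrete, here is a minimal sketch, assuming readings arrive one at a time and are too numerous to store. It uses Welford's online algorithm to keep a running mean and variance in constant memory. The `sensor_stream` generator and its values are hypothetical stand-ins for a live feed, not data from the talk.

```python
class RunningStats:
    """Welford's online algorithm: one pass over the data, constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new observation into the running summaries.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def sensor_stream():
    # Hypothetical feed, e.g. heart-rate readings from a wearable device.
    for reading in [72, 75, 71, 78, 74, 73]:
        yield reading


stats = RunningStats()
for value in sensor_stream():
    stats.update(value)  # each reading is seen once and then discarded

print(stats.n, stats.mean, stats.variance)
```

The same object can keep absorbing readings indefinitely, which is the property that matters when the stream never ends.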

Variety  Besides traditional structured data, such as numerical data sets stored in relational databases, big data are everywhere and come in many different formats, some of them unstructured.  Numerical data; audio; video; text messages; websites; blogs; imaging data; genomic data; environmental data; climate data; clinical data; handwriting, etc.

Veracity  Bias  Uncertainty  Abnormality

Challenges of Big Data  Big data require new data systems to transfer, store and process them (Hadoop, Storm, SAP, BigQuery, Amazon EC2)  Big data require new data analysis methods (big data analytics using data mining and statistical data analysis)  Big data challenge traditional statistical theory (Law of Large Numbers, Central Limit Theorem, n << p, i.e. far fewer observations than variables)  Big data challenge the traditional scientific research method (prediction-based vs. mechanism-based research; can big data replace traditional scientific research?)
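
One way to see why the n << p setting strains classical theory is a small simulation, sketched here under stated assumptions: the outcome and every feature are independent random noise, so any correlation found is spurious. The function names, sample sizes and seed are illustrative choices, not taken from the talk.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n, p, seed=0):
    """Largest |correlation| between a noise outcome and p noise features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        feature = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Same 20 observations in both cases; only the number of features changes.
few = max_spurious_correlation(n=20, p=5)
many = max_spurious_correlation(n=20, p=5000)
print(f"p=5: {few:.2f}   p=5000: {many:.2f}")
```

With thousands of candidate features and only 20 observations, some pure-noise feature will look strongly related to the outcome, which is why screening results in this regime need correction for multiple comparisons or validation on independent data.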

Personal View: Big Data is Still Data  Big data must follow the same principles of data management  1. Data collection (streaming data, sensors, GigaScience)  2. Data storage (Oracle, SAP, IBM, EMC, Hadoop, Storm, BigQuery, Amazon EC2 and EMR)  3. Data format conversion (voice2txt, txt2voice, natural language processing from unstructured to structured)  4. Data integration (data linkage, metadata)  5. Data privacy (privacy-preserving data mining, computer security)

Personal View: Big Data May Not Be Big Enough  GWAS have not identified genetic variants that reliably predict common diseases.  Whole genome sequencing fails to predict the risk of most common diseases. BMJ 

Personal View: Big Data Cannot Escape Statistical Principles  Collecting and analyzing data from any real-world process must follow the same principles of statistical study design and data analysis.  1. A big sample size does not remove bias (<- sampling).  2. Big data may not be big enough (failure of predictive models built from genomic data alone <- unmeasured confounders and underspecified models).  3. Not all big data are useful, and only a small subset is of interest to us --- finding a needle in a haystack (dimension reduction, Google’s MapReduce, real-time data analysis).
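
The first point above can be checked with a small simulation. This is a sketch under invented assumptions: a hypothetical population of daily step counts, and a data source (say, wearable users) in which more active people are more likely to be included. None of the numbers come from the talk.

```python
import random

rng = random.Random(42)

# Hypothetical population: daily step counts for 200,000 people.
population = [rng.gauss(6000, 2000) for _ in range(200_000)]
true_mean = sum(population) / len(population)


def included(steps):
    # Assumed selection mechanism: the chance of appearing in the
    # "big data" source grows with how active the person is.
    return rng.random() < min(1.0, max(0.0, steps / 12000))


# A very large but self-selected sample versus a small simple random sample.
big_biased = [s for s in population if included(s)]
small_random = rng.sample(population, 500)

biased_mean = sum(big_biased) / len(big_biased)
random_mean = sum(small_random) / len(small_random)

print(f"true mean:             {true_mean:8.1f}")
print(f"biased (n={len(big_biased)}): {biased_mean:8.1f}")
print(f"random (n={len(small_random)}):    {random_mean:8.1f}")
```

The self-selected sample is hundreds of times larger, yet its mean stays well above the truth no matter how many records are added, while the small random sample lands close to it. Sample size shrinks random error, not systematic error.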

Personal View: Big Data and Cybernetics  Big data will advance the further merging of humans and machines, as predicted by Norbert Wiener in his work on automation and human society (wearable technology, machine intelligence, hybrid decision making).  Systems science and information theory may be good theoretical models to guide us in building more big data systems for various applications (feedback, control, adaptation, information).

Personal View: Big data will boost computational sciences  Big data call for new hardware and software for computation (GPU, cloud computing, DNA computing, quantum computing)  Big data call for next-generation artificial intelligence to produce “smarter algorithms” to handle big data, because we humans cannot directly process big data. (Super Turing Machine)

The Future of Big Data: Hope or Hype?  We are at a crossroads. The true effect of big data on human society is yet to be seen.  And we cannot use predictive analytics to predict the future of big data.

How do we use big data in our research?  Think Big: Can you use historically collected and archived big data? (genomic data, large national surveys, NOAA climate data, etc.)  Think Measurement: Do you have measurement devices that can generate big data? (sensors, images, videos, genomics, climate, etc.)  Think Multidisciplinary: Do you have experts from other disciplines (informatics, computer science, engineering, biology, mathematics, statistics, etc.) to work on big data?

Case Studies of Big Data: IBM Watson  Sloan Kettering Cancer Center doctors are training IBM Watson to be an expert in cancer diagnosis and treatment based on learning:  Over 600,000 diagnostic reports  Two million pages of medical journal articles  One and a half million patient records  14,700 hours of hands-on training

Case Study: Quantified Self led by Larry Smarr  Participants in the Quantified Self movement use different devices to collect physical activity, sleep, diet and gut microbiome data to monitor their own health, and use the data analysis results to work with their doctors on interventions.  Larry Smarr considers this the future of disease prevention.

Case Study: Use big data to fight fraud in Medicare and Medicaid  CMS estimated that 65 billion dollars in Medicare and Medicaid funds were lost to fraud in 2011  Fraud detection algorithms are implemented in large claims data systems to flag suspicious, potentially fraudulent cases. (Real-time fraud detection, fraud detection using social network data)  The Health Care Fraud and Abuse Control Program reported having recovered 4.2 billion dollars 
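
The slide does not say which algorithms are used, so the following is only a toy sketch of one common idea, flagging claims whose amounts are extreme relative to their peers. It uses a robust z-score built from the median and the median absolute deviation (MAD). The function name, the 3.5 threshold and the claim amounts are all hypothetical; this is not the method used by CMS.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose robust z-score exceeds the threshold."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    # 0.6745 rescales the MAD so that, for normally distributed data,
    # the score is comparable to an ordinary z-score.
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical claim amounts in dollars for one procedure code.
claims = [120, 135, 110, 128, 140, 125, 4800, 131, 118, 122]
flagged = flag_outliers(claims)
print(flagged)  # indices of claims to route for manual review
```

The median and MAD are used instead of the mean and standard deviation because a single very large claim would inflate the latter and could mask itself. A flag here means only "worth a human look", not a finding of fraud.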