Prescriptive Analytics Part I Nick Gonzalez, 2/10/14
- Isaac Asimov “It is change, continuing change, inevitable change, that is the dominant factor in society today. No sensible decision can be made any longer without taking into account not only the world as it is, but the world as it will be.”
Topics Covered: Reference automated prescriptive analytics system; automated algorithm selection; distributed algorithm development
Covered in future presentations: Ontology creation and extraction; representing solutions using ontologies; business optimization; everything else…
Today’s Data Landscape
Tomorrow’s Data Landscape
Data is outpacing us: Humans cannot keep up. Computers can, but…
Example: Video Games [Diagram. Labels: game server, game metrics, learning process, understanding, predictive models, rules, simulations; write, start, build / update models, modify, generate, deploy, copy to production. Regions: user space, analytics space.]
Problems: Scale; speed; adaptability
- Isaac Asimov “I do not fear computers. I fear the lack of them.”
Goals: Remove the human element from analysis phases; generate accurate, actionable, predictive models; combine predictive models and simulation to solve problems
Guiding Principle: Big data with simple algorithms will outperform sampled data with complex algorithms.
How is this possible? Focus on a single problem. Limit scope. The goal must be measurable and actionable.
Process Data Data Engineering & Understanding Modeling Prep Simulation Actionable Deployment
1. Automated Understanding: Find the data representation best suited to the problem you are trying to solve.
Automated Understanding: Raw Data; Clean Data; Initial Transform; Stats meta
Automated Understanding: Clean Data; Stats meta; Representation A; Representation B; Representation C; A.1 …; A.2 …
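The two slides above move from clean data, through summary statistics, to several candidate representations. A minimal Python sketch of that idea follows. The slides do not say which statistics or transforms the system uses, so the function names (`column_stats`, `candidate_representations`) and the three transforms (min-max scaling, z-score, `log1p`) are illustrative assumptions, not the presenter's method.

```python
import math
import statistics


def column_stats(values):
    """Summary statistics for one cleaned numeric column (assumed 'stats meta')."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def candidate_representations(values):
    """Build several candidate representations of the same column.

    Each transform is only offered when the statistics say it is valid,
    so the stats drive which representations exist.
    """
    stats = column_stats(values)
    reps = {"raw": list(values)}
    span = stats["max"] - stats["min"]
    if span > 0:
        reps["min_max"] = [(v - stats["min"]) / span for v in values]
    if stats["stdev"] > 0:
        reps["z_score"] = [(v - stats["mean"]) / stats["stdev"] for v in values]
    if stats["min"] >= 0:
        reps["log1p"] = [math.log1p(v) for v in values]
    return stats, reps


# Hypothetical cleaned column, for illustration only.
clean = [1.0, 2.0, 3.0, 4.0, 10.0]
stats, reps = candidate_representations(clean)
for name, rep in reps.items():
    print(name, [round(v, 3) for v in rep])
```

Each candidate could then be scored against the problem at hand, with the best-scoring one kept.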
2. Automated Algorithm Selection: Find the algorithm that performs best against the problem you are trying to solve, while meeting all criteria.
Automated Algorithm Selection: Choose algorithms best suited for this type of problem. Consider the data: types, sparsity, size, and desired outcome. Try multiple algorithms. Calculate the root mean squared error (RMSE) or some other appropriate measure. Consider the problem domain. Use cross-validation. Do not just compare the average RMSE. Choose the algorithm(s) that perform best.
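The selection steps above can be sketched in plain Python: try several algorithms, score each with k-fold cross-validation, and look at the spread of the per-fold RMSE rather than the average alone. Everything here is an assumption made for illustration: the two toy models (`fit_mean`, `fit_line`), the synthetic data, and the "mean plus one standard deviation" ranking rule. The slides do not specify a ranking rule.

```python
import math
import random
import statistics


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Return one RMSE score per fold for the given fitting function."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return scores


def summarize(scores):
    """Keep the spread and the worst fold, not only the average."""
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.pstdev(scores),
        "worst": max(scores),
    }


# Synthetic data for illustration only: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

candidates = {"mean_baseline": fit_mean, "least_squares_line": fit_line}
results = {name: summarize(cross_validate(fit, xs, ys)) for name, fit in candidates.items()}

# Assumed ranking rule: penalize unstable models by adding one standard deviation.
best = min(results, key=lambda name: results[name]["mean"] + results[name]["stdev"])
print(best, results[best])
```

Keeping the per-fold scores makes it possible to reject an algorithm whose average looks fine but whose worst fold is unacceptable for the problem domain.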
Distributed Processing: Learning to Scale
Approaching the Problem: Two ways to approach a problem: bottom up and top down.
Bottom Up Approach: Hardware; Assembly Language; C, Pascal; C++, Java; Design Patterns, Algorithms; Programmer
Top Down: Problem Solver; Problem Representation; Distributed System Abstractions; Functional Languages; Hardware
Building Distributed Algorithms: Identify the simplest concepts that describe data processing: collections and collection processing. (Stack: Problem Solver; Problem Representation; Distributed System Abstractions; Functional Languages; Hardware)
Evolution of thought [Diagram. Labels: Single “Box”; Data; Algorithm; Collection; Collection Processing; No “Box”]
Coming together: map, mapcat, reduce, filter, sort, group; Hadoop, Single PC, MPI, …; k-means, density, random forest, gradient boost, …
Distributed Processing Interface: Simple concept; focus on building algorithms; many ways to implement this concept; works with both shared-memory and distributed-memory systems
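One way to picture such an interface is sketched below in Python (the talk's implementation is in Clojure and is not shown in the slides). The `Collection` class, its method names, and the `LocalCollection` single-PC backend are hypothetical. They only illustrate the idea that an algorithm is written once against collection operations while the backend decides where the data lives.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from functools import reduce as _reduce
from itertools import chain


class Collection(ABC):
    """Hypothetical interface: algorithms see only these operations."""

    @abstractmethod
    def map(self, fn): ...

    @abstractmethod
    def mapcat(self, fn): ...

    @abstractmethod
    def filter(self, pred): ...

    @abstractmethod
    def group(self, key_fn): ...

    @abstractmethod
    def reduce(self, fn, initial): ...


class LocalCollection(Collection):
    """Single-PC backend that keeps its items in an in-memory list."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, fn):
        return LocalCollection(fn(x) for x in self._items)

    def mapcat(self, fn):
        # Apply fn to each item and concatenate the resulting sequences.
        return LocalCollection(chain.from_iterable(fn(x) for x in self._items))

    def filter(self, pred):
        return LocalCollection(x for x in self._items if pred(x))

    def group(self, key_fn):
        # Result items are (key, list_of_items) pairs.
        groups = defaultdict(list)
        for x in self._items:
            groups[key_fn(x)].append(x)
        return LocalCollection(groups.items())

    def reduce(self, fn, initial):
        return _reduce(fn, self._items, initial)


def word_counts(lines: Collection) -> dict:
    """An algorithm written only against the Collection interface."""
    return (
        lines.mapcat(str.split)
        .filter(lambda w: w.isalpha())
        .group(lambda w: w.lower())
        .map(lambda kv: (kv[0], len(kv[1])))
        .reduce(lambda acc, kv: {**acc, kv[0]: kv[1]}, {})
    )


counts = word_counts(LocalCollection(["big data simple algorithms", "Big Data"]))
print(counts)
```

A distributed backend would subclass `Collection` with the same five methods, and `word_counts` would run unchanged against it.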
Implementation: Functional language (Clojure); reusable functions as callbacks; Hadoop drivers written on top of Cascalog; data location and type are abstracted as a “collection”
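To illustrate "reusable functions as callbacks", here is a hedged Python sketch of a single k-means iteration (k-means is one of the algorithms the deck names). This is not the Clojure or Cascalog code from the talk. The helper names and the `map_fn` parameter are assumptions that stand in for whatever a backend driver would supply.

```python
from collections import defaultdict


def nearest_centroid(centroids):
    """Return a callback that maps a point to (centroid index, point)."""

    def assign(point):
        distances = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        ]
        return distances.index(min(distances)), point

    return assign


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_step(points, centroids, map_fn=map):
    """One k-means iteration.

    `map_fn` is a hypothetical hook: the built-in map runs locally,
    while a distributed backend could pass its own map implementation.
    """
    groups = defaultdict(list)
    for index, point in map_fn(nearest_centroid(centroids), points):
        groups[index].append(point)
    # A centroid with no assigned points keeps its previous position.
    return [
        mean_point(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = kmeans_step(points, [(1.0, 1.0), (9.0, 9.0)])
print(centroids)
```

Because the assignment logic is an ordinary function passed to `map_fn`, the same callback can be reused whether the points sit in a local list or behind another backend.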
- Isaac Asimov “Part of the inhumanity of the computer is that once it is completely programmed and working smoothly, it is completely honest.”