Data Extraction in Tableau


1 Data Extraction in Tableau
Like Pulling Teeth: Data Extraction in Tableau
Richard Wesley | Tableau Software
14 November 2017

2 Quick Summary
Tableau is designed for interactive, ad-hoc visualization of structured data
Sharing of ground-truth resources results in database overloading and a poor user experience
Copying the data into extracts presents its own set of challenges
The visualization queries created by real-world users can drive unexpected performance problems

3 Visual Query Language
Tableau is an interface for converting visual specifications into queries:
  SELECT Candidate, AVG(Amount)
  FROM FEC
  WHERE Date > # # AND StillRunning is true
  GROUP BY Candidate
Interactivity can generate a lot of complex queries
What is Tableau? Drag and Drop interface. Interactive performance.
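To make the idea of turning a visual specification into a query concrete, here is a minimal sketch. It is not Tableau's actual query generator: the `VisualSpec` class and `to_sql` function are hypothetical names invented for illustration, and the date filter from the slide is left out because its literal is not given.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualSpec:
    """Hypothetical description of what the user dragged onto the view."""
    table: str
    dimensions: List[str]                 # fields used to group (e.g. on rows)
    measures: List[Tuple[str, str]]       # (aggregate, column) pairs
    filters: List[str] = field(default_factory=list)  # raw predicate strings


def to_sql(spec: VisualSpec) -> str:
    """Build one aggregate query from the specification."""
    select = spec.dimensions + [f"{agg}({col})" for agg, col in spec.measures]
    sql = f"SELECT {', '.join(select)} FROM {spec.table}"
    if spec.filters:
        sql += " WHERE " + " AND ".join(spec.filters)
    if spec.dimensions:
        sql += " GROUP BY " + ", ".join(spec.dimensions)
    return sql


spec = VisualSpec(
    table="FEC",
    dimensions=["Candidate"],
    measures=[("AVG", "Amount")],
    filters=["StillRunning is true"],
)
sql = to_sql(spec)
```

Every change to the view (a new filter, another dimension) produces a new specification and therefore a new query, which is why interactivity can generate many queries in quick succession.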

4 A Brief History of Extracts

5 In the Beginning…
Tableau talks directly to over 60 database engines
These connections are usually to shared data resources
Widespread use of analytics in an organisation overloads the database
This makes both users and IT unhappy!

6 Version 1: Microsoft Access
Move querying to the user’s machine
Offline analysis
Connectivity already existed
Many problems:
  Cumbersome export process
  Slow
  Flaky
  Not full SQL (Joins)
  Semantic incompatibility

7 Version 2: Contexts
[Diagram: User Fields, Connection Fields]
Separate the data model from the connection
Allow anything with the same schema to implement the data model
Copy the data into a remote table
Preserves semantics
Problems:
  Remote database still overloaded
  Permissions
  Persistence/reuse management

8 Version 3: Firebird
Pros:
  Full SQL-92 relational engine
  Rich, extensible type system
  Small executable footprint (2MB)
  Single-file database format
  OSS licence allowed us to debug our peculiar issues
Cons:
  Tiny function library
  Semantic incompatibilities
  No hash operators
  Poor performance
Contexts allowed us to properly implement extracts

9 Version 4: Server Extract Engines
Allow Tableau Server admins to use an existing engine for extracts:
  Firebird (default)
  Postgres (part of the server product)
  MySQL (LAMP era)
  MS-SQL (popular)
  Vertica (fast analytic database)
Endless problems:
  Semantics still a problem
  Shared resources still overloaded

10 Version 5: The Tableau Data Engine
Block-oriented column iterator
Similar to VectorWise
Careful use of metadata
Operate on compressed data
Only implement the pieces we need
Analytic query focus
Tableau function semantics
No transactions
String dictionaries
Problems after seven years:
  Insertion speed vs compression
  Scalability
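The combination of string dictionaries and operating on compressed data can be illustrated with a small sketch. This is an assumption about the general technique (dictionary encoding), not the Tableau Data Engine's actual storage format; `dictionary_encode` and `filter_equals` are hypothetical helper names.

```python
def dictionary_encode(values):
    """Replace each string with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def filter_equals(dictionary, codes, wanted):
    """Find matching rows without decompressing the column.

    The string is looked up in the dictionary once; the scan itself
    compares integers only.
    """
    try:
        target = dictionary.index(wanted)
    except ValueError:
        return []  # value does not occur anywhere in the column
    return [row for row, code in enumerate(codes) if code == target]


states = ["WA", "CA", "WA", "OR", "CA", "WA"]
dictionary, codes = dictionary_encode(states)
wa_rows = filter_equals(dictionary, codes, "WA")
```

The sketch also hints at the insertion trade-off named on the slide: a sorted dictionary has to be rebuilt or remapped when new values arrive.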

11 Version 6: Hyper
Data-centric row iterator
Advantages:
  High speed data insertion
  Improved performance
  Better scalability
  Real query compiler
  Transactions
Challenges:
  Some performance gaps
  Storage footprint
  Lots of history to emulate

12 Federated Live Query
Tableau implements federation of live queries
Get the best of all “versions”
Live query still very important
  Extraction takes time
  Fresh data and schema
  Security issues
  Broader exploration
Extraction issues still relevant
  Semantic matching
  Server load
  String processing
Federation engine to optimise data movement.

13 Query Frustrations
Having our own database allowed us to work around some common query frustrations.

14 String Taxonomy
Strings are at least four different things:
  Symbols: Smallish domains of categorical values (state name)
  Character Blobs: Description, comment, PDF
  Composites: URL, address, name
  Formatted Scalars: Date, debit
[Diagram: String branching into Symbol, Clob, Composite, Formatted]

15 Symbols
Common dimensional attribute type
Users love symbols because they let them think verbally
Databases hate symbols because they are slow to group
  Imperfect hashing
  Looped comparisons
  Collation!
The Tableau Data Engine was optimized for symbols
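One way an engine can make grouping by symbols cheap is to group on dictionary codes instead of on the strings themselves. The sketch below illustrates that general idea under the assumption that the column was dictionary-encoded once at extract time; it is not the Tableau Data Engine's implementation, and `encode` and `sum_by_symbol` are hypothetical names.

```python
def encode(values):
    """One-time encoding step (done when the extract is built, in this sketch)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def sum_by_symbol(symbols, amounts):
    """GROUP BY symbol, SUM(amount), using codes as direct array indexes."""
    dictionary, codes = encode(symbols)
    totals = [0] * len(dictionary)
    # The aggregation loop touches only integers: no string hashing,
    # no character-by-character comparison, no collation rules.
    for code, amount in zip(codes, amounts):
        totals[code] += amount
    return dict(zip(dictionary, totals))


totals = sum_by_symbol(["WA", "CA", "WA", "OR"], [10, 5, 7, 3])
```

The strings are consulted again only at the end, once per group, to label the result.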

16 Formatting in the Database
Query languages have string formatting functions
Users love them because they don’t have to learn a second formatting language
Causes poor performance because formatting is applied to every row
Then they group by them…
The TDE solved this by treating decompression as a join
Frequently formatting is functionally dependent, but you only know this if you know the domain.
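A rough sketch of what "treating decompression as a join" can buy: if a column is stored as a dictionary plus codes, a formatting function only needs to run once per distinct value, and each row then picks up its result by code. The function names and the counting list below are assumptions made for illustration, not TDE internals.

```python
from datetime import date

format_calls = []  # records every invocation so the cost is visible


def year_month(d):
    """Stand-in for a database string-formatting function."""
    format_calls.append(d)
    return f"{d.year:04d}-{d.month:02d}"


def format_via_dictionary(dictionary, codes, fmt):
    """Format each distinct value once, then join the rows to the results."""
    formatted = [fmt(value) for value in dictionary]  # once per distinct value
    return [formatted[code] for code in codes]        # the "join" by code


dictionary = [date(2017, 10, 31), date(2017, 11, 14)]
codes = [0, 1, 1, 0, 1]  # five rows, two distinct dates
labels = format_via_dictionary(dictionary, codes, year_month)
```

Here five rows cost two formatting calls instead of five. A later GROUP BY on the formatted value can likewise work from the small formatted dictionary rather than from every row.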

17 Complex Filtering
Tableau allows users to create complex filters
  Multidimensional sets
  Sets defined by aggregates
  Multiple levels of detail
  Joins used extensively
Solutions
  Materialize sets for reuse
  Session-scoped temporary tables
  String joins on computed columns
  Join indexes between dictionaries
  Hubbard sieves

18 Visualization is Statistical
Databases come from the world of accounting
Exact answers in accounting are a kind of checksum
Visualizations are imprecise
  Qualitative not quantitative
  Statistical summaries (aggregation)
  Screen resolution
Non-accounting data is often a sample to begin with
Want faster results but with accuracy guarantees

19 Better Statistical Infrastructure
Missing holistic aggregates
  mode, quantile, mad, Gini, skew…
  Windowing versions
  SQL standard needs to stop telling us to compute them badly (e.g. sorting for median…)
Store approximate summaries
  HyperLogLog
  Sketching
Data quality not linear with size
  “Big Data” suggests it is…
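As an illustration of the complaint about sorting for the median, the sketch below computes a median with a selection algorithm (quickselect), which runs in expected linear time instead of sorting the whole input. This is one well-known alternative chosen for the example; the slides do not say which method the speaker had in mind.

```python
import random


def quickselect(values, k, rng):
    """Return the k-th smallest element (0-based) without a full sort."""
    items = list(values)
    while True:
        pivot = items[rng.randrange(len(items))]
        lows = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(lows):
            items = lows                      # answer is among smaller values
        elif k < len(lows) + equal_count:
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lows) + equal_count      # skip past lows and pivots
            items = [x for x in items if x > pivot]


def median(values):
    """Median via selection; averages the two middle values for even counts."""
    n = len(values)
    if n == 0:
        raise ValueError("median of empty input")
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    if n % 2 == 1:
        return quickselect(values, n // 2, rng)
    lower = quickselect(values, n // 2 - 1, rng)
    upper = quickselect(values, n // 2, rng)
    return (lower + upper) / 2


odd_median = median([5, 1, 4, 2, 3])
even_median = median([4, 1, 3, 2])
```

The same selection routine gives any quantile by choosing a different `k`.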

20 Discussion Time


