Apache Avro CMSC 491 Hadoop-Based Distributed Computing Spring 2016 Adam Shook.



Overview
Avro is a data serialization system
Implemented in C, C++, C#, Java, JavaScript, Perl, PHP, Python, and Ruby

Avro Provides
Rich data structures
Compact, fast, binary data format
A container file to store persistent data
Remote Procedure Call (RPC)
Simple integration with dynamic languages

Schema Declaration
A schema is declared in JSON as one of:
A JSON string, naming a defined type
A JSON object
  – {"type": "typeName" ...attributes...}
A JSON array, representing a union of types
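The three declaration forms can be told apart just by parsing the JSON. This is a minimal sketch using only Python's standard `json` module; the function name `schema_form` is made up for illustration and is not part of any Avro library.

```python
import json


def schema_form(schema_json: str) -> str:
    """Report which of the three JSON forms a schema declaration uses."""
    schema = json.loads(schema_json)
    if isinstance(schema, str):
        return "named or primitive type"
    if isinstance(schema, dict):
        return "type object: " + str(schema["type"])
    if isinstance(schema, list):
        return "union of %d types" % len(schema)
    raise ValueError("not a schema declaration")


if __name__ == "__main__":
    print(schema_form('"string"'))
    print(schema_form('{"type": "array", "items": "string"}'))
    print(schema_form('["string", "null"]'))
```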

Primitive Types
Null
Boolean
Int
Long
Float
Double
Bytes
String

Complex Types
Records
Enums
Arrays
Maps
Unions
Fixed

Record Example - LinkedList
{
  "type": "record",
  "name": "LongList",
  // old name for this
  "aliases": ["LinkedLongs"],
  "fields" : [
    // each element has a long
    {"name": "value", "type": "long"},
    // optional next element
    {"name": "next", "type": ["LongList", "null"]}
  ]
}
Comments are here for descriptive purposes only – there are no comments in JSON
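To see what data shaped like LongList looks like, here is a small sketch that parses the schema (with the comments removed, so it is valid JSON) and builds a matching datum out of nested Python dicts. The helper names `to_long_list` and `from_long_list` are hypothetical, and the sketch does not use an Avro library or validate against the schema.

```python
import json

# The LongList schema from the slide, without the // comments.
LONG_LIST_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["LongList", "null"]}
  ]
}
""")


def to_long_list(values):
    """Build a nested-dict datum shaped like LongList from a non-empty sequence."""
    node = None  # the "null" branch of the union terminates the list
    for v in reversed(values):
        node = {"value": v, "next": node}
    return node


def from_long_list(node):
    """Walk the 'next' links and collect every 'value'."""
    out = []
    while node is not None:
        out.append(node["value"])
        node = node["next"]
    return out


if __name__ == "__main__":
    datum = to_long_list([1, 2, 3])
    print(json.dumps(datum))
    print(from_long_list(datum))
```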

Enum Example – Playing Cards
{
  "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
}

Array
{ "type": "array", "items": "string" }

Maps
{ "type": "map", "values": "long" }

Unions
Represented using JSON arrays
  – ["string", "null"] declares a schema which may be a string or null
May not contain more than one schema of the same type, except for the named types: record, fixed, and enum
  – Two arrays or two maps? No. Two record types with different names? Yes!
May not immediately contain other unions
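The union rules can be sketched as a small check over the parsed JSON array. This is a simplified illustration, not the validation an Avro library performs: the names `union_key` and `check_union` are invented here, and references to named types given as plain strings are compared only by the string itself.

```python
NAMED_TYPES = {"record", "enum", "fixed"}


def union_key(branch):
    """Return the identity by which a union branch is compared."""
    if isinstance(branch, list):
        raise ValueError("a union may not directly contain another union")
    if isinstance(branch, str):
        return branch  # primitive name, or a reference to a named type
    kind = branch["type"]
    if kind in NAMED_TYPES:
        return (kind, branch["name"])  # named types are distinguished by name
    return kind  # array, map, ...: at most one of each


def check_union(branches):
    """Raise ValueError if the union breaks a rule, otherwise return True."""
    seen = set()
    for branch in branches:
        key = union_key(branch)
        if key in seen:
            raise ValueError("duplicate union branch: %r" % (key,))
        seen.add(key)
    return True


if __name__ == "__main__":
    print(check_union(["string", "null"]))
    two_records = [
        {"type": "record", "name": "A", "fields": []},
        {"type": "record", "name": "B", "fields": []},
    ]
    print(check_union(two_records))
```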

Fixed
{ "type": "fixed", "size": 16, "name": "md5" }

A bit on Naming
Records, enums, and fixed types are all named
The full name is composed of the name and a namespace
  – Names start with [A-Za-z_] and can only contain [A-Za-z0-9_]
  – Namespaces are a dot-separated sequence of names
Named types can be aliased to map a writer’s schema to a reader’s schema
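The naming rules translate directly into a regular expression. The sketch below uses hypothetical helper names (`is_valid_name`, `is_valid_namespace`, `full_name`) and assumes the convention that a name which already contains a dot is treated as a full name.

```python
import re

# Starts with [A-Za-z_], then any number of [A-Za-z0-9_].
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def is_valid_namespace(namespace: str) -> bool:
    """A non-empty namespace is a dot-separated sequence of valid names."""
    return all(is_valid_name(part) for part in namespace.split("."))


def full_name(name: str, namespace: str = "") -> str:
    """Combine a name and a namespace into a full name."""
    if "." in name or not namespace:
        return name  # already a full name, or no namespace to prepend
    return namespace + "." + name


if __name__ == "__main__":
    print(full_name("User", "example.avro"))
    print(is_valid_name("md5"), is_valid_name("9lives"))
```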

Encodings!
Binary
JSON
One is more readable by the machines, one is more readable by the humans
Details of how they are encoded can be found at
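As a taste of the binary encoding: int and long values are written with variable-length zig-zag coding, so small magnitudes (positive or negative) take a single byte. A minimal sketch, with function names chosen here for illustration:

```python
def encode_long(n: int) -> bytes:
    """Zig-zag, then emit 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long for a buffer that starts with one encoded value."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo the zig-zag


if __name__ == "__main__":
    for value in (0, -1, 1, -64, 64):
        print(value, encode_long(value).hex())
```

In the JSON encoding the same values are simply written as JSON numbers.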

Compression
Null
Deflate
Snappy (optional)
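The deflate codec uses raw deflate (RFC 1951), that is, without the zlib header and checksum. With Python's standard `zlib` module this corresponds to a negative `wbits` value. The sketch below only shows compressing and restoring one block of bytes; it does not write an Avro container file, and the function names are made up for the example.

```python
import zlib


def deflate(block: bytes) -> bytes:
    """Compress with raw deflate (wbits=-15: no zlib header or checksum)."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(block) + compressor.flush()


def inflate(block: bytes) -> bytes:
    """Restore a raw-deflate block."""
    return zlib.decompress(block, -15)


if __name__ == "__main__":
    data = b"SPADES HEARTS DIAMONDS CLUBS " * 100
    packed = deflate(data)
    print(len(data), "->", len(packed))
    assert inflate(packed) == data
```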

Other Features
RPC via Protocols
  – Message passing between readers and writers
Schema Resolution
  – When schema and data don’t align
Parsing Canonical Form
  – Transform schemas into PCF to determine “sameness” between schemas
Schema Fingerprints
  – To “uniquely” identify schemas
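The idea behind canonical forms and fingerprints can be sketched as follows: strip attributes that do not affect how data is read, put the remaining keys in a fixed order, drop whitespace, and hash the result. This is a deliberately simplified approximation and not the full Parsing Canonical Form (for example, it does not expand names into full names). The function names are hypothetical, and SHA-256 is used here as one possible fingerprint hash.

```python
import hashlib
import json

# Keys kept in the normalized form, in the order they are written.
KEY_ORDER = ["name", "type", "fields", "symbols", "items", "values", "size"]


def simplify(schema):
    """Drop non-essential attributes (doc, aliases, ...) and fix key order."""
    if isinstance(schema, list):
        return [simplify(branch) for branch in schema]
    if not isinstance(schema, dict):
        return schema
    kept = {}
    for key in KEY_ORDER:
        if key not in schema:
            continue
        value = schema[key]
        if key in ("type", "items", "values", "fields"):
            kept[key] = simplify(value)
        else:
            kept[key] = value
    if list(kept) == ["type"] and isinstance(kept["type"], str):
        return kept["type"]  # {"type": "int"} becomes "int"
    return kept


def canonical(schema_json: str) -> str:
    """Normalized schema text with no insignificant whitespace."""
    return json.dumps(simplify(json.loads(schema_json)), separators=(",", ":"))


def fingerprint(schema_json: str) -> str:
    """Hex SHA-256 digest of the normalized schema text."""
    return hashlib.sha256(canonical(schema_json).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = '{"type": "fixed", "size": 16, "name": "md5", "doc": "a digest"}'
    b = '{"name":"md5","size":16,"type":"fixed"}'
    print(canonical(a))
    print(fingerprint(a) == fingerprint(b))
```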

Code Generation!
~]$ cat user.avsc
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}

Code Generation!
~]$ java -jar avro-tools jar compile \
      schema user.avsc .
Input files to compile:
  user.avsc
~]$ vi example/avro/User.java
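Generated classes handle serialization for you, but it can help to see what the bytes for one User record look like. The sketch below hand-encodes the body of a single record for the user.avsc schema: a string is its byte length followed by UTF-8 data, a union is the index of the chosen branch followed by that branch's value, and a record is just its fields in order. The function names are hypothetical, the sample values are invented, and the output is a bare record body rather than a container file.

```python
def encode_long(n: int) -> bytes:
    """Variable-length zig-zag coding, used for both int and long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Length in bytes (as a long), then the UTF-8 data."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_optional(value, encode_value) -> bytes:
    """Encode a [T, "null"] union: branch index, then the value (null adds no bytes)."""
    if value is None:
        return encode_long(1)  # index 1 is "null" in ["int", "null"]
    return encode_long(0) + encode_value(value)


def encode_user(user: dict) -> bytes:
    """Concatenate the fields in schema order."""
    return (
        encode_string(user["name"])
        + encode_optional(user["favorite_number"], encode_long)
        + encode_optional(user["favorite_color"], encode_string)
    )


if __name__ == "__main__":
    sample = {"name": "Alyssa", "favorite_number": 256, "favorite_color": None}
    print(encode_user(sample).hex())
```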

Java and Python Demo! demos/tree/master/avro demos/tree/master/avro
