Avro Apache Course: Distributed class Student ID: AM20144203 Name: Azzaya Galbazar 2014.12.17.

1 Avro Apache Course: Distributed class Student ID: AM20144203 Name: Azzaya Galbazar 2014.12.17

2 Overview: What is Avro?
- Avro is an Apache open source project that provides two services for Hadoop: data serialization and data exchange. It is a relatively recent serialization system.
- Interoperability:
  - Can serialize into the Avro binary or Avro JSON encoding.
  - Supports reading and writing Protocol Buffers and Thrift data.

3 Overview: What does Avro provide?
- Rich data structures, with schemas defined in JSON.
- A compact, fast binary data format.
- A container file to store persistent data.
- Remote procedure call (RPC).
- Simple integration with dynamic languages.
- Code generation is not required to read or write data files, nor to use or implement RPC protocols.
- Code generation is an optional optimization, only worth implementing for statically typed languages.

4 Overview
- Avro uses JSON as its interface description language (IDL):
  - To specify data types.
  - To specify protocols.
- Review: JavaScript Object Notation (JSON) is a lightweight, text-based standard for data interchange.

5 Overview: Why the need for Avro?
- Its primary use is in Hadoop, where it provides a standard:
  - Serialization format for persistent data.
  - Wire format for communication:
    - Among Hadoop nodes.
    - From client programs to Hadoop services.

6 Overview
- Avro relies on schemas.
  - The schema is stored with the data.
  - Each datum is written with no per-value overhead, so serialization is fast and the output is small.
- Avro in RPC:
  - Client and server exchange schemas during the connection handshake.
  - Because both sides have the full schemas, correspondence between fields can be easily resolved.
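To make "no per-value overhead" concrete, here is a minimal sketch that hand-encodes one record the way the Avro binary encoding lays it out: field values only, in schema order, with no field names or type tags. It uses only the Java standard library, not the Avro API. The class name AvroBinarySketch and the two-field record (title: string, pages: int) are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AvroBinarySketch {
    // int and long values are zig-zag encoded (so small negative numbers stay small)
    // and then written 7 bits at a time, low-order group first.
    public static void writeLong(long value, ByteArrayOutputStream out) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // high bit set: more bytes follow
            n >>>= 7;
        }
        out.write((int) n);
    }

    // A string is its UTF-8 byte length (encoded as a long) followed by the bytes.
    public static void writeString(String s, ByteArrayOutputStream out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical record {title: string, pages: int}: values only, in schema order.
    public static byte[] encodeBook(String title, int pages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(title, out);
        writeLong(pages, out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeBook("Avro", 64)) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "08 41 76 72 6f 80 01"
    }
}
```

The whole record takes seven bytes. A reader can only interpret them with the schema in hand, which is why the schema travels with the data file or is exchanged in the RPC handshake.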

7 Overview: APIs
- Avro provides APIs for: Java, C, C++, C#, Python, Ruby.

8 Specification
- A schema is represented in JSON by one of:
  - A JSON string, naming a defined type.
  - A JSON object of the form {"type": "typeName", ...attributes...}.
  - A JSON array, representing a union of embedded types.
- Primitive types: null, boolean, int, long, float, double, bytes, string.
- Complex types: records, enums, arrays, maps, unions, fixed.

9 Apache Avro with Maven (Java)
- Apache Maven is a software project management and comprehension tool.
- Based on the concept of a project object model (POM), Maven can manage a project's build, reporting, and documentation from a central piece of information.

10 Apache Avro with Maven (Java)
1. Add two entries to pom.xml: the Apache Avro library as a dependency, and the Avro Maven plugin, which generates Java classes from schemas.
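The original slide showed the pom.xml as an image, so the fragment below is a sketch of what those two entries typically look like, not a copy of the author's file. The version number and the source and output directories are assumptions; adjust them to your project.

```xml
<properties>
  <!-- Assumption: any Avro release works here; 1.7.7 was current in late 2014. -->
  <avro.version>1.7.7</avro.version>
</properties>

<dependencies>
  <!-- The Avro library itself. -->
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>${avro.version}</version>
  </dependency>
</dependencies>

<build>
  <plugins>
    <!-- Generates Java classes from the schema files during the build. -->
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>${avro.version}</version>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
          <configuration>
            <!-- Assumed locations for the .avsc files and the generated sources. -->
            <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Strictly speaking only the first entry is a Maven dependency; the second is a build plugin, so it belongs under build/plugins.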

12 Apache Avro with Maven (Java)
2. Define a schema. Note: a schema file can only contain a single schema definition.
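The schema on the original slide was an image and is not in the transcript. As an illustration only, a schema file for the book record used in the later steps might look like the following; the namespace, the file name (for example book.avsc), and the field names are all hypothetical.

```json
{
  "namespace": "example.avro",
  "type": "record",
  "name": "Book",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "author", "type": ["null", "string"], "default": null},
    {"name": "pages", "type": "int"}
  ]
}
```

This one file exercises all three JSON forms from the specification slide: the record is a JSON object, "string" and "int" are JSON strings naming primitive types, and ["null", "string"] is a JSON array declaring a union, here used to make the author field optional.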

13 Apache Avro with Maven (Java)
3. Serialize to and deserialize from a file. The example serializes a book to a file, then deserializes it and prints it to the output.

15 Apache Avro with Maven (Java)
Describing the classes used:
- DatumWriter converts a Java object into an in-memory serialized format.
- SpecificDatumWriter extracts the schema from the specified generated type.
- DataFileWriter writes the serialized records, as well as the schema, to the file.
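The slide's own code was an image, so the following is a sketch of the same write-then-read round trip rather than the author's program. To stay runnable without generated classes it swaps in Avro's generic API (GenericDatumWriter and GenericRecord) in place of the SpecificDatumWriter named above; DataFileWriter and DataFileReader are used the same way in both cases. The class name BookFileExample, the inline schema, and the field values are assumptions, and the Avro library must be on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

public class BookFileExample {
    // Hypothetical schema, inlined so the example is self-contained.
    static final String BOOK_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Book\",\"namespace\":\"example.avro\","
        + "\"fields\":[{\"name\":\"title\",\"type\":\"string\"},"
        + "{\"name\":\"pages\",\"type\":\"int\"}]}";

    public static String roundTrip(File file) throws IOException {
        Schema schema = new Schema.Parser().parse(BOOK_SCHEMA);

        // Build one record without any generated class.
        GenericRecord book = new GenericData.Record(schema);
        book.put("title", "Avro");
        book.put("pages", 64);

        // The DatumWriter serializes records; the DataFileWriter writes
        // the schema and the serialized records to the container file.
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
            fileWriter.create(schema, file);
            fileWriter.append(book);
        }

        // The reader needs no schema argument: it uses the one stored in the file.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        try (DataFileReader<GenericRecord> fileReader = new DataFileReader<>(file, datumReader)) {
            GenericRecord read = fileReader.next();
            return read.get("title") + ":" + read.get("pages");
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("book", ".avro");
        file.deleteOnExit();
        System.out.println(roundTrip(file));
    }
}
```

With classes generated by the Maven plugin, the same flow would use SpecificDatumWriter and SpecificDatumReader parameterized by the generated type in place of the generic variants.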

16 Apache Avro with Maven (Java)
4. Running the example code.
5. Result output.

17 Thank you for your attention

