Comma Separated Values

1 Comma Separated Values
CSV: Comma Separated Values

2 Goals for these videos
Understand the distinction between a schema and a database instance.
Understand three commonly used file formats.

3 Comma Separated Values
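A delimited flat file.
Stores tabular data (numbers and text) as plain text.
Each line is a record.
Each record is a list of fields, separated by commas.
There is no formal standard, only convention. (RFC 4180 documents the common format, but it is an informational RFC, not a standard.)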
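As a rough sketch (Python, standard library only; the sample data is hypothetical), the definition above can be read literally with `str.split(",")` on each line, or with Python's `csv` module. For simple data like this both produce the same list of records. Note that every field comes back as a string, because CSV stores everything as plain text; deciding that "21" is a number is up to the program.

```python
import csv
import io

# Hypothetical sample data: each line is a record, fields are comma-separated.
text = "name,age\njames,21\nlauren,19\n"

# Reading 1: a literal implementation of the definition, one split per line.
naive = [line.split(",") for line in text.splitlines()]

# Reading 2: the standard-library parser, which also understands quoting rules.
parsed = list(csv.reader(io.StringIO(text)))

print(naive)   # → [['name', 'age'], ['james', '21'], ['lauren', '19']]
print(parsed)  # → [['name', 'age'], ['james', '21'], ['lauren', '19']]
```

The literal `split` reading breaks as soon as a field contains a quoted comma, which is what the edge cases on slide 4 address.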

4 CSV Edge Cases
Fields can be enclosed in double quotes: "josh","2016"
Fields containing an embedded comma (,), double quote (") or newline character must be enclosed in double quotes: "Nahum, Josh"
An embedded double quote must be escaped by preceding it with an additional double quote: "Josh said, ""Hi"" to us!"
The first line of the file may be a header containing the column names. You need contextual information to tell whether this is the case.
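Python's `csv.writer` applies these quoting rules automatically with its default `excel` dialect (minimal quoting, embedded quotes doubled). The sketch below uses the slide's own example fields, shows the encoded line, and checks that reading it back recovers the original values.

```python
import csv
import io

# Fields from the slide's edge cases: an embedded comma and embedded double quotes.
row = ["Nahum, Josh", 'Josh said, "Hi" to us!', "2016"]

buffer = io.StringIO()
writer = csv.writer(buffer)  # default "excel" dialect: quote only when needed, double inner quotes
writer.writerow(row)
encoded = buffer.getvalue()
print(repr(encoded))
# → '"Nahum, Josh","Josh said, ""Hi"" to us!",2016\r\n'

# Reading the line back recovers the original field values.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == row)  # → True
```

Only the fields that need it are quoted; `2016` is written bare. For the header question, Python's `csv.Sniffer().has_header(sample)` can guess, but it is a heuristic, so contextual information remains the reliable source.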

5 CSV Example
CSV contents:
To,Subject,Message
Up,"Do it, Do it now"
Quotes"," Are they allowed?"
Table contents (columns To, Subject, Message):
Sign Up | Do it, Do it now
"Scare" Quotes | allowed?
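When the first line is a header, as with `To,Subject,Message` here, Python's `csv.DictReader` uses it to turn each later record into a mapping from column name to value, which is essentially the "table contents" view. The email address in the `To` column and the exact rows below are hypothetical, chosen to exercise an embedded comma and doubled "scare" quotes.

```python
import csv
import io

# Hypothetical message file; the first line is a header naming the columns.
text = (
    "To,Subject,Message\n"
    'alex@example.com,Sign Up,"Do it, Do it now"\n'
    'alex@example.com,"""Scare"" Quotes",Are they allowed?\n'
)

# DictReader consumes the header and maps each column name to that row's value.
records = list(csv.DictReader(io.StringIO(text)))

for record in records:
    print(record["Subject"], "->", record["Message"])
# → Sign Up -> Do it, Do it now
# → "Scare" Quotes -> Are they allowed?
```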

6 Well-Formed CSV
Which of these lines are well-formed (legal) lines in a CSV file?
Josh,Nahum,48823
Hi Class!,Friday,2016
"\"Stop\" he said",Josh
New York City,40°42'46"N,74°00'21"W
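Because CSV has no formal standard, the answer depends on which rules you hold a line to. Below is a minimal sketch of a checker that applies the quoting rules from slide 4 (in the style of RFC 4180) to a single line, assuming no embedded newlines; `is_well_formed` is a hypothetical helper, not a library function. Under these rules a backslash does not escape a quote, and a bare double quote may not appear inside an unquoted field.

```python
def is_well_formed(line: str) -> bool:
    """Check one CSV line against slide-4-style quoting rules (no embedded newlines)."""
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: scan to the closing quote, treating "" as an escaped quote.
            i += 1
            while True:
                if i >= n:
                    return False  # unterminated quoted field
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        i += 2  # escaped double quote, stay inside the field
                        continue
                    i += 1  # closing quote
                    break
                i += 1
        else:
            # Unquoted field: runs to the next comma; a bare double quote is not allowed.
            while i < n and line[i] != ",":
                if line[i] == '"':
                    return False
                i += 1
        # After each field we must be at the end of the line or at a comma.
        if i == n:
            return True
        if line[i] != ",":
            return False  # e.g. text right after a closing quote
        i += 1  # skip the comma and parse the next field


SLIDE_LINES = [
    "Josh,Nahum,48823",
    "Hi Class!,Friday,2016",
    r'"\"Stop\" he said",Josh',
    "New York City,40°42'46\"N,74°00'21\"W",
]
for candidate in SLIDE_LINES:
    print(is_well_formed(candidate), candidate)
# → True, True, False, False (in that order)
```

A stricter or looser reader may disagree: Python's `csv.reader`, for instance, by default accepts a double quote in the middle of an unquoted field, so the last line would parse there.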

7 CSV Schema 1.0
CSV Schema defines a textual language that can be used to define the data structure, types and rules for a data format. For instance, we may want to constrain which values are legal in a given column. The CSV format itself is very permissive, so we need a second document to define what constitutes "valid" data. There is a working draft of a CSV schema language by the National Archives of the UK, found at preservation.github.io/csv-schema/.

8 Example CSV Schema
version 1.0
@totalColumns 3
name: notEmpty
age: range(0, 120)
gender: is("m") or is("f")

Valid CSV Data
name,age,gender
james,21,m
lauren,19,f
simon,57,m
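To make the schema's rules concrete, here is a hand-written Python sketch of the same three column rules plus the `@totalColumns 3` check. It is not an implementation of the CSV Schema language or of any National Archives tool; the names `validate`, `in_range`, and `RULES` are hypothetical, and the sketch assumes that `range(0, 120)` is inclusive, that rules apply to columns by position, and that the first line is a header.

```python
import csv
import io

TOTAL_COLUMNS = 3  # mirrors @totalColumns 3


def in_range(value: str, low: float, high: float) -> bool:
    """range(low, high), assumed here to be an inclusive numeric range."""
    try:
        number = float(value)
    except ValueError:
        return False
    return low <= number <= high


# Hypothetical hand-written equivalents of the schema's column rules, in column order.
RULES = [
    ("name", lambda v: v != ""),             # name: notEmpty
    ("age", lambda v: in_range(v, 0, 120)),  # age: range(0, 120)
    ("gender", lambda v: v in ("m", "f")),   # gender: is("m") or is("f")
]


def validate(text: str) -> list[str]:
    """Return error messages; an empty list means the data passes every rule."""
    rows = list(csv.reader(io.StringIO(text)))  # parse the CSV first, then check rules
    errors = []
    for line_no, row in enumerate(rows, start=1):
        if len(row) != TOTAL_COLUMNS:
            errors.append(f"line {line_no}: expected {TOTAL_COLUMNS} columns, got {len(row)}")
            continue
        if line_no == 1:
            continue  # header row: only its width is checked
        for (name, rule), value in zip(RULES, row):
            if not rule(value):
                errors.append(f"line {line_no}: {name}={value!r} breaks its rule")
    return errors


VALID = "name,age,gender\njames,21,m\nlauren,19,f\nsimon,57,m\n"
print(validate(VALID))  # → []
print(validate("name,age,gender\n,130,x\n"))  # three errors, one per column
```

Keeping the parsing step (`csv.reader`) separate from the rule checks mirrors the distinction between well-formed and valid data.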

9 Well-Formed versus Valid
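Well-formed means the data conforms to the file format (e.g. CSV). Valid means the data also conforms to a schema, which is more restrictive than the format. For example, the line james,200,m is well-formed CSV but is not valid under the example schema, because 200 is outside range(0, 120).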

10 Whitespace
Do these two lines represent the same record/content?
Josh,Nahum,48823
Josh, Nahum, 48823
Yes / No / Depends
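One reason the answer can be "Depends": whether the space after each comma belongs to the field is up to the reader. RFC 4180 treats spaces as part of a field, and Python's `csv.reader` keeps them by default; its `skipinitialspace=True` option drops spaces that immediately follow a delimiter. A small sketch using the slide's two lines:

```python
import csv

line_a = "Josh,Nahum,48823"
line_b = "Josh, Nahum, 48823"

plain = next(csv.reader([line_a]))
spaced = next(csv.reader([line_b]))  # default: spaces after commas are kept
trimmed = next(csv.reader([line_b], skipinitialspace=True))  # spaces after commas dropped

print(plain)    # → ['Josh', 'Nahum', '48823']
print(spaced)   # → ['Josh', ' Nahum', ' 48823']
print(trimmed)  # → ['Josh', 'Nahum', '48823']
```

`skipinitialspace` only removes spaces right after a comma; spaces before a comma are still kept as part of the field.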
