Text Editing Kim Shepherd Digital Development Team The University of Auckland Library Tools, tips, tricks LIANZA ITSIG webinar.


1 Text Editing Kim Shepherd Digital Development Team The University of Auckland Library Tools, tips, tricks LIANZA ITSIG webinar series

2 Summary
General (large) text files
– We manage and manipulate text data daily
– It’s tedious and time consuming
– Find & Replace is too limited and dangerous
– We know there must be a better way...
Tabular data files (e.g. spreadsheets)
– We work with these all the time, usually in Excel
– What tools can help us clean messy data?

3 Topics
Regular Expressions
Text Editors
Operating on lines, not entire files
Google Refine

4 Regular Expressions /^\s+[a-zA-Z0-9](?:\W+)/
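To see what the pattern on this slide accepts, here is a minimal Python sketch. It reads the pattern as: leading whitespace, one ASCII letter or digit, then one or more non-word characters in a non-capturing group. The sample strings and the helper name `leading_match` are invented for illustration and are not from the webinar.

```python
import re

# The slide's pattern, without the /.../ delimiters used by Perl, sed and Vim.
PATTERN = re.compile(r"^\s+[a-zA-Z0-9](?:\W+)")

def leading_match(text):
    """Return the part of text the pattern matches, or None if there is no match."""
    m = PATTERN.match(text)
    return m.group(0) if m else None

# Hypothetical sample inputs, chosen to show one match and two non-matches.
samples = ["  a, b", "abc", "  ab"]
for s in samples:
    print(repr(s), "->", repr(leading_match(s)))
```

Only the first sample matches: "abc" has no leading whitespace, and in "  ab" the single letter is followed by another word character rather than punctuation or a space.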

5 Regular Expressions
A way to describe a set of strings and capture parts of them
Originated in old UNIX/POSIX tools
Now used all over the place
Test your regexes out on the web:
– http://gskinner.com/RegExr/
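As a small illustration of "describe a set of strings and capture parts of them", the sketch below uses two capture groups to reorder a name. The "Surname, Given" input format and the function name `flip_name` are assumptions made up for this example.

```python
import re

# Two capture groups: the surname before the comma, the given name after it.
NAME = re.compile(r"^(\w+),\s*(\w+)$")

def flip_name(line):
    """Rewrite 'Surname, Given' as 'Given Surname'; leave other lines unchanged."""
    return NAME.sub(r"\2 \1", line)

print(flip_name("Shepherd, Kim"))
print(flip_name("no comma here"))
```

Lines that do not fit the described shape pass through untouched, which is what makes this kind of replacement safer than a plain Find & Replace.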

6 Text Editors & Useful Languages sed, grep, awk

7 Text Editors
Word processors aren’t text editors
Shop around, compare features
My favourite: Vim (UNIX, Windows, Mac)
– Wikipedia comparison of editor features
– Wikipedia list of regex software

8 Useful Languages / Interpreters
Perl
– An old favourite, great for string manipulation
Python
– The cool kids tell me it’s better than Perl
GREL
– We’ll get to this later...

9 Line-by-line processing while( ) {.... }

10 Line-by-line processing
Large files are large!
– If they’re big on disk, they’ll be big in memory
Lines are (usually!) small
– Read a line
– Do something with it
– Output the modified line
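The read, modify, output loop above can be sketched in Python roughly as follows. The function names and the whitespace-squeezing transform are hypothetical, and in-memory streams stand in for real files so the example is self-contained.

```python
import io
import re

def process_lines(source, sink, transform):
    """Read source one line at a time, apply transform, write the result to sink."""
    for line in source:  # file-like objects yield one line per iteration
        sink.write(transform(line))

def collapse_spaces(line):
    """Example transform: squeeze runs of spaces and tabs into a single space."""
    return re.sub(r"[ \t]+", " ", line)

# With real data you would pass objects returned by open(...) instead.
src = io.StringIO("a   b\nc\t\td\n")
dst = io.StringIO()
process_lines(src, dst, collapse_spaces)
print(dst.getvalue(), end="")
```

Because only one line is held at a time, memory use depends on the longest line rather than on the size of the file.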


12 Google Refine
Cleans messy tabular data
– Easy faceting and filtering of columns/values
– Easy transformation of values
Google Refine Expression Language (GREL)
– Extensive use of regular expressions and other standard string manipulation techniques
Other features
– Perform web service calls directly, reconcile row IDs
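For readers without Google Refine to hand, the idea behind faceting and value transformation can be approximated in plain Python. This is a rough analogue only, not GREL and not how Refine is implemented; the column values and the function names `clean` and `facet` are invented for the example.

```python
from collections import Counter

def clean(value):
    """Trim the ends, collapse internal whitespace, then normalise case."""
    return " ".join(value.split()).title()

def facet(values):
    """Count how often each distinct value occurs, similar to a text facet."""
    return Counter(values)

# Hypothetical messy column, made up for this sketch.
column = ["Auckland ", "auckland", "  Wellington", "AUCKLAND"]

print(facet(column))                    # four distinct raw values
print(facet(clean(v) for v in column))  # variants merged after cleaning
```

Comparing the two counts shows the workflow the slide describes: the facet exposes inconsistent variants of the same value, and a transformation merges them.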


14 Conclusion
Our problems are solvable!
– Regular expressions
– Decent text editors for general/unformatted text
– Google Refine for tabular data
Contact me
– Please feel free to contact me with questions, corrections or ideas
– Google+:

