Text Editing Kim Shepherd firstname.lastname@example.org Digital Development Team The University of Auckland Library Tools, tips, tricks LIANZA ITSIG webinar series
Summary
General (large) text files
–We manage and manipulate text data daily
–It’s tedious and time-consuming
–Find & Replace is too limited and dangerous
–We know there must be a better way...
Tabular data files (e.g. spreadsheets)
–We work with these all the time, usually in Excel
–What tools can help us clean messy data?
Topics
Regular Expressions
Text Editors
Operating on lines, not entire files
Google Refine
Regular Expressions
A way to describe a set of strings and capture parts of them
Originated in old UNIX/POSIX tools
Now used all over the place
Test your regexes out on the web:
–http://gskinner.com/RegExr/
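As a minimal sketch of "describe a set of strings and capture parts of them", the following Python snippet uses the standard `re` module. The sample citation strings, the pattern and the `parse` helper are made up for illustration and are not taken from the webinar.

```python
import re

# Hypothetical sample data: "Surname, Initials (Year)" strings.
records = [
    "Shepherd, K. (2012)",
    "Smith, J. A. (1999)",
    "not a citation",
]

# One pattern describes the whole set of matching strings;
# the parentheses capture the surname, the initials and the year.
pattern = re.compile(r"^(\w+), ((?:[A-Z]\. ?)+)\((\d{4})\)$")

def parse(record):
    """Return (surname, initials, year), or None if the record does not match."""
    match = pattern.match(record)
    if match is None:
        return None
    surname, initials, year = match.groups()
    return surname, initials.strip(), int(year)

parsed = [parse(r) for r in records]
```

Records that do not fit the pattern come back as `None`, which makes it easy to list the lines that still need attention by hand.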
Text Editors
Word processors aren’t text editors
Shop around, compare features
My favourite: Vim (UNIX, Windows, Mac)
–Wikipedia comparison of editor features
–Wikipedia list of regex software
Useful Languages / Interpreters
Perl
–An old favourite, great for string manipulation
Python
–The cool kids tell me it’s better than Perl
GREL
–We’ll get to this later...
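A minimal Python sketch of operating on lines rather than on an entire file at once, which is the style of processing that scripting languages such as Perl and Python make easy. The function name `replace_in_lines`, the sample text and the pattern are assumptions for illustration. The `\b` word-boundary anchors show one way a regex can avoid the over-eager matches of a plain Find & Replace.

```python
import io
import re

def replace_in_lines(lines, pattern, replacement):
    """Yield each line with the regex substitution applied, one line at a time."""
    compiled = re.compile(pattern)
    for line in lines:
        yield compiled.sub(replacement, line)

# Hypothetical input; a real script would pass an open file object instead.
source = io.StringIO("the cat sat\nconcatenate the files\n")

# \b matches only at word boundaries, so the "cat" inside
# "concatenate" is left untouched.
result = list(replace_in_lines(source, r"\bcat\b", "dog"))
```

Because the function is a generator, it never holds more than one line in memory, so the same approach should scale to large text files.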
Google Refine
Cleans messy tabular data
–Easy faceting and filtering of columns/values
–Easy transformation of values
Google Refine Expression Language (GREL)
–Extensive use of regular expressions and other standard string manipulation techniques
Other features
–Perform web service calls directly, reconcile row IDs
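To make faceting and value transformation a little more concrete, here is a rough Python analogue using only the standard library. It is not GREL and it is not how Google Refine is implemented; the column values and the `normalise` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical messy column of publisher names.
column = ["Auckland  Univ. Press", "auckland univ. press ", "Penguin", "PENGUIN"]

def normalise(value):
    """Trim, collapse runs of whitespace and lower-case a cell value."""
    return re.sub(r"\s+", " ", value.strip()).lower()

# A facet is essentially a count of each distinct value in a column.
facet_before = Counter(column)
facet_after = Counter(normalise(v) for v in column)
```

Before the transformation the facet shows four distinct values; afterwards the variants collapse into two, which is the kind of clean-up a faceted view makes visible.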
Conclusion
Our problems are solvable!
–Regular expressions
–Decent text editors for general/unformatted text
–Google Refine for tabular data
Contact me
–Please feel free to contact me with questions, corrections or ideas
–email@example.com@auckland.ac.nz
–Twitter: @kimshepherd
–Google+: firstname.lastname@example.org