© 2006 KDnuggets 152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"


2b: Unix Tools for Web Log Analysis
[16/Nov/2005:16:32: ] "GET /jobs/ HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR )"
[16/Feb/2006:00:06: ] "GET / HTTP/1.1" " 740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
[16/Feb/2006:00:06: ] "GET /kdr.css HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
[16/Feb/2006:00:06: ] "GET /images/KDnuggets_logo.gif HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"

Unix
Unix is a powerful operating system with a rich set of tools well suited to log analysis.
Many flavors: Linux, Cygwin (a Unix-like environment for Windows), SunOS, …
We use basic commands that should work the same on all flavors, and we explain only the basic options.
For more, see man (manual page). E.g. man sort gives the details for the sort command.

Unix: cat – print a file
cat file -- print the file to standard output
Examples use file a.txt:
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html

Unix: cat – print a file
cat file -- print the file to standard output
% cat a.txt
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html
Here % is the Unix prompt, cat a.txt is the command, and the lines that follow are its output.
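
To follow along with the examples, the sample file can be recreated with a here-document. This is a minimal sketch: the scratch directory from mktemp is an assumption made here so that no existing a.txt is overwritten, and it is not part of the original slides.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the six-line sample file used on the slides.
cat > a.txt <<'EOF'
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html
EOF

# Print it back, as the slide does.
cat a.txt
```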

Unix: head – first n lines
head -n file -- print the first n lines of file; if -n is omitted, print the first 10 lines.
Here n stands for a number, as in head -2; the standard spelling on current systems is head -n 2.
Example:
% head -2 a.txt
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg

Unix: cut – select a column
cut -d"c" -f n file -- extract field (column) n from file, where fields are separated by the delimiter character c; -f also accepts a list of fields, such as -f1,3.
Example:
% cut -d" " -f1 a.txt
ip75
ip75
ip02
ip75
ip30
ip02
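
The same approach carries over to a full web log line. The sketch below applies cut to the complete request line shown at the top of this transcript. The field numbers (1 for the client IP, 7 for the requested path, 9 for the status code) are an assumption that holds for this layout when splitting on single spaces; other log formats would need different numbers.

```shell
# Sample request line, copied from the top of this transcript.
line='152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"'

# Split on single spaces and keep the client IP (field 1), the requested
# path (field 7), and the HTTP status code (field 9).
fields=$(printf '%s\n' "$line" | cut -d' ' -f1,7,9)
echo "$fields"
```

Because the timestamp and the quoted request each contain spaces, they occupy several fields, which is why the path is field 7 rather than field 4.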

Unix: sort – sort a file
sort file -- sort the file in ascending order
Example:
% sort a.txt
ip02 09:56:01 /
ip02 09:56:03 /software.html
ip30 09:56:02 /test.htm
ip75 09:56:00 /a.jpg
ip75 09:56:00 /index.html
ip75 09:56:02 /b.jpg

Unix: sort – sort a file, 2
sort -t"d" -k n file -- sort by field n, where fields are separated by the delimiter character d. The sort key runs from field n to the end of the line; use -k n,n to compare field n alone.
Example:
% sort -t" " -k 3 a.txt
ip02 09:56:01 /
ip75 09:56:00 /a.jpg
ip75 09:56:02 /b.jpg
ip75 09:56:00 /index.html
ip02 09:56:03 /software.html
ip30 09:56:02 /test.htm
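
One detail of -k is worth seeing in action: -k n uses everything from field n to the end of the line as the key, while -k n,n uses field n alone. The sketch below sorts the sample file by time (field 2) both ways. The scratch directory and the LC_ALL=C setting are assumptions added here to keep the result reproducible.

```shell
# Scratch copy of the slides' sample file a.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > a.txt <<'EOF'
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html
EOF

# Fix the collation order so results do not depend on the user's locale.
LC_ALL=C
export LC_ALL

# Key = field 2 through end of line: equal times are ordered by the URL.
to_end=$(sort -t' ' -k2 a.txt)

# Key = field 2 only: equal times fall back to comparing the whole line,
# so the IP in field 1 decides.
time_only=$(sort -t' ' -k2,2 a.txt)

printf '%s\n\n%s\n' "$to_end" "$time_only"
```

The two results differ only for the requests made at 09:56:02, where the tie is broken by the URL in the first case and by the IP in the second.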

Unix: | (pipe) – combine commands
command1 | command2 -- send the output of command1 to be the input of command2
Example:
% sort -t" " -k 3 a.txt | head -1
ip02 09:56:01 /

Unix: uniq – unique lines
uniq -c file -- collapse each run of adjacent identical lines into one, which is why the input is sorted first; the -c option also prefixes each line with its count.
Example: the following commands get a unique list of IP addresses, and then the same list with counts.
% cut -d" " -f1 a.txt | sort | uniq
ip02
ip30
ip75
% cut -d" " -f1 a.txt | sort | uniq -c
2 ip02
1 ip30
3 ip75
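
A common next step in log analysis is to rank the counts. The sketch below extends the pipeline above with a second sort, using -r (reverse) and -n (numeric), two standard sort options that the slides do not cover. The scratch directory is an assumption added for this example.

```shell
# Scratch copy of the slides' sample file a.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > a.txt <<'EOF'
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html
EOF

# Count requests per IP (field 1), then order by count, largest first.
# uniq only merges adjacent duplicates, so the first sort is required.
ranked=$(cut -d' ' -f1 a.txt | sort | uniq -c | sort -rn)
echo "$ranked"

# The busiest client: first line, with uniq's leading padding removed.
top=$(printf '%s\n' "$ranked" | head -n 1 | sed 's/^ *//')
echo "$top"
```

Changing -f1 to -f3 would rank requested pages instead of clients.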

Unix: wc – word/line count
wc file -- count lines, words, and characters in file; with the -l option, count only lines.
Note: -l is ell (lowercase L), not the digit one.
% cat a.txt | wc
% cat a.txt | wc -l
6
% cut -d" " -f1 a.txt | sort | uniq | wc -l
3
The last command counts the number of unique values in the first column.
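
When wc is used inside a script, it helps to know that it prints the file name when given a file argument but not when reading standard input. The sketch below shows both forms on the sample file. The scratch directory and the variable names are assumptions made for this illustration.

```shell
# Scratch copy of the slides' sample file a.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > a.txt <<'EOF'
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html
EOF

# With a file argument, wc prints the file name after the count.
wc -l a.txt

# Reading from standard input leaves out the name, which is easier to use
# in scripts; tr strips the padding some implementations add.
lines=$(wc -l < a.txt | tr -d ' ')

# Plain wc reports lines, words, and bytes, in that order.
set -- $(wc < a.txt)
words=$2
bytes=$3
echo "lines=$lines words=$words bytes=$bytes"
```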

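Unix: sed – stream editor
sed command [file...] -- a very powerful stream editor that applies an editing command to each line of its input and writes the result to standard output; the file itself is not changed.
E.g. to change the first "a" on each line of a.txt to "b", we can use
% cat a.txt | sed 's/a/b/'
Append g, as in 's/a/b/g', to change every "a" on the line.
To change "/index.html" to "/", use
% cat a.txt | sed 's/index.html//'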
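
The substitute command has a few variations that matter for log cleanup. The sketch below shows the difference between replacing the first match and every match, and a stricter version of the /index.html rewrite. The input strings and variable names are made up for this illustration.

```shell
# First match only: just the first "a" on the line is replaced.
first=$(echo '/a/data.html' | sed 's/a/b/')

# g flag: every "a" on the line is replaced.
all=$(echo '/a/data.html' | sed 's/a/b/g')

# Anchored rewrite with | as the delimiter: \. matches a literal dot and
# $ ties the match to the end of the line.
home=$(echo 'ip75 09:56:00 /index.html' | sed 's|/index\.html$|/|')

printf '%s\n%s\n%s\n' "$first" "$all" "$home"
```

Using | instead of / as the delimiter avoids having to escape the slashes in the path.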

Unix: gzip, gunzip, zcat – compress or expand files
gzip file -- compress file
gunzip file -- uncompress file
zcat file -- uncompress file and print it to standard output
Log files are generally very large and are stored in compressed form. The zcat command lets you process them in a pipeline without storing an uncompressed copy on disk.
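
A compressed log can feed the earlier pipelines directly. The sketch below compresses a copy of the sample file and counts unique IPs from the compressed copy. It uses gzip -dc, which decompresses to standard output as zcat does and is used here because on some systems zcat expects a different file suffix. The scratch directory and file names are assumptions for this example.

```shell
# Scratch copy of the slides' sample file a.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > a.txt <<'EOF'
ip75 09:56:00 /index.html
ip75 09:56:00 /a.jpg
ip02 09:56:01 /
ip75 09:56:02 /b.jpg
ip30 09:56:02 /test.htm
ip02 09:56:03 /software.html
EOF

# Compress a copy, keeping the original (-c writes to standard output).
gzip -c a.txt > a.txt.gz

# Decompress to standard output and feed the pipeline directly;
# no uncompressed copy of the log is written to disk.
unique_ips=$(gzip -dc a.txt.gz | cut -d' ' -f1 | sort | uniq | wc -l | tr -d ' ')
echo "$unique_ips"
```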

Unix: man – manual page
man command -- print the manual page for command
You can usually also find a manual page by googling for "unix man command".