Administering Splunk 4.2 Ver. 1.0

Document usage guidelines Should be used only for enrolled students Not meant to be a self-paced document Do not distribute March 24, 2011

Class Goals Describe Splunk installation and server operations Configure data inputs Describe default processing and understand how to modify data inputs Manage Splunk datastores Add users, configure groups, and understand authentication Describe alert configurations Configure forwarding/receiving and clustering Use Splunk’s Deployment Server Manage jobs and knowledge objects Find out where to get help

Course Outline Installing Splunk Configuring Data Inputs Modifying Data Inputs Config File Precedence Splunk's Data Stores Users, Groups, and Authentication Forwarding and Receiving Distributed Environments Licensing Security Jobs, Knowledge Objects, and Alerts Troubleshooting

Section 1: Installing Splunk

Section objectives List Splunk’s hardware/software requirements Describe how to install Splunk Perform server basics; starting, stopping, and restarting Splunk Describe the Splunk license model List the basic tools to configure Splunk: Manager, CLI, and editing config files Describe apps Upgrade to 4.2 List what’s new in Splunk 4.2 for administrators

OS requirements Splunk works on Windows, Linux, Solaris, FreeBSD, MacOS X, AIX, and HP-UX Check current documentation for specifics for each OS

Hardware Requirements
Non-Windows OS – Recommended: 2x quad-core Xeon, 3 GHz, 8 GB RAM, RAID 0 or 1+0, with a 64-bit OS installed; Minimum: 1x 1.4 GHz CPU, 1 GB RAM
Windows – Minimum: Pentium 4 or equivalent at 2 GHz, 2 GB RAM
Tell students that hard drive and file system requirements will be covered in the index section. Please take with a grain of salt: high volume/high user installs will have greater requirements and/or multi-server installs. Also note that minimum configuration implies minimum performance.

Supported browsers Firefox 2.x, 3.0.x, 3.5.x (3.5.x only supported on 4.0.6 and later) Internet Explorer 6, 7, & 8 Safari 3 Chrome 9 All browsers need Flash 9 to render reports and display the flash timeline

Download the bits Download Splunk from www.splunk.com/download (login required) Online installation instructions are available from the download page Obtain your enterprise license from sales or support

Download the right bits There are 32 and 64 bit versions, get the right one The wrong version may install, but won’t run Various packages, tarballs, and installers are available for each OS

Install it! For zipped tarballs, simply unpack the contents into the directory where you want to install Splunk For Windows, just double click on the MSI file See the docs for OS specific packages and Windows command line install instructions The Splunk install directory is referred to as $SPLUNK_HOME in both the docs and courseware UNIX default is /opt/splunk Windows default is C:\Program Files\splunk

Step by step instructions www.splunk.com/base/Documentation/latest/Installation/Chooseyourplatform

UNIX: to be or not to be root? Splunk can be installed as any user If you do not install as root, remember The Splunk account must be able to access the data sources /var/log is not typically open to non-root accounts Non-root accounts cannot access ports < 1024, so don’t use them when you configure data sources Make sure the Splunk account can access scripts used for inputs and alerts

Windows: local or domain user? 2 choices in Windows: local user OR domain account Local user will have full access ONLY to the local system You must use a domain account for Splunk if you want to: Read Event Logs remotely Collect performance counters remotely Read network shares for log files Enumerate the Active Directory schema using Active Directory Monitoring See the docs for details

Splunk subdirectories Executables are located in $SPLUNK_HOME/bin License and other important files are in $SPLUNK_HOME/etc Indexes by default are in $SPLUNK_HOME/var/lib/splunk Same directories in Windows, just different slashes

Splunk directory structure
$SPLUNK_HOME
  bin – executables
  etc – licenses, configs
    system
    apps – search, launcher, <custom app>
    users
  var
    lib
      splunk – indexes

Windows: Starting Splunk Upon successful installation, you can choose to add Splunk to the start menu Tell students that Windows installer automatically starts Splunk at the end of install.

Windows: controlling Splunk services Splunk installs 2 services splunkd and Splunk Web Start and stop them as you would any service Both are set to startup automatically You can also control Splunk from the command line C:\Program Files\Splunk\bin>splunk start C:\Program Files\Splunk\bin>splunk stop C:\Program Files\Splunk\bin>splunk restart

UNIX: Starting Splunk The command for using/managing Splunk is $SPLUNK_HOME/bin/splunk The first time you start Splunk, avoid the prompt to accept the license by using the command line flag --accept-license
# pwd
/opt/splunk/bin
# ./splunk start
# ./splunk start --accept-license
Be sure to mention the two dashes in --accept-license. Also mention that it’s useful for scripted installs

UNIX: controlling Splunk processes Stopping/starting Splunk Restarting Splunk Is Splunk running? # ./splunk start # ./splunk stop # ./splunk restart # ./splunk status or # ps –ef | grep splunk

UNIX: run Splunk at boot Splunk comes with a command to enable it to start at boot This modifies or adds a script to /etc/init.d that will automatically start Splunk when the OS starts Even if you didn’t install Splunk as root, this command must be run as root # ./splunk enable boot-start Whether the script adds or modifies a script to /etc/init.d depends on your OS.

Splunk processes – splunkd Accesses, processes, and indexes incoming data Handles all search requests and returns results Runs a web server on port 8089 by default Speaks SSL by default Runs Splunk helpers as dependent process(es) of splunkd Splunk helpers run external scripts, for example: Scripted inputs Cold to frozen scripts

Splunk processes – Splunk Web Python based web server based on CherryPy Provides both search and management web front end for splunkd Runs on port 8000 by default Sets initial login to user: admin password: changeme

Apps Apps are configurations of a Splunk environment designed to meet a specific business need Manage a specific technology Splunk for Websphere Splunk for Cisco and many more . . . Manage a specific OS Splunk for Windows Splunk for UNIX/LINUX Manage compliance PCI Enterprise Security Suite

splunkbase Choose from hundreds of apps on splunkbase.splunk.com Apps developed by Splunk as well as the community are available Vast majority of apps are free, so don’t be shy! Mention that only 2 apps (ESS and PCI) cost money at this time

Managing a Splunk installation Three ways to manage a Splunk installation Command Line Interface (CLI) Directly editing config files Splunk Manager interface in Splunk Web

Managing a Splunk installation - CLI Command Line Interface (CLI) Shell access to Splunk server and user access to Splunk directory required Most commands require authentication and admin role to run If you don’t provide inline authentication credentials, Splunk will ask you ./splunk clean eventdata main -auth admin:myadminpass command object authentication (inline)

Command line interface (CLI) Also requires authentication Enter auth as part of command or wait for prompt Inline help is available
#./splunk add monitor /var/log -host www1
Splunk username: admin
Password:
#./splunk help
Welcome to Splunk's Command Line Interface (CLI). Try typing these commands for more help:
help simple, cheatsheet – display a list of common commands with syntax
help commands – display a full list of CLI commands
help [command] – type a command name to access its help page

Managing a Splunk installation – config files Directly editing config files Shell/console access to Splunk server and sufficient user rights to edit files in the Splunk directory Config files must be saved in UTF8, be sure to use the right form for non-UTF8 OS Changes made this way more often require a restart

Direct editing of config files Changes done this way sometimes require a restart or reload of Splunk You can let the students know that for the most part there is no easy command to have Splunk reload its configs. |extract reload=T done in SplunkWeb will get Splunk to reload props and transforms but that’s it.

Managing a Splunk installation - Manager Splunk Manager interface in Splunk Web Access to Splunk Web Admin role on the Splunk server Access from the main navigation – Manager link

Splunk Manager – general settings

Splunk Manager – general settings (cont.) /opt/splunk

Splunk Manager – general settings (cont.)

Splunk Manager – general settings (cont.) Click Save when you are done All changes to general settings will require a restart

More Resources Look on Splunkbase for additional Apps to help you manage your Splunk servers http://www.splunkbase.com/apps/All/4.x There is a Troubleshooting section in the Splunk Admin manual http://www.splunk.com/base/Documentation/latest/Admin

Lab 1

Section 2: Configuring Data Inputs

Section objectives Set up data inputs List Splunk’s data input types and explain how they differ Set input properties such as host, ports, index, source type, etc.

Specifying data inputs There are a number of ways you can specify a data input: Apps Preconfigured inputs for various types of data sources available on splunkbase Splunk Web You can configure most inputs using the Splunk Web data input pages CLI You can use the CLI (command line interface) to configure most types of inputs inputs.conf When you use Splunk Web or CLI, configurations are saved to inputs.conf You can edit that file directly to handle advanced data requirements

Types of inputs Files and directories – monitor physical files on disk Network inputs – monitor network data feeds on specific ports Scripted inputs – import from non-traditional sources, APIs, databases, etc. Windows inputs – Windows specific: Windows event logs, performance monitoring, AD monitoring, and local registry monitoring File system change monitoring – monitor the state: permissions, read only, last changed, etc. of key config or security files

Setting up new inputs – Apps / Add-ons configure input through app setup process

Setting up new inputs – Manager Admin role and access to SplunkWeb Changes written to inputs.conf Location of inputs.conf is determined by app context

Setting up new inputs – CLI Admin role and shell/console access to Splunk server required* Useful for administering forwarders Location of inputs added via the CLI is the Search app
#./splunk add monitor /var/log -hostname www1 -index webfarm
Your session is invalid. Please login.
Password:
Added monitor of ‘/var/log’
*Using the -uri flag you can send remote CLI commands from a local Splunk instance to a remote instance without shell access. See the docs for details. http://www.splunk.com/base/Documentation/latest/Admin/AccessandusetheCLIonaremoteserver
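As a hedged illustration of the -uri flag (the server name and credentials below are hypothetical), the same monitor could be added on a remote instance from a local Splunk install:
#./splunk add monitor /var/log -hostname www1 -index webfarm -auth admin:myadminpass -uri https://splunkserver.mycompany.com:8089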

Setting up new inputs – inputs.conf Skip the middleman of Manager or the CLI and directly edit inputs.conf Shell/console access to Splunk server required Changes made this way require a restart
[default]
host = mysplunkserver.mycompany.com

[monitor:///opt/secure]
disabled = false
followTail = 0
host_segment = 3
index = default
sourcetype = linux_secure

[monitor:///opt/tradelog]
disabled = false
sourcetype = trade_entries
On our classroom Linux servers, students can use either vi or nano to edit the files

inputs.conf (cont.) Input path specifications in inputs.conf (monitor stanzas) use Splunk-defined wildcards (also used by props.conf, discussed in next section) (these are not REGEX-compliant expressions)
... (ellipsis) – recurses through directories and subdirectories to match. Regex equivalent: .* Example: /var/log/.../apache.log matches the files /var/log/www1/apache.log, /var/log/www2/apache.log, etc.
* (asterisk) – matches anything in that specific directory path segment. Note: must be used in the last segment of the path. Regex equivalent: [^/]* Example: /logs/*.log matches all files with the .log extension, such as /logs/apache.log. It does not match /logs/apache.txt.

inputs.conf (cont.) Syntax details: ... matches any character(s) recursively; * matches anything 0 or more times except the /; . is NOT a wildcard and simply matches the . literally For full syntax details see: $SPLUNK_HOME/etc/system/README/inputs.conf.spec http://www.splunk.com/base/Documentation/latest/Admin/Inputsconf http://www.splunk.com/base/Documentation/latest/Admin/Specifyinputpathswithwildcard

Setting source, sourcetype, and host You can specify source, sourcetype, and host at the input level for most inputs Source Should be left to the default Sourcetype Most default processing for standard data types is based on sourcetype Whenever possible use automatic sourcetype, select from Splunk’s list, or use the recipes Host Opt for specific hostnames/FQDN as much as possible since the host field is a key search tool Be sure to tell students that what’s on this slide might not make a lot of sense now, but it will once they’ve completed the next section.

Data inputs – monitor Monitor – eats data from specified file(s) or directory(ies) Where Can be pointed to an individual file or the top of a complex directory hierarchy Recurses through specified directory Indexes any directory the Splunk server can reach, local or remote file systems How Unzips compressed files automatically before indexing them Eats new data as it arrives Automatically detects and handles log rotation “Remembers” where it was in a file and picks up from that spot after restart

Data inputs – monitor (cont.) What Uses whitelists and blacklists to include or exclude files and directories Can be instructed to start only at the end of a large file (like tail –f) Can automatically assign a source type to events, even in directories containing multiple log files from different systems, processes, etc.

Monitor via Manager (called Files & Directories) add new input edit existing input

Monitor file or directory – Manager

Monitor file or directory – Manager: Source Specify a file or directory for ongoing monitoring Can also upload a copy of a file Useful for testing and development

Monitor a file or directory – Manager: Host Specify a constant value if all monitored files in an input are from the same host

Monitor a file or directory – Manager: Host When multiple hosts write to the same directory and the host name appears in the file name or part of the path, use REGEX on path to extract the host name /var/log/www1.log will extract www1 /var/log/www_db1.log will extract www_db1

Monitor a file or directory – Manager: Host When multiple hosts write to the same directory and host name appears as a consistent subdirectory in the path, use segment in path /logs/www1/web.log or /logs/www2/web.log
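A minimal inputs.conf sketch of both host extraction approaches described above (the monitored paths follow the examples; the exact regex is an assumption):
# host_regex – pull the host out of the file name, e.g. /var/log/www1.log -> www1
[monitor:///var/log]
host_regex = /var/log/(\w+)\.log

# host_segment – use a path segment as the host, e.g. /logs/www1/web.log -> www1
[monitor:///logs]
host_segment = 2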

Monitor a file or directory – Manager: Sourcetype Automatic Splunk automatically determines source type for most major data types Useful for directories with many different types of log files Manual Enter a name for a specific sourcetype From list Choose the sourcetype from the dropdown list

Monitor a file or directory – Manager: Index Select the index where this monitor input will be stored If you want to put a new input in a new index, you must create the index before the input

Monitor a file or directory – Manager: Follow tail Follow tail works like “tail -f” – it starts at the end of the file and only eats new input from that point forward Only applies to the very first time the new monitor input is added

Monitor a file or directory – Manager: Whitelist If a file is whitelisted, Splunk consumes it and ignores all other files in the set Use whitelist rules to tell Splunk which files to consume when monitoring directories (see the sketch below) This whitelist will only index files that end in .log Use a | to create OR statements: indexes files that end in query.log or my.log Add a leading slash to ensure an exact file match: only indexes query.log and my.log
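A sketch of those whitelist rules in inputs.conf form (the monitored path is hypothetical; these are three alternative rules, use one per stanza):
[monitor:///var/log]
# only files ending in .log
whitelist = \.log$
# OR statement – files ending in query.log or my.log
whitelist = query\.log$|my\.log$
# leading slash for an exact file match – only query.log and my.log
whitelist = /query\.log$|/my\.log$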

Monitor a file or directory – Manager: Blacklist If a file is blacklisted, Splunk ignores it and consumes all other files in the set Use blacklist rules to tell Splunk which files not to consume when monitoring directories This blacklist won't index files that end in .txt Use a | and () to create OR statements: won't index files that end in .txt or .gz This blacklist avoids both archive and historical directories (as well as files named archive and historical)
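The corresponding blacklist rules in inputs.conf form (again alternatives, one per stanza, with a hypothetical monitored path):
[monitor:///var/log]
# skip files ending in .txt
blacklist = \.txt$
# OR statement – skip files ending in .txt or .gz
blacklist = \.(txt|gz)$
# skip archive and historical directories (and files named archive or historical)
blacklist = archive|historical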

Scripted inputs Splunk can run scripts periodically that generate input Scripts need to be shell (.sh) on *nix or batch (.bat) on Windows Or Python on any platform Can use any scripting language the OS will run if wrapped in a shell or batch “wrapper” Splunk eats the standard output (stdout) of the script Use them to run diagnostic commands such as top, netstat, vmstat, ps, etc. Used in conjunction with many Splunk Apps to gather specialized information from the OS or other systems running on the server Also good for gathering data from APIs, message queues, or other custom connections

Setting up a scripted input Write or obtain the script Copy it to your Splunk server’s script directory If possible, test your script from that directory to make sure it runs correctly Set up input in Manager Click save and wait for a few intervals to pass, then verify that the input is available in Search or its App

Manager – Scripted inputs

Manager – Scripted inputs (cont.) Splunk will only run scripts from specified bin directories $SPLUNK_HOME/etc/system/bin OR $SPLUNK_HOME/etc/apps/<app_name>/bin Interval is in seconds, though you can also specify a schedule using CRON syntax The interval is the time period between script executions Instructor note: source and sourcetype for scripted inputs aren’t all that sensitive with regards to processing. Since the vast majority of scripted inputs are customer customized there is little default processing tied to them. Tell students to set the sourcetype and source to meet their own identification needs.
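A minimal inputs.conf sketch of a scripted input (the script name, interval, and sourcetype are assumptions for illustration):
[script://$SPLUNK_HOME/etc/apps/myapp/bin/diskusage.sh]
interval = 300
sourcetype = diskusage
index = main
disabled = 0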

Manager – Network inputs

Manager – Network inputs: Source port TCP or UDP feeds from 3rd party systems (not Splunk Forwarders) Splunk can be configured to “listen” to a specified UDP or TCP data feed and index the data Can be set to accept feeds from any host or just one host on that port Can specify any non-used network port (that is NOT splunkd’s or Splunk Web’s ports)

Manager – Network inputs: source and sourcetype By default Splunk will set the source to be host:port a syslog feed from a firewall named “fw_01” would have fw_01:514 for its source Only two options for sourcetype, from list or manual If there are multiple sourcetypes coming from a single network feed you will need to configure further processing to handle it (Covered in the next section)

Manager – Network inputs: Host Three choices for host: IP – Splunk will use the IP address of the sender (default) DNS – Splunk will do a reverse DNS lookup for the host name Custom – allows you to specify a specific host name
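In inputs.conf, network inputs of this kind look roughly like the sketch below (the ports, sourcetypes, and restricting host are illustrative only):
# listen for syslog on UDP 514; use reverse DNS lookups for the host field
[udp://514]
sourcetype = syslog
connection_host = dns

# accept TCP 9001 only from host 10.1.2.3; use the sender's IP as host
[tcp://10.1.2.3:9001]
sourcetype = log4j
connection_host = ip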

File system change monitoring FSChange (must be set up in inputs.conf) monitors changes to files and directories DOES NOT index the contents of the files and directories Writes an event to an index when it detects a change or deletion Monitors: Modification date/time group ID user ID file mode (read/write attributes, etc.) optional SHA256 hash of file contents

Setting up fschange Set up a stanza in inputs.conf List the directory you want Splunk to monitor DO NOT use file system change monitoring on a directory that is being indexed using Monitor Default sourcetype = fs_notification pollPeriod is the interval in seconds at which Splunk checks the files for changes
[fschange:/etc/]
pollPeriod = 60
host = splunkserver.company.com

Windows inputs Windows inputs must be set up on a Windows Splunk instance UNIX indexers CAN and will index and search Windows inputs Set up a Universal Forwarder or Light Forwarder to get Windows inputs to a UNIX indexer

Windows inputs – Local or remote event logs Local event logs can be collected from a Universal Forwarder or the local indexer Remote event log collection requires proper domain account permissions on the remote machine

Windows inputs – local event logs Select the event logs you wish Splunk to index For further settings, edit inputs.conf directly

Windows inputs – remote event logs Enter a host to choose logs Click Find logs… to populate the available logs list Optionally, you can collect the same set of logs from additional hosts Enter host names or IP addresses, separated by commas

Windows event log settings in inputs.conf start_from - Use this setting to tell Splunk to start with the newest events and then work its way back to the oldest – default is oldest current_only - If set to 1, Splunk will only index events starting from the day the input was set up and going forward – default is 0
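A hedged inputs.conf sketch using these settings (the choice of the Security log is just an example):
[WinEventLog:Security]
disabled = 0
# default – begin with the oldest events and work forward
start_from = oldest
# 0 = index existing events too; 1 = only index events from the day the input was set up, going forward
current_only = 0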

Windows inputs – Performance monitor Use Performance Monitor to collect data from a local machine – Forwarder or Indexer

Windows inputs – Performance monitor (cont.) Select an object to monitor Based on the object you select, the Counters section is populated with available counters

Windows inputs – Performance monitor (cont.) Select instances Set the polling interval

Windows inputs – Registry monitoring Indexes the registry whole cloth, as well as any ongoing changes See the docs for details on limiting what is actually monitored www.splunk.com/base/Documentation/latest/Admin/MonitorWindowsregistrydata

Windows inputs: AD monitoring You can specify a domain controller or let Splunk discover the nearest one You can then specify the highest node in the tree you want Splunk to monitor Splunk will move down the tree recursively If left unchecked, Splunk will index the entire tree, including the schema Use the permissions of the Splunk user to limit what it can monitor in AD www.splunk.com/base/Documentation/latest/Admin/AuditActiveDirectory

Windows inputs – Windows app Installing the Windows app allows you to collect and monitor several common windows input types

Lab 2 – Data inputs

Section 3: Modifying Data Inputs

Section objectives Describe how data moves from input to index Understand the default processing that occurs during indexing List the config files that govern data processing Learn how to override default data processing Learn how to discard unwanted events Learn how to mask sensitive data Learn how to extract fields

Input to Index Big Picture Network inputs Windows inputs Disk Monitor inputs Scripted inputs

Indexing phases Input Phase: Raw data from all forms of input collected Parsing Phase: Raw data broken down into events, and then event by event processing Indexing Phase: License meter applied, index generated, and data is written to disk

Inputs phase details Inputs phase works with entire streams of data, not individual events. Overarching metadata is applied. inputs.conf source, sourcetype, and host props.conf CHARSET and sourcetyping based on source Windows inputs wmi.conf and regmon-filters.conf inputs.conf has other settings like index= as well, but they don’t come into play at this stage. 2 other props.conf settings which are beyond the scope of this class are CHECK_METHOD and NO_BINARY_CHECK See: www.splunk.com/wiki/Where_do_I_configure_my_Splunk_settings%3F for details

props.conf props.conf is a config file that plays a role in all aspects of Splunk data processing Governs most aspects of data processing, can also invoke settings in other config files Uses similar “stanza” format of inputs.conf and other Splunk config files See $SPLUNK_HOME\etc\system\README\props.conf.spec and props.conf.example for syntax and examples

props.conf specifications props.conf stanzas use specifications to map configurations to data streams The specification can be either host, source, or sourcetype source and sourcetype specs are case sensitive, host is NOT (the host field is case insensitive to be more in line with DNS; you can make it case sensitive by adding the REGEX (?-i))
Pattern: [host::<hostname>] attribute = value – Example: [host::www1] TZ = US/Pacific
Pattern: [source::<source>] – Example: [source::/var/log/trade.log] sourcetype = trade_entries
Pattern: [<sourcetype>] – Example: [syslog] TRANSFORMS-host=per_event_host

Inputs phase props.conf sourcetype can be set based on source during the inputs phase CHARSET spec can be set at this time. Default is automatic, use this setting to override if auto is not working correctly. See docs for list of character sets www.splunk.com/base/Documentation/base/Data/Configurecharactersetencoding [source::/var/log/custom*] sourcetype = mycustomsourcetype [source::...\\web\\iis*] sourcetype = iis_access [source::.../seoul/*] CHARSET = EUC-KR [source::h:\\web\\თბილისი\\*] CHARSET = Georgian-Academy

Parsing phase big picture Data from inputs phase are broken up into individual events, and then any event-level processing is performed. “Chunks” of data from inputs phase Broken into individual events. Event-by-event processing

Parsing phase details A majority of data processing work is done during the parsing phase Actual event boundaries are decided, date/timestamp are extracted and any type of per-event operation is performed automatic auto-sourcetyping, auto-date/timestamping, and auto-linebreaking, time zone override per-event REGEX based sourcetype, host, or index settings, custom line breaking and date/timestamping custom REGEX/SEDCMD rewrites, per-event routing to other indexers, 3rd party systems, or the “null queue”

Parsing phase: automatic Switches data to UTF-8 By default Splunk will attempt to automatically detect event boundaries (monitor and network inputs) extract date/timestamps (monitor and network inputs) assign sourcetypes (for monitor input only) Default settings are in $SPLUNK_HOME/etc/system/default/props.conf in the parsing phase props.conf can call stanzas in another config file transforms.conf located in the same directory

It’s automatic . . . Success rate of automatic processing will vary. For standard data types such as syslog, web logs, etc., Splunk does a great job. For custom or esoteric logs you’ll need to test, though even then the odds are good it will get it right. Correct date/timestamping and linebreaking are key to subsequent processing and the ultimate “searchability” of data Other types of automatic processing Windows inputs syslog host extraction www.splunk.com/base/Documentation/base/Data/Overviewofeventprocessing

Line breaking If automatic event boundary detection is not working correctly Bad event breaking is usually easy to detect in indexed test data, but be careful since bad line breaking can show up as bad timestamping 2 methods SHOULD_LINEMERGE = false (most efficient) Using this method Splunk cuts the data stream directly into finished events using either the new line \n or carriage return \r characters (default) or a REGEX you specify with LINE_BREAKER SHOULD_LINEMERGE = true Splunk uses a configurable two-step process to split your data into individual events

SHOULD_LINEMERGE = false Already set for many standard types of data including syslog (including snare), windows inputs, and web data See $SPLUNK_HOME/etc/system/default/props.conf or etc/apps/<app_name>/default/props.conf for details Should be set for custom data with one-event-per-line formats breaking on \n or \r characters Or if possible use other pattern breakers, but be ready to sacrifice the characters that make up the pattern from your raw data The characters that make up the pattern match aren’t kept as part of the events

SHOULD_LINEMERGE = true The default if not specified Splunk merges multiple lines of data into single events based on the rule, new line with a date at the start or 256 total lines marks an event boundary BREAK_ONLY_BEFORE_DATE = true (the default) MAX_EVENTS = 256 (default) Certain predefined data types like log4j and other application server logs use BREAK_ONLY_BEFORE = <REGEX pattern> that when matching the start of a new line, marks the start of a new event
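For example, a hedged props.conf sketch for a multiline application log whose events begin with a bracketed date (the sourcetype name and pattern are assumptions):
[my_appserver_log]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\[\d{4}-\d{2}-\d{2}
MAX_EVENTS = 512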

Custom line merge If your multiline data and default processing don’t get along – beyond the BREAK_ONLY_BEFORE setting there are many more REGEX based settings to divide up your events www.splunk.com/base/Documentation/latest/Data/Indexmulti-lineevents for details see also $SPLUNK_HOME/etc/system/README/props.conf.spec or www.splunk.com/base/Documentation/latest/Admin/Propsconf

Date/timestamp extraction Like event boundaries, correct date/timestamp extraction is key to Splunking your data Verify timestamping when setting up new data types Pay close attention to time stamping during testing/staging of custom or non-standard data types Convert UNIX time or other non-human readable time stamps and compare Well tuned for standard data types See props.conf in $DEFAULT and http://www.splunk.com/base/Documentation/latest/Data/ConfigureTimestampRecognition for details

Custom date/timestamp – props.conf TIME_PREFIX = <REGEX> which matches characters right BEFORE the date/timestamp Use this for events with multiple timestamps to pinpoint the correct one, or with events that have data that looks like a timestamp but isn’t and that confuses the processor Example data with a “date-like” code at the start of the line: 1989/12/31 16:00:00 ed May 23 15:40:21 2011 ERROR UserManager - Exception thrown
[my_custom_source_or_sourcetype]
TIME_PREFIX = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \w+\s
With this TIME_PREFIX Splunk starts looking for the date/timestamp after the matched prefix (here, at “May 23 15:40:21 2011”)

Custom date/timestamp – props.conf (con’t) MAX_TIMESTAMP_LOOKAHEAD = <integer> specifying how many characters to look beyond the start of the line for a timestamp works in conjunction with TIME_PREFIX if set, in which case it starts counting from the point the TIME_PREFIX indicates Splunk should start looking for the date/timestamp Improves efficiency of timestamp extraction As with multiline event configs, see $SPLUNK_HOME\etc\system\README\props.conf.spec and the docs for even more options if necessary www.splunk.com/base/Documentation/latest/Data/Handleeventtimestamps

Time zones Splunk follows these default rules when it attaches a time zone to a time stamp It looks in the raw event data for a time zone indicator such as GMT+8 or PST and uses that It looks in props.conf to see if a TZ attribute has been given for this data stream based on standard settings referenced here: en.wikipedia.org/wiki/List_of_zoneinfo_timezones If all else fails it will apply the time zone of the indexer
[host::nyc*]
TZ = America/New_York
[source::/mnt/cn_east/*]
TZ = Asia/Shanghai

Time and Splunking Splunk depends heavily on existing time infrastructure Timestamps in Splunk are only as good as the time settings on servers and devices that feed into Splunk A good enterprise time infrastructure makes for good timestamping which makes for good Splunking

Per event REGEX changes Splunk can modify data in individual events based on REGEX pattern matches Requires invoking a second file, transforms.conf (see next slide) Using props.conf and transforms.conf you can disable/modify existing modifications, or add your own custom settings

transforms.conf Config file whose stanzas are invoked by props.conf All caps TRANSFORMS = <transforms.conf_stanza> syntax used to invoke index time changes Required for all REGEX pattern match processing Resides in the same directory(ies) as props.conf Can also be called at search time by REPORT, LOOKUP (search time section coming up) $SPLUNK_HOME/etc/system/default/transforms.conf [syslog-host] DEST_KEY = MetaData:Host REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s FORMAT = host::$1 $SPLUNK_HOME/etc/system/default/props.conf [syslog] TRANSFORMS = syslog-host

transforms.conf (cont.) Transforms uses standard settings to indicate what its REGEX will match and what it will rewrite based on the match The source and destination of these transformations are referred to as “keys” SOURCE_KEY tells Splunk where to apply the REGEX (optional) DEST_KEY tells Splunk where to apply the data modified by the REGEX and FORMAT setting (required) REGEX is the regular expression and capture groups (if any) that operate on the SOURCE_KEY (required) FORMAT controls how REGEX writes the DEST_KEY (required)

Keys in action From the default syslog host extraction:
[syslog-host]
DEST_KEY = MetaData:Host
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s
FORMAT = host::$1
We are updating the host field, so our DEST_KEY is MetaData:Host; for sourcetype it would be MetaData:Sourcetype, for index it would be _MetaData:Index (case, and for index the underscore, counts!) See transforms.conf.spec for details. The REGEX pattern here is looking for a host name embedded in syslog data. Only one capture group is referenced here: the 2nd set of parentheses. In this circumstance we would expect the host name to appear within the 2nd set of parentheses. FORMAT specifies what is written out to the DEST_KEY. Here host::$1 means host = 1st REGEX capture group.

Setting sourcetype per event You can configure Splunk to set sourcetype on a per event basis This should be your sourcetyping of “last resort” since inputs.conf settings and source based sourcetyping using just props.conf are less resource intensive
In props.conf:
[source::udp:514]
TRANSFORMS-1srct = custom_sourcetyper
In transforms.conf:
[custom_sourcetyper]
DEST_KEY = MetaData:Sourcetype
REGEX = .*Custom$
FORMAT = sourcetype::custom_log
The value after TRANSFORMS- gives this transformation a name; this comes into play for multiple transformations and provides precedence if needed Any event from this source where the last word of the line is “Custom” will get the sourcetype of “custom_log”

Per event index routing Like sourcetype, if at all possible specify the index for your inputs in inputs.conf
props.conf:
[routed_sourcetype]
TRANSFORMS-1indx = custom_sourcetype_index
transforms.conf:
[custom_sourcetype_index]
DEST_KEY = _MetaData:Index
REGEX = .
FORMAT = custom_index
Note the use of _MetaData:Index For index routing, the FORMAT simply takes the name of the index you are routing to We’re using a wide open REGEX since we want everything classified as this sourcetype routed to a different index. More granular routing would have a more complex REGEX

Filtering unwanted events You can route specific unwanted events to the “null queue” Events discarded at this point do NOT count against your daily license quota
props.conf:
[WinEventLog:System]
TRANSFORMS-1trash = null_queue_filter
transforms.conf:
[null_queue_filter]
DEST_KEY = queue
REGEX = (?m)^EventCode=(592|593)
FORMAT = nullQueue
Here our DEST_KEY is queue since we’re routing these events outside the data flow FORMAT indicating nullQueue means we are throwing away events that match this pattern Since Windows Event logs are multiline events we need to use the REGEX multiline indicator (?m). Applies to any multiline event and REGEX, not just null queue Be sure to mention that whitelisting and blacklisting in inputs.conf is a better way to do this if possible.

Other routing Beyond routing to the nullQueue, you can also route data to: other Splunk indexers 3rd party systems For details see www.splunk.com/base/Documentation/latest/Admin/Routeandfilterdata

Modifying the raw data stream Sometimes it’s necessary to modify the underlying log data, especially in the case of privacy concerns Splunk provides 2 methods of doing this, REGEX and SEDCMD The REGEX method uses transforms.conf and works on a per-event level, the SEDCMD uses only props.conf and operates on an entire source, sourcetype, or host identified stream Care should be taken when modifying _raw since unlike all other modifications discussed, this sort actually modifies the raw log data

Modifying _raw - REGEX Works similarly to previous props.conf / transforms.conf modifications
props.conf:
[source::...\\store\\purchases.log]
TRANSFORMS-1ccnum = cc_num_anon
transforms.conf:
[cc_num_anon]
DEST_KEY = _raw
REGEX = (.*CC_Num:\s)\d{12}(\d{4}.*)
FORMAT = $1xxxxxxxxxxxx$2
DEST_KEY = _raw indicates we are modifying the actual log data $1 preserves all the data prior to the first 12 digits of the credit card number. $2 grabs everything after, including the last 4 digits; we need to do this since we are “rewriting” the raw data feed

Modifying _raw – SEDCMD Splunk leverages a “sed-like” syntax for simplified data modifications Note that while sed is traditionally a UNIX command, this functionality works on Windows-based Splunk installs as well It’s all done with a single stanza in props.conf The REGEX syntax using “s”: SEDCMD-<name> = s/<REGEX>/<replacement>/flags flags are either “g” to replace all matches, or a number to just replace that number of matches The string match syntax using “y”: SEDCMD-<name> = y/<string1>/<string2>/ String matches cannot be limited, all matches will be replaced String1 will be replaced with string2

SEDCMD (con’t) An example SEDCMD REGEX based replacement to overwrite the first 5 digits of an account number anytime it appears in the “accounts.log” source:
[source::.../accounts.log]
SEDCMD-1accn = s/id_num=\d{5}(\d{5})/id_num=xxxxx\1/g
This will replace id_num=1234567890 with id_num=xxxxx67890 \1 here works like a $1 back-reference in a transforms.conf REGEX You can put multiple replacement rules in a single props.conf stanza, simply put a space and start again with s/

Parsing phase: override Splunk’s automatic processing can be overridden/disabled Make your changes to files in $SPLUNK_HOME/etc/system/local or $SPLUNK_HOME/etc/apps/<app_name>/local To disable, create/edit props.conf in $SPLUNK_HOME/etc/system/local or the local directory of an app Turn off syslog host extraction for the “syslog” sourcetype:
$SPLUNK_HOME/etc/system/local/props.conf
[syslog]
TRANSFORMS =
overwrites $SPLUNK_HOME/etc/system/default/props.conf
[syslog]
TRANSFORMS = syslog-host

Indexing phase details After the parsing phase Splunk passes the fully processed data off to the index processor _raw is metered for license usage The keyword index is created, _raw is compressed, and both are written to disk

Persisted to disk Once data reaches hard disk all modifications and extractions are written to disk along with _raw source, sourcetype, host, timestamp, and punct Indexed data cannot be changed Modifications to processing won’t be retroactive without reindexing For this reason it’s recommended to test default and custom index time processing on a staging instance prior to indexing in production

Search phase – Big picture Searches from users or alerts read events back from disk, and search time modifications are applied to the results

Search phase – Big picture, RT search Real time searches work similarly except they bypass disk: real time searches from users or alerts take events from the index phase before they are written to disk, and the same search time modifications are applied

Search time modifications MANY different transformations/updates/modifications are available at search time Data (usually sourcetype) dependent field extractions both custom, default, or from add-ons or apps Lookups, event types, tags, field aliases, and many more . . . These changes only apply to search results, no modification to data written to disk Fully retroactive – designed to be flexible Best way to customize data and build institutional knowledge into Splunk

Search time for admins Splunk expands the ability to create most search time mods to the Power and User roles Most are covered in more user/knowledge manager oriented classes Most can be fully administered through Splunk Web’s manager Admins may be called on to install apps and add ons (already covered) Remember, apps/add ons are bundles of search time lookups, field extractions, tags, etc. NOT just views and dashboards Create custom field extractions and change/disable search time modifications using the file system

Default field extractions at search time Most fields used in Splunk come from your data For many common sourcetypes Splunk has default search time field extractions in place Additional default extractions are easy to add with Add Ons and Apps The *Nix app for example has many search time fields for standard UNIX-y logs like secure.log or messages.log, etc. The Windows app has similar defaults for Windows data For non-OS data, look for an app specifically designed for that data on www.splunkbase.com

3 ways to create a search time field Editing config files - available only to admins, knowledge of REGEX required Using the IFX in Splunk Web (covered in Using) - available to admin and power role, knowledge of REGEX helpful but not required Using the rex command in the search language (covered in Search & Reporting) - all roles can use this command, knowledge of REGEX required

The usual suspects Custom search time fields are created by stanzas in props.conf and sometimes transforms.conf 2 methods 1. using just props.conf EXTRACT Simple single field extractions Available after Splunk 4.0 Recommended method covered here 2. using props.conf REPORT and transforms.conf Useful for reusing extractions across multiple sourcetypes www.splunk.com/base/Documentation/latest/Knowledge/Createandmaintainsearch-timefieldextractionsthroughconfigurationfiles for details

props.conf EXTRACT A single stanza in props.conf using EXTRACT with a source, sourcetype, or host spec (usually a sourcetype) Use the EXTRACT command with a name and the REGEX after the equals sign
props.conf:
[tradelog]
EXTRACT-1type = .*type:\s(?<acct_type>personal|business)
Wrap parentheses around your field value to create a named capture, and then embed your field name within those parentheses with ?<field_name>

Other search time processing Many other knowledge objects/search time processing are stored in other config files macros.conf, tags.conf, eventtypes.conf, savedsearches.conf, etc. When users create or modify these Splunk Web simply writes to these files for them Admins can directly modify these files, though we recommend using Manager if possible See .conf files in $SPLUNK_HOME/etc/system/README and the docs for details on specific files

Lab 3

Section 4: Config Precedence

Config files and precedence UI or CLI changes also update config files Splunk gathers up all of the various config files and combines them at index and search time based on rules of precedence Rules of precedence vary depending on if configurations are being applied at search time or index time Index time precedence relies solely on the location of the files Search time precedence also takes into account which user is logged in and which app they are using

Index time precedence At index time, Splunk applies precedence in the following order $SPLUNK_HOME/etc/system/local $SPLUNK_HOME/etc/apps/<app_name>/local** $SPLUNK_HOME/etc/apps/<app_name>/default $SPLUNK_HOME/etc/system/default **Note that within the $SPLUNK_HOME/etc/apps directory individual apps get precedence based on ASCII alphabetical order. So an app called “aardvark” would have precedence over the “windows” app. But an app called “1windows” would have precedence over “aardvark” since numbers come before letters in ASCII. Also note that ASCII order is not numerical order, so 1 would come before 2, but 10 would also come before 2!

Index time precedence [Diagram: $SPLUNK_HOME/etc directory tree (system, apps/unix, apps/search, users/admin, users/joe, users/mary) with index time precedence numbers – 1: system/local; 2 and 3: the apps’ local directories, in ASCII order by app name; 4 and 5: the apps’ default directories; 6: system/default]

Search time precedence Search time has the following precedence order $SPLUNK_HOME/etc/users/<username>/<app_context>/local** $SPLUNK_HOME/etc/apps/<app_context>/local and default** $SPLUNK_HOME/etc/system/local $SPLUNK_HOME/etc/apps/<app_by_ASCII>/local*** $SPLUNK_HOME/etc/apps/<app_by_ASCII>/default $SPLUNK_HOME/etc/system/default ** app_context is the app the user is currently in/using and username refers to the actual user name the user logged in as ***app_by_ASCII refers to the ASCII order referred to in the previous slide

Search time precedence Example: mary working in the unix app context [Diagram: $SPLUNK_HOME/etc directory tree with search time precedence numbers – 1: users/mary/unix/local; 2: apps/unix/local; 3: apps/unix/default; after 3, the earlier (index time) pattern applies: system/local, the other apps’ local and default directories in ASCII order, and finally system/default]

Precedence is cumulative At index time if $SPLUNK_HOME/etc/system/local/props.conf contained this stanza:
[source::/opt/tradelog/trade.log]
sourcetype = tradelog
And if $SPLUNK_HOME/etc/apps/tradeapp/local/props.conf contained, for the same source stanza:
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = TradeID
The combined result becomes:
[source::/opt/tradelog/trade.log]
sourcetype = tradelog
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = TradeID
Be sure to note that this applies both at index and search time.

However At index time if $SPLUNK_HOME/etc/system/local/props.conf contained the following stanza:
[source::/opt/tradelog/trade.log]
sourcetype = tradelog
And if $SPLUNK_HOME/etc/apps/tradeapp/local/props.conf contained, for the same source stanza:
sourcetype = log_of_trade
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = TradeID
The combined result becomes (system/local wins the conflicting sourcetype setting):
[source::/opt/tradelog/trade.log]
sourcetype = tradelog
SHOULD_LINEMERGE = True
BREAK_ONLY_BEFORE = TradeID

Section 5: Splunk’s Data Store

Section Objectives Learn index directory structure Answer the question: “What are buckets?” and describe how they move from hot to cold Describe how to configure aging and retention times Show how to set up indexes Learn how to set up volumes on hard disk Describe back up strategies Show how to clean out an entire index or selectively delete data

Splunk’s default indexes Splunk ships with several indexes already set up main – the default index, all inputs go here by default (called defaultdb in the file system) summary – default index for summary indexing system _internal – Splunk indexes its own logs and metrics from its processing here _audit – Splunk stores its audit trails and other optional auditing information _thefishbucket – Splunk stores file information for its monitor function

Index locations in the file system Indexes live under $SPLUNK_HOME/var/lib/splunk (also referred to as $SPLUNK_DB) Each index has its own directory: defaultdb (index=main), os, _internaldb, etc. Each index has three subdirectories: db (hot/warm buckets), colddb (cold buckets), and thaweddb (unarchived buckets)

Index divisions Splunk divides its indexes into 3 sections, plus a special restored from archive section, for fastest searching and indexing Hot – most recently indexed events, multiple buckets, read and write, same directory as warm Warm – next step in the aging process, multiple buckets, read only, same directory as hot Cold – final step in the aging process, multiple buckets, read only, separate directory from warm and hot Thawed – restored from archive data, read only, separate directory from the rest

What are buckets? Buckets are logical groupings of indexed data based on time range Starting in the hot section, Splunk divides its indexed data into buckets based on their time range Periodically, Splunk runs the optimize process on the hot section of the index to optimize the placement of events in the buckets Once a hot bucket reaches its size limit, it will be automatically “rolled” into warm Default bucket size is set automatically by Splunk at install based on OS type Once rolled into warm, each individual bucket is placed in a directory with 2 time stamps and an id number as the directory name Splunk uses buckets to limit its searches to the time range specified pulling recent results from hot right away, then those from warm or cold after that

Bucket retention times Hot buckets are segregated by date ranges Will roll from hot to warm once max size is met OR no data has been added to a particular hot bucket in 24 hours Warm contains 300 buckets by default When bucket “301” is created, the oldest is rolled into cold Cold will keep a bucket for six years (default) Once the youngest event in a bucket turns 6 years old, the bucket will be moved to “frozen” Buckets in frozen are either archived or deleted (deleted is the default)

Configuring and adding indexes You can configure existing indexes by using the Splunk Web, the CLI, or editing indexes.conf You can add new indexes by Splunk Web, CLI, or editing indexes.conf Certain parameters are only set in indexes.conf

Adding or editing indexes with Splunk Web Max bucket size can be set manually For daily indexing rates higher than 5 GB a day set it to auto_high_volume This will give you 1 GB (32-bit) or 10 GB (64-bit) buckets Setting it to auto will give you 750 MB buckets for both Adding an index requires a restart Be sure to say adding an index always requires a restart

Set up and edit indexes – indexes.conf Indexes are controlled by indexes.conf Global settings like default database appear before the specific index stanzas Each index has its own stanza with the name of the index in [ ]
defaultDatabase = webfarm

[webfarm]
homePath = h:\splunk_index\db
coldPath = h:\splunk_index\colddb
thawedPath = h:\splunk_index\thawdb

Set up and edit indexes – indexes.conf (cont) Some per index settings: maxWarmDBCount – change the number of buckets kept in warm maxTotalDataSizeMB – max total data size (in MB); if data grows beyond this number, Splunk will automatically move cold buckets to frozen; this setting takes precedence over all other time/retention settings frozenTimePeriodInSecs – time in seconds buckets will stay in cold
[webfarm]
homePath = h:\splunk_index\db
coldPath = h:\splunk_index\colddb
thawedPath = h:\splunk_index\thawdb
maxWarmDBCount = 150
maxTotalDataSizeMB = 850000
frozenTimePeriodInSecs = 2598000

Cold to frozen Frozen is either archive or oblivion – default is deletion To archive you must define coldToFrozenPath – the location where Splunk automatically archives frozen data Splunk will strip away the index data and only store the raw data in the frozen location Frozen storage can be slow, inexpensive NAS, tape, etc. Older versions of Splunk used cold to frozen scripts; those are still supported via coldToFrozenScript – a script that Splunk runs when data is frozen Note: You can only set one or the other of these attributes; the coldToFrozenPath attribute takes precedence over coldToFrozenScript if both are set Splunk ships with two example cold-to-frozen scripts in $SPLUNK_HOME/bin compressedExport.sh.example flatfileExport.sh.example
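A minimal indexes.conf sketch for archiving instead of deleting (reusing the earlier webfarm index; the archive path is hypothetical):
[webfarm]
homePath = h:\splunk_index\db
coldPath = h:\splunk_index\colddb
thawedPath = h:\splunk_index\thawdb
coldToFrozenPath = h:\splunk_archive\webfarm_frozen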

Editing index settings in Manager Navigate to Manager >> Indexes Select the index to view and change the settings

Storing cold in a separate location Warm and hot live in the same directory Cold is separate and can be moved to a different location Specify the new location for cold in indexes.conf or in Manager
[webfarm]
homePath = h:\splunk_index\db
coldPath = \\filer\splunk_cold\colddb
thawedPath = h:\splunk_index\thawdb
maxWarmDBCount = 150
maxTotalDataSizeMB = 850000
frozenTimePeriodInSecs = 2598000

Storage volumes You can specify locations and maximum size for index partitions using volume stanzas Handy way to group and control multiple indexes Volume size limits apply to all indexes that use the volume Create volumes in indexes.conf, then use volumes in index definitions
[volume:hotNwarm]
path = g:\superRAID
maxVolumeDataSizeMB = 100000

[volume:cold]
path = \\slowNAS\splunk
maxVolumeDataSizeMB = 1000000

[network]
maxWarmDBCount = 150
frozenTimePeriodInSecs = 15778463
homePath = volume:hotNwarm\network
coldPath = volume:cold\network
Be sure to use subdirectories for your indexes to avoid collisions

Moving an entire index To move an index requires 4 steps Stop Splunk Copy the entire index directory to new location being sure to preserve permissions and all subdirectories – verify copy Edit indexes.conf to indicate the new location Restart Splunk Use cp -rp on UNIX or robocopy on Windows

Backups: What to backup 3 main categories Indexed event data Both the actual log data AND the Splunk index $SPLUNK_HOME/var/lib/splunk/ User data Things such as event types, saved searches, etc. $SPLUNK_HOME/etc/users/ Splunk configurations Configuration files updated either by hand or Manager $SPLUNK_HOME/etc/system/local $SPLUNK_HOME/etc/apps/

Backups: How Recommended method Using the incremental backup of your choice backup: Warm and cold sections of your indexes User files Archive or backup configuration files Hot cannot be backed up without stopping Splunk Recommended methods of backing up hot Use the snapshot capability of underlying file system to take a snapshot of hot, then backup the snapshot Schedule multiple daily backups of warm (works best for high data volumes)
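A rough sketch only (assuming a Linux indexer, default paths, a single defaultdb index, and rsync available; hot bucket directories, named hot_* by default, are skipped because hot cannot be backed up while Splunk runs):
# warm buckets (db/ minus hot_*) and cold buckets
rsync -a --exclude 'hot_*' /opt/splunk/var/lib/splunk/defaultdb/db/ /backup/splunk/defaultdb/db/
rsync -a /opt/splunk/var/lib/splunk/defaultdb/colddb/ /backup/splunk/defaultdb/colddb/
# user data and configurations
rsync -a /opt/splunk/etc/users/ /backup/splunk/etc/users/
rsync -a /opt/splunk/etc/system/local/ /backup/splunk/etc/system/local/
rsync -a /opt/splunk/etc/apps/ /backup/splunk/etc/apps/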

Rolling hot into warm Why? If your indexing rate is low, your hot buckets may not roll into warm often enough, leaving you worried about losing data in hot between backups How? Roll the hot db into warm with a script right before backing up Restarting splunkd also forces a roll from hot to warm Example roll command for the CLI ./splunk _internal call /data/indexes/<index_name>/roll-hot-buckets Be careful about too many forced rolls to warm – too many warm buckets can greatly impact search performance The recommended best practice is to roll hot to warm at most once every 24 hours

Deleting data: who The delete command can be used to permanently remove data from Splunk’s data store By default, even the admin role does not have the ability to run this command It is not recommended to give this ability to the admin role Instead, allow a few users to log in with a role specifically set up for deletions Create a user that’s part of the “Can_delete” role

Deleting data: how Log in to Splunk Web as a user of the “Can_delete” role Craft a search that identifies the data you wish to delete Double check that the search ONLY includes the data you wish to delete Pay special attention to which index you are using and the time range Once you’re certain you’ve targeted only the data you want to delete, pipe the search to delete Note that this is a “virtual” delete: Splunk marks the events as deleted and they will never show in searches again, but they will continue to take up space on disk Splunk has said it is working on a process that will eventually “clean up” deleted data
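A hypothetical example of such a deletion search (the index, sourcetype, and host are illustrative only):
index=webfarm sourcetype=access_combined host=decommissioned-web01 earliest=-24h@h | delete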

Cleaning out an index Splunk clean all will remove users, saved searches and alerts Other options: clean [eventdata|userdata|all] [-index name] [-f] eventdata - indexed events and metadata on each event userdata - user accounts - requires a Splunk license all - everything on the server If no index is specified, the default is to clean all indexes SO ALWAYS SPECIFY AN INDEX TO AVOID TEARS
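A sketch of cleaning a single index from the CLI (splunkd must be stopped first; the index name is a hypothetical example):
./splunk stop
./splunk clean eventdata -index webfarm
./splunk start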

Restoring a frozen index To thaw, move a copy of the bucket directory into the index’s thawed directory ./splunk rebuild <bucket directory> will rebuild the index Will also work to recover a corrupted directory Does not count against license Must shut down splunkd before running the ./splunk rebuild command
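A sketch of the thaw-and-rebuild steps on *nix, assuming a hypothetical frozen archive location and bucket name:
# copy the frozen bucket into the index's thawed directory
cp -rp /mnt/frozen_archive/webfarm/db_1297105108_1296928708_42 $SPLUNK_HOME/var/lib/splunk/webfarm/thaweddb/
# stop splunkd, rebuild the bucket's index files, then restart
./splunk stop
./splunk rebuild $SPLUNK_HOME/var/lib/splunk/webfarm/thaweddb/db_1297105108_1296928708_42
./splunk start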

Section 6: Users, Groups, and Authentication

Section Objectives Understand user roles in Splunk Create a custom role Understand the methods of authentication in Splunk

Manage users and roles

User roles There are three built-in user roles: Admin, Power, User (Can Delete is a special case already covered) Administrators can configure custom roles Name the role Specify a default app Define the capabilities for the role Limit the time ranges the role can use Specify both default and accessible indexes New roles are created in Splunk Manager under the “Access controls” option

Custom user roles – set restrictions Give the role a name and select a default app Set restrictions Search terms – restrict searches on certain fields, sources, hosts, etc Time range – default is -1 (no restriction). Set time range in seconds

Custom user roles – set limits Set limits (optional) Limits are per-person

Custom user roles – inherit Custom roles can be based on standard roles Administrators can then add or remove capabilities of the imported role

Custom user roles – capabilities Add or remove capabilities See authorize.conf.spec or http://www.splunk.com/base/Documentation/latest/Admin/authorizeconf for details

Custom user roles – indexes You can specify which indexes this role is allowed to search as well as which are searched by default

Splunk authentication – users Specify user name, email, and default app

Splunk authentication – users (cont.) Assign a role and set password

LDAP authentication Splunk can be configured to work with most LDAP implementations, including Active Directory LDAP can be configured from Splunk Manager See the docs for details www.splunk.com/base/Documentation/latest/Admin/SetUpUserAuthenticationWithLDAP
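For reference, a minimal authentication.conf sketch for Active Directory (the host, bind account, and DNs are hypothetical; configuring LDAP through Manager writes the equivalent settings for you):
[authentication]
authType = LDAP
authSettings = corp_AD

[corp_AD]
host = dc01.company.com
port = 389
bindDN = CN=splunkbind,CN=Users,DC=company,DC=com
bindDNpassword = changeme
userBaseDN = CN=Users,DC=company,DC=com
userNameAttribute = sAMAccountName
realNameAttribute = cn
groupBaseDN = CN=Groups,DC=company,DC=com
groupMemberAttribute = member
groupNameAttribute = cn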

Scripted Authentication Leverage existing PAM or RADIUS authentication systems for Splunk For the most up-to-date information on scripted authentication, see the README file in $SPLUNK_HOME/share/splunk/authScriptSamples/ There are also sample authentication scripts in that directory
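A minimal authentication.conf sketch for scripted authentication (the script path and file name are hypothetical examples; see the README in the samples directory for the interface your script must implement):
[authentication]
authType = Scripted
authSettings = script

[script]
scriptPath = $SPLUNK_HOME/bin/python $SPLUNK_HOME/bin/scripts/myPamAuth.py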

Single Sign On Authentication is moved to a web proxy, which passes authentication along to Splunk Web Flow: (1) the SSO client sends a Splunk request to the proxy server (2) the proxy authorizes the client against the auth server (3) the proxy passes the request, with the user name, to the Splunk server (4) Splunk Web returns the page to the proxy (5) the proxy returns the page to the client

Lab

Section 7: Forwarding and Receiving

Section objectives Understand forwarders Compare forwarder types Examine topology examples Deploy and configure forwarders

Splunk forwarder types Universal forwarder Streamlined data-gathering “agent” version of Splunk with a separate installer Contains only the essential components needed to forward raw or unparsed data to receivers/indexers Cannot perform content-based routing Throughput limited to 256 KBps by default In most cases, the best tool for forwarding data Light forwarder Full Splunk in light forwarder mode (no separate install), otherwise works the same as the Universal forwarder “Heavy” forwarder Full Splunk instance – does everything but write data to the index Breaks data into events before forwarding Can handle content-based routing Technically it’s just called Forwarder, but we add “heavy” so as not to confuse it with the other types

Comparing forwarders If you need to forward unparsed data to a receiver or indexer, use the Universal forwarder If you need to collect data on a forwarder that requires a python-based scripted input, use the Light forwarder If you need to route collected data based on event info, or filter data prior to a WAN/slower connection, use the Heavy forwarder

Forwarder topology: data consolidation Most common topology Multiple forwarders send data to a central indexer

Forwarder topology: load balancing Distributes data across multiple indexers Forwarder routes data sequentially to different indexers at specified intervals with automatic failover (requires distributed search, covered later in this section)

Setting up forwarders – big picture Enable receiving on your indexer(s) Install forwarders on production systems Configure forwarders to send to receivers Test the connection with a small amount of test data Set up inputs on forwarders Verify inputs are being received

Configure forwarding and receiving - Manager You can set up basic forwarding and receiving using Manager

Set up receiving port – Splunk Web Specify the TCP port you wish Splunk to listen on and click Save This must NOT be the Splunk Web or splunkd management port
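Equivalently, receiving can be enabled on the indexer in inputs.conf or from the CLI (a sketch; port 9997 and the credentials are examples):
[splunktcp://9997]
disabled = false
# or from the CLI
./splunk enable listen 9997 -auth admin:changeme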

Enable Indexer to indexer forwarding/receiving You can easily forward indexed data from one Splunk server to another Useful for replication across sites or forwarding one type of data to a different indexer

Enable forwarding – Splunk Web Enter either the hostname or IP address with the port of the receiving server If multiple hosts are defined, you can optionally select Automatic Load Balancing Restart required

Install universal forwarder: Windows The Windows version of the Universal forwarder includes an InstallShield package that guides you through most of the forwarder’s configuration If the installer detects an earlier version of Splunk Forwarder you can: Automatically perform a migration during installation Fishbucket info is migrated, config files are NOT Install the UF in a different location to preserve the legacy forwarder

Install universal forwarder: Windows (cont.) If using a deployment server, indicate the hostname or IP and port Deployment server is covered in a later module Indicate the receiving indexer hostname or IP and port Must be listening port of indexer Skip if using deployment server

Install universal forwarder: Windows (cont.) Choose to forward from local or remote If remote, enter domain, username and password for remote host on next screen

Install universal forwarder: Windows (cont.) Enable Windows inputs Event logs Performance monitoring AD monitoring Clicking next begins the installation You can update your universal forwarder's configuration post-install by directly editing its inputs.conf and outputs.conf

Install universal forwarder: Windows CLI Use the CLI installation method when: You want to install the universal forwarder across your enterprise via a deployment tool You do not want the universal forwarder to start immediately after installation Include LAUNCHSPLUNK=0 in the install command You want to prepare a system image for cloning that includes a Universal Forwarder

Install universal forwarder: Windows CLI (cont.) Run as Local System user and request configuration from deploymentserver1 For new deployments of the forwarder msiexec.exe /i splunkuniversalforwarder_x86.msi DEPLOYMENT_SERVER="deploymentserver1:8089" AGREETOLICENSE=Yes /quiet Run as a domain user but don’t launch immediately Prepare a sample host for cloning msiexec.exe /i splunkuniversalforwarder_x86.msi LOGON_USERNAME="AD\splunk" LOGON_PASSWORD="splunk123" DEPLOYMENT_SERVER="deploymentserver1:8089" LAUNCHSPLUNK=0 AGREETOLICENSE=Yes /quiet

Install universal forwarder: Windows CLI (cont.) Enable indexing of the Windows security and system event logs – run installer in silent mode Collect just the Security and System event logs through a "fire-and-forget" installation msiexec.exe /i splunkuniversalforwarder_x86.msi RECEIVING_INDEXER="indexer1:9997" WINEVENTLOG_SEC_ENABLE=1 WINEVENTLOG_SYS_ENABLE=1 AGREETOLICENSE=Yes /quiet Migrate from an existing forwarder – run installer in silent mode Migrate now and redefine your inputs later msiexec.exe /i splunkuniversalforwarder_x86.msi RECEIVING_INDEXER="indexer1:9997" MIGRATESPLUNK=1 AGREETOLICENSE=Yes /quiet

Install universal forwarder: *nix Install as you would full Splunk instance, replacing the package name rpm -i splunkuniversalforwarder_package_name.rpm Start Splunk and accept license Configure the following options Auto start: splunk enable boot-start Deployment server: splunk set deploy-poll <host:port> Client without deployment server: splunk enable deploy-client Forward to an indexer: splunk add forward-server <host:port> Configure inputs via inputs.conf

Migrate to universal forwarder: *nix You can migrate checkpoint data from an existing *nix light forwarder (version 4.0 or later) to the universal forwarder Important: Migration can only occur the first time you start the universal forwarder, post- installation. You cannot migrate at any later point Stop all services on the host Install the universal forwarder – do not start In the installation directory, create a file $SPLUNK_HOME/old_splunk.seed that contains a single line with the path of the old forwarder's $SPLUNK_HOME directory Start the universal forwarder Edit / add configurations Migration process only copies checkpoint files – you should manually copy over the old forwarder's inputs.conf
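A sketch of the seed-file step, assuming the old light forwarder lives in /opt/splunk and the universal forwarder in /opt/splunkforwarder (both paths are hypothetical):
echo "/opt/splunk" > /opt/splunkforwarder/old_splunk.seed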

Forwarding configurations inputs.conf on the forwarder gathers the local logs/system info needed You can include input phase settings in props.conf on light forwarders Per-event processing must be done on the indexer outputs.conf points the forwarder to the correct receiver(s) If you set up forwarding in Splunk Manager, it will reside in the app context you were in when you enabled it If creating by hand, best practice is to place it in $SPLUNK_HOME/etc/system/local
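A minimal inputs.conf sketch for a forwarder monitoring a local log file (the path, sourcetype, and index are hypothetical examples; the outputs.conf examples on the following slides handle the sending side):
[monitor:///var/log/messages]
sourcetype = syslog
index = main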

Outputs.conf – basic example Main [tcpout] stanza has global settings [tcpout:web_indexers] stanza sets up receiving server Compression is turned on Server setting refers to either the IP or host name plus port of receiver [tcpout] defaultGroup = web_indexers disabled = false [tcpout:web_indexers] server = splunk1.company.com:9997 compressed = true [tcpout-server://splunk1.company.com:9997] Global settings Receiving server

Outputs.conf – indexer to indexer clone Main [tcpout] stanza has global settings such as whether to index a local copy [tcpout:uk_clone] stanza sets up the receiving server Compression is turned on The server setting refers to either the IP or host name plus port of the receiver [tcpout] indexAndForward = true [tcpout:uk_clone] compressed = true server = uk_splunk.company.com:9997 Global settings Receiving server

Outputs.conf – single indexer and SSL Each forwarder would have a copy of outputs.conf with the following stanza Additionally the forwarders would be sending using SSL using Splunk’s self- signed certificates [tcpout:indexer] server = splunk.company.com:9997 sslPassword = ssl_for_m3 sslCertPath = $SPLUNK_HOME/etc/auth/server.pem sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem

Outputs.conf – clone indexers Set multiple target groups to get forwarders to send exact copies to multiple indexers [tcpout:indexer1] server = splunk1.mycompany.com:9997 [tcpout:indexer2] server = splunk2.mycompany.com:9997

Auto load balancing Splunk also offers automatic load balancing, which switches from server to server in a list based on a time interval Two options: static list in outputs.conf (see below) DNS list based on a series of A records for a single host name [tcpout:list_LB] autoLB = true server = splunk1.company.com:9997, splunk2.company.com:9997

Auto load balancing DNS list To set up DNS list load balancing create multiple A records with the same name with the IP address of each indexer [tcpout:DNS_LB] autoLB = true server = splunk1b.mycompany.com:9997 autoLBFrequency = 60 From DNS zone file splunk1 A 10.20.30.40 splunk2 A 10.20.30.41 splunk1b A 10.20.30.40 splunk1b A 10.20.30.41

Caching/queue size in outputs.conf maxQueueSize = 1000 (default) is the number of events the forwarder will queue if the target group cannot be reached In load-balanced situations, if the forwarder can’t reach one of the indexers, it will automatically switch to another, and will only queue if all are down/unreachable See outputs.conf.spec for details and even more queue settings

Indexer Acknowledgement Guards against loss of data when forwarding to an indexer Forwarder will re-send any data not acknowledged as "received" by the indexer Disabled by default Requires version 4.2 of both forwarder and receiver Can also be used for forwarders sending to an intermediate forwarder

Indexer Acknowledgement process As the forwarder sends data, it maintains a copy of each 64k block in memory in the wait queue until it gets an acknowledgment from the indexer While waiting, it continues to send more data blocks The indexer receives a block of data, then parses and writes it to disk Once on disk, the indexer sends an acknowledgment to the forwarder Upon acknowledgment, the forwarder releases the block from memory If the wait queue is of sufficient size, it doesn’t fill up while waiting for acknowledgments to arrive Wait queue size can be increased (covered in a later slide)

What happens when no ack is received? If the forwarder doesn't get acknowledgment for a block within 300 seconds (by default), it closes the connection Change wait time by setting readTimeout in outputs.conf If auto load balancing is enabled, it opens a connection to the next indexer in the group and sends the data If auto load balancing is not enabled, it tries to open a connection to the same indexer as before and resend the data Data block is kept in the wait queue until acknowledgment is received Once wait queue fills, forwarder stops sending until it receives acknowledgment for one of the blocks, at which point it can free up space in the queue.

Handling duplicates If there’s a network problem that prevents an acknowledgment from reaching the forwarder, duplicates may occur Example: the indexer receives a data block and generates the acknowledgment – the network goes down before the forwarder gets the ack When the network comes back up, the forwarder resends the data block – the indexer parses and writes it again The forwarder records events in splunkd.log when it receives duplicate acks or resends due to no response

Enabling Indexer Acknowledgement Enabled on the forwarder Both forwarder and indexer must be at version 4.2 or greater Set useACK to true in outputs.conf [tcpout:<target_group>] server=<server1>, <server2>, ... useACK=true Disabled by default You can set useACK either globally or by target group, at the [tcpout] or [tcpout:<target_group>] stanza levels You cannot set it for individual servers at the [tcpout-server: ...] stanza level

Increasing wait queue size Max wait queue size is 3x the size of the in-memory output queue, which you set with the maxQueueSize attribute in outputs.conf maxQueueSize = [<integer>|<integer>[KB|MB|GB]] Wait queue and the output queues are configured by the same attribute but are separate queues Example: if you set maxQueueSize to 2MB, the maximum wait queue size will be 6MB Specifying a lone integer - maxQueueSize = 100 – sets max events for parsed data and max blocks (~64K) for unparsed data
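A sketch combining these settings in outputs.conf (the target group and server name are hypothetical):
[tcpout:web_indexers]
server = splunk1.company.com:9997
useACK = true
maxQueueSize = 2MB
# output queue max is 2MB; with useACK enabled the wait queue can grow to 3 x 2MB = 6MB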

Forwarding to an intermediate forwarder Two main possibilities to consider: Originating forwarder and intermediate forwarder both have acknowledgment enabled Intermediate forwarder waits until it receives acknowledgment from the indexer and then sends acknowledgment back to the originating forwarder Originating forwarder has acknowledgment enabled - intermediate forwarder does not Intermediate forwarder sends acknowledgment back to the originating forwarder as soon as it sends the data on to the indexer Because it doesn't have useACK enabled, the intermediate forwarder cannot verify delivery of the data to the indexer

Lab

Section 8: Distributed Environments

Objectives List Splunk server types Understand Distributed search Describe search head pooling Understand Deployment server

Types of Splunk server Indexer Gathers data from inputs and forwarders, processes it and writes it to disk Acts as a search peer Search head Accessed by users Runs ad-hoc and scheduled searches/alerts Distributes searches out to all peers and combines results Universal forwarder Separate install Gathers data and forwards to the indexer Heavy forwarder Gathers or receives data, processes it and then forwards on to the indexer

Data lifecycle review Four main phases in the data lifecycle Input – Splunk forwarder or full Splunk – collect raw data and send to the indexer Parsing – Splunk heavy forwarder or indexer – parse data: line breaks, timestamps, index-time field extractions Indexing – indexer – save to disk and index Search – search head – pull events from the index, search-time field extractions, display events, reports, etc.

Distributed Environments Overview The next three sections will introduce you to common topologies and tools used in distributed environments Distributed Search Search across multiple indexers Search Head Pooling Multiple search heads share configuration data Deployment Server Manage multiple, varying Splunk instance configurations from a single server

Distributed Search

Distributed search overview Search heads send search requests to multiple indexers and merge the results back to the user In a typical scenario, one Splunk server searches indexes on several other servers Used for Horizontal scaling across multiple indexers in high volume data scenarios Accessing geo-diverse indexers Access control High availability scenarios

Distributed search topology examples Simple distributed search for horizontal scaling – one search head searching across three peers

Distributed search topology examples (cont.) Access control example – department search head has access to all the indexing search peers Each search peer also has the ability to search its own data Department A search peer has access to both its data and the data of department B

Distributed search topology examples (cont.) Load balancing example – provides high availability access to data

Distributed Search setup - Manager Turn on Distributed search and optionally turn on auto-discovery Allows this Splunk server to automatically add other search peers it discovers on the network

Distributed Search Add Peers - Manager Add individual peers manually Include authentication

Search Head Pooling

Search head pooling overview Multiple search heads can share configuration data Allows horizontal scaling for users searching across the same data Also reduces the impact if a search head becomes unavailable Shared resources are: .conf files Search artifacts – saved searches and other knowledge objects Scheduler state – only one search head in the pool runs a particular scheduled search Makes all files in $SPLUNK_HOME/etc/{apps,users} available for sharing – .conf files, .meta files, view files, search scripts, lookup tables, etc. All search heads in a pool should be running the same version of Splunk

Topology example – with load balancer Users log in through a layer-7 load balancer in front of the search head pool; the pool shares storage over NFS or a similar technology

Topology example – without load balancer Each user logs in directly to a specific search head; the pool shares storage over NFS

Create a pool of search heads Set up each search head individually in the same manner as configuring distributed search Set up shared storage that each search head can access For *nix, use an NFS mount For Windows, use a CIFS (SMB) share The Splunk user account needs read/write access to shared storage Stop splunkd on all search heads in the pool

Enable each search head Use the pooling enable CLI command to enable pooling on a search head splunk pooling enable <path_to_shared_storage> [--debug] On NFS, <path_to_shared_storage> is the NFS mountpoint On Windows, <path_to_shared_storage> is the UNC path of the CIFS/SMB share Execute this command on each search head in the pool The command: Sets values in the [pooling] stanza of the server.conf file in $SPLUNK_HOME/etc/system/local Creates user and app subdirectories

Copy user and app directories to share Copy the contents of $SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users directories on existing search heads into the empty apps and users directories on the shared storage For example, if your NFS mount is at /tmp/nfs, copy the apps subdirectories into /tmp/nfs/apps Similarly, copy the user subdirectories: $SPLUNK_HOME/etc/users/ into /tmp/nfs/users Restart each search head in the pool

Using a load balancer Allows users to access the pool of search heads through a single interface, without needing to specify a particular one Ensures access to search artifacts and results if one of the search heads goes down When configuring the load balancer: The load balancer must employ layer-7 (application-level) processing Configure the load balancer so user sessions are "sticky" or "persistent” to ensure that a user remains on a single search head throughout a session

Search head management commands splunk pooling validate Revalidate the search head's access to shared resources splunk pooling disable Disables pooling for a given search head splunk pooling display Displays / verifies current status of search head $ splunk pooling enable /tmp/nfs $ splunk pooling display Search head pooling is enabled with shared storage at: /tmp/nfs $ splunk pooling disable $ splunk pooling display Search head pooling is disabled

Configuration changes Once pooling is enabled on a search head, you must notify the search head if you directly edit a .conf file If you add a stanza to any config file in a local directory, you must run the following command: splunk btool fix-dangling Not necessary if you make changes via Splunk Web Manager or the CLI

Deployment Server

Deployment server overview The deployment server pushes out configurations and content – packaged in ‘deployment apps’ – to distributed clients Allows you to manage multiple Splunk instances from a single Splunk server Small environments – deployment server can also be a deployment client Greater than 30 deployment clients – deployment server should be its own instance

Deployment Terminology Deployment server A Splunk instance that acts as a centralized configuration manager Supplies configurations to any number of Splunk instances Any Splunk instance can act as a deployment server Deployment client Splunk instances that are remotely configured A Splunk instance can be both a deployment server and client at the same time Server class A logical grouping of deployment clients based on need for the same configs Deployment app Set of deployment content (including configuration files) deployed as a unit to clients of a server class.

Deployment server uses Distribute Apps and/or configurations Windows file servers Splunk for Windows App Collect event logs and WMI Database group Uptime, system health, access errors Web Hosting Group Analytics, business intelligence

Server Classes examples Windows Windows Server 2003 IIS Database Solaris servers (sunos-sun4u) Oracle Web hosting group Apache on Linux Could also group clients by OS, Hardware type, location, etc.

Deployment server example A single deployment server manages two server classes: the www-forwarder server class (www1-forwarder, www2-forwarder, www3-forwarder) and the db-logging-forwarder server class (db1-forwarder, db2-forwarder)

Deployment server configuration overview Designate a Splunk instance as deployment server Create serverclass.conf on the deployment server at $SPLUNK_HOME/etc/system/local Create deployment apps on the deployment server and put the content to be deployed into directories Create deploymentclient.conf on the Deployment clients Restart the deployment clients

Deployment serverclass.conf (cont.) Server classes group clients that need the same configuration If filters match the apps and configuration, content is deployed to the client Stanzas in serverclass.conf go from general to more specific All configuration information is evaluated from top to bottom in the configuration file, so order matters [global] repositoryLocation = $SPLUNK_HOME/etc/deploymentApps targetRepositoryLocation = $SPLUNK_HOME/etc/apps [serverClass:AppsByMachineType] [serverClass:AppsByMachineType:app:win_eventlog] In this example: [global] applies to all server classes; repositoryLocation is where apps are stored on the deployment server; targetRepositoryLocation is where apps will be delivered on the client; the [serverClass...] stanzas hold server-class specific settings

Server classes example – serverclass.conf [serverClass:www-forwarder] filterType = blacklist blacklist.0=* whitelist.0=*.10.1.1* [serverClass:www-forwarder:app:webfarm-forwarders] stateOnClient=enabled [serverClass:db-logging-forwarder] whitelist.0=*.192.2* [serverClass:db-logging-forwarder:app:db-forwarder] In this example: the www-forwarder server class applies only to clients in the 10.1.1* IP range and deploys the webfarm-forwarders app to clients that match; the db-logging-forwarder server class applies only to clients in the 192.2* IP range and deploys the db-forwarder app to clients that match

serverclass.conf – group by machine type You can create server classes that apply to specific machine types or OSs [serverClass:AppsByMachineType:app:SplunkDesktop] machineTypes=Windows-Intel [serverClass:AppsByMachineType:app:unix] machineTypes=linux-i686, linux-x86_64 Deploy this app only to Windows machines Deploy this app only to Linux 32 or 64 bit machines

serverclass.conf – client handling options Optionally configure actions to take on the client after an app is deployed restartSplunkWeb = <True or False> restartSplunkd = <True or False> – both default to false stateOnClient = <enabled, disabled, noop> – enable or disable apps on the client after installation or change (defaults to enabled) noop is used for apps that don’t need enabling, such as a package of eventtypes or saved searches
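A sketch of these options applied to a server-class app stanza (names reuse the hypothetical example from the earlier slide):
[serverClass:www-forwarder:app:webfarm-forwarders]
stateOnClient = enabled
restartSplunkd = True
restartSplunkWeb = False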

Setup Deployment Client Install Splunk on the client machine Run the following command ./splunk set deploy-poll <ipaddress/hostname of deployment server>:8089 -auth admin:changeme This will create a file named deploymentclient.conf [deployment-client] disabled = false [target-broker:deploymentServer] targetUri = 225.225.225.1:8089 URI of deployment server

Verify deployment Server clients From the deployment server, you can verify deployment clients from CLI with the following command: ./splunk list deploy-clients Deployment client: ip=192.168.2.4, dns=192.168.2.4, hostname=mycompany-PC-64, mgmt=8089, build=64889, name=deploymentClient, id=connection_192.168.2.4_8089_192.168.2.4_deploymentClient, utsname=windows-unknown Command output

Deployment actions Default poll period is 30 seconds Specified in serverclass.conf The deployment server instructs the client what it should retrieve The deployment client then retrieves the new content Flow: the client polls the deployment server, the server sends instructions, and the client gets the content from the deployment server

Force-notify clients of changes If you make changes to a deployment app on the deployment server, you may want to immediately notify the clients of the change Run ./splunk reload deploy-server to notify all clients Run ./splunk reload deploy-server –class <class name> to notify a specific class

Section 9: Licensing

Section Objectives Identify license types Understand license violations Define license groups Define license pooling and stacking Add and remove licenses

Splunk license types Enterprise license Purchased from Splunk Allows for full functionality License limits indexing volume Enterprise trial license – downloads with the product 500 MB per day limit Otherwise same as enterprise, except that it expires 60 days after install

Splunk license types (cont.) Forwarder license Applied to non-indexing forwarders and deployment servers Allows authentication, but no indexing Free license Activates automatically when the 60-day trial enterprise license expires Can be activated before 60 days by using Manager Doesn’t allow authentication, forwarding to non-Splunk servers, or alerts Does allow 500 MB/day of indexing and forwarding to other Splunk instances

License warnings and violations 5th warning in a rolling 30 day period causes violation and search to be disabled 3rd warning in Free version You must be “good” for 30 consecutive days for warning number to reset Indexing will continue, only search is locked out Note that you can still search Splunk’s internal indexes Contact Splunk Support to unlock your license

License groups License types are organized into groups Enterprise Group Includes Enterprise, Enterprise Trial, and sales trial Free Group Forwarder Group Licenses are stored in directories at $SPLUNK_HOME/etc/licenses Each group is stored in a separate folder under that directory

License stacking and pooling overview Licenses in the Enterprise group can be aggregated together, or stacked Available license volume is the sum of the volumes of the individual licenses Enterprise trial license that comes with the Splunk download cannot be stacked Free license cannot be stacked Pools can be created for a given stack Specify Splunk indexing instances as members of a pool for the purpose of volume usage and tracking Allows for insulation of license usage by group of indexers or data type

Topology example – single pool Master has a stack of two licenses (300 GB + 200 GB) for a total of 500 GB All indexers in the default license pool share the 500 GB entitlement collectively This should be the most common scenario

Topology example – multiple pools Master has a stack of two licenses (300 GB + 200 GB), totaling 500 GB Each pool has a specific entitlement amount, for example: default pool – 100 GB, pool 2 – 100 GB, pool 3 – 200 GB, pool 4 – 100 GB

Managing licenses – overview You can manage license stacks and pools via Manager Switch from master to slave Change license group View license alerts Add licenses and manage stacks Add and manage pools

Managing licenses – master/slave By default, Splunk instances are master license servers Change an instance to slave by entering the master license server URI

Change license group Each master can only manage a single license group Select Enterprise, Forwarder, or Free Forwarder and Free cannot be stacked or used in Pools Enterprise is default

Adding a license Any 4.x license can be added 4.2 licenses can be uploaded, or XML can be copy/pasted 4.0 and 4.1 licenses must be uploaded

License stacks Enterprise Stack 4.2 Enterprise license

License pools For each stack, you can create one or more additional license pools Define a maximum volume for the pool Select indexers for the pool

Viewing pool volume Default pool Added pool

Viewing alerts windows enterprise

Viewing license info – master For each license installed on the master, you can view specific license info Exp. Date/time Features allowed Max violations Quota Stack name and type Status Violation window period Payload is an unimplemented feature

Viewing license info – slave Displays local indexer name, master license server URI, last successful connection Messages link displays license alerts

Lab

Section 10: Security

Section objectives Learn what you can secure in Splunk Understanding SSL and Splunk Learn about user group and index security Learn what is recorded in the audit log Describe how to secure the audit log Understand archive data signing

What you can secure in Splunk SSL splunkd to Splunk Web Splunk Web to client forwarder to indexer Audit user actions file system Data Signing cold to frozen archive data audit data in Splunk

SSL Already enabled between splunkd and Splunk Web Can be enabled via Splunk Web > Manager or by editing web.conf Splunk automatically generates self-signed certificates You can pay for certificates to avoid browser complaints Forwarder to indexer communication can be secured Enabled in outputs.conf Adds to forwarder processor overhead Can force Splunk to only use SSLv3 if required
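For the indexer side of forwarder-to-indexer SSL, a minimal inputs.conf sketch using Splunk's bundled self-signed certificates (the port and password are hypothetical; pair it with the SSL outputs.conf example from the forwarding section):
[splunktcp-ssl:9998]
disabled = false

[SSL]
serverCert = $SPLUNK_HOME/etc/auth/server.pem
password = password
rootCA = $SPLUNK_HOME/etc/auth/cacert.pem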

Data / Index Security Securing sensitive data within Splunk is best achieved by segregating the data by index Index access is governed by user groups Index level security is the best method to ensure users have access to the data they need, while preventing them from seeing sensitive data

Auditing Splunk automatically creates an audit trail of Splunk user actions Stored in the _audit index Accessible only by administrators by default Useful for monitoring for prying eyes Splunk also audits file systems (FS change monitor) Use it on /etc/passwd or on Splunk’s own config files

Signing audit data Splunk has the ability to number and sign audit trail data Detects gaps Detects tampering Creates fields called “validity” and “gap” in the audit log Does not work in distributed environments See the Knowledge Base for details on setting this up http://www.splunk.com/base/Documentation/latest/Admin/Signauditevents

Signing archive data You can sign archive data when it moves from cold to frozen You must specify a custom archiving script You cannot use it if you choose to have Splunk perform the archiving automatically Add signing to your script using signtool -s <archive_path> Splunk verifies archived data signatures automatically when the archive is restored Verify signatures manually by using signtool -v <archive_path>

Splunk Product Security Resources The Splunk Product Security Portal provides a single location for: Splunk Product Security Announcements Splunk Product Security Policy Splunk Product Security Best Practices Reporting Splunk Product Security Vulnerabilities This site is updated regularly with any security-related updates or announcements http://www.splunk.com/page/securityportal splunk.com > Support > Security

Section 11: Jobs, Knowledge Objects, and Alerts

Section objectives Understand jobs Manage jobs Understand alerts, and alert settings Understand PDF server and alerts Understand what knowledge objects are and how to set their permissions

What are jobs Jobs are searches that users or the system runs A job is created when You hit return in the search box You load a dashboard with embedded saved searches An alert is triggered or saved search runs Jobs create artifacts when they run What are artifacts? Traces of jobs (such as search results) that are created on disk Persistence to disk allows users to recreate or resurrect jobs

Managing Jobs – Splunk Web Users can manage their own jobs Administrators can manage all users’ jobs Click on Jobs in Splunk Web to manage, rerun, and resurrect jobs

Manage jobs – OS level (*nix only) Search jobs run as processes at the OS level View search jobs running Included in the process description is key information: the actual search running, who ran the search, their role, and the search ID ps -ef | grep “splunkd search” 502 3179 1662 0 0:00.05 ?? 0:00.26 splunkd search --id=rt_1297105108.42 --maxbuckets=0 --ttl=600 --maxout=10000 --maxtime=0 --lookups=1 --reduce_freq=10 --user=admin --pro --roles=admin:power:user

Manage jobs – OS level continued There will be 2 processes for each job The 2nd process is the “helper” – it will die if you kill the 1st Running jobs will be writing data to $SPLUNK_HOME/var/run/splunk/dispatch/<job_id> Saved searches append the name of the saved search to the job_id directory This directory exists for the TTL of the job You may need to delete artifact directories for jobs you kill by hand TTL = time to live
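A sketch of killing a runaway search and removing its leftover artifact directory, reusing the hypothetical PID and search ID from the ps example on the previous slide:
kill 3179
rm -rf $SPLUNK_HOME/var/run/splunk/dispatch/rt_1297105108.42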

Alerts Review Alerts are saved searches that run on a schedule and “do something” based on the data that is returned Alerts can send an email, trigger a shell script, or create an RSS feed

Email alert configuration In the Email Subject field, $name$ is replaced by the saved search name You must first configure email alert settings in Manager
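For reference, a sketch of the underlying settings (the mail server, recipients, and search name are hypothetical; Manager writes these for you):
# alert_actions.conf – global email settings
[email]
mailserver = smtp.company.com
from = splunk@company.com
# savedsearches.conf – per-alert settings
[Errors in the last hour]
action.email = 1
action.email.to = oncall@company.com
action.email.subject = Splunk Alert: $name$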

PDF report server Splunk offers the ability to print and email reports in PDF format You must install the PDF print server add-on on a Linux-based Splunk instance The Splunk instance doesn’t have to be an indexer, but cannot be a light forwarder See www.splunk.com/base/Documentation/latest/Installation/ConfigurePDFprintingforSplunkWeb for details

Scripted alerts You can have an alert that activates a script Scripts must be located in $SPLUNK_HOME/bin/scripts Scripts can be in any language the underlying operating system can run Splunk passes a number of variables to the script For details on variables etc., see the docs: http://www.splunk.com/base/Documentation/latest/admin/ConfigureScriptedAlerts
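A minimal savedsearches.conf sketch for a scripted alert (the search name and script file are hypothetical; the script itself must live in $SPLUNK_HOME/bin/scripts):
[Disk errors on web farm]
action.script = 1
action.script.filename = page_oncall.sh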

Knowledge Objects Knowledge objects are user-created things such as Eventtypes Saved Searches Field Extractions using IFX (Interactive Field Extractor) Tags Knowledge objects initially are only available to the user who created them Permissions must be granted to allow other users/apps to use them

Knowledge object permissions Users only need read permissions to use knowledge objects Use app context to segregate app-specific knowledge objects

Section 12: Troubleshooting

Section objectives Learn how to set specific log levels using Manager Learn basic troubleshooting steps to solve/identify common issues Learn how to get community help with Splunk Understand how to contact Splunk Support Search is the root of all troubleshooting in Splunk – make sure when troubleshooting you’ve double-checked the search

Splunk’s log levels Log levels from lowest to highest: crit, fatal, error, warn, info, debug By default all subsystems are set to info or warn All of Splunk’s logs can be set to debug by restarting Splunk in debug mode Generally not recommended since it’s burdensome on production systems and creates lots of unwanted “noise” in the logs Better to set to debug granularly on the individual subsystem(s) you are troubleshooting (see next slide) Splunk Support may ask for overall debug mode in certain cases

Set granular log levels You can granularly adjust subsystem log levels to debug to troubleshoot specific issues using Manager Can also set them using log.cfg in $SPLUNK_HOME/etc (useful for light forwarders)
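A sketch of raising two commonly seen subsystems to debug in log.cfg (the chosen categories are examples; editing log.cfg typically requires a restart, whereas Manager changes take effect at runtime):
category.TailingProcessor=DEBUG
category.TcpOutputProc=DEBUG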

Troubleshooting: check your search Many times input or forwarder problems are actually misdiagnosed search problems Before starting to troubleshoot a missing input or forwarder that is not forwarding, double check your search Sometimes inputs wind up in unexpected indexes so try adding “index=*” when searching for a missing input/forwarder Sometimes time stamps are extracted wrong on new inputs, try searching “All Time” to help diagnose this Generally, use wildcards in other parts of your search to cast the widest net for missing data

Deployment monitor The Deployment Monitor is a collection of dashboards and drilldown pages with information to help monitor the health of a system Index throughput over time Number of forwarders connecting to the indexer over time Indexer and forwarder abnormalities Details for individual forwarders and indexers, such as status and forwarding volume over time Source types being indexed by the system License usage

Main – index throughput and forwarders

Main – indexer and forwarder warnings

Main – sourcetype warnings

Viewing warning info Click the arrow icon to view warning information

Configuring alerts Click configure alerting to modify the underlying saved search/alert

Indexers – All Indexers Number of current active searches MB indexed today Can select alternate time range Table report of indexer(s) status, last connection, and total GB indexed in last 30 minutes

Indexer Properties Data specific to a given indexer Drill-down from All Indexers view Can drill-down on any chart item to show underlying events

All Sourcetypes Shows MB Received by sourcetype Table display shows each sourcetype, current status, last received, and total MB received Drill down on any item for underlying events

Sourcetype info Drill-down from All sourcetypes shows info for single sourcetype

License Usage Cumulative MB per day by Sourcetype MB Received By sourcetype, source, host, forwarder, indexer, license pool Drill-down shows underlying events in Search view Usage statistics Shows last received and total MB received

Backfill data Use Backfill Summary Indexes to add two weeks’ worth of data to the summary indexes (useful for a new Deployment Monitor installation on an existing Splunk instance) Use Flush and Backfill to erase old data and re-populate

Community based support Splunk docs are constantly being updated and improved, so be sure to select your version of Splunk to make sure the doc you are reading applies to your version http://www.splunk.com/base/Documentation Splunk Answers: post specific questions and get them answered by Splunk experts (also makes for great and informative reading) http://answers.splunk.com IRC Channel: Splunk maintains a channel #splunk on the EFNet IRC server. Support engineers and many well-informed Splunk users “hang out” there

Splunk Support Contact Splunk Support email: support@splunk.com File a case online http://www.splunk.com/index.php/submit_issue 24/7 phone depending on support contract

Thanks! Please take our survey.