Offensive Rootkit Development

December 8, 2010

Introductions Trainer Class participant introductions Name
Experience in Windows driver development and rootkits Why are you here? What would you like to learn in this class?

Class Structure
This course is focused on 32-bit Windows rootkit techniques and foundational Windows driver development:
  Lecture
  Code Walkthroughs
  Hands-on lab exercises

Domain Areas This class will touch on subjects that can stand on their own as a 1-week class In order to fully understand Windows rootkits we will delve into: Windows Architecture Windows Driver development NDIS programming

Class Materials Course DVD contains: Slides Labs Tools directory
Code galore

Tools Directory on DVD
Process Explorer
DbgView (View debug statements)
InstDrv (Install driver)
WinObj (Object Viewer)
Netcat
Drivers (lists drivers)
Wireshark install

Objectives After completing this course, students will be able to:
Understand the different generations of Windows rootkit techniques Build a fully functional rootkit Utilize lab code for own work and research Identify what’s needed to “weaponize” tools

Module 1 Rootkit overview

What is a Rootkit? Definition
Set of programs that patch and trojan existing execution paths in a system
Software that hides an attacker's presence on the target system
  Hides execution
  Hides files / directories
  Hides network communications
  Hides registry entries
  Hides drivers

What a Rootkit is NOT Not an exploit Not a virus Not a worm
A rootkit may USE software exploits For example, to load into the kernel

When Building a Rootkit
How specialized is the rootkit? Is it OS dependent? i.e. only works on XP and not Vista Is it patch specific? i.e. works on XP SP1 and not XP SP2 Goal as developer should be lowest common denominator One rootkit implementation can attack many systems

When to Install a Rootkit
Once you have access to the target
  From an exploit
  Physical access
Want to hide presence
  Hide your backdoor or other goodies like keyloggers
Low-level techniques
  Provide low-level network path for backdoors
  Circumvent OS security structures

Skills needed Device Driver development
Knowledge of Operating System internals Hardware specifics – If going that route for specific targets but also for advanced memory techniques Flash memory Controller hardware Specialized CPU

Module 2 Driver overview

Windows Drivers (diagram: request path from user mode to hardware)
User Mode: Environment Subsystem or User application
Kernel Mode: System Services -> I/O Manager -> Device Driver -> Device

Ring 0 Device Driver
Free rein over memory
Patch a kernel file on disk
Insert a hook or detour
No holds barred
(diagram: Ring 0, Ring 1, Ring 2, Ring 3)

Development Software
Windows Driver Kit (WDK) 7.1
  Download from Microsoft - Free
Not using Windows Driver Framework (WDF)
  Wraps core functionality to make driver development easier
Using Windows Driver Model (WDM)
  Dates back to Windows 2000
  Windows power management, Plug and Play

Skeleton Driver Structure
DriverEntry() DriverUnload()

DriverEntry Equivalent to “main” in userland
Only driver routine with an exported name Called when the driver is first loaded by the OS May create one or more Device Objects May set up a “device name” so that userland applications can obtain a handle to request services from the driver

DriverEntry Function
NTSTATUS DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath
    );
Parameters
  DriverObject: Caller-supplied pointer to a DRIVER_OBJECT structure. This is the driver's driver object.
  RegistryPath: Pointer to a counted Unicode string specifying the path to the driver's registry key.

DRIVER_OBJECT One unique DRIVER_OBJECT structure for each loaded device driver “Catalog” of pointers to various functions provided by the driver Life of a DRIVER_OBJECT Created by I/O Manager when a driver is loaded DriverEntry loads pointers to supported functions into the driver object Driver object is used to find the correct “service dispatch routine” when the driver receives an IRP

DEVICE_OBJECT
A given driver may have more than one DEVICE_OBJECT
Usually created in DriverEntry
Use IoCreateDevice to create the DEVICE_OBJECT
Needs to be deleted in the driver’s Unload routine

Driver Object Structure (diagram)
The driver object points to:
  A list of Device Objects
  DriverStartIo -> Start I/O Routine
  DriverUnload -> Unload Routine
  MajorFunction[] -> Dispatch Routines

DriverUnload Routine
VOID DriverUnload(
    PDRIVER_OBJECT DriverObject
    );
Function pointer set in DriverEntry
Performs deinitialization and cleanup
Must be provided to unload the driver dynamically

Build SOURCES File
Fields
  TARGETNAME (change the target name to something better)
  TARGETPATH
  TARGETTYPE
  SOURCES
  INCLUDES (optional)
  TARGETLIBS (optional)

Lab Exercise: Basic_1 Add an unload routine Insert DbgPrint statements
Print message on load Print message on unload Verify that build environment is working Load and unload basic_1 Verify that debug statements are being delivered Use DebugView

Module 3 Driver Programming

Driver Foo Windows driver classes are a week long on their own
We will only cover some of the basics of Windows Drivers in the next few modules Remember kernel development is precise. Unlike user apps, simple mistakes bring down the whole system

Basic Programming techniques
We’ll cover some basic areas that will help in your driver development:
  Certain functions and rules apply to certain contexts (IRQL)
  Memory allocation
  String functions
    Less critical, but they show how kernel routines resemble familiar Win32 API functions
  File I/O
  Registry functions
WDK help is your friend

IRQL  What Is It? Windows defines interrupt priority by IRQL (interrupt request level pronounced “irk-al”) Prioritizing interrupts distributes CPU time among competing I/O devices Code executing at a given IRQL cannot be interrupted by code at a lower or equal IRQL Each processor has an independent IRQL

IRQL Levels
IRQL                             x86 value  Description
PASSIVE_LEVEL                    0          User threads and most kernel-mode operations
APC_LEVEL                        1          Asynchronous procedure calls and page faults
DISPATCH_LEVEL                   2          Thread scheduler and deferred procedure calls (DPCs)
CMC_LEVEL                        N/A        Correctable machine-check level (IA64 platforms only)
Device interrupt levels (DIRQL)  3-26       Device interrupts
PC_LEVEL                         N/A        Performance counter (IA64 platforms only)
PROFILE_LEVEL                    27         Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL                      27         Synchronization of code and instruction streams across processors
CLOCK_LEVEL                      N/A        Clock timer
CLOCK2_LEVEL                     28         Clock timer for x86 hardware
IPI_LEVEL                        29         Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL                      30         Power failure
HIGH_LEVEL                       31         Machine checks and catastrophic errors; profiling timer for Windows XP and later releases

PASSIVE_LEVEL IRQL
Lowest IRQL level
User mode code runs at this level
No interrupts are masked off
Pageable memory is accessible
All kernel APIs can be called

APC_LEVEL IRQL
Asynchronous procedure calls run at this level
APC level interrupts are masked off
  Prevents other APCs from occurring
Pageable memory is still accessible
Some kernel APIs can’t be called

DISPATCH_LEVEL IRQL
OS scheduler runs at this level
Code running at DISPATCH_LEVEL is not preemptible
I/O completion routines can be called at this level
DPC level interrupts and lower are masked off
Pageable memory is NOT accessible
Waiting on objects is NOT allowed
File I/O is NOT allowed
Few kernel APIs can be called

DIRQL
All interrupts at this level or lower are masked off
Range of IRQLs
  Used to determine which devices have priority over other devices
Pageable memory is NOT accessible
Very few kernel APIs can be called

Getting The Current IRQL
The KeGetCurrentIrql routine returns the current IRQL. KIRQL KeGetCurrentIrql( void );

Memory Resources Kernel Stack Pool Allocations Paged Pool
Nonpaged Pool

Kernel Stack Only 12K in size Overflow will cause BSOD! Guidelines
No deeply nested routines Limit recursion Don’t allocate large data structures on the stack

Pool Allocations
Use the APIs ExAllocatePool and ExFreePool
  Both paged and nonpaged pools
Nonpaged pool is a limited resource
Pool allocation restrictions
  Driver routines which can be called at DISPATCH_LEVEL must only use nonpaged pool allocations
  Drivers must be executing at or below DISPATCH_LEVEL to allocate or free nonpaged memory
  Remember: no page faults can be taken at or above DISPATCH_LEVEL
  Drivers must be executing at or below APC_LEVEL to allocate or free from the paged pool
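The pool restrictions above reduce to a small rule table. The following is a user-mode sketch of that rule only, not WDK code: pool_kind and pool_call_allowed are hypothetical names invented for illustration, and the IRQL constants use the x86 values.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 IRQL values for the three lowest levels. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Hypothetical stand-in for the WDK's pool type selection. */
typedef enum { POOL_PAGED, POOL_NONPAGED } pool_kind;

/* Returns true when the slide's rules allow allocating or freeing
 * from the given pool at the given IRQL. */
bool pool_call_allowed(int irql, pool_kind kind)
{
    assert(irql >= PASSIVE_LEVEL && irql <= 31);
    if (kind == POOL_PAGED)
        return irql <= APC_LEVEL;   /* touching paged pool may page-fault */
    return irql <= DISPATCH_LEVEL;  /* nonpaged pool never page-faults */
}
```

A routine that can run at DISPATCH_LEVEL therefore has only one safe choice: pool_call_allowed(DISPATCH_LEVEL, POOL_PAGED) is false, while the nonpaged case is true.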

Unicode Strings
UNICODE_STRING, *PUNICODE_STRING
Field                  Contents
USHORT Length          Current string length in bytes
USHORT MaximumLength   Max string length in bytes
PWSTR Buffer           Pointer to driver-allocated buffer holding the real string data
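Because both length fields count bytes rather than characters, off-by-a-factor-of-two mistakes are common. Below is a user-mode sketch that mimics the documented behavior of RtlInitUnicodeString. The types wchar16 and unicode_string and the function init_unicode_string are stand-ins invented for this example; they are not the WDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for WCHAR (assumption: 16-bit code units, as on Windows). */
typedef uint16_t wchar16;

/* Stand-in for UNICODE_STRING with the same three fields. */
typedef struct {
    uint16_t Length;         /* bytes in use, terminator not counted */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    const wchar16 *Buffer;   /* points at caller's storage, not a copy */
} unicode_string;

/* Points dst at src and records both lengths in BYTES. */
void init_unicode_string(unicode_string *dst, const wchar16 *src)
{
    size_t chars = 0;

    assert(dst != NULL);
    dst->Length = 0;
    dst->MaximumLength = 0;
    dst->Buffer = src;
    if (src == NULL)
        return;
    while (src[chars] != 0)
        chars++;
    assert((chars + 1) * sizeof(wchar16) <= UINT16_MAX);
    dst->Length = (uint16_t)(chars * sizeof(wchar16));
    dst->MaximumLength = (uint16_t)(dst->Length + sizeof(wchar16));
}
```

For a three-character string the sketch yields Length 6 and MaximumLength 8, and Buffer still points at the original array, so that array must outlive the structure.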

Unicode String Functions
RtlInitUnicodeString – Initializes a UNICODE_STRING from a NULL-terminated Unicode string
RtlAnsiStringToUnicodeString – Converts an ANSI string to Unicode
RtlCopyUnicodeString – Copies a source string to a destination
RtlCompareUnicodeString – Compares two Unicode strings (returns zero when they are equal)

File I/O
Creation: ZwCreateFile, ZwOpenFile
Read / Write: ZwReadFile, ZwWriteFile
Close: ZwClose
NOTE: File I/O must be performed at PASSIVE_LEVEL!

Registry Access ZwCreateKey ZwOpenKey ZwQueryKey ZwQueryValueKey
ZwEnumerateValueKey ZwEnumerateKey ZwSetValueKey

Module 4 Driver debugging

Kernel Mode Debuggers KD – Command line debugger Windbg – GUI wraps KD
Softice ($)– No longer supported – XP last OS Syser ($)– Supposedly picks up where Softice left off We will use Windbg

When to use Debugger Target-Host configuration
Local debugging (Very limited) – This is where Softice ruled Analyze crash dumps

Target-Host Configurations
Two physical machines
  Developer machine: hosts driver development and debugger
  Target machine: debug and run code
Virtual Machine (VM)
  Target machine is a VM on the developer computer
  This is the preferred setup, but it may not work when you need to work with real hardware

Two Computer Setup Connect with Null Modem cable 1394 Cable (Firewire)
USB cable We will use VMWare for Target

Target Computer Setup
In boot.ini, append to the end of the OS entry line:
  /debug /debugport=com2 /baudrate=115200
Use whichever COM port you are actually using; this suffices for the VM setup

VM Setup (Target) Add a serial port that uses named pipe
\\.\pipe\name_of_pipe

VMWare Settings

Windbg setup File Menu->Kernel Debugging Set Baud Rate
Set port to use named pipe – Same name in VM setup Check Pipe

Microsoft Symbols Create a symbol directory on host computer
Add to Symbol file path SRV*c:\work\symbols*

Additional debug settings
Add user symbols to path I386 directory of driver folder Add source code path Source directory of driver

Popular BSOD Codes for Developers
Stop 0x0000000A or IRQL_NOT_LESS_OR_EQUAL
Stop 0x00000050 or PAGE_FAULT_IN_NONPAGED_AREA
You must know your context – More on this later

Full Crash Dumps Control Panel -> System applet Uncheck restart
Make sure set to do a full memory dump

Helpful Commands
Command       Description
!analyze -v   Displays information about current bug check or exception
.reload       Reloads symbols
.reboot       Reboots target machine
lm            List modules (drivers)
dt            Displays information about a variable or data type
uf            Displays assembly of function in memory
!process      Displays information about process including EPROCESS block

Driver Verifier Tool
Great tool to run before releasing drivers
Location: C:\windows\system32\verifier.exe
Puts selected drivers through the wringer:
  Special memory pool allocation
  Forced IRQL checking
  Memory pool tracking
  I/O verification
  Deadlock detection
May have to comment out sections of rootkit

Lab Exercise Setup VMWare for debugging Setup WinDBG for debugging
Debug your driver Set breakpoints Try commands

Module 5 Driver Loading

Ways to Load a Driver
Configuration Manager
Service Control Manager
SystemLoadAndCallImage
Kernel level exploit (buffer overflow)

Windows Registry
Provides persistent, non-volatile storage
Composed of hives, which are files
  Hive                           File
  HKEY_LOCAL_MACHINE\SYSTEM      \WINDOWS\SYSTEM32\CONFIG\SYSTEM
  HKEY_LOCAL_MACHINE\SAM         \WINDOWS\SYSTEM32\CONFIG\SAM
  HKEY_LOCAL_MACHINE\SECURITY    \WINDOWS\SYSTEM32\CONFIG\SECURITY
  HKEY_LOCAL_MACHINE\SOFTWARE    \WINDOWS\SYSTEM32\CONFIG\SOFTWARE

SYSTEM Hive Ntldr loads the entire HKEY_LOCAL_MACHINE\SYSTEM hive into physical memory at boot time Hive controls Services and Drivers Load Time Load Order Dependencies Parameters

Registry Functions
Kernel registry functions, in the System Call Table:
  NtClose, NtCreateKey, NtDeleteKey, NtDeleteValueKey, NtEnumerateKey, NtEnumerateValueKey, NtFlushKey, NtLoadKey, NtOpenKey, NtQueryKey, NtQueryValueKey, NtSetValueKey, NtUnloadKey
You would use the Zw* equivalent

The Service Control Manager
The SCM lives in %SystemRoot%\System32\Services.exe
Winlogon process starts the SCM early during boot
Scans HKLM\SYSTEM\CurrentControlSet\Services and creates an entry in the service database for each key.
If a service or a driver is marked for auto-start, the SCM starts it and detects startup failures.
A database entry includes all the service-related parameters defined for a service.

Driver Registry Entry

Start Values
SERVICE_BOOT_START (0)
  Ntldr or Osloader preloads the driver
SERVICE_SYSTEM_START (1)
  Loads and initializes during kernel initialization
SERVICE_AUTO_START (2)
  Loads after the SCM, Services.exe, starts
SERVICE_DEMAND_START (3)

Type, Group, Tag
Type
  Specifies the type of service. Since we are dealing with device drivers, the only value we can use is SERVICE_KERNEL_DRIVER
Group
  Driver or service initializes when its group is initialized
  HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\SERVICEGROUPORDER\LIST
Tag
  Only applies to drivers
  Order within the group initialization
  HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\GROUPORDERLIST

SCM Steps Using the SCM to install a driver
Establishing a connection to the SCM Installing a new driver Starting the driver Uninstalling the driver

Establishing A Connection To The SCM
Call OpenSCManager function to establish a connection to the SCM database. OpenSCManager returns a handle to SCM database that is used in subsequent APIs

Installing A New Driver
Pass SCM handle from OpenSCManager into CreateService CreateService creates a registry entry for your driver under: HKLM\SYSTEM\CurrentControlSet\Services If driver already exists in the SCM, GetLastError returns ERROR_SERVICE_EXISTS. If CreateService successful the handle to driver is returned. (Add to SCM DB) Could this be done by hand? Less Noisy?

Starting A Driver
Maps driver image into memory
Fixes up relocations & imported symbols
Calls DriverEntry
  Runs in SYSTEM context
If DriverEntry returns STATUS_SUCCESS, StartService returns a nonzero value

Unloading A Driver Call DeleteService to remove the driver’s SCM database entry from the registry Call CloseServiceHandle Note: This function does not actually delete the service right away; it simply marks the service for deletion. The SCM will delete the service only when the service stops running and after all handles to the service have been closed

Walkthrough: ADV_Loader
Creates a service Registers with the Service Control Manager (SCM) Opens the service if it already exists Sends a message to the SCM to start the service

Loading via SystemLoadAndCallImage
The SCM is not an ideal method of installing a rootkit driver Creates registry entries which may result in detection Another Alternative: Use ZwSetSystemInformation Located in ntdll.dll

ZwSetSystemInformation
NTSTATUS ZwSetSystemInformation(
    IN DWORD SystemInformationClass,
    IN OUT PVOID SystemInformation,
    IN ULONG SystemInformationLength
    );
typedef struct _SYSTEM_LOAD_AND_CALL_IMAGE {
    UNICODE_STRING ModuleName;
} SYSTEM_LOAD_AND_CALL_IMAGE;
Example Usage:
SYSTEM_LOAD_AND_CALL_IMAGE MyRootkitImage;
ZwSetSystemInformation(
    SystemLoadAndCallImage,
    &MyRootkitImage,
    sizeof(SYSTEM_LOAD_AND_CALL_IMAGE));

DISCUSSION: The differences between proper loading, and using SYSTEMLOADANDCALLIMAGE Image is Pageable Image is loaded in the context of the process that called ZwSetSystemInformation Not SYSTEM context Not a true DRIVER_OBJECT

Walkthrough: BASIC_LOADER
Uses undocumented way to load driver ZwSetSystemInformation SYSTEMLOADANDCALLIMAGE Image is loaded and executed

Lab Exercise: ADV_Loader
Alter adv_loader to work for a driver of your choice

Module 6 Driver communication

Device Communication Packet driven communication model
Uses IRPs (I/O Request Packets) Created by the I/O manager or another driver I/O manager allocates work orders on behalf of the user and sends them down the device stack. IRP’s are “work orders” for the driver Drivers maintain a “dispatch table” that describes which IRP “work orders” they will respond to.

IRP Structure (diagram)
IRP Header
Parameters for sub-request (IO_STACK_LOCATION)
Parameters for sub-request (IO_STACK_LOCATION)
Parameters for sub-request (IO_STACK_LOCATION)

Analogy to Building a House (1)
An IRP can be thought of in terms of a contract for building a house
It contains the blueprint for the house
  Analogy: IRP header
Multiple subcontractors will need to work on the house (plumber, carpenter, electrician, etc.)
  Analogy: Multiple drivers may need to respond to an IRP. These are the subcontractors.

Analogy to Building a House (2)
Each of these subcontractors has a toolbox with the tools that he needs to complete his work. Analogy: Each driver needs some data / parameters to respond to the IRP These parameters are stored in an IO_STACK_LOCATION Each driver that responds to the IRP has its own IO_STACK_LOCATION in the IRP. Once all of the subcontractors have done their work, the house is complete. Analogy: Once all of the drivers have finished responding to the IRP, it is marked as complete.

Path of an IRP (diagram)
I/O Subsystem allocates the IRP
IRP is passed down the device stack: Upper Driver -> Lower Driver -> Lowest Driver
Hardware device fills the IRP with data

Dispatch Routines
Routines that are executed in response to certain events / requests (IRPs) received by the driver
IRP dispatch function entry points are filled in the Driver Object’s Major Function Table by the DriverEntry routine
  pDriverObject->MajorFunction[IRP_MJ_CREATE] = DispatchCreate;
  pDriverObject->MajorFunction[IRP_MJ_CLOSE] = DispatchClose;
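The Major Function Table is simply an array of function pointers indexed by IRP major code. The sketch below models that idea in user mode so it can be compiled anywhere. driver_model, driver_model_init and driver_model_send are hypothetical names, and the default handler follows this course's convention of returning STATUS_NOT_IMPLEMENTED for unsupported requests. The numeric constants match the WDK headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IRP_MJ_CREATE            0x00
#define IRP_MJ_CLOSE             0x02
#define IRP_MJ_DEVICE_CONTROL    0x0e
#define IRP_MJ_MAXIMUM_FUNCTION  0x1b
#define STATUS_SUCCESS           0x00000000u
#define STATUS_NOT_IMPLEMENTED   0xC0000002u

typedef uint32_t status_t;
typedef status_t (*dispatch_fn)(void *device, void *irp);

/* Hypothetical stand-in for the table inside DRIVER_OBJECT. */
typedef struct {
    dispatch_fn MajorFunction[IRP_MJ_MAXIMUM_FUNCTION + 1];
} driver_model;

static status_t dispatch_unsupported(void *device, void *irp)
{
    (void)device; (void)irp;
    return STATUS_NOT_IMPLEMENTED;
}

static status_t dispatch_create_close(void *device, void *irp)
{
    (void)device; (void)irp;
    return STATUS_SUCCESS;
}

/* DriverEntry-style setup: default every slot, then overwrite
 * the slots the driver actually supports. */
void driver_model_init(driver_model *drv)
{
    size_t i;

    assert(drv != NULL);
    for (i = 0; i <= IRP_MJ_MAXIMUM_FUNCTION; i++)
        drv->MajorFunction[i] = dispatch_unsupported;
    drv->MajorFunction[IRP_MJ_CREATE] = dispatch_create_close;
    drv->MajorFunction[IRP_MJ_CLOSE]  = dispatch_create_close;
}

/* Conceptually what the I/O manager does: index by major code. */
status_t driver_model_send(const driver_model *drv, unsigned major)
{
    assert(drv != NULL && major <= IRP_MJ_MAXIMUM_FUNCTION);
    return drv->MajorFunction[major](NULL, NULL);
}
```

Filling every slot first means a request the driver never planned for gets a defined error status instead of a jump through an uninitialized pointer.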

IRP Major Function Codes
IRP_MJ_CREATE IRP_MJ_CLEANUP IRP_MJ_CLOSE IRP_MJ_READ IRP_MJ_WRITE IRP_MJ_DEVICE_CONTROL IRP_MJ_INTERNAL_DEVICE_CONTROL IRP_MJ_SET_INFORMATION IRP_MJ_FLUSH_BUFFERS IRP_MJ_SHUTDOWN

Dispatch Routine Functionality
When the driver receives a request, it looks up the request in the IRP stack using IoGetCurrentIrpStackLocation.
These requests can fall into 3 categories:
  Standard Requests (IRP_MJ_CREATE, IRP_MJ_CLOSE, and IRP_MJ_CLEANUP)
    Return (minimum): STATUS_SUCCESS or error.
  Programmer Defined Requests
    IRP_MJ_DEVICE_CONTROL
  Undefined Requests
    Return STATUS_NOT_IMPLEMENTED
The dispatch routine functions more or less as the WndProc does in a standard Win32 application.

Simple Dispatch Routines
Completing a request
NTSTATUS DispatchXxx( IN PDEVICE_OBJECT pDo, IN PIRP pIrp )
{
    pIrp->IoStatus.Status = STATUS_SUCCESS;
    pIrp->IoStatus.Information = 0;
    // Mark the IRP as complete
    IoCompleteRequest( pIrp, IO_NO_INCREMENT );
    return STATUS_SUCCESS;
}

User / Kernel Communication
Establishing the communication link Receiving a user mode request in a driver Passing data between user / kernel mode Making a driver request from a user mode application Receiving a user mode request in your driver

Kernel Side Setup
Driver creates a symbolic link so the user application can obtain a handle to communicate with it
  Use IoCreateSymbolicLink
A symbolic link maps a “DOS Device Name” to an “NT Device Name”
The “\??” directory in the Object Manager's namespace contains symbolic links to the real, internal device names; the DOS device name must appear there
Internal device names can't be used in Win32 applications

IoCreateSymbolicLink
The IoCreateSymbolicLink routine sets up a symbolic link between a device object name and a user-visible name for the device. NTSTATUS IoCreateSymbolicLink( IN PUNICODE_STRING SymbolicLinkName, IN PUNICODE_STRING DeviceName );

Lab Exercise: Symbolic Link
To your skeleton driver, add code to Create a Device Object For the device type, use FILE_DEVICE_UNKNOWN Create a Symbolic Link View in Winobj

User App Setup The application calls CreateFile with the driver’s DOS Device Name HANDLE hFile; hFile = CreateFile("\\\\.\\Example", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);

User API  Kernel IRP User mode calls with a device handle generate IRPs CreateFile -> IRP_MJ_CREATE CloseHandle -> IRP_MJ_CLEANUP & IRP_MJ_CLOSE WriteFile -> IRP_MJ_WRITE ReadFile-> IRP_MJ_READ DeviceIoControl -> IRP_MJ_DEVICE_CONTROL

User / Kernel Data Passing
Think about what you need… Do you need to be able to access the data from an arbitrary context? Do you need to read the memory at a raised IRQL level? 3 Methods Buffered I/O Direct I/O Neither I/O

Buffered I/O I/O Manager copies data from requestor’s buffer to kernel buffer in non-paged pool Irp->AssociatedIrp.SystemBuffer You have 1 buffer. You read input data from this buffer and you write output data to the same buffer, overwriting the input data
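The single shared buffer is the part of Buffered I/O that surprises people, so here is a user-mode simulation of the copy-in, handle, copy-out sequence. It is a model only: buffered_request, ioctl_handler and foo_handler are hypothetical names, and malloc stands in for the non-paged pool allocation the I/O Manager would make.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: reads input from buf, overwrites it with
 * output, and returns the number of output bytes written. */
typedef size_t (*ioctl_handler)(void *buf, size_t in_len, size_t out_cap);

/* Example handler: ignores its input and answers with a fixed string. */
size_t foo_handler(void *buf, size_t in_len, size_t out_cap)
{
    static const char reply[] = "FOO SUCCESS";

    (void)in_len;
    if (out_cap < sizeof reply)
        return 0;
    memcpy(buf, reply, sizeof reply);  /* overwrites the input data */
    return sizeof reply;
}

/* Models the copies performed around the handler for one request. */
size_t buffered_request(const void *in, size_t in_len,
                        void *out, size_t out_cap, ioctl_handler handler)
{
    size_t cap = in_len > out_cap ? in_len : out_cap;
    size_t written;
    void *system_buffer;

    assert(handler != NULL);
    if (cap == 0)
        return 0;
    system_buffer = malloc(cap);       /* one buffer for both directions */
    if (system_buffer == NULL)
        return 0;
    if (in_len > 0)
        memcpy(system_buffer, in, in_len);     /* copy in */
    written = handler(system_buffer, in_len, out_cap);
    if (written > out_cap)
        written = out_cap;
    if (written > 0)
        memcpy(out, system_buffer, written);   /* copy out */
    free(system_buffer);
    return written;
}
```

A handler that needs its input after it starts writing output has to save that input first, because both live in the same memory.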

Buffered I/O Pros & Cons
PRO
  The user mode buffer doesn’t need to be locked down in memory
CON
  Allocates nonpaged memory and performs a copy
  Overhead in processing read / write request
  Nonpaged memory is a precious resource
  Might not be able to allocate enough for a large buffer

Direct I/O Irp->MdlAddress I/O Manager creates MDL
Physical pages are locked into memory

Direct I/O Pros & Cons
PRO
  Good for large buffers because it doesn’t require user memory to be copied
CON
  User buffers are locked in memory until the IRP is completed

Neither I/O Irp->UserBuffer = lpOutputBuffer
I/O Manager passes virtual address of requestor’s buffer directly to the driver

Neither I/O Pros & Cons
PRO
  Easiest method
CON
  You have to know context
  Limited to use by a “highest layer” driver
  If you are running at elevated IRQL, user memory could get paged out and you can’t take a page fault to bring it back in

Making a Driver Request from User Mode
Driver requests are sent via IOCTL IOCTLs are 32 bit numbers that define Transfer type Control code (message type) Access type Device type

More on IOCTLs
Use the CTL_CODE macro to define IOCTLs
#define IOCTL_EXAMPLE_SAMPLE_BUFFERED_IO CTL_CODE( \
    FILE_DEVICE_UNKNOWN, 0x802, METHOD_BUFFERED, \
    FILE_READ_DATA | FILE_WRITE_DATA )
NOTE: User-defined control codes start at 0x800
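To make the 32-bit layout concrete, the sketch below rebuilds the same packing as CTL_CODE in portable C and adds decoders for each field. make_ioctl and the ioctl_* helpers are hypothetical names for this example; the four constants carry the values the WDK headers assign to them.

```c
#include <assert.h>
#include <stdint.h>

#define FILE_DEVICE_UNKNOWN 0x00000022u
#define METHOD_BUFFERED     0u
#define FILE_READ_DATA      0x0001u
#define FILE_WRITE_DATA     0x0002u

/* Same bit layout as the CTL_CODE macro:
 * device type [31:16], access [15:14], function [13:2], method [1:0]. */
uint32_t make_ioctl(uint32_t device, uint32_t function,
                    uint32_t method, uint32_t access)
{
    assert(device <= 0xFFFFu && function <= 0xFFFu);
    assert(method <= 3u && access <= 3u);
    return (device << 16) | (access << 14) | (function << 2) | method;
}

/* Field extractors, the inverse of make_ioctl. */
uint32_t ioctl_device(uint32_t code)   { return code >> 16; }
uint32_t ioctl_access(uint32_t code)   { return (code >> 14) & 0x3u; }
uint32_t ioctl_function(uint32_t code) { return (code >> 2) & 0xFFFu; }
uint32_t ioctl_method(uint32_t code)   { return code & 0x3u; }
```

With the parameters from the slide (FILE_DEVICE_UNKNOWN, 0x802, METHOD_BUFFERED, read and write access) the packed value works out to 0x0022E008, which is the number that would appear in a debugger when the request arrives.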

DeviceIoControl The DeviceIoControl function sends a control code directly to a specified device driver, causing the device to perform the corresponding operation. BOOL DeviceIoControl( HANDLE hDevice, DWORD dwIoControlCode, LPVOID lpInBuffer, DWORD nInBufferSize, LPVOID lpOutBuffer, DWORD nOutBufferSize, LPDWORD lpBytesReturned, LPOVERLAPPED lpOverlapped );

Lab Exercise: IOCTL Define IOCTL codes in your skeleton driver and your basic_loader application called CTL_FOO_METHOD. Use the Buffered I/O Method

Driver Response Support IRP_MJ_DEVICE_CONTROL
In the IRP_MJ_DEVICE_CONTROL dispatch routine
  Call IoGetCurrentIrpStackLocation to get the current stack location
  Switch on: CurrentStackLocation->Parameters.DeviceIoControl.IoControlCode
  Complete the request as shown in the ‘Simple Dispatch Routines’ slide

IOCTL Dispatch Routine
NTSTATUS OnIoctl( PDEVICE_OBJECT pDeviceObject, PIRP Irp )
{
    NTSTATUS NtStatus = STATUS_NOT_SUPPORTED;
    ULONG dwDataWritten = 0; /* bytes returned to the caller */
    PIO_STACK_LOCATION pIoStackIrp = IoGetCurrentIrpStackLocation(Irp);

    if (pIoStackIrp) /* Should Never Be NULL! */
    {
        switch (pIoStackIrp->Parameters.DeviceIoControl.IoControlCode)
        {
        case IOCTL_TEST1:
            NtStatus = Test1();
            break;
        case IOCTL_TEST2:
            NtStatus = Test2();
            break;
        }
    }

    Irp->IoStatus.Status = NtStatus;
    Irp->IoStatus.Information = dwDataWritten;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return NtStatus;
}

Lab Exercise: Driver Comms
Define a routine in your driver called OnCtlFooMethod When you receive the IRP_MJ_DEVICE_CONTROL Verify that the control code is valid (i.e. CTL_FOO_METHOD) Return the string “FOO SUCCESS” In the user application send the device I/O control to the driver. Check the output buffer for the string “FOO SUCCESS”

Module 7 Application Dropper

Packing it all up Rootkit could be loaded from a buffer overflow
Rootkit could be single executable

Walkthrough: Arcbot Embeds SYS file as resource in an executable
Writes resource to a file Loads resource file using ZwSetSystemInformation

Lab Exercise: Custom Loader
Use Arcbot as a reference and build your own dropper for your rootkit.

Module 8 Interrupts

Interrupt Descriptor Table (IDT) Hooking
The IDT is a table of function pointers to interrupt handlers
  1 IDT entry per interrupt
  1 IDT per processor (multi-processor systems)
  This also includes Hyperthreaded machines (which will have 2 IDTs)
Software interrupts can be generated using the INT x instruction
Hooking the IDT simply involves replacing one of the IDT function pointers with the address of your own routine
CAVEAT: If you are dealing with a multi-processor system, remember that multiple IDTs must be hooked

X86 Interrupts

Interrupts a Rootkit Might be Interested in (1)
Debug Interrupts
  Int 1, Int 3
  Covert transfer of control (possibilities for indirect hooking?)
Invalid Opcode Interrupt
  Int 6
  Covert transfer of control; an inline hook based upon injection of an invalid opcode might be hard to detect
  Would require full disassembly to identify
Page Fault Interrupt
  Int 0E
  Control memory view seen by OS & applications

Interrupts a Rootkit Might be Interested in (2)
System Call Interface Interrupt (Win2K)
  Int 2E
  Another method to hook the system call table
  Windows XP and later use SYSENTER
Hardware Interrupts
  Ex: Keyboard Interrupt
  Interrupt bound to IRQ1 in the PIC (programmable interrupt controller)
User Defined Interrupts
  Int
  Covert communication channel?

Getting the IDT Base Address
IDT base address stored in the IDTR register

IDT Entry Layout IDT contains 3 types of entries (Task Gate, Interrupt Gate, Trap Gate) We are only interested in the trap / interrupt gate (which have identical layouts)

Reading the IDT with SIDT
// entry in the IDT, this is sometimes called
// an "interrupt gate"
typedef struct
{
    unsigned short LowOffset;
    unsigned short selector;
    unsigned char unused_lo;
    unsigned char segment_type:4;        // 0x0E is an interrupt gate
    unsigned char system_segment_flag:1;
    unsigned char DPL:2;                 // descriptor privilege level
    unsigned char P:1;                   /* present */
    unsigned short HiOffset;
} IDTENTRY;

/* sidt returns idt in this format */
typedef struct
{
    unsigned short IDTLimit;
    unsigned short LowIDTbase;
    unsigned short HiIDTbase;
} IDTINFO;

Walkthrough: Basic_interrupt
Examine the interrupt descriptor table Bonus exercise: Use Windbg to dump the IDT and compare the two

Walkthrough: Basic_hook_int
Count the number of times an interrupt is called

Lab Exercise: Interrupts
Compile and run basic_hook_int Count number of interrupts Choose another interrupt to hook, and count how many times it’s called

Module 9 SSDT Hooking

System Call
User process calls CreateFile()
User Mode: CreateFile() in Kernel32.dll -> NtCreateFile() in ntdll.dll -> SYSENTER
Kernel Mode: KiSystemServiceTable -> NtCreateFile in ntoskrnl.exe

Hooking in Kernel Space
The operating system is global memory Does not rely on process context Except when portions of a driver are pageable By altering a single piece of code or a single pointer to code, the rootkit subverts every process on the system.

System Call(1) (diagram)
USER MODE: System Call, Call Number 0x11223344
KERNEL MODE: KiSystemService -> System Service Descriptor Table -> Table Entry

System Call(2) (diagram)
USER MODE: System Call, Call Number 0x
KERNEL MODE: System Service Descriptor Table -> Table Entry -> Kernel or module

System Call / Rootkit(1) (diagram, continued over three slides)
USER MODE: System Call
KERNEL MODE: System Service Descriptor Table, Kernel or module, Some rootkit

MEMORY Protection XP and later version of the operating system
Protect the System Call Table Protect the Interrupt Descriptor Table Protect code segments Writing to read and execute only memory causes BSoD

Removing Memory Protection In The Kernel
3 Methods of Removing
  Globally: Cr0
  Locally: MDL
  Direct PTE modification

Removal via Cr0
CR0 contains system control flags that control operating mode and states of the processor
NOTE: When the processor is in supervisor mode and the WP flag (bit 16) in register CR0 is clear, write protection is turned off.

Cr0 Removal Code
// UNProtect memory
__asm
{
    push eax
    mov eax, CR0
    and eax, 0FFFEFFFFh
    mov CR0, eax
    pop eax
}
Modify Memory Here…
// REProtect memory
__asm
{
    push eax
    mov eax, CR0
    or eax, NOT 0FFFEFFFFh
    mov CR0, eax
    pop eax
}

Removal via MDL
Although CR0 is easier to code, MDL flags can also be set. This may be a more ‘proper’ way to do this.
MDLs (Memory Descriptor Lists) contain the start address, owning process, number of bytes, and flags for a memory region
typedef struct _MDL {
    struct _MDL *Next;
    CSHORT Size;
    CSHORT MdlFlags;
    struct _EPROCESS *Process;
    PVOID MappedSystemVa;
    PVOID StartVa;
    ULONG ByteCount;
    ULONG ByteOffset;
} MDL, *PMDL;

Removal via MDL: Steps
MmCreateMdl - Define the start address and size of the memory region you want to remove protection on
MmBuildMdlForNonPagedPool - Map the MDL
Change flags on the MDL by ORing MdlFlags with MDL_MAPPED_TO_SYSTEM_VA
MmMapLockedPages - Lock the MDL pages in memory

Discussion MDL flags Although CR0 is easier to code, MDL flags can also be set. This may be a more “proper” way to do this. Code is in zip file called basic_mdl_flags

Walkthrough: basic_hook_cr0
Gets the system service descriptor table Finds the offset of the system call in the System Service Descriptor Table (SSDT) Flips the 17th bit in the CR0 register to allow the write Uses Interlocked Exchange for safe swap

Lab Exercise: Process Context
Which process is in context?
Implemented GetProcessNameOffset() and GetProcessName() functions
  Uses PsGetCurrentProcess()
Print the offset for the process name
Use InstDrv to load the driver. Note the process name
Bonus: Use SystemLoadAndCallImage. What is the process name that loaded the driver?

System Service Descriptor Table (SSDT)
Exported by the kernel as KeServiceDescriptorTable
Contains pointers to core services implemented in ntoskrnl.exe
  Services are associated with an ID (index into the System Service Dispatch Table)
Contains a pointer to the System Service Parameter Table (SSPT)
  Specifies the number of parameter bytes for each service

KeServiceDescriptorTable Layout
(diagram) KeServiceDescriptorTable holds:
  Pointer to the System Service Dispatch Table
  ServiceCounterTable
  NumberOfServices
  Pointer to the System Service Parameter Table

Making a System Call
Load eax with the index of the service in the SSDT
Load edx with a pointer to the user mode function parameters
Make the system call
  Win2K: issue INT 2E
  WinXP: issue SYSENTER
System call invokes KiSystemService
  Looks up index in SSDT
  Calls ntoskrnl.exe function
  Returns results back to user mode

Determining SSDT Indexes
System service indexes are OS version specific
Detect OS version and hard code indexes for each version of the OS service you want to hook
Determine dynamically
  Look up exported function in Ntdll.dll
  Functions follow a consistent pattern: mov eax, index
  Extract index from code

NtXX and ZwXX
ntoskrnl.exe:
  ZwXX is a stub that loads the service index and goes through the system service dispatcher (the path reached by INT 2E or SYSENTER)
  NtXX is the actual code for the function
NTDLL.DLL:
  Both ZwXX and NtXX call through INT 2E or SYSENTER (both exports point to the same code)

dumpbin /EXPORTS c:\windows\system32\ntoskrnl.exe
See the Exports
dumpbin /EXPORTS c:\windows\system32\ntoskrnl.exe
00024F78 ZwQueryDirectoryObject
00024F8C ZwQueryEaFile
00024FB4 ZwQueryFullAttributesFile
00024FDC ZwQueryInformationFile
00024FF0 ZwQueryInformationJobObject
ZwQueryInformationProcess
ZwQueryInformationThread
ZwQueryInformationToken
ZwQueryInstallUILanguage
ZwQueryKey
ZwQueryObject
ZwQuerySection
ZwQuerySecurityObject
ZwQuerySymbolicLinkObject
ZwQuerySystemInformation
ZwQueryValueKey
ZwQueryVolumeInformationFile
ZwReadFile
ZwReplaceKey
WARNING: Some system calls are NOT EXPORTED from NTOSKRNL!

Compare NTDLL and NTOSKRNL
You cannot link to the calls that NTDLL exports but NTOSKRNL does not (red on the slide)
NTDLL:
0000B628 ZwConnectPort
0000B638 ZwContinue
B648 ZwCreateDebugObject
B658 ZwCreateDirectoryObject
B668 ZwCreateEvent
B678 ZwCreateEventPair
B688 ZwCreateFile
B698 ZwCreateIoCompletion
B6A8 ZwCreateJobObject
B6B8 ZwCreateJobSet
B6C8 ZwCreateKey
C5A8 ZwCreateKeyedEvent
0000B6D8 ZwCreateMailslotFile
0000B6E8 ZwCreateMutant
NTOSKRNL:
ZwConnectPort
ZwCreateDirectoryObject
ZwCreateEvent
ZwCreateFile
ZwCreateJobObject
ZwCreateKey
ZwCreateSection
ZwCreateSymbolicLinkObject
ZwCreateTimer
ZwDeleteBootEntry
ZwDeleteFile
ZwDeleteKey
ZwDeleteValueKey
ZwDeviceIoControlFile
ZwDisplayString
ZwDuplicateObject
ZwDuplicateToken

The Code in NTOSKRNL
.text:00424604 ZwClose proc near
    mov eax, 19h
    lea edx, [esp+Handle]
    pushf
    push 8
    call sub_4594AD
    retn 4
ZwClose endp ; sp = -8
PAGE:004C375E NtClose proc near
    mov eax, large fs:124h
    movsx eax, byte ptr [eax+140h]
    push 0
    push eax
    push [esp+8+Handle]
    call sub_4C35CC
    retn 4
PAGE:004C3777 NtClose endp

Zoom in the Opcodes
.text:00424604 mov eax, 19h
.text:00424604 B8 19 00 00 00

The Macro
#define SYSTEMSERVICE(_function) KeServiceDescriptorTable.ServiceTableBase[ *(PULONG)((PUCHAR)_function+1)]
.text:00424604 B8 19 00 00 00
_function points at the opcode byte (B8); _function+1 points at the 32-bit service index (19 00 00 00)

Offset into SSDT
KeServiceDescriptorTable.ServiceTableBase[19h] holds 0x004C375E, the address of NtClose:
PAGE:004C375E mov eax, large fs:124h
    movsx eax, byte ptr [eax+140h]
    push 0
    push eax
    push [esp+8+Handle]
    …
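The index extraction that the SYSTEMSERVICE macro performs can be sketched in portable user-mode C. This is only an illustration of the byte layout, not the course's lab code: `extract_service_index` is a hypothetical helper name, and the sketch assumes the stub begins with `mov eax, imm32` (opcode B8) as in the listings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode byte for "mov eax, imm32" on 32-bit x86. */
#define OPCODE_MOV_EAX_IMM32 0xB8u

/*
 * Hypothetical helper: returns the service index encoded in a Zw* stub,
 * or -1 if the bytes do not start with "mov eax, imm32".
 */
long extract_service_index(const unsigned char *stub, size_t len)
{
    uint32_t index;

    assert(stub != NULL);
    if (len < 5 || stub[0] != OPCODE_MOV_EAX_IMM32)
        return -1;

    /* The 32-bit immediate follows the opcode in little-endian order. */
    index = (uint32_t)stub[1]
          | ((uint32_t)stub[2] << 8)
          | ((uint32_t)stub[3] << 16)
          | ((uint32_t)stub[4] << 24);
    return (long)index;
}
```

Fed the five bytes B8 19 00 00 00 from the ZwClose listing, the helper yields 19h. Checking the opcode first matters because a stub that has been patched or laid out differently would otherwise produce a meaningless index.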

How to Query NTDLL From kernel mode, you can query NTDLL to find the call number This is NOT Linking This is performed at RUNTIME

Load NTDLL C:\WINDOWS\SYSTEM32\NTDLL.DLL Your Driver NTDLL.DLL

Parse the PE EAT
Your Driver parses the export address table (EAT) of NTDLL.DLL:
911 38E 0000B628 ZwConnectPort
0000B638 ZwContinue
B648 ZwCreateDebugObject
B658 ZwCreateDirectoryObject
B668 ZwCreateEvent
B678 ZwCreateEventPair
B688 ZwCreateFile
B698 ZwCreateIoCompletion
B6A8 ZwCreateJobObject
B6B8 ZwCreateJobSet
B6C8 ZwCreateKey
C5A8 ZwCreateKeyedEvent
0000B6D8 ZwCreateMailslotFile
0000B6E8 ZwCreateMutant

Find the Function
0000B628 ZwConnectPort
0000B638 ZwContinue
B648 ZwCreateDebugObject
B658 ZwCreateDirectoryObject
B668 ZwCreateEvent
.text:00435604 ZwCreateDebugObject proc near
    mov eax, 27h
    lea edx, [esp+Handle]
    int 2E
    retn 4

Walkthrough: Macro Using the MACRO to find the System Call number

Walkthrough: basic_hook_noexp
Parses PE format of NTDLL.DLL to determine call numbers of system calls not exported by NTOSKRNL.EXE

Hooking the SSDT Interface
Win2K Interrupt hook on INT 2E WinXP SYSENTER routine is contained in a model specific register (MSR) IA32_SYSENTER_EIP Replace value of IA32_SYSENTER_EIP with address of hook function NOTE: calls can be filtered because eax contains the index of the requested service

Hooking a Specific Service
Simply involves replacing ntoskrnl.exe pointer in SSDT with hook function Use InterlockedExchange to prevent race conditions NOTE: hook function should be present in nonpaged memory if the function is ever called at elevated IRQL

Lab Exercise: Exported Call Hook
Add a NEW call hook to basic_hook_cr0 Make sure the call is exported from NTOSKRNL Hint: IDA Pro displays exports Microsoft Depends displays exports dumpbin /EXPORTS ntoskrnl.exe

Lab Exercise: A New Call Hook
Make a new call hook on some function that is not exported from ntoskrnl.exe Dump the arguments that are being passed Print the name of the process making the call Hint: ZwCreateProcess isn’t used on XP it seems, so you might want to try a different one

Hiding a Process Using Hooks
Possible by hooking ZwQuerySystemInformation Returns buffer containing structures _SYSTEM_PROCESSES _SYSTEM_THREADS

_SYSTEM_PROCESSES
struct _SYSTEM_PROCESSES {
    ULONG NextEntryDelta;
    ULONG ThreadCount;
    ULONG Reserved[6];
    LARGE_INTEGER CreateTime;
    LARGE_INTEGER UserTime;
    LARGE_INTEGER KernelTime;
    UNICODE_STRING ProcessName;
    KPRIORITY BasePriority;
    ULONG ProcessId;
    ULONG InheritedFromProcessId;
    ULONG HandleCount;
    ULONG Reserved2[2];
    VM_COUNTERS VmCounters;
    IO_COUNTERS IoCounters; // Windows 2000 only
    struct _SYSTEM_THREADS Threads[1];
};

_SYSTEM_THREADS
struct _SYSTEM_THREADS {
    LARGE_INTEGER KernelTime;
    LARGE_INTEGER UserTime;
    LARGE_INTEGER CreateTime;
    ULONG WaitTime;
    PVOID StartAddress;
    CLIENT_ID ClientId;
    KPRIORITY Priority;
    KPRIORITY BasePriority;
    ULONG ContextSwitchCount;
    ULONG ThreadState;
    KWAIT_REASON WaitReason;
};

Walkthrough: basic_hook_hide_proc
Hides a process using a prefix match Exempts a process using a prefix match

Lab Exercise: Hide Process
Modify basic_hook_hide_proc to include proper memory unprotects Modify basic_hook_hide_proc to use a different match prefix Verify operation

Lab Exercise: Help Hint: Use basic_hook_cr0
BONUS: Use basic_mdl_flags Alter the name of the process that will be hidden memcmp( process_name.Buffer, "_root_", 6) Test the driver

Hiding a File Using Hooks
Possible by hooking ZwQueryDirectoryFile Returns buffer containing structures FileDirectoryInformation FileFullDirectoryInformation FileBothDirectoryInformation FileNamesInformation

A Closer Look at NtQueryDirectoryFile
File hiding is performed by modifying NextEntryOffset fields
typedef struct _FILE_DIRECTORY_INFORMATION {
    ULONG NextEntryOffset;
    ULONG Unknown;
    LARGE_INTEGER CreationTime;
    LARGE_INTEGER LastAccessTime;
    LARGE_INTEGER LastWriteTime;
    LARGE_INTEGER ChangeTime;
    LARGE_INTEGER EndOfFile;
    LARGE_INTEGER AllocationSize;
    ULONG FileAttributes;
    ULONG FileNameLength;
    WCHAR FileName[1];
} FILE_DIRECTORY_INFORMATION, *PFILE_DIRECTORY_INFORMATION;
for FileFullDirectoryInformation:
typedef struct _FILE_FULL_DIRECTORY_INFORMATION {
    /* ... */
    ULONG EaInformationLength;
    /* ... */
} FILE_FULL_DIRECTORY_INFORMATION, *PFILE_FULL_DIRECTORY_INFORMATION;
for FileBothDirectoryInformation:
typedef struct _FILE_BOTH_DIRECTORY_INFORMATION {
    ULONG NextEntryOffset;
    ULONG Unknown;
    LARGE_INTEGER CreationTime;
    LARGE_INTEGER LastAccessTime;
    LARGE_INTEGER LastWriteTime;
    LARGE_INTEGER ChangeTime;
    LARGE_INTEGER EndOfFile;
    LARGE_INTEGER AllocationSize;
    ULONG FileAttributes;
    ULONG FileNameLength;
    ULONG EaInformationLength;
    UCHAR AlternateNameLength;
    WCHAR AlternateName[12];
    WCHAR FileName[1];
} FILE_BOTH_DIRECTORY_INFORMATION, *PFILE_BOTH_DIRECTORY_INFORMATION;
and for FileNamesInformation:
typedef struct _FILE_NAMES_INFORMATION {
    /* ... */
} FILE_NAMES_INFORMATION, *PFILE_NAMES_INFORMATION;
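All of these structures share the same chaining convention: each record starts with a 32-bit NextEntryOffset, and a value of zero marks the last record. A minimal, read-only user-mode sketch of walking such a chain is shown here to make the layout concrete. `count_chained_entries` is a hypothetical name, and the buffer is treated as raw bytes rather than as any of the real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Counts the records in a buffer whose records are chained by a leading
 * 32-bit NextEntryOffset (0 = last record). Returns -1 if an offset
 * would step outside the buffer or cannot hold another header.
 */
int count_chained_entries(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    int count = 0;

    assert(buf != NULL);
    if (len < sizeof(uint32_t))
        return -1;

    for (;;) {
        uint32_t next;

        /* Read NextEntryOffset of the record at the current position. */
        memcpy(&next, buf + pos, sizeof next);
        count++;

        if (next == 0)                      /* last record in the chain */
            return count;
        if (next < sizeof(uint32_t) ||
            next > len - pos - sizeof(uint32_t))
            return -1;                      /* malformed chain */
        pos += next;
    }
}
```

The bounds check is the important design point: any code that reads or rewrites these offsets has to keep every hop inside the caller's buffer, otherwise a single bad offset turns into an out-of-bounds access.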

Walkthrough: basic_hook_hide_file
Hooks ZwQueryDirectoryFile

Lab Exercise: File/Directory Hiding
Alter the prefix used on file/directory hiding Bonus: can anyone find a way to get around the file hiding? Bonus: what happens if you view the files using an SMB share?

Module 10 IRP hooking

IRP Hooking Involves replacing the pointers in the IRP Major Function table of a driver Provides an alternative to writing a filter driver Applications: Intercepting file system writes to hide files Intercepting TCP queries to hide ports

IRP Review Recall: Supported IRP Major function codes…
IRP_MJ_CREATE IRP_MJ_CLEANUP IRP_MJ_CLOSE IRP_MJ_READ IRP_MJ_WRITE IRP_MJ_DEVICE_CONTROL IRP_MJ_INTERNAL_DEVICE_CONTROL IRP_MJ_SET_INFORMATION IRP_MJ_FLUSH_BUFFERS IRP_MJ_SHUTDOWN The IRP function code that you choose to hook depends on the specific goal you are trying to accomplish

Finding the Driver’s IRP Function Table
The IRP function table is located in the Driver Object Recall: the Device Object contains a back pointer to the Driver Object
[Figure: DRIVER_OBJECT with DriverStartIo (Start I/O Routine), DriverUnload (Unload Routine), and MajorFunction[] (Dispatch Routines), linked to its Device Objects]

IoGetDeviceObjectPointer()
We can locate a driver’s Device Object using IoGetDeviceObjectPointer
NTSTATUS IoGetDeviceObjectPointer(
    IN PUNICODE_STRING ObjectName,
    IN ACCESS_MASK DesiredAccess,
    OUT PFILE_OBJECT *FileObject,
    OUT PDEVICE_OBJECT *DeviceObject
);
The IoGetDeviceObjectPointer routine returns a pointer to a named device object and corresponding file object, if the requested access to the objects can be granted.

Hooking an IRP 2 Steps Hooking the Dispatch Routine
Simply overwrite the pointer to the Dispatch Routine in the driver’s IRP Major Function table Use InterlockedExchange to avoid race conditions Need to save the old pointer so you can restore it if you need to be able to unload your rootkit Hooking the Completion Routine Recall: When we intercept the IRP on its way down the driver stack in our hooked dispatch routine, it’s still empty! We need to install a completion routine in order to “sniff” / filter the actual data once it is filled into the IRP

Hooking the Completion Routine
2 Cases No completion routine exists for hooked IRP Set one the normal way using IoSetCompletionRoutine A completion routine already exists for the IRP we hooked Save the old completion routine Located at pIrpStack->CompletionRoutine Insert our new completion routine using InterlockedExchange Note 1: the new completion routine should call the old completion routine after it finishes with the IRP Note 2: Completion routines can be called at IRQL = DISPATCH_LEVEL

Example: Using IRP Hooking to Hide TCP Queries
We can hide network ports from applications like netstat by hooking TCPIP.SYS We are interested specifically in IRPs with the user defined control code IOCTL_TCP_QUERY_INFORMATION_EX This request returns the list of ports to programs like netstat Since it is a user defined IOCTL, we need to place our initial hook on IRP_MJ_DEVICE_CONTROL and then filter the IRPs for those of type IOCTL_TCP_QUERY_INFORMATION_EX The input buffer should be cast to type TDIObjectID It uses the transfer method METHOD_NEITHER, which means it is located at pIrpStack->Parameters.DeviceIoControl.Type3InputBuffer

TDIObjectID The TDIObjectID structure is part of the TCP_REQUEST_QUERY_INFORMATION_EX structure that is used with IOCTL_TCP_QUERY_INFORMATION_EX to specify the kind of information being requested from the TCP driver. typedef struct { TDIEntityID toi_entity; unsigned long toi_class; unsigned long toi_type; unsigned long toi_id; } TDIObjectID;

TDIEntityID (1) The TDIEntityID structure is used as a part of the TDIObjectID structure to represent information about TDI drivers retrieved using IOCTL_TCP_QUERY_INFORMATION_EX. typedef struct { unsigned long tei_entity; unsigned long tei_instance; } TDIEntityID;

TDIEntityID (2)
tei_entity - identifies the type of entity being addressed. The value can be one of the following. We are only interested in CL_TL_ENTITY (UDP) & CO_TL_ENTITY (TCP)
GENERIC_ENTITY Used when requesting a list of all entities.
AT_ENTITY Identifies an address-translation (AT) entity.
CL_NL_ENTITY Identifies a connectionless (CL) network-layer (NL) entity.
CO_NL_ENTITY Identifies a connected, directed-packet (CO) network-layer (NL) entity.
CL_TL_ENTITY Identifies a connectionless (CL) transport-layer (TL) entity.
CO_TL_ENTITY Identifies a connected, directed-packet (CO) transport-layer (TL) entity.
ER_ENTITY Identifies an Echo-Request/Echo-Reply (ER) entity.
IF_ENTITY Identifies an interface entity.

toi_id toi_id: An ID used to determine the format of the output buffer
We are interested in values 0x101, 0x102, and 0x110
0x101 = CONNINFO101
0x102 = CONNINFO102 //contains pid that owns port
0x110 = CONNINFO110
EX: CONNINFO102
typedef struct _CONNINFO102 {
    unsigned long status;
    unsigned long src_addr;
    unsigned short src_port;
    unsigned short unk1;
    unsigned long dst_addr;
    unsigned short dst_port;
    unsigned short unk2;
    unsigned long pid;
} CONNINFO102, *PCONNINFO102;

Filter DEVICE_CONTROL
IF we receive an IRP with a control code of IOCTL_TCP_QUERY_INFORMATION_EX AND The entity is of type CL_TL_ENTITY or CO_TL_ENTITY AND The toi_id is 0x101 or 0x102 or 0x110 THEN Set a completion routine on the IRP so we can filter the results after the data is filled in

The CompletionRoutine
Is called after TCPIP.SYS has filled in the output buffer in the IRP with a structure for each existing TCP port on the host: These will be of type CONNINFO101, CONNINFO102, or CONNINFO110 You can hide ports from netstat by parsing the CONNINFO* structures and changing the status value to 0 Common status values 0 = Invisible 1 = Closed 2 = Listening 3 = SYN_SENT 4 = SYN_RECEIVED 5 = ESTABLISHED ... 9 = CLOSING

Walkthrough: TCPIRPHook
Under TCPIRPHook directory InstallTCPDriverHook: Places the hook for IRP_MJ_DEVICE_CONTROL in the Major function table contained in the DriverObject HookedDeviceControl: Filters the IRPs to determine which ones to set a completion routine on IoCompletionRoutine: Parses the output buffer after it has been filled in with connection data and hides ports by modifying the status values in the CONNINFO structures stored in the buffer

Lab Exercise: Hide Ports
Under TCPIRPHook directory Modify the Completion Routine in Rootkit.c to hide a connection by port or destination IP Test your modifications Connect to another machine Use netstat –a to display connections

Module 11 Build custom rootkit

Lab Exercise Use the driver skeleton and user application skeleton provided in the lab folder Implement code in the driver and user application using hooking techniques discussed so far User Application should drop the driver and load it on issue of first command User Application will remove registry entries and driver on removal

Commands to implement
Command            Description
[-proc] PID        Hides the process
[-fs] FILE_NAME    Hides files and directories that match FILE_NAME
[-net] #PORT       Hides TCP network connections that match PORT
[-remove]          Unloads the driver and removes registry settings

Operational Options Can the driver survive reboot and still hide files/folders and network ports? (Persistence) What about children of a hidden process? Can these be hidden? There’s a lot of time to do this lab but don’t worry if you can’t get to this. We will still review the code for these options.

Module 12 Direct Kernel Object Manipulation (DKOM)

Direct Kernel Object Manipulation (DKOM)
Direct Kernel Object Manipulation (DKOM) in memory A device driver or loadable kernel module has access to kernel memory A sophisticated rootkit can modify the objects directly in memory in a relatively reliable fashion to hide. Recall the goal of rootkits is to hide things: processes, files, and network connections.

Operating System Design
Intel has four privilege levels or rings Microsoft and many other OS vendors use only two rings

Operating System Design
By only using two privilege levels, there is no separation between the kernel itself and third party drivers or loadable kernel modules (LKM’s) Drivers can modify the memory associated with kernel objects such as those that represent a process’s token

DKOM Uses
Hide Processes Add Privileges to Tokens Add Groups to Tokens Manipulate the Token to Fool the Windows Event Viewer Hide Ports

Discussion: Implication of Hidden Processes
The intruder has full control of the system. Defeats a Host Based IDS/IPS that depends upon the underlying operating system. Could skew the results of forensic examinations.

Hiding Processes - Windows
[Figure: KPRCB (*CurrentThread, *NextThread, *IdleThread) points to ETHREAD / KTHREAD (ApcState), which points to an EPROCESS / KPROCESS block; the EPROCESS blocks are chained together through LIST_ENTRY { FLINK, BLINK }]

Locate the Processor Control Block (KPRCB) Located at 0xffdff120 fs register in kernel mode points to 0xffdff000 Within the KPRCB is a pointer to the Current Thread block (ETHREAD) Located at fs:[124] or 0xffdff124 An ETHREAD contains a KTHREAD structure

The KTHREAD structure contains a pointer to the EPROCESS block of the current process The EPROCESS block contains a LIST structure, which has a forward and backward pointer to active processes This creates the doubly linked list of active processes in Windows

To hide a process Locate the EPROCESS block of the process to hide Change the process behind it to point to the process after the process you are hiding Change the process after it to point to the process before the one you are trying to hide Essentially, the list of active processes now points “around” the hidden process
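The pointer arithmetic in these steps is the ordinary unlink operation on a circular doubly linked list. The user-mode toy below illustrates only that data-structure behaviour: `toy_link` and `toy_proc` are hypothetical stand-ins, not the kernel's LIST_ENTRY or EPROCESS layouts, and nothing here touches real process state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a forward/backward link pair. */
typedef struct toy_link {
    struct toy_link *flink;   /* next element */
    struct toy_link *blink;   /* previous element */
} toy_link;

/* Toy record; the links must be the first member for the cast below. */
typedef struct toy_proc {
    toy_link links;
    int pid;
} toy_proc;

void toy_list_init(toy_link *head)
{
    assert(head != NULL);
    head->flink = head->blink = head;
}

void toy_list_append(toy_link *head, toy_link *node)
{
    node->blink = head->blink;
    node->flink = head;
    head->blink->flink = node;
    head->blink = node;
}

/* Neighbours are joined to each other; the node itself stays intact. */
void toy_list_unlink(toy_link *node)
{
    node->blink->flink = node->flink;
    node->flink->blink = node->blink;
    node->flink = node->blink = node;   /* avoid stale neighbour pointers */
}

/* Returns 1 if a walk from the list head reaches a record with this pid. */
int toy_list_contains(const toy_link *head, int pid)
{
    const toy_link *p;

    for (p = head->flink; p != head; p = p->flink) {
        if (((const toy_proc *)p)->pid == pid)
            return 1;
    }
    return 0;
}
```

After the unlink, a walk from the head no longer reaches the record, yet the record's own fields are unchanged. That gap between "reachable from the list" and "still exists" is exactly what cross-view detectors look for.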

Why does the process continue to run? Scheduling in the Windows kernel is thread based and not process based. Although scheduling code to run is based upon threads, when the kernel reports what is running on the system, it reports based upon EPROCESS blocks, which can be modified with no adverse effect. This is what current tools (IDS/IPS’s) rely upon to discover what is running on the system.

Synchronization Issues
Modifying shared objects such as the active process list is not completely safe. Rootkit could be swapped out Multiprocessor issues In Windows, the list of active processes is protected by PspActiveProcessMutex. PsLoadedModuleResource guards the list of device drivers.

Synchronization Issues
Problem: These symbols are not exported by the operating system Need a way to find these and other symbols Hardcoding addresses – very unreliable Search for patterns in memory Functions within the kernel use PspActiveProcessMutex Find the mutex’s use within functions with a relatively consistent pattern. Alternately: Raise all CPUs to DISPATCH_LEVEL

Walkthrough: FU_Rootkit
FU process hiding

Hiding Modules - Windows
There is a doubly linked module list similar to the process list
//Undocumented module entry structure
typedef struct _MODULE_ENTRY {
    LIST_ENTRY module_list_entry;
    DWORD unknown1[4];
    DWORD base;
    DWORD unknown2;
    UNICODE_STRING driver_Path;
    UNICODE_STRING driver_Name;
    //…
} MODULE_ENTRY, *PMODULE_ENTRY;

Hiding Modules - Windows
Unlike the process list, there is no exported symbol or function for obtaining the linked list of modules Can locate module list indirectly via Driver Object structure
[Figure: DRIVER_OBJECT points (p_modEntry) to a MODULE_ENTRY; the MODULE_ENTRY structures are chained together through LIST_ENTRY { FLINK, BLINK }]

Lab Exercise: Module Hiding
Using the hide driver project – Find the Driver section address with Windbg Code address in source and hide drivers. Use Drivers program to hide drivers of choice – Remember context sensitive Why does QueryDriver still show driver?

Token Manipulation Add Privileges to Token Add Groups to Token
Make the Owner of the Token Any User Make Any Actions Taken by the Process Appear to be Someone else such as System Makes forensics difficult Totally fakes out the Windows Event Viewer

Tokens Static Part TOKEN SOURCE TokenId AuthenticationId ParentTokenId
ExpirationTime TokenLock ModifiedId SessionId UserAndGroupCount RestrictedSidCount PrivilegeCount VariableLength Etc…

Tokens Variable Part Privileges User and Groups Restricted SID’s LUID
Attribute User and Groups Pointer to SID Restricted SID’s

Manipulating Tokens Difficult to just grow the token because you are not sure what is after the variable part in memory Although static portion has pointers to the privileges and groups, just changing these to point to newly allocated memory does not work due to crazy math in a SepDuplicateToken() function

Manipulating Tokens There are a lot of Privileges in a token that are disabled We can discard these since they are disabled anyway and free up space for new privileges and groups The “in-line” method

Adding Privileges to Tokens with DKOM
typedef struct _LUID_AND_ATTRIBUTES { DWORD Luid; DWORD Attributes; }

Adding Privileges to Tokens with DKOM
[Figure: token memory layout showing the Static Portion followed by LUID entries, with regions labelled Disabled Priv’s, Enabled Priv’s, Added Priv’s, SID’s, and Restricted SID’s]

Walkthrough: Privilege Altering
FU privilege altering

Adding Groups to Tokens with DKOM
typedef struct _SID_AND_ATTRIBUTES { DWORD pSID; DWORD Attributes; }

Adding Groups to Tokens with DKOM
[Figure: token memory layout showing the Static Portion followed by LUID and pSID entries, with regions labelled Disabled Priv’s, Enabled Priv’s, SID’s, Added SID’s, and Restricted SID’s]

Walkthrough: Adding Groups
FU group additions

Faking out Windows Event Viewer with DKOM
Change one DWORD in Static Portion of Token SYSTEM_LUID = 0x000003E7 Make FIRST SID in Token the System SID All logging of the Process now appears as System Useful if Detailed Process Tracking is Enabled

Walkthrough: Altering Logging
FU AUTH_ID

Module 13 Detour patching

Inline Hooking
Detour patching – also called inline function patching.
Involves overwriting the first few bytes of the target function with an unconditional, direct jump to a detour (hook) function.

What do we do with the Overwritten Bytes?
Dynamically hook / unhook? Replace bytes when hook is called and restore hook when function returns. Race conditions: can lose the hook! Save bytes into a trampoline function? Bytes are appended with an unconditional jump pointing to the remainder of the target function and are executed after the hook function has executed MS Detours library takes this approach

Inline Hooked Execution Path (using detour / trampoline functions)

Inline Hooking Pros and Cons
Pros: It allows for generic, non API function hooking (i.e. non exported functions). The function will be hooked regardless of how the function address is resolved (i.e. load time vs. runtime) Can be more difficult to detect than table based hooking (IAT / EAT) (especially if hook is polymorphic or deeply embedded) More on this shortly
Cons: Complexity – Underlying code structure affects hook stability! May require disassembly engine!

Function Prologs / Epilogs
Function Prolog Compiler generated code at the beginning of a function Sets up stack frame and allocates local variables
Function Epilog Compiler generated code at the end of a function Undoes the prolog (i.e. dismantles stack frame and deallocates local variables)

Function Prologs / Epilogs How does this relate to Hooking?
Typical assembly generated for function prologs
Pre-XP SP2:
55 push ebp
8BEC mov ebp, esp
…
Post-XP SP2:
8BFF mov edi, edi
An inline hook must overwrite a minimum of 5 bytes to insert the jump. Pre-XP SP2, we will have to overwrite some additional arbitrary instructions beyond the prolog! As of XP SP2, inline hooking is easier. MS introduced mov edi, edi to pad up function prologs to exactly 5 bytes Used for “hot patching”.

Instruction Boundary Alignment
Original bytes: 55 8B EC 53 33 DB 38 5D 24 (PUSH, MOV, PUSH, XOR, CMP)
After the jump is written: EA AA AA AA AA 08 00 (FAR JMP), which ends inside the CMP
After padding: EA AA AA AA AA 08 00 90 90 (FAR JMP, NOP, NOP)
We pad up to the next instruction boundary w/ NOPs!
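The padding rule is plain arithmetic once the instruction lengths are known. The sketch below assumes those lengths are supplied by a disassembler (not shown); `patch_span` and `nop_padding` are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the lengths of consecutive instructions at the start of a function,
 * returns how many bytes must be replaced so that a jump of jump_len bytes
 * ends on an instruction boundary. Returns -1 if the listed instructions
 * do not cover jump_len bytes.
 */
int patch_span(const unsigned char *insn_len, size_t count, unsigned jump_len)
{
    unsigned total = 0;
    size_t i;

    assert(insn_len != NULL);
    for (i = 0; i < count; i++) {
        total += insn_len[i];
        if (total >= jump_len)
            return (int)total;      /* first boundary at or past the jump */
    }
    return -1;
}

/* Number of NOP bytes needed after the jump, or -1 on failure. */
int nop_padding(const unsigned char *insn_len, size_t count, unsigned jump_len)
{
    int span = patch_span(insn_len, count, jump_len);

    return span < 0 ? -1 : span - (int)jump_len;
}
```

For the bytes on this slide the instruction lengths are 1, 2, 1, 2, 3 and the far jump is 7 bytes, so the span is 9 bytes and two NOPs are needed, matching the 90 90 shown above.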

XREFS From Overwritten Bytes
What’s wrong here? When we relocate these 3 instructions to our trampoline, we will relocate a relative jump, making the jump offset invalid! Rare problem, but it can happen. For maximum reliability need disassembler to determine jumps and fix up relative offsets.
SomeFunc:
.text:01001D60 3B FE cmp edi, esi     <- we need to overwrite these 3 instructions
    jz short loc_1001D68
    mov cx, [esi]
    inc edi
    inc edi

XREFS To Overwritten Bytes
What’s wrong here? The cross reference back into the 3 relocated instructions no longer falls on the original instruction boundary and will cause a crash! Difficult to handle this situation because you need full disassembly w/ data flow analysis!
SomeFunc: (Before Hook)
.text:01001D60 3B FE cmp edi, esi     <- we need to overwrite these 3 instructions
    mov cx, [esi]
    inc edi
    inc edi
    jz short loc_1001D64
SomeFunc: (After Hook)
.text:01001D60 jmp 0x0060F000
    inc edi
    inc edi
    jz short loc_1001D64

Inline Hook Algorithm
Enumerate processes for target process handle
Inject inline hooking routine and new API hook function (or .dll which includes both) into the target process’s address space Can use any of the aforementioned code injection techniques
Inline hooking routine tasks: Locate the address of the function being hooked Disassemble forward, checking instruction length until you have 5 bytes worth of instructions Save old instructions into buffer Patch in jmp 0xXXXXXXXX where XXXXXXXX is the address of your hook function Pad up space between jmp and next instruction w/ NOPs
Hook routine tasks: Hook code Execute saved instructions Jmp back to instruction following the hook

Walkthrough: Migbot Uses an inline function hook on the original function Note: You may need to change the memory protections to write to the code page Requires you to know what the original function looked like Could have written a run-time disassembly engine Could have embedded the inline hook deep in the function

Lab Exercise: Migbot Alter MIGBOT to detour another function
Hint: Use Windbg to get the original bytes

Module 14 Detection Methods

Detecting Rootkits That Hook
Behavioral Detection Detecting the hook / hooking behavior Kaspersky and some other Personal Security Products Cross View Detectors Comparing high and low level queries Rootkit Revealer Blacklight Integrity Checkers Tripwire Not talking about prevention – i.e. PatchGuard

Detecting Hooks Detect diverted execution paths
Detect alterations in the number, order, and frequency of system calls. May suffer from false positives. “Normal” software may do a rootkit-like technique Most end users don’t have skill to screen these out

Positive Hits Hooks are an obvious deviation from normal in most cases
Addresses in the Import Address Table (IAT) or the function itself should be within the corresponding DLL or module System Calls should be within the kernel (NTOSKRNL.EXE)
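The range test described here can be sketched as a scan over a table of function pointers. This is a user-mode illustration with made-up addresses: `count_out_of_range` is a hypothetical name, and a real checker would first have to obtain the table and the owning module's base and size, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Counts table entries that fall outside [base, base + size).
 * If first_bad is not NULL and at least one entry is out of range,
 * the index of the first offending entry is stored there.
 */
size_t count_out_of_range(const uint32_t *table, size_t n,
                          uint32_t base, uint32_t size,
                          size_t *first_bad)
{
    size_t i, bad = 0;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        /* Unsigned subtraction wraps for entries below base, so a single
           comparison rejects addresses on either side of the module. */
        if ((uint32_t)(table[i] - base) >= size) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = i;
            bad++;
        }
    }
    return bad;
}
```

A non-zero result is a lead rather than proof, since legitimate software such as AV products may also redirect entries; the index returned in `first_bad` tells the analyst which service to inspect first.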

Where to look System Call Table Interrupt Descriptor Table
Import Address Table Inline function hooks

Hook Detection Pros and Cons
Useful for identifying execution path hooking rootkits (which includes most rootkits)! Can identify compromises by new, previously unknown rootkits! Cons Not very useful for identifying non execution path hooking rootkits (i.e. rootkits using DKOM) Sometimes difficult to define “normal” Problems with false positives (i.e. some legit applications like firewalls and AV install hooks) Can identify general rootkit compromises but not specific rootkits

Cross View Detections Cross View Detections can also be used to detect rootkits that modify the execution path by hooking…

Cross View Based Detection
Uses two views of same information Example: Walk the list of EPROCESS structures in memory Call ZwQuerySystemInformation Compares results Any differences are reported Often uses undocumented structures Occasionally uses “clean” and “dirty” boots for comparison
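The comparison step can be sketched as a set difference over two sorted PID lists. The function name `find_hidden` and the sample PIDs are hypothetical; the sketch assumes both views have already been collected and sorted in ascending order, which is the hard part in practice and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reports PIDs present in the low-level view but missing from the
 * high-level (API) view. Both inputs must be sorted ascending.
 * Up to out_cap results are written to out; the return value is the
 * total number of discrepancies found.
 */
size_t find_hidden(const uint32_t *low, size_t n_low,
                   const uint32_t *high, size_t n_high,
                   uint32_t *out, size_t out_cap)
{
    size_t i, j = 0, found = 0;

    assert(low != NULL && high != NULL && out != NULL);
    for (i = 0; i < n_low; i++) {
        /* Advance the high-level cursor to the first PID >= low[i]. */
        while (j < n_high && high[j] < low[i])
            j++;
        if (j >= n_high || high[j] != low[i]) {
            if (found < out_cap)
                out[found] = low[i];
            found++;
        }
    }
    return found;
}
```

Processes that start or exit between the two snapshots will also show up as differences, which is one source of the false positives mentioned for cross-view tools.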

Cross View Detections Leverages that there is usually “more than one way to ask the same question” Compares the results of a high level (API) and low level queries Discrepancy indicates the presence of subversive code Cross View Detection Tools Rootkit Revealer Strider Ghostbuster (Old Research) Blacklight

Cross View Detections Pros and Cons
Effective against both old and new rootkits! Fairly difficult to subvert the low level query using common & simple rootkit tricks Cons Difficult implementation of low level query Can’t rely upon any OS support Should communicate directly w/ hardware to minimize possibility of subversion May suffer from false positives

Rootkit Revealer Freeware tool by Sysinternals
Detects “persistent” rootkits that hook (meaning they persist between reboots) i.e. exist on the file system and in the Windows registry Rootkit Revealer compares the results of an enumeration of files and registry keys using high level APIs with the results obtained by parsing the raw NTFS file system and registry hives Suffers from some false positives

Strider Ghostbuster MS research tool (kind of old but worth talking about) Basically an extension of Rootkit Revealer techniques Added detection for hidden processes Support for both “Inside the Box” (online) and “Outside the Box” (offline) scans

Integrity Checkers Compares a current snapshot of memory or the file system with a known, trusted baseline Differences indicate rootkit activity Not usually capable of pinpointing the origin of the activity that caused the changes Examples Tripwire

Tripwire Disk based integrity checker
Creates a trusted database of unique CRC hash values for system files During a system scan, it recalculates CRCs and compares them to the originals Premise is that system files should not change, thus mismatch indicates compromise Not effective against modern rootkits that modify system files in memory rather than on disk
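The recalculate-and-compare idea can be sketched with a small CRC-32 routine. This is an illustration of the principle only, not Tripwire's implementation: `crc32_bytes` and `matches_baseline` are hypothetical names, the CRC variant used is the common reflected one with polynomial 0xEDB88320, and data is checked in memory rather than read from disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t crc32_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    assert(data != NULL);
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if the data still matches the trusted baseline CRC, else 0. */
int matches_baseline(const unsigned char *data, size_t len, uint32_t baseline)
{
    return crc32_bytes(data, len) == baseline;
}
```

A CRC catches accidental or naive changes but is not collision resistant, so a checker that has to withstand a deliberate attacker would normally store a cryptographic hash instead.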

Lab Exercise: Hook Detector
Use code pieces in lab folder Detect running instance of your rootkit

Module 15 networking

NDIS Network Driver Interface Specification
NDIS Miniport Drivers: Layer 2 (Data-Link) of the OSI reference model NDIS Protocol Drivers: Layer 3 (Network) and/or Layer 4 (Transport) of the OSI reference model Encapsulates adapter drivers The NDIS library (ndis.sys): Exports functions for use by higher layer drivers such as TDI transports or NDIS protocol drivers Wraps HAL functions for use by NDIS miniport drivers

TDI transport or NDIS Protocol Driver
Simple View TDI Client NDIS protocol drivers are kernel-mode protocol drivers. They process IRPs from TDI Clients and callbacks from the NDIS Library, performing protocol specific tasks. TCP, UDP, IP, NetBEUI, IPX/SPX TDI transport or NDIS Protocol Driver NDIS NDIS miniport drivers are kernel-mode drivers that interface NDIS protocol drivers to specific network adapters. They do not process regular IRPs, and are completely encapsulated by the NDIS Library NDIS Library NDIS miniport NDIS Library HAL Hardware

NDIS Intermediate Drivers
An intermediate protocol driver interfaces between protocol drivers and a miniport driver using the NDIS Library. Not covered in this class. Microsoft provides a sample driver

Walkthrough: basic_ndis_1
Goals Get an indication of a received packet (but not the entire packet yet).

Writing an NDIS Protocol Driver
STEP 1: Register callback handlers with the NDIS Library (known as a protocol driver’s lower edge).
aProtocolChar.MajorNdisVersion = 4;
aProtocolChar.MinorNdisVersion = 0;
aProtocolChar.OpenAdapterCompleteHandler = OnOpenAdapterDone;
aProtocolChar.CloseAdapterCompleteHandler = OnCloseAdapterDone;
//…
NdisRegisterProtocol( &aStatus,
    &gNdisProtocolHandle,
    &aProtocolChar,
    sizeof(NDIS_PROTOCOL_CHARACTERISTICS));

STEP 2: Identify the adapter your NDIS protocol driver will bind to. Check the registry HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkCards or HKLM\SYSTEM\CurrentControlSet\Services\TcpIp\Linkage An example adapter name: \Device\{357AC276-D8E7-47BF-954D-F3123D3319BD} This is the unique device name registered by your adapter’s miniport driver.

STEP 3: Bind to an adapter.
VOID NdisOpenAdapter(
    OUT PNDIS_STATUS Status,
    OUT PNDIS_STATUS OpenErrorStatus,
    OUT PNDIS_HANDLE NdisBindingHandle,
    OUT PUINT SelectedMediumIndex,
    IN PNDIS_MEDIUM MediumArray,
    IN UINT MediumArraySize,
    IN NDIS_HANDLE NdisProtocolHandle,
    IN NDIS_HANDLE ProtocolBindingContext,
    IN PNDIS_STRING AdapterName,
    IN UINT OpenOptions,
    IN PSTRING AddressingInformation OPTIONAL
);

STEP 4: Handle callbacks. BindAdapterHandler: performs dynamic binding to an adapter. Call NdisOpenAdapter from here. UnbindAdapterHandler: performs dynamic unbinding from an adapter. Call NdisCloseAdapter from here. OpenAdapterCompleteHandler: called when NDIS completes any asynchronous binding operations started by the call to NdisOpenAdapter. ReceiveHandler: called by the miniport driver when a network packet, or some portion of one, is available for inspection by bound protocol drivers. ReceivePacketHandler: called by miniport drivers that support multi-packet receive indications or for packets with associated out-of-band information. Others…

Lab Exercise: NDIS Basic
Build Test Bonus: Bind to all adapters automatically.

Walkthrough: Basic_ndis_2
Goals Receive entire packets. Send packets.

Receiving the Entire Packet
In ReceiveHandler Allocate nonpaged pool memory Call NdisAllocatePacket Call NdisAllocateBuffer Attach buffer to packet with NdisChainBufferAtFront Call NdisTransferData

Sending Data Acquire a spinlock Call NdisAllocatePacket
Call NdisAllocateMemoryWithTag Call NdisAllocateBuffer Attach buffer to packet with NdisChainBufferAtBack Call NdisSend

Lab Exercise: Basic_ndis_2
Build Test

Lab Exercise: Sniffer Using rk_044 source code – Add code to build a sniffer and write out packets. Bonus: Write to a log.

TDI Transport Driver Interface A TDI Client is a kernel-mode driver that sends TDI formatted IRPs to a TDI Protocol driver. A TDI Protocol driver is a kernel-mode driver that receives TDI formatted IRPs, performs transport specific tasks, and interfaces with an NDIS miniport driver using the NDIS Library.

Walkthrough: basic_tdi1
Goals Connect to a remote server

Writing a TDI Client Obtain an address object from the TDI Transport.
Obtain a connection object from the TDI Transport. Allocate and send an associate address TDI IRP to associate the address object to the connection object. Allocate and send a connect TDI IRP to establish a remote connection or a listen TDI IRP to accept remote connections.

Address and Connection Objects
TDI Transports generally create device objects that TDI Clients can use to obtain information and objects. TCPIP.SYS creates: \Device\Tcp \Device\Udp \Device\Ip \Device\Rawip (XP/2003 only) \Device\Ipmulticast (XP/2003 only)

Address Object Communication with the TDI Transport device object is done using the Extended Attributes structure passed with ZwCreateFile. An address object is obtained by sending TdiTransportAddress (defined in TDI.h).

Connection Object A connection object is obtained by sending TdiConnectionContext (defined in TDI.h).

Associate TDI IRP Call TdiBuildInternalDeviceControlIrp ()
Call TdiBuildAssociateAddress () Call IoSetCompletionRoutine () Call IoCallDriver ()

Connect TDI IRP Call TdiBuildInternalDeviceControlIrp ()
Call TdiBuildConnect () Call IoSetCompletionRoutine () Call IoCallDriver ()

Listen TDI IRP Call TdiBuildInternalDeviceControlIrp ()
Call TdiBuildListen () Call IoSetCompletionRoutine () Call IoCallDriver ()

Lab Exercise: TDI Connect
Build basic_tdi1 Alter basic_tdi1 to connect to a machine on the lab network.

Walkthrough: Basic_tdi2
Goals Send data using TDI

Send TDI IRP Call TdiBuildInternalDeviceControlIrp ()
Allocate memory from non paged pool Allocate an MDL with IoAllocateMdl () Lock the memory using MmProbeAndLockPages () Call TdiBuildSend () Call IoCallDriver ()

Lab Exercise: Send Data
Build basic_tdi2 Alter basic_tdi2 to send data to a machine on the lab network. Try a variety of data types (HTTP?). BONUS: Build receive functionality into basic_tdi2 and make it respond to simple commands!

Module 16 Filter Drivers

I/O Manager Drivers are layered
IRPs have stack locations for each layered driver IRPs are delivered in LIFO order Last driver to attach receives the IRP first Lowest layer driver passes IRP back up the stack Completion routines are called for each driver in the stack that registered a callback

I/O Manager System Calls
NtDeviceIoControlFile NtWriteFile Etc. Requests are converted to I/O Request Packets (IRPs) IRPs are delivered to lower level drivers

I/O Manager IRPs must be Created Completed Freed
Fast I/O Does not use IRPs Can only be used for synchronous requests Bypasses FSD dispatch routines

Filter Drivers Device Stack
Special kind of driver that can be transparently inserted between existing drivers in the device stack. Filters mirror / mimic the lower driver. The surrounding drivers are unaware of the filter driver’s existence Upper Driver Filter Driver Lower Driver Hardware Device

Uses for Filter Drivers
To add functionality to an existing lower level driver. To hide or modify data being sent to an existing upper level driver or user. To stealthily intercept data (here’s where the rootkits fit in)

How do they Work? Filter drivers work by attaching a nameless Device Object to a Device Object created by some lower level driver A filter driver must support the same set of IRP_MJ_XXX requests as the driver it is attached to The filter should set the same flags as the driver it is attached to IRPs sent to the lower level driver are automatically rerouted to the dispatch routine in the filter driver. The filter dispatch routine can… Complete the IRP on its own Or send the IRP down to the lower level driver with IoCallDriver

Filter Driver Attached to Keyboard Stack

Lab Exercise: Device Stack
Examine the Device Stack for Windows system drivers using the Device Tree tool from the WDK Can you identify any filters? (they are the unnamed devices)

Designing a Filter Driver Keylogger
3 Steps: Installing the Filter Driver Hooking the Keyboard – intercepting keyboard input using a filter driver. Logging the data – scan code conversion and logging to file.

Step 1: Installation In Driver Entry
Fill in all entries in the IRP_MJ_XXX dispatch table for the filter driver. Unhooked IRPs receive a pointer to a “pass down” routine and hooked IRP’s receive a pointer to a special “hook” routine Create an unnamed keyboard filter device by calling IoCreateDevice and passing in NULL for the device name Initialize the Device Extension Set the flags of the filter device equal to the flags of the target device Attach the filter device to the target device using IoAttachDevice

Step 2: Intercepting Keyboard Data
The Operating System sends an empty IRP packet down the device stack for the keyboard. The IRP is intercepted by the ReadDispatcher routine in the filter driver. While in this routine, the IRP is “tagged” with a “completion routine”. When the tagged, empty IRP reaches the bottom of the stack at the hardware / software interface, it waits for a key press.

Step 2: Intercepting Keyboard Data (continued)
When a key on the keyboard is pressed, the IRP is filled with the scan code for the pressed key and sent on its way back up the device stack. On its way back up the device stack, the completion routines that the IRP was tagged with on its way down the stack are called and the IRP is passed into them. This gives the filter driver an opportunity to extract the scan code information stored in the packet and add it to a queue for processing by the logger component.

IRP_MJ_READ Dispatch Routine
“Read IRPs” are intercepted by the filter driver’s IRP_MJ_READ When a “read” IRP is received, dispatch read performs the following tasks It sets up the IRP stack for the lower driver by setting the next IRP stack location equal to the current IRP stack location. The stack location pointers are obtained by calling IoGetCurrentIrpStackLocation and IoGetNextIrpStackLocation It sets a completion routine on the current IRP by calling IoSetCompletionRoutine It passes the IRP on to the next driver in the stack with IoCallDriver All other received IRPs are directed to a “pass down” dispatch routine which simply passes them down the stack without touching them

Completion Routine The completion routine is called after hardware fills IRP packet with keyboard scancode Completion routines are called at DISPATCH_LEVEL Can extract the scan code, but you can’t write it to a file at DISPATCH_LEVEL Have to queue keyboard scancodes and set up a separate thread to write them to a file Linked list ExInterlockedInsertHeadList ExInterlockedInsertTailList Can use Semaphore to notify thread that there are items in the queue

Hook Related Design Issues
IRQL Restrictions Queue memory for list must be non paged. Cannot write to a file in completion routine due to elevated IRQL (threading required). List Synchronization ExInterlockedInsertTailList ExInterlockedInsertHeadList Non-Paged Pool Fragmentation Nonpaged pool isn’t the best for small allocations like this Other options (i.e. see lookaside list in DDK)

Step 3: Logging the Data Logging implemented as a separate kernel thread due to IRQL restrictions on file I/O. Extracts queued keyboard scan codes from the list. Performs scan-code to ASCII conversion. Writes converted ASCII to file.

Logging Routine Issues
Tracking Key States – resolving multiple key-codes per scan code. Synchronizing thread termination and tagged (“pending”) IRP packets. If we attempt to unload a driver w/ “pending” IRPs, the system will crash Memory address for completion routine will be invalid when the OS tries to call it after unloading the driver Solution Don’t unload the driver Keep a count of pending IRP’s Drawback: driver won’t actually unload until the user presses another key

Walkthrough: Keylogger lab
Demo and Walkthrough of KLOG based keylogger

Lab Exercise: Keylogger lab
Modify the keylogger so it can talk to user mode. Have the user-mode application send the path to write the log file to the driver Bonus: What needs to be modified on the keylogger to evade signature detection? Install Avast anti-virus software and defeat signature detection

Module 17 Process Injection

Creating a Process from a Driver (1)
User land process is comprised of two parts Kernel Portion EPROCESS and KPROCESS ETHREAD and KTHREAD Token Handle Table Page Tables Etc.

User land process is comprised of two parts User land Portion Process Environment Block (PEB) Thread Environment Block (TEB) Windows subsystem (CSRSS.EXE) Etc.

Must find suitable parent to spawn child Has a user land portion Has kernel32.dll and ntdll.dll loaded in its address space Has an alterable thread (unless hijacking an existing thread) Allocate memory in parent process Write the equivalent of “shellcode” that calls CreateProcess Cause a thread in the parent to execute newly allocated code Hijack an existing thread Create an APC

Asynchronous Procedure Calls (APCs)
APCs are queued to execute in context of particular user thread APC routine can acquire resources, wait on object handles, incur page faults, and call system services Described by APC object Objects in thread-specific APC queue APC executed when thread is scheduled Kernel-mode APCs don't need permission of target thread to run in its address space User-mode APCs do

Allocate memory in parent process Change virtual memory context to that of the parent KeAttachProcess/KeStackAttachProcess ZwAllocateVirtualMemory (HANDLE) -1 means current process MEM_COMMIT PAGE_EXECUTE_READWRITE

“Shellcode” that calls CreateProcess Copy function parameters into address space Pass the address of function parameters to calls Can us the FS register FS contains the address of the TEB TEB has a pointer to the PEB PEB has a pointer to the PEB_LDR_DATA PEB_LDR_DATA contains all the loaded DLLs

“Shellcode” continued As an alternative to using the FS register Find the address of ntdll.dll from the driver Parse its exports section Does not work with all DLLs Only address of ntdll.dll returned by ZwQuerySystemInformation

Cause a thread in the parent to execute newly allocated code - Hijack an existing thread Locate a thread within the parent process Change its Context record Change Context record back when done Problems: Low priority threads Blocked threads Changing Context back

Hijack and Context records lkd> dt nt!_CONTEXT +0x000 ContextFlags : Uint4B +0x004 Dr : Uint4B +0x008 Dr : Uint4B +0x00c Dr : Uint4B +0x010 Dr : Uint4B +0x014 Dr : Uint4B +0x018 Dr : Uint4B +0x01c FloatSave : _FLOATING_SAVE_AREA +0x08c SegGs : Uint4B +0x090 SegFs : Uint4B +0x094 SegEs : Uint4B +0x098 SegDs : Uint4B +0x09c Edi : Uint4B +0x0a0 Esi : Uint4B +0x0a4 Ebx : Uint4B +0x0a8 Edx : Uint4B +0x0ac Ecx : Uint4B +0x0b0 Eax : Uint4B +0x0b4 Ebp : Uint4B +0x0b8 Eip : Uint4B +0x0bc SegCs : Uint4B +0x0c0 EFlags : Uint4B +0x0c4 Esp : Uint4B +0x0c8 SegSs : Uint4B +0x0cc ExtendedRegisters : [512] UChar

Cause a thread in the parent to execute newly allocated code - Create an APC Threads can be notified to run an Asynchronous Procedure Call (APC) APC has a pointer to code to execute To be notified, thread should be Alertable

Alertable Threads and APCs

Finding an Alertable Thread PETHREAD FindAlertableThread(PEPROCESS eproc) { PETHREAD start, walk; if (eproc == NULL) return NULL; start = *(PETHREAD *)((DWORD)eproc + THREADOFFSET); start = (PETHREAD)((DWORD)start - THREADFLINK); walk = start; do DbgPrint("Looking at thread 0x%x\n",walk); if (*(PUCHAR)((DWORD)walk + ALERTOFFSET) == 0x01) return walk; walk = *(PETHREAD *)((DWORD)walk + THREADFLINK); walk = (PETHREAD)((DWORD)walk - THREADFLINK); }while (walk != start); }

Walkthrough: CreateProcessfromDriver
Walk through code and use of APC

Lab Exercise: CreateProcessFromDriver
Change the process that is created or the process which spawns the child Compile code

Module 18 Handle exploitation

Kernel Structures: Handle Tables
Handles are an index into the Handle Table for a particular object Objects represent processes, threads, tokens, events, ports, etc. The kernel/object manager must do the translation from a handle to an object Single point of access ensures security checks can be performed

Two types of handle tables Those associated with a process Those associated with the kernel In 2000, this is an independent handle table In XP and 2003, these handles are associated with the System process

Differences between handle tables Windows 2000 – three layered table approach Top level Middle level Subhandle table level 255 usable entries Windows XP and 2003 Only the subhandle table is allocated to begin with As many entries as will fit in a page, minus one

First DWORD Windows 2000 First bit is a lock bit in Windows 2000 Last three bits are flags Audit on Close Inheritable Protect from Close Windows XP and 2003 Lock bit is least significant bit Bits in the middle are a pointer to the object header

Second DWORD Access Mask If we are in the kernel and parse the handle table directly, we do not have to worry about the access mask

Handle Table entries are 8 bytes each lkd> dt nt!_HANDLE_TABLE +0x000 TableCode : Uint4B +0x004 QuotaProcess : Ptr32 _EPROCESS +0x008 UniqueProcessId : Ptr32 Void +0x00c HandleTableLock : [4] _EX_PUSH_LOCK +0x01c HandleTableList : _LIST_ENTRY +0x024 HandleContentionEvent : _EX_PUSH_LOCK +0x028 DebugInfo : Ptr32 _HANDLE_TRACE_DEBUG_INFO +0x02c ExtraInfoPages : Int4B +0x030 FirstFree : Uint4B +0x034 LastFree : Uint4B +0x038 NextHandleNeedingPool: Uint4B +0x03c HandleCount : Int4B +0x040 Flags : Uint4B +0x040 StrictFIFO : Pos 0, 1 Bit

Lab Exercise: Handle_List
Open DbgView Run Handle_List to get a list of the handles in the System process

Subvert logging Sliver project uses process injection
Like CreateProcessFromDriver project Gets the address of functions in remote process GetProcAddress

Sliver – Spawn.c CreateFileInProcess() Finds specified EPROCESS
Allocates memory in remote process Writes Shellcode to call CreateFile() Schedules an APC

Sliver - Syncro.c Used to parse shared resources
Raises every CPU to DISPATCH_LEVEL

Lab Exercise: Subvert Logging
Choose an application that continually logs and subvert it. Use Keylogger from filter driver lab

Module 19 Memory cloaking

Paging Separates virtual and physical address spaces.
Virtual and physical memory address spaces are divided into fixed size blocks. Virtual blocks known as “pages”. Physical blocks known as “frames”. Virtual blocks are mapped to physical blocks. The OS handles virtual-to-physical block mappings. Virtual address space may be larger than physical address space. Virtually contiguous memory blocks do not have to be physically contiguous. Pages do not have to be present in main memory (can be temporarily stored on the hard disk when memory needs are high).

Virtual to Physical Memory Mapping
[Diagram: pages of the virtual address space (Page 1 … Page n) mapped onto frames of the physical address space (Frame 1 … Frame n).] Paging - virtual and physical memory address spaces are divided into same-size blocks. Virtual address space may be larger than the physical address space. Virtually contiguous blocks do not have to be physically contiguous!

Paging Overview General Concepts
Page Tables Page Directories Page Sizes Intel Architecture - 32 bit, PAE, 64 bit Virtual Address Structure PTE Formats Virtual To Physical Address Translation The Role Of The Translation Lookaside Buffer

Page Tables Virtual to physical mapping information is kept in page tables in structures called PTEs. [Diagram: a page table whose entries (PTE 1 … PTE N) each point to a page in physical memory.]

Page Directories Page directories contain pointers to page tables
Enables page tables to be paged out (smaller physical memory overhead to maintain MM structures since only page directory needs to be physically present). [Diagram: a page directory whose entries (PDE 1 … PDE N) point to page tables (Page Table 1, Page Table 2, …); each page table's entries (PTE 1 … PTE N) point to pages in physical memory.]

Page Sizes Small pages (4 kilobytes) Pros Less internal fragmentation Cons Shorter TLB reach
Large pages (4 megabytes) Pros Greater TLB reach Cons Greater potential for internal fragmentation

(32-bit) x86 PDE Format Virtual to physical mapping information is kept in page tables in structures called PTEs and PDEs. Bit layout, most significant bit first: 31-12 Page frame # | 11-9 Available | 8 Global | 7 Page size (large page if set) | 6 Reserved | 5 Accessed | 4 Cache disabled | 3 Write through | 2 User / Supervisor | 1 Read / Write | 0 Present

(32-bit) x86 PTE Format Virtual to physical mapping information is kept in page tables in structures called PTEs and PDEs. Bit layout, most significant bit first: 31-12 Page frame # | 11-9 Available | 8 Global | 7 Reserved | 6 Dirty | 5 Accessed | 4 Cache disabled | 3 Write through | 2 User / Supervisor | 1 Read / Write | 0 Present

(32-bit) x86 PDE Format (4MB page) Virtual to physical mapping information is kept in page tables in structures called PTEs and PDEs. Bit layout, most significant bit first: 31-22 Page frame # | 21-12 Reserved | 11-9 Available | 8 Global | 7 Page Size (4MB page if set) | 6 Dirty | 5 Accessed | 4 Cache disabled | 3 Write through | 2 User / Supervisor | 1 Read / Write | 0 Present

(32-bit) x86 Virtual Address
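Virtual addresses form indexes into page tables. Page tables may be single or multi-level. x86 uses a 2 level page table structure w/ support for 4K and 4MB sized pages. For 4K pages the 32-bit virtual address splits into: Page directory index (bits 31-22, 10 bits) Page table index (bits 21-12, 10 bits) Byte index (bits 11-0, 12 bits; bit 0 is the LSB). The page directory index and page table index together form the 20-bit Virtual Page Number.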

(32-bit) x86 Virtual to Physical
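[Diagram: CR3 (taken from the KPROCESS) holds the physical address of the page directory (1 per process). The page directory index of the virtual address selects a PDE, which locates a page table (up to 512 per process); the page table index selects a PTE, which locates the page frame; the byte index selects the byte within that frame.]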

The Paging Performance Problem
Virtual memory incurs a steep performance hit! 2 level page table scheme like x86: Best Case: 3 memory accesses per reference! (page dir + page table + offset) Worst Case: 3 memory accesses + 2 disk I/O requests per memory reference! (page dir + I/O + page table + I/O + offset) Solution: Translation Lookaside Buffer (TLB) The TLB is a high speed hardware cache of frequently used virtual to physical mappings (PTE’s).

Translation Lookaside Buffer (TLB)
On memory access, TLB is searched first for the virtual to physical translation! High speed associative memory “Hit”: translation was found in the TLB “Miss”: translation was not found in the TLB x86 Uses Split TLB architecture ITLB: holds virtual to physical translations for code DTLB: holds virtual to physical translations for data Modern TLBs have extremely high “hit” rates and seldom incur the performance hit of a page table walk.

TLB Diagram [Diagram: the TLB as associative memory (simultaneous read and compare). The virtual page number of the address (17 in the example) is compared against the tag of every entry at once; each entry pairs a virtual page tag with a page frame or an invalid marker.]

Memory Access Path with TLB (Hit)
[Diagram: a code access to VPN 12 is looked up in the ITLB (a data access would go to the DTLB). The entry VPN = 12, Frame = 132 is found, so Frame 132 in physical memory is accessed without consulting the page table / page directory or the page file on disk.]

Memory Access Path (TLB Miss with Page Fault)
[Diagram: a code access to VPN 25 misses in the ITLB. The page table / page directory entry for VPN 25 (Frame 13) is marked not present, so a fault is raised; the page fault handler brings the page in from disk (page file), the entry is marked present, and the access proceeds to Frame 13.]

TLB Management The TLBs must be flushed whenever a page table or directory entry is changed. TLBs can only be flushed at ring 0. Reloading cr3 flushes all non global TLB entries Explicitly by mov cr3, xxxxxxxx Implicitly by executing a task switch The invlpg instruction flushes a single TLB entry. NOTE: The page global enable flag (PGE) in cr4 and the global flag in PTE entries can be used to prevent a page from being flushed on a task switch or reload of cr3. A global PTE, however, is not immune to flushing by invlpg.

32-bit Segmentation and Paging Summary

Allocate a Fake Page REAL Page Directory PTN PFN PFN FAKE Real Page
(1 per process) Fake Page

Cloak Page REAL Page Directory PTN PFN PFN FAKE Real Page Fake Page

Uncloak Page REAL Page Directory PTN PFN PFN FAKE Real Page Fake Page

Shadow Walker Jamie Butler and Sherri Sparks

Motivation The most advanced public kernel rootkits are sitting ducks for primitive signature scans and integrity checking techniques. Large parts of rootkit drivers themselves sit in non paged memory leaving them vulnerable to simple signature scans of system memory. Rootkit modifications to operating system components in memory give them away to memory integrity checkers and heuristic checkers like VICE. Need a method to hide the rootkit driver code and its modifications to kernel memory.

Early Viruses Faced a Similar Problem
Viruses sought to hide their code from file system signature scanners. Their solution: Polymorphism / Metamorphism Attempts to vary the appearance of the viral code from one variant to another. Functionally equivalent, but semantically different copies of the code. Few rootkits have integrated viral polymorphic techniques.

Introducing Shadow Walker
An alternative to viral polymorphism – Virtual Memory Subversion! Proof of concept demonstration that a rootkit is capable of transparently controlling the contents of memory viewed by other applications and kernel drivers. Minimal performance impact !

Implications of Virtual Memory Subversion
In-memory security scanners rely upon the integrity of their view of memory even if they don’t rely upon Operating System API’s (which may potentially be hooked). If we can control a scanner’s memory reads we can fool signature scanners and potentially make a known rootkit, virus, or worm’s code immune to in-memory signature scans! We can also fool integrity checkers and other heuristic scanners which rely upon their ability to detect modifications to code (i.e. VICE).

Hiding Executable Code
We take an offensive spin on the defensive PaX technology. We want to hide code, therefore we also want to differentiate between read / write and execute accesses to the hidden code. Read accesses of the code section of a rootkit driver may indicate presence of a scanner. Nearly the inverse of PaX: Software implementation of Execute / Diverted Read-Write semantics.

Implementation Issues
We need a way to filter execute and read / write accesses. We need a way to “fake” the read / write memory accesses when we detect them. We need to ensure that performance is not adversely affected.

Differentiating Between Execute and Read / Write
We can trap memory accesses by marking their PTE’s “non present” and hooking the page fault handler. In the page fault handler, we have access to the saved instruction pointer and the faulting address. If instruction pointer == faulting address, then it is an execute access! Otherwise, it is a read/write. We also need to differentiate between page faults due to the memory hook and normal page faults. Pages must be nonpaged memory. Pages must be locked down in memory.

Faking Read / Writes by Exploiting the split TLB (1)
Normal Synchronized ITLB and DTLB translate code and data memory accesses to the same physical frame. ITLB Is it a code access? VPN = 12, Frame = 2 Frame 2 Page Table / Page Dir Memory Access (VPN=12) Frame 8 27 (Invalid) FAULT 27 (Invalid) DTLB Is it a data access? Frame 52 VPN = 12, Frame = 2

Faking Read / Writes by Exploiting the split TLB (2)
Desynchronized ITLB and DTLB translate code and data memory accesses to different physical frames. rootkit code ITLB Is it a code access? VPN = 12, Frame = 2 Frame 2 Page Table / Page Dir Memory Access (VPN=12) Frame 8 FAULT 27 (Invalid) 27 (Invalid) DTLB Is it a data access? Frame 52 VPN = 12, Frame = 52 random garbage

Software TLB Control Reloading cr3 causes all TLB entries except global entries to be flushed. This typically occurs on a context switch. The invlpg causes a specific TLB entry to be flushed. Executing a data access instruction causes the DTLB to be loaded with the mapping for the data page that was accessed. Executing a call causes the ITLB to be loaded with the mapping for the page containing the code executed in response to the call.

Shadow Walker Components
Memory Hook Engine Hook Installation Module Custom Page Fault Handler Modified FU Rootkit

Memory Hook Installation
Install new PF handler (Int 0E). Insert page into global hash table of hooked pages for quick lookup in PF handler. Mark page not present. Flush the TLB to ensure that we trap all subsequent memory accesses in the PF handler.

Custom Page Fault Handler
Primary task is to filter read / write and execute accesses for hooked pages. Passes down faults on unhooked pages to the OS page fault handler. Manually loads ITLB on execute access to hooked page. Manually loads DTLB on data access to hooked page. Most memory references will be resolved via the TLB path and will not generate page faults. Page faults on hooked pages will occur: On the first execute and data accesses to the page. On TLB cache line evictions of a hooked mapping. On explicit TLB flush (i.e. context switch).

PF Handler Pseudocode Pseudocode for enforcing execute diverted read / write semantics on kernel pages. Page Fault Handler: if( ProcessorMode == USER_MODE ) jmp PassDownToOs if( FaultingAddress == USER_PAGE ) //faulting address is from a hidden page if( FaultingAddress == HIDDEN_PAGE) { if(FaultingAddress == EIP) jmp LoadItlb //execute access else jmp LoadDtlb } else jmp PassDownToOs Load Itlb: ReplaceFrame(PTE.FaultingAddress) PTE.FaultingAddress == PRESENT CallIntoHiddenPage //load ITLB PTE.FaultingAddress == NOT PRESENT ReplaceFrame(old_PTE.FaultingAddress) jmp ReturnWithoutPassdownToOs Load Dtlb ReadFaultingAddress //load DTLB

What about Data Hiding? Less optimal
DTLB must be kept empty of hidden page mapping. One page fault per data access to hidden page. For memory access to go through, data accesses must be filtered in PF handler and the DTLB must be loaded w/ the correct mapping. Memory hook must maintain control (i.e. DTLB must be flushed after access). Protocol based approach between driver and PF handler. Debugging approach (i.e. single step).

Modified FU Rootkit Proof of concept rootkit hidden by our memory hook engine. Runs as a system thread and regularly scans the active process list looking for processes named _fu_ and unlinks them. No dependence on userland initialization. No symbolic link No functional device In memory rootkit Could be installed from a kernel exploit to avoid disk detection.

Impact on System Performance
Modern TLB’s have extremely high “hit” rates. Therefore, most translations on our “hidden pages” will go through the TLB path rather than the slower page fault path. Using the memory hook engine to hide a rootkit driver’s code has no subjectively noticeable impact on overall system performance!

Known Limitations No PAE Support
No Hyperthreading / Multiprocessor Support Currently hides only 4K sized kernel pages (i.e. system space / upper 2 GB of the virtual address space). Cannot Hide 4MB Ntoskrnl.exe page.

Lab Exercise: Shadow Walker
Walk through code

Module 20 Additional Discussion Topics

PatchGuard 64-bit Windows OS (XP, Vista, and Windows 7)
Prevents hooking Keeps tabs on IDT GDT SSDT SYSENTER (MSR) Core modules (ndis.sys and ntoskrnl.exe) Where it runs is hidden

Master Boot Record (MBR) Rootkit
BIOS loads the MBR -> loads Volume Boot Record (VBR) -> Loads OS Rootkit replaces MBR so it can patch the OS as it is loading Vbootkit and Mebroot

Alureon Rootkit (TDL) 64-bit version deactivates driver signing on 64-bit Vista and Windows 7 Uses MBR to load SSDT hooks
