knowledge-kit/Chapter1 - iOS/1.88.md
√(noham)² b334988376 Update 1.88.md
2026-02-22 22:15:50 +01:00

Principle of fishhook

Hook classifications

  • Method Swizzle: Uses the Objective-C runtime to dynamically change the mapping between SEL (method selector) and IMP (method implementation), altering the Objective-C method call flow. Mainly used for Objective-C methods.
  • fishhook: A tool provided by Facebook to dynamically modify the linking of Mach-O files. It uses Mach-O loading principles to modify pointers in the lazy and non-lazy symbol pointer tables to hook C functions.
  • Cydia Substrate: Formerly Mobile Substrate, it hooks Objective-C methods, C functions, and function addresses. It's not limited to iOS; Android can also use it. Official site: http://www.cydiasubstrate.com

At the mechanism level, hooks fall into two types:

  • inline hook: directly modifies the function entry code or some code inside the function to jump to your own code
  • address replacement: includes replacing import table addresses, export table addresses, addresses stored inside structs, etc. This is the simplest approach but not always effective: calls that don't go through the address table won't be hooked.
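
The address-replacement case can be sketched with a one-entry address table in plain C. This is only an illustration, not fishhook or a real loader; `slot_greet`, `install_hook`, `call_via_table` and `call_direct` are hypothetical names chosen for the example.

```c
typedef int (*greet_fn)(void);

/* Original function and the replacement we want to run instead. */
int original_greet(void) { return 1; }
int hooked_greet(void)   { return 2; }

/* A one-entry "address table": callers that go through this slot can be hooked. */
greet_fn slot_greet = original_greet;

/* Call site that goes through the table. */
int call_via_table(void) { return slot_greet(); }

/* Call site that names the function directly and bypasses the table. */
int call_direct(void) { return original_greet(); }

/* Address replacement: overwrite the slot and hand back the old address
 * so the caller can still reach the original implementation. */
greet_fn install_hook(greet_fn replacement) {
    greet_fn previous = slot_greet;
    slot_greet = replacement;
    return previous;
}
```

After `install_hook(hooked_greet)`, `call_via_table()` reaches the replacement while `call_direct()` still reaches the original, which is the limitation described in the second bullet.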

Application

Hooking Objective-C methods is common, but for C functions such as NSLog or objc_msgSend the Objective-C runtime alone isn't enough. With fishhook, hooking such C functions is no longer difficult.

Why put "C functions" in quotes? Keep reading.

Hooking system C functions

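Take NSLog as an example. Each hook is described by a rebinding struct (declared in fishhook.h and shown below), and an array of these structs is passed to rebind_symbols. After the call, NSLog invocations in the app reach the replacement function, so the hook succeeded.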

struct rebinding {
  const char *name;     // name of the function to hook, C string
  void *replacement;    // new function address
  void **replaced;      // pointer to store the original function address
};
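
How the three fields work together can be sketched with a toy model. The code below is not fishhook: it rebinds entries in a hand-made table instead of a real Mach-O image, and `toy_slot`, `toy_rebind_symbols`, `real_length` and `hooked_length` are hypothetical names introduced for illustration. Only the layout of `struct rebinding` is taken from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Same three fields as the rebinding struct shown above. */
struct rebinding {
    const char *name;
    void *replacement;
    void **replaced;
};

/* Hypothetical stand-in for one symbol pointer slot; not a real Mach-O structure. */
struct toy_slot {
    const char *name;   /* symbol name the slot belongs to */
    void *target;       /* address the slot currently points to */
};

/* Toy rebind: for every slot whose name matches a rebinding, save the old
 * address through `replaced` and store the replacement in the slot.
 * Returning the number of patched slots is a choice of this sketch. */
size_t toy_rebind_symbols(struct toy_slot slots[], size_t n_slots,
                          const struct rebinding rebindings[], size_t n_rebindings) {
    size_t patched = 0;
    for (size_t i = 0; i < n_slots; i++) {
        for (size_t j = 0; j < n_rebindings; j++) {
            if (strcmp(slots[i].name, rebindings[j].name) != 0) continue;
            if (rebindings[j].replaced != NULL)
                *rebindings[j].replaced = slots[i].target;
            slots[i].target = rebindings[j].replacement;
            patched++;
        }
    }
    return patched;
}

typedef size_t (*length_fn)(const char *);

/* "System" function that the slot points to at first. */
size_t real_length(const char *s) { return strlen(s); }

/* Receives the original address during rebinding. */
void *orig_length = NULL;

/* Replacement: does extra work, then forwards to the original. */
size_t hooked_length(const char *s) {
    return ((length_fn)orig_length)(s) + 100;
}
```

With a table holding `{ "length", (void *)real_length }` and a rebinding `{ "length", (void *)hooked_length, &orig_length }`, one call to `toy_rebind_symbols` makes the slot point at `hooked_length` and leaves the original address in `orig_length`. With the real library, the equivalent call is `rebind_symbols(rebindings, count)`, which locates the slots inside the loaded images itself.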

Hooking user-defined C functions

Define a C function handleTouchAction in the app and try to hook it the same way: the hook fails.

Why can system C functions be hooked while user-defined C functions cannot? The following sections explore the reason.

Peek into the principle

fishhook is a Facebook tool that dynamically modifies Mach-O symbol binding. It relies on how Mach-O files are loaded and rewrites entries in the lazy and non-lazy symbol pointer tables to hook system C functions.

Mach-O file permissions

Mach-O is divided into code segment, data segment, etc.:

  • Code segment (__TEXT): readable, executable, not writable
  • Data segment (__DATA): readable, writable, not executable

System shared cache

We know NSLog's implementation is in the Foundation library, while user-defined functions reside in the app's own executable Mach-O.

iOS shared cache: since iOS 3.1, Apple has packaged system libraries into one large cache file located under /System/Library/Caches/com.apple.dyld/ to reduce redundancy and optimize memory usage. The cache is:

  • Accessible by all processes
  • Architecture-specific cache files (e.g., dyld_shared_cache_arm64 for ARM64)
  • Optimizes dynamic library loading by avoiding repeated loading of the same libraries across apps, speeding startup and improving performance

When an app's Mach-O is loaded by dyld, the address of NSLog is not yet fixed because its real implementation lives in the Foundation framework inside the shared cache.

That raises a question: at compile time, clang cannot know the actual runtime addresses of Foundation functions (on any device or architecture). How is this solved?

Static linking vs dynamic linking

Linking can be static or dynamic. Early computers used static linking. Static linking has drawbacks:

  • Large memory and disk waste because each program contains copies of common library functions like printf, scanf, etc.
  • Development and distribution inconvenience: if a third-party lib.o updates, the app must be relinked and redistributed.

Dynamic linking splits modules into separate files and resolves links at runtime. This solves space and update problems but requires OS-level support and a dynamic linker. Dynamic linking introduces runtime overhead, but lazy binding reduces it: symbols are resolved only when first used.

To let the same library code be loaded at different addresses without patching its instructions at load time, PIC (Position Independent Code) was introduced.

PIC technology

Relocating code at load time rewrites instructions, which prevents those instructions from being shared among processes. PIC separates the parts of the code that require modification and places them with the data, so the instruction section remains unchanged and can be shared. This allows code to be loaded at different addresses without rewriting instructions.

Benefits:

  • Shared code: multiple apps can share a single dynamic library instance
  • dyld can optimize symbol binding and improve startup speed
  • Supports code relocation for flexible updates and patches

When your code contains a call to NSLog, at compile time the IDE shows the declaration, but the compiled executable doesn't know NSLog's runtime address. How does this work?

With PIC, the workflow is:

  • At compile time, the Mach-O's data segment gets a writable region of symbol pointers (the lazy and non-lazy symbol pointer tables). Every reference to a symbol from a shared cache library is set up to go through one of these entries. For example, a reference to NSLog produces an NSLog symbol pointer in the Mach-O, and the app's call to NSLog goes through that pointer.
  • When dyld loads the Mach-O, it performs symbol binding: it reads the load commands in the header, finds the required libraries, and binds the symbols. For NSLog, dyld writes the real address of Foundation's NSLog into the Mach-O's __DATA symbol pointer entry for NSLog.
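
The two steps above can be sketched as a toy loader. This is a simplified model and not dyld's actual algorithm; `toy_export`, `toy_import`, `toy_bind` and the two library functions are hypothetical names used only for the illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*unary_fn)(int);

/* Functions that live in the "shared library". */
int lib_double(int x) { return 2 * x; }
int lib_square(int x) { return x * x; }

/* Export list of the hypothetical library: name -> address. */
struct toy_export { const char *name; unary_fn address; };
const struct toy_export lib_exports[] = {
    { "double", lib_double },
    { "square", lib_square },
};

/* Import slot in the app's writable data: the name is known at build time,
 * the address is left empty until the loader fills it in. */
struct toy_import { const char *name; unary_fn address; };
struct toy_import app_imports[] = {
    { "square", NULL },
    { "double", NULL },
};

/* Load-time binding: fill every import slot from the export list.
 * Returns the number of imports that could not be resolved. */
size_t toy_bind(struct toy_import imports[], size_t n_imports,
                const struct toy_export exports[], size_t n_exports) {
    size_t unresolved = 0;
    for (size_t i = 0; i < n_imports; i++) {
        imports[i].address = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].address = exports[j].address;
                break;
            }
        }
        if (imports[i].address == NULL) unresolved++;
    }
    return unresolved;
}

/* App code never names the library functions directly; it only calls
 * through its own import slots. */
int app_compute(int x) {
    return app_imports[0].address(x) + app_imports[1].address(x);
}
```

Once `toy_bind(app_imports, 2, lib_exports, 2)` has run, `app_compute` works no matter where the library functions were placed, because the app's instructions only reference its own slots.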

Practical exploration

Experiment to verify the full process.

Step 1: You can see NSLog in the Lazy Symbol Pointers as the first entry. "lazy" means it's bound only when used. Set breakpoints to verify.

Step 2: At the NSLog breakpoint, in LLDB run image list to view images. The first image is the app's main executable; its image base is 0x0000000100da5000.

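Step 3: Add the offset of the NSLog lazy pointer (0xC000 in this binary) to the image base to get the pointer's runtime address, then inspect it: memory read 0x0000000102eec000+0xC000. ASLR changes the image base on every launch, so use the value that image list reports in the current session; the base in this command differs from the one quoted in Step 2, which suggests the two values were captured in different runs.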
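
The arithmetic in this step is a single addition. A minimal sketch, with `slot_runtime_address` as a hypothetical helper name:

```c
#include <stdint.h>

/* Runtime address of a pointer slot: the image base reported by the
 * debugger plus the slot's offset inside the image. */
uint64_t slot_runtime_address(uint64_t image_base, uint64_t offset_in_image) {
    return image_base + offset_in_image;
}
```

For the values used in the command above, `slot_runtime_address(0x0000000102eec000, 0xC000)` yields 0x0000000102ef8000, the address that memory read inspects.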

Step 4: Continue past the breakpoint so that NSLog runs once, then disassemble the address stored in the pointer (dis -s addr) to view the assembly it now points to.

Step 5: Continue execution past the breakpoint, call rebind_symbols, then inspect memory again. After rebind, the address changed; disassembly now shows your custom function.

Detailed mapping steps:

Step 1: In Lazy Symbol Pointers, NSLog is the first entry.

Step 2: In the Dynamic Symbol Table (Indirect Symbols), the first entry corresponds to NSLog. Its Data value 00000084 (hex) equals 132 (decimal), which is an index into the Symbol Table.

Step 3: Use that index to find entry 132 in the Symbol Table. Its Data value 000000AA is the offset of the symbol's name within the String Table.

Step 4: The String Table starts at 0000CFE4; adding the offset 0xAA gives 0xD08E, which is where the name string for NSLog is stored.
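
The chain of lookups in these four steps can be sketched with three small arrays. The layout is deliberately simplified (a real symbol table entry has more fields than the string offset), and `toy_nlist`, `toy_strtab`, `toy_symtab`, `toy_indirect` and `toy_symbol_name` are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified symbol table entry: only the string table offset is modelled. */
struct toy_nlist {
    uint32_t n_strx;
};

/* Toy string table: names are NUL-terminated and packed back to back.
 * Offset 1 holds "_printf", offset 9 holds "_NSLog". */
const char toy_strtab[] = "\0_printf\0_NSLog";

/* Toy symbol table: entry 0 names _printf, entry 1 names _NSLog. */
const struct toy_nlist toy_symtab[] = { { 1 }, { 9 } };

/* Toy indirect symbol table: pointer slot i uses symbol table entry toy_indirect[i]. */
const uint32_t toy_indirect[] = { 1, 0 };

/* Pointer slot -> indirect symbol table -> symbol table -> string table. */
const char *toy_symbol_name(const uint32_t *indirect_symtab, size_t slot_index,
                            const struct toy_nlist *symtab, const char *strtab) {
    uint32_t symtab_index = indirect_symtab[slot_index];   /* Step 2 */
    uint32_t name_offset = symtab[symtab_index].n_strx;    /* Step 3 */
    return strtab + name_offset;                           /* Step 4 */
}
```

Here slot 0 resolves to the string `_NSLog` (symbol names in a Mach-O carry a leading underscore). fishhook has to perform this kind of name lookup for each pointer slot to decide whether a slot matches one of the requested rebindings.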

Functions like NSLog, dispatch_once, etc., use stubs that point to Lazy Symbol Pointers, which in turn point to a stub_helper and ultimately to dyld_stub_binder. The real address is resolved on the first call.
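
The stub, lazy pointer and binder relationship described above can be modelled in a few lines of C. This is a toy model under simplifying assumptions, not the code dyld or the linker generates; `add_stub`, `lazy_add_ptr`, `lazy_add_helper` and `binder_calls` are hypothetical names.

```c
typedef int (*add_fn)(int, int);

/* The real implementation, standing in for a function in a system library. */
int real_add(int a, int b) { return a + b; }

/* Counts how many times symbol resolution ran. */
int binder_calls = 0;

int lazy_add_helper(int a, int b);

/* Lazy pointer: before the first call it points at the helper,
 * not at the real function. */
add_fn lazy_add_ptr = lazy_add_helper;

/* Helper: plays the role of stub_helper plus the binder. It resolves the
 * symbol, patches the lazy pointer, then forwards the call. */
int lazy_add_helper(int a, int b) {
    binder_calls++;
    lazy_add_ptr = real_add;
    return real_add(a, b);
}

/* Stub: what the call site jumps to. It always goes through the lazy pointer. */
int add_stub(int a, int b) { return lazy_add_ptr(a, b); }
```

The first `add_stub` call runs the helper once; later calls go straight to `real_add`. Because every call still passes through `lazy_add_ptr`, overwriting that one pointer redirects all callers, which is the property a hook based on symbol pointers relies on.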

fishhook leverages this behavior by replacing entries in the Lazy Symbol Pointers with addresses of custom functions to achieve hooks. This is why fishhook cannot hook C functions defined inside the same binary.

What fishhook actually does: it overwrites symbol pointer entries in the app's own __DATA segment so that specific symbols resolve to custom function addresses. In other words, it hooks external C functions that are dynamically bound.

Therefore fishhook cannot hook user-defined C functions inside the same Mach-O, because those functions are not called via the symbol binding process that fishhook manipulates. User functions are directly linked within the binary rather than resolved through the dynamic symbol pointers.