Commit 001227b

Initial release
1 parent c3af917 commit 001227b

27 files changed, +2988 -0 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
.vscode/
*$py.class

README.md

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
<p align="center">
<img src="images/logo.png" alt="stack string explorer logo" height=200/>
</p>

# Stack String Explorer
Stack String Explorer is a Ghidra plugin that finds and reports stack strings (and other constant strings).

- Adds identified strings to the Defined Strings window for easy search and filtering
- Inserts comments above stack strings in the listing and decompilation views
- Reconstructs strings formed by multiple instructions
- Analysis scope options (single function, selection, full binary)
- Headless analyser support
- Finds repeated use of similar strings across the code base

![Stack Strings Demo](images/StackStringsToolInUse.gif)

Through optimization or deliberate obfuscation, strings are sometimes stored as immediates in instruction operands rather than in easily found data locations. Tools like Ghidra display these as hex constants and make no effort to reconstruct or decode them. Stack String Explorer searches a binary for such constant strings and displays them so that they can be grouped and filtered.
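
As a rough illustration of what such a string looks like, the sketch below decodes two hypothetical 32-bit immediates, as they might appear in the operands of consecutive store instructions, into text. The constant values and the helper name `decode_immediate` are assumptions made for this example and are not taken from the plugin; little-endian byte order is also assumed.

```python
import struct


def decode_immediate(value, size=4):
    """Return the bytes of an immediate as laid out in little-endian memory."""
    # "<I" packs an unsigned 32-bit integer, "<Q" an unsigned 64-bit one.
    fmt = {4: "<I", 8: "<Q"}[size]
    return struct.pack(fmt, value)


# Hypothetical operands of two consecutive stores to the stack.
immediates = [0x6C6C6548, 0x0000216F]

raw = b"".join(decode_immediate(v) for v in immediates)
text = raw.rstrip(b"\x00").decode("ascii")
print(text)
```

In a disassembly listing these operands would show up only as `0x6c6c6548` and `0x216f`, which is why they are easy to overlook.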

## ⬇️ Installation
- Install Ghidra https://github.com/NationalSecurityAgency/ghidra/
- Clone this repo
- Add the `ghidra_scripts` directory as an additional Ghidra scripts directory. To do this:
    * Open `Window -> Script Manager` <img src=images/ScriptManagerIcon.png alt="Script Manager Icon" height=25> and click `Manage Script Directories` <img src=images/ManageScriptDirectories.png alt="Manage Script Directories Icon" height=25>
    * Click the plus button and add the path to this repo's `ghidra_scripts` directory.

## ▶️ Usage
#### GUI:
- Open a binary in Ghidra and navigate to `Window -> Script Manager` <img src=images/ScriptManagerIcon.png alt="Script Manager Icon" height=25>
- Search for `StackStringExplorer.py` and double-click the name to run it
- Refresh the Defined Strings window <img src="images/refresh.png" alt="Refresh Icon" height=25> or check the console to see new strings

<img src=images/ScriptManager.png alt="Script Manager">

#### Headless:
- Set up `StackStringExplorer.properties` (see below)
- Run the command: `$ analyzeHeadless <PROJECT_PATH> <PROJECT_NAME> -import <TARGET_FILENAME> -postScript StackStringExplorer.py`

## ⚙️ Settings
#### GUI:
Click through the popups on program start. A default configuration is provided. Below is a full explanation of each setting should you need to diverge from the defaults.

#### Headless:
Download the `StackStringExplorer.properties` file and place it in the same location as `StackStringExplorer.py`. Write your configuration choice after the `=` on each line. Lines that start with `#` or `!` are ignored. Domain selection is not available in headless mode; the script always runs on all functions.
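
The file format described above can be read with a few lines of Python. This is a minimal sketch rather than the plugin's own loader: the function name `parse_properties` is hypothetical, and it assumes the key is everything before the first `=` on the line.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and '#' or '!' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Assumption: the key is everything before the first '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line
        settings[key.strip()] = value.strip()
    return settings


example = """\
# Select True/False for whether to run each analysis technique
Analysis Enable Simple Scrape = False
! lines starting with '!' are ignored too
Config Address Filtering: = all
"""

settings = parse_properties(example)
print(settings)
```

Values are kept as strings here; converting `True`/`False` or numeric settings is left to the caller.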

---
### 👟 Domain
Which part of the program to analyze

| Option | Description |
|-------------------|-------------------------------------------------------------------------------|
| Current Selection | Run on all discovered instructions in the selected region |
| Current Function | Run on all discovered instructions in the function the cursor is currently in |
| All Functions | Run on all discovered functions |

---
### 🔧 General Settings
| Setting | Description |
|----------------------------|-------------|
| Minimum String Length | The shortest length of string to identify. Use to reduce false positives. |
| Lookahead | The number of instructions to search after an instruction containing a string for more components of the same string. If the lookahead is too small, strings may be reported as multiple substrings. If the lookahead is too large, unrelated string components may be concatenated. |
| Minimum Length of Interest | The minimum number of characters moved in a single instruction for it to be considered of interest. For example, a string moved in 3-byte blocks is discarded when the length of interest is greater than 3. |
| Reverse | Whether to reverse the order in which string components are concatenated. |
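
To make the Lookahead trade-off concrete, here is a simplified sketch of grouping string components by instruction distance. It is not the plugin's implementation: `group_components` and the `(instruction_index, text)` input format are assumptions chosen for illustration, and the gap is measured from the previous component.

```python
def group_components(components, lookahead):
    """Join components whose instructions are at most `lookahead` apart.

    `components` is a list of (instruction_index, text) pairs sorted by index.
    """
    groups = []
    last_index = None
    for index, text in components:
        if last_index is not None and index - last_index <= lookahead:
            groups[-1] += text  # close enough: part of the same string
        else:
            groups.append(text)  # too far away: start a new string
        last_index = index
    return groups


# Hypothetical components found at instructions 0, 2, 3 and 10.
found = [(0, "http"), (2, "://e"), (3, "x.io"), (10, "key=")]

print(group_components(found, lookahead=3))
print(group_components(found, lookahead=1))
print(group_components(found, lookahead=8))
```

With a lookahead of 3 the URL is rebuilt and `key=` stays separate; a lookahead of 1 splits the URL into substrings, and a lookahead of 8 glues the unrelated `key=` onto it.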

---
### 📊 Analysis
Which analysis techniques to run

| Option | Description |
|-------------------|-----------------------------------------------------------------------------------------------------------------------------|
| Simple Scrape | Grabs all scalar arguments to instructions in the order they are used, concatenating nearby constants. |
| Simulate Regional | Simulates regions of instructions around instructions of interest independently, then extracts strings from modified memory. |
| Simulate All | Simulates all the instructions in the domain then extracts strings from modified memory. |

---
### 📮 Address Filtering
How strictly to filter out strings which might be addresses

| Option | Description |
|--------|----------------------------------------------------------------------------------------------------------------------------------------------|
| None | Don't filter out any strings based on how much they resemble an address. |
| Some | Filter out constants that are within 255 bytes of the address of the originating instruction. This generally removes return addresses. |
| All | Filter out any constants marked as addresses by their OperandType. If an operand has been incorrectly flagged, this may remove valid strings. |
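
The `Some` option amounts to a distance check. The sketch below shows that check in isolation; the function name and the sample addresses are hypothetical, and the plugin's actual comparison may differ in detail, for example at the exact boundary.

```python
def looks_like_nearby_address(constant, instruction_address, window=255):
    """True if `constant` lies within `window` bytes of the instruction."""
    return abs(constant - instruction_address) <= window


instruction_address = 0x00401000

# A constant pointing just past the instruction, e.g. a return address.
print(looks_like_nearby_address(0x00401005, instruction_address))
# Four ASCII characters packed into an immediate ("Hell" in little-endian).
print(looks_like_nearby_address(0x6C6C6548, instruction_address))
```

The first constant would be filtered out as a probable address, while the second would be kept as a candidate string component.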

---
### 📝 Output
How to display the results

| Option | Description |
|------------------------|-------------|
| Print to console | Stack strings will be printed to the console with their length and the location of the originating instruction. |
| Add pre-comment | Adds a comment above each originating instruction with the extracted string. If the string spans multiple instructions only the first will be commented. |
| Add to defined strings | Adds strings to the Defined Strings window. This is implemented by adding strings to a memory overlay block and adding a cross reference to the originating instruction. **This requires an exclusive checkout for shared projects** |

The Defined Strings window must be refreshed <img src="images/refresh.png" alt="Refresh Icon" height=25> for the stack strings to appear.

Strings that have already been added to defined strings by a previous run of this program will not be reported again in the console and will not be added to defined strings twice. The comment will be replaced regardless of previous script executions.

## 📐 Support and Limitations
Stack String Explorer operates on Pcode and can support any architecture Ghidra supports. It relies on Ghidra's identification of functions and instructions.

#### Simple Scrape
Finds constants used in any form, so it finds strings regardless of how they are built.
If the string is accessed out of order, it may reorder parts of a string or concatenate unrelated strings.

#### Simulate Regional
Simulates where each string is stored in memory, so it is able to order string components correctly and separate strings stored in different locations. It supports strings formed using move or store-like instructions, as well as inferring strings from compare-like instructions.

Simulate Regional is the best option for the majority of applications. However, it may miss some context before an instruction of interest and fail to notice that string components are related.
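
The difference between scraping and simulating can be shown with a toy model. In the sketch below, each hypothetical store is an `(offset, data)` pair listed in execution order; joining the data in that order mimics Simple Scrape, while writing it into a dictionary that stands in for memory and reading it back by offset mimics the simulation approach. The function names and the store list are assumptions for illustration, not the plugin's code.

```python
def scrape(stores):
    """Concatenate constants in the order the instructions use them."""
    return b"".join(data for _, data in stores)


def simulate(stores):
    """Write each constant to a toy memory, then read it back by address."""
    memory = {}
    for offset, data in stores:
        for i in range(len(data)):
            memory[offset + i] = data[i:i + 1]  # one byte per address
    return b"".join(memory[addr] for addr in sorted(memory))


# Hypothetical stack stores that build "cmd.exe" out of order.
stores = [(4, b"exe\x00"), (0, b"cmd.")]

print(scrape(stores).rstrip(b"\x00"))
print(simulate(stores).rstrip(b"\x00"))
```

Scraping yields the components in the wrong order, whereas the simulated read recovers `cmd.exe`. A real simulator would also need to split memory into separate strings where the written addresses are not contiguous, which this sketch does not attempt.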

#### Simulate All
Operates in the same manner as Simulate Regional and is able to identify the correct component order.
When run on a large section, e.g. All Functions or Current Function, it can encounter ambiguity if multiple strings are stored in the same location during the simulated range. This leads to components being reported individually and to concatenation of unrelated components.

Simulate All is best used in specific areas to manually increase the range of the simulation where Simulate Regional has missed some context, particularly if there is relevant context in the instructions preceding the instruction containing the scalar string.

### General Limitations
- If the same string is found in multiple places, it will be added only once to the Defined Strings window with multiple cross references to the different locations. However, if the same string is found in a new location on a separate run of the program, it will be added separately and not included in the same list of cross references.

## 🧶 Examples

Identifies strings moved over multiple instructions
![Moved over multiple instructions example](images/MovedOverMultipleInstructionsExample.png)

Identifies strings moved to locations other than the stack
![Moved to non-stack location example](images/NonStackExample.png)

## 📁 File structure
- `ghidra_scripts`
    - `StackStringExplorer.py` - contains the main loader for this script. It gets the parameters, sets up analysis techniques, and runs the analysis. It relies heavily on the scripts in `stackstring_helper_scripts`

- `stackstring_helper_scripts`
    - `technique.py` - contains an abstract class describing a generic analysis technique
    - `regional_technique` - contains an abstract class describing an analysis technique that runs only on instructions around ones likely to contain a string
    - `simulator.py`, `scraper.py`, `region_simulator` - contain classes for each of the analysis techniques
    - `abstract_address.py`, `stack_string.py` - contain classes used by the simulator to represent values stored in memory
    - `parameters.py` - contains a class for describing the input configuration of the program
    - `general_utilities`, `specific_utilities`, `io_utilities` - contain functions used across other files
    - `stack_strings_enums` - contains enums used across other files
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Apply configuration for Headless mode

# Simple Scrape - grabs all constants in the order they are used
# Simulate Regional - simulates a region of code around an instruction of interest
# then extracts strings from modified memory
# Simulate All - simulates all the instructions then extracts strings from modified memory
#
# Simple scrape is brute force and reliable, but fails if strings are not created
# in order. It can also concatenate unrelated strings.
#
# Simulate regional is the most nuanced and accurate technique, but only simulates forward from a
# seen string so may miss some context.
#
# Simulate all is brute force and unreliable. It often misses strings if they are created
# using the same registers and has too much ambiguity to stitch together the correct string.
# This mode should be used on specific selections when a look-behind is desired and simulate
# regional is insufficient.
# In the majority of cases, Simple Scrape or Simulate Regional are superior to Simulate All.

# Select True/False for whether to run each analysis technique
Analysis Enable Simple Scrape = False
Analysis Enable Simulate Regions = True
Analysis Enable Simulate All = False

# Select the minimum length of string to report
Config Minimum String Length (discard shorter strings) = 3
# Select the number of instructions to look ahead from an instruction of interest for
# further instructions building the same string.
# The larger the lookahead the more likely unrelated strings are to interfere with each other
# The smaller the lookahead the more likely related string components are to be reported separately
Config Lookahead (no. instructions between string components) = 3
# Select the minimum number of characters accessed by a single instruction to make it
# an instruction of interest
Config Minimum length of interest (discard strings moved in smaller blocks) = 2
# Select True/False for whether the string components should be built into one long string
# in the opposite order to how they are stored in memory
Config Reverse the order of string components? = False
# Select level of filtering for strings that may actually be addresses
# none - Don't filter out any addresses
# some - Filter out some addresses based on file location
# all - Aggressively filter addresses based on OperandType
Config Address Filtering: = all

# Select True/False for whether to output the results in each way
Output Print to console = True
Output Add pre-comment = False
Output Add to defined strings (Requires Exclusive Checkout) = False

ghidra_scripts/StackStringExplorer.py

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
"""
Finds stack strings and displays according to user preferences
"""

# Finds stack strings and adds to defined strings
# @author
# @category Strings

import os
import sys

# Add the helper scripts directory to the system path to allow importing library code
sys.path.insert(
    0,
    os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
        "stackstring_helper_scripts",
    ),
)
# pylint: disable=wrong-import-position
from technique import Technique
from simulator import Simulator
from region_simulator import RegionSimulator
from scraper import Scraper
from stack_strings_enums import (
    AnalysisTechnique,
    Domain,
)
from specific_utilities import check_parameters
from parameters import Parameters
from io_utilities import (
    display,
    get_preferences_gui,
    get_preferences_headless,
)

try:
    # Typing information for VSCode - Ghidra will not load this section
    # Requires ghidra_stubs from https://github.com/VDOO-Connected-Trust/ghidra-pyi-generator
    import typing

    if typing.TYPE_CHECKING:
        from typing import List
        from ghidra.ghidra_builtins import (
            currentProgram,
            currentSelection,
            currentAddress,
            monitor,
            getFunctionContaining,
            getInstructionAt,
            isRunningHeadless,
        )

    # pylint: disable=pointless-statement
    currentProgram, currentSelection, currentAddress, monitor
    getFunctionContaining, getInstructionAt, isRunningHeadless,
# pylint: disable=bare-except
except:
    pass


def stack_strings(params):
    # type: (Parameters) -> None
    """
    Detects stack strings within a program

    :param params: A Parameters object encapsulating the inputs for the program
    """

    # identified strings
    strings = []
    simple_scrape = AnalysisTechnique.SIMPLE_SCRAPE_GUI in params.analysis_techniques
    simulate_regional = (
        AnalysisTechnique.SIMULATE_REGIONAL_GUI in params.analysis_techniques
    )
    simulate_all = AnalysisTechnique.SIMULATE_ALL_GUI in params.analysis_techniques

    # Collect instructions
    instruction_sets = []

    # Current Selection
    if params.domain == Domain.CURRENT_SELECTION:
        if currentSelection is None:
            raise ValueError("No Selection Found")

        instruction_set = []
        address_iterator = currentSelection.getAddresses(True)

        for address in address_iterator:
            inst = getInstructionAt(address)
            if inst is not None and inst not in instruction_set:
                instruction_set.append(inst)

        instruction_sets.append(instruction_set)

    # Current Function
    elif params.domain == Domain.CURRENT_FUNCTION:
        func = getFunctionContaining(currentAddress)
        if func is None:
            raise ValueError("No Function Found")

        func_body = func.getBody()
        listing = currentProgram.getListing()
        inst_iterator = listing.getInstructions(func_body, True)

        instruction_sets.append(inst_iterator)

    # All functions
    elif params.domain == Domain.ALL_FUNCTIONS:
        # get each function
        func_iterator = currentProgram.getFunctionManager().getFunctionsNoStubs(True)
        for func in func_iterator:

            func_body = func.getBody()
            listing = currentProgram.getListing()
            inst_iterator = listing.getInstructions(func_body, True)

            instruction_sets.append(inst_iterator)

    # set up analysis
    techniques = []  # type: List[Technique]

    if simple_scrape:
        scraper = Scraper(
            params.look_ahead,
            params.min_length,
            params.len_of_interest,
            params.address_filtering_intensity,
        )
        techniques.append(scraper)
    if simulate_regional:
        region_simulator = RegionSimulator(
            params.look_ahead,
            params.min_length,
            params.len_of_interest,
            params.reverse,
            params.address_filtering_intensity,
        )
        techniques.append(region_simulator)
    if simulate_all:
        all_simulator = Simulator(
            params.reverse, params.min_length, params.address_filtering_intensity
        )
        techniques.append(all_simulator)

    # set up progress tracking
    monitor.initialize(len(instruction_sets))
    monitor.setMessage("Scanning for strings...")

    # Analyse instructions
    for instruction_set in instruction_sets:

        # get each pcode instruction in this set
        for inst in instruction_set:
            op_iterator = inst.getPcode()
            for op in op_iterator:
                # run analysis
                for technique in techniques:
                    technique.run(inst, op)

        # end analysis and decode strings for this function
        for technique in techniques:
            technique.end_function()
            strings.extend(technique.get_strings())
            technique.reset()

        monitor.incrementProgress()

    # output the strings, keeping only the first occurrence of each
    removed_duplicates = []
    for string in strings:
        if string not in removed_duplicates:
            removed_duplicates.append(string)

    display(removed_duplicates, params.output_options)


if __name__ == "__main__":
    if isRunningHeadless():
        parameters = get_preferences_headless()
    else:
        parameters = get_preferences_gui()
    check_parameters(parameters)
    stack_strings(parameters)

images/ManageScriptDirectories.png

2.65 KB

139 KB

images/NonStackExample.png

40.1 KB

images/ScriptManager.png

7.6 KB

images/ScriptManagerIcon.png

1.44 KB

images/StackStringExample.png

49.4 KB