DO NOT MERGE - for comparison purposes #2

kyle-pena-kuzco · 2025-03-29T20:51:39Z

Motivation

Modifications

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

kyle-pena-kuzco · 2025-04-09T04:12:37Z

python/sglang/srt/layers/logits_processor.py

+        if logits_metadata.toploc_verification:
+            toploc_verification_hidden_states_to_store = (
+                pruned_states[sample_indices] if sample_indices else pruned_states
+            )


The logits processor works at the batch level: the first dimension of its tensors holds every inference in the current batch, concatenated together. Each inference in a batch is called a sequence ("seq").

Thus, if seq 1 has k tokens and seq 2 has m tokens, pruned_states holds the hidden states at indices (k-1) and (k+m-1), which are the last tokens of seq 1 and seq 2.

Some batches contain only one sequence, but not all.

pruned_states is the result of slicing out the hidden state of the last token in each sequence. There is also sample_indices, but that relates to more exotic usages of sglang that I don't currently think are relevant.
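
To make the index arithmetic concrete, here is a minimal sketch in plain Python (no sglang or PyTorch). The helper name last_token_indices and the string stand-ins for hidden-state rows are assumptions made for illustration; they are not taken from the PR.

```python
from itertools import accumulate


def last_token_indices(seq_lens):
    """Flat index of the last token of each sequence in a batch whose
    sequences are concatenated along the first dimension.

    Hypothetical helper for illustration; assumes every length is >= 1.
    """
    # The running total is the exclusive end offset of each sequence,
    # so subtracting 1 gives the index of its final token.
    return [end - 1 for end in accumulate(seq_lens)]


# Two sequences with k=3 and m=4 tokens: last tokens sit at k-1 and k+m-1.
k, m = 3, 4
flat_hidden = [f"h{i}" for i in range(k + m)]  # stand-in for hidden-state rows
idx = last_token_indices([k, m])
pruned = [flat_hidden[i] for i in idx]

print(idx)     # [2, 6]
print(pruned)  # ['h2', 'h6']
```

With a single sequence of length k the result is just [k-1], which matches the one-sequence case mentioned above.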

kyle-pena-kuzco · 2025-04-09T04:22:47Z

python/sglang/srt/managers/io_struct.py

+    origin_input_ids: Optional[List[List[int]]] = None
+    # Output token ids (for return_output_ids=True)
+    output_token_ids: Optional[List[List[int]]] = None
+


GenerateReqInput represents either a single request or a batch, depending on the context it's used in. I'm following the same pattern that is used for all the other fields.

kyle-pena-kuzco · 2025-04-09T04:23:52Z

python/sglang/srt/managers/schedule_batch.py

+
+        # Ensure CAPTURE_HIDDEN_MODE is *at least* LAST if toploc verification is enabled
+        if self.toploc_verification and capture_hidden_mode == CaptureHiddenMode.NULL:
+            capture_hidden_mode = CaptureHiddenMode.LAST


CaptureHiddenMode has to be at least LAST for toploc verification to work; otherwise we can't capture the last layer's activations to generate or validate a fingerprint. CaptureHiddenMode signals to PyTorch to retain these values so they can be cloned to the CPU.
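
The promotion rule can be sketched in isolation as follows. The enum below is a local stand-in, not sglang's actual class: the FULL member, the numeric values, and the function name effective_capture_mode are assumptions for this example.

```python
from enum import IntEnum


class CaptureHiddenMode(IntEnum):
    # Local stand-in for illustration only. The numeric values and the
    # FULL member are assumptions, not copied from sglang.
    NULL = 0
    LAST = 1
    FULL = 2


def effective_capture_mode(requested, toploc_verification):
    """Promote NULL to LAST when toploc verification is enabled.

    Any mode that already captures hidden states is left unchanged.
    """
    if toploc_verification and requested == CaptureHiddenMode.NULL:
        return CaptureHiddenMode.LAST
    return requested


print(effective_capture_mode(CaptureHiddenMode.NULL, True).name)   # LAST
print(effective_capture_mode(CaptureHiddenMode.NULL, False).name)  # NULL
print(effective_capture_mode(CaptureHiddenMode.FULL, True).name)   # FULL
```

Only the NULL case is rewritten, so a caller that already asked for more than the last token keeps what it requested.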

kyle-pena-kuzco · 2025-04-09T04:26:40Z

python/sglang/srt/managers/scheduler_output_processor_mixin.py

+                                    toploc_verification_hidden_state,
+                                    req.toploc_verification_fingerprint_to_validate,
+                                )
+                            )


You'll see a very similar block of code relating to hidden_states here as well. The difference is that I'm invoking the fingerprint generation and/or fingerprint verification methods, depending on what is being requested.
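
A rough sketch of that dispatch is below. Everything in it is hypothetical: toy_fingerprint is a deliberately simplified stand-in (indices of the largest-magnitude activations) and is not the real TopLoc fingerprint, and process_fingerprint is not a function from this PR. For simplicity it does either generation or validation, never both.

```python
def toy_fingerprint(hidden_state, top_k=2):
    """Toy stand-in for a fingerprint: the sorted indices of the top_k
    largest-magnitude activations. NOT the real TopLoc algorithm."""
    order = sorted(
        range(len(hidden_state)),
        key=lambda i: abs(hidden_state[i]),
        reverse=True,
    )
    return sorted(order[:top_k])


def process_fingerprint(hidden_state, fingerprint_to_validate=None, top_k=2):
    """Generate a fingerprint, or validate one if the request supplied it."""
    fingerprint = toy_fingerprint(hidden_state, top_k)
    if fingerprint_to_validate is None:
        # Nothing to check against: return the freshly generated fingerprint.
        return {"fingerprint": fingerprint}
    # A fingerprint was supplied: report whether it matches ours.
    return {"valid": fingerprint == fingerprint_to_validate}


hidden = [0.1, -3.0, 2.5, 0.2]
print(process_fingerprint(hidden))          # {'fingerprint': [1, 2]}
print(process_fingerprint(hidden, [1, 2]))  # {'valid': True}
print(process_fingerprint(hidden, [0, 3]))  # {'valid': False}
```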

kyle-pena-kuzco · 2025-04-09T04:27:49Z

python/sglang/srt/managers/scheduler_output_processor_mixin.py


-                    if req.grammar is not None:
-                        req.grammar.accept_token(next_token_id)
-                        req.grammar.finished = req.finished()


I think this shouldn't have been dropped? This may have been an oversight or a side effect of syncing with the main branch.

kyle-pena-kuzco · 2025-04-09T04:28:24Z

python/sglang/srt/managers/tokenizer_manager.py

+                    logger.error(
+                        f"Error processing toploc verification fingerprint validation results: {e}"
+                    )
+


This looks like a lot, but it's all just what's required to pass things through in the API layer.

kyle-pena-kuzco · 2025-04-09T04:29:13Z

python/sglang/srt/server_args.py

+            default=128,
+            help="Top-k for TopLoc verification",
+        )
+


This is where we specify the flags.
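
For readers unfamiliar with how such flags are declared, here is a self-contained argparse sketch. Only the default of 128 and the help text come from the diff above; the flag names --toploc-verification and --toploc-verification-topk are hypothetical, since the real names are not visible in this hunk.

```python
import argparse


def build_parser():
    """Build a parser with two illustrative flags.

    Both flag names are assumptions made for this sketch; only
    default=128 and the top-k help string appear in the diff.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--toploc-verification",  # hypothetical name
        action="store_true",
        help="Enable TopLoc verification",
    )
    parser.add_argument(
        "--toploc-verification-topk",  # hypothetical name
        type=int,
        default=128,
        help="Top-k for TopLoc verification",
    )
    return parser


args = build_parser().parse_args(["--toploc-verification"])
print(args.toploc_verification, args.toploc_verification_topk)  # True 128
```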

kyle-pena-kuzco · 2025-04-09T04:34:12Z

python/sglang/srt/managers/scheduler_output_processor_mixin.py

+                    # No need to generate a fingerprint until the last decode step of the sequence
+                    req.toploc_verification_hidden_states.append(None)
+                    req.toploc_verification_fingerprints.append(None)
+


process_batch_result_decode is where post-processing of a "decode" happens. A "decode" is the production of a single new token. We check req.finished() so that the toploc work only happens on the last token.
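
The "only on the last token" behaviour can be sketched like this. The Req class, its fields, and process_decode_step are hypothetical simplifications, and the string "fp(...)" stands in for a real fingerprint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Req:
    """Hypothetical, heavily simplified request object."""
    max_new_tokens: int
    output_ids: List[int] = field(default_factory=list)
    fingerprints: List[Optional[str]] = field(default_factory=list)

    def finished(self) -> bool:
        # Assumption: a request finishes once it has produced enough tokens.
        return len(self.output_ids) >= self.max_new_tokens


def process_decode_step(req, next_token_id, hidden_state):
    """Post-process one decode step (one new token) for a request."""
    req.output_ids.append(next_token_id)
    if req.finished():
        # Last decode step: this is the only place a fingerprint is made.
        req.fingerprints.append(f"fp({hidden_state})")
    else:
        # Keep the list aligned with output_ids, but skip the work.
        req.fingerprints.append(None)


req = Req(max_new_tokens=3)
for token_id, hidden in [(10, "h0"), (11, "h1"), (12, "h2")]:
    process_decode_step(req, token_id, hidden)

print(req.fingerprints)  # [None, None, 'fp(h2)']
```

Appending None for intermediate steps keeps one entry per generated token, which mirrors the placeholder appends in the diff above.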

Kyle Pena and others added 30 commits March 27, 2025 18:55

polished up writing tensors on last hidden state. about to push to 3090

e35bfd2

added some helpful scripts

e0bde7f

some extra scripts

ec8d44e

script enhancements

e5a7e38

checkpoint: fingerprint can be generated by running an inference and then doing bash toploc-scripts/analyze_activations.py

5d1c520

got the proofs probably in the logitsoutput processor. the threading lock may be unnecessary. will revisit.

c833e7e

checkpoint before i switch to piggybacking on eagle speculative decoding's implementation of capturing hidden states

8a6460d

temporary commit before opening draft PR for comparison purposes

8dec243

stripped out some of my first pass hidden state activation code

4fe0579

stripped out some more stuff

fbb7c48

checkpoint before finishing wiring up returning proofs if --toploc-fingerprint

dce61ab

checkpoint just prior to adding a lot of logging for verification toploc

0097130

backed out of some weird changes that claude made

95c81dd

more weird changes undone

cc6cb05

checkpiont before simplifying flag setting

32f82da

proofs are generating, they just aren't being included on the response unfortunately

dde6093

checkpoint - can't seem to get proofs to get received and transmission, even though they are being generated

499d745

verification proofs are being included with the response, although for all the tokens instead of just the last one. will investigate / refine later.

49091f2

checkpoint on working the verification_proof_to_validate through the request pipeline

0f76945

fixed a typing issue with list vs str for verification_proof_to_validate

af11ec3

checkpoint: got the verification proof to validate all the way into the ModelWorkerBatch

f290f4c

got verification proof to validate into the ForwardBatch of the model runner .forward method

456b485

got verification to execute (not yet returned)

b45b6a6

got the verification results appearing in the response

34c8a87

implemented input_token_ids in response if requested. implementation is messy and will clean up shortly.

0f4a0cc

just got the output ids to go out with the response.

22b2a95

added scripts and prelim results with spoofing

e13288c

made proof generation only happen on prefill and last token

d0a6a09

added a nice verificatoin readme

7aed417

updated backgroudn color for diagrams

088100c

added back example script

eb62074

kyle-pena-kuzco added 21 commits April 9, 2025 04:42

updated readme a little

616e90e

added a quick note

1899952

a few more notes on phases

5917383

added some notes on my mindset

983968c

small update to words

a40e33d

last small change

ab040fc

more words

a52f4e4

wrote some test scripts, will use these to capture stats

977f7f1

wrote a bunch of test scripts for toploc and some for replication as well

21617d0

updated fingerprint batch size to 100

1e02b57

added prefill attack test script

b823804

added a bunch of scripts, re-worked replication testing flow

053aaee

script updates

1ed640b

re-organized the toploc-scripts folder.

e23fb49

changes to scripts. removed file that shouldn't be there

7bb8947

lots of tests for various verification methods

75ad272

updates to NLL based experiments. getting close i think

5164baf

added top-k post-hoc renormalization.

4ba2a34

pointed data collection scripts at a clean branch. added temperature as another setting to test

4701e20

removed results from repo i mistakenly committed

06859b5

changed gitignore

0c34902
