
Conversation

@PhilippTakacs
Contributor

So you don't need to manually clear the TB after changing code.

see #2258

@gerph
Contributor

gerph commented Dec 4, 2025

Presumably, this will make the calls to mem_write very costly as they need to search all the TBs for the possible blocks to be flushed.

Consider a data-processing loop which reads bytes from memory and writes back a transformed value, stopping when a terminal condition is met, e.g.:

# contrived example using the Python bindings; uc is an initialised Uc instance
while True:
    byte = uc.mem_read(address, 1)[0]
    if byte == 255:
        break
    uc.mem_write(address, bytes([byte ^ 255]))
    address += 1

(yes, that's a contrived example, but if you think about string operations or a compression operation, you might have something similar)

Such an example is now slowed down by a redundant TB flush on every mem_write call.

It's also potentially wrong, because on some architectures you must perform operations to flush the cache in order to allow the newly written instruction to be executed. For example, on ARM if you had executed that instruction previously (or, as in #2258, it had been pulled in as part of the cache lines for existing code) it would (depending on the translation tables) be cached in the instruction cache. Unless you explicitly flushed that cache, the execution of that address would execute the old instruction. That's what the cache is there for. (sorry, I know ARM best, but possibly other architectures have this behaviour - the example was for x86, and if that's normal for x86 then it should be protected as such in this change).

If you want to have the behaviour of the CPU, the existing behaviour (where you must explicitly flush the TB if it has been executed and you changed the address) is more correct. It depends on what you are trying to do - if you just want to run code, then it might help to have the cached addresses automatically flushed. But if you're trying to see what the CPU would execute then you do not want any automatic behaviour - convenience behaviour like this would break any code that tried to show the behaviour of a race condition on a code block store. Would that matter? If you were trying to simulate a bad actor exploiting odd behaviour then maybe. If you were trying to test whether your code would work on a real system then certainly it would matter, because you're not doing what a real system does.

I would suggest that the automatic TB flush be protected by a run-time configuration option to allow the user to decide which behaviour they prefer, and that the option default to disabled so that performance and behavioural regressions are not seen by existing code - you should have to opt in to something that could change the behaviour of your application.
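
To make it concrete, here is a minimal sketch of the kind of opt-in I mean, assuming a new control is added alongside the existing uc_ctl options (UC_CTL_UC_TB_FLUSH_ON_VMEM_WRITE is a hypothetical name, and the default would be off):

    #include <unicorn/unicorn.h>

    /* Hypothetical opt-in: users who want the automatic TB flush on
     * uc_vmem_write ask for it explicitly; everyone else keeps the
     * existing behaviour. */
    static uc_err enable_vmem_write_tb_flush(uc_engine *uc)
    {
        return uc_ctl(uc, UC_CTL_WRITE(UC_CTL_UC_TB_FLUSH_ON_VMEM_WRITE, 1), 1);
    }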

@PhilippTakacs
Contributor Author

Presumably, this will make the calls to mem_write very costly as they need to search all the TBs for the possible blocks to be flushed

First of all, the uc_mem_write function is not changed; only the uc_vmem_write function is. Even when you want to use the MMU, you can still use uc_vmem_translate in combination with uc_mem_write.
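
Roughly like this (a sketch only, assuming uc_vmem_translate takes a uc_prot and returns the physical address through an out parameter, and that the write doesn't cross a page boundary):

    #include <unicorn/unicorn.h>

    /* Write through the guest MMU without uc_vmem_write: translate the
     * virtual address first, then write to the physical address. */
    static uc_err write_via_translate(uc_engine *uc, uint64_t vaddr,
                                      const void *data, size_t len)
    {
        uint64_t paddr;
        uc_err err = uc_vmem_translate(uc, vaddr, UC_PROT_WRITE, &paddr);
        if (err != UC_ERR_OK) {
            return err;
        }
        /* assumes [vaddr, vaddr + len) does not cross a page boundary */
        return uc_mem_write(uc, paddr, data, len);
    }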

It's also potentially wrong, because on some architectures you must perform operations to flush the cache in order to allow the newly written instruction to be executed. For example, on ARM if you had executed that instruction previously (or, as in #2258, it had been pulled in as part of the cache lines for existing code) it would (depending on the translation tables) be cached in the instruction cache. Unless you explicitly flushed that cache, the execution of that address would execute the old instruction

I would argue that the uc_vmem_write API should behave as if the CPU itself executed code doing this write. So when your architecture doesn't flush the instruction cache for you, you still need to do that yourself. On an architecture which does clear the instruction cache automatically, it should just work. As far as I can see, this is exactly what happens with my patch.

To make it clear: I don't need this patch. I just remembered the change while making my other PR.

@PhilippTakacs
Contributor Author

As far as I can see, this is exactly what happens with my patch

Also, I can write some tests to check if my assumption is correct, but this needs some time.

@gerph
Contributor

gerph commented Dec 4, 2025

First of all, the uc_mem_write function is not changed; only the uc_vmem_write function is. Even when you want to use the MMU, you can still use uc_vmem_translate in combination with uc_mem_write.

Oh! I missed the v on that... ok, so that's a lot less impactful, as I suspect the number of users using the MMU is far smaller.

I would argue that the uc_vmem_write API should behave as if the CPU itself executed code doing this write. So when your architecture doesn't flush the instruction cache for you, you still need to do that yourself. On an architecture which does clear the instruction cache automatically, it should just work. As far as I can see, this is exactly what happens with my patch.

That's exactly what I was saying - I didn't see any code that omits the flush on ARM architectures, which means that it's not the same as what you would see if you executed the code on a real ARM. The line:

if (uc_ctl_remove_cache(uc, address, address + len) != UC_ERR_OK) {

has no conditions to stop it being flushed on ARM.
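
By "protected as such" I mean something along these lines (just a sketch, assuming the write path can see the engine's arch field and that x86 is the case where real hardware keeps instruction fetches coherent with data writes):

    /* Only flush automatically on architectures where real hardware keeps
     * instruction fetches coherent with data writes (e.g. x86); on ARM the
     * caller must still flush explicitly. */
    if (uc->arch == UC_ARCH_X86) {
        if (uc_ctl_remove_cache(uc, address, address + len) != UC_ERR_OK) {
            /* handle the error as the surrounding code already does */
        }
    }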

@PhilippTakacs
Contributor Author

PhilippTakacs commented Dec 5, 2025

Now I understand what your problem is. I think we are talking about different things. I want to remove entries from the TB cache, which is an extra cache in unicorn/qemu holding prebuilt execution blocks.

I assume that clearing the TB cache on ARM has no effect on ARM's ICache. So when you change the memory without removing the cached TB, the cached TB is emulated. When removing the cached TB (my patch), the instructions are fetched from the ICache, translated to a new TB and executed. So if you want behaviour similar to #2258, you still need to clear the ICache.

This is only my assumption; I hadn't thought about the ICache before you mentioned it.
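
For reference, the explicit flush we both keep referring to is the existing UC_CTL_TB_REMOVE_CACHE control; a minimal sketch of the current workflow after patching code (names and range are only illustrative):

    #include <unicorn/unicorn.h>

    /* Current behaviour: after overwriting instructions you drop any cached
     * TBs covering the range yourself, so the next run retranslates them. */
    static uc_err patch_code(uc_engine *uc, uint64_t address,
                             const void *code, size_t len)
    {
        uc_err err = uc_mem_write(uc, address, code, len);
        if (err != UC_ERR_OK) {
            return err;
        }
        return uc_ctl_remove_cache(uc, address, address + len);
    }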

@gerph
Contributor

gerph commented Dec 5, 2025

I assume that clearing the TB cache on ARM has no effect on ARM's ICache.

I'm treating the TBs as the instruction cache - they're essentially cached blocks which can be run if you start execution. If you modify the memory, the cached block will be used rather than the modified form, so they're equivalent to the ICache in that sense. They don't have the same effect of having cache lines populated, and you may get a lot more cached into a TB than you would have with real hardware, but they're very close.
