Unlike some other architectures, RISC-V does not expose the current
privilege mode in any architecturally-defined register. That is intentional
to make it easier to implement virtualization in software, but a Unicorn
caller operates outside of the emulated hart and so it can and should be
able to observe and change the current privilege mode in order to properly
emulate certain behaviors of a real CPU.
The current privilege level is therefore now exposed as a new
pseudo-register using the name "priv", which matches the name of the
virtual register used by RISC-V's debug extension to allow the debugger
to read and change the privilege mode while the hart is halted. Unicorn's
use of it is conceptually similar to a debugger.
The bit encoding of this register is the same as specified in RISC-V Debug
Specification v1.0-rc3 Section 4.10.1. It's defined as a "virtual"
register exposing a subset of fields from the dcsr register, although here
it's implemented directly inside the Unicorn code because QEMU doesn't
currently have explicit support for the CSRs from the debug specification.
If it supports "dcsr" in a future release then this implementation could
change to wrap reading and writing that CSR and then projecting the "prv"
and "v" bitfields into the correct locations for the virtual register.
* Remove global variable from aarch64 tcg target
This obviously breaks trying to run two unicorn instances at once on
aarch64. It appears a similar variable had already been moved to the
state struct for i386 tcg target.
* Reenable writing to jit region while calling tb_add_jump
On arm macs, every place that writes to jit code needs to have
tb_exec_unlock called first. This is already in most necessary places,
but not this one.
* Don't forget to call restore_jit_state in uc_context_restore
Every time UC_INIT is used, restore_jit_state must be used on the return
path, or occasional assertion failures will pop up on arm macs.
* Restore pc before calling into tlb fill hook
In my application it is important to have correct pc values available
from this hook.
* enable notdirty_write for snapshots when possible
Snapshots only happens when the priority of the memory region is smaller
then the snapshot_level. After a snapshot notdirty can be set.
* disable notdirty_write for self modifying code
When SMC access the memory region more then once the
tb must be rebuild multible times.
fixes#2029
* notdirty_write better hook check
Check all relevant memory hooks before enabling notdirty write.
This also checks if the memory hook is registered for the affected
region. So it is possible to use notdirty write and have some hooks
on different addresses.
* notdirty_write check for addr_write in snapshot case
* self modifying code clear recursive mem access
when self modifying code does unaligned memory accese sometimes
uc->size_recur_mem is changed but for notdirty write not changed back.
This causes mem_hooks to be missed. To fix this uc->size_recur_mem is
set to 0 before each cpu_exec() call.
Some structs, specically CPUARMState is 16-bytes aligned.
This causes segment fault because gcc tends to vectorize
the assignment of the struct with infamous movaps tricks.
Without this patch, we fail on manylinux with 2.17 glibc
in release mode in i686.
qemu_memalign will ensure the alignment across platforms.
* optimize ram block handling
Save the last element of the ram_list. This allows to
faster find where to add new elements when they are not
bigger then page size.
* save ram_list freed
this keeps the optimization for find_ram_offset() intact after snapshot
restore.
* cow only clear the tlb of affected pages
* update flatview when possible
Building each flatview new when the memory has changed is quite
expensive when many MemoryRegions are used. This is an issue when using
snapshots.
* update benchmark for new api
* save flatview in context
this avoids rebuilding the flatview when restore a context.
* init context flatview with zero
* address_space_dispatch_clear remove subpage with higher priority
* docutemnt the options for UC_CTL_CONTEXT_MODE
Specialy stress that with UC_CTL_CONTEXT_MEMORY it is not possible to
use the context with a different unicorn object.
Every store would always cause the tb_invalidate_phys_page_fast path to be invoked,
amounting to a 40x slowdown of stores compared to loads.
Change this code to only worry about TB invalidation for regions marked as
executable (i.e. emulated executable).
Even without uc_set_native_thunks, this change fixes most of the performance
issues seen with thunking to native calls.
Signed-off-by: Andrei Warkentin <andrei.warkentin@intel.com>
The wfi exception trigger behavior should take into account user mode,
hstatus.vtw, and the fact the an wfi might raise different types of
exceptions depending on various factors:
If supervisor mode is not present:
- an illegal instruction exception should be generated if user mode
executes and wfi instruction and mstatus.tw = 1.
If supervisor mode is present:
- when a wfi instruction is executed, an illegal exception should be triggered
if either the current mode is user or the mode is supervisor and mstatus.tw is
set.
Plus, if the hypervisor extensions are enabled:
- a virtual instruction exception should be raised when a wfi is executed from
virtual-user or virtual-supervisor and hstatus.vtw is set.
Signed-off-by: Jose Martins <josemartins90@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210420213656.85148-1-josemartins90@gmail.com
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This line appears to be trying to undo the effect of adding 4 to pc above,
but does so incorrectly and so ends up returning with next_pc earlier than
it was prior to decoding.
This causes the translator to malfunction because it does not expect
pc_next to decrease during decoding: this is effectively reporting that
the invalid construction has a negative size, which is impossible. The
decoder uses the increase in next_pc to decide the translation block size,
but converts it to uint16_t thereby causing a block containing _only_ an
invalid instruction to be treated as having size 65532 (reinterpreted -4)
and therefore the translation loop tries to find the next translation block
at 65532 bytes after the invalid instruction, which can cause a spurious
instruction access/page fault if the page containing that address is not
mapped as executable.
In practice we don't need to readjust the pc at all here because it is
correct to report that the invalid instruction is four bytes long. This
allows the translation loop to correctly find the next instruction, and
to avoid producing spurious TLB fills that might cause incorrect exceptions.
* Add a quick test helper macro to test_x86.c
* Add regression tests for bswap and rex prefixes
* Properly ignore REX prefixes when appropriate
* Fix bswap ax emulator detection