Decided to tackle hardware transactional memory because I was tired of seeing pathetic lock-contention numbers in our parallel workloads. The MESI protocol clearly wasn't cutting it: watching 42% of cycles go to waste on coherency traffic was physically painful.
Started with a quad-core RV64GC base, because who doesn't love the elegance of RISC-V's clean ISA? Extended it with custom HTM instructions (XBEGIN, XEND, XABORT: the usual suspects). But the real fun began when I had to make the cache coherency protocol transaction-aware.
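The post doesn't show how software drives XBEGIN/XEND/XABORT, so here is a minimal Python model of the usual best-effort HTM pattern. It assumes the hardware may fail any transaction, which means software needs a fallback lock. `TxStatus`, `run_elided`, and `try_htm` are hypothetical names. `try_htm` stands in for the real XBEGIN ... XEND sequence.

```python
import threading
from enum import Enum, auto
from typing import Callable


class TxStatus(Enum):
    COMMITTED = auto()
    CONFLICT = auto()   # another core touched our read/write set
    CAPACITY = auto()   # speculative state no longer fits in the cache
    EXPLICIT = auto()   # software executed XABORT


def run_elided(body: Callable[[], None],
               try_htm: Callable[[Callable[[], None]], TxStatus],
               fallback_lock: threading.Lock,
               max_retries: int = 3) -> str:
    """Run `body` as a hardware transaction, falling back to a lock.

    `try_htm` is a hypothetical stand-in for XBEGIN ... body ... XEND: it runs
    `body` speculatively and reports how the attempt ended. It is assumed to
    read `fallback_lock` inside the transaction and XABORT if the lock is held,
    so lock holders and transactions never run the critical section together.
    """
    for _ in range(max_retries):
        status = try_htm(body)
        if status is TxStatus.COMMITTED:
            return "htm"
        if status is TxStatus.CAPACITY:
            break  # retrying cannot shrink the footprint; go straight to the lock
    with fallback_lock:
        body()
    return "lock"


# Usage with scripted "hardware": the first attempt conflicts, the second commits.
counter = {"n": 0}


def increment() -> None:
    counter["n"] += 1


outcomes = iter([TxStatus.CONFLICT, TxStatus.COMMITTED])


def fake_htm(body: Callable[[], None]) -> TxStatus:
    status = next(outcomes)
    if status is TxStatus.COMMITTED:
        body()  # speculative effects only become visible on commit
    return status


print(run_elided(increment, fake_htm, threading.Lock()), counter["n"])  # htm 1
```

Treating capacity aborts as non-retryable is the design choice here. Conflicts are often transient, but a transaction that overflowed the cache will overflow it again.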
Figure: Microarchitectural implementation showing the integration of transactional logic with the core pipeline, featuring specialized commit/abort pathways and cache controller interaction mechanisms.
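A common way to build commit/abort pathways is to keep speculative writes in the L1 and tag each line with per-line transactional read/write bits. The sketch below assumes that approach. `CacheLine`, the `tr`/`tw` bits, and the helper functions are hypothetical and are not the actual TMESI encoding.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    state: str          # MESI state: "M", "E", "S" or "I"
    data: int
    tr: bool = False    # read by the running transaction
    tw: bool = False    # speculatively written by the running transaction


def tx_write(cache: dict, memory: dict, addr: int, value: int) -> None:
    """Speculative store; assumes exclusive ownership (E or M) was already obtained."""
    line = cache[addr]
    assert line.state in ("M", "E"), "upgrade to exclusive before writing"
    if line.state == "M" and not line.tw:
        # Dirty pre-transaction data would be lost on abort, so write it back first.
        memory[addr] = line.data
    line.data = value
    line.state = "M"
    line.tw = True


def commit(cache: dict) -> None:
    """Clearing the bits turns speculative data into ordinary modified data."""
    for line in cache.values():
        line.tr = line.tw = False


def abort(cache: dict) -> None:
    """Invalidate speculatively written lines so the next access refetches them."""
    for line in cache.values():
        if line.tw:
            line.state = "I"
        line.tr = line.tw = False


memory = {0x40: 1}
cache = {0x40: CacheLine("M", 5)}
tx_write(cache, memory, 0x40, 7)
abort(cache)
print(cache[0x40].state, memory[0x40])  # I 5
```

The write-back on the first speculative store to an M line is what makes abort cheap: invalidating the line is enough, because the last committed value is already safe one level down.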
The MESI-to-TMESI transition was... interesting. Watching the state diagram explode from 4 states to 11 was both terrifying and fascinating. Had to track transactional reads, speculative writes, and all the lovely corner cases that make cache coherency the nightmare it is.
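The core rule behind a transaction-aware protocol fits in one predicate on incoming coherence requests. This sketch assumes eager, line-granular detection with hypothetical `tr`/`tw` bits per line. It does not try to reproduce the 11-state diagram. Deciding which side aborts (requester-wins versus requester-stalls) is a separate policy and is left out.

```python
from enum import Enum


class Snoop(Enum):
    GETS = "read"    # remote core requests a shared copy
    GETX = "write"   # remote core requests exclusive ownership (invalidation)


def is_conflict(tr: bool, tw: bool, req: Snoop) -> bool:
    """Eager conflict check for a remote request hitting one of our lines."""
    if tw:
        # Speculative data must not leak, and a remote write would clobber it.
        return True
    if tr and req is Snoop.GETX:
        # A remote write to something we read breaks our read set.
        return True
    return False  # read-read sharing, or a line outside the transaction


for tr, tw in [(False, False), (True, False), (False, True), (True, True)]:
    print(tr, tw, [is_conflict(tr, tw, r) for r in Snoop])
# False False [False, False]
# True False [False, True]
# False True [True, True]
# True True [True, True]
```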
The first cache design was embarrassingly naive, and watching it thrash under transactional load was almost educational. Fixing it took several iterations of progressively less terrible designs:
Figure: Distribution of transaction aborts across categories, with false conflicts and capacity aborts representing significant optimization opportunities.
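False conflicts come from detecting conflicts per cache line rather than per byte. Here is a small model that assumes 64-byte lines by default and accesses that don't straddle a line boundary. `classify_conflict` is a hypothetical helper.

```python
def classify_conflict(a_addr: int, a_size: int,
                      b_addr: int, b_size: int,
                      line_size: int = 64) -> str:
    """Classify two accesses the way line-granular conflict detection sees them.

    Assumes each access lies within a single cache line.
    Returns "none" (different lines), "true" (overlapping bytes) or
    "false" (same line, disjoint bytes: the hardware still aborts).
    """
    if a_addr // line_size != b_addr // line_size:
        return "none"
    overlap = a_addr < b_addr + b_size and b_addr < a_addr + a_size
    return "true" if overlap else "false"


# Two 8-byte counters packed next to each other share a 64-byte line.
print(classify_conflict(0x1000, 8, 0x1008, 8))  # false
# Moving the second counter onto its own line removes the conflict.
print(classify_conflict(0x1000, 8, 0x1040, 8))  # none
```

Padding or aligning hot, independently updated fields to line boundaries is the usual software-side fix. It trades memory for fewer spurious aborts.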
Figure: Cache performance degradation analysis across varying cache sizes, showing optimal performance in the 16B-64B range for transactional workloads.
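Capacity aborts depend on cache geometry, not just total size. If speculatively written lines must stay resident, a transaction fails as soon as any single set runs out of ways. Here is a sketch under that assumption. `first_capacity_abort` is a hypothetical helper, and the geometry in the example is chosen for illustration only.

```python
def first_capacity_abort(write_addrs, num_sets: int, ways: int,
                         line_size: int = 64):
    """Return the index of the store that overflows a cache set, or None.

    Assumes speculatively written lines are pinned in a set-associative L1
    (evicting one would abort the transaction) and that only the write set
    counts toward capacity.
    """
    lines_per_set = {}
    for i, addr in enumerate(write_addrs):
        line = addr // line_size
        lines = lines_per_set.setdefault(line % num_sets, set())
        lines.add(line)
        if len(lines) > ways:
            return i
    return None


# Hypothetical 8 KiB, 2-way L1 with 64 B lines has 64 sets, so a 4 KiB stride
# maps every store to the same set.
stride = 64 * 64
print(first_capacity_abort([0, stride, 2 * stride], num_sets=64, ways=2))  # 2
print(first_capacity_abort([0, 64, 128], num_sets=64, ways=2))            # None
```

Only three stores abort the strided case, even though the cache could hold 128 lines. This is why associativity matters as much as raw size for transactional workloads.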
Figure: Near-linear throughput scaling from 1 to 4 cores, reaching 13.4M transactions/second with minimal contention overhead.
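To put a number on "near-linear", take scaling efficiency as E(n) = T(n) / (n T(1)), where T(n) is throughput on n cores. The 4-core figure alone implies an average of 3.35M transactions/second per core. Near-linear scaling therefore means single-core throughput was close to that value. T(1) isn't given, so it stays symbolic.

```latex
\begin{aligned}
E(n) &= \frac{T(n)}{n\,T(1)} \\
\frac{T(4)}{4} &= \frac{13.4 \times 10^{6}}{4} = 3.35 \times 10^{6}\ \text{transactions/s per core} \\
E(4) \approx 1 &\iff T(1) \approx 3.35 \times 10^{6}\ \text{transactions/s}
\end{aligned}
```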
Figure: Power consumption profile showing distinct patterns during the commit and conflict-resolution phases, with optimization opportunities in conflict handling.
Current performance metrics (running on an Artix-7, because FPGAs are life):
My favorite bug: a perfectly horrible race condition where a cache line would get evicted right as a transaction was committing, but only if the moon was in the right phase and the CPU temperature was precisely 47.3°C. Took two weeks and an unhealthy amount of coffee to track that one down.
But hey, at least it's faster than mutex locks. Sometimes. When the planets align.