This week covers the fundamentals of RTL design, synthesis, and gate-level simulation (GLS) using open-source tools like Icarus Verilog (iverilog), GTKWave, and Yosys with the Sky130 process design kit (PDK). The focus is on creating Verilog RTL designs, simulating them to verify functionality, and synthesizing them into gate-level netlists using standard cell libraries. Key topics include understanding timing libraries (.lib files), exploring hierarchical versus flat synthesis, and applying combinational and sequential optimizations to reduce area, power, and delay. The course also addresses common pitfalls like simulation-synthesis mismatches caused by improper coding practices, such as missing sensitivity lists or incorrect use of blocking and non-blocking assignments. Additionally, it introduces scalable coding techniques using for-loops and for-generate constructs for efficient hardware design. Practical labs reinforce these concepts by simulating designs, synthesizing them with Yosys, and verifying netlists through GLS, ensuring alignment between RTL and synthesized hardware behavior.
Main Project: https://github.com/RupamBora-ASIC/rupambora_RISC-V-SoC-Reference-Tapout-Program_VSD
Table of Contents
- Introduction
- Day 1 - Verilog RTL Design and Synthesis
- Day 2 - Timing Libraries and Coding Styles
- Day 3 - Combinational and Sequential Optimizations
- Day 4 - GLS and Simulation Mismatches
- Day 5 - Synthesis Optimizations
- Conclusion
Week 1 focuses on RTL synthesis and gate-level simulation (GLS) using open-source tools.
- Simulation: Icarus Verilog (iverilog) + GTKWave
- Synthesis: Yosys with SKY130 standard-cell library
- Write and simulate RTL with iverilog, analyze waveforms in GTKWave.
- Synthesize RTL into gate-level netlist mapped to SKY130 cells using Yosys.
- Reuse same testbench for RTL and GLS to confirm functional equivalence.
- Understand `.lib` timing libraries, PVT corners, and their effect on synthesis.
- Compare hierarchical vs flat synthesis flows.
- Study combinational and sequential optimizations (constant propagation, Boolean simplification, retiming, flop removal).
- Identify simulation-synthesis mismatches (SSM) caused by bad coding practices.
- Apply correct Verilog coding styles for `if`/`case`, blocking vs non-blocking, resets, and loops.
- Use `for` (inside `always`) for logic evaluation and `for-generate` (outside `always`) for hardware replication.
- Event-driven simulation: Simulator updates outputs only on input changes; VCD records transitions.
- Technology libraries (.lib): Define timing, power, and area for cells at different PVT corners.
- Optimizations: Automatic removal of dead logic, constant propagation, unused outputs; area and power reduction.
- Coding practices:
  - Use `always @(*)` for combinational logic.
  - Use non-blocking (`<=`) for sequential logic.
  - Add `default` in all `case` statements.
  - Avoid incomplete branches to prevent inferred latches.
- Synthesis vs Simulation: Netlist preserves RTL I/O, enabling direct testbench reuse. GLS validates real hardware behavior beyond RTL simulation.
- Pre vs post-synthesis waveforms (screenshots).
- Yosys synthesis statistics (area, cell count, optimization logs).
- Notes on observed optimizations and mismatches.
- A simulator checks if RTL design follows the specification.
- RTL design = Verilog code implementing the spec.
- In this course, we use iverilog (Icarus Verilog, open-source).
- Simulator output = VCD (Value Change Dump) file.
Item | Description | I/O |
---|---|---|
Design (DUT) | Verilog code that implements logic. Example: inverter, counter, ALU. | Has primary inputs and primary outputs |
Testbench (TB) | Applies inputs (stimulus) to DUT and checks outputs. Instantiates the DUT. | No primary inputs/outputs |
- Stimulus = input values applied to DUT.
- Observer = mechanism to watch outputs.
- Event-driven: output is evaluated only when inputs change.
- No input change → no output evaluation.
- Dumps results to VCD file.
- VCD contains only value changes, not constant values.
[ design.v ] + [ testbench.v ] → iverilog → [ simulation executable ]
executable (vvp) → generates → [ dump.vcd ]
dump.vcd → viewed with → GTKWave
# compile design and testbench
iverilog -o sim.out design.v tb.v
# run simulation
vvp sim.out
# open waveform in GTKWave
gtkwave dump.vcd
Design (inverter)
// design.v
module inverter(input a, output y);
assign y = ~a;
endmodule
Testbench
// tb.v
module tb;
reg a;
wire y;
inverter uut (.a(a), .y(y));
initial begin
$dumpfile("dump.vcd"); // VCD file
$dumpvars(0, tb); // dump all signals in tb
a = 0; #10;
a = 1; #10;
a = 0; #10;
$finish;
end
endmodule
- Run the simulation to generate `dump.vcd`.
- Open `dump.vcd` in GTKWave.
- Inspect waveforms:
  - Input toggles
  - Corresponding output response
Insert diagram of Design → Testbench → Simulator → VCD → GTKWave here
Insert GTKWave screenshot of inverter waveform here
- Simulator = tool to check spec compliance.
- DUT (design) has inputs/outputs.
- Testbench applies stimulus and observes outputs.
- iverilog generates VCD.
- GTKWave visualizes VCD as waveforms.
🔼 Back to Table of Contents
- Create a working directory:
mkdir VLSI && cd VLSI
- Clone VSD flow:
git clone <vsdflow-repo-link>
- Clone Sky130 RTL Design and Synthesis Workshop:
git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git
cd sky130RTLDesignAndSynthesisWorkshop
Directory contents:
mylib/
├── lib/            # Sky130 standard-cell .lib timing models
└── VerilogModel/   # Standard-cell Verilog models
verilog_files/      # All lab RTL designs and testbenches
Insert screenshot of directory structure here
- Move to `verilog_files/`:
cd verilog_files/
- Each design has a 1:1 matching testbench:
  - Example: `goodmux.v` → `tb_goodmux.v`
- Compile and simulate:
# compile design + testbench
iverilog -o sim_goodmux goodmux.v tb_goodmux.v
# run executable → generates dump.vcd
./sim_goodmux
Insert terminal screenshot of compile + run here
- Launch GTKWave:
gtkwave dump.vcd
- Steps inside GTKWave:
  - Expand tb → uut hierarchy.
  - Drag signals (I0, I1, sel, Y) to waveform pane.
  - Use Zoom Fit to view full simulation time.
  - Use zoom in/out for details.
  - Use forward/backward arrows to trace signal transitions.
Insert GTKWave screenshot with signals plotted here
Design:
module goodmux(input I0, input I1, input sel, output Y);
assign Y = sel ? I1 : I0;
endmodule
Testbench:
module tb_goodmux;
reg I0, I1, sel;
wire Y;
goodmux uut(.I0(I0), .I1(I1), .sel(sel), .Y(Y));
initial begin
$dumpfile("dump.vcd");
$dumpvars(0, tb_goodmux);
I0 = 0; I1 = 0; sel = 0;
#300 $finish;
end
always #75 sel = ~sel;
always #50 I0 = ~I0;
always #25 I1 = ~I1;
endmodule
Expected behavior:
- `sel=0` → Y follows I0
- `sel=1` → Y follows I1
Waveform observations:
- Output Y switches immediately when `sel` toggles.
- Matches mux functional specification.
Insert waveform screenshot highlighting sel, I0, I1, Y
- `verilog_files/` contains both designs and matching testbenches.
- iverilog compiles RTL + testbench; running the compiled simulation generates the VCD dump.
- GTKWave is used for waveform-based functional verification.
- Testbench applies stimulus, but does not check output automatically.
- Verification is done by observing signals in GTKWave.
🔼 Back to Table of Contents
- Yosys = open-source RTL synthesizer.
- Converts RTL (Verilog behavioral code) → gate-level netlist.
- Uses technology library (`.lib`) for standard cells.
- Netlist = same design, but expressed as instances of standard cells.
- RTL design file (`design.v`).
- Standard cell library (`.lib`).
# Load RTL
read_verilog design.v
# Load standard cells
read_liberty -lib sky130_fd_sc_hd__tt_025C_1v80.lib
# Run synthesis
synth -top top_module
# Write netlist
write_verilog netlist.v
- Gate-level `netlist.v` containing standard cells (e.g., NAND, NOR, MUX, DFF).
- Primary inputs/outputs remain the same between RTL and netlist.
- Same testbench can be reused.
- Flow:
  - Simulate netlist with `iverilog`.
  - Generate VCD.
  - View in GTKWave.
  - Compare waveforms with RTL simulation.
- If waveforms match → synthesis is correct.
iverilog netlist.v tb.v VerilogModel/*.v -o sim_gls
./sim_gls
gtkwave dump.vcd
- Converts behavioral Verilog (RTL) → logic gates.
- Steps:
  - Parse RTL.
  - Map operations (`assign`, `always`) → gates + flops.
  - Optimize with constraints.
  - Write gate-level Verilog (netlist).
Flow:
Specification → RTL (Verilog) → Yosys + .lib → Netlist (gates)
- `.lib` contains standard cells:
  - Combinational (AND, OR, NAND, NOR, INV, XOR).
  - Sequential (DFF, latch).
- Multiple flavors:
  - 2-input, 3-input, 4-input gates.
  - Slow / Medium / Fast versions.
- Universal gates (NAND/NOR) → sufficient for any Boolean logic.
- To avoid setup violation:
T_clk ≥ T_cqA + T_comb + T_setupB
- Max frequency:
F_clk = 1 / T_clk
- Use fast cells to reduce `T_comb`.
- To avoid hold violation:
T_cqA + T_comb ≥ T_holdB
- Sometimes need slow cells to add delay.
Cell Type | Delay | Area/Power | Use Case |
---|---|---|---|
Fast (wide transistor) | Low | High | Critical paths (setup) |
Slow (narrow transistor) | High | Low | Hold fixing, power saving |
- Too many fast cells → high area/power, possible hold violations.
- Too many slow cells → performance bottleneck.
- Constraints guide synthesizer to choose balance.
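The setup and hold inequalities above can be checked numerically. The following is a minimal Python sketch; the delay values are illustrative assumptions in picoseconds and are not taken from any Sky130 `.lib` file, and the function names are hypothetical.

```python
# Illustrative timing check for a flop A -> combinational logic -> flop B path.
# All delays are in picoseconds and are made-up example values.

def min_clock_period_ps(t_cq, t_comb, t_setup):
    """Smallest T_clk that satisfies T_clk >= T_cqA + T_comb + T_setupB."""
    return t_cq + t_comb + t_setup

def max_frequency_mhz(t_clk_ps):
    """F_clk = 1 / T_clk, converted from picoseconds to MHz."""
    return 1_000_000 / t_clk_ps

def hold_met(t_cq, t_comb, t_hold):
    """Hold check: T_cqA + T_comb >= T_holdB."""
    return t_cq + t_comb >= t_hold

t_clk = min_clock_period_ps(t_cq=500, t_comb=1200, t_setup=300)
print("min period (ps):", t_clk)
print("max frequency (MHz):", max_frequency_mhz(t_clk))
# A very short path can fail hold, which is where slower cells help.
print("hold met, short path:", hold_met(t_cq=50, t_comb=0, t_hold=100))
print("hold met, padded path:", hold_met(t_cq=50, t_comb=80, t_hold=100))
```

Reducing `t_comb` (faster cells) shortens the minimum period, while increasing it (slower cells) is what repairs the failing hold case.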
module top(input clk, rst, a, b, sel, output reg y);
wire d = sel ? b : a;
always @(posedge clk or posedge rst)
if (rst) y <= 0;
else y <= d;
endmodule
mux2x1 u1 (.A(a), .B(b), .S(sel), .Y(net1));
dff u2 (.D(net1), .CLK(clk), .RST(rst), .Q(y));
- `? :` → multiplexer cell.
- `always` → flip-flop cell.
- Connections made using `.lib` cells.
- Yosys maps RTL → netlist using `.lib`.
- Netlist I/O = RTL I/O → same testbench works.
- Verification = simulate netlist and compare with RTL.
- `.lib` has multiple cell flavors to balance:
  - Setup (performance).
  - Hold (reliability).
  - Power/area.
🔼 Back to Table of Contents
- Use Yosys to synthesize RTL (`good_mux.v`) into a gate-level netlist using the Sky130 standard cell library.
- Verify that the synthesized design matches expected MUX behavior.
- Generate and analyze the structural Verilog netlist.
After cloning the repo:
├── mylib/
│   └── lib/
│       └── sky130_fd_sc_hd__tt_025C_1v80.lib   # Sky130 library
└── verilog_files/
    └── good_mux.v                               # RTL design
Placeholder for directory screenshot
yosys
- Installed as part of VSD-Flow.
- Prompt changes to `yosys>` when active.
Placeholder for Yosys prompt screenshot
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
Library naming breakdown:
Segment | Meaning |
---|---|
`sky130` | 130nm node |
`fd` | foundry design |
`sc` | standard cell |
`hd` | high density |
`tt` | typical corner |
`025C` | 25 °C |
`1v80` | 1.8 V |
read_verilog good_mux.v
- Expected log:
Successfully finished Verilog frontend.
synth -top good_mux
- Ensures synthesis runs on the correct RTL module.
- Required when design has multiple modules.
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
Yosys log shows:
- Inputs: `i0`, `i1`, `sel`
- Output: `Y`
- Internal signals: none
- Cells inferred:
  - `sky130_fd_sc_hd__inv` (inverter)
  - `sky130_fd_sc_hd__nand2` (2-input NAND)
  - `sky130_fd_sc_hd__o21ai` (OR-AND-INVERT complex gate)
Placeholder for Yosys log snippet
show
Graphical schematic (via Graphviz):
- `i0` → inverter → O21AI input
- `i1` + `sel` → NAND → O21AI input
- `sel` also drives O21AI directly
- O21AI output → `Y`
Boolean equation realized:
Y = (i0 · sel̅) + (i1 · sel)
→ matches 2:1 MUX.
Placeholder for schematic screenshot
Default (verbose):
write_verilog goodmux_netlist.v
Clean version:
write_verilog -noattr goodmux_netlist.v
module good_mux(i0, i1, sel, Y);
input i0, i1, sel;
output Y;
wire net4, net5;
sky130_fd_sc_hd__inv_1 u1 (.A(i0), .Y(net4));
sky130_fd_sc_hd__nand2_1 u2 (.A(i1), .B(sel), .Y(net5));
sky130_fd_sc_hd__o21ai_1 u3 (.A1(sel), .A2(net4), .B1(net5), .Y(Y));
endmodule
Placeholder for netlist screenshot
- Primary inputs: `i0`, `i1`, `sel`
- Internal wires: e.g., `net4` (inverter output), `net5` (NAND output)
- Primary output: `Y`
- Top module: same as RTL (`good_mux`) → allows testbench reuse in GLS.
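As a quick sanity check on the netlist above, the three cells can be modeled as Boolean functions and compared against the 2:1 mux specification for every input combination. This is a Python sketch, not part of the Yosys flow; the cell models assume the usual O21AI function `Y = !((A1 | A2) & B1)`.

```python
from itertools import product

# Boolean models of the three cells used in the synthesized netlist.
def inv(a):
    return 1 - a

def nand2(a, b):
    return 1 - (a & b)

def o21ai(a1, a2, b1):
    # OR-AND-INVERT: Y = !((A1 | A2) & B1)
    return 1 - ((a1 | a2) & b1)

def netlist_y(i0, i1, sel):
    """Mirror of the netlist wiring: u1 (inv), u2 (nand2), u3 (o21ai)."""
    net4 = inv(i0)
    net5 = nand2(i1, sel)
    return o21ai(sel, net4, net5)

def mux_y(i0, i1, sel):
    """RTL intent: Y = sel ? i1 : i0."""
    return i1 if sel else i0

mismatches = [v for v in product((0, 1), repeat=3)
              if netlist_y(*v) != mux_y(*v)]
print("mismatching input vectors:", mismatches)
```

An empty mismatch list means the gate-level structure and the RTL expression agree on all eight input vectors, which is the same conclusion GLS reaches with waveforms.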
read_liberty → read_verilog → synth -top → abc -liberty → show → write_verilog
- RTL mapped to Sky130 standard cells.
- Complex gates (e.g., `O21AI`) used for optimization.
- Netlist preserves module name and I/O.
- Functional equivalence confirmed with Boolean simplification.
- Always use `-noattr` for clean netlists.
- Simulator basics: iverilog compiles RTL + testbench → generates `dump.vcd` for waveform analysis in GTKWave.
- Design vs Testbench: DUT has I/O; testbench provides stimulus and observes outputs, no I/O.
- Event-driven simulation: signals update only on input changes; VCD records only value transitions.
- Functional verification: waveforms confirm DUT behavior (e.g., inverter, mux).
- Yosys introduction: converts RTL to gate-level netlist using Sky130 `.lib`.
- Netlist properties: same I/O as RTL, structural Verilog with standard cells (INV, NAND, MUX, DFF).
- GLS flow: RTL testbench reused for netlist simulation; matching waveforms confirm synthesis correctness.
- Technology library role: provides timing, cell variants (fast/slow), and enables optimization for setup/hold.
🔼 Back to Table of Contents
- `.lib` (Liberty file) = timing, power, and area model of standard cells.
- Required for:
  - Synthesis (map RTL → gates, optimize area/power/timing).
  - Static Timing Analysis (STA).
  - Power estimation.
- Contains:
  - Cell types: INV, NAND, NOR, AND, OR, AOI/OAI, DFF, MUX.
  - Cell variants (different drive strengths).
  - Leakage power (all input combinations).
  - Propagation delay (rise/fall).
  - Area in µm².
  - Operating conditions (PVT: Process, Voltage, Temperature).
`.lib` files are generated by foundry characterization.
`.lib` files are characterized at specific PVT corners.
Parameter | Symbol | Example | Notes |
---|---|---|---|
Process | P | TT / SS / FF | Variation from fabrication (transistor dimensions, doping). |
Voltage | V | 1.8 V, 0.9 V | Affects delay and power. Higher V = faster, more power. |
Temperature | T | -40 °C to 125 °C | Higher T = slower transistors, lower T = faster. |
Example filename:
sky130_fd_sc_hd__tt_025C_1v80.lib
Breakdown:
- `sky130` → 130nm technology.
- `fd_sc_hd` → Foundry, standard cell, high density.
- `tt` → Typical process.
- `025C` → 25 °C.
- `1v80` → 1.80 V supply.
Designers must check circuits across all PVT corners to ensure reliable silicon.
Screenshot placeholder: `.lib` header with PVT info
At the top of `.lib` you will see:
library (sky130_fd_sc_hd__tt_025C_1v80) {
technology (cmos);
delay_model : "table_lookup";
time_unit : "1ns";
voltage_unit : "1V";
power_unit : "1nW";
current_unit : "1mA";
resistance_unit : "1kohm";
capacitance_unit : "1pf";
operating_conditions("tt_025C_1v80") {
process : 1.0;
voltage : 1.8;
temperature : 25.0;
}
}
- Delay model: lookup table (delay depends on input slew + output load).
- Units: defined once, used throughout all cell definitions.
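To make the `table_lookup` delay model more concrete, here is a small Python sketch of how a delay could be read from a two-dimensional table indexed by input slew and output load, using bilinear interpolation. The axis points and delay values are hypothetical, chosen only for illustration, and real tools may use a different interpolation scheme.

```python
from bisect import bisect_right

# Hypothetical characterization data (NOT from the Sky130 library).
SLEWS_NS = [0.01, 0.10, 1.00]        # input transition times
LOADS_PF = [0.001, 0.010, 0.100]     # output capacitive loads
DELAY_NS = [                         # DELAY_NS[slew_index][load_index]
    [0.05, 0.10, 0.40],
    [0.08, 0.13, 0.45],
    [0.30, 0.36, 0.70],
]

def _bracket(axis, x):
    """Return the lower index of the bracketing interval and the fraction inside it."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    frac = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, frac

def lookup_delay(slew, load, slews=SLEWS_NS, loads=LOADS_PF, table=DELAY_NS):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i, fs = _bracket(slews, slew)
    j, fl = _bracket(loads, load)
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return (d00 * (1 - fs) * (1 - fl) + d01 * (1 - fs) * fl
            + d10 * fs * (1 - fl) + d11 * fs * fl)

print(lookup_delay(0.10, 0.010))    # lands exactly on a table entry
print(lookup_delay(0.055, 0.001))   # halfway between two slew points
```

Points outside the table range are extrapolated linearly by this sketch; that is a simplification of the example, not a statement about any particular tool.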
Screenshot placeholder: Units section of `.lib`
Each cell starts with the keyword:
cell (cell_name) {
...
}
- Example cells:
  - `and2_0`, `and2_2`, `and2_4` (2-input AND gate, different strengths).
  - `a2111o_1` (complex AND-OR gate).
- Each cell entry contains:
  - Leakage power for all input combinations.
  - Area.
  - Pins: capacitance, direction, timing arcs.
  - Power and timing tables.
Snippet placeholder: `.lib` showing a cell definition
Sky130 provides multiple drive strengths for the same function.
Cell | Area (µm²) | Delay | Power | Notes |
---|---|---|---|---|
`and2_0` | 6.256 | Slowest | Lowest | Narrow transistors |
`and2_2` | 7.500 | Medium | Medium | Balanced |
`and2_4` | 8.750 | Fastest | Highest | Wider transistors |
- Wider transistors → faster, but more area + higher power.
- Smaller transistors → slower, but area/power efficient.
Screenshot placeholder: `.lib` entries of and2_0, and2_2, and2_4
Alongside `.lib`, there are behavioral Verilog models used for Gate-Level Simulation (GLS).
Example: sky130_fd_sc_hd__and2_0.v
module and2_0 (input A, input B, output X);
assign X = A & B;
endmodule
- `.lib` = timing, power, area.
- `.v` = logical functionality.
- Two variants:
  - Without power pins: for functional simulation.
  - With power pins (_pp.v): for power-aware simulation.
Screenshot placeholder: Verilog model for and2_0
- `.lib` files = technology view of cells (timing, power, area).
- PVT = process, voltage, temperature variations.
- Tools (Yosys, STA engines) use `.lib` for optimization and analysis.
- Cell flavors balance area vs speed vs power.
- Always run multi-corner STA before tapeout.
🔼 Back to Table of Contents
This lab explains hierarchical synthesis, flat synthesis, and submodule-level synthesis in Yosys.
We use the file `multiple_modules.v` for demonstration.
- sub_module1 → 2-input AND gate
- sub_module2 → 2-input OR gate
- Top module (`multiple_modules`):
  - Instantiates `U1` = sub_module1 (A, B → net1)
  - Instantiates `U2` = sub_module2 (net1, C → Y)
// Submodule 1: AND gate
module sub_module1(input A, B, output Y);
assign Y = A & B;
endmodule
// Submodule 2: OR gate
module sub_module2(input A, B, output Y);
assign Y = A | B;
endmodule
// Top module
module multiple_modules(input A, B, C, output Y);
wire net1;
sub_module1 U1 (.A(A), .B(B), .Y(net1));
sub_module2 U2 (.A(net1), .B(C), .Y(Y));
endmodule
yosys
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog multiple_modules.v
synth -top multiple_modules
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show multiple_modules
write_verilog -noattr multiple_modules_hier.v
- Hierarchy preserved: `multiple_modules` instantiates `sub_module1` and `sub_module2`.
- Netlist contains 3 modules: `multiple_modules`, `sub_module1`, `sub_module2`.
- `show` displays U1, U2 blocks instead of gate-level details.
- Expected → OR gate in sub_module2.
- Actual → NAND + 2 inverters (De Morgan's theorem).
Equation:
Y = A OR B
Y = ~(~A & ~B) → NAND + input inverters
- CMOS libraries favor NAND-based mapping because:
  - NAND → stacked NMOS (better mobility).
  - NOR/OR → stacked PMOS (slower, wider transistors, more area).
- Tools optimize for drive strength and area.
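The De Morgan mapping of `sub_module2` can be confirmed with a truth table. The Python sketch below models the top module with the OR stage built from a NAND and two inverters and compares it against `(A & B) | C`; the helper names are illustrative only.

```python
from itertools import product

def inv(a):
    return 1 - a

def nand2(a, b):
    return 1 - (a & b)

def sub_module1(a, b):
    """AND gate, as in the RTL."""
    return a & b

def sub_module2_mapped(a, b):
    """OR realized as NAND of inverted inputs: ~(~A & ~B)."""
    return nand2(inv(a), inv(b))

def multiple_modules(a, b, c):
    """Top level: U1 feeds U2 through net1."""
    net1 = sub_module1(a, b)
    return sub_module2_mapped(net1, c)

all_match = all(multiple_modules(a, b, c) == ((a & b) | c)
                for a, b, c in product((0, 1), repeat=3))
print("mapped netlist matches (A & B) | C:", all_match)
```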
[Insert screenshot: hierarchical netlist view]
flatten
show multiple_modules
write_verilog -noattr multiple_modules_flat.v
- Hierarchies removed.
- Only one module → `multiple_modules`.
- AND, NAND, and inverters instantiated directly.
- No U1, U2 instances.
[Insert screenshot: flat netlist view]
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog multiple_modules.v
synth -top sub_module1
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show sub_module1
write_verilog -noattr submodule1_netlist.v
- Only `sub_module1` synthesized (AND gate).
- Other parts of design ignored.
- Useful for:
  - Reuse → synthesize one multiplier, replicate for all instances.
  - Divide-and-Conquer → break large SoCs into smaller blocks for better optimization.
[Insert schematic: submodule-only netlist]
Mode | Output Netlist | Use Case |
---|---|---|
Hierarchical | Top + submodules preserved | Debug, modular IP, readability |
Flat | Single netlist, all gates | Final optimization, signoff |
Submodule-only | Only selected module mapped | Repeated IPs, large design blocks |
- `synth -top <mod>` → control synthesis scope.
- `flatten` → remove hierarchy, merge into single-level netlist.
- Tools map OR as NAND + inverters for CMOS efficiency.
- Netlists:
  - `netlist/multiple_modules_hier.v`
  - `netlist/multiple_modules_flat.v`
  - `netlist/submodule1_netlist.v`
- Screenshots:
  - `screenshots/hier_netlist.png`
  - `screenshots/flat_netlist.png`
  - `screenshots/submodule_netlist.png`
- Reports:
  - Area/cell stats (hier vs flat)
  - Note on De Morgan optimization
🔼 Back to Table of Contents
- Combinational logic has unequal path delays → causes glitches.
- Example: `Y = (A & B) | C`
  - If `A=0→1`, `B=0→1`, `C=1→0`, output may glitch low due to AND/OR delay mismatch.
- With multiple stages, glitches propagate → unstable outputs.
- D flip-flops (flops):
  - Capture data only at clock edge.
  - Block glitches between stages.
  - Provide stable outputs to next stage.
Key: Place flops between combinational blocks to ensure stable timing.
📸 [Insert glitch waveform screenshot here]
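The glitch on `Y = (A & B) | C` can be reproduced with a tiny discrete-time model. This Python sketch assumes simple transport delays (2 time units for the AND gate, 1 for the OR gate); these numbers are arbitrary and only illustrate the mechanism.

```python
def simulate(and_delay=2, or_delay=1, t_end=8):
    """Trace Y = (A & B) | C from t = -1 to t_end - 1.

    At t = 0, A and B switch 0 -> 1 and C switches 1 -> 0.
    Each gate output is its inputs' function delayed by the gate delay.
    """
    def a(t):
        return 1 if t >= 0 else 0

    def b(t):
        return 1 if t >= 0 else 0

    def c(t):
        return 0 if t >= 0 else 1

    def and_out(t):
        return a(t - and_delay) & b(t - and_delay)

    def y(t):
        return and_out(t - or_delay) | c(t - or_delay)

    return [y(t) for t in range(-1, t_end)]

print("unequal delays:", simulate())             # Y dips low before settling
print("ideal AND gate:", simulate(and_delay=0))  # no dip when paths are balanced
```

In the first trace the output settles back to 1 only after the slower AND path catches up, and a flop sampling `Y` at a clock edge after that point never sees the dip.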
- Flops must start from a known value.
- Initialization via reset or set.
- Types:
  - Asynchronous Reset/Set → works immediately, independent of clock.
  - Synchronous Reset/Set → effective only at clock edge.
- `Q` resets immediately when reset = 1.
- Sensitivity list: `clk` + `reset`.
always @(posedge clk or posedge arst) begin
if (arst)
Q <= 1'b0;
else
Q <= D;
end
- Reset high → `Q=0` immediately.
- Otherwise → `Q=D` on clock edge.
📸 [Insert async reset waveform]
- `Q` sets to 1 immediately when set = 1.
always @(posedge clk or posedge aset) begin
if (aset)
Q <= 1'b1;
else
Q <= D;
end
📸 [Insert async set waveform]
- Reset works only on clock edge.
- Sensitivity list: `clk`.
always @(posedge clk) begin
if (srst)
Q <= 1'b0;
else
Q <= D;
end
📸 [Insert sync reset waveform]
- Async reset has highest priority.
- Sync reset checked at clock edge if async not active.
always @(posedge clk or posedge arst) begin
if (arst)
Q <= 1'b0;
else if (srst)
Q <= 1'b0;
else
Q <= D;
end
📸 [Insert schematic of combined reset flop]
# Async reset
iverilog dff_async_reset.v tb_dff_async_reset.v
./a.out
gtkwave tb_dff_async_reset.vcd
# Async set
iverilog dff_async_set.v tb_dff_async_set.v
./a.out
gtkwave tb_dff_async_set.vcd
# Sync reset
iverilog dff_sync_reset.v tb_dff_sync_reset.v
./a.out
gtkwave tb_dff_sync_reset.vcd
- Async reset → Q drops immediately.
- Async set → Q rises immediately.
- Sync reset → Q changes only on clock edge.
📸 [Insert GTKWave screenshots for each]
read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_async_reset.v
synth -top dff_async_reset
dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
- Async Reset (active-high RTL vs active-low cell)
  - Inverter inserted between reset and flop.
  - Netlist: `reset -> inverter -> flop`.
- Async Set
  - Same as above if mismatch between RTL and library cell polarity.
- Sync Reset
  - No sync-reset flop in library.
  - Implemented as:
    - Simple DFF
    - MUX/AND gate on `D` input.
wire d_mux = (srst) ? 1'b0 : D;
sky130_fd_sc_hd__dfxtp_1 flop (.CLK(clk), .D(d_mux), .Q(Q));
- Combined Async + Sync Reset
  - Async reset pin connected to flop.
  - Sync reset multiplexed into D-path.
📸 [Insert Yosys schematics/netlist screenshots]
- `Y = A * 2` → `{A, 1'b0}`
- `Y = A * 4` → `{A, 2'b00}`
- `Y = A * 8` → `{A, 3'b000}`
- No cells inferred → wiring only.
📸 [Insert Yosys stat screenshot showing 0 cells]
- `Y = A * 9` → `(A << 3) + A`.
- For a 3-bit `A`, the two terms do not overlap, so this is optimized as `{A, A}` (concatenation).
- No hardware required, wiring only.
📸 [Insert Yosys schematic showing concatenation]
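Both constant-multiplier claims are easy to verify exhaustively for a 3-bit input. The Python sketch below treats concatenation as a shift-and-OR on integers; the helper names are illustrative.

```python
WIDTH = 3  # bit width of A, as in the 3-bit multiply-by-9 example

def concat(hi, lo, lo_width):
    """Integer model of Verilog {hi, lo} where lo is lo_width bits wide."""
    return (hi << lo_width) | lo

def mul_pow2_is_wiring(k):
    """A * 2**k equals {A, k zero bits} for every 3-bit A."""
    return all(a * (2 ** k) == concat(a, 0, k) for a in range(2 ** WIDTH))

def mul9_is_concat():
    """A * 9 = (A << 3) + A equals {A, A} for every 3-bit A."""
    return all(a * 9 == concat(a, a, WIDTH) for a in range(2 ** WIDTH))

print("x2, x4, x8 as wiring:", [mul_pow2_is_wiring(k) for k in (1, 2, 3)])
print("x9 as {A, A}:", mul9_is_concat())
```

The multiply-by-9 identity relies on `A` fitting in the three low bits left empty by the shift, so it would not hold as pure wiring for a wider `A`.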
Flop Type | Behavior | Clock Dep. | Library Mapping |
---|---|---|---|
Async Reset | Immediate reset to 0 | No | Flop + inverter (if needed) |
Async Set | Immediate set to 1 | No | Flop + inverter (if needed) |
Sync Reset | Reset at clk edge | Yes | Flop + MUX/AND on D |
Combined | Async > Sync > Data | Mixed | Flop + inverter + mux |
- Flops block glitches and stabilize outputs.
- Always initialize flops using reset/set.
- Yosys maps to available library cells; if cell missing, logic is synthesized using gates.
- Multipliers by constants (powers of 2, 9 for 3-bit input) optimized to wiring.
- Verilog code (src/)
- Testbenches (tb/)
- 📸 Simulation waveforms (waves/)
- 📸 Yosys schematics & stat reports (screenshots/)
- Observations (reports/)
- `.lib` files are the technology view of standard cells containing timing, power, and area data.
- PVT (Process, Voltage, Temperature) corners define the operating conditions of libraries.
- Hierarchical synthesis preserves module boundaries; flat synthesis merges all logic; submodule-level synthesis allows block-level optimization and reuse.
- CMOS libraries prefer NAND/NOR implementations → tools may map OR to NAND + inverters for efficiency.
- Flip-flops are required to block glitches and stabilize signals across stages.
- Async resets/sets act immediately; sync resets work only at clock edge; combined flops give priority to async reset.
- Yosys maps flops to available SKY130 cells; missing features (like sync reset) are synthesized using mux/logic around DFF.
- Multiplication by constants (powers of 2, 9 for 3-bit case) is optimized to pure wiring without extra gates.
🔼 Back to Table of Contents
In digital design, logic optimization is applied to make circuits efficient in terms of area, power, and timing.
Optimizations are grouped into two categories:
- Combinational logic optimizations
- Sequential logic optimizations
Synthesis tools (like Yosys, Synopsys DC) perform many of these automatically.
- Reduce transistor count β save silicon area.
- Reduce power consumption.
- Improve performance by reducing delay.
If an input is tied to a constant (`0` or `1`), the logic can collapse.
Original expression:
Y = (A · B + C)'
If `A = 0`:
Y = (0 · B + C)' = (C)' = C'
Result:
- Gate network reduces to a single inverter.
- AOI gate realization: 6 MOS transistors
- Optimized inverter: 2 MOS transistors
- Savings: ~67% fewer transistors, with a corresponding area and power reduction.
[Insert schematic screenshot before/after]
Large Boolean expressions can be reduced using K-map or Quine-McCluskey.
assign y = a ? (b ? c : (c ? a : 1'b0)) : ~c;
Simplification steps (tool or manual):
- Break down ternary muxes into equations.
- Simplify:
Y = A̅·C̅ + A·C
- Final optimized form:
Y = A ⊙ C (XNOR)
Result:
- Complicated nested mux logic → single XNOR gate.
- Shorter logic cone, faster evaluation, fewer cells.
[Insert Yosys schematic/log screenshot]
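The simplification of the nested ternary can be checked by brute force. This Python sketch evaluates the RTL expression for all eight input combinations and compares it with the sum-of-products form `A'·C' + A·C`, which is the XNOR of `A` and `C`.

```python
from itertools import product

def rtl_expr(a, b, c):
    """Python transcription of: a ? (b ? c : (c ? a : 1'b0)) : ~c"""
    if a:
        return c if b else (a if c else 0)
    return 1 - c

def sop_form(a, c):
    """A'.C' + A.C"""
    return ((1 - a) & (1 - c)) | (a & c)

def xnor(a, c):
    return 1 - (a ^ c)

equivalent = all(rtl_expr(a, b, c) == sop_form(a, c) == xnor(a, c)
                 for a, b, c in product((0, 1), repeat=3))
print("nested mux == A XNOR C for all inputs:", equivalent)
```

The check also shows that `b` has no influence on the result, which is why it disappears from the optimized logic.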
If the D input of a flop is tied to a constant (here 0):
- With reset → `Q = 0`
- Without reset → `Q = 0`
- Q is always constant.
Result:
- Flop + downstream logic can be removed and replaced by constant driver.
always @(posedge clk or posedge rst)
if (rst) Q <= 1'b0;
else Q <= 1'b0;
- Here, `Q` is always 0 → can be optimized away.
[Insert schematic/waveform]
always @(posedge clk or posedge set)
if (set) Q <= 1'b1;
else Q <= 1'b0;
- Behavior:
  - `set=1` → Q=1 immediately (async).
  - `set=0` → Q=0 only after next clock (sync).
- Q is not constant.
- This flop cannot be removed.
[Insert waveform showing async vs sync behavior]
These are used in industrial flows, but not covered in lab.
Technique | Description | Purpose |
---|---|---|
State Optimization | Remove unused FSM states. | Reduce state bits, smaller FSM. |
Logic Cloning | Duplicate registers near sinks. | Reduce routing delay in large chips. |
Retiming | Move logic across flops. | Balance delays, increase f_max. |
- Original: Path delays = 5ns and 2ns → max freq = 200 MHz.
- After retiming: 4ns and 3ns → max freq = 250 MHz.
- Uses slack to increase performance.
[Insert retiming diagram]
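The retiming numbers follow from the fact that the slowest stage sets the clock period. A short Python sketch of that arithmetic, ignoring clock-to-Q and setup time as the example above does:

```python
def max_freq_mhz(stage_delays_ns):
    """Clock frequency limited by the slowest register-to-register stage."""
    return 1000.0 / max(stage_delays_ns)

before = max_freq_mhz([5, 2])   # unbalanced stages
after = max_freq_mhz([4, 3])    # 1 ns of logic moved across the flop
print("before retiming (MHz):", before)
print("after retiming (MHz):", after)
```

The total logic delay is 7 ns in both cases; only its distribution across the flop boundary changes.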
- Flop A drives two distant flops B and C.
- Routing delay too large.
- Solution: Clone flop A near B and C.
- Reduces long interconnect delay.
[Insert cloning floorplan diagram]
- Constant propagation → replaces tied inputs with constants.
- Boolean simplification → collapses complex expressions.
- Sequential constant propagation → removes useless flops.
- Async set/reset cases often prevent optimization.
- Advanced techniques (state optimization, retiming, cloning) improve performance but need physical context.
🔼 Back to Table of Contents
- Learn how Yosys simplifies combinational RTL using:
  - Constant propagation
  - Boolean simplification
  - Dead logic removal (`opt_clean -purge`)
- Verify that mux-based RTL reduces to minimal AND/OR gates.
- `opt_check.v`
- `opt_check2.v`
- `opt_check3.v`
- `opt_check4.v` (exercise)
- `multiple_modules_opt.v` (exercise, hierarchical)
Basic flow used for all files:
yosys
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog <design>.v
synth -top <module_name>
opt_clean -purge   # removes unused cells and wires left after constant propagation
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
write_verilog -noattr <module_name>_opt.v
Notes:
- Always run `opt_clean -purge` after synthesis.
- Use `flatten` before optimization when the design has multiple modules.
RTL:
assign y = a ? b : 1'b0;
Simplification:
Y = A·B + A̅·0 = A·B
Expected: 2-input AND gate.
Result: sky130_fd_sc_hd__and2_0
Schematic placeholder: AND gate
RTL:
assign y = a ? 1'b1 : b;
Simplification:
Y = A·1 + A̅·B = A + B
Identity Used: Absorption Law (`A + A'B = A + B`).
Expected: OR gate.
Mapped Result: NAND + 2 inverters (De Morgan's theorem).
Reason: CMOS prefers NAND/NOR cells (fewer stacked PMOS).
Schematic placeholder: NAND + inverters implementing OR
RTL:
assign y = a ? (c ? b : 1'b0) : 1'b0;
Simplification:
Inner mux: (C ? B : 0) = B·C
Outer mux: (A ? (B·C) : 0) = A·B·C
Expected: 3-input AND gate.
Result: sky130_fd_sc_hd__and3_0
Schematic placeholder: AND3 gate
- Run the same flow.
- Verify optimized gate-level result.
- Insert schematic screenshot.
Special Handling:
flatten
opt_clean -purge
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- Flattening required → allows optimizer to see across module boundaries.
- Without flatten → redundant logic may remain.
Schematic placeholder: flattened optimized netlist
Module | RTL Expression | Simplified Logic | Gate Mapped |
---|---|---|---|
`opt_check.v` | `a ? b : 1'b0` | A·B | AND2 |
`opt_check2.v` | `a ? 1'b1 : b` | A + B | OR2 (NAND+inverters) |
`opt_check3.v` | `a ? (c ? b : 0) : 0` | A·B·C | AND3 |
`opt_check4.v` | exercise | TBD | TBD |
`multiple_modules_opt` | hierarchical modules | Flatten + reduce | Optimized flat gates |
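The three worked simplifications in the table can be confirmed with exhaustive truth tables before looking at the Yosys schematic. A minimal Python sketch, with each entry pairing the RTL mux expression with its claimed simplified form:

```python
from itertools import product

# name -> (RTL mux expression, simplified Boolean form)
CASES = {
    "opt_check":  (lambda a, b, c: b if a else 0,
                   lambda a, b, c: a & b),
    "opt_check2": (lambda a, b, c: 1 if a else b,
                   lambda a, b, c: a | b),
    "opt_check3": (lambda a, b, c: (b if c else 0) if a else 0,
                   lambda a, b, c: a & b & c),
}

def is_equivalent(rtl, simplified):
    """True when both functions agree on all 8 input combinations."""
    return all(rtl(*v) == simplified(*v) for v in product((0, 1), repeat=3))

results = {name: is_equivalent(rtl, simp) for name, (rtl, simp) in CASES.items()}
for name, ok in results.items():
    print(name, "->", "equivalent" if ok else "MISMATCH")
```

The same pattern can be used for the `opt_check4.v` exercise by adding its RTL expression and a candidate simplification to the dictionary.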
- `opt_clean -purge` removes unused cells and wires left behind after constant propagation.
- Boolean simplification maps RTL mux expressions directly into single gates.
- In this flow, OR logic was realized with NAND + inverters rather than a dedicated OR cell.
- Flattening is needed for optimization across module boundaries.
- Optimized netlists use fewer cells and reduce area.
🔼 Back to Table of Contents
- Learn how synthesis tools optimize flip-flops with constant inputs.
- Distinguish between:
  - Sequential constants → flop output depends on clock (flop kept).
  - Combinational constants → flop output is always constant (flop removed).
- Verify with simulation and Yosys synthesis.
- Source: `dff_const1.v`, `dff_const2.v`, `dff_const3.v`
- Exercises: `dff_const4.v`, `dff_const5.v`
- Testbenches: `tb_dff_const*.v`
yosys
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_constX.v
synth -top dff_constX
dfflibmap -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
write_verilog -noattr dff_constX_opt.v
- `dfflibmap` is required → maps inferred flops to standard cells.
- SKY130 flops use active-low reset/set → if RTL uses active-high, tool inserts inverter.
always @(posedge clk or posedge reset)
if (reset) q <= 0;
else q <= 1;
- Behavior:
  - Reset → `q=0`.
  - After reset deassert → `q=1` only on next clock edge.
- Result: Flop preserved (q is not constant).
Waveform: `q` waits for clock.
Synthesis: 1 DFF + inverter on reset path (library expects active-low).
always @(posedge clk or posedge reset)
if (reset) q <= 1;
else q <= 1;
- Behavior:
  - Reset → `q=1`.
  - Normal operation → `q=1`.
- Result: `q` is constant `1`. Flop optimized away.
Waveform: `q` always high.
Synthesis: no flop, direct tie to `1`.
always @(posedge clk or posedge reset) begin
if (reset) begin
q1 <= 0;
q <= 1;
end else begin
q1 <= 1;
q <= q1;
end
end
- Behavior:
  - `q1`: 0 on reset, then 1 after next clock.
  - `q`: 1 on reset, then follows `q1`.
  - Effect: `q` dips to 0 for exactly one clock cycle.
- Result: Both flops preserved.
Waveform: one-cycle glitch on `q`.
Synthesis: 2 DFFs, inverters on set/reset.
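The one-cycle dip on `q` comes from non-blocking semantics: both flops sample their old inputs on the same clock edge. A cycle-level Python sketch of the two-flop design above, assuming reset is released before the first simulated edge:

```python
def simulate_two_flops(cycles=4):
    """Return a list of (q1, q) values, one per clock edge after reset."""
    q1, q = 0, 1                 # values held while reset is asserted
    trace = [(q1, q)]
    for _ in range(cycles):
        # Non-blocking update: the right-hand side uses the OLD q1.
        q1, q = 1, q1
        trace.append((q1, q))
    return trace

trace = simulate_two_flops()
q_values = [q for _, q in trace]
print("q1, q per cycle:", trace)
print("q over time:", q_values)
```

Because `q` takes the value 0 for one cycle, it is not a constant, and neither flop can be removed by synthesis.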
Design | Behavior | Optimization |
---|---|---|
dff_const1 | q syncs with clk after reset | Flop kept |
dff_const2 | q = 1 always | Flop removed |
dff_const3 | two flops, one-cycle glitch | Both kept |
dff_const4 | exercise | TBD |
dff_const5 | exercise | TBD |
- A flop is removed only if its output is a true constant.
- If output depends on clock edge, flop must remain.
- Active-high resets/sets β synthesis inserts inverter (SKY130 cells use active-low).
- Always confirm with simulation + yosys statistics.
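The one-cycle dip described for `dff_const3` can be checked with a small self-checking testbench. This is a minimal sketch: the module header (port names `clk`, `reset`, `q`) and the testbench name `tb_dff_const3_sketch` are assumptions, since the text only lists the `always` block.

```verilog
`timescale 1ns/1ps
// dff_const3 body as listed in the text; the port list is assumed.
module dff_const3 (input clk, input reset, output reg q);
  reg q1;
  always @(posedge clk or posedge reset) begin
    if (reset) begin
      q1 <= 1'b0;
      q  <= 1'b1;
    end else begin
      q1 <= 1'b1;
      q  <= q1;
    end
  end
endmodule

// Hypothetical testbench: checks q on the first two clock edges after reset.
module tb_dff_const3_sketch;
  reg clk = 1'b0;
  reg reset = 1'b1;
  wire q;

  dff_const3 uut (.clk(clk), .reset(reset), .q(q));

  always #5 clk = ~clk;   // rising edges at 5, 15, 25 ns

  initial begin
    $dumpfile("tb_dff_const3_sketch.vcd");
    $dumpvars(0, tb_dff_const3_sketch);
    #12 reset = 1'b0;     // release reset between clock edges
    @(posedge clk); #1;   // first edge after reset: q samples q1 = 0
    if (q !== 1'b0) $display("FAIL: expected q=0 after first edge, got %b", q);
    @(posedge clk); #1;   // second edge: q samples q1 = 1
    if (q !== 1'b1) $display("FAIL: expected q=1 after second edge, got %b", q);
    $display("checks finished");
    $finish;
  end
endmodule
```

Because `q` really changes value over time, neither flop can be replaced by a constant, which is why synthesis keeps both.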
- In synthesis, any signal not driving a primary output is removed.
- This applies to:
- Unused register bits
- Unused combinational logic feeding them
- Tools perform fanout tracing: if a node does not contribute to outputs, it is optimized out.
module counter_opt(input clk, input rst, output q);
reg [2:0] count;
always @(posedge clk or posedge rst)
if (rst) count <= 3'b000;
else count <= count + 1;
assign q = count[0]; // only LSB used
endmodule
- 3-bit up counter that rolls over after 7.
- Output `q` = LSB (`count[0]`).
- `count[1]` and `count[2]` are unused.
- 1 DFF kept (`count[0]`).
- 2 DFFs removed (`count[2:1]`).
- Incrementer logic also removed.
- Remaining circuit: toggle flop, `D = ~Q`.
- Area/power minimized.
module counter_opt2(input clk, input rst, output q);
reg [2:0] count;
always @(posedge clk or posedge rst)
if (rst) count <= 3'b000;
else count <= count + 1;
assign q = (count == 3'b100); // uses all bits
endmodule
- Output `q` = 1 when `count = 4` (`3'b100`).
- Depends on all 3 bits.
- 3 DFFs kept (`count[0]`, `count[1]`, `count[2]`).
- Incrementer logic preserved (adder-like).
- Comparator implemented as `q = count[2] & ~count[1] & ~count[0]`, mapped to a 3-input NOR with one inverted input (`count[2]`).
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog counter_opt.v # or counter_opt2.v
synth -top counter_opt
dfflibmap -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
Design | Flops Kept | Comb. Logic | Notes |
---|---|---|---|
counter_opt | 1 | 1 inverter | Toggle flop, unused logic removed |
counter_opt2 | 3 | Adder + NOR | All flops and logic preserved |
- Rule: Only logic in the transitive fan-in (cone of influence) of the primary outputs is kept.
- Unused register bits and their feeding logic are removed.
- This reduces area and power.
- Always check synthesis reports and netlists to confirm optimizations.
- Combinational optimizations: constant propagation, Boolean simplification, and dead logic removal reduce gate count; flattening enables cross-module cleanup.
- Sequential optimizations: flops with constant outputs are removed; clock-dependent flops are preserved; async set/reset may block optimization.
- Unused outputs: only logic driving primary outputs is kept; unused flops and their feeding logic are pruned.
- Technology mapping: Yosys uses `dfflibmap` + `abc` to map optimized logic and flops to SKY130 standard cells.
- Advanced methods (theory): state optimization, retiming, and logic cloning improve performance and timing closure in large designs.
- Key outcome: synthesis aggressively minimizes area/power while preserving functional intent, confirmed by GLS.
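The unused-output pruning described for `counter_opt` implies that the design is observably just a toggle flop. A minimal sketch of that equivalence check follows; `toggle_ref` and `tb_counter_opt_sketch` are hypothetical names introduced here, not files from the lab.

```verilog
`timescale 1ns/1ps
// counter_opt as listed in the text.
module counter_opt (input clk, input rst, output q);
  reg [2:0] count;
  always @(posedge clk or posedge rst)
    if (rst) count <= 3'b000;
    else     count <= count + 1;
  assign q = count[0];   // only the LSB reaches an output
endmodule

// Hypothetical reference: the single toggle flop synthesis is expected to keep.
module toggle_ref (input clk, input rst, output reg q);
  always @(posedge clk or posedge rst)
    if (rst) q <= 1'b0;
    else     q <= ~q;
endmodule

module tb_counter_opt_sketch;
  reg clk = 1'b0, rst = 1'b1;
  wire q_cnt, q_ref;
  integer errors = 0;
  integer n;

  counter_opt u_cnt (.clk(clk), .rst(rst), .q(q_cnt));
  toggle_ref  u_ref (.clk(clk), .rst(rst), .q(q_ref));

  always #5 clk = ~clk;

  initial begin
    #12 rst = 1'b0;
    // Compare both outputs after each of 16 clock edges (two full count cycles).
    for (n = 0; n < 16; n = n + 1) begin
      @(posedge clk); #1;
      if (q_cnt !== q_ref) errors = errors + 1;
    end
    if (errors == 0) $display("PASS: counter_opt output matches a toggle flop");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The same testbench can be pointed at the synthesized netlist during GLS to confirm that removing `count[2:1]` did not change the visible behavior.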
- Gate-Level Simulation (GLS) = simulate the synthesized netlist with the same testbench used for RTL.
- The netlist is the RTL translated into standard cells (e.g., `AND2X1`, `DFF`).
- I/O ports are identical, so the testbench works without change.
- Verify logical correctness after synthesis.
- Catch synthesis-simulation mismatches (SSM).
- Validate timing if delay-annotated models are used (`SDF`).
# Compile: netlist + testbench + std-cell models
iverilog -o gls_sim \
design_netlist.v \
tb_design.v \
../mylib/verilog_model/*.v
# Run simulation (generates dump.vcd)
vvp gls_sim
# View waveform
gtkwave dump.vcd
- Extra step vs RTL sim: include std-cell Verilog models (so simulator knows what gates mean).
- Missing sensitivity list
- Blocking (`=`) vs non-blocking (`<=`) misuse
- Non-standard Verilog coding
Buggy MUX RTL:
// Wrong: ignores I0, I1 changes
always @(sel) begin
if (sel) Y = I1;
else Y = I0;
end
- Problem: `Y` updates only when `sel` changes.
- Simulation looks latch-like.
- Synthesis builds a correct MUX, so RTL simulation and hardware disagree (mismatch).
Fix:
// Correct: reacts to all inputs
always @(*) begin
if (sel) Y = I1;
else Y = I0;
end
1. Blocking (`=`)
- Executes in order, like C.
- Risk: order-dependent bugs in sequential logic.
Example (shift register):
// Wrong: only one flop
always @(posedge clk) begin
q0 = D;
q = q0; // q0 already got D
end
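2. Non-Blocking (`<=`)
- All right-hand sides are evaluated first; the left-hand sides then update together at the end of the time step, so the statements behave as if executed in parallel.
- Use for sequential circuits.
// Correct: two flops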
always @(posedge clk) begin
q0 <= D;
q <= q0;
end
// Wrong: Y uses old Q0
always @(*) begin
Y = Q0 & C;
Q0 = A | B;
end
- Simulation: `Y` uses stale `Q0`, so it looks like a flop.
- Synthesis: no flop, hence a mismatch.
- Fix: reorder the assignments so `Q0` is computed before it is read.
- Sequential logic → always use non-blocking (`<=`).
- Combinational logic → blocking is fine, but watch ordering.
- Always use `always @(*)` for combinational sensitivity lists.
- Run GLS to ensure the netlist matches RTL intent.
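The blocking versus non-blocking shift-register example above can be run side by side. This is a minimal sketch; the module names `shift_blocking`, `shift_nonblocking`, and `tb_shift_sketch` are hypothetical and wrap the two `always` blocks from the text.

```verilog
`timescale 1ns/1ps
module shift_blocking (input clk, input d, output reg q);
  reg q0;
  always @(posedge clk) begin
    q0 = d;    // q0 updates immediately
    q  = q0;   // q sees the new q0, so d reaches q in one clock edge
  end
endmodule

module shift_nonblocking (input clk, input d, output reg q);
  reg q0;
  always @(posedge clk) begin
    q0 <= d;   // both right-hand sides are sampled before any update
    q  <= q0;  // q gets the old q0, so d reaches q after two clock edges
  end
endmodule

module tb_shift_sketch;
  reg clk = 1'b0, d = 1'b0;
  wire q_b, q_nb;

  shift_blocking    u_b  (.clk(clk), .d(d), .q(q_b));
  shift_nonblocking u_nb (.clk(clk), .d(d), .q(q_nb));

  always #5 clk = ~clk;

  initial begin
    $monitor("t=%0t d=%b q_blocking=%b q_nonblocking=%b", $time, d, q_b, q_nb);
    #2  d = 1'b1;   // change d before the first rising edge
    #30 $finish;
  end
endmodule
```

In the printed trace, `q_blocking` should follow `d` one clock edge earlier than `q_nonblocking`, which is the single-flop versus two-flop difference the text describes.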
- Run Gate-Level Simulation (GLS) using netlists and standard-cell models.
- Compare RTL simulation vs GLS.
- Show a synthesis-simulation mismatch (SSM) caused by missing sensitivity list.
To run GLS, we need:
- Netlist (`*_net.v`) from synthesis.
- Testbench (same as RTL sim).
- Standard-cell Verilog models (`../mylib/verilog_model/*.v`).
RTL Simulation
iverilog design.v tb_design.v
./a.out
gtkwave dump.vcd
Synthesis with Yosys
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog design.v
synth -top <top_module>
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr design_net.v
GLS
iverilog -o gls_sim \
../mylib/verilog_model/primitives.v \
../mylib/verilog_model/sky130_fd_sc_hd.v \
design_net.v tb_design.v
./gls_sim
gtkwave dump.vcd
RTL Code
module ternary_operator_mux(input I0, I1, sel, output Y);
assign Y = sel ? I1 : I0;
endmodule
RTL Simulation
- `Y = I0` when `sel=0`.
- `Y = I1` when `sel=1`.
- Hierarchy shows only `UUT` (no standard cells).
Synthesis
- Netlist maps into NAND, INV, OAI cells.
- Boolean equation: `Y = (~sel & I0) | (sel & I1)`
GLS
- Gate instances like `_6`, `_7`, `_8` appear.
- Behavior matches RTL.
- No mismatch.
[Insert waveform screenshots here]
RTL Code (Buggy)
module badmux(input I0, I1, sel, output reg Y);
always @(sel) begin
if (sel) Y = I1;
else Y = I0;
end
endmodule
Problem
- Sensitivity list has only `sel`.
- `Y` does not update when `I0` or `I1` change.
- Simulation looks like latch/flop behavior.
RTL Simulation
- `Y` updates only when `sel` toggles.
- Ignores activity on `I0` and `I1`.
Synthesis
- Yosys infers correct mux logic (ignores the sensitivity list).
- Netlist is the same as `ternary_operator_mux`.
GLS
- Behavior matches a real mux:
  - `sel=0` → `Y=I0`
  - `sel=1` → `Y=I1`
- Confirms mismatch with RTL sim.
[Insert waveform screenshots here]
Design | RTL Sim Result | GLS Result | Notes |
---|---|---|---|
ternary_operator_mux | Correct mux | Correct mux | No mismatch |
badmux (buggy) | Latch-like behavior | Correct mux | Sensitivity list missing I0, I1 |
- RTL simulators depend on sensitivity lists.
- Synthesis ignores them; hardware is inferred correctly.
- Mismatch occurs when RTL sim ≠ synthesized hardware.
- Always use `always @(*)` for combinational logic.
- GLS is mandatory to validate post-synthesis behavior.
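A testbench only exposes the `badmux` problem if it changes a data input while `sel` stays constant. The sketch below does exactly that and compares against a reference mux; `tb_badmux_sketch` and `Y_ref` are hypothetical names, not part of the lab files.

```verilog
`timescale 1ns/1ps
// badmux as listed in the text.
module badmux (input I0, input I1, input sel, output reg Y);
  always @(sel) begin        // deliberately incomplete sensitivity list
    if (sel) Y = I1;
    else     Y = I0;
  end
endmodule

module tb_badmux_sketch;
  reg I0, I1, sel;
  wire Y;
  wire Y_ref = sel ? I1 : I0;   // reference: what a real 2:1 mux drives

  badmux uut (.I0(I0), .I1(I1), .sel(sel), .Y(Y));

  initial begin
    I0 = 1'b0; I1 = 1'b0; sel = 1'b1;
    #10 sel = 1'b0;   // event on sel: Y is re-evaluated and takes I0
    #10 I0  = 1'b1;   // no event on sel: the RTL model does not re-evaluate Y
    #1;
    if (Y !== Y_ref)
      $display("MISMATCH at t=%0t: Y=%b, reference mux=%b", $time, Y, Y_ref);
    else
      $display("no mismatch observed");
    $finish;
  end
endmodule
```

Running the same stimulus against the synthesized netlist (with the standard-cell models added to the compile) is what reveals the RTL versus GLS difference.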
- Show how blocking assignments (`=`) in combinational logic can cause synthesis-simulation mismatch (SSM).
- Compare RTL simulation vs Gate-Level Simulation (GLS).
- Demonstrate why GLS must always be trusted over RTL sim.
We want to implement:
D = (A | B) & C
This should be pure combinational logic.
module blocking_caveat (input A, B, C, output reg D);
reg X;
always @(*) begin
D = X & C; // uses old value of X
X = A | B; // X updated after D
end
endmodule
- Problem: Blocking assignments execute sequentially.
- `D` is computed before `X` is updated, so `D` uses the previous value of `X`.
- RTL simulation shows flop-like behavior even though no flop exists.
Command:
iverilog blocking_caveat.v tb_blocking_caveat.v
./a.out
gtkwave dump.vcd
Observation:
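- Output `D` sometimes goes high when both `A=0` and `B=0`.
- This happens because `D` uses the stale `X` left over from the previous evaluation of the block (there is no clock here, so "previous" means the previous input change).
- RTL sim incorrectly shows a delayed, registered-looking version of `X`.
Screenshot placeholder: `screenshots/blocking_rtl_sim.png`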
Yosys flow:
read_liberty -lib ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog blocking_caveat.v
synth -top blocking_caveat
abc -liberty ../mylib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr blocking_caveat_net.v
Netlist result:
- Simple OR2 + AND2 gate network.
- No flop, no latch.
- Implements correct combinational logic: `D = (A | B) & C`.
Screenshot placeholder: `screenshots/blocking_netlist.png`
Command:
iverilog -o gls_sim \
../mylib/verilog_model/primitives.v \
../mylib/verilog_model/sky130_fd_sc_hd.v \
blocking_caveat_net.v tb_blocking_caveat.v
./gls_sim
gtkwave dump.vcd
Observation:
- GLS waveform matches Boolean expectation.
- Example: `A=0, B=0, C=1` → `D=0` (correct).
- No stale value effect.
Screenshot placeholder: `screenshots/blocking_gls.png`
Inputs (A,B,C) | Expected D | RTL Sim D (Buggy) | GLS D (Correct) |
---|---|---|---|
0,0,1 | 0 | 1 | 0 |
1,0,1 | 1 | 0 | 1 |
- RTL sim misbehaves because it evaluates in execution order, not logic order.
- GLS shows correct behavior because synthesis builds gates, not sequential semantics.
Screenshot placeholder: `screenshots/blocking_comparison.png`
- Blocking assignments update variables immediately, in sequence.
- In combinational always blocks, this can lead to simulation artifacts where values appear flopped.
- Synthesis tools, however, see only the logic equation and produce a pure combinational netlist.
- Mismatch arises between RTL sim and GLS.
wire X = A | B;
assign D = X & C;
always @(*) begin
X = A | B;
D = X & C;
end
Not recommended for pure combinational paths.
- Blocking assignments can cause false sequential behavior in simulation.
- Always prefer `wire` + `assign` for combinational logic.
- Use non-blocking (`<=`) for sequential (flop-based) logic.
- GLS is the reference: it matches hardware behavior.
- Run GLS to catch mismatches before tape-out.
- GLS is mandatory: validates synthesized netlist against RTL using the same testbench.
- Extra step in GLS: include standard-cell Verilog models (`sky130_fd_sc_hd.v`, `primitives.v`).
- SSM (Simulation-Synthesis Mismatch) occurs due to:
  - Missing sensitivity lists in combinational blocks.
  - Misuse of blocking (`=`) in sequential logic.
  - Ordering issues in combinational always blocks.
- RTL simulator vs synthesis:
  - RTL sim depends on coding style (sensitivity list, assignment type).
  - Synthesis infers hardware equations, ignores sensitivity list bugs.
- Key rules:
  - Use `always @(*)` for all combinational logic.
  - Use non-blocking (`<=`) for sequential logic.
  - Avoid blocking assignment ordering pitfalls in combinational blocks.
- Trust GLS over RTL sim: GLS reflects real hardware mapped to standard cells.
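The ordering pitfall in `blocking_caveat` can be made visible with a testbench that compares `D` against the intended function after every input change. This is a sketch under assumptions: `tb_blocking_caveat_sketch`, `D_ref`, and the `check` task are illustrative names, and the stimulus sequence is arbitrary.

```verilog
`timescale 1ns/1ps
// blocking_caveat as listed in the text.
module blocking_caveat (input A, input B, input C, output reg D);
  reg X;
  always @(*) begin
    D = X & C;   // reads X before it is updated
    X = A | B;   // X is updated only after D was computed
  end
endmodule

module tb_blocking_caveat_sketch;
  reg A, B, C;
  wire D;
  wire D_ref = (A | B) & C;   // intended combinational function
  integer mismatches = 0;

  blocking_caveat uut (.A(A), .B(B), .C(C), .D(D));

  // Wait for the model to settle, then compare against the reference.
  task check;
    begin
      #1;
      if (D !== D_ref) begin
        mismatches = mismatches + 1;
        $display("t=%0t A=%b B=%b C=%b D=%b expected=%b", $time, A, B, C, D, D_ref);
      end
    end
  endtask

  initial begin
    A = 1'b0; B = 1'b0; C = 1'b1; check;
    A = 1'b1;                     check;
    A = 1'b0;                     check;
    B = 1'b1;                     check;
    $display("%0d mismatching samples", mismatches);
    $finish;
  end
endmodule
```

Swapping the two assignments inside the `always` block, or replacing the block with continuous assignments, should bring the mismatch count to zero in RTL simulation as well.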
always @(*) begin
if (cond1) Y = C1; // highest priority
else if (cond2) Y = C2;
else if (cond3) Y = C3;
else Y = E; // lowest priority
end
- Hardware: Priority MUX chain
- Only the first true condition executes
- Common for: priority encoders, control signals
[Placeholder: `screenshots/if_priority_mux.png`]
always @(*) begin
if (cond1) Y = A;
else if (cond2) Y = B;
// no else -> latch inferred
end
- If `cond1=0` and `cond2=0`, the tool must hold the previous `Y`.
- The synthesizer inserts a transparent latch (unwanted in combinational logic).
[Placeholder: `screenshots/inferred_latch_if.png`]
always @(posedge clk or posedge rst) begin
if (rst) count <= 3'b000;
else if (en) count <= count + 1;
// no else -> counter holds value (intended behavior)
end
- Here, holding the value is intended: the flop already stores state, so the missing `else` infers an enable (hold) path on the flop, not a latch.
- Used in counters, registers, FSM state holding.
[Placeholder: `screenshots/counter_flop.png`]
always @(*) begin
case (sel)
2'b00: Y = C1;
2'b01: Y = C2;
2'b10: Y = C3;
2'b11: Y = C4;
default: Y = 0; // recommended
endcase
end
- Hardware: 4:1 MUX
- Each branch is parallel, no priority
[Placeholder: `screenshots/case_mux.png`]
case (sel) // sel is 2-bit
2'b00: Y = A;
2'b01: Y = B;
// missing 2'b10, 2'b11 -> latch
endcase
Fix: Always add `default`.
case (sel)
2'b00: begin X = A; Y = B; end
2'b01: begin X = C; end // Y unassigned -> latch
default: begin X = D; Y = B; end
endcase
Fix: Assign all outputs in every branch.
case (sel)
2'b10: Y = C1;
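2'b1?: Y = C2; // intended to cover 2'b10 and 2'b11
endcase
- For `sel=2'b10`, both items appear to apply. Verilog checks case items top to bottom and executes only the first match, so the result depends on item order rather than on intent.
- In a plain `case`, `?` is a literal `z`, not a wildcard (wildcards need `casez`/`casex`), so simulation and synthesis can interpret this item differently.
Fix: Ensure mutually exclusive case items.
[Placeholder: `screenshots/case_overlap.png`]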
Construct | Hardware Inferred | Priority? | Risks | Best Practice |
---|---|---|---|---|
If-Else | Priority MUX | Yes | Incomplete → latch | Always close with `else` |
Case | Multiplexer | No | Incomplete, partial, overlap | Use `default`, assign all outputs |
- If-Else = use when priority matters
- Case = use for parallel selection
- Always add `else` or `default`
- Assign all outputs in all branches
- Avoid overlapping cases
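One common way to satisfy "assign all outputs in all branches" is to give every output a default value at the top of the block and let the branches override it. The sketch below illustrates that style; the module name `case_defaults_sketch` and its ports are hypothetical and not taken from the lab files.

```verilog
// Hypothetical example: default-first assignments prevent inferred latches.
module case_defaults_sketch (
  input      [1:0] sel,
  input            a, b, c, d,
  output reg       x, y
);
  always @(*) begin
    // Defaults first: every output has a value on every path.
    x = 1'b0;
    y = 1'b0;
    case (sel)
      2'b00:   begin x = a; y = b; end
      2'b01:   x = c;                  // y keeps its default, not its old value
      2'b10:   y = d;                  // x keeps its default
      default: begin x = d; y = b; end
    endcase
  end
endmodule
```

Because no path leaves `x` or `y` unassigned, the block describes pure combinational logic even though individual branches assign only one output.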
- Show how incomplete `if` constructs create inferred latches.
- Compare RTL simulation vs synthesis behavior.
- Verify results with Icarus Verilog (simulation) and Yosys + SKY130 PDK (synthesis).
module incomplete_if(input I0, input I1, input I2, output reg Y);
always @(*) begin
if (I0) Y = I1;
// else missing -> latch inferred
end
endmodule
- Inputs: `I0`, `I1` (`I2` is declared but unused)
- Output: `Y` (reg)
- Issue: Missing `else`, so the output is not defined when `I0=0`.
- Result: Tools preserve state using a transparent D latch.
- `if` statement → MUX with `I0` as select.
- Missing `else` → output must hold last value → latch behavior.
- Equivalent hardware: D latch (`EN = I0`, `D = I1`, `Q = Y`).
Command:
iverilog incomplete_if.v tb_incomplete_if.v -o sim.out
./sim.out
gtkwave incomplete_if.vcd
Observed behavior:
- `I0=1` → `Y = I1` (transparent).
- `I0=0` → `Y` holds previous value.
[Insert waveform screenshot: `incomplete_if_sim.png`]
Command:
yosys> read_liberty -lib sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> read_verilog incomplete_if.v
yosys> synth -top incomplete_if
yosys> abc -liberty sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> show
Result:
- 1 D latch inferred:
  - Enable = `I0`
  - D = `I1`
  - Q = `Y`
[Insert Yosys schematic screenshot: `incomplete_if_synth.png`]
module incomplete_if2(input I0, input I1, input I2, input I3, output reg Y);
always @(*) begin
if (I0) Y = I1;
else if (I2) Y = I3;
// else missing -> latch inferred
end
endmodule
- Inputs: `I0`, `I1`, `I2`, `I3`
- Output: `Y` (reg)
- Issue: No final `else`.
- Result: If `I0=0` and `I2=0`, then `Y` is latched.
- Priority MUX chain:
  - `I0=1` → `Y = I1`
  - `I0=0, I2=1` → `Y = I3`
  - Else → latched
- Latch enable = `(I0 | I2)`
- Data input = `(I0 ? I1 : I3)`
Command:
iverilog incomplete_if2.v tb_incomplete_if2.v -o sim.out
./sim.out
gtkwave incomplete_if2.vcd
Observed behavior:
- `I0=1` → `Y = I1`.
- `I0=0, I2=1` → `Y = I3`.
- `I0=0, I2=0` → `Y` latches previous value.
[Insert waveform screenshot: `incomplete_if2_sim.png`]
Command:
yosys> read_liberty -lib sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> read_verilog incomplete_if2.v
yosys> synth -top incomplete_if2
yosys> abc -liberty sky130_fd_sc_hd__tt_025C_1v80.lib
yosys> show
Result:
- Yosys infers:
  - D latch with enable = `(I0 | I2)`.
  - Data = `MUX(I0 ? I1 : I3)`.
  - Q = `Y`.
[Insert Yosys schematic screenshot: `incomplete_if2_synth.png`]
- Incomplete `if` → inferred latch.
- Both simulation and synthesis confirm latch insertion.
- Latch enable = OR of all tested conditions.
- Dangers: Latches are level-sensitive, which brings timing issues, glitches, and unintended storage.
- Best practice: Always code a final `else` in combinational logic.
Corrected code (no latch):
always @(*) begin
if (I0) Y = I1;
else if (I2) Y = I3;
else Y = 1'b0; // defined default
end
Synthesizes to pure mux logic, no latch.
This section explores Verilog case statements and the problems that occur when they are incomplete, partially assigned, or overlapping.
We cover 4 scenarios:
- Incomplete case (no default)
- Complete case (with default)
- Partial assignment case
- Overlapping case
File: incomplete_case.v
always @* begin
case (sel)
2'b00: y = i0;
2'b01: y = i1;
// missing sel=10, 11 and no default
endcase
end
- Inputs: `i0`, `i1`, `i2`, `sel[1:0]`
- Output: `y`
- `sel=00 → y=i0`
- `sel=01 → y=i1`
- `sel=10` or `11` → `y` holds previous value (latch)
Latch Enable Condition: `~sel[1]`
D Input Logic: `D = sel[0] ? i1 : i0`
iverilog incomplete_case.v tb_incomplete_case.v
./a.out
gtkwave tb_incomplete_case.vcd
- Matches expected behavior: the latch holds when `sel[1]=1`.
read_verilog incomplete_case.v
synth -top incomplete_case
abc -liberty sky130.lib
show
- Latch inferred
- Q drives `y`
- Enable = `~sel[1]`
- D = `mux(i0, i1, sel[0])`
File: comp_case.v
always @* begin
case (sel)
2'b00: y = i0;
2'b01: y = i1;
default: y = i2;
endcase
end
- Complete case → no latches
- `sel=00 → y=i0`
- `sel=01 → y=i1`
- `sel=10/11 → y=i2`
iverilog comp_case.v tb_comp_case.v
./a.out
gtkwave tb_comp_case.vcd
- Pure combinational 4-to-1 MUX observed
read_verilog comp_case.v
synth -top comp_case
abc -liberty sky130.lib
show
- No latches
- Only combinational gates (AND/OR/INV)
- Reduces to 4:1 MUX logic
File: partial_case_assign.v
always @* begin
case (sel)
2'b00: begin y = i0; x = i0; end
2'b01: y = i1; // x not assigned
default: begin y = i2; x = i1; end
endcase
end
- Inputs: `i0, i1, i2, sel[1:0]`
- Outputs: `x, y`
- `y`: complete → no latch
- `x`: missing assignment at `sel=01` → latch
sel[1] | sel[0] | x behavior |
---|---|---|
0 | 0 | i0 |
0 | 1 | latch (holds) |
1 | 0 | i1 (default branch) |
1 | 1 | i1 (default branch) |
Latch Enable Condition: `sel[1] | ~sel[0]`
read_verilog partial_case_assign.v
synth -top partial_case_assign
abc -liberty sky130.lib
show
- `y`: pure combinational path
- `x`: latch inferred with enable = `sel[1] | ~sel[0]`
File: bad_case.v
always @* begin
case (sel)
2'b00: y = i0;
2'b01: y = i1;
2'b10: y = i2;
2'b1?: y = i3; // overlaps with 10 and 11
endcase
end
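- `sel=10` appears to match both `2'b10` and `2'b1?`; in simulation the first listed item wins.
- In a plain `case`, `?` is a literal `z` rather than a wildcard, so in simulation `2'b1?` does not match `2'b11`, and that value gets no assignment.
- Synthesis tools may treat the item differently from the simulator.
- Leads to simulation-synthesis mismatch.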
iverilog bad_case.v tb_bad_case.v
./a.out
gtkwave tb_bad_case.vcd
- `y` may latch or show unexpected behavior when `sel[1]=1` (for example at `sel=11`, which no item of the plain `case` matches in simulation).
read_verilog bad_case.v
synth -top bad_case
abc -liberty sky130.lib
write_verilog bad_case_net.v
iverilog ../my_lib/primitives.v \
../my_lib/sky130_cells.v \
bad_case_net.v tb_bad_case.v
./a.out
gtkwave bad_case.vcd
- Synthesized netlist behaves as clean 4:1 MUX
- But mismatch vs RTL sim
- Always use `default` in case statements
- Assign all outputs in all case branches
- Avoid overlapping case conditions
- Use GLS to verify actual hardware behavior
- Remember: incomplete = latches, overlapping = mismatches
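A repaired version of the overlapping case can list every select value explicitly. The sketch below is one possible fix, not the lab's reference solution: the module name `good_case_sketch`, its port list, and the accompanying testbench are assumptions based on the signals named in the text.

```verilog
`timescale 1ns/1ps
// Hypothetical fix for bad_case: mutually exclusive items plus a default.
module good_case_sketch (
  input            i0, i1, i2, i3,
  input      [1:0] sel,
  output reg       y
);
  always @(*) begin
    case (sel)
      2'b00:   y = i0;
      2'b01:   y = i1;
      2'b10:   y = i2;
      2'b11:   y = i3;     // explicit item instead of the overlapping 2'b1?
      default: y = 1'b0;   // reached only if sel contains x/z in simulation
    endcase
  end
endmodule

module tb_good_case_sketch;
  reg  [3:0] data;
  reg  [1:0] sel;
  wire       y;
  integer    k, errors;

  good_case_sketch uut (.i0(data[0]), .i1(data[1]), .i2(data[2]), .i3(data[3]),
                        .sel(sel), .y(y));

  initial begin
    errors = 0;
    data = 4'b1010;
    // Step through all four select values and compare with the selected bit.
    for (k = 0; k < 4; k = k + 1) begin
      sel = k[1:0]; #1;
      if (y !== data[k]) errors = errors + 1;
    end
    if (errors == 0) $display("PASS: every sel value selects its own input");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

With every item distinct, RTL simulation and the synthesized netlist have no room to disagree about which branch applies.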
In Verilog, loop constructs are used either to evaluate logic repeatedly or to instantiate hardware repeatedly:
Construct | Location | Purpose | Synthesizable Hardware |
---|---|---|---|
`for` loop | Inside `always` block | Evaluate logic repeatedly | Yes (single logic structure) |
`for-generate` | Outside `always` | Instantiate hardware repeatedly | Yes (replicated instances) |
- `for` is for logic evaluation, not hardware replication.
- `for-generate` is for replicating modules or gates, not evaluating expressions.
- Used inside `always` blocks (`always @*` or `always @(posedge clk)`).
- Repeats a logic check or assignment.
- Synthesizes to one hardware block, not multiple instances.
- Good for wide MUX, DEMUX, or priority logic.
module mux32 (
input [31:0] in,
input [4:0] sel,
output reg y
);
integer i;
always @(*) begin
y = 1'b0;
for (i = 0; i < 32; i = i + 1) begin
if (i == sel)
y = in[i];
end
end
endmodule
Behavior:
- Loops over all 32 inputs.
- Assigns `y` from `in[sel]`.
Why it matters:
- Changing `32` → `256` (and widening `in` and `sel` to match) scales this to a 256:1 MUX.
- No replication: the synthesizer collapses the loop into one MUX.
module demux8 (
input din,
input [2:0] sel,
output reg [7:0] dout
);
integer i;
always @(*) begin
dout = 8'b0;
for (i = 0; i < 8; i = i + 1) begin
if (i == sel)
dout[i] = din;
end
end
endmodule
Behavior:
- Initializes all outputs to `0`.
- Drives only `dout[sel] = din`.
Synthesis:
- The synthesizer infers 8 AND gates with decoder logic.
- Used outside `always` blocks.
- Each loop iteration creates real hardware.
- Must use a `genvar` variable.
- Common for repetitive module instantiation or bit-slice design.
module and_array (
input [7:0] a,
input [7:0] b,
output [7:0] y
);
genvar i;
generate
for (i = 0; i < 8; i = i + 1) begin : gen_and
and u_and (y[i], a[i], b[i]); // gate primitive: output first, positional ports only
end
endgenerate
endmodule
Behavior:
- Instantiates 8 separate AND gates.
- Each instance has its own connectivity.
Synthesis:
- Hardware contains 8 physical gates, each independent.
module rca #(
parameter N = 8
)(
input [N-1:0] a,
input [N-1:0] b,
output [N-1:0] sum,
output cout
);
wire [N:0] carry;
assign carry[0] = 1'b0;
genvar i;
generate
for (i = 0; i < N; i = i + 1) begin : gen_fa
full_adder fa (
.a(a[i]),
.b(b[i]),
.cin(carry[i]),
.sum(sum[i]),
.cout(carry[i+1])
);
end
endgenerate
assign cout = carry[N];
endmodule
Behavior:
- Replicates `full_adder` N times.
- Carry chains between stages.
- Well suited to scalable adders, multipliers, or shift-register chains.
Sometimes hardware should only exist under certain conditions (e.g., parameter-controlled).
generate
if (WIDTH == 8) begin
and8 u_and8 (.a(a), .b(b), .y(y));
end else begin
and16 u_and16 (.a(a), .b(b), .y(y));
end
endgenerate
Usage:
- Used for parameterized designs.
- Synthesizer includes only the block that matches the condition.
Do:
- Use `for` for iterative logic inside `always`.
- Use `for-generate` for replicating instances outside `always`.
- Name generate blocks (`: label`) for better synthesis hierarchy.
- Keep loop bounds static and constant.
Don't:
- Use `for-generate` inside `always` (syntax error).
- Use non-constant loop limits (synthesis may fail).
- Forget `genvar`: a generate loop index must be declared as a `genvar`, otherwise the code is rejected at compile time.
Feature | `for` (in `always`) | `for-generate` (outside `always`) |
---|---|---|
Location | Inside `always` | Outside `always` |
Purpose | Expression evaluation | Hardware replication |
Hardware generated | Single logic structure | Multiple instances |
Example | MUX, DEMUX | Adders, Gate Arrays, FIFOs |
Requires `genvar` | No | Yes |
Loop bounds | Must be static | Must be static |
- `for` → use it for logic evaluation inside `always`. Synthesizes to one hardware block.
- `for-generate` → use it for hardware replication outside `always`. Synthesizes to many hardware instances.
- `if-generate` → use it for conditional instantiation based on parameters.
These constructs are fundamental for scalable RTL design, parameterized modules, and clean structural code.
Files:
- RTL: `mux_generate.v`
- TB: `tb_mux_generate.v`
- Input: `i0..i3`, select `sel[1:0]`, output `y`.
- Internal bus `i_int[3:0]` groups the inputs.
- `for` loop iterates 0 to 3, checks if `k == sel`, assigns `y = i_int[k]`.
- Functionally equivalent to a case statement, but scales easily.
iverilog mux_generate.v tb_mux_generate.v -o a.out
./a.out
gtkwave dump.vcd
Expected waveform:
- `sel=0 → y=i0`
- `sel=1 → y=i1`
- `sel=2 → y=i2`
- `sel=3 → y=i3`
Note: A case statement requires `N` lines for an `N:1` MUX. The for-loop version stays at ~4 lines for any size.
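The lab file itself is not reproduced in this section, so the following is only a sketch of what a loop-based 4:1 mux matching the description could look like. The module name `mux_generate_sketch` and the default assignment `y = 1'b0` are assumptions; the signal names `i0..i3`, `sel`, `i_int`, `k`, and `y` follow the text.

```verilog
// Hypothetical reconstruction of the loop-based mux described above.
module mux_generate_sketch (
  input            i0, i1, i2, i3,
  input      [1:0] sel,
  output reg       y
);
  wire [3:0] i_int;
  assign i_int = {i3, i2, i1, i0};   // i_int[k] corresponds to input ik

  integer k;
  always @(*) begin
    y = 1'b0;                        // default so y is assigned on every path
    for (k = 0; k < 4; k = k + 1) begin
      if (k == sel)
        y = i_int[k];                // only the iteration with k == sel assigns
    end
  end
endmodule
```

Widening the mux only requires changing the bus width, the `sel` width, and the loop bound, which is the scaling advantage over a case statement.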
Files:
- RTL (case): `dmux_case.v`
- RTL (for): `dmux_for.v`
- TB: `tb_dmux.v`
- Input: `in`, select `sel[2:0]`, outputs `O[7:0]`.
- Outputs initialized to `0` to avoid inferred latches.
- `for` loop sets only `O[sel] = in`.
Case-based:
iverilog dmux_case.v tb_dmux.v -o a.out
./a.out
gtkwave dump.vcd
Loop-based:
iverilog dmux_for.v tb_dmux.v -o a.out
./a.out
gtkwave dump.vcd
Expected waveform:
- `sel=n → On = in`, all others `0`.
Note: Case-based scales to hundreds of lines for large DMUX. For-loop version stays compact (~13 lines).
Files:
- FA module: `fa.v`
- RCA module: `rca.v`
- TB: `tb_rca.v`
- Full Adder: 3 inputs (a, b, cin), 2 outputs (sum, cout).
- RCA:
  - Add two 8-bit numbers → 9-bit result.
  - First FA instance: `cin=0`.
  - Remaining instances created with `for-generate`.
  - Carries chained automatically: `carry[i+1] = cout[i]`.
iverilog fa.v rca.v tb_rca.v -o a.out
./a.out
gtkwave dump.vcd
Expected behavior:
- Example: `221 + 93 = 314`
- Matches rule: `n-bit + n-bit → (n+1)-bit`
Note: Use `genvar` (not `integer`) for generate loops.
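The `221 + 93 = 314` example can be reproduced with a self-contained sketch. It reuses the parameterized `rca` module shown earlier in this document rather than the lab's `rca.v` (which is not listed here); the `full_adder` body and the testbench `tb_rca_sketch` are assumptions written for illustration.

```verilog
`timescale 1ns/1ps
// Assumed full adder with the port names the rca module expects.
module full_adder (input a, input b, input cin, output sum, output cout);
  assign sum  = a ^ b ^ cin;
  assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Parameterized ripple-carry adder, as listed earlier in the text.
module rca #(parameter N = 8) (
  input  [N-1:0] a,
  input  [N-1:0] b,
  output [N-1:0] sum,
  output         cout
);
  wire [N:0] carry;
  assign carry[0] = 1'b0;
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : gen_fa
      full_adder fa (.a(a[i]), .b(b[i]), .cin(carry[i]),
                     .sum(sum[i]), .cout(carry[i+1]));
    end
  endgenerate
  assign cout = carry[N];
endmodule

module tb_rca_sketch;
  reg  [7:0] a, b;
  wire [7:0] sum;
  wire       cout;

  rca #(.N(8)) uut (.a(a), .b(b), .sum(sum), .cout(cout));

  initial begin
    a = 8'd221; b = 8'd93; #1;
    // {cout, sum} is the 9-bit result; 314 does not fit in 8 bits.
    if ({cout, sum} === 9'd314) $display("PASS: 221 + 93 = %0d", {cout, sum});
    else                        $display("FAIL: got %0d", {cout, sum});
    $finish;
  end
endmodule
```

Changing `N` in the instantiation is enough to try the 32-bit extension suggested in the exercises.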
- For-loop (inside `always`):
  - Used for logic evaluation.
  - Good for scalable MUX/DMUX.
- For-generate (outside `always`):
  - Used for hardware replication.
  - Good for arrays of adders, multipliers, etc.
- Best practices:
  - Initialize outputs to avoid latches.
  - Label generate blocks for readability.
  - Prefer loops over huge case statements for scalable designs.
- Synthesize each design with `yosys` + SKY130 library.
- Compare RTL sim vs GLS waveforms.
- Use the Yosys `stat` command to check gate count and area.
- Extend designs:
  - 256×1 MUX
  - 1×256 DMUX
  - 32-bit RCA
  Check how loop constructs reduce code size.
- If-Else vs Case
  - `if-else` → priority logic (MUX chain).
  - `case` → parallel logic (multiplexer).
  - Missing `else` or `default` → latch inferred.
  - Overlapping `case` items → simulation-synthesis mismatch.
- Latch Risks
  - Latches are level-sensitive, which brings timing issues and glitches.
  - Avoid by always assigning all outputs in all branches.
- Best Practices
  - Use `if-else` only when priority is needed.
  - Use `case` for selection logic.
  - Always add a final `else` or `default`.
  - Ensure no overlapping case items.
- Loop Constructs
  - `for` (inside `always`) → logic evaluation, scales MUX/DMUX.
  - `for-generate` (outside `always`) → hardware replication, scalable structures (e.g., adders).
  - Always use static bounds and `genvar` for generate loops.
- Verification
  - Incomplete constructs cause inferred latches (confirmed in simulation and synthesis).
  - RTL sim ≠ synthesis netlist if code is incomplete or overlapping.
  - Always cross-check with GLS (Gate Level Simulation).
- Simulators are event-driven: Outputs change only when inputs toggle. VCD captures transitions, not static states.
- Synthesis maps RTL to cells: Netlist uses NAND, NOR, DFF, etc. Testbench reuse ensures GLS validates hardware vs RTL.
- Optimizations are automatic: Yosys prunes unused logic, propagates constants, simplifies Boolean expressions.
- Simulation-synthesis mismatches (SSM):
- Missing sensitivity lists
- Blocking assignments in combinational logic
- Incomplete or overlapping case statements
- GLS is mandatory: RTL sim alone is insufficient. GLS checks resets and real cell behavior, and also timing when delay-annotated (SDF) models are used.
- Correct coding prevents bugs: Always cover all branches, initialize outputs, use non-blocking for flops.
- Loops scale efficiently: Case-based 256:1 mux = hundreds of lines; loop-based = a few lines. For-generate replicates hardware cleanly.