slang-netlist 0.9.0
To set up a local build configuration, for example if you are using VSCode's CMake integration, create a CMake user preset in CMakeUserPresets.json in the root directory of the project. This user preset can inherit from one of the standard ones in CMakePresets.json and add its own variables. For example:
Then to use this preset:
To build the documentation install the Python dependencies:
Pre-commit runs a set of lint checks on files when creating git commits, with the config in .pre-commit.
You can install the hooks explicitly with:
And you can run pre-commit checks on all files with:
To update the versions of the hooks used in the pre-commit configuration file:
You can use Docker to create an Ubuntu development environment, e.g.:
This is useful for debugging issues seen in CI, especially when developing on macOS.
To run the unit tests:
Report code coverage:
Profile performance using cachegrind:
The test suite includes tests using code from external projects, enabled with -DENABLE_EXTERNAL_TESTS=ON (this is the default).
RTLmeter is a collection of large reference designs including BlackParrot, Caliptra, NVDLA, OpenPiton, OpenTitan, Servant, VeeR, Vortex, XiangShan and XuanTie, sourced from verilator/rtlmeter. It is fetched via CPM at configure time and each design is elaborated through the full netlist construction pipeline.
To run only the external tests:
The RTLmeter suite runs one unit test per design; failures print the slang-netlist stderr so the root cause is visible directly in the test output. Each design has a 5-minute timeout and the overall test has a 1-hour timeout. Because these are large, real-world designs, some may fail because of slang-netlist limitations; those failures are useful for tracking coverage progress.
To skip the external tests entirely (e.g. for faster iteration):
The library builds a directed dependency graph (the "netlist") over an elaborated SystemVerilog AST provided by slang. The graph captures source-level static connectivity at bit-level granularity. The main components are described below.
The graph is built on a generic DirectedGraph<NodeType,EdgeType> template that stores nodes in an adjacency list of std::unique_ptr<NodeType>. Each edge is single-owned by its source node via std::unique_ptr<EdgeType> in outEdges; the target node holds a raw back-pointer in inEdges. A per-node outEdgeIndex (a NetlistNode* → NetlistEdge* map used to dedupe parallel edges) is allocated lazily only once a node's out-degree exceeds outEdgeIndexThreshold, so low-fan-out nodes pay no per-node map overhead. The netlist specialises this as NetlistGraph, holding NetlistNode and NetlistEdge objects.
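The ownership scheme and the lazily allocated index described above can be sketched as follows. This is a minimal illustration, not the library's DirectedGraph: the Node and Edge types, the addEdgeTo helper, and the threshold value of 8 are all assumptions made for the example.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct Node;

// Hypothetical, simplified edge: the real NetlistEdge carries annotations.
struct Edge {
    Node* source;
    Node* target;
};

struct Node {
    // Assumed value; the real outEdgeIndexThreshold is not stated in this text.
    static constexpr std::size_t indexThreshold = 8;

    std::vector<std::unique_ptr<Edge>> outEdges; // owning: source node owns the edge
    std::vector<Edge*> inEdges;                  // non-owning back-pointers
    std::unique_ptr<std::unordered_map<Node*, Edge*>> outEdgeIndex; // null until needed

    // Returns the existing edge to `target`, or nullptr if there is none.
    Edge* findEdgeTo(Node& target) {
        if (outEdgeIndex) {
            auto it = outEdgeIndex->find(&target);
            return it == outEdgeIndex->end() ? nullptr : it->second;
        }
        for (auto& e : outEdges)
            if (e->target == &target) return e.get(); // linear scan for low fan-out
        return nullptr;
    }

    // Adds an edge unless one already exists (dedupes parallel edges).
    Edge& addEdgeTo(Node& target) {
        if (Edge* existing = findEdgeTo(target)) return *existing;
        outEdges.push_back(std::make_unique<Edge>(Edge{this, &target}));
        Edge* edge = outEdges.back().get();
        target.inEdges.push_back(edge);
        if (outEdgeIndex) {
            outEdgeIndex->emplace(&target, edge);
        } else if (outEdges.size() > indexThreshold) {
            // Out-degree crossed the threshold: build the index once.
            outEdgeIndex = std::make_unique<std::unordered_map<Node*, Edge*>>();
            for (auto& e : outEdges) outEdgeIndex->emplace(e->target, e.get());
        }
        return *edge;
    }
};
```

The sketch is single-threaded; the locking that the real graph applies around these operations is omitted.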
NetlistNode is a polymorphic base with a NodeKind discriminator. Concrete subtypes are:
- Port: an input or output port of a module instance.
- Variable: a net or variable declaration.
- State: a register (variable driven inside a clocked procedural block).
- Assignment: a continuous or procedural assignment expression.
- Conditional / Case: an if or case branch in a procedural block.
- Merge: a synthetic join point that merges two branches back together.
- Constant: a literal or constant-foldable RHS (including the zero-extension bits of a widening conversion). Carries the ConstantValue and the bit width it drives, and acts as a leaf source feeding the consuming Assignment or Port.

NetlistEdge carries annotations for the driven symbol, its bit range, and an ast::EdgeKind that records clock sensitivity (used to distinguish combinational from sequential edges).
NetlistBuilder is the main AST visitor (slang::ast::ASTVisitor). The four-phase construction is orchestrated by the BuildPipeline class, which NetlistBuilder owns:
Phase 1 — Collect (sequential). BuildPipeline::runPhase1 traverses the AST sequentially via root.visit(builder). During this phase the collectingPhase flag is set, so handle() methods for ProceduralBlockSymbol and ContinuousAssignSymbol do not execute the DFA immediately — they push their symbols onto the deferredBlocks work list. All other handle() methods run normally, creating Port, Variable, and instance-structure nodes, registering port connections, and populating the VariableTracker.
Phase 2 — DFA dispatch (parallel or sequential). Each entry in deferredBlocks is dispatched as an independent task. In parallel mode BuildPipeline::runPhase2Parallel detaches the tasks to a BS::thread_pool; in sequential mode runPhase2Sequential calls the same per-block helpers in a simple loop.
Each task runs a local DataFlowAnalysis that computes reaching definitions for all variables referenced in the block (see Data flow analysis). On completion the task calls mergeDrivers to fold per-block driver intervals into the central ValueTracker. Nodes and edges created during the DFA are added directly to the shared NetlistGraph (per-node edgeMutex on addEdge, single nodesMutex on addNode — see Multithreading). Pending R-values — operands whose full set of drivers is not yet known — are accumulated in a thread-local DeferredGraphWork buffer.
Phase 3 — Drain (sequential). After all Phase 2 tasks have completed, PendingRvalueQueue::drainPerTask collects the thread-local pending R-value buffers into a single queue, freeing each per-task buffer as it is consumed to keep peak memory down.
Phase 4 — R-value resolution (parallel or sequential). BuildPipeline::finalize() calls PendingRvalueQueue::resolve, which iterates over every pending R-value and connects it to its driver(s) in the graph. For each pending entry, emitEdgesFor first checks whether a State or Variable node exists for the referenced symbol and bounds; if so, a single edge is added. Otherwise, it walks the driver intervals that overlap the pending range (via ValueTracker::forEachDriverInterval) and emits an edge per driver, annotated with the precise sub-range the R-value actually reads. When the queue size exceeds BuilderOptions::parallelRValueThreshold (1000 by default) and parallel is true, resolveParallel groups entries by target node so each target's incoming edges are emitted from a single thread. Grouping is done in place: the queue is sorted by target pointer and a run-start vector indexes each contiguous run, avoiding an unordered_map<NetlistNode*,vector<size_t>> that could cost many MB of transient overhead on large designs.
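The in-place grouping used by the parallel path can be sketched as below: sort the queue by target pointer, then record where each contiguous run starts. The PendingRvalue layout and the groupByTarget name are assumptions made for this illustration and do not mirror the real PendingRvalueQueue.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

struct Node {}; // stand-in for NetlistNode

// Hypothetical pending entry: the node that reads, and the bit range it reads.
struct PendingRvalue {
    Node* target;
    int lo;
    int hi;
};

// Sorts `queue` by target pointer and returns the start index of every run of
// equal targets, followed by a sentinel equal to queue.size(). Run r spans
// [runStarts[r], runStarts[r + 1]) and can be handed to a single worker.
std::vector<std::size_t> groupByTarget(std::vector<PendingRvalue>& queue) {
    std::sort(queue.begin(), queue.end(),
              [](const PendingRvalue& a, const PendingRvalue& b) {
                  // std::less gives a total order over unrelated pointers.
                  return std::less<Node*>{}(a.target, b.target);
              });
    std::vector<std::size_t> runStarts;
    for (std::size_t i = 0; i < queue.size(); ++i)
        if (i == 0 || queue[i].target != queue[i - 1].target)
            runStarts.push_back(i);
    runStarts.push_back(queue.size());
    return runStarts;
}
```

Compared with a map from target to index vectors, the only extra storage is one index per distinct target, which is the memory saving the paragraph above refers to.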
Key helper classes (each lives in its own translation unit under source/):
- BuildPipeline: drives the four-phase build, owns deferredBlocks, and records timings into BuildProfile.
- NodeFactory: single point of construction for Port, Variable, State, Assignment, Conditional, Case, Merge, and Constant nodes, keeping VariableTracker and the per-node bookkeeping in sync.
- PendingRvalueQueue: owns the pending-R-value list and implements both the sequential and parallel resolution paths.
- PortConnectionHandler: materialises port nodes, records cuts from concat-shaped actuals, and emits port-connection edges.
- CanonicalBodyResolver: maps non-canonical instance bodies and their value symbols to their canonical counterparts so each instance of a multi-instantiated module gets its own per-bit routing (see Multi-instantiated modules).

Other key supporting types:
- ValueTracker / VariableTracker: interval-map-based structures that track which netlist nodes drive which bit ranges of each symbol.
- DriverMap: a thin wrapper around slang's IntervalMap, mapping bit ranges to driver lists.
- ExternalManager<T>: a handle-based allocator used because IntervalMap values must be trivially copyable.
- SymbolTable: per-graph intern table for SymbolReference, so every edge holds a pointer to a single shared entry rather than a copy of the name / hierarchical path / location.
- CutRegistry: side table mapping formal ports' internal symbols to the bit offsets at which external concats split them; consulted by port-node creation and by BitSliceList::pushLsp to keep paths through concat-shaped port connections bit-precise.

Phase 2 of netlist construction is parallelised using a BS::thread_pool (bundled with slang). During Phase 1 (the sequential collecting pass), procedural blocks and continuous assignments are not executed immediately but are instead pushed onto a deferredBlocks work list. In Phase 2 each deferred block is dispatched as an independent task to the thread pool, where it runs its own DataFlowAnalysis and merges results back into the shared graph.
Thread safety is achieved through several mechanisms:
- The compilation is frozen (Compilation::freeze()) before Phase 2, making it safe for concurrent reads. The Python bindings then call Compilation::unfreeze() before NetlistGraph.build so the builder can continue elaborating; slang-netlist freezes only for the analysis pass and runs its own sequential VisitAll before that.
- DirectedGraph::addNode is serialised by a single nodesMutex. DirectedGraph::addEdge uses a per-node edgeMutex on the source node (and on the target node when updating its inEdges), with a strict source-before-target locking order to avoid deadlock when two threads add reciprocal edges.
- ValueTracker and VariableTracker use per-slot locking, so updates to distinct symbols do not contend.
- Pending R-values are accumulated in thread-local DeferredGraphWork buffers during Phase 2 and merged sequentially in Phase 3, so no mutex is required around the queue itself.
- NetlistNode IDs are allocated with an atomic counter.

- PathFinder: finds a path between two nodes using depth-first search; returns a NetlistPath.
- CombLoops / CycleDetector: detects combinational loops by only traversing edges without clock sensitivity (EdgeKind::None).
- DepthFirstSearch: generic DFS template parameterised on a visitor and an edge predicate; used internally by PathFinder, CycleDetector, and NetlistGraph's fan-in/fan-out queries. The header lives under source/ rather than include/netlist/; it is not part of the public API.

- tools/driver/driver.cpp: the slang-netlist CLI binary. It links against the netlist library and exposes commands such as --report-registers, --comb-loops, --from/--to path queries, --netlist-dot graph export, and the --black-box pattern flag (see Black-boxed instances).
- bindings/python/pyslang_netlist.cpp: pybind11 Python module (pyslang_netlist); enabled with -DENABLE_PY_BINDINGS=ON.

All fetched automatically via CPM at configure time:
The DataFlowAnalysis pass extends slang's AbstractFlowAnalysis to compute reaching definitions for each variable at every program point within a procedural block. A reaching definition records which netlist node(s) last wrote to each bit range of a variable. This is richer than slang's built-in DefaultDFA, which only tracks whether a bit range has been driven (for diagnostic purposes such as detecting use-before-assign), without recording what drove it.
Reaching definitions are maintained in the AnalysisState as ValueDrivers — a per-symbol interval map from bit ranges to lists of DriverInfo (a netlist node paired with the AST expression that produced it). When an R-value reference is encountered, the analysis looks up the current reaching definitions for that symbol and adds edges from the defining nodes to the current node. When an L-value reference is encountered, the reaching definitions are updated to record the new definition. Non-blocking assignments (<=) are deferred and applied at the end of the procedural block via finalize().
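As a rough illustration of the bookkeeping described above, the sketch below tracks reaching definitions per symbol and bit range, with blocking writes applied immediately and non-blocking writes deferred to finalize(). It is a simplified model under stated assumptions: the ReachingDefs class, its method names, and the use of integer node IDs are hypothetical, and it ignores control-flow merging, which the real analysis handles through AbstractFlowAnalysis.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One reaching definition: bits [lo, hi] of a symbol were last written by `node`.
struct Definition {
    int lo;
    int hi;
    int node; // stand-in for a netlist node pointer
};

class ReachingDefs {
public:
    // Blocking write (L-value): kills older definitions on the bits it covers.
    void define(const std::string& symbol, int lo, int hi, int node) {
        std::vector<Definition> kept;
        for (const Definition& d : table[symbol]) {
            if (d.hi < lo || d.lo > hi) { kept.push_back(d); continue; }
            if (d.lo < lo) kept.push_back({d.lo, lo - 1, d.node}); // surviving low part
            if (d.hi > hi) kept.push_back({hi + 1, d.hi, d.node}); // surviving high part
        }
        kept.push_back({lo, hi, node});
        table[symbol] = std::move(kept);
    }

    // Non-blocking write: recorded now, applied at the end of the block.
    void defineNonBlocking(const std::string& symbol, int lo, int hi, int node) {
        pending.emplace_back(symbol, Definition{lo, hi, node});
    }

    void finalize() {
        for (const auto& p : pending)
            define(p.first, p.second.lo, p.second.hi, p.second.node);
        pending.clear();
    }

    // R-value read: every node whose definition overlaps bits [lo, hi].
    std::set<int> drivers(const std::string& symbol, int lo, int hi) const {
        std::set<int> result;
        auto it = table.find(symbol);
        if (it == table.end()) return result;
        for (const Definition& d : it->second)
            if (d.hi >= lo && d.lo <= hi) result.insert(d.node);
        return result;
    }

private:
    std::map<std::string, std::vector<Definition>> table;
    std::vector<std::pair<std::string, Definition>> pending;
};
```

In this model an edge would be added from each node returned by drivers() to the node performing the read.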
Nodes represent operations or state, and edges represent data dependencies. An edge points from producer to consumer, i.e. data flows from the source node to the target node.
Node types:
- Port: a module port (input or output), annotated with direction, hierarchical path, and bit bounds.
- Variable: a declared variable (wire, reg, logic, etc.), annotated with hierarchical path and bit bounds.
- State: the persistent value of a register, created whenever a variable is driven on a clocked edge inside a procedural block, so combinational consumers downstream see the registered value rather than the raw procedural drivers.
- Assignment: a continuous or procedural assignment operation.
- Conditional: an if branch point within a procedural block.
- Case: a case branch point within a procedural block.
- Merge: a synthetic join point where control flow from branches reconverges.
- Constant: a literal or constant-foldable RHS, including the zero-extension bits of a widening conversion. Carries the ConstantValue and the bit width it drives, and acts as a leaf source feeding the consuming Assignment or Port.

Edge annotations:
- symbol: the SymbolReference identifying the driven variable.
- bounds: the DriverBitRange indicating which bits are carried by the dependency.
- edgeKind: ast::EdgeKind (e.g. PosEdge, NegEdge, None) recording clock sensitivity, used to distinguish combinational from sequential edges.
- disabled: flag used to exclude edges during analysis (e.g. by CycleDetector).

When BuilderOptions::resolveAssignBits is true (the default), DataFlowAnalysis::handle(AssignmentExpression) and NetlistBuilder::handlePortConnection decompose both sides of the assignment or connection into a BitSliceList and zip them onto a common cut-point grid via alignSegments. The cut-point grid is the sorted union of every slice boundary on either side. Taking the union guarantees that each resulting segment falls entirely within a single slice on the LHS and a single slice on the RHS, so operands can be matched bit-for-bit. Each segment becomes its own Assignment node, with the LHS driven from the LSPs that fall within the segment and RHS edges added only from LSPs that actually contribute to those bits.
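The cut-point grid can be illustrated with a small sketch that takes the slice ranges on each side and returns the aligned segments. The Range alias and the alignOnGrid function are hypothetical names for this example; the sketch assumes both sides cover the same contiguous bit span and says nothing about how the real alignSegments represents slices.

```cpp
#include <cstdint>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Range = std::pair<std::uint32_t, std::uint32_t>; // inclusive [lo, hi]

// Builds the sorted union of slice boundaries from both sides and returns the
// segments between consecutive boundaries. Assumes lhs and rhs each tile the
// same contiguous bit span without gaps.
std::vector<Range> alignOnGrid(const std::vector<Range>& lhs,
                               const std::vector<Range>& rhs) {
    std::set<std::uint32_t> boundaries; // each entry is the low bit of a segment
    auto addSide = [&boundaries](const std::vector<Range>& side) {
        for (const Range& r : side) {
            boundaries.insert(r.first);
            boundaries.insert(r.second + 1); // one past the slice's high bit
        }
    };
    addSide(lhs);
    addSide(rhs);

    std::vector<Range> segments;
    if (boundaries.empty()) return segments;
    for (auto it = boundaries.begin(), next = std::next(it);
         next != boundaries.end(); ++it, ++next)
        segments.push_back({*it, *next - 1});
    return segments;
}
```

For example, slices [0,2] and [3,7] on one side against [0,3] and [4,7] on the other give the segments [0,2], [3,3], and [4,7], each of which lies inside exactly one slice per side.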
BitSliceList::build (in source/BitSliceList.cpp) walks an expression and recognises four structural kinds; everything else collapses to a single BitSliceSource::Kind::Opaque slice covering the expression's getSelectableWidth():
- Concatenation: operands are appended in reverse so that the LSB operand has concatLo == 0.
- Replication: only when the count is a constant non-negative int64_t. A zero-count replication produces no slices.
- Conversion: equal-width conversions pass through; widening conversions emit the operand's slices first followed by a BitSliceSource::Kind::Padding slice for the extension bits, which produce no edges. Narrowing conversions cannot be represented in a pass-through slice list and fall back to opaque.
- ConditionalOp (?:): only when both arms and the result have the same selectable width and the condition has no pattern. Both arms are re-segmented onto a shared cut-point grid and the condition expression is attached as an opaque source on every unified slice so that it shows up as a dependency of every bit.

For port connections, NetlistBuilder::buildPortSliceList builds the formal side from the Port nodes already registered for the symbol and aligns it against the actual-side slice list the same way. Each segment has one or more BitSliceSource::Kind::PortNode sources, depending on how the port was split; an inout port, for instance, has both an input-side and an output-side node at overlapping ranges.
When BuilderOptions::propCutsAcrossPorts is true (the default), bit boundaries introduced by an external concat are propagated inward across the port so that the formal port and the module's internal assignments keep the same per-bit granularity.
The mechanism is a CutRegistry side-table that maps each formal port's internal ValueSymbol to a sorted set of bit offsets. It is populated during Phase 1 and read during Phase 2:
- Before an InstanceSymbol body is visited, NetlistBuilder::recordCutsFromPortConnections walks every actual expression with collectActualCuts and registers the implied cut offsets against the corresponding formal port's internal symbol. Concatenation operand boundaries, replication unit boundaries, and the operand/padding boundary inside a widening conversion all contribute cuts. LSP-shaped operands also contribute any cuts already registered against their own root, so cuts flow down a multi-level hierarchy of pass-through ports.
- NetlistBuilder::handle(PortSymbol) calls materializePortNodes, which splits each driver's bit range at every registered cut and emits one Port node per segment instead of a single whole-port node.
- BitSliceList::build threads the registry into pushLsp; DataFlowAnalysis::handle(AssignmentExpression) supplies it. When an LSP's root has registered cuts intersecting the LSP's bounds, pushLsp emits a sub-slice per cut segment instead of a single slice covering the whole LSP. Each sub-slice keeps the full-LSP srcLo / srcHi so the offset math in driveLhsLspSegment / driveRhsLspSegment still recovers the correct LSP-internal bit.

Together, these give bit-precise paths through patterns such as x ux(.x({b,a}), .y({d,c})), where the formal port, the internal assignment, and the actual concat are all split at the same cut offsets, and the resulting graph has no cross-bit edges.
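Splitting a bit range at registered cut offsets, as the port-node and LSP steps above both do, can be sketched as follows. The splitAtCuts function and the Range alias are hypothetical; the sketch assumes a cut at offset c marks the boundary between bit c-1 and bit c, which is one plausible reading of the registry's offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Range = std::pair<std::uint32_t, std::uint32_t>; // inclusive [lo, hi]

// Splits `range` at every cut that falls strictly inside it. A cut at offset c
// is taken to separate bit c-1 from bit c (an assumption of this sketch).
std::vector<Range> splitAtCuts(Range range, const std::set<std::uint32_t>& cuts) {
    assert(range.first <= range.second);
    std::vector<Range> segments;
    std::uint32_t lo = range.first;
    // upper_bound skips cuts at or below the range's low bit, which split nothing.
    for (auto it = cuts.upper_bound(range.first);
         it != cuts.end() && *it <= range.second; ++it) {
        segments.push_back({lo, *it - 1});
        lo = *it;
    }
    segments.push_back({lo, range.second});
    return segments;
}
```

With an 8-bit port connected to a concat of two 4-bit operands, the single registered cut at offset 4 turns the range [0,7] into [0,3] and [4,7], so one port node per segment can be emitted.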
The flag is only meaningful when resolveAssignBits is true. With cut propagation off, port nodes and module-internal assignments stay whole-word at port boundaries, matching the behaviour of releases before this feature landed; the CLI exposes this as --no-prop-cuts-across-ports.
Slang's elaborator deduplicates equivalent instance bodies through InstanceCacheKey: when two instances of the same module are structurally identical (same parameters, ports, and members in the same order), the second instance's InstanceBodySymbol is marked non-canonical and points at the first via setCanonicalBody. AnalysisManager::getDrivers then stores drivers only against the canonical body's value symbols — querying a non-canonical body returns an empty driver list. With no compensation, only one instance of a multi-instantiated module would receive any port-node or internal-assignment connectivity.
NetlistBuilder always routes driver lookups for non-canonical bodies to the corresponding symbol in the canonical body, so each instance gets its own port nodes, assignment nodes, and per-bit routing. Two memoized maps drive this:
- canonicalValueCache maps a value symbol from a non-canonical body to the corresponding value symbol in the canonical body. materializePortNodes and handle(VariableSymbol) consult this before calling AnalysisManager::getDrivers.
- canonicalBodyCache maps an instance body to its canonical counterpart. Unlike slang's getCanonicalBody, which only populates the outermost non-canonical instance, this cache covers every nested body too.

Both caches are populated lazily by getCanonicalBody, which walks up the body chain looking for an anchor, that is, a body whose canonical is already known, either from slang's setCanonicalBody or from a previous call. From the anchor, populatePairedBodies traverses the anchor and its canonical in lockstep, recording every paired ValueSymbol in canonicalValueCache and every paired InstanceBodySymbol in canonicalBodyCache. The walk recurses through GenerateBlockSymbol, GenerateBlockArraySymbol, InstanceArraySymbol, and nested instance bodies, so a single anchor lookup populates the entire subtree in one pass.
Positional matching is sound because InstanceCacheKey requires identical content (parameters, ports, members in the same order) before linking a canonical body. Nested non-canonical bodies are not given a direct canonical pointer by slang — once tryApplyFromCache returns true on the outer instance, the visitor stops descending — so this structural pairing is the only way to discover the deeper correspondences.
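The lockstep traversal can be pictured with a toy model in which a scope holds an ordered list of value names and an ordered list of child scopes. Everything here is a stand-in invented for the example: Scope, PairingCaches, and pairBodies are hypothetical and much simpler than slang's symbol hierarchy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for an instance body or generate block.
struct Scope {
    std::vector<std::string> values; // stand-ins for value symbols, in declaration order
    std::vector<Scope> children;     // nested scopes, in declaration order
};

struct PairingCaches {
    std::unordered_map<const std::string*, const std::string*> canonicalValue;
    std::unordered_map<const Scope*, const Scope*> canonicalBody;
};

// Walks `body` and `canonical` in lockstep and records each positional pair.
// Positional matching relies on both scopes having identical structure.
void pairBodies(const Scope& body, const Scope& canonical, PairingCaches& caches) {
    caches.canonicalBody[&body] = &canonical;
    const std::size_t nValues = std::min(body.values.size(), canonical.values.size());
    for (std::size_t i = 0; i < nValues; ++i)
        caches.canonicalValue[&body.values[i]] = &canonical.values[i];
    const std::size_t nChildren = std::min(body.children.size(), canonical.children.size());
    for (std::size_t i = 0; i < nChildren; ++i)
        pairBodies(body.children[i], canonical.children[i], caches);
}
```

A single call on the outermost pair fills both maps for the whole subtree, which is the property the anchor lookup depends on.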
Both caches are populated only during Phase 1's sequential AST traversal; Phase 2's parallel DFA does not touch them, so they need no synchronization. The caches are cleared between builds.
This pairing is unconditional: per-instance routing is the default behaviour, since silently dropping every non-canonical instance was unsound. The cost is that materializePortNodes creates one set of port nodes per instance of every multi-instantiated module, and shared signals (e.g. a clock or reset) drive each instance via its own edge. To keep that affordable, DirectedGraph::addEdge's findEdgeTo dedup uses a per-node outEdgeIndex map once a node's out-degree exceeds outEdgeIndexThreshold, so high-fan-out drivers stay close to O(1) amortised per added edge instead of degrading to quadratic.
BuilderOptions::blackBoxes carries a list of wildcard patterns (* and ?, via Utilities::wildcardMatch) that are matched against each InstanceSymbol's definition name and against its hierarchical instance path; a single match flags the instance as a black box. The CLI exposes this via --black-box (repeatable) and the Python bindings accept it as a black_boxes keyword argument on NetlistGraph.build.
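A matcher for patterns of this shape might look like the sketch below. It is not the implementation of Utilities::wildcardMatch: the function names wildcardMatchSketch and isBlackBox are hypothetical, and the sketch assumes that * matches any run of characters (including hierarchy separators) and ? matches exactly one character.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Glob-style match: '*' matches any (possibly empty) run, '?' exactly one char.
bool wildcardMatchSketch(std::string_view pattern, std::string_view text) {
    const std::size_t npos = std::string_view::npos;
    std::size_t p = 0, t = 0;
    std::size_t starP = npos; // position of the most recent '*' in pattern
    std::size_t starT = 0;    // text position that '*' is currently matched up to
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++; // tentatively let '*' match nothing
            starT = t;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (starP != npos) {
            p = starP + 1; // backtrack: let the last '*' absorb one more char
            t = ++starT;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p; // trailing stars match empty
    return p == pattern.size();
}

// An instance is flagged if any pattern matches its definition name or its path.
bool isBlackBox(const std::vector<std::string>& patterns,
                std::string_view definitionName, std::string_view instancePath) {
    for (const std::string& pattern : patterns)
        if (wildcardMatchSketch(pattern, definitionName) ||
            wildcardMatchSketch(pattern, instancePath))
            return true;
    return false;
}
```

Under these assumptions, a pattern such as ram_* would flag every instance of a module whose name starts with ram_, while top.*.u_mem would select instances by hierarchical path.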
In NetlistBuilder::handle(InstanceSymbol const&), a matched instance still has its formal port nodes materialised via PortConnectionHandler::materializePortNodes — the parent's port wiring needs somewhere to terminate — but symbol.body.visit(*this) is skipped, so no internal variables, assignments, procedural blocks, or sub-instances of the matched module contribute to the graph. The external port-connection assignments are still emitted, so fan-in and fan-out queries terminate cleanly at the black-box boundary.
Cross-port cut propagation still runs first, so a concat-shaped actual against a black-boxed instance produces the same per-bit formal port nodes it would for a fully-elaborated instance.
The bit-aligned path falls back to handleAssignmentLegacy / handlePortConnectionLegacy in three situations:
- resolveAssignBits is false. The CLI exposes this as --no-resolve-assign-bits and the Python bindings as the resolve_assign_bits keyword argument to NetlistGraph.build. With the flag off, BitSliceList::build returns a single opaque slice covering the whole expression, so every LSP on one side of an assignment fans into every LSP on the other side. This matches the behaviour of releases before bit-aligned resolution landed.

Beyond the fallbacks, the structural decomposition itself does not elaborate operators. Any expression that is not a concatenation, replication, equal-width ?:, or equal-width or widening conversion is opaque, and every LSP found inside it fans into the full bit range of the slice. In particular, y = a & b still cannot be resolved bit-by-bit: the & operator is opaque, so the entire RHS is one slice spanning all bits of y, and bit 0 of y is recorded as depending on every bit of a and b. The same applies to other bitwise, arithmetic, relational, reduction, streaming-concatenation, function-call, non-constant select, and pattern-bearing conditional expressions.
In every fallback or opaque case the resulting graph is still sound — no real dependency is missed — but queries over the affected nodes may report paths that a deeper Boolean analysis could rule out.