How Automotive Teams Can Validate Real-Time AI with WCET Tools
2026-03-11
12 min read

A practical workshop showing automotive engineers how to integrate ML inference timing checks into VectorCAST + RocqStat with sample harnesses, scripts, and CI gates.

Stop guessing — verify ML inference timing like safety-critical code

Automotive teams building perception and decisioning stacks struggle with one recurring problem: you can functional-test an ML model until your CI is green, but that doesn't prove it will meet hard real-time deadlines on target hardware. A missed inference deadline can mean missed braking or missed lane-keeping — an unacceptable safety risk. In 2026, as Vector Informatik integrates RocqStat into the VectorCAST toolchain, teams finally have a practical path to add WCET-style inference timing validation to existing verification workflows.

The short answer (what you’ll get from this workshop)

This hands-on guide shows automotive engineers how to: (1) measure ML inference timing on embedded targets, (2) run statistical and static WCET checks with RocqStat, (3) feed timing data into VectorCAST test runs, and (4) gate CI with timing assertions. You’ll get sample C harnesses for embedded cycle counting, Python orchestration scripts to run VectorCAST and RocqStat, and CI snippets for automated verification.

Why this matters now

  • In January 2026 Vector Informatik acquired RocqStat (StatInf technology) to bring advanced timing analysis into VectorCAST. This is a turning point: timing analysis is moving from siloed research tools into mainstream software verification flows.
  • Automotive software stacks are growing in ML content (perception, sensor fusion, ADAS). Regulators and OEMs now expect demonstrable timing safety, not just accuracy metrics.
  • Toolchain consolidation reduces friction: unified tooling makes it realistic to include WCET testing in CI/CD pipelines rather than as a last-minute manual check before release.

High-level strategy: map WCET concepts to ML inference

WCET testing in embedded systems traditionally focuses on deterministic code paths (task-level analysis, cache effects, interrupts). ML inference introduces new challenges:

  • Data-dependent timing: input size, network topology, activation sparsity and branchy preprocessing can change execution time.
  • Hardware heterogeneity: CPU vs. DSP vs. NPU behavior and drivers vary widely.
  • Non-determinism: thermal throttling, JIT kernels, and background OS activity add variability.

The goal is to turn those challenges into testable claims: establish a defensible worst-case bound (or statistically confident bound) for inference latency under the target runtime configuration, and automate that check in VectorCAST flows using RocqStat for analysis.
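To make "statistically confident bound" concrete before diving into tooling: a measurement-based probabilistic bound can be sketched in a few lines. The exponential tail fit below is an illustration only (RocqStat's actual statistical machinery is more sophisticated), and the `tail_fraction` threshold is an arbitrary choice for the sketch.

```python
# pwcet_sketch.py - illustrative probabilistic WCET bound (NOT RocqStat's method)
import math

def exceedance_bound(samples_us, p_target=1e-6, tail_fraction=0.1):
    """Fit an exponential tail to the largest samples (peaks-over-threshold)
    and extrapolate a latency bound exceeded with probability <= p_target."""
    xs = sorted(samples_us)
    n = len(xs)
    k = max(2, int(n * tail_fraction))   # number of tail samples used for the fit
    u = xs[n - k]                        # threshold at the (1 - tail_fraction) quantile
    excesses = [x - u for x in xs[n - k:]]
    beta = sum(excesses) / len(excesses) # MLE scale of the exponential excess model
    p_u = k / n                          # empirical P(latency > u)
    # P(X > u + x) ~= p_u * exp(-x / beta), solved for x at p_target:
    return u + beta * math.log(p_u / p_target)
```

The point of the sketch is the shape of the claim: a bound tied to an explicit exceedance probability, derived from observed traces, rather than a bare maximum.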

Workshop prerequisites

  • VectorCAST installed and configured for your project (unit/integration tests running)
  • RocqStat, or a RocqStat-enabled VectorCAST build (early-access post-acquisition integration, or a separate StatInf/RocqStat install)
  • Target board or hardware-in-the-loop (HIL) environment with a method to collect high-precision timestamps (cycle counter, DWT, or hardware timer)
  • Model runtime (ONNX Runtime, TensorRT, ArmNN, vendor NPU SDK) built for target
  • Python 3.9+ for orchestration scripts and CI glue

Step 1 — Prepare a deterministic test harness

Create a small, focused C test harness that loads inputs, runs inference, and records high-resolution timing. The harness should:

  • Isolate preprocessing and model invocation paths so they can be timed separately.
  • Warm up the model to avoid one-time JIT or cache effects on the first measurement.
  • Allow parameterized test inputs so VectorCAST can run multiple cases (edge cases, high-compute inputs, adversarial-sized inputs).

Example: ARM Cortex-M cycle-counting harness (C)

Use the DWT cycle counter on Cortex-M for precise cycle timing. This snippet shows the core measurement loop; wrap it into a VectorCAST test case file.

/* inference_timing.c - simplified example for Cortex-M */
#include <stdint.h>
#include <stdio.h>

/* Enable and reset the DWT cycle counter (platform-specific, requires TRCENA) */
static inline void enable_cycle_counter(void) {
    *((volatile uint32_t*)0xE000EDFC) |= (1u << 24); /* DEMCR.TRCENA */
    *((volatile uint32_t*)0xE0001004) = 0;           /* reset DWT_CYCCNT */
    *((volatile uint32_t*)0xE0001000) |= 1u;         /* DWT_CTRL.CYCCNTENA */
}

static inline uint32_t read_cycles(void) {
    return *((volatile uint32_t*)0xE0001004); /* DWT_CYCCNT */
}

/* Replace with your model invocation */
extern void model_infer(const float* input, float* output);

/* Times `iterations` back-to-back inferences; divide *out_cycles by
 * `iterations` for a per-inference mean. DWT_CYCCNT is 32-bit, so keep
 * the measured window short enough to avoid wrapping more than once
 * at your clock rate. */
int run_inference_timing(const float* input, float* output,
                         uint32_t iterations, uint32_t* out_cycles) {
    enable_cycle_counter();

    /* Warm-up to absorb one-time cache/initialization effects */
    for (uint32_t i = 0; i < 5; ++i) model_infer(input, output);

    uint32_t start = read_cycles();
    for (uint32_t i = 0; i < iterations; ++i) {
        model_infer(input, output);
    }
    uint32_t end = read_cycles();

    *out_cycles = end - start; /* unsigned subtraction is safe across one wrap */
    return 0;
}

/* Example VectorCAST test wrapper would call run_inference_timing and assert */

Step 2 — Define test cases targeting worst-case patterns

You need a test matrix that stresses the model and runtime. At minimum include:

  • Nominal inputs: typical production sensor data
  • High-compute inputs: large objects, dense features, inputs that maximize activation counts
  • Adversarial / corner inputs: crafted to trigger rare branches or pathological behavior
  • Cold start: first inference after power/reset (to capture driver/hotplug costs)
  • Concurrency cases: co-running tasks, interrupts enabled

Capture these as separate VectorCAST test cases (one input per case). VectorCAST’s test database will give you traceability between test ID, model version, and timing result.
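One way to make these cases reproducible is to generate the input vectors as raw binary files the harness can load. The sketch below is illustrative: the input shape, the file naming, and the fill patterns for each case are assumptions to adapt to your model.

```python
# gen_test_inputs.py - sketch: write raw float32 input files, one per test case.
# INPUT_LEN and the fill patterns are hypothetical; match your model's input.
import random
import struct

INPUT_LEN = 3 * 224 * 224  # hypothetical model input size (CHW image)

def write_input(path, values):
    """Serialize a flat list of floats as little-endian float32."""
    with open(path, 'wb') as f:
        f.write(struct.pack(f'<{len(values)}f', *values))

random.seed(42)  # fixed seed so test vectors are reproducible across runs
cases = {
    'nominal':      [random.gauss(0.0, 0.1) for _ in range(INPUT_LEN)],
    'high_compute': [1.0] * INPUT_LEN,  # dense, non-zero activations
    'edge_case':    [random.choice((-1e6, 1e6)) for _ in range(INPUT_LEN)],
}
for name, values in cases.items():
    write_input(f'input_{name}.bin', values)
```

Each file then maps one-to-one onto a VectorCAST test case, which preserves the traceability between test ID and input.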

Step 3 — Collect and normalize timing measurements

Raw cycle counts are useful, but you should normalize and process measurements before handing them to RocqStat:

  1. Convert cycles to microseconds using the CPU clock frequency.
  2. Remove outliers that are caused by obvious interrupts or unrelated background jobs (but keep a log of removed samples).
  3. Compute descriptive stats: min, max, median, mean, standard deviation, and percentiles (95th, 99th).

Example Python snippet to process raw timings produced by the above harness:

# process_timings.py
import json
import statistics

CPU_FREQ_HZ = 200_000_000  # example; use your target's measured clock

def cycles_to_us(cycles):
    return cycles / CPU_FREQ_HZ * 1e6

def percentile(sorted_samples, p):
    """Nearest-rank percentile; assumes sorted_samples is non-empty."""
    idx = max(0, int(p * len(sorted_samples)) - 1)
    return sorted_samples[idx]

with open('timings.json') as f:
    samples = json.load(f)['cycles']

us_samples = [cycles_to_us(c) for c in samples]

# Drop gross outliers (e.g. interrupt hits), but record how many were removed
mean, stdev = statistics.mean(us_samples), statistics.stdev(us_samples)
clean = sorted(s for s in us_samples if s < mean + 5 * stdev)

report = {
    'count': len(clean),
    'removed': len(us_samples) - len(clean),
    'min_us': clean[0],
    'p50_us': statistics.median(clean),
    'p95_us': percentile(clean, 0.95),
    'p99_us': percentile(clean, 0.99),
    'max_us': clean[-1],
}
print(json.dumps(report, indent=2))

Step 4 — Introduce RocqStat for WCET-style analysis

RocqStat provides methods for deriving worst-case bounds from observed execution traces and for performing static analysis where applicable. With Vector's acquisition, expect deeper integration into VectorCAST — but the workflow remains:

  1. Feed normalized timing traces (per test-case) into RocqStat as input workloads.
  2. Run RocqStat’s statistical inference to compute a conservatively safe upper bound (e.g., bound with 1e-6 probability of exceedance).
  3. Use RocqStat’s reports to form timing assertions and link them to VectorCAST test IDs for traceability and safety evidence.

Example RocqStat CLI invocation (template)

# rocqstat_cli_example.sh
# Placeholder: replace with actual rocqstat CLI installed in your environment
rocqstat analyze \
  --input timings_normed.json \
  --confidence 1e-6 \
  --output rocqstat_report.json

The output contains a statistically-backed WCET estimate you can assert against in your VectorCAST test case or CI checker.

Step 5 — Connect results into VectorCAST

VectorCAST is your test harness and traceability store. There are two integration approaches:

  • Inline timing assertion: add assertion logic inside VectorCAST test cases to fail if observed time > bound from RocqStat.
  • External checker: run RocqStat after VectorCAST finishes, then use VectorCAST's external tool or REST APIs to attach results and mark test failures/notes accordingly.

Example: a VectorCAST test wrapper reads rocqstat_report.json and calls vectorcast_test_case_fail() if the inference P99 > allowed deadline. Replace the pseudocode with your project-specific VectorCAST API routines.
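A hedged sketch of that external-checker wrapper follows. The `vectorcast_test_case_fail` hook and the `wcet_us` report key are placeholders, not real VectorCAST or RocqStat APIs; substitute your project's routines.

```python
# vc_timing_check.py - sketch of the external-checker integration approach.
import json

def vectorcast_test_case_fail(test_id, message):
    """Hypothetical stand-in for a project-specific VectorCAST API call."""
    print(f'FAIL {test_id}: {message}')
    return False

def check_report(report_path, deadline_us, test_id='inference_timing'):
    """Compare the RocqStat bound against the deadline for one test case."""
    with open(report_path) as f:
        report = json.load(f)
    wcet = report['wcet_us']  # assumed report schema
    if wcet > deadline_us:
        return vectorcast_test_case_fail(
            test_id, f'WCET {wcet}us exceeds deadline {deadline_us}us')
    print(f'PASS {test_id}: WCET {wcet}us within {deadline_us}us')
    return True
```

The return value lets an outer script mark the corresponding VectorCAST test as passed or failed and attach the report for traceability.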

Step 6 — CI integration (example GitHub Actions)

Automate the whole pipeline: build, deploy to target (or QEMU HIL), run VectorCAST tests that execute the harness, collect timing traces, run RocqStat, and fail the build if timing assertions fail.

# .github/workflows/wcet-check.yml (simplified)
name: WCET Inference Check
on: [push, pull_request]

jobs:
  wcet-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build artifacts
        run: ./build_toolchain.sh
      - name: Deploy to target/hil
        run: ./deploy_to_target.sh
      - name: Run VectorCAST tests
        run: |
          vectorcast_run --project vc_project --test-suite inference_timing
          # Bundle produced timings
          tar -czf timings.tar.gz ./vc_outputs/timings.json
      - name: Upload timing artifacts
        uses: actions/upload-artifact@v4
        with:
          name: timings
          path: timings.tar.gz
      - name: Run RocqStat analysis
        run: |
          tar -xzf timings.tar.gz
          rocqstat analyze --input timings.json --confidence 1e-6 --output report.json
      - name: Gate build on timing
        run: |
          python3 check_timing_gate.py report.json --deadline 2000  # microseconds
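The timing-gate script called in the final step can be as small as the sketch below. It assumes the RocqStat report exposes a `wcet_us` field, which is a placeholder schema; adjust the key to whatever your report actually emits.

```python
# check_timing_gate.py - sketch matching the CI invocation above.
import argparse
import json

def main(argv=None):
    parser = argparse.ArgumentParser(
        description='Fail the build if WCET exceeds the deadline.')
    parser.add_argument('report', help='RocqStat JSON report')
    parser.add_argument('--deadline', type=float, required=True,
                        help='deadline in microseconds')
    args = parser.parse_args(argv)

    with open(args.report) as f:
        wcet_us = json.load(f)['wcet_us']  # assumed report schema

    if wcet_us > args.deadline:
        print(f'TIMING GATE FAILED: {wcet_us}us > {args.deadline}us')
        return 2
    print(f'timing gate passed: {wcet_us}us <= {args.deadline}us')
    return 0
```

Wire it up with `if __name__ == '__main__': sys.exit(main())` so the non-zero return code fails the CI job.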

Dealing with non-determinism and JITs

ML runtimes often include JIT kernels and lazy initialization. Best practices:

  • Separate cold-start and steady-state checks: verify and document both. Often your safety deadline applies to steady-state only, but cold-start must also be bounded for system-level requirements.
  • Warm the runtime: run repeated inferences prior to timing measurement to activate JITs.
  • Pin CPU frequency: disable DVFS during tests or record frequency data to normalize results.
  • Repeat runs: collect large sample counts so RocqStat has sufficient data to infer rare-event bounds.
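When you cannot pin the clock, recording the frequency in effect alongside each sample lets you normalize afterwards and discard throttled measurements. A minimal sketch, where the sample format and the tolerance are assumptions:

```python
# normalize_by_freq.py - sketch: handle DVFS when frequency pinning is impossible.

def normalize_us(samples):
    """samples: list of (cycles, freq_hz) tuples -> latency in microseconds."""
    return [cycles / freq_hz * 1e6 for cycles, freq_hz in samples]

def filter_throttled(samples, nominal_hz, tolerance=0.01):
    """Drop samples taken while the clock deviated from nominal (throttling).
    Returns (kept_samples, dropped_count) so the removals can be logged."""
    kept = [(c, f) for c, f in samples
            if abs(f - nominal_hz) / nominal_hz <= tolerance]
    return kept, len(samples) - len(kept)
```

Logging the dropped count matters for the audit trail: silently discarded samples would undermine the statistical bound.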

Example: Python orchestration to run tests, collect and call RocqStat

# orchestrate.py (simplified)
import json
import subprocess
import sys

CPU_FREQ_HZ = 200_000_000   # target clock used for cycle -> microsecond conversion
DEADLINE_US = 2000          # timing budget for one inference

# 1) Start VectorCAST test run
subprocess.run(['vectorcast_run', '--project', 'vc_project',
                '--test-suite', 'inference_timing'], check=True)

# 2) Collect output
with open('vc_outputs/timings.json') as f:
    timings = json.load(f)

# 3) Normalize/process (reuse earlier script) and write file for RocqStat
# ... full processing omitted for brevity ...
with open('timings_normed.json', 'w') as f:
    json.dump({'samples': [c / CPU_FREQ_HZ * 1e6 for c in timings['cycles']]}, f)

# 4) Run RocqStat
subprocess.run(['rocqstat', 'analyze', '--input', 'timings_normed.json',
                '--confidence', '1e-6', '--output', 'rocq_report.json'], check=True)

# 5) Evaluate result and exit appropriately
with open('rocq_report.json') as f:
    r = json.load(f)
if r['wcet_us'] > DEADLINE_US:
    print('Timing gate failed')
    sys.exit(2)
print('Timing gate passed')

Safety & compliance: ISO 26262 and audit trails

For ISO 26262 and other safety processes, you need reproducible evidence. VectorCAST already stores test artifacts and traceability. Add these pieces for timing evidence:

  • Raw traces: keep original cycle dumps and timestamps.
  • Processing scripts: versioned Python/R scripts used to normalize and filter samples.
  • RocqStat reports: include confidence levels and parameters used for analysis.
  • VectorCAST links: associate test case IDs, model version, and build IDs with each timing report.
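A lightweight way to make this evidence package tamper-evident is to hash every artifact into a manifest keyed by model and build. A sketch, where the file layout and metadata fields are assumptions:

```python
# evidence_manifest.py - sketch: hash timing artifacts into an audit manifest.
import hashlib

def sha256_of(path):
    """Stream a file through SHA-256 so large traces don't load into memory."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(paths, model_version, build_id):
    """Bundle artifact hashes with the identifiers auditors will ask for."""
    return {
        'model_version': model_version,
        'build_id': build_id,
        'artifacts': {p: sha256_of(p) for p in paths},
    }
```

Store the manifest next to the VectorCAST test results so a reviewer can verify that the raw traces behind a timing claim were never altered.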

Interpreting results: what a “fail” means

A timing gate should be conservative but actionable. If a test fails:

  • Investigate the input pattern that caused the exceedance (VectorCAST should show the test input).
  • Check for environmental causes: CPU frequency throttling, thermal events, background tasks.
  • Consider micro-optimizations (quantization, kernel fusion, better scheduler), or move the model to a different compute domain (NPU) with stricter isolation.
  • Document mitigation and rerun with the same inputs to confirm a permanent fix.

Advanced strategies and future-proofing (2026+)

As VectorCAST and RocqStat converge, your team should invest in the following to scale timing validation:

  • Model-aware static analysis: combine RocqStat’s statistical bounds with control-flow/graph-level static analysis of kernels for even tighter bounds.
  • Hardware-in-the-loop farms: run large-sample experiments across device variants to detect hardware-specific pathological cases.
  • Automated regression baselines: store and diff timing baselines in the CI system so any model or runtime change triggers timing review.
  • Template libraries: create VectorCAST templates for common ML runtimes (ONNX, TensorRT, ArmNN) so teams can plug-and-play timing checks.
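The regression-baseline idea reduces to a small diff over per-test WCET numbers. A sketch, assuming baselines are stored as a mapping from test ID to microseconds and that a 5% growth threshold suits your project:

```python
# timing_baseline_diff.py - sketch: flag timing regressions against a baseline.

def regressions(baseline, current, tolerance=0.05):
    """Return test IDs whose WCET grew by more than `tolerance` (a fraction).
    Tests missing from `current` are skipped rather than flagged."""
    flagged = []
    for test_id, base_us in baseline.items():
        cur_us = current.get(test_id)
        if cur_us is not None and cur_us > base_us * (1 + tolerance):
            flagged.append(test_id)
    return flagged
```

Run this in CI after RocqStat so any model or runtime change that degrades timing triggers a review before merge.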

Example test case matrix (minimum necessary coverage)

  1. Nominal (N=1000): expected sensor inputs
  2. High-load (N=1000): busy scenes with max compute
  3. Edge-case (N=500): adversarial/corner inputs
  4. Cold-start (N=50): first inference after reset
  5. Concurrent (N=200): inference while background tasks run
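Encoding this matrix as data lets the orchestration script iterate over it instead of hard-coding cases; a minimal sketch:

```python
# test_matrix.py - the coverage matrix above, encoded for orchestration scripts.
TEST_MATRIX = [
    {'case': 'nominal',    'samples': 1000, 'desc': 'expected sensor inputs'},
    {'case': 'high_load',  'samples': 1000, 'desc': 'busy scenes with max compute'},
    {'case': 'edge_case',  'samples': 500,  'desc': 'adversarial/corner inputs'},
    {'case': 'cold_start', 'samples': 50,   'desc': 'first inference after reset'},
    {'case': 'concurrent', 'samples': 200,  'desc': 'inference with background tasks'},
]

def total_samples(matrix):
    """Total inference count a full run will collect (useful for HIL budgeting)."""
    return sum(row['samples'] for row in matrix)
```

Keeping the matrix in one versioned file also gives auditors a single place to see what coverage a timing claim rests on.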

Actionable takeaways

  • Instrument early: add cycle counters and timing harnesses during integration, not at the end of development.
  • Use RocqStat for defensible bounds: statistical WCET is the right tool when exact static WCET is infeasible for ML workloads.
  • Automate in CI: gate merges on timing assertions to prevent regressions.
  • Keep artifacts: raw traces + analysis scripts are necessary evidence for safety audits.

Real-world example: small case study

A Tier-1 supplier integrated the above flow in late 2025 proof-of-concept runs. They added a DWT-based harness for an Arm Cortex-M + NPU board, collected 10k inferences across typical and adversarial scenes, and used RocqStat to produce a P(>deadline) < 1e-6 bound for steady-state inference. Integrating the check into VectorCAST CI prevented a runtime regression introduced by a third-party kernel update — the CI failed and the issue was traced to a new kernel that increased memory-copy overhead. The fix (kernel update and model input alignment) eliminated the exceedance and the supplier shipped the change with full timing evidence in their ISO 26262 artifact package.

Limitations and caveats

  • RocqStat’s statistical bounds require representative and sufficiently large samples; under-sampling gives unreliable bounds.
  • Some NPUs expose black-box drivers; if you cannot collect low-level timings, you must rely on system-level end-to-end latency and isolate the NPU variance separately.
  • Dynamic updates to model structure at runtime (rare in automotive) complicate traceability; freeze model topology for safety-relevant releases.

"With RocqStat integrated into VectorCAST, teams can turn timing analysis from a separate artefact into a first-class part of their verification pipeline." — practical implication of Vector’s 2026 acquisition

Next steps & checklist for your team

  1. Identify target hardware and timing measurement method (cycle counter, OS timer, HIL probe).
  2. Implement the timing harness and add VectorCAST test cases for your model runtimes.
  3. Collect an initial dataset (5k–20k samples across inputs) and run RocqStat locally to get a baseline.
  4. Automate the pipeline in CI and configure a timing gate with conservative thresholds.
  5. Store all artifacts (raw, processed, RocqStat report) in your test evidence store for safety audits.

Call to action

Ready to add defensible real-time ML checks to your automotive verification pipeline? Start by instrumenting one model with the harness above, run a short VectorCAST+RocqStat experiment, and push the results into CI. If you want a downloadable starter kit (VectorCAST test templates, DWT harnesses, Python orchestration scripts and CI examples) tailored to ONNX Runtime or TensorRT, sign up for our developer pack and get step-by-step integration files you can drop into your repo.

For commercial support or an enterprise workshop to onboard your team to VectorCAST+RocqStat timing validation, reach out — we can run a hands-on session using your target hardware and ML workloads and help you build automated, auditable timing gates for safety-critical releases.
