Benchmarks
Stratified evaluation subsets from PaRoutes designed to measure performance across route lengths, topologies, and material availability.
Available Benchmarks
| Benchmark | Series | Targets | Description | Stock |
|---|---|---|---|---|
| mkt-lin-500 | mkt | 500 | Linear routes of lengths 2–6 (100 each) | buyables-stock |
| mkt-cnv-160 | mkt | 160 | Convergent routes of depths 2–5 (40 each) | buyables-stock |
| ref-lin-600 | ref | 600 | Linear routes of lengths 2–7 (100 each) | n5-stock |
| ref-cnv-400 | ref | 400 | Convergent routes of depths 2–5 (100 each) | n5-stock |
| ref-lng-84 | ref | 84 | All available routes of lengths 8–10 from n1 and n5 | n1-n5-stock |
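The target counts of the evenly stratified benchmarks follow from the number of strata times the routes drawn per stratum. The sketch below transcribes the table into a dictionary and checks that arithmetic; the dictionary layout and the `expected_targets` helper are illustrative assumptions, not part of any published API. `ref-lng-84` is left out because it contains all available routes rather than a fixed number per length.

```python
# Benchmark table transcribed by hand; the dict layout is hypothetical.
BENCHMARKS = {
    "mkt-lin-500": {"strata": range(2, 7), "per_stratum": 100, "targets": 500},
    "mkt-cnv-160": {"strata": range(2, 6), "per_stratum": 40, "targets": 160},
    "ref-lin-600": {"strata": range(2, 8), "per_stratum": 100, "targets": 600},
    "ref-cnv-400": {"strata": range(2, 6), "per_stratum": 100, "targets": 400},
}


def expected_targets(spec):
    """Number of strata (lengths or depths) times routes per stratum."""
    return len(spec["strata"]) * spec["per_stratum"]


for name, spec in BENCHMARKS.items():
    # Each name ends in its target count, e.g. "mkt-lin-500" -> 500.
    assert expected_targets(spec) == spec["targets"] == int(name.rsplit("-", 1)[1])
```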
Which Benchmark Should I Use?
Are you a chemist looking for the best model to use right now with commercially available materials? Use the Market Series (mkt-*).
Are you an algorithm developer who wants a fair comparison against other approaches? Use the Reference Series (ref-*).
Why Stratified Benchmarks?
Metric Insensitivity
74% of routes in PaRoutes n5 have length 3–4. Aggregate metrics computed over the whole set are therefore dominated by short routes and can mask significant performance differences on longer routes (5+ steps) or on specific topologies (linear vs. convergent).
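A small sketch of the masking effect, using purely synthetic data: the counts below are invented for illustration and are not results from any model or from PaRoutes. The helper name `solve_rate_by_length` is hypothetical.

```python
from collections import defaultdict


def solve_rate_by_length(results):
    """results: iterable of (route_length, solved) pairs -> {length: rate}."""
    totals = defaultdict(int)
    solved = defaultdict(int)
    for length, ok in results:
        totals[length] += 1
        solved[length] += int(ok)
    return {length: solved[length] / totals[length] for length in sorted(totals)}


# Synthetic toy data: 74 short routes (length 3) and 26 long routes (length 6).
toy = [(3, True)] * 70 + [(3, False)] * 4 + [(6, True)] * 5 + [(6, False)] * 21

overall = sum(ok for _, ok in toy) / len(toy)
per_length = solve_rate_by_length(toy)

print(f"overall solve rate: {overall:.2f}")
for length, rate in per_length.items():
    print(f"length {length}: {rate:.2f}")
```

In this toy set the overall rate looks healthy even though most long routes fail, which is the kind of gap a per-stratum report makes visible.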
Stock Definition
Only ~46% of PaRoutes leaf molecules are in the Buyables stock, which suggests that many routes are arbitrary fragments, cut off where the patent description ended rather than at commercially available materials.
Validation: We used seed stability analysis across 15 candidate subsets to ensure each benchmark is internally representative and minimizes variance.
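The exact validation procedure is not described here, so the following is only one plausible reading of a seed stability analysis: draw one stratified candidate subset per seed, score each subset with some metric, and inspect the spread across seeds. The function names, the per-item `score_fn`, and the rule of picking the subset closest to the mean are all assumptions made for illustration.

```python
import random
import statistics


def stratified_sample(pool, per_stratum, seed):
    """pool: {stratum: [item, ...]}; draw per_stratum items from each stratum."""
    rng = random.Random(seed)
    return {s: rng.sample(items, per_stratum) for s, items in sorted(pool.items())}


def subset_score(subset, score_fn):
    """Mean of a per-item score over every item in the candidate subset."""
    items = [x for stratum in subset.values() for x in stratum]
    return statistics.mean(score_fn(x) for x in items)


def seed_stability(pool, per_stratum, score_fn, n_seeds=15):
    """Score one candidate subset per seed; return (best_seed, mean, spread)."""
    scores = {
        seed: subset_score(stratified_sample(pool, per_stratum, seed), score_fn)
        for seed in range(n_seeds)
    }
    values = list(scores.values())
    center = statistics.mean(values)
    spread = statistics.pstdev(values)
    # Assumed selection rule: keep the subset whose score is closest to the mean.
    best_seed = min(scores, key=lambda s: abs(scores[s] - center))
    return best_seed, center, spread


# Toy pool: 200 placeholder items for each route length 2..6.
pool = {n: list(range(n * 1000, n * 1000 + 200)) for n in range(2, 7)}
toy_score = lambda item: (item % 7) / 6  # stand-in for a real per-route metric
best_seed, center, spread = seed_stability(pool, per_stratum=100, score_fn=toy_score)
```

A small spread across seeds would indicate that the metric does not depend much on which routes happened to be sampled.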
Key Terminology
- Convergent Route: Contains at least one reaction combining ≥2 non-leaf molecules. Represents complex synthetic strategies where multiple intermediates are brought together.
- Linear Route: All reactions use at most one non-leaf molecule, representing sequential transformations.
- Route Length: The number of reaction steps in the synthesis route from target to starting materials.
- Stock: The set of commercially available or defined starting materials that can be used as leaves in a synthesis route.
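The definitions above can be sketched on a toy route tree. The nested-dictionary layout (`name`, `reactants`) is a hypothetical simplification, not the PaRoutes file format, and molecule names are placeholders rather than real structures. Route length is interpreted here as the number of reactions on the longest path from target to a leaf, which is an assumption for branched routes.

```python
def is_leaf(mol):
    """A molecule with no reactants is a starting material (leaf)."""
    return not mol.get("reactants")


def route_length(mol):
    """Number of reactions on the longest path from this molecule to a leaf."""
    if is_leaf(mol):
        return 0
    return 1 + max(route_length(r) for r in mol["reactants"])


def is_convergent(mol):
    """True if any reaction in the tree combines two or more non-leaf molecules."""
    if is_leaf(mol):
        return False
    non_leaf = [r for r in mol["reactants"] if not is_leaf(r)]
    return len(non_leaf) >= 2 or any(is_convergent(r) for r in non_leaf)


def leaves(mol):
    """Names of all starting materials in the route."""
    if is_leaf(mol):
        return [mol["name"]]
    return [name for r in mol["reactants"] for name in leaves(r)]


def is_solved(mol, stock):
    """A route is solved with respect to a stock if every leaf is in it."""
    return all(name in stock for name in leaves(mol))


# Linear: T <- A + I1, I1 <- B + C (each reaction has at most one non-leaf).
linear = {"name": "T", "reactants": [
    {"name": "A"},
    {"name": "I1", "reactants": [{"name": "B"}, {"name": "C"}]},
]}

# Convergent: T <- I1 + I2, where both I1 and I2 are made in earlier steps.
convergent = {"name": "T", "reactants": [
    {"name": "I1", "reactants": [{"name": "A"}, {"name": "B"}]},
    {"name": "I2", "reactants": [{"name": "C"}, {"name": "D"}]},
]}

stock = {"A", "B", "C"}  # toy stock; "D" is deliberately missing
```

With this toy stock the linear route is solved, while the convergent route is not, because one of its leaves is unavailable.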