







# Open3DBench: Open-Source Backend Implementation Flow for 3D-IC

Yunqi Shi

Nanjing University, School of AI

Email: shiyq@lamda.nju.edu.cn

github:



arXiv:



### Electronic Design Automation (EDA) and Physical Design (PD)

- Definition of Electronic Design Automation (EDA)
- Role of Physical Design (PD)



[1] Bei Yu, "Placement in Advanced Technology Nodes", https://www.cse.cuhk.edu.hk/~byu/doc/2021-Place.pdf

### AI for EDA

- *Nature* article, published 09 June 2021: "A graph placement methodology for fast chip design" (AlphaChip)
- Google DeepMind blog post, 26 September 2024: "How AlphaChip transformed computer chip design", by Anna Goldie and Azalia Mirhoseini



### Why Open-Source EDA?

**Commercial Tools**

- Cadence Innovus
- Cadence Integrity 3D-IC Platform
- Le3DIC® 3D-IC design platform

**Open-Source Tools** 



- 3D-IC has the potential to sustain Moore's Law
- Three types of inter-connect implementation:
  - Micro-bumping
  - Hybrid Bonding Terminals (HBTs)
  - Monolithic Inter-tier Vias (MIVs)






3D-IC implemented with hybrid bonding terminals (HBTs)

- 3D-IC **physical design** research has attracted tremendous attention in recent years



#### **ICCAD 2022**

**CAD Contest**

#### **Contest Problems**

| Problem   | Title (Sponsor)                                                                       |
|-----------|---------------------------------------------------------------------------------------|
| Problem A | Learning Arithmetic Operations from Gate-Level Circuit (Cadence Design Systems, Inc.) |
| Problem B | 3D Placement with D2D Vertical Connections (Synopsys, Inc.)                           |
| Problem C | Microarchitecture Design Space Exploration (DAMO Academy)                             |






#### **ICCAD 2023**

**CAD Contest**

#### **Contest Problems**

| Problem   | Title (Sponsor)                                                     |
|-----------|---------------------------------------------------------------------|
| Problem A | Multi-bit Large-scale Boolean Matching (Cadence Design Systems, Inc.) |
| Problem B | 3D Placement with Macros (Synopsys, Inc.)                           |
| Problem C | Static IR Drop Estimation Using W...                                |



- We want to test our 3D EDA algorithms in a standardized and reliable way.

Taking 3D placement as an example:

#### **Contest benchmarks:**

- 3D Placement with D2D Vertical Connections @ ICCAD'22 Contest
- 3D Placement with Macros @ ICCAD'23 Contest



These contest benchmarks enable standardized comparison, but the organizers did not release any implementation details (such as a valid PDK or the design RTL), which limits how the test cases can be used.

[1] Zhao, Yuxuan, et al. "Analytical Heterogeneous Die-to-Die 3D Placement with Macros." TCAD 2024.

$$\text{Initial score} = \text{HPWL of top die} + \text{HPWL of bottom die} + \#\text{terminals} \times \text{terminal cost}$$
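As a rough illustration of this score, the sketch below computes half-perimeter wirelength (HPWL) for each die and adds the terminal penalty. The function names, the net and position data structures, and the example coordinates are assumptions made for this note; they are not taken from the contest kit.

```python
def hpwl(nets, pos):
    """Sum of half-perimeter wirelengths over a list of nets.

    nets: list of nets, each a list of pin names
    pos:  dict mapping pin name -> (x, y)
    """
    total = 0.0
    for net in nets:
        xs = [pos[p][0] for p in net]
        ys = [pos[p][1] for p in net]
        # Half perimeter of the bounding box of the net's pins.
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def initial_score(top_nets, bottom_nets, pos, num_terminals, terminal_cost):
    """HPWL of top die + HPWL of bottom die + #terminals * terminal cost."""
    return (hpwl(top_nets, pos)
            + hpwl(bottom_nets, pos)
            + num_terminals * terminal_cost)


# Hypothetical toy example: one cross-die net is split at terminal "t".
pos = {"a": (0, 0), "b": (3, 4), "t": (1, 1), "c": (2, 5)}
top_nets = [["a", "b"], ["a", "t"]]   # HPWL 7 + 2
bottom_nets = [["t", "c"]]            # HPWL 5
score = initial_score(top_nets, bottom_nets, pos,
                      num_terminals=1, terminal_cost=10)
print(score)  # 24.0
```

In this toy model a net that crosses dies contributes to both HPWL terms through its terminal, so every extra terminal costs wirelength on both dies plus the fixed terminal cost.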





#### Workaround by 2D backend flow:

- Macro-3D [1], Pin-3D [2], and 3D Net-to-Pad Assignment [3] use Innovus to run the 3D backend flow.
- TA3D [4] builds a 3D timing model on top of the 2D tool OpenSTA.
- Commercial tools and commercial PDKs prevent replicable comparisons because of licensing restrictions.
- Building our own workaround flow can be time-consuming and is sometimes not reliable enough.

<sup>[1]</sup> Bamberg, Lennart, et al. "Macro-3D: A physical design methodology for face-to-face-stacked heterogeneous 3D ICs." DATE 2020.

<sup>[2]</sup> Pentapati, Sai Surya Kiran, et al. "Pin-3D: A physical synthesis and post-layout optimization flow for heterogeneous monolithic 3D ICs." ICCAD 2020.

<sup>[3]</sup> Vanna-iampikul, Pruek, et al. "Placement-Aware 3D Net-to-Pad Assignment for Array-Style Hybrid Bonding 3D ICs." ISPD 2025.

<sup>[4]</sup> Kim, Donggyu, et al. "TA3D: Timing-Aware 3D IC Partitioning and Placement by Optimizing the Critical Path." MLCAD 2024

#### Main purpose:

Benchmark every stage of the 3D backend flow



#### Key idea:

Duplicate the original 2D metal layers and implement the 3D design on one die







[1] Bamberg, Lennart, et al. "Macro-3D: A physical design methodology for face-to-face-stacked heterogeneous 3D ICs." DATE 2020.

- PDK preparation: modify NG45 into NG45\_3D
  - Duplicate the metal layers in the tech LEF
  - Duplicate the cell LEF and lib files to distinguish top-die and bottom-die instances
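A minimal sketch of the layer-duplication step, assuming the tech LEF names its routing layers `metal1`, `metal2`, and so on, and that each layer is written as a `LAYER name ... END name` block. The `_top` suffix and the function name are hypothetical, and a real flow would also have to duplicate cut layers, vias, and the cell LEF and lib files.

```python
import re

# Matches one "LAYER metalN ... END metalN" block (assumed naming scheme).
_LAYER_BLOCK = re.compile(
    r"^LAYER[ \t]+(metal\d+)\b.*?^END[ \t]+\1[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def duplicated_layer_blocks(techlef_text, suffix="_top"):
    """Return copies of every metal layer block, renamed with `suffix`.

    The caller decides where to paste the returned text inside the
    tech LEF (for example, after the original layer stack).
    """
    copies = []
    for match in _LAYER_BLOCK.finditer(techlef_text):
        name = match.group(1)
        # Rename the layer everywhere inside its own block.
        renamed = re.sub(rf"\b{name}\b", name + suffix, match.group(0))
        copies.append(renamed)
    return "\n".join(copies)


sample = (
    "LAYER metal1\n  TYPE ROUTING ;\nEND metal1\n"
    "LAYER via1\n  TYPE CUT ;\nEND via1\n"
    "LAYER metal10\n  TYPE ROUTING ;\nEND metal10\n"
)
print(duplicated_layer_blocks(sample))
```

The original blocks are left untouched, so the unsuffixed layers can serve one die and the suffixed copies the other.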





#### Design preparation:

Any 2D design (RTL / netlist) supported by OpenROAD-flow-scripts



<sup>[1]</sup> Bamberg, Lennart, et al. "Macro-3D: A physical design methodology for face-to-face-stacked heterogeneous 3D ICs." DATE 2020.

#### 3D Placement:

Adopt the 2D placer DREAMPlace [1] as a workaround.





[1] Lin, Yibo, et al. "DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement." DAC 2019.

**Hybrid Bonding Terminal (HBT)**

Figure: cross-section of the stack, from top to bottom: top die, HBT layer (a metal layer with routing resource = 0), bottom die, substrate. A buffer pin is inserted to serve as the HBT.



[1] Vanna-iampikul, Pruek, et al. "Placement-Aware 3D Net-to-Pad Assignment for Array-Style Hybrid Bonding 3D ICs." ISPD 2025.

#### PPA evaluation

Because the whole design is built in a 2D view and the 3D connections are defined properly, the original 2D OpenSTA [1] can be used for 3D timing analysis.

HotSpot [2] inherently supports 3D thermal analysis.



<sup>[1]</sup> https://github.com/The-OpenROAD-Project/OpenSTA

<sup>[2]</sup> https://github.com/uvahotspot/HotSpot

#### Some evaluations



| Designs                            | Methods       | Area (mm<sup>2</sup>) | rWL (m) | Overflow (#) | WNS (ns) | TNS (ns)  | Power (W) | $T_{max}$ (°C) | Runtime (s) |
|------------------------------------|---------------|-----------------------|---------|--------------|----------|-----------|-----------|----------------|-------------|
| ariane133                          | Hier-RTLMP-2D | 2.25                  | 8.20    | 132          | -2.18    | -5766.41  | 0.393     | 58.84          | 3667        |
|                                    | DREAMPlace-2D | 2.25                  | 7.18    | 112          | -1.69    | -4098.04  | 0.389     | 58.69          | 1556        |
|                                    | Open3D-Tiling | 1.00                  | 6.21    | 0            | -1.40    | -3049.41  | 0.360     | 58.35          | 1743        |
|                                    | Open3D-DMP    | 1.00                  | 5.59    | 0            | -1.34    | -2648.76  | 0.360     | 58.21          | 1739        |
| ariane136                          | Hier-RTLMP-2D | 2.25                  | 8.63    | 127          | -2.51    | -7072.67  | 0.514     | 63.40          | 1779        |
|                                    | DREAMPlace-2D | 2.25                  | 7.80    | 148          | -2.71    | -7561.23  | 0.508     | 61.14          | 1720        |
|                                    | Open3D-Tiling | 1.00                  | 6.32    | 0            | -2.38    | -6125.24  | 0.471     | 60.93          | 1791        |
|                                    | Open3D-DMP    | 1.00                  | 6.05    | 0            | -2.45    | -6603.91  | 0.471     | 62.27          | 1870        |
| black_parrot                       | Hier-RTLMP-2D | 1.76                  | 12.41   | 68           | -6.96    | -6289.17  | 0.398     | 55.14          | 1819        |
|                                    | DREAMPlace-2D | 1.76                  | 12.23   | 334          | -6.57    | -5268.85  | 0.399     | 55.26          | 1728        |
|                                    | Open3D-Tiling | 0.81                  | 8.08    | 0            | -5.76    | -2251.30  | 0.376     | 62.29          | 1895        |
|                                    | Open3D-DMP    | 0.81                  | 7.79    | 0            | -5.67    | -4067.11  | 0.374     | 60.97          | 1920        |
| bp_be                              | Hier-RTLMP-2D | 0.56                  | 3.00    | 30           | -1.88    | -523.27   | 0.152     | 52.63          | 1063        |
|                                    | DREAMPlace-2D | 0.56                  | 2.89    | 36           | -1.30    | -246.99   | 0.153     | 53.08          | 916         |
|                                    | Open3D-Tiling | 0.30                  | 2.40    | 0            | -1.21    | -188.86   | 0.144     | 61.17          | 998         |
|                                    | Open3D-DMP    | 0.30                  | 2.42    | 0            | -0.89    | -108.89   | 0.144     | 59.11          | 1053        |
| bp_fe                              | Hier-RTLMP-2D | 0.48                  | 1.81    | 6            | -1.40    | -942.36   | 0.302     | 64.09          | 449         |
|                                    | DREAMPlace-2D | 0.48                  | 1.73    | 30           | -1.51    | -978.59   | 0.305     | 63.62          | 239         |
|                                    | Open3D-Tiling | 0.24                  | 1.38    | 0            | -1.53    | -729.42   | 0.284     | 87.33          | 398         |
|                                    | Open3D-DMP    | 0.24                  | 1.30    | 0            | -1.37    | -814.64   | 0.283     | 82.32          | 388         |
| bp_multi                           | Hier-RTLMP-2D | 1.21                  | 6.20    | 36           | -7.97    | -12072.10 | 1.143     | 86.18          | 868         |
|                                    | DREAMPlace-2D | 1.21                  | 5.63    | 9            | -8.30    | -10946.20 | 1.126     | 85.50          | 760         |
|                                    | Open3D-Tiling | 0.64                  | 4.06    | 0            | -7.01    | -9246.70  | 1.062     | 112.09         | 883         |
|                                    | Open3D-DMP    | 0.64                  | 4.03    | 0            | -8.03    | -9812.57  | 1.050     | 98.09          | 935         |
| bp_quad                            | Hier-RTLMP-2D | 12.96                 | 46.63   | 3429         | -3.66    | -39020.00 | 1.822     | 66.05          | 8010        |
|                                    | DREAMPlace-2D | 12.96                 | 41.99   | 3968         | -2.05    | -31231.90 | 1.848     | 68.17          | 6336        |
|                                    | Open3D-Tiling | 6.25                  | 50.19   | 0            | -2.62    | -31124.70 | 1.840     | 69.78          | 7973        |
|                                    | Open3D-DMP    | 6.25                  | 40.39   | 0            | -1.83    | -26966.20 | 1.832     | 66.96          | 7981        |
| swerv_wrapper                      | Hier-RTLMP-2D | 1.10                  | 5.62    | 14428        | -2.14    | -1975.79  | 0.250     | 54.86          | 1175        |
|                                    | DREAMPlace-2D | 1.10                  | 5.54    | 9540         | -1.86    | -1429.90  | 0.254     | 53.48          | 1092        |
|                                    | Open3D-Tiling | 0.56                  | 3.63    | 0            | -1.26    | -972.80   | 0.232     | 62.17          | 2085        |
|                                    | Open3D-DMP    | 0.56                  | 3.46    | 0            | -1.23    | -958.01   | 0.234     | 60.49          | 1744        |
| 3D improvements over 2D†           |               | 51.19%↑               | 24.06%↑ | 100%↑        | 16.24%↑  | 30.84%↑   | 5.72%↑    | -10.04%↓       | -24.82%↓    |
| 3D-DMP improvements over 3D-Tiling |               | Equal                 | 5.96%↑  | Equal        | 7.22%↑   | -4.49%↓   | 1.98%↑    | 3.56%↑         | 0.23%↑      |



**Netlist Preparation** 

Standard format (i.e., .lef, .def, .lib):

```
BUSBITCHARS "[]" ;
DIVIDERCHAR "/" ;
MACRO HBT_BOTIN
  CLASS CORE ;
  ORIGIN 0 0 ;
  FOREIGN HBT_BOTIN 0 0 ;
  SIZE 7 BY 7 ;
  SYMMETRY X Y ;
  SITE FreePDK45_38x28_10R_NP_162NW_340 ;
  PIN TOP
    DIRECTION OUTPUT ;
    USE SIGNAL ;
    LAYER metal_top_pin ;
      RECT 0.0 0.0 0.1 0.1 ;
  END TOP
  PIN BOT
    DIRECTION INPUT ;
    USE SIGNAL ;
    LAYER metal_bottom_pin ;
      RECT 0.0 0.0 0.1 0.1 ;
    MUSTJOIN TOP ;
  END BOT
```

```
DIVIDERCHAR "_" ;
BUSBITCHARS "[]" ;
DESIGN bp_multi_top ;
UNITS DISTANCE MICRONS 2000 ;
DIEAREA ( 0 0 ) ( 1600000 1600000 ) ;
ROW ROW_0 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_1 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_2 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_3 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_4 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_5 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_6 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_7 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_8 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_9 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_10 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_11 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_12 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_13 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_14 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_15 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_16 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_17 FreePDK45_38x28_10R_NP_162NW_340 ...
ROW ROW_18 FreePDK45_38x28_10R_NP_162NW_340 ...
```

Contest-specific format:

```
NumTechnologies 1
Tech TA 1010
LibCell N MC1 20 73 1
Pin P1 5 26
LibCell N MC2 30 73 2
Pin P1 6 32
Pin P2 24 37
LibCell N MC3 70 73 2
Pin P1 6 32
Pin P2 44 36
LibCell N MC4 40 73 2
Pin P1 6 32
Pin P2 24 36
LibCell N MC5 40 73 2
Pin P1 7 32
Pin P2 30 36
LibCell N MC6 50 73 2
Pin P1 6 32
Pin P2 33 35
LibCell N MC7 130 73 2
Pin P1 6 32
...
DieSize 0 0 41570 41557
TopDieMaxUtil 80
BottomDieMaxUtil 80
TopDieRows 0 0 41570 73 569
BottomDieRows 0 0 41570 73 569
TopDieTech TA
BottomDieTech TA
TerminalSize 52 52
TerminalSpacing 52
TerminalCost 10
NumInstances 136679
Inst C1 MC1
Inst C2 MC2
Inst C3 MC3
Inst C4 MC2
...
```
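To make the contest-specific format concrete, here is a small parser sketch for the `LibCell` and `Pin` records in the listing above. The field interpretation (macro flag, name, width, height, and pin count for `LibCell`; name and x/y offset for `Pin`) is an assumption inferred from the listing rather than taken from the contest specification, and the function name is made up for this note.

```python
def parse_libcells(text):
    """Parse 'LibCell' / 'Pin' lines into a dict keyed by cell name.

    Assumed record layout (inferred, not from the official spec):
      LibCell <is_macro Y/N> <name> <width> <height> <pin_count>
      Pin <name> <x_offset> <y_offset>
    Any other line is ignored.
    """
    cells = {}
    current = None
    for line in text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "LibCell":
            current = {
                "is_macro": tok[1] == "Y",
                "width": int(tok[3]),
                "height": int(tok[4]),
                "pin_count": int(tok[5]),
                "pins": {},
            }
            cells[tok[2]] = current
        elif tok[0] == "Pin" and current is not None:
            current["pins"][tok[1]] = (int(tok[2]), int(tok[3]))
    return cells


sample = """
LibCell N MC1 20 73 1
 Pin P1 5 26
LibCell N MC2 30 73 2
 Pin P1 6 32
 Pin P2 24 37
"""
cells = parse_libcells(sample)
print(cells["MC2"]["pins"]["P2"])  # (24, 37)
```

Converting such records into LEF macros (and the instance list into DEF components) is one way to move a contest case toward the standard formats, although the contest data alone carries no timing or routing-layer information.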









**Placement Output** 




3D Database









**HBT** Legalization



Approaches covered: Hungarian algorithm, sliding window, quad-tree, reinforcement learning.

Figure panels: HBT candidates, (b) matching, (c) legalization.


Formally, HBT legalization is a large-scale bipartite matching problem whose objective is to minimize the total displacement.
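One way to write this matching problem down is as a standard assignment formulation. Here $P$ is taken to be the set of HBTs, $C$ the set of candidate legal sites, $d(p, c)$ the displacement incurred by moving HBT $p$ to site $c$, and $x_{pc}$ a binary assignment variable; this notation is an assumption of this note, not the authors' own formulation.

$$
\begin{aligned}
\min_{x} \quad & \sum_{p \in P} \sum_{c \in C} d(p, c)\, x_{pc} \\
\text{s.t.} \quad & \sum_{c \in C} x_{pc} = 1 \quad \forall p \in P, \\
& \sum_{p \in P} x_{pc} \le 1 \quad \forall c \in C, \\
& x_{pc} \in \{0, 1\}.
\end{aligned}
$$

The first constraint gives every HBT exactly one site, and the second keeps any site from being used twice, which requires $|P| \le |C|$.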



Georgia Tech and Synopsys adopt the Hungarian algorithm to optimize the assignment<sup>[1]</sup>. Its complexity is $O(|P|^2 \times |C|)$, so large cases take hours.

[1] Vanna-iampikul, Pruek, et al. "Placement-Aware 3D Net-to-Pad Assignment for Array-Style Hybrid Bonding 3D ICs." ISPD 2025.
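For readers who want to see where that complexity comes from, below is a compact textbook implementation of the Hungarian algorithm (the potentials-based variant) applied to a tiny HBT-to-site example. It is a generic sketch, not the cited authors' code; the Manhattan displacement cost, the function names, and the coordinates are assumptions for illustration.

```python
def hungarian(cost):
    """Minimum-cost assignment of n rows to m >= n columns.

    Returns a list `col` where col[i] is the column assigned to row i.
    Runs in O(n^2 * m) time.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many columns (sites) as rows (HBTs)"
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augment
                break
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            col[p[j] - 1] = j - 1
    return col


def manhattan_costs(terminals, sites):
    """Displacement matrix: rows are HBTs, columns are candidate sites."""
    return [[abs(tx - sx) + abs(ty - sy) for sx, sy in sites]
            for tx, ty in terminals]


terminals = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.1)]   # hypothetical HBT positions
sites = [(1, 1), (2, 1), (3, 3), (0, 0)]           # hypothetical legal sites
assignment = hungarian(manhattan_costs(terminals, sites))
print(assignment)  # [0, 1, 2]
```

Each of the $|P|$ rows triggers an augmenting search that may scan all $|C|$ columns up to $|P|$ times, which is why the running time grows so quickly on full-chip instances.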




Georgia Tech proposes a sliding-window-based method<sup>[2]</sup> that repeatedly moves a window to scan the whole canvas and runs the Hungarian algorithm inside each window.

[2] Pentapati, Sai, et al. "On Legalization of Die Bonding Bumps and Pads for 3D ICs." ISPD 2023.




CUHK proposes a quad-tree-based method<sup>[3]</sup> that recursively partitions the canvas into four rectangular regions and decides assignments in a bottom-up manner.



[3] Liu, Siting, et al. "Routing-aware legal hybrid bonding terminal assignment for 3D face-to-face stacked ICs." ISPD 2024.



**HBT** Legalization

We propose a two-stage legalization procedure:

- 1) Greedily assign each HBT to its nearest legal position.
- 2) Use reinforcement learning (RL) to dynamically choose a window, then run the Hungarian algorithm inside it for refinement.
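The first stage can be pictured with the sketch below, which assigns each HBT, in input order, to the nearest still-free candidate site by Manhattan distance. The processing order, the distance metric, the tie-breaking rule, and all names are assumptions for illustration; the RL-guided refinement of the second stage is not shown.

```python
def greedy_assign(terminals, sites):
    """Assign each terminal to its nearest unused site (stage 1 only).

    terminals: list of (x, y) desired HBT positions
    sites:     list of (x, y) legal candidate positions
    Returns a list of site indices, one per terminal.
    Requires len(terminals) <= len(sites).
    """
    if len(terminals) > len(sites):
        raise ValueError("not enough legal sites for all terminals")
    free = set(range(len(sites)))
    assignment = []
    for tx, ty in terminals:
        # Nearest free site; ties are broken by the lower site index.
        best = min(
            free,
            key=lambda s: (abs(sites[s][0] - tx) + abs(sites[s][1] - ty), s),
        )
        free.remove(best)
        assignment.append(best)
    return assignment


terminals = [(0.1, 0.0), (0.2, 0.1), (2.0, 2.0)]  # hypothetical positions
sites = [(0, 0), (1, 0), (2, 2)]                  # hypothetical legal grid
print(greedy_assign(terminals, sites))  # [0, 1, 2]
```

Because an early terminal can take the site that a later one needed, the greedy result is legal but not necessarily optimal, which is what leaves room for a window-based refinement pass.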



Rip-up and Reassign




#### **Benefit:**

Unlike the sliding-window-based method, we only need to process the small fraction of regions where HBT resources are critically constrained.






Figure: HBT locations before legalization and after legalization.






Our method is over 100x faster than the sliding-window-based method, and it also reduces displacement.





**Final Evaluation**

Flow stages: clock tree synthesis, routing, timing analysis, thermal analysis.







(b) Open3D-Tiling: rWL = 2.40 m, WNS = -1.21 ns, TNS = -188.86 ns

(c) Open3D-DMP: rWL = 2.42 m, WNS = -0.89 ns, TNS = -108.89 ns




### Limitations

- Heterogeneous
- TSV consideration
- 3D power distribution network
- 3D buffering & sizing
- ...



## Thank you!
