Three-Dimensional Device Simulation on a Super Parallel Computer: ADENA

S. Odanaka\*, K. Zaiki\*, and T. Nogi\*\*

| *Semiconductor Research Center     | **Department of Applied Mathematics  |
|------------------------------------|--------------------------------------|
| Matsushita Electric Industrial Co. | and Physics, Faculty of Engineering, |
| Moriguchi, Osaka, Japan            | Kyoto University, Kyoto, Japan       |

## Abstract

This paper presents a parallel algorithm and its performance in three-dimensional device simulation on a super parallel computer, ADENA (<u>Alternating</u> <u>Direction</u> <u>Edition</u> <u>Nexus</u> <u>Array</u>) [1], [2]. Massively parallel computation of the splitting-up CG and BCG methods is easily obtained on the ADENA computer.

To realize physical device modeling as a practical device CAD tool, high-speed computation in large-scale simulations is required, especially in the three-dimensional case [3]. We present a three-dimensional device simulation developed on the ADENA computer. The ADENA is a parallel MIMD computer whose architecture concept is based on ADE (Alternating Direction Edition) with no memory conflict. As shown in Fig. 1, this computer has been realized by using a three-dimensional network and a two-dimensional array of processors. It has 256 PEs (Processing Elements), and each PE consists of a 64-bit floating-point processor [4] and 2 Mbytes of local data memory.

Until now, it has been shown that three-dimensional device [3], [5] and process [6] models are successfully solved by the vectorized ICCG and ILUBCG (or ILUCGS) methods with the list vector method on vector processors. However, the performance of a vector processor strongly depends on the device speed in the computer system, and hence a higher-speed vector processor requires expensive devices fabricated in the most advanced VLSI technology. Massively parallel computation is becoming a key technology for achieving the next generation of three-dimensional process and device CAD. For this purpose, parallel computation algorithms must be developed to solve the iteration scheme for a system of linear algebraic equations in process and device modeling.

In general, the iteration process is described as

$$C \cdot (u^{n+1} - u^n) = -\tau_{n+1} (A u^n - f) \quad (n = 0, 1, 2, \dots) \quad (1)$$

where the matrix C is an approximate factorization of the matrix A. The ADENA computer allows massively parallel computation of the splitting-up operator method [7], which is a generalization of the ADI method. In this work, the matrix C is proposed as follows:

$$C = (D + A_1) \cdot D^{-1} \cdot (D + A_2) \cdot D^{-1} \cdot (D + A_3), \quad (2)$$

$$A = D + A_1 + A_2 + A_3,$$

where D consists of the diagonal elements of the matrix A, and A1, A2, and A3 consist of the off-diagonal elements in the x-, y-, and z-directions, respectively. The triangular systems in each direction are solved by forward and backward substitution. This factorization is consistent with the computer architecture of the ADENA. The convergence rate is further accelerated by using the CG and BCG methods. We call them the splitting-up CG (SPUPCG) and splitting-up BCG (SPUPBCG) methods [2], respectively. This approach is also effective in solving the full set of the Poisson equation and the two-carrier current continuity equations. Fig. 2 shows the relative residual versus iteration for the SPUPCG and SPUPBCG methods compared with the results of the ICCG and ILUBCG methods. The simulated device structure is a scaled 0.5 µm n-MOSFET having a three-dimensional structure. The result indicates that the SPUPCG and SPUPBCG methods lead to almost the same convergence rate as the ICCG and ILUBCG methods.
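The factorization (2) and its use as a CG preconditioner can be illustrated with a small serial sketch. This is not the authors' ADENA implementation, which the text does not give. It assumes a model problem only: a uniform 7-point Laplacian on a 16×16×16 grid with zero Dirichlet boundaries, so that D = 6I and each D + A_i is tridiagonal along one axis. The function names (`apply_A`, `solve_direction`, `apply_C_inv`, `pcg`) are hypothetical, and NumPy and SciPy are assumed to be available. For this constant-coefficient case the three directional factors commute, so C is symmetric positive definite and plain preconditioned CG applies.

```python
import numpy as np
from scipy.linalg import solve_banded


def apply_A(u, diag=6.0):
    """Apply A = D + A1 + A2 + A3 (model 7-point Laplacian, Dirichlet)."""
    out = diag * u
    for axis in range(3):
        lo = [slice(None)] * 3
        hi = [slice(None)] * 3
        lo[axis] = slice(None, -1)
        hi[axis] = slice(1, None)
        lo, hi = tuple(lo), tuple(hi)
        out[hi] -= u[lo]  # lower off-diagonal along this axis
        out[lo] -= u[hi]  # upper off-diagonal along this axis
    return out


def solve_direction(r, axis, diag=6.0):
    """Solve (D + A_axis) x = r as a batch of independent tridiagonal systems."""
    n = r.shape[axis]
    ab = np.zeros((3, n))
    ab[0, 1:] = -1.0   # superdiagonal
    ab[1, :] = diag    # main diagonal
    ab[2, :-1] = -1.0  # subdiagonal
    moved = np.moveaxis(r, axis, 0)
    sol = solve_banded((1, 1), ab, moved.reshape(n, -1))
    return np.moveaxis(sol.reshape(moved.shape), 0, axis)


def apply_C_inv(r, diag=6.0):
    """Apply C^{-1} with C = (D+A1) D^{-1} (D+A2) D^{-1} (D+A3)."""
    z = solve_direction(r, 0, diag)         # x-direction sweep
    z = solve_direction(diag * z, 1, diag)  # multiply by D, y-direction sweep
    z = solve_direction(diag * z, 2, diag)  # multiply by D, z-direction sweep
    return z


def pcg(f, precond, tol=1e-8, max_iter=500):
    """Preconditioned CG for A u = f; returns (u, iterations used)."""
    u = np.zeros_like(f)
    r = f - apply_A(u)
    z = precond(r)
    p = z.copy()
    rz = np.vdot(r, z)
    f_norm = np.linalg.norm(f)
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / np.vdot(p, Ap)
        u += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * f_norm:
            return u, it
        z = precond(r)
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u, max_iter


rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16, 16))      # arbitrary right-hand side
u_cg, it_cg = pcg(f, lambda r: r)          # unpreconditioned CG
u_sp, it_sp = pcg(f, apply_C_inv)          # CG preconditioned with C
print("CG iterations:", it_cg, "splitting-up preconditioned:", it_sp)
```

Each call to `solve_direction` handles all grid lines along one axis at once, and those line solves are mutually independent, which is the kind of direction-wise parallelism the factorization exposes. Multiplying out (2) gives A plus cross terms such as A1 D⁻¹ A2, which is why C only approximates A and an outer CG or BCG iteration is still needed.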

Fig. 3 shows the CPU time per iteration versus the number of grid nodes for the SPUPBCG method on the ADENA operating at 40 MHz. The result is compared with that calculated using the ILUBCG method on the vector-type supercomputer VP-200. Two important results can be seen in Fig. 3. First, massively parallel computation of the SPUPBCG method is achieved on the ADENA computer. On the VP-200, the CPU time increases rapidly as the number of grid nodes increases. Second, the actual computation speed of the matrix solver reaches that on the VP-200 in large-scale simulations. At present, parallel computation on the ADENA enables three-dimensional device CAD at one tenth of the cost of the vector-type supercomputer.

## Acknowledgement

The authors wish to thank Dr. S. Horiuchi, Dr. T. Takemoto, Y. Mano, and Y. Terui for their encouragement during this work. Special thanks go to Dr. H. Kadoto and the ADENA team.

## References

- [1] T. Nogi, "Parallel machine ADINA," in Computing Methods in Applied Sciences and Engineering V, eds. R. Glowinski and J. L. Lions, North-Holland, Amsterdam, pp. 103-122, 1982.
- [2] T. Nogi, "Parallel Computation," in Patterns and Waves - Qualitative analysis of nonlinear differential equations -, North-Holland, Amsterdam, pp. 279-318, 1986.
- [3] S. Odanaka et al., "SMART-II : A three-dimensional CAD model for submicrometer MOSFETs," to be published in IEEE Trans. on CAD / ICAS.
- [4] K. Kaneko et al., "64 bit RISC microprocessor for the parallel computer system," ISSCC Dig. Tech. Papers, pp. 78-79, 1989.
- [5] T. Toyabe et al., "Three-dimensional device simulator CADDETH with highly convergent matrix solution algorithms," IEEE Trans. on CAD / ICAS, Vol. CAD-4, No. 4, pp. 482-488, 1985.

- [6] S. Odanaka et al., "SMART-P : Rigorous three-dimensional process simulator on a supercomputer," IEEE Trans. on CAD / ICAS, Vol. CAD-7, No. 6, pp. 675-683, 1988.
- [7] G. I. Marchuk, Methods of Numerical Mathematics, 2nd ed., Springer-Verlag, 1982.



Fig. 1 ADENA structure for three-dimensional simulation. In this case, the simulation region is covered by a virtual network composed of 2×2×2 real networks. BMU (Buffer Memory Unit) elements of 16×16 are arranged in 16 layers.



Fig. 2 Relative residual versus iteration for the SPUPCG and SPUPBCG methods. The results are compared with those calculated using the ICCG and ILUBCG methods. The simulated device structure is a 0.5 µm n-MOSFET.



Fig. 3 CPU time per iteration as a function of the number of grid nodes. The SPUPBCG method calculated on the ADENA is compared with the ILUBCG method calculated on the VP-200.