# Industry Evaluation of Reversible Scan Chain Diagnosis

Soumya Mittal<sup>1</sup>, Szczepan Urban<sup>2</sup>, Kun Young Chung<sup>1</sup>, Jakub Janicki<sup>2</sup>, Wu-Tung Cheng<sup>3</sup>, Martin Parley<sup>1</sup>, Manish Sharma<sup>3</sup>, Shaun Nicholson<sup>1</sup>

<sup>1</sup>Qualcomm Technologies, Inc., San Diego, CA 92121, USA
<sup>2</sup>Siemens EDA, Poznań, Poland
<sup>3</sup>Siemens EDA, Wilsonville, OR 97070, USA

Abstract: Reversible scan chain is an architecture aimed at improving the quality of chain diagnosis. In this scan architecture, chains are designed to shift the test data in both directions to isolate a defect. We implemented this technique on a test chip in one of the most advanced technology nodes to measure its benefit against its cost. Test chips are typically low in volume and may suffer from worse diagnosis resolution and lower diagnosis convergence than production chips due to early process technology. Hence, it is very important to enhance diagnosis resolution and increase successful diagnosis data volume without losing accuracy to help expedite yield learning. In this paper, the detailed evaluation method is presented together with silicon data and failure analysis results. Moreover, we use this methodology to improve diagnostic suspect resolution and reduce the suspect physical area. Overall, we observe significant benefit in diagnosis quality. Specifically, a 4X improvement in the number of suspects and up to a 7X improvement in suspect cell area are observed without losing accuracy. Design overhead is also presented and discussed.

Keywords: scan chain diagnosis, reversible scan chain

## I. INTRODUCTION

The objective of diagnosis is to accurately find the actual defect location with as few suspects as possible. In a scan chain failure scenario, an ideal diagnosis would suspect a single scan cell for each physical defect.
In failure analysis (FA), however, a single scan cell suspect does not necessarily mean a small search space, because the gates and connections all the way from the scan input to the suspect scan cell and on to the scan output must be examined. If there are buffers/inverters along the scan paths, they must all be examined, too. In practice, multiple scan cells are often reported as suspects, which tends to cover a large silicon area and makes FA harder and more time-consuming. Hence, it is very important to optimize diagnosis resolution (*i.e.*, the number of suspects or the FA search space), which most likely results in faster and more accurate FA execution.

Diagnosing defects on scan chains has become an important vehicle for yield analysis and product ramp-up in the past two decades. Scan chain tests are applied before logic tests, and are therefore often effective in detecting defects not only in the scan cells but also in the interconnects. Especially in the early stages of a new technology node, it is possible to obtain more failures from scan chain tests due to underlying process issues or higher defect density inside the scan cells and/or along the scan paths, which consequently provides yield learning opportunities from diagnosing chain defects.

Scan chain fault diagnosis techniques can typically be classified into three categories: tester-based, hardware-based, and software-based [1]. Tester-based techniques use automatic test equipment (ATE) to control the test conditions and load/unload values into a faulty scan chain. By observing defect responses at different test conditions, failing scan cells can be identified. Hardware-based techniques modify the conventional scan chains to isolate failing scan cells. Software-based techniques use algorithmic diagnosis procedures to simulate scan patterns and identify failing scan cells.

In prior work [2]–[5], a hardware-based diagnosis technique called *reversible scan chain architecture* (RS) was proposed to improve diagnostic resolution. It can change the scan chain shift direction between loading and unloading flush patterns, which increases the likelihood of attaining perfect diagnostic resolution (i.e., narrowing down a defect to a single scan cell). This approach can further reduce the search space in the input and output paths of the suspected scan cell(s), which is discussed in detail in Section IV.

One of the biggest concerns with such hardware-based solutions is design overhead (area, leakage, etc.), which may be too expensive to adopt in production chips. Test chips, on the other hand, tend to be more tolerant of design overhead: the emphasis is on finding defects and process issues for early yield learning, making diagnosis an essential element. However, test chips are typically low in volume and may suffer from worse diagnosis resolution and lower diagnosis convergence due to early process technology and/or high defect density. Hence, it is extremely important in a test chip to enhance diagnosis resolution and increase successful diagnosis data volume without losing accuracy to help expedite yield learning [6].

In this context, we implement the previously proposed reversible scan chain architecture in an industry test chip fabricated in one of the most advanced technology nodes. In this paper, a new diagnosis algorithm is described that pinpoints the defect location by identifying the failing shift direction(s). In addition, the reversible scan chain implementation is evaluated by comparing the effectiveness of its diagnosis against the state-of-the-art. Results show a significant reduction in FA search area.

The remainder of the paper is organized as follows. In Section II, we review the reversible scan architecture proposed in [2]–[5]. Section III discusses multi-bit scan elements in the context of the reversible scan chain.
In Section IV, the proposed diagnosis algorithm is described. In Section V, we present the success criteria and the experimental results to illustrate the diagnostic performance and quality improvement achieved with the proposed method. FA findings and design characteristics are presented in Sections VI and VII, respectively. This is followed by conclusions and future work in Section VIII.

## II. REVERSIBLE SCAN ARCHITECTURE

In prior work [2]–[5], different versions of reversible scan architecture have been proposed. In such architectures, the scan chain performs both forward and backward scan shift to diagnose scan faults. Each scan cell can receive shift values from its neighbor cell either on its left or on its right. Therefore, the scan chain is stitched together in both directions. The scan data can be shifted from left to right (denoted as "L2R") or right to left (denoted as "R2L") by a control signal.

A simple scan chain example with 4 scan cells is illustrated in Figure 1 and Figure 2. The signal DIR controls the direction of the scan shift via two-input multiplexers (referred to as direction MUXes). The scan path in red in Figure 1 indicates L2R shift from Cell 3 to Cell 0 (DIR = 0), whereas the scan path in green in Figure 2 indicates R2L shift from Cell 0 to Cell 3 (DIR = 1).

Figure 1. Reversible scan chain L2R shift.

Figure 2. Reversible scan chain R2L shift.

As an example, two U-turn chain patterns are required to locate a single permanent stuck-at-0 fault:

- (1) A pattern with all "1"s first shifts N cycles L2R, where N is the scan chain length. Then it makes a U-turn to shift N cycles R2L. We denote such patterns as LRL<sub>1111</sub>.
- (2) Similarly, we can have another pattern RLR<sub>1111</sub>, where a pattern with all "1"s first shifts N cycles R2L and then makes a U-turn to shift N cycles L2R.

Let us take pattern LRL<sub>111111</sub> as an example illustrated in Figure 3.
Assuming Cell 2 has a stuck-at-0 fault, the loaded values of pattern LRL<sub>111111</sub> will be "111000". Then we change the shift direction to unload the pattern from right to left. We will observe "111000" at the scan chain output on the left. Because the first failure is observed at Cell 2, we can be certain that Cell 2 has a stuck-at-0 fault.

Figure 3. Identification of the upper-bound suspect cell.

Similarly, in Figure 4, if the pattern RLR<sub>111111</sub> results in the unloaded value "000011", we can deduce that Cell 2 is failing. In both cases, we obtain a single suspect cell. The detailed diagnosis methodology is described in prior work [5].

Figure 4. Identification of the lower-bound suspect cell.

Note that, by using the above-mentioned two patterns (LRL<sub>111111</sub> and RLR<sub>111111</sub>), we can only narrow down the defect to Scan Cell 2. There is not enough information to know whether the defect is at the scan cell input, at the scan cell output, or internal to the cell. In Section IV, we propose an enhanced diagnosis algorithm that may allow us to further improve the diagnostic resolution.

## III. MULTI-BIT SCAN ELEMENTS

Multi-bit flip-flops (MBFFs) reduce chip power consumption and area by merging single-bit flip-flops during the physical implementation process [7]. Each MBFF has one scan input and one scan output. Scan chains including MBFFs can employ an existing bidirectional scan technique by treating each MBFF as a single unit. However, diagnostic resolution would suffer because the existing bidirectional scan technique would report all bits of an MBFF as suspects if the multi-bit flip-flop is a suspect.

In [8], we introduced an adaptation methodology that does not modify the existing MBFF cell models but, instead, leverages the ability to shift in the opposite direction. Figure 5 illustrates a 3-bit reversible MBFF.
In comparison to the conventional MBFF scan chain architecture, the reverse scan insertion procedure adds a two-input multiplexer for each bit (M0-M2) to control the shift direction and the capture procedure for subsequent scan cells. Each inserted multiplexer, based on the direction signal (DIR), selects either the data input (D0-D2) or the output of the previous scan cell. Each multiplexer output is connected to the data input of the corresponding scan flop. The restitched MBFFs use a new scan enable signal (SE') generated from the original scan enable (SE) and the inverted scan direction control (DIR) signal.

A scan chain may comprise both single-bit and multi-bit flip-flops. Figure 6 illustrates a scan chain that comprises a three-bit flip-flop, a two-bit flip-flop, and two single-bit flip-flops. The new scan enable generation circuitry produces the signal for both reversed MBFF segments (the 3-bit and 2-bit flip-flops). The two single-bit flip-flops use the original scan enable signal. In this architecture, the scan chain performs a capture operation when both the scan enable (SE) and scan direction control (DIR) signals are set to "0". Scan shift in the normal direction is performed when SE and DIR are set to "1" and "0", respectively. Scan data is shifted in the reverse direction when both signals are set to "1".

Figure 5. A 3-bit reversible MBFF.

Figure 6. Reversible scan with single and multiple bit flops.

## IV. PROPOSED DIAGNOSTIC ENHANCEMENTS

In the reversible scan architecture, a fault may impact only a single shift direction (referred to as a lane) rather than both directions. In such a scenario, the two U-turn patterns (described in Section II) will not be able to converge on a single suspect cell. A load-unload procedure visualized in Figure 7 and Figure 8 explains the L2R-only (single direction) failure situation.
As the first U-turn pattern LRL<sub>111111</sub> is shifted in over a faulty lane, the cells downstream from the fault site will be loaded with faulty values, which are highlighted in red in Figure 7. Unload proceeds over the fault-free lane, and the first observed faulty value is on Scan Cell 2, which is the same as before. On the other hand, as shown in Figure 8, the second U-turn pattern RLR<sub>111111</sub>, loaded on the fault-free lane from right to left, will load all scan cells with good values. Unloading the good values happens via the defective lane, so the first observed faulty value is on Scan Cell 3.

Figure 7. Single direction failure: L2R load - R2L unload.

Figure 8. Single direction failure: R2L load - L2R unload.

In this situation, a minimum of two scan cells are suspected (Cells 2 and 3), because the first failing scan cells from the two directions do not converge on a single location. The suspect topology is highlighted in red in Figure 9. Such a diagnosis result could be the outcome of insufficient sensitive bits, multiple faulty scan cells, or a fault site impacting only a single lane. Without a robust method to determine the real root cause, the entire suspect network (highlighted in red in Figure 9) would be reported as candidates and examined in FA.

Figure 9. Multi-cell suspects diagnosis callout with single-lane failure.

In this work, to address the above-mentioned limitations of the prior work and enhance diagnostic resolution, we propose using two extra non-reversible flush patterns to identify the fault type by analyzing whether the defect affects both shift directions (called a dual-lane failure) or only one shift direction (called a single-lane failure). The two extra flush patterns are applied to each scan chain, one from either end with no U-turn.
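
The behavior shown in Figures 3, 4, 7 and 8 can be reproduced with a small behavioral model. The following is only an illustrative sketch, not the diagnosis tool's implementation: the names `shift` and `u_turn_suspect` are hypothetical, and the fault model (a stuck-at-0 on the shift input of one cell, active only in the affected lanes) is an assumption chosen to match the examples in the text.

```python
from typing import FrozenSet, List, Optional

L2R, R2L = "L2R", "R2L"


def shift(cells: List[int], direction: str, scan_in: int,
          fault_cell: Optional[int], fault_lanes: FrozenSet[str]) -> int:
    """Shift the chain one cycle in place; return the bit leaving the chain.

    cells[i] is the value held by Cell i. Following Figure 1, the highest
    index is the leftmost cell, so an L2R shift moves data toward Cell 0.
    """
    if direction == L2R:
        out = cells[0]                    # rightmost cell drives the output
        cells[:] = cells[1:] + [scan_in]  # each cell takes its left neighbour
    else:
        out = cells[-1]                   # leftmost cell drives the output
        cells[:] = [scan_in] + cells[:-1]
    # Assumed fault model: stuck-at-0 on the shift input of fault_cell,
    # active only while shifting in one of the affected lanes.
    if fault_cell is not None and direction in fault_lanes:
        cells[fault_cell] = 0
    return out


def u_turn_suspect(n: int, load_dir: str, fault_cell: Optional[int],
                   fault_lanes: FrozenSet[str]) -> Optional[int]:
    """Load all 1s in load_dir, unload in the opposite direction, and return
    the cell index of the first failing unloaded bit (None if all bits pass)."""
    unload_dir = R2L if load_dir == L2R else L2R
    cells = [0] * n
    for _ in range(n):
        shift(cells, load_dir, 1, fault_cell, fault_lanes)
    for k in range(n):
        bit = shift(cells, unload_dir, 1, fault_cell, fault_lanes)
        if bit != 1:
            # The k-th unloaded bit was held by Cell k (L2R unload)
            # or by Cell n-1-k (R2L unload).
            return k if unload_dir == L2R else n - 1 - k
    return None


both_lanes = frozenset({L2R, R2L})
l2r_lane = frozenset({L2R})

# (LRL suspect, RLR suspect) for a 6-cell chain with the fault at Cell 2.
dual = (u_turn_suspect(6, L2R, 2, both_lanes),
        u_turn_suspect(6, R2L, 2, both_lanes))
single = (u_turn_suspect(6, L2R, 2, l2r_lane),
          u_turn_suspect(6, R2L, 2, l2r_lane))
```

Under these assumptions, the dual-lane fault makes both U-turn patterns agree on Cell 2, while the L2R-only fault yields Cell 2 from the LRL pattern and Cell 3 from the RLR pattern, which is the two-cell ambiguity described above.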
A pattern loaded and unloaded from the left (i.e., the cell connected to the scan chain input), without direction change, will detect the fault type of the defective scan cell, assuming the chain has a single defective cell. In the case of multiple defective cells within the same chain, the L2R pattern can detect the fault type for the scan cell that is closest to the scan chain output. Similarly, the pattern loaded and unloaded from the right (i.e., the cell connected to the scan chain output) will detect the fault type of the cell closest to the scan chain input. For single-lane and dual-lane failures, the diagnostic algorithm described next leverages the information about the failing shift direction(s) to narrow down the suspect count and critical area.

### 1. Dual-Lane Failures

If both flush patterns for a single chain fail, then following the diagnostic procedure described in prior work [5], the suspect topology includes the scan cell itself and the most immediate branches of the input scan path and output scan path. Take the example illustrated in Figure 10. Assuming a defect is located inside Cell 2, we observe a dual-lane failure. Applying the proposed enhancement, the chain diagnosis suspect topology can be limited to Scan Cell 2 itself. Additionally, we include the most immediate segment of the input scan path network and the stem of the output scan path branch, as highlighted in red.

Figure 10. Reduced suspect area for dual-lane failure.

### 2. Left-to-Right Single-Lane Failure

Let us assume that a stuck-at-0 defect is located on the A0-input net of the direction MUX on the output of Scan Cell 3 in Figure 11. Such a defect will only impact the data coming in from left to right, whereas the reversed scan path will not be affected by the defect. The previous solution in [5] would not be able to distinguish it from a cell defect or an output-side defect and would result in the suspect topology shown in Figure 9.
In contrast, with the proposed enhancement, it can be observed that only the L2R flush pattern fails. The suspect topology can be limited to a single direction MUX and its A0-input net. Most importantly, as illustrated in Figure 11, the polygons of Cell 3 can be omitted from the suspect topology, which likely reduces FA effort.

Figure 11. Reduced suspect polygons for left-to-right single-lane failure.

### 3. Right-to-Left Single-Lane Failure

The RS architecture introduces new fault sites on the reversed shift direction, which can be very beneficial during chain diagnosis. Suppose that a defect is located on the A1-input of the direction MUX of Scan Cell 3 in Figure 12. With the proposed enhancement, it can be observed that only the R2L flush pattern fails. Hence, the suspect topology can be limited to a single direction MUX and its A1-input net, as highlighted in red in Figure 12. As with the left-to-right single-lane defect, Scan Cell 3 can be omitted from the suspect topology.

Figure 12. Reduced suspect polygons for right-to-left single-lane failure.

## V. EVALUATION CRITERIA

The presented enhancements to support MBFFs and enable high diagnosis resolution for single-lane and dual-lane failures are used to evaluate the reversible scan technology for silicon learning in advanced nodes. A set of success criteria is established to compare the diagnostic quality between the new method and the state-of-the-art (denoted as Point of Reference or POR):

- 1) Failure detection: detect an equal or higher number of defective devices.
- 2) Diagnostics time: improve the average runtime of diagnosis.
- 3) Defect chain level accuracy: the same chains should be suspected on any given faulty device.
- 4) Accuracy check in FA: failure analysis can locate the defect in more than 90% of selected dice.
- 5) Number of single-suspect diagnostics: the number of diagnosis reports with 1 suspect per symptom is increased by a factor of 2.
- 6) Diagnostics resolution: decrease the average number of suspects in volume data.
- 7) Enclosing circle: reduce the physical search space illustrated in Figure 13, defined as the minimum radius of the circle that encompasses all the symptom bounding boxes on all layers.
- 8) Enclosing rectangle: reduce the physical search space illustrated in Figure 14, defined as the minimum diagonal of the rectangle that encompasses all the symptom bounding boxes on all layers.
- 9) Net area: reduce the average physical area of suspect interconnects.
- 10) Cell area: reduce the average physical area encompassing suspect cells.
- 11) Integration in volume flow: the solution should offer seamless integration into the existing volume diagnostics flow and support various design features (e.g., multi-bit flip-flops).

Table 1. Success criteria and comparisons.

| # | Criteria | POR | RS | Improvement |
|----|------------------------------|------|------|-------------|
| | **Diagnostics attributes** | | | |
| 1 | Failure detection | 795 | 1204 | 1.5X |
| 2 | Diagnostics time | 936 | 245 | 4X |
| 3 | Defect chain level accuracy | 42 | 291 | PASS |
| 4 | FA | - | - | PASS |
| 5 | Resolution of single suspect | 126 | 748 | 6X |
| 6 | Diagnostics resolution | 6.85 | 1.77 | 3.9X |
| | **Physical attributes** | | | |
| 7 | Enclosing circle [µm] | 2066 | 1009 | 2X |
| 8 | Enclosing rectangle [µm] | 1148 | 477 | 2.4X |
| 9 | Net area [µm²] | 3.28 | 1.8 | 1.8X |
| 10 | Cell area [µm²] | 9.57 | 1.37 | 7X |
| | **Other** | | | |
| 11 | Integration in volume flow | - | - | PASS |

Failure detection is expected to be higher for RS due to the additional fault sites in the reverse lane. Silicon results confirmed this assumption: we achieved a 1.5X increase in the number of failed dice, which can be used for yield learning.

Diagnosis throughput is key during volume silicon manufacturing.
To identify a faulty cell, the RS diagnostic algorithm uses flush patterns, which do not require expensive computation of the circuit capture state. The number of patterns required to guarantee perfect diagnostic resolution is also smaller. This allowed RS to achieve 4X faster runtime.

One of the most important factors driving yield improvement is the number of diagnoses with ideal resolution, so that the information can be used for process changes and learning. As seen in Table 1, criterion 5 was met and the number of perfect diagnosis reports increased by a factor of 6. This means that, with the reversible scan architecture, 62% (748 out of 1204) of all cases resolved the failure to a single suspect, whereas the percentage of single-suspect diagnostics is 16% (126 out of 795) for POR.

When the physical area suspected by diagnostics is compared between the two methods, we observe that the diameter of the enclosing circle is 2X smaller and the diagonal of the enclosing rectangle is 2.4X smaller for RS. In other words, the physical search space that needs to be investigated during FA would decrease by 2X (or 2.4X, depending on the metric). Moreover, diagnosed failures have 1.8X less net area. Most importantly, we see that cell area was reduced by a factor of 7, dramatically decreasing the time and effort needed for physical failure isolation and analysis. The improvements in the presented physical factors clearly indicate that the reversible scan architecture enables the diagnosis algorithm to significantly reduce the suspect search area compared to the POR solution.

Additional analysis was done concerning the diagnosability of a die with multiple failing chains. Out of a total of 735 dice that failed both POR and RS, we found that 185 contain more than 1 failing chain (on average 8.3 failing chains per die). RS was able to diagnose 30% (54 out of 185) of these dice with perfect accuracy for all failing chains. POR was able to achieve that for only 4% of dice (8 out of 185).
Additionally, POR reported no suspects in 31% (58 out of 185) of cases. There were no such cases for RS.

Figure 13. Defect enclosing circle.

Figure 14. Defect enclosing rectangle.

An additional benefit of the reversible scan chain diagnosis solution was that it did not disrupt the established volume diagnosis flow, either by introducing new elements in the tool chain or by requiring pre/post-processing of diagnosis input or output data.

## VI. FAILURE ANALYSIS

One of the most important success criteria is checking the accuracy of diagnostic suspects through FA. At the time of publication, Laser Voltage Imaging (LVI) had been conducted on 2 dice selected for physical analysis. It is worth mentioning that setting up the LVI analysis did not require additional work because the design contained reversible scan.

### 1. Double-lane suspect

POR diagnosis identified 2 neighboring flops, 3 and 4, as suspects. RS diagnosis identified a single scan flop, 4, with an indication that the defect is related to the clock tree. The defect affected both shift directions and exhibited intermittent behavior at low test voltage and permanent behavior at high test voltage for both RS and POR. To corroborate the diagnosis findings, LVI used RS test vectors and found that, during flush application, the clock signal is present at cell 3, which drives suspect cell 4. The clock signal was also present on the other flops, except bit 4. This indicated a discontinuity on the clock signal branch feeding only a single cell (marked with a red circle in Figure 15).

Figure 15. Laser Imaging overlaid with scan schematic.

Figure 16. Double-lane defect position data impact.

### 2. Single-lane suspect

POR diagnosis reported 9 flops in the range 71-79 during low voltage testing. The suspect range narrowed down to 2 flops in the range 78-79 during the high voltage test. RS diagnosis called out position 79 for all voltage levels, with an indication that the defect is in the interconnect logic outside of the scan cell. This was because only the L2R lane was failing.
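
The reasoning behind this callout follows the lane classification of Section IV, which can be summarized as a small decision function. This is a hedged sketch for illustration only: `LaneFailure`, `classify_lane_failure` and `SUSPECT_TOPOLOGY` are hypothetical names, and the inputs are assumed to be the pass/fail results of the two non-reversible flush patterns for one chain.

```python
from enum import Enum


class LaneFailure(Enum):
    NONE = "no flush failure"
    DUAL = "dual-lane failure"
    L2R_ONLY = "left-to-right single-lane failure"
    R2L_ONLY = "right-to-left single-lane failure"


def classify_lane_failure(l2r_flush_fails: bool,
                          r2l_flush_fails: bool) -> LaneFailure:
    """Map the two flush results (no U-turn) of one chain to a failure type."""
    if l2r_flush_fails and r2l_flush_fails:
        return LaneFailure.DUAL
    if l2r_flush_fails:
        return LaneFailure.L2R_ONLY
    if r2l_flush_fails:
        return LaneFailure.R2L_ONLY
    return LaneFailure.NONE


# Suspect topology per failure type, paraphrased from Section IV.
SUSPECT_TOPOLOGY = {
    LaneFailure.DUAL: ("scan cell, the nearest segment of its input scan "
                       "path, and the stem of its output scan path"),
    LaneFailure.L2R_ONLY: ("direction MUX and its A0-input net; "
                           "scan cell polygons omitted"),
    LaneFailure.R2L_ONLY: ("direction MUX and its A1-input net; "
                           "scan cell polygons omitted"),
}

# Single-lane suspect case above: only the L2R flush pattern fails.
case_2 = classify_lane_failure(l2r_flush_fails=True, r2l_flush_fails=False)
```

In this sketch, the single-lane case classifies as `LaneFailure.L2R_ONLY`, whose suspect topology excludes the scan cell itself, consistent with the interconnect callout reported by RS diagnosis.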

LVI was then used to compare imaging under POR vector application with imaging under RS vector application. In POR mode, the data goes from positions closer to scan-in toward scan-out, and the data transfer direction is reversed during RS vector application. The LVI images in Figure 17 proved that the defect only impacts the L2R shift direction.

Figure 17. LVI comparison of POR and RS activity.

Based on LVI, it appears that the failure is isolated between the output of bit 78.2 (buffer) and the input of bit 78.1 (multiplexer), as shown in Figure 18. The RS vector passed because the reversed shift does not go through input A of multiplexer 78.1.

Figure 18. Single-lane defect position data impact.

According to feedback from the LVI team, the results were clear enough to confirm diagnosis accuracy. There was no need for further FA effort.

## VII. DESIGN CHARACTERISTICS

It is important to establish the hardware cost of the evaluated solution. The comparison was done between identical blocks, one of which was instrumented with reversible scan capability while the other was not. The Key Performance Indicators listed in Table 2 show an 11% increase in cell count, which is in line with expectation, since an additional shift direction MUX is added for every flop. The overall hardware utilization cost is 5%. A big part of the overhead is the costly re-stitching of connections outside of MBFFs. A mitigation strategy is described in Section VIII.

Table 2. Design overhead of reversible scan chains.

| Key Performance Indicators | Change |
|------------------------------|--------------|
| Leakage power | 11% increase |
| Cell count | 11% increase |
| Routing wire length | 15% increase |
| Place and Route | ~5% increase |
| DRC effort | High |

For designs in which a high percentage of scan cells are packed into MBFF arrays, reversible scan would be a very expensive solution for production chips, but test chips may be able to afford the design overhead.

## VIII. CONCLUSIONS AND FUTURE WORK

Reversible scan chain is a design-for-diagnosis technique to improve the quality of chain diagnostics. It pinpoints the defective scan cell by shifting the test data in and out in the forward direction (i.e., from the scan cell closest to the scan in to the cell closest to the scan out) as well as the reverse direction (i.e., from the scan cell closest to the scan out to the cell closest to the scan in). This paper describes (a) the diagnostics evaluation of the implementation of the reversible scan chain architecture in an industry test chip manufactured in a bleeding-edge process node, and (b) a new method that further narrows down the diagnostic candidates by identifying whether a defect affects one of the shift directions (single-lane fails) or both shift directions (dual-lane fails).

Comparison with the state-of-the-art demonstrates the superiority of reversible scan in diagnostic performance as well as quality. Specifically, the proposed solution requires 4X less runtime, produces 6X more ideal diagnostics, and reduces the physical search space for failure analysis by at least 2X (depending on the metric). LVI confirms that the increased physical resolution does not sacrifice accuracy. Additionally, the identification of multiple failing chains is more accurate; about 7X more diagnostics correctly identify multiple failing chains compared to the state-of-the-art.

The effectiveness of the proposed solution comes, however, at the cost of design area, routing complexity and leakage power. There are at least a couple of ways worth exploring to address the design overhead. The first approach is to reduce the area overhead by compromising the diagnosis resolution for multi-bit flops. The idea is to treat the MBFF as a single cell (instead of individual bits) and design the reverse scan architecture at the cell level instead of the bit level.
For certain defect types (e.g., hard defects) where electrical/optical fault isolation is typically skipped, the physical FA effort may remain the same because the entire cell, irrespective of the bit-level suspects, will most likely be examined. On the other hand, for defects causing voltage-sensitive failures, more electrical/optical fault isolation resources may be consumed because the number of transistors that need to be analyzed increases. Eventually, the goal is to find the right tradeoff between design overhead, diagnostics quality, and FA resources.

## ACKNOWLEDGEMENT

This research was possible thanks to Bipin Duggal and designers for implementing the reversible chain in the design, Ranjit Kurra for providing test data with debug efforts, Lesly Endrinal for providing FA results and analysis, Anumita Gupta for providing diagnosis data, and Spencer Chang for executing various analyses.

## REFERENCES

- [1] Y. Huang, R. Guo, W.-T. Cheng, J. C.-M. Li, "Survey of Scan Chain Diagnosis," IEEE Design and Test, Volume 25, May-June 2008, pp. 240-248.
- [2] P. Song, "A New Scan Structure for Improving Scan Chain Diagnosis and Delay Fault Coverage," Proc. 9th IEEE North Atlantic Test Workshop (NATW), 2000, pp. 14-18.
- [3] "Bidirectional Scan Chain for Digital Circuit Testing," IP.com Number: IPCOM000160595D.
- [4] Y. Huang and W.-T. Cheng, "On Designing Two-Dimensional Scan Architecture for Test Chips," International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2017.
- [5] Y. Huang, S. Urban, W.-T. Cheng, M. Sharma, F. Niu, J. Zhong, W.-L. Hsu, "Reversible Scan Based Diagnostic Patterns," International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2019.
- [6] S. Mittal, Z. Liu, B. Niewenhuis and R. D. S. Blanton, "Test chip design for optimal cell-aware diagnosability," 2016 IEEE International Test Conference (ITC), 2016.
- [7] C. Santos, R. Reis, G. Godoi, M. Barros and F. Duarte, "Multi-bit flip-flop usage impact on physical synthesis," 2012 25th Symposium on Integrated Circuits and Systems Design (SBCCI), 2012.
- [8] W.-T. Cheng, S. Urban, J. Janicki, M. Sharma, Y. Huang, "Reversible Multi-Bit Scan Cell-based Scan Chains For Improving Chain Diagnostic Resolution," US patent 11156661.