8.25µm Pitch 66% Fill Factor Global Shared Well SPAD Image Sensor in 40nm CMOS FSI Technology

T. Al Abbas¹, N.A.W. Dutton², O. Almer¹, F.M. Della Rocca¹,², S. Pellegrini², B. Rae², D. Golanski³ and R.K. Henderson¹

¹School of Engineering, The University of Edinburgh, Edinburgh, UK, email: tarek.alabbas@ed.ac.uk
²Imaging Division, STMicroelectronics, Edinburgh, UK, ³STMicroelectronics, Crolles, France

Abstract—We present the first single photon avalanche diode (SPAD) device and image sensor realized in a customized 40nm CMOS front side illuminated (FSI) technology. The 96×40 array utilizes a global shared well layout structure with up to 66% fill factor at 8.25µm pitch and median dark count rate (DCR) less than 70cps at 1V excess bias. A rising edge to rising edge time gating technique is demonstrated achieving a minimum time gate of 360ps FWHM.

I. INTRODUCTION

A common issue when designing planar FSI SPAD image sensors is the tradeoff between sensitivity and integrated functionality due to SPAD and circuitry sharing the pixel area. The relatively large SPAD guard ring structure and the restrictive well spacing rules (SPAD-to-SPAD or SPAD-to-circuit) limit the array design options.

The first CMOS SPAD arrays relied on a standalone SPAD layout, which resulted in a large pixel pitch and a low fill factor [1]. As these sensors advanced, SPAD well sharing techniques were proposed [2] and optimized circuit designs that significantly improve fill factor and reduce pixel pitch were adopted, initially in the form of single strip sharing [3] and later as double strip sharing with all-NMOS circuitry, reaching 26.8% fill factor at 8µm pitch [4].

Global well sharing, in which the SPAD and circuitry are decoupled from each other to allow a continuous detector array layout with high fill factor, has been demonstrated. So far, however, it has been restricted to line sensors [5] and silicon photomultiplier (SiPM) arrays [6] because of routing complexity and limited 2D scalability.

In this work we demonstrate that global well sharing can be extended to low resolution image sensors by taking advantage of the high routing density of advanced CMOS nodes. Hence we present the first SPAD device and image sensor realized in a customized 40nm CMOS FSI technology. Four trials of the 96×40 array have been fabricated with fill factor up to 66% at 8.25µm pitch.

II. SENSOR LAYOUT

Figure 1 shows a micrograph of the 1mm² sensor, highlighting the imaging array and the banks of electronics above and below its periphery. Figure 2 shows a layout view of the bottom right corner of the array, where the interface between the shared well SPAD array and the processing electronics is visible. The vertical (green) tracks represent anode connections routed from the array to the corresponding pixel circuits underneath. The SPAD structure has been ported from the previous-generation 130nm node to 40nm and has a p-well (PW) to deep n-well (DNW) junction with a retrograde guard ring, as described in [7]. A cross section of the shared well SPAD and its construction is shown in Figure 3.

Figure 4 illustrates the routing of the SPAD anodes in both top and bottom directions to maximize the resolution along the y-axis. Metal routes run over the SPAD guard ring region, with the topmost SPAD in each half of the array connecting to the first circuit block in order to balance route lengths. The maximum number of connections per column is determined by the metal pitch, the spacing rules and the metal layer availability in the process. Route length parasitics and potential electrical crosstalk are also factors.
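The routing constraint can be put into rough numbers. The sketch below estimates how much track pitch is available per anode route at the edge of the array, where all routes of a half-column pass side by side. It assumes that 40 is the row count of the 96×40 array, that the full 8.25µm column width is usable on every routing layer, and that routes are spread evenly over the layers; none of these assumptions comes from the design itself, and the function names are hypothetical.

```python
def routes_per_half(rows: int, directions: int = 2) -> int:
    """Anode routes one column must carry towards one side of the array."""
    if rows <= 0 or directions <= 0:
        raise ValueError("rows and directions must be positive")
    return -(-rows // directions)  # ceiling division


def available_track_pitch_um(column_width_um: float, rows: int,
                             layers: int = 1, directions: int = 2) -> float:
    """Track pitch (line + space) available per route at the array edge.

    Assumption: the whole column width can be used for routing on each of
    `layers` metal layers, and routes are split evenly between layers.
    """
    if layers <= 0:
        raise ValueError("layers must be positive")
    per_layer = -(-routes_per_half(rows, directions) // layers)
    return column_width_um / per_layer


if __name__ == "__main__":
    # 40 rows (assumed), routed to top and bottom, 8.25 um column width.
    for n_layers in (1, 2):
        pitch = available_track_pitch_um(8.25, rows=40, layers=n_layers)
        print(f"{n_layers} layer(s): {pitch:.4f} um per route")
```

In practice the usable width is smaller, because the routes are confined to the guard ring region, so these figures are upper bounds on the available track pitch.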

A drawback of this layout is the need to duplicate some resources, such as clock trees, as shown in the sensor's block diagram in Figure 5. This duplication increases the overall area of the design and its power consumption.

Although this layout is not as scalable as its 3D-stacked counterpart [8], it allows a smaller pixel pitch and a higher fill factor than traditional layout techniques in FSI implementations, providing a pathway to miniature application-specific sensors.

Figure 6 compares the achievable fill factor and pixel pitch for the different layout styles. The model takes into account process-specific spacing rules and implant dimensions for what is considered a conservative SPAD structure, and it assumes a 50:50 split of the pixel area between SPAD and circuit. Global well sharing has a clear advantage in terms of the maximum imaging array fill factor and the minimum pixel pitch attainable.

Figure 7 models the fill factor versus pixel pitch for global well sharing, considering both conservative and aggressive SPAD parameters. The four red crosses mark the trials fabricated at 8.25µm pitch, whose fill factors range from 39% to 66% as the size of the SPAD guard ring is reduced.
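As a rough geometric reading of these numbers, the sketch below converts a fill factor at a given pitch into the side of an equivalent square active area and the inactive border left on each side. The square active area is an assumption made here for illustration. It is not the model behind Figures 6 and 7, which uses process-specific rules, and the function names are hypothetical.

```python
import math


def active_side_um(pitch_um: float, fill_factor: float) -> float:
    """Side of a square active area that yields `fill_factor` in a square pixel."""
    if pitch_um <= 0 or not 0.0 < fill_factor <= 1.0:
        raise ValueError("pitch must be positive and fill_factor in (0, 1]")
    return pitch_um * math.sqrt(fill_factor)


def inactive_border_um(pitch_um: float, fill_factor: float) -> float:
    """Inactive width on each side of the (assumed square) active area."""
    return (pitch_um - active_side_um(pitch_um, fill_factor)) / 2.0


def fill_factor_from_border(pitch_um: float, border_um: float) -> float:
    """Inverse relation: fill factor for a given inactive border per side."""
    side = pitch_um - 2.0 * border_um
    return (side / pitch_um) ** 2 if side > 0 else 0.0


if __name__ == "__main__":
    # Lowest and highest fill factors of the fabricated trials.
    for ff in (0.39, 0.66):
        print(f"FF {ff:.0%}: side {active_side_um(8.25, ff):.2f} um, "
              f"border {inactive_border_um(8.25, ff):.2f} um")
```

Under this assumption, going from 39% to 66% fill factor corresponds to roughly halving the inactive border per side, which is consistent with the trials differing in guard ring size.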

III. PIXEL ELECTRONICS

The pixel circuit (also at 8.25µm pitch) is composed of a thick-oxide front end, which accommodates excess bias voltages up to 3V, followed by thin-oxide low-power 40nm logic, as shown in Figure 8. When operated in photon counting mode, a configurable 12-bit ripple counter provides a full well (FW) capacity of 4095 photons with no noise from accumulation or readout, owing to the digital architecture. Figure 9 shows a grayscale intensity image captured with the sensor in room conditions and 7ms exposure time.
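The counter depth fixes both the full well and the highest mean count rate that a single exposure can record without saturating. As a worked example using the 7ms exposure quoted above (the symbols $N_{\mathrm{FW}}$, $T_{\mathrm{exp}}$ and $R_{\max}$ are introduced here for illustration only):

```latex
N_{\mathrm{FW}} = 2^{12} - 1 = 4095,
\qquad
R_{\max} = \frac{N_{\mathrm{FW}}}{T_{\mathrm{exp}}}
         = \frac{4095}{7\,\mathrm{ms}}
         = 5.85 \times 10^{5}\ \text{counts/s per pixel}
```

This is only an upper bound set by the counter. It ignores SPAD dead time and any other limit of the front end.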

![Pixel circuit diagram](image)

**Figure 8.** Pixel circuit diagram.

In time gated mode the counter splits into three 4-bit bins operating in parallel. An all-rising-edge gating scheme has been implemented, as outlined in Figure 10 in the context of fluorescence lifetime imaging (FLIM). The time gates are externally programmable and are defined by the time difference between the rising edges of gating signals G1 to G4, as opposed to the conventional square-like gate. This dual pulse (rising edge to rising edge) approach has two advantages. It eliminates the gate width mismatch caused by imbalanced rising and falling times of the gate drivers, and it overcomes the limitation of the single pulse time gate (rising edge to falling edge), where the column RC bandwidth limits the minimum gate duration that can be propagated.

When a SPAD fires, its rising edge samples the state of the gating signals and the in-pixel decision logic increments the corresponding bin, slowly building a coarse histogram of the scene. Having parallel time gates increases the photon collection efficiency per laser repetition but reduces the bin depth, so multiple readouts are required before enough photons are collected for post-processing.
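The behavior described above can be sketched as a small behavioral model. It is a minimal illustration and not the pixel's actual logic. It assumes that G1 to G4 rise in order and stay high for the rest of the cycle, so that the sampled state is a thermometer code, that bin k spans the interval from the rising edge of Gk to that of Gk+1, and that a full bin holds its value rather than wrapping. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

BIN_BITS = 4
BIN_MAX = (1 << BIN_BITS) - 1  # 15 counts per 4-bit bin
NUM_GATES = 4                  # gating signals G1..G4 define three bins


def sample_gates(t_spad: float, edge_times: Sequence[float]) -> List[bool]:
    """State of G1..G4 at the SPAD rising edge (True once a signal has risen)."""
    return [t_spad >= t_edge for t_edge in edge_times]


def select_bin(gate_states: Sequence[bool]) -> Optional[int]:
    """Map the sampled gate states to a bin index 0..2, or None if outside."""
    if len(gate_states) != NUM_GATES:
        raise ValueError("expected four gating signals")
    risen = sum(1 for g in gate_states if g)
    # Assumed thermometer code: the first `risen` signals are high.
    if list(gate_states) != [True] * risen + [False] * (NUM_GATES - risen):
        return None
    if risen == 0 or risen == NUM_GATES:
        return None  # before G1 or after G4: no bin is open
    return risen - 1


def accumulate(spad_times: Sequence[float],
               edge_times: Sequence[float]) -> List[int]:
    """Build the three-bin histogram for a list of SPAD firing times."""
    bins = [0] * (NUM_GATES - 1)
    for t in spad_times:
        idx = select_bin(sample_gates(t, edge_times))
        # Assumption: a full bin saturates instead of wrapping around.
        if idx is not None and bins[idx] < BIN_MAX:
            bins[idx] += 1
    return bins


if __name__ == "__main__":
    edges_ns = [0.0, 1.0, 2.0, 3.0]  # hypothetical gate edge times
    print(accumulate([-0.5, 0.2, 0.9, 1.5, 2.5, 2.9, 3.0, 4.0], edges_ns))
```

With only 15 counts per bin, a bright pixel fills a bin quickly, which is why several readouts have to be summed off-chip before a lifetime can be estimated.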

IV. RESULTS

The SPAD’s DCR has been characterized at room temperature and at different excess bias voltages for the four implemented trials. Figure 11 shows how DCR increases as the guard ring dimensions are pushed to the limit, suggesting the onset of edge breakdown. Although this is a 40nm process, the median DCR measured for all trials is below 70cps at 1V excess bias. This compares well against other CMOS implementations [2][7] and is attributed to the customized SPAD implants.

![SPAD DCR versus excess bias](image)

**Figure 11.** SPAD DCR versus excess bias at room temperature.

The time gates have been characterized by sweeping a Hamamatsu PLP10 443nm laser in time, in steps of 25ps, using a Stanford Research Systems DG645 delay generator. A minimum time gate of 360ps FWHM has been measured, which is a 2× improvement over the reported state of the art for time gated SPAD image sensors [9]. Figure 12 shows the time gate profile of bin 1 of a randomly selected pixel with the distribution across the array inset (σ of 31ps).
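One way to extract an FWHM figure from a swept gate profile of this kind is to locate the two half-maximum crossings by linear interpolation between delay steps. The sketch below is a minimal illustration and not the analysis used for Figure 12. The function name, the zero-baseline assumption (background already subtracted) and the synthetic profile are all hypothetical.

```python
from typing import Sequence


def fwhm(times_ps: Sequence[float], counts: Sequence[float]) -> float:
    """Full width at half maximum of a single-peaked, zero-baseline profile."""
    if len(times_ps) != len(counts) or len(counts) < 3:
        raise ValueError("need two equally long sequences of at least 3 samples")
    half = max(counts) / 2.0
    above = [i for i, c in enumerate(counts) if c >= half]
    first, last = above[0], above[-1]
    if first == 0 or last == len(counts) - 1:
        raise ValueError("profile must drop below half maximum on both sides")

    def crossing(i_a: int, i_b: int) -> float:
        # Linear interpolation of the half-maximum point between two samples.
        t_a, t_b = times_ps[i_a], times_ps[i_b]
        c_a, c_b = counts[i_a], counts[i_b]
        return t_a + (half - c_a) * (t_b - t_a) / (c_b - c_a)

    t_rise = crossing(first - 1, first)  # last sample below half -> first above
    t_fall = crossing(last, last + 1)    # last sample above half -> first below
    return t_fall - t_rise


if __name__ == "__main__":
    # Synthetic profile sampled in 25 ps steps (not measured data).
    times = [i * 25.0 for i in range(9)]
    profile = [0, 0, 4, 8, 8, 8, 4, 0, 0]
    print(f"FWHM = {fwhm(times, profile):.1f} ps")
```

With a 25ps delay step, interpolation matters: without it, the width of a gate a few hundred picoseconds wide could only be resolved to the nearest step on each edge.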

![Time gate profile](image)

**Figure 12.** Time gate profile of bin 1 of a randomly selected pixel with the distribution across the array inset (σ of 31ps).

With conventional square-like gates, any imbalance between rising and falling times creates a dead zone between contiguous gates, resulting in a loss of photons.

To characterize this behavior, the DG645 was used to generate two contiguous square-like 20ns wide time gates, which were connected to bin 1 and bin 2 of the array respectively. The laser was swept through the gates in steps of 100ps and the cumulative counts of the sensor were plotted against time. Figure 13 shows the normalized response of the two bins (time gates); an undesirable dip in photon counts is clear at the handover between them.

The same experiment was then repeated for the rising edge to rising edge technique, with the delay generator outputting three edges that define the two 20ns time gates. Figure 14 shows the improved response, where the dip is no longer present, suggesting a continuous handover. This is because the second edge acts as both the ending edge of bin 1 and the starting edge of bin 2.
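The difference between the two schemes can be reproduced with a toy model. In the sketch below every edge is a smoothed step. Square-like gates get a separate falling edge that arrives slightly early, while edge-to-edge gates reuse one rising edge as the boundary between bins. The error-function edge shape, the 0.2ns edge width and the 0.5ns skew are hypothetical values chosen for illustration. They are not measured properties of the sensor, and the model is not fitted to Figures 13 and 14.

```python
import math
from typing import Callable, Tuple

Bins = Tuple[float, float]


def edge(t: float, t0: float, width: float) -> float:
    """Smoothed rising step from 0 to 1, centred on t0."""
    return 0.5 * (1.0 + math.erf((t - t0) / width))


def square_bins(t: float, bounds: Tuple[float, float, float],
                width: float, fall_skew: float) -> Bins:
    """Two contiguous square-like gates, each with its own falling edge.

    fall_skew > 0 makes the falling edges arrive early relative to the
    rising edges, modelling imbalanced drivers.
    """
    b0, b1, b2 = bounds
    bin1 = edge(t, b0, width) * (1.0 - edge(t, b1 - fall_skew, width))
    bin2 = edge(t, b1, width) * (1.0 - edge(t, b2 - fall_skew, width))
    return bin1, bin2


def edge_to_edge_bins(t: float, bounds: Tuple[float, float, float],
                      width: float) -> Bins:
    """Two gates defined by three rising edges; the middle edge is shared."""
    g1, g2, g3 = (edge(t, b, width) for b in bounds)
    return g1 * (1.0 - g2), g2 * (1.0 - g3)


def min_total_response(bins_fn: Callable[[float], Bins],
                       t_lo: float, t_hi: float, step: float) -> float:
    """Minimum of bin1 + bin2 over a sweep; 1.0 means no photons are lost."""
    n = int(round((t_hi - t_lo) / step))
    return min(sum(bins_fn(t_lo + i * step)) for i in range(n + 1))


BOUNDS_NS = (0.0, 20.0, 40.0)  # two 20 ns gates, as in the experiment
WIDTH_NS = 0.2                 # hypothetical edge width
SKEW_NS = 0.5                  # hypothetical rise/fall imbalance

# Sweep across the handover at 20 ns in 100 ps steps.
square_min = min_total_response(
    lambda t: square_bins(t, BOUNDS_NS, WIDTH_NS, SKEW_NS), 15.0, 25.0, 0.1)
edge_min = min_total_response(
    lambda t: edge_to_edge_bins(t, BOUNDS_NS, WIDTH_NS), 15.0, 25.0, 0.1)
print(f"square-like gates: min total response {square_min:.3f}")
print(f"edge-to-edge gates: min total response {edge_min:.3f}")
```

In the edge-to-edge case the two bin responses sum to one at the handover whatever the shape of the shared edge, because the same signal closes bin 1 and opens bin 2.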

V. CONCLUSION

Implementation of SPAD devices in technology nodes as advanced as 40nm, with customized implants to improve SPAD performance, is maturing. Such nodes allow for global well sharing layout styles that take advantage of the high routing density to achieve small pitch, high fill factor miniature arrays, albeit with limited scalability. Edge to edge time gating techniques, coupled with edge triggered circuits that can be compactly implemented in deep submicron technologies, offer an opportunity to optimize the minimum time gate width and efficiency for demanding applications such as FLIM.

ACKNOWLEDGMENT

The authors are grateful to The University of Edinburgh and the PROTEUS project (http://proteus.ac.uk) for funding this work (EPSRC grant number EP/K03197X/1), and to the POLIS project (http://polis.minalogic.net) and ST Crolles for providing silicon.

REFERENCES