Edinburgh Research Explorer

A High Dynamic Range 128×120 3D-Stacked CMOS SPAD Image Sensor SoC for Fluorescence Microendoscopy

Citation for published version:
https://doi.org/10.1109/JSSC.2022.3150721

Digital Object Identifier (DOI):
10.1109/JSSC.2022.3150721

Document Version:
Peer reviewed version

Published In:
IEEE Journal of Solid-State Circuits

A High Dynamic Range 128×120 3D-Stacked CMOS SPAD Image Sensor SoC for Fluorescence Microendoscopy

Ahmet T. Erdogan, Tarek Al Abbas, Member, IEEE, Neil Finlayson, Member, IEEE, Charlotte Hopkinson, Istvan Gyongy, Oscar Almer, Member, IEEE, Neale A. W. Dutton, Senior Member, IEEE, and Robert K. Henderson, Fellow, IEEE

Abstract—A miniaturized 1.4mm × 1.4mm, 128 × 120 single photon avalanche diode (SPAD) image sensor with a 5-wire interface is designed for time-resolved fluorescence microendoscopy. This is the first endoscopic chip-on-tip sensor capable of fluorescence lifetime imaging microscopy (FLIM). The sensor provides a novel, compact means to extend the photon counting dynamic range (DR) by partitioning the required bit-depth between in-pixel counters and off-pixel noiseless frame summation. The sensor is implemented in STMicroelectronics 40nm/90nm 3D-stacked backside-illuminated (BSI) CMOS process with 8 µm pixels and 45% fill factor. The sensor capabilities are demonstrated through FLIM examples, including ex-vivo human lung tissue, obtained at video rate.

Index Terms—CMOS image sensor, CIS, fluorescence lifetime imaging microscopy, FLIM, single photon avalanche diode, SPAD, microendoscopy, time gating, time-resolved, high dynamic range, HDR, chip-on-tip, system-on-chip, SoC, 3D-stacking.

I. INTRODUCTION

Ultra small form factor cameras enable minimally invasive surgical procedures and diagnostics in lung, blood vessel and urinary tract inspection [1]. A trend towards disposable “chip on tip” endoscopes to avoid laborious and sometimes imperfect sterilization procedures is made possible by low cost, nanoscale CMOS image sensor (CIS) manufacturing [2]. The literature includes examples of recent developments addressing some of the challenges imposed by miniaturization. A 10k pixel sensor with a 3 µm pixel pitch was presented in [3] which utilizes a current based analogue to digital converter (ADC) to overcome power supply noise over long cabling distances associated with some endoscopy applications. The work in [4] highlights the challenges of wafer level chip scale packaging (WL CSP) such as loss of peripheral area due to dicing and packaging wall spacers and compatibility with microlensing.

Other works focus on optimizing the optical efficiency of the focal plane as the pixel area in CMOS image sensors tends to be rectangular in dimension while the lenses are spherical in shape resulting in aberrations and optical distortion at the edges of the rectangular image. To overcome that, a rectangular IC with an octagonal pixel array was proposed in [5] to match the focal plane of the image sensor and the lens. More recently, a CIS with a unique octagonal layout was demonstrated in [6]. The IC is sawn in an octagonal form such that it maximizes the imaging area when placed on the tip of conventionally cylindrical endoscopes.

On the other hand, other efforts target the endoscopy module or hardware system design in which the miniature sensor is a main component. A disposable camera system comprising a CIS, optics and a light emitting diode (LED) was presented in [7] for minimally invasive surgery. Miniature low power cameras are also being developed for ingestible pill endoscopy [8], [9]. A low power wireless capsule endoscope for fluorescence imaging based on a SPAD sensor was presented in [10], [11] including optics, an LED source and a battery unit.

Currently all endoscopy sensors are tailored for capturing intensity-based images and therefore lack time-resolved capability. A time-resolved endoscope would be able to identify the presence of pathogens by measuring their autofluorescence lifetime against background tissue or by measuring changes in the lifetime of actively introduced biomarkers that bind to targeted molecules [12]-[15]. Such specificity in detection would inform treatments prescribed by clinicians and assist in fluorescence-guided surgical oncology for cancerous tissue removal [16]. It would also open new application avenues such as 3D imaging guided keyhole surgery and physical environment characterization such as pH sensing by Raman probes [17].

There are several challenges associated with designing time-resolved SPAD sensors, including the detector integration in CMOS, the pixel sensitivity and functionality trade-off due to the sophisticated embedded processing, and the high data rates associated with complex time-resolved systems [18], [19]. Meeting these challenges becomes harder in the context of designing a miniature sensor where the silicon area is restricted and the data bandwidth is limited. In addition to employing advanced smart layout, pixel and system architectures, the advent of advanced CMOS technologies such as 3D-stacking opens the door to new design possibilities and higher built-in intelligence [20].

In this paper, we introduce a 128 x 120 SPAD array time-resolved image sensor for microendoscopy. A full microendoscopy system, integrating the image sensor introduced here with light sources and excitation/emission filters, is currently being developed. As part of this development work, the sensor has already been demonstrated working successfully in a tethered 1.5 m operation mode. The sensor is integrated in a 3D-stacked 90nm/40nm CMOS process, measures 1.4 mm x 1.4 mm, and has a 5-wire interface. Compared to commercially available endoscopic CMOS image sensors, this is the only sensor which enables video rate time-resolved capability, such as fluorescence lifetime imaging microscopy (FLIM), in order to provide clinicians with more informative diagnostic tools. In addition, we introduce a partitioned photon counting scheme between in-pixel counters and dense SRAM external to the pixel array, allowing for dynamic range (DR) extension by on-chip noiseless frame summation. The sensor was first presented in [21]. More details about the sensor architecture, its implementation and extended characterization results, including initial and indicative ex-vivo FLIM imaging examples, are reported here.

The paper is organized as follows. Section II describes the sensor architecture, Section III introduces dynamic range extension by performing on-chip oversampling. Section IV provides characterization results and Section V concludes the paper.

II. SENSOR DESIGN

A photomicrograph of the sensor top tier with bottom tier blocks overlaid is shown in Fig. 1. The 1.4 mm x 1.4 mm chip is implemented in STMicroelectronics 3D-stacked CMOS process with a 40 nm bottom tier and a BSI 90 nm top tier at 8 μm pixel pitch and 45% fill factor. The bottom tier sensor block diagram in Fig. 2 consists of a 128 x 120 SPAD pixel array with peripheral addressing and readout blocks, a micro-control unit (MCU), a ring oscillator (RO) based gate generator and distribution clock trees, a power generation network (with power-on-reset (POR), bandgap (BG), and 1.1V voltage regulator), and a 5-wire IO interface (VHV, VDD, GND, CLK and bidirectional DATA). The top tier chip (not shown) contains a corresponding array of 128 x 120 global shared well SPADs with one SPAD per pixel connected to the bottom tier chip via a 1-to-1 hybrid-bond site.

A. Pixel Architecture

The pixel circuit diagram is shown in Fig. 3. The front end is made of three thick oxide transistors MQ, M0 and M1. MQ is the passive quench and recharge transistor while M0 and M1 form an inverter operating at 1.1 V for direct voltage level shifting. This is followed by a 14-bit ripple counter which is triggered by the SPAD’s leading edge. A triple input (ROW, COL and OVF) AND gate is used to implement the gating function, where the counter overflow flag (OVF) is an active low signal based on the four most significant bits (MSBs) going high, yielding a maximum photon counting capacity of 15360 events. If the pixel reaches its saturation limit, the OVF signal blocks the counter from receiving further SPAD pulses to prevent roll-over. All of the logic is implemented using 1.1 V thin oxide low power standard cells with the exception of the ripple counter bits Q<1:13>, where optimized custom D-type flip-flops (DFF) were used for area saving. The custom DFF cell consists of seventeen transistors, which makes it much smaller than the standard DFF cell, providing a 40% saving in area which translates into more bits per pixel. Being digital, the pixel counter adds no accumulation noise and no readout noise, unlike analogue implementations [22]. Moreover, a digital counter provides a readily digitized output, eliminating the need for complex analogue to digital converters (ADCs) which consume power and area. Conventional column parallel readout of the buffered counter bits is implemented through transmission gates driven by row select signals.
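The overflow protection described above can be illustrated with a short behavioural sketch. It is not the pixel circuit itself: the names `OVF_MASK`, `overflow` and `count_pulses` are hypothetical, and the model assumes only what the text states, namely that counting is blocked once the four MSBs of the 14-bit counter are all high.

```python
# Behavioural sketch of the in-pixel counter with overflow protection.
# Assumption: OVF asserts once the four MSBs of the 14-bit count are all high.

COUNTER_BITS = 14
OVF_MASK = 0b1111 << (COUNTER_BITS - 4)   # four MSBs set: 15 * 1024 = 15360


def overflow(count: int) -> bool:
    """True when the four most significant bits are all high."""
    return (count & OVF_MASK) == OVF_MASK


def count_pulses(n_pulses: int, row: bool = True, col: bool = True) -> int:
    """Count SPAD pulses while the gate (ROW AND COL AND not overflowed) is open."""
    count = 0
    for _ in range(n_pulses):
        if row and col and not overflow(count):
            count += 1
    return count


print(count_pulses(100))      # below saturation, prints 100
print(count_pulses(20_000))   # saturates at 15360 instead of rolling over
```

The saturation value follows directly from the bit pattern: 1111 followed by ten zeros is 15 × 1024 = 15360.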

The pixel gating logic allows the sensor to operate in four different modes based on the state of the ROW and COL signals, as shown in Fig. 4. For rolling shutter (RS) operation, the COL signal is globally held high and the exposure period is defined by the ROW signal which is high throughout the rolling process until the row is selected for readout. Alternatively, in time gated RS mode, the COL signal is globally pulsed to generate intermittent time gates within the rolling exposure period. For global shutter (GS) operation, the ROW signal for all rows is globally held high and the COL signal is what determines the exposure period. It can either be set high for a predefined time interval or pulsed to generate time gates. The RS mode readout follows in a similar fashion to conventional image sensors. ROW signals are generated by the row scanner block in the imaging array and COL signals are generated by the gating logic block where both blocks are driven by the MCU.

The SPAD device implemented in this chip is based on the same p-well (PW) to deep n-well (DNW) junction SPAD described in [23]. This structure is proven to be scalable down to 3 µm pixel pitch [24].

B. Array Readout

The readout of the imaging array is carried out by two subblocks: row scanner and readout multiplexer. The row scanner is a standard shift register with a token inserted at the top of the chain and shifted down to select rows for readout. When array readout is initiated, three operations take place: LATCH, RESET and READOUT.

During the LATCH phase, the column bus data of the selected row is allowed some time to settle and is then latched into a 1792-bit (= 14-bit x 128 columns) temporary register memory. Next, the pixel counters are reset by the row RESET phase. This is followed by the READOUT phase where 28 bits of two consecutive pixels from the temporary register are funneled through the readout multiplexer (MUX) by using a 6-bit code to address the 64 pairs of pixels (i.e. corresponding to 128 pixel columns). All readout control signals and MUX addresses are issued by the MCU. Fig. 5 shows a typical timing diagram of a single selected row in RS mode as an example.
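The LATCH and READOUT steps can be sketched in software as packing one row into a 1792-bit register and then selecting 28-bit words with a 6-bit address. This is only an illustration: the function names are hypothetical, and the bit ordering (lower-numbered column in the low 14 bits of each word) is an assumption that the text does not specify.

```python
# Sketch of the row readout path: 128 x 14-bit counts latched into a 1792-bit
# register, then read as 64 words of 28 bits selected by a 6-bit address.
# Assumption: the lower-numbered column sits in the low 14 bits of each word.

COLS, BITS = 128, 14
PAIR_BITS = 2 * BITS              # 28 bits per MUX word
N_PAIRS = COLS // 2               # 64 pixel pairs
PIXEL_MASK = (1 << BITS) - 1


def latch_row(counts):
    """Pack one row of pixel counts into a single 1792-bit integer."""
    assert len(counts) == COLS
    reg = 0
    for col, value in enumerate(counts):
        reg |= (value & PIXEL_MASK) << (col * BITS)
    return reg


def mux_read(reg, address):
    """Return the two 14-bit counts selected by a pair address (0..63)."""
    assert 0 <= address < N_PAIRS
    word = (reg >> (address * PAIR_BITS)) & ((1 << PAIR_BITS) - 1)
    return word & PIXEL_MASK, word >> BITS


row = list(range(COLS))                     # dummy counts 0..127
reg = latch_row(row)
readout = [v for a in range(N_PAIRS) for v in mux_read(reg, a)]
print(COLS * BITS, (N_PAIRS - 1).bit_length())   # register width, address bits
```

Reading all 64 addresses in order returns the original row, which is the property the multiplexer has to preserve.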

C. Micro-Control Unit (MCU) and SRAM Memory

A block diagram of the MCU is shown in Fig. 6. It was designed to perform five main tasks: 1) Control the pixel array readout timing based on the selected mode of operation; 2) Manage the IO interface to read out frames or read in configuration register settings; 3) Act as a configuration block by hosting a wide range of programmable register settings that configure the various modes of operation and the settings of all other sub-systems such as the gating logic and power management; 4) Manage the data flow from the imaging array to the SRAM memory banks based on the selected mode of operation; 5) Perform simple arithmetic operations such as frame summation or comparison to predefined threshold values.

Aside from array readout routines, data transfer and configuration setup, the most important feature of the MCU is its data flow and frame manipulation functionality. This feature is at the heart of the SoC design and is aimed at: 1) Improving the sensor’s dynamic range; 2) Mediating data flow to reduce data rates and improve frame rates given the single data pad output which is necessary for a miniature system.

The MCU supports a total of 48 sensor operation modes with different properties such as frame rate, bit depth and dynamic range. As summarized in Table I, these modes comprise 4 pixel modes (GS, GS with gating, RS, and RS with gating); 3 adder modes (32-bit integer, 16-bit integer, and 16-bit floating point); 2 integration modes (using one or two SRAM banks); and 2 readout modes (integrating before readout and integrating while readout), i.e. 4 × 3 × 2 × 2 = 48 combinations. The two readout modes are independent of the four pixel modes. While pixel modes determine how the pixel array is read out into SRAM banks in the MCU where a number of image frames are
TABLE I

<table>
<thead>
<tr>
<th>Mode category</th>
<th>Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pixel modes</td>
<td>GS</td>
<td>Good for imaging fast changing scenes</td>
</tr>
<tr>
<td></td>
<td>RS</td>
<td>Good for imaging still or slow changing scenes</td>
</tr>
<tr>
<td></td>
<td>Gated GS or RS</td>
<td>Used for time-resolved imaging</td>
</tr>
<tr>
<td>Adder modes</td>
<td>32-bit integer</td>
<td>Used with the two SRAM banks integration mode for high dynamic range imaging but at a slower frame rate</td>
</tr>
<tr>
<td></td>
<td>16-bit integer</td>
<td>Provides limited dynamic range improvement (by allowing summing up to 4 frames only) but at a higher frame rate</td>
</tr>
<tr>
<td></td>
<td>16-bit floating point</td>
<td>Provides compromise between 32-bit and 16-bit integer adder modes</td>
</tr>
<tr>
<td>Integration modes</td>
<td>One SRAM bank</td>
<td>Used with 16-bit adder modes, providing higher frame rate at reduced dynamic range</td>
</tr>
<tr>
<td></td>
<td>Two SRAM banks</td>
<td>Provides high dynamic range at reduced frame rate when used with 32-bit adder mode and high frame rate at reduced dynamic range when used in 16-bit adder modes (with one SRAM used for integration while the other is used for readout in ping-pong fashion)</td>
</tr>
<tr>
<td>Readout modes</td>
<td>Integrating before readout</td>
<td>Used with 32-bit adder mode, providing high dynamic range at reduced frame rate</td>
</tr>
<tr>
<td></td>
<td>Integrating while readout</td>
<td>Used with 16-bit adder modes, providing high frame rate at reduced dynamic range</td>
</tr>
</tbody>
</table>

833 µm. Each SRAM bank is configured as 8192 words and 32-bits/word, hence providing a 256 kbit memory capacity. This allows up to 32-bit/pixel photon counting capability when two SRAM banks are used together, providing high count rate, SNR and DR; or utilizing only one of the SRAM banks in a 16-bit/pixel counting mode for improved frame rate at the expense of reduced count rate, SNR and DR. As part of the digital design stage, the two SRAM banks were integrated into the MCU block functionality.
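The memory figures above can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text (128 x 120 pixels, 8192 words of 32 bits per bank); the helper name `banks_needed` is hypothetical.

```python
# Capacity check for the SRAM banks, using the figures quoted in the text.

ROWS, COLS = 120, 128
WORDS_PER_BANK, BITS_PER_WORD = 8192, 32

pixels = ROWS * COLS                            # 15360 pixels
bank_bits = WORDS_PER_BANK * BITS_PER_WORD      # 262144 bits = 256 kbit


def banks_needed(bits_per_pixel):
    """Number of SRAM banks required to hold one full frame."""
    total_bits = pixels * bits_per_pixel
    return -(-total_bits // bank_bits)          # ceiling division


print(bank_bits // 1024, "kbit per bank")
print(banks_needed(16), "bank(s) for 16-bit/pixel")   # 240 kbit fits in one bank
print(banks_needed(32), "bank(s) for 32-bit/pixel")   # 480 kbit needs both banks
```

A 16-bit frame occupies 240 kbit and fits in a single bank, while a 32-bit frame occupies 480 kbit and therefore needs both banks, consistent with the two counting modes described.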

D. Power Network

Since the number of IO pads is limited, the necessary voltages must be generated and regulated internally through an on-chip power network. As shown in Fig. 7, the core blocks making up the power network are 1) Power-on-reset (POR) for initializing the sensor upon start-up; 2) Bandgap (BG) with multiple reference voltages (0.4 V, 0.9 V and 1.1 V); 3) Voltage regulator for generating the core logic 1.1 V supply with up to 20 mA current supply. These three blocks were provided by STMicroelectronics as standard IP blocks in their 40 nm CMOS process. The power generation network requires two supply inputs, 2.8 V and ground, supplied through the VDD and GND IO pads respectively. The configuration registers, which are addressable by the MCU, ensure that the sensor starts with the correct initial conditions and allow several options in the IP blocks to be reconfigured, such as the voltage and current trimming values of the BG to account for process variability. When the 2.8 V supply is ramped up, the POR reset signal is released and the BG starts functioning. After some start-up delay the BG reference voltages are generated and the BG_OK flag is raised high, indicating that it is ready for operation. Following that, the voltage regulator starts up and the 1.1 V supply is generated.

E. Gating Logic

For time-resolved imaging, a time-gating approach was preferred over time stamping or time correlated single photon counting (TCSPC) in order to maintain pixel simplicity, leave enough area for counter bits, and avoid the high throughput parallel readout needed to keep up with the high data rates generated by TCSPC systems, which conflicts with the miniaturization target of this work. It is possible to extract temporal information by simply collecting photons in two time gates and therefore determine parameters such as fluorescence lifetime (by rapid lifetime determination (RLD) [26]), or distance and depth (by indirect time of flight method [27]). This trades off the accuracy of the statistical approach of histogramming time stamps of captured photons for pixel simplicity and compressed data rates.
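As an illustration of two-gate lifetime estimation, the sketch below uses the standard RLD expression for a mono-exponential decay sampled by two contiguous gates of equal width, tau = dt / ln(D0 / D1). The gate arrangement used with this sensor is not specified in the text, so the contiguous equal-width layout, the function names and the noise-free test counts are all assumptions made for the example.

```python
import math


def rld_lifetime(d0, d1, gate_width_ns):
    """Two-gate RLD for contiguous, equal-width gates: tau = dt / ln(D0 / D1)."""
    if d1 <= 0 or d0 <= d1:
        raise ValueError("need D0 > D1 > 0 for a decaying signal")
    return gate_width_ns / math.log(d0 / d1)


def gate_counts(tau_ns, gate_width_ns, amplitude=1e4):
    """Ideal (noise-free) counts of A*exp(-t/tau) in gates [0, dt] and [dt, 2*dt]."""
    def integral(t0, t1):
        return amplitude * tau_ns * (math.exp(-t0 / tau_ns) - math.exp(-t1 / tau_ns))
    return integral(0.0, gate_width_ns), integral(gate_width_ns, 2 * gate_width_ns)


d0, d1 = gate_counts(tau_ns=4.0, gate_width_ns=3.9)
tau_est = rld_lifetime(d0, d1, gate_width_ns=3.9)
print(f"estimated lifetime: {tau_est:.3f} ns")
```

With ideal counts the ratio D0/D1 equals exp(dt/tau), so the estimator recovers the lifetime exactly; with real photon counts the estimate carries shot noise, which is the accuracy trade-off mentioned above.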

On-chip time gate generation logic was implemented as a fully digital custom design handcrafted in the analogue flow with the following design objectives: 1) Compactness for minimal chip area overhead; 2) Programmability for flexible and adaptive operation; 3) Sub-nanosecond temporal resolution to allow for very short time gates. The design is based on a ring oscillator (RO) and programmable ripple counters to produce three time gates. Two of the time gates (GATE_A and GATE_B) are broadcast globally to the imaging array via balanced binary trees. These time gates can also be interleaved to odd and even columns to minimize motion distortion.

Fig. 8 shows a simplified block diagram of the gating logic. Each time gate is implemented by utilizing two 12-bit counters that operate in three phases which are derived from the system clock. During the first phase both counters are set to logic high and in the second phase each counter is set to a programmable 11-bit value by selectively resetting bits to logic low. The twelfth bit is always reset to logic low in this phase. During the third phase, the high speed RO clock is enabled and both counters start to count down until they roll back over to all logic high which is a different point in time for each counter based on their programmed values. Once the twelfth bit goes high for any of the counters, a rising edge is generated. The twelfth bit is used instead of a zero-crossing detection to avoid meta-stability due to different bit settling times when comparing 12 counter bits and because the MSB of a counter stays high for much longer than a single RO clock cycle, allowing sufficient time for signal propagation delays. The two rising edges generated by the pair of counters (with Edge 2 being inverted) feed into an AND gate which generates a corresponding time gate. While the lower count value of the two programmable counters defines the time offset in terms of number of RO clock cycles between the rising edges of the generated gate and the system clock in phase 3, the count difference between the counters defines the time gate width in terms of number of RO clock cycles. Fig. 9 shows an example timing diagram for one counter pair where the time offset and the width of the generated time gate are both set to 2 RO clock cycles.
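The counter-pair mechanism can be mimicked with a small cycle-by-cycle model. It is a sketch under stated assumptions: the function name is hypothetical, each counter decrements once per RO clock, and the twelfth bit is sampled after each decrement. In this model a counter programmed with value v raises its MSB on RO edge v + 1; whether the hardware counts the offset as v or v + 1 cycles is not stated in the text.

```python
# Cycle-level model of one counter pair during phase 3.
# Assumptions: 12-bit down counters, MSB (twelfth bit) low at load time,
# gate = MSB of counter A AND NOT MSB of counter B.

MSB = 1 << 11
MASK = (1 << 12) - 1


def gate_waveform(value_a, value_b, n_cycles):
    """Return the gate level after each RO clock edge."""
    assert 0 <= value_a < MSB and 0 <= value_b < MSB
    a, b = value_a, value_b
    wave = []
    for _ in range(n_cycles):
        a = (a - 1) & MASK              # rolls over from 0 to all ones
        b = (b - 1) & MASK
        edge_a = bool(a & MSB)          # stays high for 2048 RO cycles after roll-over
        edge_b = bool(b & MSB)
        wave.append(int(edge_a and not edge_b))
    return wave


wave = gate_waveform(value_a=1, value_b=3, n_cycles=8)
print(wave)                      # gate opens on the 2nd edge and lasts 2 cycles
print("width:", sum(wave))       # equals value_b - value_a
```

The gate width equals the difference between the two programmed values, while the smaller value sets when the gate opens, matching the description of offset and width above.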

This implementation is inefficient at generating time gates, since a counter pair requires a three-phase operation during which only one phase generates a time gate. This inefficiency is removed by employing 3 counter pairs (instead of one) with a different order of the three-phase operation for each counter pair. At any system clock cycle one of the counter pairs is set to all logic high, the second is set to a programmable count value while the third is generating a time gate by the mechanism described above. This guarantees continuous time gate generation with every system clock cycle at the expense of increased circuit area for the gating logic. However, this increase in area is insignificant when the total sensor area is considered. Counter pairs for each time gate are programmed independently and hence the generated time gates can overlap or be delayed in time with respect to each other as necessary.

The temporal dynamic range of the time gate generation logic is dependent on the system clock frequency. An 11-bit value was used to ensure that as many RO cycles as required can be counted within a system clock period. It is also possible to divide the system clock frequency and use the divided version to drive the gating logic in order to increase the temporal coverage or dynamic range. On the other hand, the temporal resolution is defined by the RO clock period which is ~390 ps.
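A quick calculation shows how the system clock period bounds the usable gate range. The sketch assumes the approximate 390 ps RO period quoted above and a 37.5 MHz system clock, which is the value used for the experiments later in the paper; the function name and the `divider` parameter are hypothetical.

```python
# Temporal coverage of the gating logic for a given system clock.

RO_PERIOD_PS = 390            # approximate RO clock period from the text
COUNTER_VALUE_BITS = 11       # programmable part of each 12-bit counter


def ro_cycles_per_period(sys_clk_hz, divider=1):
    """Whole RO cycles that fit in one (optionally divided) system clock period."""
    period_ps = 1e12 * divider / sys_clk_hz
    return int(period_ps // RO_PERIOD_PS)


max_programmable = (1 << COUNTER_VALUE_BITS) - 1          # 2047 RO cycles
print(ro_cycles_per_period(37.5e6))                       # cycles in one 26.67 ns period
print(ro_cycles_per_period(37.5e6, divider=4))            # with the clock divided by 4
print(max_programmable * RO_PERIOD_PS / 1000, "ns max")   # limit set by the 11-bit value
```

At 37.5 MHz only a few tens of RO cycles fit in one period, so the 11-bit value leaves ample headroom, and dividing the system clock extends the coverage as the text notes.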

The RO also has configurable settings controlled by the MCU allowing for it to be turned off for saving power when time gating is not used. Moreover, the RO employs a self-reset function every system clock cycle to minimize accumulated jitter. Since no process-voltage-temperature (PVT) compensation mechanism is integrated on-chip through a phase locked loop (PLL) or a delay locked loop (DLL) due to area constraints, allowing the RO to free run for long periods of time would result in accumulation of non-linearity and hence integrated jitter [28]. In order to minimize this jitter, the self-resetting function stops, resets and restarts the RO with every system clock cycle. By design it is ensured that self-resetting consumes minimal time such that the temporal dynamic range is not reduced. Fig. 10 shows a timing diagram of the RO in self-reset mode. In terms of PVT variations, for our sensor we expect the voltage and temperature variations to be small since the 1.1 V core logic power supply is regulated on-chip (as part of the power generation network) and the sensor’s normal use is inside a human body where the temperature is ~37 °C. This leaves the process variation as the main factor affecting the RO clock period. There is no direct access to the RO clock to monitor and compensate it for PVT variations directly. However, variations in RO clock periods among different chips can be inferred indirectly by making use of the gating logic. For a set time gate width, a pulsed laser can be swept in small time steps across the time gate and photon counts recorded. Then time gate profiles for different chips can be reconstructed from the recorded photon counts and compared to each other. The time difference in reconstructed time gate profiles will be proportional to the difference between their RO clock periods. This process can be used to tune the RO’s supply voltages (hence their clock periods) via MCU configuration register settings, until the difference between time gate profiles is minimized.

III. DYNAMIC RANGE ENHANCEMENT

In fluorescence microendoscopy applications large areas of specular reflections occur at the smooth and watery surface of abdominal organs under endoscope illumination [29]. These specular reflections disturb the surgeon’s observation and judgment. To avoid the resulting image saturation and clipping artifacts whilst preserving low intensity detail we have implemented a high DR scheme in our sensor.

We define the DR within a scene as:

\[ DR = 20 \times \log(S/R) \tag{1} \]

where \( S \) is the saturation signal level and \( R \) is the readout noise floor. Since SPADs with digital counters exhibit shot noise limited performance [2], the only noise source is the dark count rate (DCR) of the SPAD which in value is dependent on exposure time. Since the mean DCR value of a SPAD can be measured, it can be corrected for by subtraction and hence the minimum observable signal is not the DCR value itself but rather its shot noise component (i.e. square root of DCR). Therefore the DR can be redefined as follows:

\[ DR = 20 \times \log(S/\sqrt{\text{DCR}}) \tag{2} \]

There are two options for improving the DR; one can either extend the maximum signal level to accommodate the higher end of light intensities or reduce the noise floor to extend the DR towards the lower end of the scale. SPAD devices can intrinsically offer a wide DR response in excess of 100 dB by virtue of low DCR noise floor and high maximum count rate due to short device dead-time (Tdead). For our passively quenched SPAD devices the maximum count rate is set by \( 1/(\exp(1) \times \text{Tdead}) \) [32]. Based on SPAD characterization results presented in [33], Tdead is \( \sim 3.5 \) ns and hence for our sensor the maximum count rate is in excess of 100 Mcps.
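The quoted maximum count rate can be reproduced numerically. The sketch assumes the standard paralyzable dead-time model, m = n·exp(−n·Tdead), whose peak is 1/(e·Tdead) and is therefore consistent with the expression above; the function names are hypothetical and Tdead = 3.5 ns is the approximate value given in the text.

```python
import math

T_DEAD_S = 3.5e-9      # approximate SPAD dead time from the text


def detected_rate(incident_rate, t_dead=T_DEAD_S):
    """Paralyzable dead-time model: m = n * exp(-n * Tdead)."""
    return incident_rate * math.exp(-incident_rate * t_dead)


def max_count_rate(t_dead=T_DEAD_S):
    """Peak of the paralyzable model, reached at n = 1 / Tdead."""
    return 1.0 / (math.e * t_dead)


peak = max_count_rate()
print(f"{peak / 1e6:.1f} Mcps")                         # about 105 Mcps
print(math.isclose(detected_rate(1 / T_DEAD_S), peak))  # peak sits at n = 1/Tdead
```

The result of roughly 105 Mcps agrees with the "in excess of 100 Mcps" figure stated above.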

However, it is difficult to capture this response as it requires a very large pixel photon counting capacity which translates to a large bit width counter with negative impact on pixel pitch and fill factor.

The oversampling method allows achieving high DR by summing multiple frames while keeping the pixel pitch small. Fig. 11 illustrates this for an \( Nc \) columns by \( Nr \) rows pixel array by replacing \((N+k)\)-bit in-pixel counters with a combination of smaller \( N \)-bit in-pixel counters, \((N+k)\)-bit SRAM memories for storing accumulated pixel counts, and a shared \((N+k)\)-bit accumulator for summing \( N \)-bit pixel counts with their corresponding \((N+k)\)-bit accumulated counts in SRAM memories. This arrangement allows controlling the pixel pitch by tuning \( N \) while keeping \( N+k \) fixed. This is enabled by using on-chip SRAM memories which offer approximately 16x higher bit density (accounting for SRAM periphery logic) than the in-pixel counters.
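The partitioning between N-bit pixel counters and (N+k)-bit accumulators can be sketched as a plain frame summation. The function name is hypothetical, and clipping the accumulator at full scale is an assumption made for the example; the text does not describe how accumulator overflow is handled.

```python
# Frame summation sketch: N-bit pixel counts accumulated into (N+k)-bit words.
# Assumption: the accumulator clips at full scale rather than wrapping.

N_BITS, K_BITS = 14, 2        # corresponds to the 16-bit integer adder mode


def accumulate(frames, n_bits=N_BITS, k_bits=K_BITS):
    """Sum per-pixel counts over frames into (n_bits + k_bits)-bit accumulators."""
    pixel_max = (1 << n_bits) - 1
    acc_max = (1 << (n_bits + k_bits)) - 1
    acc = [0] * len(frames[0])
    for frame in frames:
        for i, count in enumerate(frame):
            assert 0 <= count <= pixel_max
            acc[i] = min(acc[i] + count, acc_max)
    return acc


frames = [[15360, 100, 0]] * 4          # four identical frames of three pixels
totals = accumulate(frames)
print(totals)                           # [61440, 400, 0]
```

Four frames at the pixel saturation level of 15360 sum to 61440, which still fits in 16 bits, in line with the 4-frame limit listed for the 16-bit integer adder mode in Table I.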

When summing \( M \) frames together and given that uncorrelated noise sources add in quadrature, the DR can be expressed as:

\[ DR_{\text{MultipleFrame}} = 20 \times \log\left(\frac{M \times S}{\sqrt{M} \times \sqrt{\text{DCR}}}\right) \tag{3} \]

Therefore the improvement in DR is given by the difference between equations (3) and (2):

\[ DR_{\text{Improvement}} = 20 \times \log\left(\sqrt{M}\right) = 10 \times \log(M) \tag{4} \]

However, increasing \( M \) has a negative effect on the readout frame rate, representing a trade-off between DR and frame rate. The frame rate depends on the system clock frequency, MCU configuration settings in Table I, exposure time per frame, number of frames summed (\( M \)), and the \((N+k)\)-bit width of the SRAM memories (since \((N+k)\)-bit pixel data is read out from the SRAM memories serially via a single IO pad). For our sensor, the parameter \( N \) was fixed to 14 bits in the chip design phase to match the top tier SPAD pitch, while the parameter \( k \) is determined by the adder mode (configured via the MCU). It is 2 bits in the 16-bit integer or floating point adder modes and 18 bits in the 32-bit adder mode. Therefore, one can determine the \( M \) parameter based on the desired trade-off between DR and frame rate.

This on-chip noiseless frame summation scheme was first introduced in the initial presentation of this work in [21]. Since then there have been similar schemes presented for high DR SPAD image sensors, such as [30] and [31].

Although it is possible to implement frame summation off-chip, for our sensor it was critical to perform the frame summation on-chip due to limited bandwidth. The sensor has a 5-wire interface (with only a single wire reserved for data transmission) due to hard sensor area constraints imposed by microendoscopy systems. Assuming a system clock frequency of 37.5 MHz and RS pixel mode, the pixel array can be transferred to the MCU at a rate of 5.6 Gbit/s, but it can only be read out of the sensor at a rate of 37.5 Mbit/s. Therefore, performing frame summation off-chip would both reduce the frame rate and increase the IO power consumption significantly, since each frame would need to be read out of the sensor.
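The bandwidth argument can be made concrete with a rough estimate. The sketch assumes one bit per clock on the single DATA wire at 37.5 MHz and ignores protocol overhead and exposure time; the choice of 1000 summed frames is illustrative, and the function name is hypothetical.

```python
# Readout time over the single data wire: on-chip versus off-chip summation.
# Assumptions: 37.5 Mbit/s serial link, no protocol overhead.

PIXELS = 128 * 120
DATA_RATE_BPS = 37.5e6


def readout_time_s(bits_per_pixel, n_frames=1):
    """Time to shift n_frames full frames through the serial data wire."""
    return n_frames * PIXELS * bits_per_pixel / DATA_RATE_BPS


on_chip = readout_time_s(32)                    # one summed 32-bit frame
off_chip = readout_time_s(14, n_frames=1000)    # 1000 raw 14-bit frames

print(f"on-chip summation : {on_chip * 1e3:.1f} ms per output frame")
print(f"off-chip summation: {off_chip:.2f} s per output frame")
```

Under these assumptions a summed 32-bit frame takes about 13 ms to read out, whereas streaming 1000 raw frames for external summation would take several seconds, which illustrates why the summation is done on-chip.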

IV. EXPERIMENTAL RESULTS

A. Sensor Characterization

The all-digital pixel circuit, having a 14-bit in-pixel counter with overflow protection, yields a maximum count of 15360 events which equates to an intrinsic DR of 72 dB based on equation (2) and a DCR of 15 counts per second (cps). High dynamic range (HDR) imaging is provided by noiseless frame summation in two 256 kbits on-chip SRAM banks, as described in Section III. The noiseless frame summation increases DR up to 126 dB by summing a programmable number of frames while the on-chip processing eliminates the need of reading out large amounts of data at high speeds. Fig. 12 shows the mean photon count rate versus illumination from a controlled white LED source, at 1 V excess bias voltage above the SPAD breakdown voltage of 16.5 V. The increase in DR as a function of the number of frames summed is clearly demonstrated, corresponding closely to photon shot noise theory (inset), as estimated by equation (4).
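The 72 dB and 126 dB figures can be cross-checked numerically. The sketch assumes a noise floor of 15 dark counts (15 cps over a 1 s exposure, an assumption about the measurement conditions), noise adding in quadrature over summed frames so that the gain is 20·log10(√M), and M = 2^18 frames as the limit implied by the 18 extra bits of the 32-bit adder mode. The function name is hypothetical.

```python
import math


def dr_db(saturation, dark_counts, frames=1):
    """DR in dB: signal adds linearly over frames, dark-count shot noise in quadrature."""
    signal = frames * saturation
    noise = math.sqrt(frames * dark_counts)
    return 20 * math.log10(signal / noise)


single = dr_db(15360, 15)
hdr = dr_db(15360, 15, frames=2 ** 18)

print(f"single frame : {single:.1f} dB")
print(f"2^18 frames  : {hdr:.1f} dB")
print(f"improvement  : {hdr - single:.1f} dB")
```

The single-frame value comes out at about 72 dB and the summed value at about 126 dB, matching the figures quoted above.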

To characterize the time gating performance, a set of time gates was generated, all having the same time offset with respect to the rising edge of the system clock but with gate widths incremented in steps of 390 ps. For each time gate, a pulsed laser was then swept in steps of 25 ps across the time gate and counts were recorded over many frames in order to reconstruct the resulting gate profiles. Fig. 13 shows the time gating linearity and uniformity across the pixel array. A race condition between pixel read and reset signals has caused a corner of the array to be insensitive to light. The time-gate width can be configured from 390 ps to >100 ns in 390 ps steps. Example time-gates from 390 ps to 8.58 ns are shown in Fig. 13(a) and their mean full width at half maximum (FWHM) across the pixel array in Fig. 13(b). The variation in time gates across the pixel array is shown in Fig. 13(c) for a selected time gate width of 3.9 ns. The corresponding histogram of time gates across the pixel array is depicted in Fig. 13(d), showing 3.92 ns mean time gate width and 200 ps standard deviation.

Photo response non-uniformity (PRNU) is one of the most important figures for image sensors as it reflects the pixel to pixel output variation with respect to constant input photon flux. To evaluate PRNU of the imaging array, photon counts were captured under fixed and low illumination level for a 20 ms exposure time at 1 V excess bias voltage. Fig. 14 (a) shows the photon counts across the pixel array while the histogram of the photon counts is shown in Fig. 14 (b) with the mean and standard deviation counts being 1227 and 44.94, respectively (excluding pixels insensitive to light and high DCR pixels). PRNU was then calculated as the standard deviation divided by the mean which equated to 3.66%.
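The PRNU calculation is a ratio of standard deviation to mean over the pixels that are kept. The sketch below shows it with a hypothetical helper that takes a list of excluded pixel indices; using the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def prnu_percent(counts, exclude=()):
    """PRNU as standard deviation / mean of per-pixel counts, in percent."""
    skip = set(exclude)
    kept = [c for i, c in enumerate(counts) if i not in skip]
    return 100 * statistics.pstdev(kept) / statistics.mean(kept)


# Reported array statistics from the text: mean 1227 counts, standard deviation 44.94.
reported = 100 * 44.94 / 1227
print(f"reported PRNU: {reported:.2f} %")

# Toy example: the last pixel is a high-DCR outlier and is excluded.
toy = [1200, 1250, 1230, 9000]
print(f"toy PRNU: {prnu_percent(toy, exclude=[3]):.2f} %")
```

Excluding outlier pixels before taking the ratio mirrors the exclusion of light-insensitive and high DCR pixels described above.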

The photon detection efficiency (PDE) varies between 1.3% and 12.6% for wavelengths from 450 nm to 900 nm (peaking at 615 nm) at 3 V excess bias voltage, based on measurements carried out on a test SPAD implemented in the same silicon run [32]. The median DCR at room temperature is 15 cps and 118 cps at 1 V and 2 V excess bias voltages, respectively. The VDD power consumption is 10 mW at maximum activity.

B. HDR Experiments

HDR imaging was performed in light conditions which caused single frames to saturate, as shown in Fig. 15 (a). Summing frames on-chip recovers detail in the clipped areas of the image, as shown in Figs. 15 (b) and (c) for 8 and 128 frames summed, corresponding to 9 dB and 12.6 dB improvement in DR based on Eq. (4), respectively. All three images were captured at 15 fps, with a total exposure time of 5 ms, and cropped to an imaging area of 90 x 120.

C. Instrument Response Function (IRF) and FLIM Experiments

The IRF and FLIM images were acquired using a Hamamatsu Picosecond Light Pulser PLP-10 (wavelength 483 nm, pulse width 80 ps, maximum power 150 mW, repetition rate 37.5 MHz). Measurements carried out on a test SPAD yielded a SPAD IRF of 70 ps [33], whereas all array measurements on SPAD and gating circuitry yielded a composite IRF of 0.55 +/- 0.02 ns. An epi-fluorescence optical set-up was used to capture FLIM images, with the laser beam directed onto the sample via a dichroic mirror and 10x magnification objective. The resulting sample fluorescence passes back through the dichroic, where it is filtered through a 495 nm long-pass filter to remove any residual scattered or reflected laser light, and focused onto the image sensor using a second objective of 10x magnification.

Fluorescein and Rhodamine B liquid samples were prepared by mixing with distilled water. Fig. 16 shows representative fluorescence decay and lifetime distributions of the samples. A least-squares fit using 15 time-gate bins is compared with a 2-gate RLD fit, yielding Fluorescein (least squares mean 4.2 ns +/- 0.3 ns, RLD mean 4.4 ns +/- 0.7 ns) and Rhodamine B (least squares mean 1.9 ns +/- 0.1 ns, RLD mean 1.8 ns +/- 0.2 ns) lifetimes which closely match the expected literature values [34]. The time-gate bin width is 390 ps for both methods. Although the lifetime distribution is much broader for Fluorescein with 2-gate RLD, the Fluorescein and Rhodamine B samples can still be separated from each other. It is also worth mentioning that the lifetime estimates for Fluorescein with 2-gate RLD can be improved by increasing the time-gate bin width. As another example, fluorescence intensity and lifetime images of a Convallaria majalis sample are shown in Fig. 17, again matching the expected lifetime distributions [35]. The sample was imaged with 10000 exposure cycles (corresponding to an exposure time of 266.7 µs per frame with a 37.5 MHz system clock) and 1000 frames summed in 32-bit adder and GS (gated GS for lifetime images) modes, giving a readout frame rate of 1 frame per second (fps) for intensity imaging. The Convallaria fluorescence lifetime image was obtained with the least-squares fit method using 5 time-gate bins (with 390 ps bin width), resulting in a frame rate of 0.2 fps. It is worth noting that the frame rate for intensity images

Fig. 15. HDR intensity images of (a) single frame, (b) 8 frames summed, (c) 128 frames summed. All three images were captured at 15 fps, with a total exposure time of 5 ms, and cropped to an imaging area of 90 x 120.

Fig. 16. (a) Example fluorescence decays of Fluorescein and Rhodamine B samples for a selected pixel (row: 76, column: 88), obtained using 70 time-gate bins, each with a bin width of 390 ps. For comparison, the IRF for the same pixel is also shown. (b) Lifetime histograms of Fluorescein and Rhodamine B samples across the pixel array. Lifetimes estimated by the least-squares fit method are based on using 15 successive time-gate bins (selected from a region starting from the peak bin and extending towards the noise floor of the decay) whereas the lifetimes obtained with the RLD method are based on 2 time-gate bins. The time-gate bin width is 390 ps for both methods.
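To make the two estimators compared in Fig. 16 concrete, the sketch below applies a common two-gate RLD expression and a log-linear least-squares fit to a noise-free synthetic mono-exponential decay. Both are assumptions about form: the RLD expression is the textbook one for two contiguous gates of equal width, and the least-squares fit is done on the logarithm of the counts, whereas the text does not specify the exact gate arrangement or fitting procedure used on the sensor data. All function and variable names are hypothetical.

```python
import math

BIN_WIDTH_NS = 0.39  # 390 ps time-gate bin width, as stated in the text


def rld_two_gate(d0, d1, gate_width_ns=BIN_WIDTH_NS):
    """Two-gate rapid lifetime determination for contiguous equal-width gates.

    For a mono-exponential decay, d0 / d1 = exp(gate_width / tau).
    """
    if d1 <= 0 or d0 <= d1:
        raise ValueError("need d0 > d1 > 0 for a decaying signal")
    return gate_width_ns / math.log(d0 / d1)


def lsq_lifetime(counts, bin_width_ns=BIN_WIDTH_NS):
    """Log-linear least-squares fit: ln(counts) = a - t / tau."""
    xs = [i * bin_width_ns for i in range(len(counts))]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -1.0 / slope


# Noise-free synthetic decay with a hypothetical 4.0 ns lifetime, 15 bins
# starting at the peak, mirroring the bin count used for the fit in Fig. 16.
TAU_TRUE_NS = 4.0
decay = [1000.0 * math.exp(-i * BIN_WIDTH_NS / TAU_TRUE_NS) for i in range(15)]

tau_rld = rld_two_gate(decay[0], decay[1])
tau_lsq = lsq_lifetime(decay)
print(f"RLD: {tau_rld:.3f} ns, least squares: {tau_lsq:.3f} ns")
```

With noisy data the two-gate ratio uses far fewer photons than a 15-bin fit, which is consistent with the broader RLD lifetime histograms reported for Fluorescein.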

Fig. 17. Fluorescence (a) intensity and (b) lifetime images of a Convallaria majalis sample. Lifetimes were estimated by applying the least-squares fit method, using 5 time-gate bins, each with a bin width of 390 ps.
hematoxylin and eosin stained human lung tissue taken at video rate. For each image, two contrasting regions were selected, and the corresponding lifetime histograms are presented to show lifetime distribution across those regions. The lung sample was imaged with 60,000 exposure cycles and 10 frames summed, providing a frame rate of 7.4 fps. To increase the frame rate, the 2-bin RLD method (with 390 ps bin width) was used in even and odd columns mode (as described above).
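The exposure settings quoted in this section can be converted to time using the relation given for the Convallaria sample, where 10000 exposure cycles at a 37.5 MHz system clock correspond to 266.7 µs per frame. The sketch below applies the same relation to the lung tissue settings. It assumes, without confirmation from the text, that the lung measurement used the same 37.5 MHz clock; the function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 37.5e6  # system clock stated for the Convallaria measurement


def frame_exposure_s(exposure_cycles, clock_hz=SYSTEM_CLOCK_HZ):
    """Exposure time of a single frame: one exposure cycle per clock period."""
    return exposure_cycles / clock_hz


def summed_exposure_s(exposure_cycles, frames_summed, clock_hz=SYSTEM_CLOCK_HZ):
    """Accumulated exposure after summing several frames."""
    return frames_summed * frame_exposure_s(exposure_cycles, clock_hz)


# Convallaria sample: 10000 cycles per frame, 1000 frames summed.
convallaria_frame_us = frame_exposure_s(10_000) * 1e6
convallaria_total_s = summed_exposure_s(10_000, 1000)

# Lung tissue sample: 60,000 cycles per frame, 10 frames summed
# (same clock assumed).
lung_frame_ms = frame_exposure_s(60_000) * 1e3
lung_total_ms = summed_exposure_s(60_000, 10) * 1e3

print(f"Convallaria: {convallaria_frame_us:.1f} us per frame, "
      f"{convallaria_total_s:.3f} s summed")
print(f"Lung: {lung_frame_ms:.1f} ms per frame, {lung_total_ms:.1f} ms summed")
```

These values are accumulated exposure only and are not the frame period, which also depends on readout and on the number of time gates acquired.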

The fluorescence lifetime signal-to-noise ratio (SNR) can be defined as $\frac{\tau(N)}{\Delta \tau(N)}$, where $\tau$ is the lifetime, $\Delta \tau$ is its standard deviation and $N$ is the number of photons detected over $n$ time channels [36]. Achieving a good SNR requires increasing $N$ and hence longer exposure times, which have a negative effect on frame rate. In Fig. 19 we present lifetime SNR versus frame rate comparisons for our sensor. A homogeneous fluorescein sample was used for the lifetime measurements, which were obtained for different exposure times (by varying the number of exposure cycles and frames summed) using the 2-gate RLD method in odd and even columns mode. With the fluorescein sample, a lifetime SNR $> 5$ can be achieved for frame rates up to 20 fps. However, the SNR versus frame rate profile will vary with different samples and parameter settings.
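As a small worked instance of this definition, the sketch below computes the lifetime SNR from a set of lifetime estimates as the mean divided by the standard deviation. Taking the spread over a set of estimates from a homogeneous sample (for example over pixels) as $\Delta \tau$ is an assumption, as are the function name and the sample values, which are purely illustrative and not measured data.

```python
import statistics


def lifetime_snr(lifetimes_ns):
    """Lifetime SNR = tau / delta_tau, from a set of lifetime estimates."""
    tau = statistics.fmean(lifetimes_ns)
    delta_tau = statistics.pstdev(lifetimes_ns)  # population std (assumed)
    if delta_tau == 0:
        raise ValueError("zero spread: SNR is undefined")
    return tau / delta_tau


# Hypothetical lifetime estimates (ns) from a homogeneous sample.
example_lifetimes = [3.8, 4.0, 4.2]
example_snr = lifetime_snr(example_lifetimes)
print(f"lifetime SNR = {example_snr:.1f}")
```

Evaluating this ratio at each exposure setting and plotting it against the resulting frame rate would produce a curve of the kind shown in Fig. 19.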

Table II provides a comparison against some state-of-the-art academic and commercial endoscopy image sensors. In general, chip-on-tip endoscopes are smaller and provide higher frame rates compared to capsule endoscopes.

![Image 18](image18.png) Fig. 18. Example lifetime frames from a stained human lung tissue captured at 7.4 fps. For each frame, lifetime histograms for two selected regions are displayed to show lifetime variation between those regions.

![Image 19](image19.png) Fig. 19. Lifetime SNR versus frame rate, obtained using a homogeneous fluorescein sample and applying the 2-bin RLD method for lifetime estimations.

TABLE II
COMPARISON TO STATE-OF-THE-ART ACADEMIC AND COMMERCIAL ENDOSCOPY IMAGE SENSORS

| Parameter | This Work | [37] | [38] | [39] | [40] | [41] |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Endoscope Type | Chip-on-tip | Chip-on-tip | Chip-on-tip | Chip-on-tip | Capsule | Capsule |
| Resolution | 128 x 120 | 249 x 250 | 400 x 400 | 200 x 200 | 64 x 64 | 320 x 240 |
| Pixel Pitch | 8 μm | 3 μm | 1.75 μm | 1.75 μm | 61.5 μm | 10 μm |
| Technology | 3D-Stacked BSI CMOS | FSI CMOS | BSI CMOS | BSI CMOS | BSI CMOS | BSI CMOS |
| Sensor Type | SPAD | CMOS Image Sensor | CMOS Image Sensor | CMOS Image Sensor | SPAD | CMOS Image Sensor |
| Frame Rate | 15 fps (intensity), 7.4 fps (time-resolved) | 43 to 62 fps | 30 fps | 30 fps | 1.3 fps | 2 fps |
| Dynamic Range | 72 dB* / 126 dB | 58 dB | 65.8 dB | 60.2 dB | n/a | n/a |
| Shutter Mode | Rolling / Global | Rolling | Rolling | Rolling | Global | n/a |
| IO Pins | 5 | 4 | 4 | 4 | n/a | 3 |
| Dimensions | 1.4 mm x 1.4 mm | 1 mm x 1 mm | 0.95 mm x 0.94 mm | 0.58 mm x 0.58 mm | 3.9 mm x 3.9 mm | 4.84 mm x 4.34 mm |
| Intensity Imaging | Yes | Yes | Yes | Yes | Yes | Yes |
| Time-Resolved Imaging | Yes | No | No | No | No | No |

*native (without oversampling)

Compared with the other chip-on-tip endoscopes, our sensor has a larger area of 1.96 mm² and fewer pixels because of its larger 8 µm pixel pitch. However, it is capable of video rate operation, offers a higher native dynamic range of 72 dB owing to noiseless readout, which can be boosted up to 126 dB by on-chip frame summation, and can operate in both rolling and global shutter modes. Moreover, our sensor is the only miniature image sensor capable of time-resolved imaging with a 5-wire interface.

V. CONCLUSION

We have presented a 128 x 120 SPAD array time-resolved image sensor for microendoscopy. The sensor is integrated in a 3D-stacked 90nm/40nm CMOS process, measures 1.4 mm x 1.4 mm, and has a 5-wire interface. It incorporates an 8 µm pitch, 14-bit depth pixel with on-chip data processing, allowing dynamic range extension by noiseless frame summation and video rate time-resolved imaging.

To the best of the authors’ knowledge, this is the first SPAD image sensor designed for microendoscopy which enables video rate time-resolved imaging.

ACKNOWLEDGMENT

The authors would like to thank Dr. Ahsan Akram for provision of the lung tissue sample.

Ethics statement: All experiments using ex vivo human lung tissue were performed following approval by the appropriate regional Research Ethics Committee (REC), NHS Lothian (references 13/ES/0126 and 16/LO/1883), and all subjects gave written informed consent.

REFERENCES


[25] ARM Debug Interface Architecture Specification, ADIV5.0 to ADIV5.2


[27] D. Stoppa, L. Pancheri, R. Scanduzzo, L. Gori, G. Dalla Betta


[38] https://www.ovt.com/sensors/OV6946

[39] https://www.ovt.com/sensors/OV6948


Ahmet T. Erdogan received the B.Sc. degree in electronics engineering from Dokuz Eylul University, Izmir, Turkey, in 1990, and the M.Sc. and Ph.D. degrees in electronics engineering from Cardiff University, Cardiff, U.K., in 1995 and 1999, respectively.

Since 1999, he has been working on several research projects in the University of Edinburgh, Edinburgh, U.K., where he is currently a Research Associate with the CMOS Sensors and Systems Group, School of Engineering. His research interests include low-power VLSI design, reconfigurable computing, and CMOS image sensors.

Tarek Al Abbas (M’13) received the master’s degree in analog electronics design and the Ph.D. degree in the design of CMOS single-photon avalanche diode (SPAD) image sensors from The University of Edinburgh, Edinburgh, U.K., in 2013 and 2019, respectively.

He is currently with the Pixel Design Team, Sense Photonics, Edinburgh, working on next-generation flash light detection and ranging (LIDAR) systems for autonomous driving vehicles. His research interests include time-resolved applications, single-photon counting, and miniature SPAD pixel architectures. He has authored or coauthored more than 15 articles in these fields.

Neil Finlayson (M’10) received the B.Sc. and Ph.D. degrees in electrical engineering and electronics from the University of Glasgow, Glasgow, U.K., in 1981 and 1985, respectively.

He has led teams and worked on projects in optoelectronic systems, energy engineering, internet services, and software development, over a 30-year engineering and research career in industry and academia. He is currently a Research Fellow with the CMOS Sensors and Systems Group, Institute for Integrated Micro and Nano Systems, University of Edinburgh, Edinburgh, U.K. He works on the Proteus project which is focused on molecular imaging of lung tissue. His primary responsibilities are software and firmware development, optical characterization, and applications of next-generation time-resolved fluorescence/Raman spectroscopic sensors.

Charlotte Hopkinson received an MEng degree in Biomedical Engineering from the University of Glasgow, Glasgow, U.K., in 2020. She is currently a PhD student in the School of Engineering, University of Edinburgh, Edinburgh, U.K. She is also part of the Proteus project team creating a microendoscopic camera system for imaging inside the lung.

Istvan Gyongy holds MEng and PhD degrees from Oxford University, UK. Following a period in industry, where he worked on processors for smartphones, and on a cloud-connected activity tracking system for dairy farms, he joined the University of Edinburgh, UK. He is currently a Research Fellow and is developing single-photon avalanche diode (SPAD) cameras, and exploring applications in 3D capture as well as in the life sciences.

Oscar Almer holds an MEng in Electronics with Computer Science and a PhD in Informatics, both from the University of Edinburgh, the latter earned in 2012. He has since worked in both research and industry settings as a digital design engineer. Oscar’s main focus is on digital design and control for optoelectronics.

Neale A.W. Dutton (M’13-SM’15) received the Masters of Engineering in EEE from the University of Edinburgh in 2011. He received the PhD degree from the same institution in 2016 researching SPAD image sensors for photon counting and Time of Flight (TOF) imaging funded by STMicroelectronics, Imaging Division in Edinburgh, UK.

Dr. Dutton is a Principal Engineer & Senior Member of the Technical Staff with ST with research and development interests in SPADs and TOF. He is supervising 1 PhD student and is an author on 38 papers, 1 book chapter, and an inventor on 39 patents & trade secrets. He reviews for JSSC, SSC-L and MDPI Sensors. He serves on the Technical Programme Committee of the IEEE VLSI Symposium.

Robert K. Henderson (M’84-SM’15-F’20) received the Ph.D. degree in electronics and electrical engineering from the University of Glasgow, Glasgow, U.K., in 1990.

From 1991 to 1996, he was a Research Engineer at the Swiss Centre for Microelectronics, Neuchatel, Switzerland. In 1996, he was appointed Senior VLSI Engineer at VLSI Vision Ltd, Edinburgh, U.K., where he worked on the world’s first single-chip video camera. From 2000 to 2005, as a Principal VLSI Engineer with STMicroelectronics Imaging Division, Edinburgh, he developed image sensors for mobile phone applications. In 2005, he joined the University of Edinburgh, Edinburgh, designing the first single photon avalanche diode image sensors in nanometer CMOS technologies in the MegaFrame and SPADnet EU projects. He is currently a Professor with the School of Engineering, University of Edinburgh, Edinburgh.

Prof. Henderson was awarded a prestigious ERC Advanced Fellowship in 2014.