BioWF: A Naturally‐Fused, Di‐Domain Biocatalyst from Biotin Biosynthesis Displays an Unexpectedly Broad Substrate Scope

Abstract The carbon backbone of biotin is constructed from the C7 di‐acid pimelate, which is converted to an acyl‐CoA thioester by an ATP‐dependent, pimeloyl‐CoA synthetase (PCAS, encoded by BioW). The acyl‐thioester is condensed with ʟ‐alanine in a decarboxylative, Claisen‐like reaction to form an aminoketone (8‐amino‐7‐oxononanoic acid, AON). This step is catalysed by the pyridoxal 5’‐phosphate (PLP)‐dependent enzyme (AON synthase, AONS, encoded by BioF). Distinct versions of Bacillus subtilis BioW (BsBioW) and E. coli BioF (EcBioF) display strict substrate specificity. In contrast, a BioW‐BioF fusion from Corynebacterium amycolatum (CaBioWF) accepts a wider range of mono‐ and di‐fatty acids. Analysis of the active site of the BsBioW : pimeloyl‐adenylate complex suggested a key role for a Phe (F192) residue in the CaBioW domain; a F192Y mutant restored the substrate specificity to pimelate. This surprising substrate flexibility also extends to the CaBioF domain, which accepts ʟ‐alanine, ʟ‐serine and glycine. Structural models of the CaBioWF fusion provide insight into how both domains interact with each other and suggest the presence of an intra‐domain tunnel. The CaBioWF fusion catalyses conversion of various fatty acids and amino acids to a range of AON derivatives. Such unexpected, natural broad substrate scope suggests that the CaBioWF fusion is a versatile biocatalyst that can be used to prepare a number of aminoketone analogues.

CaBioWF reactions transforming a range of carboxylic acids (DC6-DC9 and C6-C9) with ʟ-Ala, leading to the production of the corresponding aminoketones. Product formation was confirmed by LC ESI-MS analysis.

CaBioWF Modelling and Simulation
Initially, the CaBioW (M1-T238) and CaBioF (G239-A620) domains were modelled separately using the deep learning architecture ColabFold (see the Experimental Section in the main text). Both domains were predicted with high confidence (pLDDT >90, pTM >0.85, see figure S22A-B), with homodimeric interfaces comparable to those of experimentally solved structures including BsBioW (PDB: 5FLL, see figure S23B) and EcBioF (PDB: 1DJ9, see figure S24B). CaBioW was modelled with a subdomain architecture shared by type IV ANL enzymes (see also PDB: 5TV5), wherein the catalytic C-terminal subdomain binds its substrates and the structural N-terminal subdomain comprises a dimer interface. The predicted CaBioF shares strong fold-level similarity with several BioF homologues (PDB: 5JAY, 6ONN, 5VNX, 7S5M), as well as other PLP-dependent enzymes such as serine palmitoyltransferases (SPT, PDB: 3A2B, 2X8U) and 2-amino-3-ketobutyrate CoA ligases (KBL, PDB: 7V58, 3TQX, 7BXP). Furthermore, several highly conserved residues that define the binding pocket of each domain were identified by evolutionary conservation analysis, including Y181 and R194 in CaBioW (figure S23C) as well as H380 and K483 in CaBioF (figure S24C). This initial study provided confidence in the ability of ColabFold to accurately predict the tertiary and quaternary structures of the CaBioWF domains.
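The per-model pLDDT values quoted above can be recomputed from a predicted structure file. The following is a minimal sketch, assuming the model is available as a PDB-format file in which the B-factor column holds the per-residue pLDDT (the convention used by AlphaFold-style output). The function names and the example values are hypothetical and are not taken from this study.

```python
def make_atom_line(serial, name, res, chain, resseq, x, y, z, bfac):
    """Hypothetical helper: build one fixed-width PDB ATOM record for the demo."""
    return ("ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
                serial, name, res, chain, resseq, x, y, z, 1.0, bfac)


def mean_plddt(pdb_text, atom_name="CA"):
    """Average the B-factor column (assumed to store pLDDT) over one atom type."""
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == atom_name:
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no matching ATOM records found")
    return sum(scores) / len(scores)


# Illustrative records only; coordinates and scores are made up.
example = "\n".join([
    make_atom_line(1, "N", "MET", "A", 1, 10.0, 11.0, 12.0, 50.0),
    make_atom_line(2, "CA", "MET", "A", 1, 11.0, 12.0, 13.0, 90.0),
    make_atom_line(3, "CA", "THR", "A", 2, 14.0, 12.5, 13.5, 94.0),
])
print(mean_plddt(example))
```

Averaging over one atom per residue (here the Cα atom) avoids weighting large side chains more heavily than small ones.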
The full CaBioWF dimer was subsequently modelled, and the top-ranked output (pLDDT = 91.8, pTM = 0.68, see figure S22C) was studied in a 10 ns (5 × 10^6 time steps) molecular dynamics simulation (MDS, see the Experimental Section in the main text). While the individual domains were confidently predicted at the fold level, there was some uncertainty regarding the relative orientation of the two CaBioW domains, in part due to the disordered intra-domain linker(s) tethering CaBioW and CaBioF together (figure S22C). The predicted CaBioWF model suggests that both domains contribute towards the dimeric interface, and these interfacial contacts are maintained over the course of the MDS (see figure S25). In particular, the CaBioWF complex is stabilised by an average of 36 ± 7 interfacial hydrogen bonds, the majority of which (52%) occur within 2.72-2.93 Å (figure S26A/B). While the average radius of gyration (Rg = 3.57 ± 0.02 nm, Rg max-min = 0.160 nm) suggests that the CaBioWF complex is stable, pairwise RMSD analysis reveals that the bifunctional enzyme exhibits a moderate amount of conformational flexibility, with RMSDs as high as 6 Å occasionally observed (figure S26C-D). Root Mean Square Fluctuation (RMSF) and B-factor analyses identify the intra-domain linker and the CaBioW domains as the most mobile regions of the protein (figure S27A-C). In fact, this linker is flexible enough to allow slight orientational adjustment of the CaBioW domains within the first 2 ns of the simulation, with one of the CaBioW domains rotating approximately 18.5° inwards from the start of the trajectory (figure S27D). By the midpoint of the simulation, the CaBioW domains had settled, and both BsBioW and EcBioF could be comfortably superimposed onto the CaBioWF complex (figure S28).
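For readers less familiar with the quantities reported above, the sketch below shows how the time step, radius of gyration and RMSF are conventionally defined. It is an illustration in plain Python under stated assumptions, not the analysis pipeline used here: the function names are hypothetical, frames are assumed to be already superimposed on a common reference, and the coordinates are toy values.

```python
import math


def timestep_fs(total_ns, n_steps):
    """Integration time step (fs) implied by run length and step count."""
    return total_ns * 1e6 / n_steps  # 1 ns = 1e6 fs


def radius_of_gyration(coords, masses=None):
    """Mass-weighted Rg = sqrt(sum m_i |r_i - r_com|^2 / sum m_i)."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses if none given
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    weighted_sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
                      for m, c in zip(masses, coords))
    return math.sqrt(weighted_sq / total)


def rmsf(frames):
    """Per-atom RMSF about the time-averaged position (frames pre-aligned)."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


# 10 ns over 5 x 10^6 steps corresponds to a 2 fs time step.
print(timestep_fs(10, 5e6))

# Toy two-atom system: atom 0 is fixed, atom 1 oscillates along x.
toy_frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]]
print(rmsf(toy_frames))
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

A narrow spread in Rg over a trajectory indicates that the overall compactness is preserved, while per-atom RMSF localises where the motion occurs, which is why the two measures are reported together.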
Interestingly, the CaBioW and CaBioF binding pockets face each other approximately 4.75 nm apart; the proximity and orientation of these binding pockets suggest that the pimeloyl-CoA product of CaBioW can easily diffuse into the active site of CaBioF. Taken together, this in silico study provides insight into the didomain architecture of CaBioWF, and hints towards both its flexibility and the existence of a potential "tunnel" that can channel products from the CaBioW domain to CaBioF (figure S29). This makes CaBioWF an attractive, curious and potentially challenging target for future crystallographic trials.
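One simple way to estimate a pocket-to-pocket separation of this kind is to take the distance between the centroids of the residues lining each pocket. The sketch below assumes that approach; the source does not state how the distance was measured, and the function names and coordinates are hypothetical. In practice the input could be, for example, the Cα positions of the conserved pocket residues named earlier (Y181 and R194 in CaBioW; H380 and K483 in CaBioF).

```python
import math


def centroid(points):
    """Arithmetic mean position of a set of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def pocket_distance_nm(pocket_a, pocket_b):
    """Centroid-to-centroid distance; input in angstroms, output in nm."""
    return math.dist(centroid(pocket_a), centroid(pocket_b)) / 10.0


# Toy coordinates in angstroms (illustrative only, not from the model).
pocket_w = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pocket_f = [(48.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
print(pocket_distance_nm(pocket_w, pocket_f))
```

A centroid distance is a straight-line measure and therefore a lower bound on the length of any path a diffusing intermediate would follow between the two sites.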

Figure S23: A closer inspection of the predicted CaBioW domain.
A) CaBioW was predicted to form dimeric contacts between the N-terminal subdomains. B) The crystal structure of BsBioW superimposed on the predicted CaBioW homodimer (RMSD: 1.06 Å, 174 pruned atom pairs). C) Highly conserved pocket residues identified in the predicted CaBioWF. The pimeloyl adenylate was extracted from the BsBioW crystal structure (grey, PDB: 5FLL) and is displayed here for reference.
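The "pruned atom pairs" figure quoted with each superposition RMSD means that only well-matched pairs contribute to the reported value. The sketch below illustrates that idea in simplified form, assuming the two structures are already superimposed and the atoms already paired. The 2.0 Å cutoff and the function name are assumptions, and structure-alignment tools typically refit the superposition iteratively while pruning, which is not done here.

```python
import math


def pruned_rmsd(coords_a, coords_b, cutoff=2.0):
    """RMSD over paired atoms closer than `cutoff` (angstroms).

    Returns (rmsd, number_of_pairs_kept). Assumes the coordinate sets are
    already superimposed and listed in matching order.
    """
    distances = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    kept = [d for d in distances if d <= cutoff]
    if not kept:
        raise ValueError("no atom pairs within the cutoff")
    return math.sqrt(sum(d * d for d in kept) / len(kept)), len(kept)


# Toy example: the third pair is 5 angstroms apart and is pruned.
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 0.0, 0.0)]
print(pruned_rmsd(reference, model))
```

Because poorly matched regions such as flexible loops are excluded, a pruned RMSD should always be read together with the number of pairs retained.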

Figure S24: A closer inspection of the predicted CaBioF domain.
A) CaBioF was predicted to form dimeric contacts, as commonly observed with BioF homologues and other PLP-dependent enzymes. B) The crystal structure of EcBioF superimposed on the predicted CaBioF homodimer (RMSD: 0.99 Å, 289 pruned atom pairs). C) Highly conserved pocket residues identified in the predicted CaBioF model. The AONS:PLP-AON product external aldimine was extracted from the EcBioF crystal structure (grey, PDB: 1DJ9) and is displayed here for reference.

A) The BioW and BioF binding pockets face towards each other, providing an easy diffusion path from the CaBioW domain to the active site of CaBioF. B) The predicted "tunnel" is estimated to be 4.5-5 nm in length.