De novo creation of a naked eye–detectable fluorescent molecule based on quantum chemical computation and machine learning

Designing fluorescent molecules requires considering multiple interrelated molecular properties, as opposed to properties that correlate straightforwardly with molecular structure, such as light absorption. In this study, we used a de novo molecule generator (DNMG) coupled with quantum chemical computation (QC) to develop fluorescent molecules, which are garnering significant attention in various disciplines. Using massively parallel computation (1024 cores, 5 days), the DNMG produced 3643 candidate molecules. We selected one unreported molecule and seven reported molecules and synthesized them. Photoluminescence spectrum measurements demonstrated that the DNMG can successfully design fluorescent molecules with 75% accuracy (n = 6/8) and create an unreported molecule that emits fluorescence detectable by the naked eye.


Correlation with number of aromatic rings
To examine how the absorption wavelength to the S1 state, the emission wavelength from the S1 state, and their oscillator strengths (OSs) correlate with the number of aromatic rings, correlation graphs are shown in Figure S1. We used RDKit (59) to count aromatic rings.

Figure S1. Correlation graphs of absorption/fluorescence wavelengths and their intensities with the number of aromatic rings. The upper two panels plot the number of aromatic rings against the S0 absorption wavelength and its oscillator strength. The bottom two panels plot it against the S1 emission wavelength and its oscillator strength.
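The ring count described above could be obtained along the following lines. This is a minimal sketch: the source only states that RDKit was used, so the choice of `rdMolDescriptors.CalcNumAromaticRings`, the helper name `count_aromatic_rings`, and the example SMILES strings are assumptions made for illustration, not the authors' script.

```python
# Sketch only: the source says RDKit was used but does not name the function.
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def count_aromatic_rings(smiles: str) -> int:
    """Return the number of aromatic rings in a molecule given as SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the SMILES string cannot be parsed.
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return rdMolDescriptors.CalcNumAromaticRings(mol)


# Illustrative inputs (hypothetical, not molecules taken from Table S2).
examples = [
    ("benzene", "c1ccccc1"),
    ("naphthalene", "c1ccc2ccccc2c1"),
    ("cyclohexane", "C1CCCCC1"),
]
for name, smi in examples:
    print(name, count_aromatic_rings(smi))
```

Pairing each count with the computed wavelength or oscillator strength of the same molecule would then give the points plotted in a correlation graph of this kind.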

Correlation with conjugation length
To examine how the absorption wavelength to the S1 state, the emission wavelength from the S1 state, and their oscillator strengths (OSs) correlate with the conjugation length, correlation graphs are shown in Figure S2. We counted the conjugation length, whose unit is defined as a single-double-single bond sequence.

Figure S2. Correlation graphs of absorption/fluorescence wavelengths and their intensities with the conjugation length. The upper two panels plot the conjugation length against the S0 absorption wavelength and its oscillator strength. The bottom two panels plot it against the S1 emission wavelength and its oscillator strength.

Prediction of absorption and emission by machine learning model
To investigate the relationship between molecular features and the absorption and fluorescence properties, we developed random forest prediction models using the Mordred descriptors. We employed the random forest regression implementation in the scikit-learn library. The number of trees was set to 100, and all other settings were left at their defaults. Prediction performance was evaluated using 5-fold cross-validation. The average correlation of the prediction results for each property is shown in Table S1. The features with high importance in each prediction model are shown in Figure S3.

Table S1. Pearson's correlation coefficients (R values) between the values computed at the B3LYP/3-21G* level (wavelengths and intensities of absorption to the S1 state and of fluorescence from the S1 state) and the values predicted by the trained random forest models.
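The evaluation protocol described above (100 trees, default settings otherwise, 5-fold cross-validation, Pearson's R) can be sketched as follows. Several points are assumptions rather than details given in the source: the feature matrix here is synthetic random data standing in for Mordred descriptors, the folds are shuffled with a fixed seed, R is computed per fold and then averaged, and the name `cv_pearson_r` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def cv_pearson_r(X, y, n_trees=100, n_splits=5, seed=0):
    """Mean Pearson R between held-out targets and predictions over k folds."""
    # Shuffling and the fixed seed are assumptions; the source does not say.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    r_values = []
    for train_idx, test_idx in folds.split(X):
        # Only the number of trees is set; other hyperparameters stay default.
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # Pearson correlation between reference and predicted values.
        r_values.append(np.corrcoef(y[test_idx], pred)[0, 1])
    return float(np.mean(r_values))


# Synthetic stand-in for a descriptor matrix and a QC-computed property.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] + np.sin(X_demo[:, 1]) + 0.1 * rng.normal(size=200)

mean_r = cv_pearson_r(X_demo, y_demo)
print(f"mean Pearson R over 5 folds: {mean_r:.3f}")
```

After fitting, the `feature_importances_` attribute of a scikit-learn `RandomForestRegressor` gives per-feature importances, which is one way a ranking like the one in Figure S3 could be produced.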

Selected molecules
From the viewpoint of detectability, we selected 87 molecules in accordance with the condition mentioned in the main text. Table S2 summarizes the 87 selected molecules as SMILES strings together with their properties computed at the B3LYP/3-21G* level. A-G are molecules, or their tautomers, that were found in SciFinder and synthesized for experimental validation. The unreported molecule synthesized in this study is PC. I-IV are expected to emit near-infrared light.

General methods
ATR-FTIR spectra were obtained using a Thermo-Nicolet 760X FTIR spectrophotometer equipped with a SMART-iTX ATR accessory. 1H NMR spectra were obtained using a JEOL JNM-ECA400 spectrometer operating at 400 MHz with tetramethylsilane (TMS) as an internal standard. Proton-decoupled 13C NMR spectra were obtained using a JEOL JNM-ECA400 spectrometer operating at 101 MHz with TMS as an internal standard. Data were processed using Delta version 5.0.5.1. 1H NMR chemical shifts (δ) are reported in ppm relative to TMS in DMSO-d6 (δ = 0.00). 13C NMR chemical shifts (δ) are reported in ppm relative to the solvent reported. Coupling constants (J) are expressed in hertz (Hz). Shift multiplicities are reported as singlet (s), doublet (d), triplet (t), quartet (q), doublet of doublets (dd), multiplet (m), and broad singlet (bs). High-resolution ESI-MS mass spectra were measured using a Thermo Scientific Q-Exactive Plus instrument in methanol with 0.1% formic acid.

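Photoluminescence (PL) spectra of PC under N2
PL spectra under air and under N2 (after N2 bubbling for 20 min) were measured at room temperature. Because oxygen quenches triplet excited states, phosphorescence would be expected to be stronger under N2 than under air. Since there is no difference between the two spectra, phosphorescence should not be involved in the PL of PC.

Figure S12. PL spectra of PC under air and N2.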

Materials
The materials of A-G (Table S2) were obtained from Tokyo Chemical Industry Co., Ltd. through custom synthesis. Spectroscopic grade solvents (DCM or THF) were obtained from Fujifilm Wako Pure Chemical Corporation.

Characterization of A-G
ATR-FTIR spectra were obtained using a Thermo-Nicolet 760X FTIR spectrophotometer equipped with a SMART-iTX ATR accessory. 1H NMR spectra were obtained using a JEOL JNM-ECA400 or JNM-ECZ400S spectrometer operating at 400 MHz with tetramethylsilane (TMS) as an internal standard. Proton-decoupled 13C NMR spectra were obtained using a JEOL JNM-ECA400 or JNM-ECZ400S spectrometer operating at 101 MHz with TMS as an internal standard. 19F NMR spectra were obtained using a JEOL JNM-ECZ400S spectrometer operating at 376 MHz. Data were processed using Delta version 5.0.5.1. 1H NMR chemical shifts (δ) are reported in ppm relative to TMS in DMSO-d6 (δ = 0.00). 13C NMR chemical shifts (δ) are reported in ppm relative to the solvent reported. Coupling constants (J) are expressed in hertz (Hz). Shift multiplicities are reported as singlet (s), doublet (d), triplet (t), quartet (q), doublet of doublets (dd), multiplet (m), and broad singlet (bs). High-resolution ESI-MS mass spectra were measured using a Thermo Scientific Q-Exactive Plus instrument in methanol with 0.1% formic acid or in methanol with 10% tetrahydrofuran.

Quantum yields of PC and A-G
Quantum yields (Φ) of PC and A-G in the solid and solution states are summarized in Table S5. Absolute fluorescence quantum yields were determined with a Hamamatsu Photonics C-9920-02 calibrated integrating sphere system.
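For reference, the absolute photoluminescence quantum yield is conventionally defined as the ratio of photons emitted to photons absorbed by the sample. The symbols below (Φ for the quantum yield, N_em and N_abs for the photon counts) are notation introduced here for illustration. The source does not describe how the instrument software evaluates these quantities.

```latex
\Phi = \frac{N_{\mathrm{em}}}{N_{\mathrm{abs}}}
```

With an integrating sphere, both photon counts are typically obtained by comparing spectra recorded with and without the sample in the sphere, so no reference dye of known quantum yield is required.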