Abstract

We developed a highly efficient composite quantum chemical method tailored for biomass-derived molecules. The method achieves chemical accuracy in evaluating CCSD(T)/CBS correlation energies and enthalpies of formation while reducing computational costs by over 2 orders of magnitude. This efficiency makes large-scale thermodynamic analysis feasible. Furthermore, the approach can be extended to condensed-phase and catalytic systems, enabling quantitative insights into biomass conversion and related chemical processes in diverse environments.

Converting plant-derived biomass, i.e. lignocellulosic biomass, into valuable materials and promoting its reuse are crucial for achieving a sustainable society. In this process, polymers such as cellulose, hemicellulose, and lignin undergo pyrolysis, breaking down into low-molecular-weight fuels and gases. A detailed understanding and precise control of the reaction pathways in pyrolysis [1,2] are essential for enhancing lignocellulosic biomass utilization efficiency. Extensive experimental [3–6] and theoretical [7] investigations, including studies on catalytic effects [8,9], have been conducted to elucidate the kinetics of these processes.

Quantum chemical calculations offer a powerful approach for evaluating the thermodynamic properties of molecules. To achieve a quantitatively reliable description of chemical phenomena, an accuracy of approximately 1 kcal/mol—often referred to as chemical accuracy—is required. Among quantum chemical methods, coupled cluster theory with singles, doubles, and perturbative triples in the complete basis set limit (CCSD(T)/CBS) is widely regarded as the gold standard due to its exceptional accuracy in describing electron correlation effects. However, the prohibitively high computational cost of CCSD(T)/CBS significantly limits its applicability to the large number of chemical species involved in lignocellulosic biomass pyrolysis.

To address this challenge, we have developed a composite method that accurately predicts CCSD(T)/CBS energies by combining lower-cost electron correlation methods [10]. In this study, we applied our composite method to estimate the thermodynamic properties of lignocellulosic biomass-derived molecules with high accuracy. Specifically, we re-optimized the method's parameters for lignocellulosic biomass-related species and evaluated both its accuracy and computational efficiency in calculating enthalpies of formation.

First, we describe the procedure for calculating correlation energies using the composite method and the determination of its parameters. In the following discussion, the double-, triple-, and quadruple-zeta basis sets (DZ, TZ, and QZ) correspond to the Dunning cc-pVXZ series. The electronic energy ($E_{\mathrm{elec}}$) at the CCSD(T)/CBS level is expressed by Eq. (1):

$$E_{\mathrm{elec}} = E_{\mathrm{HF}}^{\mathrm{CBS}} + E_{\mathrm{CCSD(T)}}^{\mathrm{CBS}} \qquad (1)$$

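where $E_{\mathrm{HF}}^{\mathrm{CBS}}$ represents the Hartree–Fock energy extrapolated to the complete basis set (CBS) limit using DZ, TZ, and QZ basis sets [11]. The term $E_{\mathrm{CCSD(T)}}^{\mathrm{CBS}}$ denotes the CCSD(T)/CBS correlation energy, which is predicted by the composite method through a linear combination of correlation energies obtained from several electron correlation methods and basis sets [10], as given by Eq. (2):

$$E_{\mathrm{CCSD(T)}}^{\mathrm{CBS}} = \sum_{X=\mathrm{D,T,Q}} c_{\mathrm{MP2}/X\mathrm{Z}}\, E_{\mathrm{MP2}/X\mathrm{Z}} + \sum_{Y=\mathrm{D,T}} c_{\mathrm{CCSD}/Y\mathrm{Z}}\, E_{\mathrm{CCSD}/Y\mathrm{Z}} + c_{\mathrm{CCSD(T)/DZ}}\, E_{\mathrm{CCSD(T)/DZ}} \qquad (2)$$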

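In Eq. (2), $E$ represents correlation energies computed at the MP2/XZ (X = D, T, Q), CCSD/YZ (Y = D, T), and CCSD(T)/DZ levels, while $c$ denotes the corresponding coefficients. The reference CCSD(T)/CBS correlation energies, required for parameterizing the composite method, were obtained using an extrapolation method [12] described by Eq. (3):

$$E_{\mathrm{CCSD(T)}}^{\mathrm{CBS}} = \frac{n^{3} E_{\mathrm{CCSD(T)}}^{(n)} - (n-1)^{3} E_{\mathrm{CCSD(T)}}^{(n-1)}}{n^{3} - (n-1)^{3}} \qquad (3)$$

where $E_{\mathrm{CCSD(T)}}^{(n)}$ is the CCSD(T) correlation energy obtained with the basis set of cardinal number $n$ (3 for TZ, 4 for QZ).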

In this study, CCSD(T)/TZ and CCSD(T)/QZ calculations were employed to estimate the reference CCSD(T)/CBS energy. The coefficients of the composite method were determined using classical linear regression.
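The TZ/QZ extrapolation can be sketched in a few lines of Python. This is a minimal illustration that assumes Eq. (3) is the common two-point inverse-cubic form, E_n = E_CBS + A/n^3 with n the basis-set cardinal number; the function name `cbs_two_point` and the numerical inputs are made up for the example and are not taken from the paper.

```python
def cbs_two_point(e_small, e_large, n_small=3, n_large=4):
    """Two-point n^-3 extrapolation of a correlation energy to the CBS limit.

    Assumes E_n = E_CBS + A / n**3, where n is the cardinal number of the
    basis set (3 for TZ, 4 for QZ). Inputs and output share the same unit.
    """
    w_small, w_large = n_small ** 3, n_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


# Synthetic check with made-up numbers (hartree) that follow the assumed form.
E_CBS_TRUE, A = -1.0, 0.5
e_tz = E_CBS_TRUE + A / 3 ** 3
e_qz = E_CBS_TRUE + A / 4 ** 3
e_cbs = cbs_two_point(e_tz, e_qz)
print(abs(e_cbs - E_CBS_TRUE) < 1e-12)
```

With TZ and QZ the weights are 64/37 and 27/37, so the extrapolated value always lies beyond the QZ result in the direction of the TZ-to-QZ change.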

Experimental enthalpies of formation were compiled for 200 organic molecules composed of carbon, oxygen, and hydrogen, containing up to 8 carbon or oxygen atoms. Compared with the sets applied in the original development of the composite method, the present set contains a wider range of functional groups and larger molecules. The experimental values were taken from Ref. 13 or the NIST Chemistry WebBook [14,15]. A complete list of experimental values is provided in the Supplementary material. Of these 200 molecules, reference CCSD(T)/CBS energies were calculated for 179 relatively small molecules and used to parameterize the composite method, while the remaining 21 larger molecules were designated as an untrained set.
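The parameterization on the training molecules amounts to a least-squares fit of the coefficients in Eq. (2). The sketch below shows one way such a fit could be set up in plain Python, assuming a model without an intercept (the coefficient table lists six coefficients only) and solving the normal equations directly. The helper names and the small synthetic data set are hypothetical; the authors' actual fitting code is not described in the text.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_coefficients(features, targets):
    """Ordinary least squares without intercept via the normal equations.

    features: one row per molecule, one column per method/basis-set energy.
    targets:  reference CCSD(T)/CBS correlation energy per molecule.
    """
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(features, targets))
           for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic data: three made-up "component energies" per molecule and
# targets generated from known coefficients, so the fit should recover them.
TRUE_C = [0.5, -1.0, 1.5]
features = [[1.0, 2.0, 0.5], [0.3, 1.1, 2.0], [2.2, 0.4, 1.3], [1.5, 1.5, 0.2]]
targets = [sum(c * e for c, e in zip(TRUE_C, row)) for row in features]
coeffs = fit_coefficients(features, targets)
print([round(c, 6) for c in coeffs])
```

In practice the component energies of one molecule are strongly correlated with each other, so a numerically more stable solver (for example a QR-based least-squares routine) may be preferable to explicit normal equations.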

The initial 3-dimensional molecular structures were generated using CORINA Classic software [16]. Structural optimizations and single-point calculations were performed with the Gaussian 16 program [17]. Optimization and vibrational frequency calculations were conducted using density functional theory (DFT) with the ωB97X-D exchange-correlation functional and the aug-cc-pVTZ basis set. Computations were carried out on an AMD EPYC 7763 CPU. For enthalpy calculations, standard thermodynamic corrections, including vibrational contributions from DFT, were applied at 298.15 K. In the enthalpy of formation calculations, reference standard enthalpies for H₂, O₂, and graphite were required. The values for H₂ and O₂ were derived from computed standard enthalpies, while the standard enthalpy of graphite was determined by combining the experimental enthalpy of formation of CH₄ with the computed standard enthalpies of CH₄ and H₂.
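The reference-state bookkeeping described above can be written out as a short sketch. It assumes the formation reaction x C(graphite) + (y/2) H₂ + (z/2) O₂ → CxHyOz and that all enthalpies are given in the same unit at 298.15 K. The function names are hypothetical and every number below is a placeholder chosen only to make the example run, not a computed or tabulated value from the paper.

```python
def graphite_enthalpy(h_ch4, h_h2, dhf_ch4_exp):
    """Effective standard enthalpy of C(graphite).

    Derived from C(graphite) + 2 H2 -> CH4, using the experimental enthalpy
    of formation of CH4 and the computed enthalpies of CH4 and H2.
    """
    return h_ch4 - 2.0 * h_h2 - dhf_ch4_exp


def enthalpy_of_formation(h_mol, n_c, n_h, n_o, h_c, h_h2, h_o2):
    """Enthalpy of formation of CxHyOz from its computed standard enthalpy."""
    return h_mol - n_c * h_c - 0.5 * n_h * h_h2 - 0.5 * n_o * h_o2


# Placeholder inputs in kcal/mol (illustrative only).
H_CH4, H_H2, H_O2 = -25300.0, -730.0, -94200.0
DHF_CH4_EXP = -17.8

h_c = graphite_enthalpy(H_CH4, H_H2, DHF_CH4_EXP)

# Consistency check: CH4 itself must return the experimental value used above.
dhf_ch4 = enthalpy_of_formation(H_CH4, 1, 4, 0, h_c, H_H2, H_O2)
print(abs(dhf_ch4 - DHF_CH4_EXP) < 1e-6)
```

Anchoring graphite to CH₄ in this way avoids computing a solid-state reference directly; the price is that any error in the computed CH₄ and H₂ enthalpies is absorbed into the carbon reference.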

Table 1 presents the re-optimized parameters of the composite method, compared with those previously proposed. The coefficients corresponding to each combination of electron correlation method and basis set are listed for both the original parameters (“Original”) and the newly optimized ones (“Biomass”). The signs of the coefficients remain consistent between the 2 sets, indicating that the optimized parameters retain similar properties to the original ones. Additionally, the absolute values of the Biomass coefficients tend to be smaller, likely due to the high structural similarity among the lignocellulosic biomass molecules used for parameterization.

Table 1. Coefficients of composite methods.

| Method | Basis set | Original | Biomass |
|---|---|---|---|
| MP2 | DZ | 0.4236 | 0.2956 |
| MP2 | TZ | −1.9340 | −1.7296 |
| MP2 | QZ | 1.5360 | 1.4851 |
| CCSD | DZ | −1.9603 | −1.7602 |
| CCSD | TZ | 1.6230 | 1.4909 |
| CCSD(T) | DZ | 1.2853 | 1.1935 |
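As an illustration of how the tabulated coefficients enter Eq. (2), the following sketch evaluates the linear combination for one molecule. The coefficient values are those of Table 1; the dictionary layout, the function name, and the component energies are assumptions made for the example.

```python
# Coefficients from Table 1, keyed by (method, basis set).
ORIGINAL = {
    ("MP2", "DZ"): 0.4236, ("MP2", "TZ"): -1.9340, ("MP2", "QZ"): 1.5360,
    ("CCSD", "DZ"): -1.9603, ("CCSD", "TZ"): 1.6230, ("CCSD(T)", "DZ"): 1.2853,
}
BIOMASS = {
    ("MP2", "DZ"): 0.2956, ("MP2", "TZ"): -1.7296, ("MP2", "QZ"): 1.4851,
    ("CCSD", "DZ"): -1.7602, ("CCSD", "TZ"): 1.4909, ("CCSD(T)", "DZ"): 1.1935,
}


def composite_correlation_energy(energies, coeffs):
    """Linear combination of component correlation energies, as in Eq. (2)."""
    return sum(c * energies[key] for key, c in coeffs.items())


# Made-up component correlation energies (hartree) for a single molecule.
energies = {
    ("MP2", "DZ"): -0.80, ("MP2", "TZ"): -0.95, ("MP2", "QZ"): -1.00,
    ("CCSD", "DZ"): -0.82, ("CCSD", "TZ"): -0.96, ("CCSD(T)", "DZ"): -0.85,
}
e_corr_cbs = composite_correlation_energy(energies, BIOMASS)
print(f"{e_corr_cbs:.4f} hartree")
```

Each coefficient set sums to a value close to 1 (about 0.974 for Original and 0.975 for Biomass), so the individual large positive and negative weights largely cancel.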

Table 2 reports the mean absolute deviations (MAD) between the calculated and reference CCSD(T)/CBS correlation energies for the 179 molecules in the training set. For comparison, the MADs of the individual correlation energies that enter the composite method and the reference extrapolation are also listed. The composite methods exhibited remarkably low MAD values of 0.200 kcal/mol and 0.075 kcal/mol for the Original and Biomass parameters, respectively. In contrast, even CCSD(T)/QZ yielded a MAD exceeding 25 kcal/mol. Although the Original composite method had already achieved a high level of precision, highlighting the robustness and general applicability of the approach, the Biomass parameters improved the accuracy further.

Table 2. Deviation of calculated correlation energies from reference CCSD(T)/CBS values.

| Method | Basis set | MAD / kcal mol⁻¹ |
|---|---|---|
| MP2 | DZ | 223.311 |
| MP2 | TZ | 111.673 |
| MP2 | QZ | 72.914 |
| CCSD | DZ | 188.101 |
| CCSD | TZ | 85.298 |
| CCSD(T) | DZ | 172.083 |
| CCSD(T) | TZ | 59.760 |
| CCSD(T) | QZ | 25.211 |
| Original composite | | 0.200 |
| Biomass composite | | 0.075 |
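A quick arithmetic check relates the CCSD(T)/TZ and CCSD(T)/QZ rows of Table 2. If the reference values come from a two-point inverse-cubic extrapolation of exactly these TZ and QZ energies (an assumption about the form of Eq. (3)), then for every molecule the TZ and QZ deviations from the reference are in the fixed ratio 4³/3³ = 64/27, and so are their MADs. The helper below is a generic MAD function written for illustration.

```python
def mean_absolute_deviation(values, references):
    """Mean absolute deviation between paired values and references."""
    return sum(abs(v - r) for v, r in zip(values, references)) / len(values)


# MADs reported in Table 2 (kcal/mol).
MAD_TZ, MAD_QZ = 59.760, 25.211

ratio_table = MAD_TZ / MAD_QZ
ratio_expected = 4 ** 3 / 3 ** 3  # 64/27 under the n^-3 assumption

print(f"table: {ratio_table:.4f}, expected: {ratio_expected:.4f}")
```

The two ratios agree to within the rounding of the tabulated values, which is consistent with the assumed extrapolation form.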

The enthalpies of formation ($H_{\mathrm{f}}$) of 200 molecules were computed using various methods and compared with experimental values. Table 3 summarizes the resulting errors, including mean absolute errors (MAE), maximum errors (MaxE), and computational times. The computational time is reported as the average per molecule. For the composite method, the time corresponds to the sum of the computation times for the electron correlation energies used in Eq. (2). For the reference CCSD(T)/CBS values, the total time includes the combined CCSD(T)/TZ and CCSD(T)/QZ calculations.

Table 3. Errors in enthalpies of formation relative to experimental values and computational times. MAE and MaxE are errors in ΔHf in kcal mol⁻¹; CPU times are in min.

| Method | Basis set | Trained set MAE | Trained set MaxE | Trained set CPU time | Untrained set MAE | Untrained set MaxE | Untrained set CPU time |
|---|---|---|---|---|---|---|---|
| MP2 | DZ | 3.81 | 17.74 | 0.3 | 3.02 | 11.06 | 0.7 |
| MP2 | TZ | 1.78 | 7.03 | 6.1 | 2.81 | 6.83 | 16.2 |
| MP2 | QZ | 3.24 | 9.52 | 79.3 | 5.51 | 10.74 | 195.5 |
| CCSD | DZ | 6.06 | 25.77 | 2.4 | 11.33 | 18.78 | 8.1 |
| CCSD | TZ | 6.03 | 15.41 | 81.5 | 9.50 | 16.95 | 334.7 |
| CCSD(T) | DZ | 3.99 | 24.39 | 10.7 | 6.48 | 14.84 | 46.4 |
| CCSD(T) | TZ | 2.24 | 8.74 | 319.9 | | | |
| CCSD(T) | QZ | 1.05 | 4.82 | 19022.6 | | | |
| Original composite | | 0.83 | 3.62 | 180.3 | 0.90 | 2.06 | 601.6 |
| Biomass composite | | 0.96 | 3.55 | 180.3 | 0.96 | 2.37 | 601.6 |
| CCSD(T) | CBS | 0.94 | 3.49 | 19342.6 | | | |

For the trained set, only the composite methods and CCSD(T)/CBS reproduced the experimental enthalpies of formation with MAEs below 1 kcal/mol. Their maximum errors were 3.62, 3.55, and 3.49 kcal/mol for the Original composite, Biomass composite, and CCSD(T)/CBS, respectively, showing comparable accuracy. The slightly lower MAE of the Original composite might be due to error cancellation between correlation and vibrational contributions. As shown in Table 2, the Biomass composite reproduces CCSD(T)/CBS correlation energies more faithfully, indicating that its agreement with the CCSD(T)/CBS MAE rests on a more robust physical description. Notably, the composite method estimated energies approximately 107 times faster than the reference CCSD(T)/CBS calculations (180.3 versus 19342.6 min per molecule) while also surpassing CCSD(T)/TZ in both speed and accuracy. For the untrained set of larger molecules, the composite method maintained a similar level of accuracy, demonstrating minimal size dependency and strong generalizability. The average computation time for a single large molecule using the composite method was approximately 10 h, whereas direct CCSD(T)/CBS calculations would require nearly 100 times longer, making them impractical for large-scale applications.
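The timing figures quoted above follow directly from the per-method CPU times in Table 3. The short sketch below reproduces the arithmetic; the variable names are illustrative, and the numbers are the tabulated averages per molecule in minutes.

```python
# Average CPU time per molecule (min) from Table 3, trained set.
trained_components = {
    "MP2/DZ": 0.3, "MP2/TZ": 6.1, "MP2/QZ": 79.3,
    "CCSD/DZ": 2.4, "CCSD/TZ": 81.5, "CCSD(T)/DZ": 10.7,
}
# Same components for the untrained set of larger molecules.
untrained_components = {
    "MP2/DZ": 0.7, "MP2/TZ": 16.2, "MP2/QZ": 195.5,
    "CCSD/DZ": 8.1, "CCSD/TZ": 334.7, "CCSD(T)/DZ": 46.4,
}
CBS_TRAINED = 19342.6  # CCSD(T)/TZ + CCSD(T)/QZ, trained set

composite_trained = sum(trained_components.values())
composite_untrained = sum(untrained_components.values())
speedup = CBS_TRAINED / composite_trained

print(f"composite (trained):   {composite_trained:.1f} min")
print(f"composite (untrained): {composite_untrained / 60:.1f} h")
print(f"speed-up vs CCSD(T)/CBS: {speedup:.0f}x")
```

The MP2/QZ and CCSD/TZ steps account for most of the composite cost, so any further speed-up would have to come from those two components.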

Finally, Fig. 1 illustrates the distribution of the errors in the enthalpies of formation (ΔHf) obtained using the composite method for all 200 molecules. The majority of values lie within ±1 kcal/mol, confirming that chemical accuracy was achieved across a broad range of molecular species. No systematic bias toward overestimation or underestimation was observed. Overall, these results highlight the suitability of the composite method for practical applications involving diverse lignocellulosic biomass-derived molecules.

Fig. 1. Distribution of enthalpy of formation errors between experimental data and values predicted by the composite method.

In summary, we have re-optimized the parameters of a composite method specifically for lignocellulosic biomass-derived molecules and applied it to evaluate CCSD(T)/CBS correlation energies and enthalpies of formation for 200 compounds. The composite method achieves an average error of less than 0.1 kcal/mol for CCSD(T)/CBS correlation energies and reproduces experimental enthalpies of formation with an average error below 1 kcal/mol, all while being more than 100 times faster than direct CCSD(T)/CBS calculations. These results confirm that the composite method meets chemical accuracy and enables reliable quantitative analysis of chemical phenomena at a significantly reduced computational cost.

While this study has focused on gas-phase molecules, the computed energies can be extended to liquid-phase systems and catalytic surfaces by incorporating independently obtained solvation and adsorption energies. This extension will further enhance the applicability of the method, allowing for the thermodynamic evaluation of lignocellulosic biomass conversion processes in more complex reaction environments.

Acknowledgments

The computation was performed using the Research Center for Computational Science, Okazaki, Japan (Project: 24-IMS-C039).

Supplementary data

Supplementary material is available at Chemistry Letters online.

Funding

This work was supported by the Demonstration Project of Innovative Catalyst Technology for Decarbonization through Regional Resource Recycling, the Ministry of the Environment, Government of Japan. M. F. is grateful for the financial support provided by the Grant-in-Aid for Transformative Research Areas (A) Digitalization-driven Transformative Organic Synthesis (Digi-TOS) (KAKENHI Grant Number JP24H01096) from the Japan Society for the Promotion of Science.

References

1. D. Shen, W. Jin, J. Hu, R. Xiao, K. Luo, Renew. Sustain. Energy Rev. 2015, 51, 761.

2. S. Wang, G. Dai, H. Yang, Z. Luo, Prog. Energy Combust. Sci. 2017, 62, 33.

3. S. Hameed, A. Sharma, V. Pareek, H. Wu, Y. Yu, Biomass Bioenergy 2019, 123, 104.

4. S. Vikram, P. Rosha, S. Kumar, Energy Fuels 2021, 35, 7406.

5. A. K. Vuppaladadiyam, S. S. Varsha Vuppaladadiyam, V. S. Sikarwar, E. Ahmad, K. K. Pant, M. S, A. Pandey, S. Bhattacharya, A. Sarmah, S.-Y. Leu, J. Energy Inst. 2023, 108, 101236.

6. A. Couce, Prog. Energy Combust. Sci. 2016, 53, 41.

7. B. Hu, B. Zhang, W.-L. Xie, X.-Y. Jiang, J. Liu, Q. Lu, Energy Fuels 2020, 34, 10384.

8. C. Liu, H. Wang, A. M. Karim, J. Sun, Y. Wang, Chem. Soc. Rev. 2014, 43, 7594.

9. X. Chen, Q. Che, S. Li, Z. Liu, H. Yang, Y. Chen, X. Wang, J. Shao, H. Chen, Fuel Process. Technol. 2019, 196, 106180.

10. J. Seino, H. Nakai, J. Comput. Chem. 2016, 37, 2304.

11. A. Halkier, T. Helgaker, P. Jørgensen, W. Klopper, J. Olsen, Chem. Phys. Lett. 1999, 302, 437.

12. T. Helgaker, W. Klopper, H. Koch, J. Noga, J. Chem. Phys. 1997, 106, 9639.

13. B. Narayanan, P. C. Redfern, R. S. Assary, L. A. Curtiss, Chem. Sci. 2019, 10, 7449.

14. P. J. Linstrom, W. G. Mallard, J. Chem. Eng. Data 2001, 46, 1059.

15
16
17. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery Jr, J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman, D. J. Fox, Gaussian 16, Revision C.02, Gaussian, Inc., Wallingford, CT, 2016.

Author notes

Conflict of interest statement. None declared.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.