Physics in Medicine & Biology

PAPER • OPEN ACCESS

A quantitative assessment of Geant4 for predicting the yield and distribution of positron-emitting fragments in ion beam therapy

To cite this article: Andrew Chacon et al 2024 Phys. Med. Biol. 69 125015

Phys. Med. Biol. 69 (2024) 125015
https://doi.org/10.1088/1361-6560/ad4f48

RECEIVED 5 February 2024
REVISED 10 May 2024
ACCEPTED FOR PUBLICATION 22 May 2024
PUBLISHED 11 June 2024

Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Andrew Chacon1, Harley Rutherford1,2, Akram Hamato3, Munetaka Nitta3, Fumihiko Nishikido3, Yuma Iwao3, Hideaki Tashima3, Eiji Yoshida3, Go Akamatsu3, Sodai Takyu3, Han Gyu Kang3, Daniel R Franklin4, Katia Parodi5, Taiga Yamaya3, Anatoly Rosenfeld2,6, Susanna Guatelli2,6 and Mitra Safavi-Naeini1,2,7,∗

1 Australian Nuclear Science and Technology Organisation (ANSTO), Lucas Heights, NSW, Australia
2 Centre for Medical Radiation Physics, University of Wollongong, Wollongong, NSW 2522, Australia
3 National Institutes for Quantum Science and Technology, Chiba, Japan
4 School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, Australia
5 Department of Medical Physics, Faculty of Physics, Garching b, Ludwig-Maximilians-Universität München, Munich, Germany
6 Illawarra Health and Medical Research Institute, University of Wollongong, Wollongong, NSW 2522, Australia
7 Brain and Mind Centre, University of Sydney, Sydney, NSW, Australia
∗ Author to whom any correspondence should be addressed.

E-mail: mitras@ansto.gov.au

Keywords: hadronic models, fragmentation models, ion beam therapy, carbon ion beam therapy, positron emission tomography (PET), Geant4 Monte Carlo simulation toolbox, quality assurance

Supplementary material for this article is available online

Abstract

Objective.
To compare the accuracy with which different hadronic inelastic physics models across ten Geant4 Monte Carlo simulation toolkit versions can predict positron-emitting fragments produced along the beam path during carbon and oxygen ion therapy.

Approach. Phantoms of polyethylene, gelatin, or poly(methyl methacrylate) were irradiated with monoenergetic carbon and oxygen ion beams. Post-irradiation, 4D PET images were acquired and the contributions of the parent 11C, 10C and 15O radionuclides in each voxel were determined from the extracted time-activity curves. Next, the experimental configurations were simulated in Geant4 Monte Carlo versions 10.0 to 11.1 with three different fragmentation models, binary ion cascade (BIC), quantum molecular dynamics (QMD) and the Liège intranuclear cascade (INCL++), giving 30 model-version combinations. Total positron annihilation and parent isotope production yields predicted by each simulation were compared with the experimental results using normalised mean squared error and Pearson cross-correlation coefficient. Finally, we compared the depth of the maximum positron annihilation yield and the distal point at which the positron yield decreases to 50% of peak between each model and the experimental results.

Main results. Performance varied considerably across versions and models, with no one version/model combination providing the best prediction of all positron-emitting fragments in all evaluated target materials and irradiation conditions. BIC in Geant4 10.2 provided the best overall agreement with experimental results in the largest number of test cases. QMD consistently provided the best estimates of both the depth of peak positron yield (in versions 10.4 and 10.6) and the distal 50%-of-peak point (in version 10.2), while BIC also performed well and INCL generally performed the worst across most Geant4 versions.

Significance. The best predictions of the spatial distribution of positron annihilations and positron-emitting fragment production along the beam path during carbon and oxygen ion therapy were obtained using Geant4 10.2.p03 with BIC or QMD. These version/model combinations are recommended for future heavy ion therapy research.

© 2024 The Author(s). Published on behalf of Institute of Physics and Engineering in Medicine by IOP Publishing Ltd

1. Introduction

One of the chief advantages of particle therapy as a treatment for cancer is the high dose gradient between the treatment area and surrounding regions (Durante et al 2017). This precision necessitates the use of sophisticated treatment planning and quality assurance methods to ensure proper delivery of the prescribed dose to the target only. These methods, in turn, are heavily reliant on Monte Carlo simulation methods, which are used for modelling the interaction of high-energy charged particles with the patient.
Good models for nuclear fragmentation processes are especially critical for faithfully simulating imaging applications in particle therapy, such as positron emission tomography (PET)-based dose estimation methods for quality assurance, since the production and distribution of positron-emitting radionuclide fragments directly affect the quality of the resulting image (Parodi and Polf 2018, Hofmann et al 2019a, 2019b, Rutherford et al 2020). One of the leading fully open source Monte Carlo toolkits for modelling the interaction of radiation and matter, Geant4, currently offers a choice of three hadronic inelastic fragmentation models that are appropriate for particle therapy: binary ion cascade (BIC), quantum molecular dynamics (QMD), and Liège intranuclear cascade (INCL++) (Agostinelli et al 2003, Mancusi et al 2014, G Collaboration 2018)⁸.

In a previous study, we evaluated these models by comparing the spatial distributions of positron-emitting radionuclides predicted following irradiation of PMMA, gelatin and polyethylene targets by monoenergetic carbon and oxygen ion beams (simulated using Geant4 10.2.p03) to equivalent results estimated from experimentally-obtained PET data (Chacon et al 2019). The BIC model was found to provide the best estimates overall; however, none of the models provided a perfect fit in all evaluated cases, and some significant discrepancies were observed.

Since the publication of our previous study, there have been several updates to Geant4; specifically, six minor releases (versions 10.x) and one major release (version 11, which has since been updated to version 11.1). Each of these releases includes modifications to the physics models implemented in Geant4, which can affect the simulation of positron-emitting fragment production in particle therapy.
In this work, we have extended our earlier study, and present a quantitative evaluation of Geant4’s ability to predict positron-emitting fragment production across a total of ten different stable versions (10.0.p04, 10.1.p03, 10.2.p03, 10.3.p03, 10.4.p03, 10.5.p01, 10.6.p03, 10.7.p02, 11.0 and 11.1) which have followed the previous major release (10.0) for each of the three different fragmentation models (Chacon et al 2019). In addition to the normalised mean squared error (NMSE) metric used in the previous study, three additional metrics are used to compare the shape of the predicted positron-emitting fragment distributions with the experimentally measured distributions: the Pearson cross-correlation coefficient (CCC), the depth of the positron annihilation peak, and the depth at which the positron annihilation intensity has decreased to 50% of the peak.

2. Materials and methods

This section describes the methods used for obtaining and quantitatively comparing the experimental and simulated positron annihilation profiles. The general approach is similar to that used in our previous study (see Chacon et al 2019); however, it has been extended to include a much wider range of Geant4 versions, and additional comparison metrics are introduced.

The experimental methods used to estimate the total positron annihilation profile and activity of the dominant positron-emitting fragment isotopes (11C, 10C and 15O) are briefly summarised in section 2.1. Equivalent simulation configurations were constructed for each Geant4 version under test, and the total positron annihilation profile and activity of 11C, 10C and 15O were predicted for each beam ion/energy, target material, hadronic inelastic fragmentation model and Geant4 version; the design and parameters of these simulations are described in detail in section 2.2.
Results in each of the three target materials and 5 ion/energy combinations were then compared to those predicted in equivalent simulations performed in Geant4 using each of the hadronic fragmentation models (BIC, QMD and INCL++) across the ten evaluated Geant4 versions for a total of 150 unique target/ion/energy/version/model test conditions. The total positron yields and yields of the individual positron-emitting fragment species from each model and Geant4 version were then compared with the experimental annihilation profiles using the following metrics in each of the entrance, build-up and Bragg peak, and tail regions:

• NMSE; and
• Pearson CCC.

Additionally, the depth of the positron annihilation peak and the depth of the distal point at which the magnitude of the positron annihilation profile decreases to 50% of the peak value are evaluated. All metrics are described in detail in section 2.3.

⁸ INCL++ is considered the most appropriate option for neutron spallation simulations, but is included here for completeness (Boudard et al 2002).

Table 1. Beam parameters for each ion species and energy. The energy spread is 0.2% of nominal energy in each case; 95% confidence intervals are given for beam flux.

| Ion | Energy (MeV/u) | σx (mm) | σy (mm) | Beam flux (pps) |
|-----|----------------|---------|---------|-----------------|
| 12C | 148.5 | 2.77 | 2.67 | 1.8×10⁹ ± 3.8×10⁷ |
| 12C | 290.5 | 3.08 | 4.70 | 1.8×10⁹ ± 6.4×10⁷ |
| 12C | 350 | 2.50 | 2.98 | 1.8×10⁹ ± 4.6×10⁷ |
| 16O | 148 | 2.79 | 2.89 | 1.1×10⁹ ± 2.8×10⁷ |
| 16O | 290 | 2.60 | 4.90 | 1.1×10⁹ ± 7.0×10⁷ |

2.1. Experimental configuration

The experimental data obtained in our 2019 paper were used as the ground truth for this simulation study; a detailed description of the experimental procedures is presented in that paper (Chacon et al 2019).
In summary, phantoms constructed from either pure PMMA, polyethylene or gelatin (encased in a thin-walled PMMA container), each with dimensions of 100 mm × 100 mm × 300 mm, were irradiated with monoenergetic carbon or oxygen ion beams of various energies: three for carbon ions and two for oxygen (see table 1). Positron annihilation profiles (with respect to depth in the target) were estimated across the full width at tenth maximum (FWTM) of the beam using the whole-body DOI-PET scanner prototype developed at QST (Akamatsu et al 2019). These profiles were decomposed into the individual population of each of the dominant parent positron-emitting fragments (11C, 10C and 15O) at t = 0 (end of irradiation period) by fitting the observed time-decay curves in each voxel to a multiexponential decay model.

2.2. Simulation parameters

The same beam parameters, phantom compositions and geometries used in the experimental measurements were modelled in each version of Geant4. Apart from minor modifications to the simulation source code required due to version-to-version changes in certain Geant4 application programming interfaces (APIs), the code was identical across versions. Simulations were performed using each of the 10 most recent stable releases of Geant4: 10.0.p04, 10.1.p03, 10.2.p03, 10.3.p03, 10.4.p03, 10.5.p01, 10.6.p03, 10.7.p02, 11.0 and 11.1. For brevity, the patch number will be dropped when referring to the version of Geant4. For each version of Geant4, three alternative hadronic ion fragmentation models were evaluated: the BIC, QMD and Liège Intranuclear Cascade (INCL) models⁹ (Mancusi et al 2014, G Collaboration 2018). All simulations modelled electromagnetic interactions using the standard option 3 list (G4EmStandardPhysics_option3). The remaining physics processes, including hadronic physics models, are listed in table 10.
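The grid of Geant4 releases and fragmentation models listed above can be enumerated programmatically. The following is a minimal Python sketch, not the authors' code: `short_label` and `version_model_grid` are hypothetical helper names, and the labelling convention (patch number dropped, `10.0` written as `10`, `11.0` written as `11`) is an assumption inferred from how versions are written in the text and tables.

```python
from itertools import product

# Release strings exactly as listed in section 2.2.
GEANT4_RELEASES = [
    "10.0.p04", "10.1.p03", "10.2.p03", "10.3.p03", "10.4.p03",
    "10.5.p01", "10.6.p03", "10.7.p02", "11.0", "11.1",
]

# Short model labels as used in the text and tables.
MODELS = ["BIC", "QMD", "INCL"]


def short_label(release: str) -> str:
    """Drop the patch suffix and a trailing '.0' (hypothetical helper)."""
    base = release.split(".p")[0]  # "10.2.p03" -> "10.2"
    return base[:-2] if base.endswith(".0") else base  # "10.0" -> "10"


def version_model_grid():
    """Return every (version label, model) pair in the study grid."""
    return [(short_label(r), m) for r, m in product(GEANT4_RELEASES, MODELS)]


grid = version_model_grid()
print(len(grid))  # 30 version/model combinations
```

Keeping the release strings and the display labels separate means the full patch level stays available for reproducing a run, while the short labels match the row headings of the results tables.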
The location of each positron annihilation, as well as the identity of the parent isotope which decayed to emit each positron (principally 11C, 10C and 15O), were scored with a resolution of 1.5 mm³ to match the voxel dimensions of the experimental OpenPET image reconstruction output. The pristine positron annihilation profiles were convolved with a 2.3 mm FWHM Gaussian filter to simulate the measured point spread function of the PET system (Akamatsu et al 2019).

A total of 20 runs, each with 10⁸ primary particles, were simulated for each version/model combination. In our previous work, we established that this is sufficient to limit the run-to-run ratio of standard deviation to mean across the build-up and Bragg peak region of the profiles to less than 5% (Chacon et al 2019). Each of the simulated profiles is randomly paired with one of the experimental profiles (for the same target, ion species and beam energy) and then the performance metrics are calculated, with the statistical distribution of each metric used to generate the confidence intervals shown in the results presented in the supplementary materials.

⁹ The INCL model was developed specifically for spallation reactions but is included in this study as it can also model fragmentation.

2.3. Evaluation methods and metrics

The irradiated target was divided into three separate regions for analysis since different physics processes dominate in each: the entrance region, the build-up and Bragg peak region, and the tail region. This segmentation is defined in the same way as in our previous paper (Chacon et al 2019); in summary, the central build-up and Bragg peak region is defined as follows:

• The proximal edge in the z dimension (along the path of the beam) is defined as the first point at which the dose deposited along the central axis exceeds the entrance plateau dose by more than 5% of the difference between peak dose and the entrance plateau dose; and
• The distal edge in z is defined as the last point at which the deposited dose is greater than 5% of the absolute peak dose value.

The entrance region is then defined as the region proximal to the build-up and Bragg peak region, while the tail region is defined as the region distal to the build-up and Bragg peak region.

The yields of the positron-emitting nuclei are defined by (1):

$$\mathrm{Yield}(\mathrm{Isotope}) = \frac{N(\mathrm{Isotope})}{N(\mathrm{Primary})} \tag{1}$$

where $N(\mathrm{Isotope})$ is the number of nuclei of the isotope under study produced in that region and $N(\mathrm{Primary})$ is the total number of primary particles. Yields were calculated for each voxel along the beam path.

Three different metrics were chosen to quantify the accuracy of each model in Geant4: the NMSE, the Pearson CCC, and the range (depth along the path of the beam) of both the positron annihilation peak and the point beyond the peak at which positron annihilation intensity decreases to 50% of the peak value.

NMSE measures the average squared difference between the experimental measurements and simulation-predicted positron yields in each region. NMSE is most useful in regions of relatively high yield (especially in the entrance and build-up and Bragg peak regions); the relatively low statistics available in the tail region limit the value of the NMSE there. NMSE is defined as:

$$\mathrm{NMSE} = \frac{\sum_{i=1}^{N_{\mathrm{reg}}} |S_i - E_i|^2}{\sum_{i=1}^{N_{\mathrm{reg}}} |E_i|^2} \tag{2}$$

where $S_i$ and $E_i$ are the simulation and experimental yields in the $i$th voxel of the $N_{\mathrm{reg}}$ voxels in region reg; a lower value indicates a better match.
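The region segmentation and the NMSE of equation (2) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the depth-dose profile is taken to be a 1D array along the beam axis, and the entrance plateau dose is assumed to be the mean of the first few voxels (the text does not state how it is estimated).

```python
import numpy as np


def bragg_region_bounds(dose, n_plateau=5):
    """Return (proximal, distal) voxel indices of the build-up and Bragg peak region.

    Assumption: the entrance plateau dose is the mean of the first
    `n_plateau` voxels of the central-axis depth-dose profile.
    """
    dose = np.asarray(dose, dtype=float)
    peak = dose.max()
    plateau = dose[:n_plateau].mean()
    # Proximal edge: first voxel exceeding the plateau by more than
    # 5% of the difference between peak dose and plateau dose.
    above = np.nonzero(dose > plateau + 0.05 * (peak - plateau))[0]
    # Distal edge: last voxel whose dose is greater than 5% of the peak dose.
    significant = np.nonzero(dose > 0.05 * peak)[0]
    return int(above[0]), int(significant[-1])


def nmse(sim, exp):
    """Normalised mean squared error of equation (2) over one region."""
    sim = np.asarray(sim, dtype=float)
    exp = np.asarray(exp, dtype=float)
    return float(np.sum(np.abs(sim - exp) ** 2) / np.sum(np.abs(exp) ** 2))


# Toy depth-dose profile (arbitrary units), for illustration only.
toy_dose = [1, 1, 1, 1, 1, 1.02, 1.5, 3, 5, 2, 0.2, 0.1, 0.0]
proximal, distal = bragg_region_bounds(toy_dose)

# Slices selecting the three analysis regions of any profile on the same grid.
entrance = slice(0, proximal)
bragg = slice(proximal, distal + 1)
tail = slice(distal + 1, None)
```

With the slices in hand, a per-region score is simply `nmse(sim_yield[bragg], exp_yield[bragg])`, where `sim_yield` and `exp_yield` are hypothetical per-voxel yield arrays on the same depth grid.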
For the NMSE metric, we identify the best-performing model (with the lowest mean NMSE) and consider any other model whose mean NMSE is within two standard deviations of the best-performing version/model as being statistically equal. For a Gaussian random distribution, this would correspond to approximately a 95% confidence interval (although, as can be seen in the box plots of the NMSE results included in the supplementary materials, the NMSE distributions often deviate from the Gaussian model).

The Pearson CCC compares the degree of linear dependence of one profile to another, that is, the degree to which changes in the profiles occur at the same location and in the same direction. Thus, the Pearson CCC quantifies the differences in shape between the simulation-predicted positron-emitting fragment distributions and the experimental measurements, without regard to differences in the magnitude of the profiles. The Pearson CCC is defined as:

$$\mathrm{CCC} = \frac{\sum_{i=1}^{N_{\mathrm{reg}}} \left(S_{\mathrm{norm},i} - \bar{S}_{\mathrm{norm}}\right)\left(E_{\mathrm{norm},i} - \bar{E}_{\mathrm{norm}}\right)}{\sqrt{\left(\sum_{i=1}^{N_{\mathrm{reg}}} \left(S_{\mathrm{norm},i} - \bar{S}_{\mathrm{norm}}\right)^2\right)\left(\sum_{i=1}^{N_{\mathrm{reg}}} \left(E_{\mathrm{norm},i} - \bar{E}_{\mathrm{norm}}\right)^2\right)}} \tag{3}$$

where $S_{\mathrm{norm},i}$ and $E_{\mathrm{norm},i}$ are the normalised simulation and experimental yields in the $i$th voxel of the $N_{\mathrm{reg}}$ voxels in region reg. Normalisation is performed by dividing each $S_i$ and $E_i$ by the maximum value in its respective region. $\bar{S}_{\mathrm{norm}}$ and $\bar{E}_{\mathrm{norm}}$ are the mean values in each region.

When comparing the models, the closer that the CCC between the simulation output and the experimental estimate of positron-emitting fragment distribution is to +1, the more accurate the prediction. A Pearson CCC greater than +0.8 is generally considered to be ‘very strong’ (Swinscow 2021). In this work, we aim to identify the very best version/model combinations; therefore, a Pearson CCC threshold of 0.95 is chosen to identify those combinations which have produced exceptionally good predictions of the shape of the yield profiles. It is important to note that this threshold is quite arbitrary, and the most appropriate threshold depends on the application; readers are referred to the supplementary data for the complete set of results.

For each version of Geant4, phantom, beam type and energy, the NMSE and CCC were calculated for both total annihilation photon yield profiles and also for the profiles of the three main positron-emitting fragment species (10C, 11C and 15O). The calculation was repeated for each of the three regions (entrance, build-up and Bragg peak, and tail). The NMSE and the CCC were then compared across all evaluated Geant4 versions for each region, phantom material and beam type.

A total of 5 energy/ion combinations are evaluated (carbon ions at three energies and oxygen ions at two energies). For oxygen ions, three target materials (gelatin, PMMA and polyethylene) are evaluated for total positron annihilation yield and 11C/10C/15O yield. For carbon ions, the same three target materials are evaluated for total positron annihilation yield and 11C/10C yield, and two for 15O yield (polyethylene is omitted since it is not possible to produce 15O fragments with a 12C ion beam and a PE target which only contains carbon and hydrogen). Thus, a total of 15 cases are evaluated for total positron annihilation, 11C yield and 10C yield, while 12 cases are evaluated for 15O yield.

For range calculations, the difference between the depths at which the positron annihilation yield reached its maximum value in the experiment and simulation was calculated (see (4)). Additionally, the point distal to this maximum at which positron annihilation yield decreases to 50% of the maximum value was also compared between experiment and simulation.
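The Pearson CCC of equation (3) and the two range comparisons described above can be sketched as follows. This is a minimal Python/NumPy illustration rather than the authors' implementation: all function names are hypothetical, profiles are assumed to be 1D arrays on a common depth grid, and depth is taken as voxel index times the 1.5 mm voxel width (any constant offset cancels in the difference of equation (4)).

```python
import numpy as np

VOXEL_MM = 1.5  # voxel width along the beam axis, as stated in section 2.2


def pearson_ccc(sim, exp):
    """Pearson CCC of equation (3), after dividing each profile by its maximum."""
    s = np.asarray(sim, dtype=float)
    e = np.asarray(exp, dtype=float)
    s, e = s / s.max(), e / e.max()
    ds, de = s - s.mean(), e - e.mean()
    return float(np.sum(ds * de) / np.sqrt(np.sum(ds ** 2) * np.sum(de ** 2)))


def peak_depth(profile, voxel_mm=VOXEL_MM):
    """Depth (mm) of the voxel with the maximum annihilation yield."""
    return float(np.argmax(profile) * voxel_mm)


def distal_half_depth(profile, voxel_mm=VOXEL_MM):
    """Depth (mm) of the first voxel distal to the peak below 50% of the peak."""
    p = np.asarray(profile, dtype=float)
    i_peak = int(np.argmax(p))
    below = np.nonzero(p[i_peak:] < 0.5 * p[i_peak])[0]
    if below.size == 0:
        raise ValueError("profile never falls below 50% of the peak")
    return float((i_peak + below[0]) * voxel_mm)


def range_difference(sim, exp, depth_fn):
    """Simulated minus experimental depth, as in equation (4)."""
    return depth_fn(sim) - depth_fn(exp)


# Toy profiles: the simulated profile is the experimental one shifted by one voxel.
exp_profile = [1, 2, 6, 10, 4, 1, 0]
sim_profile = [1, 1, 2, 6, 10, 4, 1]
peak_shift = range_difference(sim_profile, exp_profile, peak_depth)
distal_shift = range_difference(sim_profile, exp_profile, distal_half_depth)
```

Because both depths are quantised to the voxel grid, the differences computed this way are always multiples of the voxel width, and a positive value means the simulated feature lies deeper than the measured one.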
For each version and model, the mean differences between the experimental and simulation-based values, as well as the standard deviations and maximum differences, were calculated across all test cases (ion species, energies and target materials):

$$\delta_{\mathrm{voxel}} = R_{\mathrm{simulation}} - R_{\mathrm{experiment}} \tag{4}$$

where $R_x$ is the range (depth) of the voxel with the maximum value (or, for distal 50% of peak, the first distal voxel to fall below 50% of the maximum value) in either the simulation or experiment.

3. Results and discussion

The number of cases in which each version/model combination performed the best or equal-best in terms of each of the evaluated metrics is counted across all simulations in the entrance, build-up and Bragg peak, and tail regions, and summarised in this section. Detailed results for each experiment are included in the supplementary materials.

3.1. Entrance region

In the entrance region, positron-emitting fragments are created by target fragmentation rather than projectile fragmentation. The projectile ions lose energy via Coulomb interactions, slowing down at an approximately constant rate as they traverse this region, with only gradual changes to projectile/target cross sections. As a result, the positron-emitting fragment distributions are expected to exhibit an approximately flat depthwise profile in this region.

NMSE and Pearson CCC results between simulation and experimental total positron annihilation profiles in the entrance region are summarised in tables 2 and 3, respectively, with corresponding figures shown in supplementary material section 1.

For the entrance region, the BIC model implemented in Geant4 versions 10, 10.1, 10.3 and 10.4 provided the (equal) lowest NMSE of the yields of total positron annihilation in 5 out of 15 cases.
The BIC model in Geant4 10, 10.1 and 10.2 also provided the (equal) lowest NMSE for 11C fragment production (11/15 cases), whereas for 10C the best version/model combination was 10.5/INCL (8/15 cases) and for 15O it was 10.6/BIC (9/12 cases).

Geant4 versions 10.5 to 11 with BIC and 10.3/10.4 with INCL each achieved a Pearson CCC greater than 0.95 (3/15 cases) for total positron yield; QMD did not reach the threshold for any test case in any version of Geant4. Results for individual radionuclides were also mixed, with 10/BIC, 10.1/BIC, 10.4/BIC and 10.5/INCL achieving the threshold in 4/15 cases for 11C; versions 10 to 10.4 with BIC and all versions with QMD reaching the threshold in 2/15 cases for 10C; and all versions with BIC, together with 10/INCL, 10.1/INCL, 10.2/INCL and 10.5/INCL, reaching the threshold in 3/12 cases for 15O.

Table 2. Number of test cases for which each Geant4 version/model combination achieved the lowest or equal-lowest NMSE in the entrance region. Bold text denotes the version/model achieving the highest (or equal-highest) number of best results for each combination of ion/energy/target.

| Version | Total BIC | Total QMD | Total INCL | 11C BIC | 11C QMD | 11C INCL | 10C BIC | 10C QMD | 10C INCL | 15O BIC | 15O QMD | 15O INCL |
|---------|-----------|-----------|------------|---------|---------|----------|---------|---------|----------|---------|---------|----------|
| 10 | 6 | 0 | 0 | 11 | 3 | 2 | 0 | 0 | 2 | 6 | 2 | 0 |
| 10.1 | 6 | 0 | 0 | 11 | 3 | 2 | 0 | 0 | 3 | 6 | 2 | 0 |
| 10.2 | 5 | 0 | 0 | 11 | 3 | 2 | 0 | 0 | 3 | 5 | 2 | 0 |
| 10.3 | 6 | 0 | 0 | 6 | 0 | 0 | 5 | 3 | 6 | 4 | 2 | 0 |
| 10.4 | 6 | 0 | 0 | 1 | 0 | 0 | 2 | 3 | 6 | 9 | 2 | 0 |
| 10.5 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 5 | 8 | 3 | 1 | 0 |
| 10.6 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 5 | 4 | 0 |
| 10.7 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 5 | 2 | 0 |
| 11 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 5 | 2 | 0 |
| 11.1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | 2 | 0 |

Table 3. Number of test cases for which each Geant4 version/model combination achieved a CCC greater than 0.95 in the entrance region. Bold text denotes the version/model achieving the highest number of best results for each combination of ion/energy/target.

| Version | Total BIC | Total QMD | Total INCL | 11C BIC | 11C QMD | 11C INCL | 10C BIC | 10C QMD | 10C INCL | 15O BIC | 15O QMD | 15O INCL |
|---------|-----------|-----------|------------|---------|---------|----------|---------|---------|----------|---------|---------|----------|
| 10 | 2 | 0 | 1 | 4 | 0 | 0 | 2 | 2 | 0 | 3 | 0 | 3 |
| 10.1 | 1 | 0 | 1 | 4 | 0 | 0 | 2 | 2 | 0 | 3 | 0 | 3 |
| 10.2 | 2 | 0 | 1 | 3 | 0 | 0 | 2 | 2 | 0 | 3 | 0 | 3 |
| 10.3 | 2 | 0 | 3 | 3 | 1 | 3 | 2 | 2 | 0 | 3 | 0 | 2 |
| 10.4 | 2 | 0 | 3 | 4 | 1 | 3 | 1 | 2 | 0 | 3 | 0 | 2 |
| 10.5 | 3 | 0 | 1 | 3 | 2 | 4 | 1 | 2 | 0 | 3 | 0 | 3 |
| 10.6 | 3 | 0 | 1 | 2 | 1 | 2 | 1 | 2 | 0 | 3 | 0 | 2 |
| 10.7 | 3 | 0 | 1 | 3 | 1 | 3 | 1 | 2 | 0 | 3 | 0 | 2 |
| 11 | 3 | 0 | 1 | 2 | 1 | 3 | 1 | 2 | 0 | 3 | 0 | 2 |
| 11.1 | 1 | 0 | 1 | 2 | 1 | 2 | 1 | 2 | 0 | 3 | 0 | 2 |

3.2. Build-up and Bragg peak region

In the build-up and Bragg peak region, positron-emitting fragments are produced via a combination of target fragmentation and projectile fragmentation. There is a rapid change in positron-emitting fragment yield with respect to depth, especially since different positron-emitting fragments stop at different distances from their point of production.

Table 4. Number of test cases for which each Geant4 version/model combination achieved the lowest or equal-lowest NMSE in the build-up and Bragg peak region. Bold text denotes the version/model achieving the highest (or equal-highest) number of best results for each combination of ion/energy/target.

| Version | Total BIC | Total QMD | Total INCL | 11C BIC | 11C QMD | 11C INCL | 10C BIC | 10C QMD | 10C INCL | 15O BIC | 15O QMD | 15O INCL |
|---------|-----------|-----------|------------|---------|---------|----------|---------|---------|----------|---------|---------|----------|
| 10 | 4 | 0 | 0 | 11 | 6 | 3 | 0 | 0 | 2 | 3 | 2 | 0 |
| 10.1 | 5 | 0 | 0 | 11 | 6 | 3 | 0 | 0 | 2 | 3 | 2 | 0 |
| 10.2 | 11 | 1 | 0 | 14 | 6 | 3 | 0 | 0 | 1 | 5 | 1 | 0 |
| 10.3 | 1 | 0 | 0 | 5 | 0 | 0 | 4 | 1 | 2 | 3 | 0 | 0 |
| 10.4 | 1 | 0 | 0 | 1 | 0 | 0 | 3 | 2 | 2 | 4 | 0 | 0 |
| 10.5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 4 | 7 | 1 | 2 | 0 |
| 10.6 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 9 | 2 | 1 | 2 |
| 10.7 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 7 | 0 | 0 | 1 |
| 11 | 2 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 9 | 2 | 0 | 2 |
| 11.1 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 9 | 2 | 0 | 2 |

NMSE and Pearson CCC results between simulation and experimental total positron annihilation profiles in the build-up and Bragg peak region are summarised in tables 4 and 5, respectively, with corresponding figures shown in supplementary material section 2.

In the build-up and Bragg peak region, according to the NMSE metric, total positron yield is most accurately predicted by the BIC model in Geant4 version 10.2, being (equal) best in 11/15 cases. This is much higher than the next-best combinations (10.1/BIC with 5/15 cases followed by 10/BIC with 4/15).
Similar results are observed for 11C yield, with 10.2/BIC achieving (equal) best performance in 14/15 cases, and 10/BIC and 10.1/BIC each achieving (equal) best results in 11/15 cases; QMD also performs reasonably well in this case with 10, 10.1 and 10.2 achieving wins in 6/15 cases. For 10C, 10.6/INCL, 11/INCL and 11.1/INCL are the best performers (each winning in 9/15 cases). Finally, for 15O, 10.2/BIC is the best-performing model with 5/12 wins, followed by 10.4/BIC with 4 wins. 6 Phys. Med. Biol. 69 (2024) 125015 A Chacon et al Table 5. Number of test cases for which each Geant4 version/model combination achieved a CCC greater than 0.95 in the build-up and Bragg peak region. Bold text denotes the version/model achieving the highest number of best results for each combination of ion/energy/target. Version Total 11C 10C 15O BIC QMD INCL BIC QMD INCL BIC QMD INCL BIC QMD INCL 10 9 4 3 3 3 2 4 3 6 3 2 3 10.1 9 4 4 3 3 3 4 3 6 3 2 3 10.2 10 8 4 6 6 3 5 4 5 3 3 3 10.3 9 6 6 6 5 3 2 2 2 4 3 3 10.4 6 8 7 6 5 3 1 1 3 4 3 3 10.5 8 8 6 6 5 4 2 2 3 3 3 4 10.6 9 9 6 6 7 6 2 1 4 4 3 4 10.7 6 6 4 3 5 2 2 1 4 3 2 3 11 8 8 6 4 5 4 3 2 4 3 3 3 11.1 6 5 6 2 3 4 3 2 4 3 3 3 Table 6. Differences between the depths of the maximum positron annihilation yield in experimental and simulation results. Each voxel has a width of 1.5 mm; the maximum error is always in multiples of 1.5 mm increments. Version BIC QMD INCL µ (mm) σ (mm) max (mm) µ (mm) σ (mm) max (mm) µ (mm) σ (mm) max (mm) 10 1 1.85 6 −0.20 1.69 3 1.10 3.82 10.50 10.1 1 1.85 6 −0.20 1.69 3 1 3.91 10.50 10.2 0.60 1.37 3 −0.60 1.58 −3 0.60 3.39 9 10.3 1.30 1.78 6 −0.30 1.41 −3 2.30 4.12 9 10.4 1.60 1.55 6 −0.10 1.44 −3 4 4.45 10.50 10.5 0.69 0.99 3 0.60 2.03 6 2.20 4.24 9 10.6 0.60 1.37 3 −0.10 1.55 3 0.10 2.50 7.50 10.7 1.20 1.72 4.50 0.60 1.95 4.50 0.70 3.40 10.50 11 1.10 1.65 4.50 0.60 1.95 4.50 0.60 3.09 9 11.1 0.30 1.62 3 −0.30 1.62 3 0 2.78 7.50 Table 7. 
Differences between the distal depths at which the positron annihilation yield has decreased to 50% of the peak value in experimental and simulation results. Each voxel has a width of 1.5mm; the maximum error is always in multiples of 1.5mm increments. Version BIC QMD INCL µ (mm) σ (mm) max (mm) µ (mm) σ (mm) max (mm) µ (mm) σ (mm) max (mm) 10 0.70 1.49 3 0.30 1.41 3 −0.20 1.49 −3 10.1 0.70 1.49 3 0.30 1.41 3 −0.20 1.49 −3 10.2 0.30 1.01 1.50 0 1.13 1.50 −0.50 1.22 −3 10.3 0.50 1.09 3 0.20 1.11 1.50 −0.20 1.37 −3 10.4 0.60 1.11 3 0.30 1.01 1.50 0.10 1.20 −3 10.5 0.35 0.90 1.50 0.20 1.11 1.50 −0.50 1.22 −3 10.6 0.40 0.89 1.50 0.20 1.11 1.50 −0.40 1.33 −3 10.7 1 1.46 3 0.70 1.59 3 −0.10 1.65 3 11 1 1.46 3 0.70 1.59 3 0 1.60 3 11.1 0 1.60 −3 −0.10 1.44 −3 −0.60 1.68 −3 Using the Pearson CCC metric, the best-performing version/model combinations for overall positron yield are 10.2/BIC (10/15 cases), followed by 10/BIC, 10.1/BIC, 10.3/BIC, 10.6/BIC and 10.6/QMD (9/15 cases). Generally, BIC performed very well, with all Geant4 versions achieving (equal) best performance in at least 6 cases. 11C yield was best predicted by 10.6/QMD (7/15 cases) however many version/model combinations did well here also, with 10.2/BIC, 10.2/QMD, 10.3/BIC, 10.4/BIC, 10.5/BIC, 10.6/BIC and 10.6/INCL all achieving 6/15 wins. 10C yield was best predicted by 10/INCL and 10.1 INCL (6/15 cases), closely followed by 10.2/BIC and 10.2/INCL which won in 5/15 cases. The best-performing version/model combinations for 15O yield were 10.3/BIC, 10.4/BIC, 10.5/INCL, 10.6/BIC and 10.6/INCL with 4/15 wins each, and all other version/model combinations achieving 2 or 3 wins. Table 6 lists difference between the experimental and simulation positron peak, while table 7 lists the difference between the 50% fall off point for the experimental and simulated positron peak. 7 Phys. Med. Biol. 69 (2024) 125015 A Chacon et al Table 8. 
Number of test cases for which each Geant4 version/model combination achieved the lowest or equal-lowest NMSE in the tail region. Bold text denotes the version/model achieving the highest (or equal-highest) number of best results for each combination of ion/energy/target. Version Total 11C 10C 15O BIC QMD INCL BIC QMD INCL BIC QMD INCL BIC QMD INCL 10 3 2 2 5 9 4 2 2 1 4 4 4 10.1 4 2 2 6 9 4 1 1 2 4 4 4 10.2 12 10 2 11 12 4 1 1 3 7 7 4 10.3 1 1 1 1 1 1 2 4 0 3 4 4 10.4 1 1 1 1 1 1 2 3 0 3 4 3 10.5 1 1 1 1 1 1 3 4 2 2 2 2 10.6 1 2 1 1 1 1 5 3 6 4 10 4 10.7 1 1 1 1 1 1 3 3 5 3 4 3 11 1 1 1 1 1 1 4 3 6 4 4 3 11.1 1 1 1 1 1 1 4 3 5 4 4 3 Table 9. Number of test cases for which each Geant4 version/model combination achieved a CCC greater than 0.95 in the tail region. Bold text denotes the version/model achieving the highest number of best results for each combination of ion/energy/target. Version Total 11C 10C 15O BIC QMD INCL BIC QMD INCL BIC QMD INCL BIC QMD INCL 10 12 11 11 10 11 11 4 3 4 8 7 9 10.1 12 11 11 11 11 11 4 4 4 8 7 9 10.2 12 11 11 10 11 11 4 4 4 8 7 7 10.3 11 11 10 11 11 11 2 2 1 8 7 8 10.4 10 10 10 9 11 10 2 2 0 7 7 8 10.5 11 11 11 11 11 11 2 2 3 7 7 8 10.6 11 11 12 11 11 11 2 3 3 7 7 9 10.7 12 11 12 11 11 12 3 3 3 7 7 9 11 12 11 12 11 11 13 3 3 3 8 8 9 11.1 12 11 12 11 11 13 3 3 3 8 8 9 The smallest differences between experimental and simulation-based depth of maximum positron annihilation were obtained with Geant4 10.4/QMD (µ=−0.1mm; max=−3mm) and 10.6/QMD (µ=−0.1mm; max=+3mm). While a smaller mean value was obtained with 11.1/INCL, the maximum value and standard deviation were much larger (+7.5mm and 2.78mm) compared to 10.4/QMD and 10.6/QMD. Differences in the depth of the distal 50%-of-peak point were much smaller; the best estimates were obtained with 10.2/QMD (µ= 0 mm; max=+1.5mm), 11.1/BIC (µ= 0mm; max=−3mm) and 11/INCL (µ= 0mm; max=+3mm). 3.3. 
Tail region

In the tail region, positron-emitting radionuclides are primarily produced through fragmentation of the target material caused by light fragments created upstream from the primary beam. As such, the production of positron-emitting fragments in the tail region is highly dependent on fragmentation and scattering cross sections upstream. Therefore, the positron annihilation yield is not expected to change as rapidly across this region as it does in the build-up and Bragg peak region.

NMSE and Pearson CCC results between simulation and experimental total positron annihilation profiles in the tail region are summarised in tables 8 and 9, respectively, with corresponding figures shown in supplementary material section 3.

Using the NMSE metric, 10.2/BIC was the best-performing version/model combination for overall positron yield (12/15 cases), with 10.2/QMD being the second-best (10/15). Results were similar for 11C yield, with the best version/model combinations being 10.2/QMD (12/15 cases) and 10.2/BIC (11/15). For 10C, the most wins were obtained by 10.6/INCL and 11/INCL (6/15 cases), followed by 10.6/BIC, 10.7/INCL and 11.1/INCL (5/15 cases). Finally, for 15O, the best results were obtained with 10.6/QMD (10/12 cases), followed by 10.2/BIC and 10.2/QMD (7/12 cases).

The Pearson CCC results in the tail region were all very similar across Geant4 versions, with only a few wins separating the best and worst-performing version/model combinations in most instances. All version/model combinations exceeded the threshold of 0.95 for a clear majority of cases for total positron yield as well as 11C and 15O production. For total positron annihilation yield, 10/BIC, 10.1/BIC, 10.2/BIC, 10.6/INCL, 10.7/BIC, 10.7/INCL, 11/BIC, 11/INCL, 11.1/BIC and 11.1/INCL all exceeded the target threshold for 12/15 cases. Even the worst-performing version/model combinations still exceeded the threshold in 10/15 cases.
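As an illustration of how tallies of the kind reported in tables 8 and 9 can be constructed, the following Python sketch counts, for each version/model combination, the test cases in which it achieves the lowest or equal-lowest NMSE and the test cases in which its CCC exceeds 0.95. The function names, the tolerance used to decide ties and all numerical values are hypothetical and chosen only for illustration; they are not taken from the analysis code or the data of this study.

```python
from collections import Counter


def count_wins(nmse_by_case, tol=1e-12):
    """Credit every combination with the lowest or equal-lowest NMSE in each case."""
    wins = Counter()
    for scores in nmse_by_case:
        best = min(scores.values())
        for combo, value in scores.items():
            if value - best <= tol:  # ties within tol all count as wins
                wins[combo] += 1
    return wins


def count_above_threshold(ccc_by_case, threshold=0.95):
    """Count the cases in which each combination has a CCC strictly above threshold."""
    hits = Counter()
    for scores in ccc_by_case:
        for combo, value in scores.items():
            if value > threshold:
                hits[combo] += 1
    return hits


# Made-up scores for three hypothetical test cases (not results from this study).
nmse_by_case = [
    {"10.2/BIC": 0.010, "10.2/QMD": 0.010, "10.6/INCL": 0.030},
    {"10.2/BIC": 0.020, "10.2/QMD": 0.025, "10.6/INCL": 0.040},
    {"10.2/BIC": 0.050, "10.2/QMD": 0.045, "10.6/INCL": 0.015},
]
ccc_by_case = [
    {"10.2/BIC": 0.97, "10.2/QMD": 0.96, "10.6/INCL": 0.90},
    {"10.2/BIC": 0.99, "10.2/QMD": 0.94, "10.6/INCL": 0.96},
    {"10.2/BIC": 0.95, "10.2/QMD": 0.98, "10.6/INCL": 0.97},
]

wins = count_wins(nmse_by_case)
hits = count_above_threshold(ccc_by_case)
print(dict(wins))
print(dict(hits))
```

Because equal-lowest results are credited to every tied combination, the win counts in a column of such a table can sum to more than the number of test cases.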
For 11C yield, 11/INCL and 11.1/INCL reached the threshold in 13/15 cases (with the worst-performing combination scoring 9/15 wins). Fewer wins were seen with 10C; the best results were obtained with 10/BIC, 10/INCL, 10.1/BIC, 10.1/QMD, 10.1/INCL, 10.2/BIC, 10.2/QMD and 10.2/INCL (4/15 cases). Finally, 15O yield was best predicted by 10/INCL, 10.1/INCL, 10.6/INCL, 10.7/INCL, 11/INCL and 11.1/INCL (9/12 cases); again, even the worst-performing version/model combinations exceeded the threshold in 7/12 cases.

3.4. Overall recommendation

The accuracy of Geant4’s hadronic inelastic physics models (BIC, QMD and INCL) in predicting both total positron annihilation yield and individual positron-emitting fragment production is not consistent between different versions of Geant4; furthermore, later releases do not necessarily provide a more accurate prediction of experimental observations than preceding versions. In some cases, NMSE and Pearson CCC yielded conflicting results, because each metric emphasises different features of the profiles: NMSE quantifies the overall average squared difference between the profiles, whereas Pearson CCC quantifies the degree of linear dependence, independent of relative or absolute magnitude.

In the entrance region, BIC was clearly the best-performing model, with the best choice of Geant4 version depending on the particular metric and fragmentation product of interest. NMSE results generally favoured 10–10.4/BIC (especially 10.2/BIC), except for 10C yield, which was better predicted by 10.3+/INCL. Pearson CCC performance did not strongly favour any particular version/model combination, with at most 1/3 of test cases achieving the target CCC threshold of 0.95 for any version/model.

In the build-up and Bragg peak region and tail region, the results are more conclusive.
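A small numerical example illustrates how the two metrics can disagree. The Python sketch below compares a made-up experimental profile against two made-up simulated profiles: one with the correct shape but the wrong magnitude, and one with similar magnitude but a shifted shape. The NMSE normalisation used here (sum of squared differences divided by the sum of squared experimental values) is an assumption for illustration only; the definition given in the methods of this study may differ. The function names are likewise hypothetical.

```python
import math


def nmse(experiment, simulation):
    """Normalised mean squared error (assumed normalisation: sum of squared
    differences divided by the sum of squared experimental values)."""
    num = sum((e - s) ** 2 for e, s in zip(experiment, simulation))
    den = sum(e ** 2 for e in experiment)
    return num / den


def pearson(experiment, simulation):
    """Pearson correlation coefficient between two equally sampled profiles."""
    n = len(experiment)
    mean_e = sum(experiment) / n
    mean_s = sum(simulation) / n
    cov = sum((e - mean_e) * (s - mean_s) for e, s in zip(experiment, simulation))
    var_e = sum((e - mean_e) ** 2 for e in experiment)
    var_s = sum((s - mean_s) ** 2 for s in simulation)
    return cov / math.sqrt(var_e * var_s)


# Made-up depth profiles (arbitrary units); not data from this study.
experiment = [1.0, 2.0, 4.0, 8.0, 3.0, 1.0]
scaled = [2.0 * v for v in experiment]      # same shape, doubled magnitude
shifted = experiment[1:] + experiment[:1]   # similar magnitude, peak moved by one bin

for name, sim in (("scaled", scaled), ("shifted", shifted)):
    print(name, round(nmse(experiment, sim), 3), round(pearson(experiment, sim), 3))
```

Under these assumptions the scaled profile is perfectly correlated with the experiment but has the larger NMSE, while the shifted profile has the smaller NMSE but a correlation well below 0.95, so the two metrics rank the two profiles in opposite order.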
The NMSE metric conclusively shows that version 10.2/BIC is the best choice for total positron yield as well as 11C and 15O yield, while 10.5–11.1/INCL performed the best for 10C. Pearson CCC results are more mixed, but again, 10.2/BIC gives the best results for total positron annihilation yield, with most versions of Geant4 with BIC performing well. 10.6/QMD performed the best for 11C, 10/INCL and 10.1/INCL performed the best for 10C, and there was no clear winner for 15O.

Using the depth-of-maximum-yield metric, the smallest mean differences were obtained with 10.4/QMD and 10.6/QMD. These versions/models also achieved the equal-smallest maximum difference (−3 mm and +3 mm, respectively). Across all versions of Geant4, QMD demonstrated the best overall accuracy (lowest average mean difference in peak depth) and highest precision (lowest average standard deviation). INCL was the worst-performing model across all versions, with much larger maximum differences and a consistent underestimation of the depth of maximum yield across Geant4 versions, with the exception of version 11.1 (which, despite a mean difference of 0, exhibited a large standard deviation and maximum value). Standard deviations obtained using INCL were generally around double those of QMD and BIC. BIC also showed a consistent underestimation of the depth of maximum yield, although the maximum differences were much smaller than for INCL. For context, the difference between the depth of the positron annihilation peak and the Bragg peak with monoenergetic ion beams is of the order of −5.6 ± 0.8 mm for 12C and −6.6 ± 0.8 mm for 16O (Augusto et al 2018, Mohammadi et al 2019, Chacon et al 2020).

Results were generally better for the distal-depth-at-50%-of-peak metric. In this case, 10.2/QMD, 11.1/BIC and 11/INCL all achieved a mean of zero, with 10.2/QMD also having the equal-lowest maximum value of 1.5 mm (a depth difference of one voxel).
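The two depth metrics discussed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: profiles are sampled on a 1.5 mm voxel grid, the distal 50% point is taken as the first voxel at or beyond the peak whose yield is at or below half the peak value (no sub-voxel interpolation, which is consistent with differences being multiples of 1.5 mm), differences are taken as experiment minus simulation (inferred from the discussion, where an overestimated depth corresponds to a negative difference), and σ is computed as a population standard deviation. The function names and profiles are hypothetical and are not taken from this study.

```python
import statistics

VOXEL_MM = 1.5  # voxel width stated in the caption of table 7


def peak_depth(profile, voxel_mm=VOXEL_MM):
    """Depth (mm) of the voxel with the maximum yield."""
    return profile.index(max(profile)) * voxel_mm


def distal_half_depth(profile, voxel_mm=VOXEL_MM):
    """Depth (mm) of the first voxel at or beyond the peak with yield <= 50% of peak."""
    peak = max(profile)
    for i in range(profile.index(peak), len(profile)):
        if profile[i] <= 0.5 * peak:
            return i * voxel_mm
    raise ValueError("profile never falls to 50% of its peak")


def depth_difference(experiment, simulation, metric):
    """Experiment minus simulation (assumed sign convention)."""
    return metric(experiment) - metric(simulation)


def summarise(differences):
    """Mean, population standard deviation and largest-magnitude signed difference."""
    mean = statistics.mean(differences)
    sigma = statistics.pstdev(differences)
    largest = max(differences, key=abs)
    return mean, sigma, largest


# Made-up profiles (arbitrary units); not data from this study.
experiment = [1.0, 2.0, 3.0, 6.0, 10.0, 7.0, 4.0, 2.0, 1.0]
simulation = [1.0, 2.0, 4.0, 9.0, 10.0, 8.0, 6.0, 3.0, 1.0]

peak_diff = depth_difference(experiment, simulation, peak_depth)
distal_diff = depth_difference(experiment, simulation, distal_half_depth)
print(peak_diff, distal_diff)

# Summary over a made-up set of per-case differences (mm).
mean, sigma, largest = summarise([-1.5, 0.0, 1.5, 3.0])
print(mean, round(sigma, 2), largest)
```

In this example the simulated fall-off lies one voxel deeper than the experimental one, giving a negative difference of one voxel width, while the peak positions coincide.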
QMD’s maximal values were slightly smaller overall compared to BIC, and INCL’s were the largest, at ±3 mm for all versions. INCL tended to consistently overestimate the depth of this point, with both mean and maximum differences being negative in most cases. BIC and QMD both tended towards underestimating the 50%-of-peak depth, with the exception of version 11.1 (negative maxima for both, and means of 0 and −0.1 mm, respectively). Standard deviations were quite small for all versions and models (the largest being 1.68 mm, for 11.1/INCL).

Finally, in the tail region, Geant4 10.2 with BIC and QMD again provided the best prediction of total positron and 11C yield in terms of NMSE, while 10.6/INCL performed the best for 10C and 15O. All version/model combinations performed well for total positron annihilation, 11C and 15O yield according to the Pearson CCC metric, while no version/model performed especially well for 10C.

Across all regions, ion species, beam energies, and target materials evaluated, the combination of Geant4 version 10.2 and BIC is best able to reproduce experimental results as evaluated using the NMSE and Pearson CCC metrics, especially in the build-up and Bragg peak region and tail region. Since the build-up and Bragg peak region is where (1) the majority of the dose resulting from carbon or oxygen ion beam irradiation in heavy ion therapy is deposited and (2) the strongest positron annihilation signal is observed, the results in this region are the most relevant to PET image-based QA simulation work. Version 10.2 also provided the best estimate of the distal depth at which positron yield decreased to 50% of peak, although this was obtained with QMD rather than BIC; the most accurate estimate of the depth of the peak itself was also achieved with QMD, but with Geant4 versions 10.4 and 10.6.
As QMD exhibited the best accuracy and precision across Geant4 versions, it is the recommended model if the depth of the yield peak is critical. The BIC model implemented in Geant4 version 10.5 suffered from a run-time stability error which resulted in it being unable to simulate all test scenarios; therefore, we recommend that this version/model combination should be avoided for future studies. In the evaluation of individual positron-emitting fragment yield profiles, predictions of 10C distribution were generally the least accurate in terms of both the NMSE and Pearson CCC. Interestingly, the INCL model often performed the best for prediction of 10C fragment yield, although it rarely performed the best for total positron annihilation and 11C or 15O. Therefore, INCL should be considered for studies focusing on 10C fragmentation, with the caveat that range estimation will be less accurate with this model. Not all models met or exceeded the set threshold of 0.95 for the Pearson CCC metric. This means that in these cases, the shape of the predicted positron distribution differs significantly from the experimental measurements. This is of particular concern if these models are to be used for dose estimation using a deconvolution approach (Hofmann et al 2019a, 2019b) or for the training of machine learning models for feature extraction (Rutherford et al 2022). One may reasonably ask why the performance of the fragmentation models in Geant4 has not continued to steadily improve with each release, and in fact has regressed at times. Positron-emitting isotope production channels represent only a fraction of all possible reaction outcomes, so it may be the case that by improving results for one subset of reaction processes, the positron-emitting nuclide production cross sections became worse. Another possible reason is the implementation of different numbers of de-excitation channels in the Fermi break-up model in different versions of Geant4. 
Unfortunately, to date, no detailed investigation has been conducted into Geant4 to pin down the specific cause, and it is unknown at this stage whether there are other contributing factors as well. In order to monitor more strictly the impact of the evolution of Geant4 on the results of a simulation application of interest, the Geant4 developers are developing an automated benchmarking system for medical applications in Geant4 (the G4-Med project), which should help to document, with higher granularity, the reasons behind different results obtained with different Geant4 releases (Arce et al 2020).

In the next release of Geant4, 11.2, a new QMD model, ‘Light Ion QMD’, will be introduced (see footnote 10) with a specific focus on hadron therapy (Sato et al 2022). In future work, we will be collaborating with the developers of this model to compare its performance against the other models included in Geant4 11.2 with a focus on in vivo PET applications.

Finally, it is worth noting that current evaluations of fragmentation cross sections exhibit uncertainties exceeding 10%, which must be tightened in order to accurately model the production of positron-emitting fragments, particularly in the case of complex fragmentation reactions such as the production of 10C (Bolst et al 2017, Toppi et al 2022). These uncertainties arise especially from the effective cross sections that are double-differential in angle and energy. Since these cross sections provide a strong constraint on nucleus-nucleus reaction models, access to improved experimental measurements of these cross sections is vital to constraining these models and improving their accuracy. This also impacts other Monte Carlo simulation platforms (such as FLUKA, MCNP and PHITS) which also rely on accurate cross section data (although notably PHITS uses a newer version of the QMD model, JQMD2, which tries to correct the model’s main flaw, the drop in effective cross sections at low angles (Ogawa et al 2015)).

4.
Conclusion

In this study, the accuracy with which Geant4 is able to predict the distribution of total positron annihilation yield and the distributions of individual positron-emitting fragmentation products (11C, 10C and 15O) during carbon or oxygen ion therapy was compared to experimental data. Three different hadronic inelastic physics models (BIC, QMD and the Liège Intranuclear Cascade model, INCL) were used with ten different versions of Geant4 (10.0.p04, 10.1.p03, 10.2.p03, 10.3.p03, 10.4.p03, 10.5.p01, 10.6.p03, 10.7.p02, 11.0 and 11.1) in three different homogeneous phantoms. The simulated and experimental data were compared using two different metrics, NMSE and the Pearson CCC. Additionally, the differences between the simulated and experimental depth of maximum positron annihilation yield, as well as the distal point at which positron yield declines to 50% of the peak, were evaluated.

Footnote 10: this model had not been included in Geant4 prior to the submission of this manuscript.

It was found that the accuracy of the hadronic inelastic physics models strongly depends on the version of Geant4 in which they were implemented, and newer versions of Geant4 were not always more accurate at predicting positron-emitting fragmentation compared to older versions. Furthermore, it was found that not all version/model combinations were able to satisfactorily predict the shape of positron annihilation or positron-emitting fragment distributions, even though they could provide a good estimation of the total positron annihilation yield and range.
For future simulation studies of therapeutic irradiation using carbon or oxygen ion beams, it is recommended that Geant4 version 10.2 with the BIC model be used, as it is currently the version/model combination best able to replicate the experimentally-observed total positron yield and the fragmentation product distributions, while the depth of the maximum positron yield and distal 50%-of-peak point were best predicted using the QMD model from Geant4 10.4 and 10.6 (peak) and 10.2 (distal 50%-of-peak).

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary information files).

Acknowledgment

The authors would like to acknowledge the following organisations for providing access to their high-performance computing resources: the Multi-modal Australian Sciences Imaging and Visualisation Environment (MASSIVE) ‘M3’ cluster and Australia’s Nuclear Science and Technology Organisation (ANSTO) ‘Tesla’ cluster. This research has been conducted with the support of the Australian Government Research Training Program Scholarship. The authors acknowledge the scientific and technical assistance of the National Imaging Facility, a National Collaborative Research Infrastructure Strategy (NCRIS) capability at the Australian Nuclear Science and Technology Organisation, ANSTO.

Appendix

Table 10 lists the physics models which were used in the simulations.

Table 10. Hadronic physics processes and models used in all simulations.
Interaction         Energy range      Geant4 model
Radioactive decay   All energies      G4RadioactiveDecayPhysics
Particle decay      All energies      G4Decay
Hadron elastic      0–100 TeV         G4HadronElasticPhysicsHP
Ion inelastic       <100 MeV          Binary Light Ion Cascade
                    100 MeV–10 GeV    BIC or QMD or INCL++
Neutron capture     0–20 MeV          NeutronHPCapture
                    >19.9 MeV         nRadCapture
Neutron inelastic   0–20 MeV          NeutronHPInelastic
                    >19.9 MeV         Binary Cascade
Proton inelastic    990 eV–10 TeV     Binary Cascade

ORCID iDs

Akram Hamato https://orcid.org/0000-0002-3274-4261
Hideaki Tashima https://orcid.org/0000-0003-3155-0083
Eiji Yoshida https://orcid.org/0000-0003-4269-2618
Go Akamatsu https://orcid.org/0000-0001-9686-8901
Daniel R Franklin https://orcid.org/0000-0002-9563-5943
Anatoly Rosenfeld https://orcid.org/0000-0001-5116-6308
Susanna Guatelli https://orcid.org/0000-0002-9289-7956
Mitra Safavi-Naeini https://orcid.org/0000-0002-6975-9563

References

Agostinelli S et al 2003 Geant4 – a simulation toolkit Nucl. Instrum. Methods Phys. Res. A 506 250–303 (https://doi.org/10.1016/s0168-9002(03)01368-8)
Akamatsu G et al 2019 Performance evaluation of a whole-body prototype PET scanner with four-layer DOI detectors Phys. Med. Biol. 64 095014 (https://doi.org/10.1088/1361-6560/ab18b2)
Arce P et al 2020 Report on G4Med, a Geant4 benchmarking system for medical physics applications developed by the Geant4 Medical Simulation Benchmarking Group Med. Phys. 48 19–56 (https://doi.org/10.1002/mp.14226)
Augusto R S, Mohammadi A, Tashima H, Yoshida E, Yamaya T, Ferrari A and Parodi K 2018 Experimental validation of the FLUKA Monte Carlo code for dose and β+-emitter predictions of radioactive ion beams Phys. Med. Biol. 63 215014 (https://doi.org/10.1088/1361-6560/aae431)
Bolst D et al 2017 Validation of Geant4 fragmentation for heavy ion therapy Nucl. Instrum. Methods Phys. Res. A 869 68–75 (https://doi.org/10.1016/j.nima.2017.06.046)
Boudard A, Cugnon J, Leray S and Volant C 2002 Intranuclear cascade model for a comprehensive description of spallation reaction data Phys. Rev. C 66 044615 (https://doi.org/10.1103/physrevc.66.044615)
Chacon A et al 2019 Comparative study of alternative Geant4 hadronic ion inelastic physics models for prediction of positron-emitting radionuclide production in carbon and oxygen ion therapy Phys. Med. Biol. 64 155014 (https://doi.org/10.1088/1361-6560/ab2752)
Chacon A et al 2020 Experimental investigation of the characteristics of radioactive beams for heavy ion therapy Med. Phys. 47 3123–32 (https://doi.org/10.1002/mp.14177)
Durante M, Orecchia R and Loeffler J S 2017 Charged-particle therapy in cancer: clinical uses and future perspectives Nat. Rev. Clin. Oncol. 14 483–95 (https://doi.org/10.1038/nrclinonc.2017.30)
Geant4 Collaboration 2018 Physics reference manual for Geant4 Technical Report CERN
Hofmann T et al 2019a Dose reconstruction from PET images in carbon ion therapy: a deconvolution approach Phys. Med. Biol. 64 025011 (https://doi.org/10.1088/1361-6560/aaf676)
Hofmann T, Fochi A, Parodi K and Pinto M 2019b Prediction of positron emitter distributions for range monitoring in carbon ion therapy: an analytical approach Phys. Med. Biol. 64 105022 (https://doi.org/10.1088/1361-6560/ab17f9)
Mancusi D, Boudard A, Cugnon J, David J-C, Kaitaniemi P and Leray S 2014 Extension of the Liège intranuclear-cascade model to reactions induced by light nuclei Phys. Rev. C 90 054602 (https://doi.org/10.1103/PhysRevC.90.054602)
Mohammadi A, Tashima H, Iwao Y, Takyu S, Akamatsu G, Nishikido F, Yoshida E, Kitagawa A, Parodi K and Yamaya T 2019 Range verification of radioactive ion beams of 11C and 15O using in-beam PET imaging Phys. Med. Biol. 64 145014 (https://doi.org/10.1088/1361-6560/ab25ce)
Ogawa T, Sato T, Hashimoto S, Satoh D, Tsuda S and Niita K 2015 Energy-dependent fragmentation cross sections of relativistic 12C Phys. Rev. C 92 024614 (https://doi.org/10.1103/physrevc.92.024614)
Parodi K and Polf J C 2018 In vivo range verification in particle therapy Med. Phys. 45 e1036–50 (https://doi.org/10.1002/mp.12960)
Rutherford H et al 2020 Dose quantification in carbon ion therapy using in-beam positron emission tomography Phys. Med. Biol. 65 235052 (https://doi.org/10.1088/1361-6560/abaa23)
Rutherford H et al 2022 An inception network for positron emission tomography based dose estimation in carbon ion therapy Phys. Med. Biol. 67 194001 (https://doi.org/10.1088/1361-6560/ac88b2)
Sato Y-h, Sakata D, Bolst D, Simpson E C, Guatelli S and Haga A 2022 Development of a more accurate Geant4 quantum molecular dynamics model for hadron therapy Phys. Med. Biol. 67 225001 (https://doi.org/10.1088/1361-6560/ac9a9a)
Swinscow T 2021 Statistics at Square One (Wiley)
Toppi M et al 2022 Elemental fragmentation cross sections for a 16O beam of 400 MeV/u kinetic energy interacting with a graphite target using the FOOT ΔE-TOF detectors Front. Phys. 10 979229 (https://doi.org/10.3389/fphy.2022.979229)