
Machine learning-based prediction of crack mouth opening displacement in ultra-high-performance concrete - Scientific Reports

The subsequent ranks are held by SF, FL, and FD, all of which have notably reduced scores. Although FD ranked above FA, w/b, and a₀ in importance, it was ultimately discarded in the PSO process. This decision stemmed from its diminishing return in overall predictive strength once FV and FL were already accounted for, a situation likely exacerbated by multicollinearity, which renders overlapping explanatory capabilities less distinct. Variables such as SP, SD, and CA registered near-zero scores and were also omitted from the final model. The wide gap in scores shows that FV dominates the model predictions, while the remaining factors play a supporting role that fine-tunes, rather than drives, the predictive behavior.

Figure 9 presents a focused comparison of the estimated outputs from each algorithm with the independently recorded CMOD measurements, presenting the results on plots annotated with the a20-index. This index measures the percentage of predictions that fall within ±20% of the actual observed values, giving a clear indication of practical accuracy in engineering contexts. Unlike purely statistical measures, the a20-index directly shows whether the model meets the precision levels generally accepted for engineering decision-making. In civil engineering, for instance, predictive models are often evaluated on their ability to stay within certain error tolerances; deviations of 15-25% are typically seen as acceptable for strength predictions and material property estimations. Within the complementary aα family (especially a10 and a30), the a20-index represents a pragmatic middle ground. The a10 is widely deemed excessively severe, punishing deviations that, in the context of concrete performance, are unlikely to jeopardize workability. In contrast, the a30 is often criticized for being overly lenient, letting predictions seem reliable while masking substantial errors. Thus, the a20-index serves as a simple benchmark for assessing whether predictions are not just statistically valid but also practically reliable.
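The aα family is straightforward to compute. Below is a minimal Python sketch; the array names y_true and y_pred are placeholders for illustration, not objects from the paper:

```python
import numpy as np

def a_alpha_index(y_true, y_pred, alpha=0.20):
    """Fraction of predictions whose ratio to the measured value
    lies within [1 - alpha, 1 + alpha]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = y_pred / y_true          # assumes measured CMOD values are nonzero
    return float(np.mean((ratio >= 1.0 - alpha) & (ratio <= 1.0 + alpha)))

# a20 = a_alpha_index(y_test, y_hat)   # e.g. 0.92 means 92% of predictions within ±20%
```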

For this analysis, the data were split prior to any modeling into a training subset containing 80 percent of the records and a held-out testing subset containing the other 20 percent. The partition occurred during the standard holdout phase; the training subset was solely employed for calibrating model parameters, while the testing subset was never exposed to the modeling process until the final evaluation. This careful separation guarantees that the accuracy scores (most notably the a20-index values illustrated in Fig. 9) serve as unbiased indicators of the models' ability to generalize to new cases. By eliminating the risk of information bleed from the training phase into the testing phase, the holdout design delivers a truthful appraisal of how the model is likely to perform on data it has never encountered.
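As a hedged illustration of this protocol, the following sketch performs the 80/20 split with scikit-learn and scores the held-out set with the a_alpha_index helper above; X, y, model, and the random seed are illustrative assumptions, not the paper's exact pipeline:

```python
from sklearn.model_selection import train_test_split

# X holds the mixture/geometry features, y the measured CMOD values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

model.fit(X_train, y_train)                      # calibrate on the 80% split only
y_hat = model.predict(X_test)                    # testing data seen only here
a20 = a_alpha_index(y_test, y_hat, alpha=0.20)   # unbiased generalization check
```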

In Fig. 9, every sub-plot compares the predicted CMOD values against the measured CMOD values from the testing set. In this analysis, all features (FV, SF, FL, FD, FA, w/b, a₀, ST, CA, SP, SD) were considered. DTR returned the lowest performance, registering an a20-index of 0.71; its spread of predictions around the ideal diagonal reveals excessive variance, indicating it largely memorised the training samples rather than captured a generalisable mapping. In contrast, SVR provided a much tighter clustering of points, raising the a20-index to 0.90 and signalling good generalisation across the test set. The best performance came from the NuSVR, which reached an a20-index of 0.92 and showed a noticeably tighter error spread, suggesting an effective bias-variance trade-off achieved by careful tuning of the ν parameter. GPR equalled the SVR with an a20-index of 0.90, presenting predictions with a smooth and well-calibrated envelope. Its probabilistic nature adds interpretive power, though its computational demands may limit scalability to larger datasets.

Among the tree-based ensemble techniques, XGBoost achieved an a20-index of 0.81, while the RFR reached 0.83. Both methods exceeded the performance of the DTR but did not catch up with the leading kernel-based strategies, a difference likely linked to constraints on tree depth and possibly under-optimized hyperparameter settings. GBR exhibited an a20-index of 0.80, a result aligned with XGBoost and suggesting that either the learning rate was kept conservative or the boosting iterations remained shallow. ANN attained an a20-index of 0.89, positioning it just behind the kernel methods, with NuSVR, SVR, and GPR still ahead. The ANN successfully captured nonlinear patterns, though some residual variance persisted, indicating that deeper layers or better-tuned regularization could extract further gains.

TabPFN produced an a20-index of 0.91, ranking just behind NuSVR yet outperforming nearly all other models. This result underlines the effectiveness of automated model selection and representation learning, demonstrating that one can reach high predictive accuracy without the burdens of extensive manual feature engineering.

The final a20-index ranking, arranged from top to bottom, is: NuSVR (0.92), TabPFN (0.91), SVR (0.90), GPR (0.90), ANN (0.89), RFR (0.83), XGBoost (0.81), GBR (0.80), and DTR (0.71). This pattern shows that kernel-based techniques (especially NuSVR, SVR, and GPR) excelled in CMOD prediction, likely because of their capacity to model smooth, nonlinear structure within the data. Although deep learning models remained competitive, their slightly lower performance may relate to dataset size and feature dimensionality. Meanwhile, TabPFN demonstrated strong, automated performance, yet classic tree ensembles consistently trailed the kernel methods.

Further gains are conceivable through a stacking ensemble that synergizes NuSVR, TabPFN, and GPR. If implemented and validated using the original 80/20 holdout strategy, this hybrid model could potentially surpass the 0.92 a20-index limit currently set by NuSVR, yielding even greater predictive power.
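A minimal sketch of such a stacking ensemble is given below, assuming scikit-learn estimators; the hyperparameters are illustrative, and a TabPFN regressor could be added as a third base learner where a scikit-learn-compatible wrapper is available. This is a proposal sketch, not the paper's implemented model:

```python
from sklearn.ensemble import StackingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import RidgeCV
from sklearn.svm import NuSVR

# Base learners mirror the two strongest kernel methods; the meta-learner
# blends their out-of-fold predictions to avoid information leakage.
stack = StackingRegressor(
    estimators=[("nusvr", NuSVR(C=10.0, nu=0.5)),
                ("gpr", GaussianProcessRegressor(normalize_y=True))],
    final_estimator=RidgeCV(),   # simple, well-regularized combiner
    cv=5)                        # out-of-fold stacking predictions

stack.fit(X_train, y_train)      # validate with the same 80/20 holdout
```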

The a20-index analysis next retains the established protocol of allocating 80% of the data for training and 20% for testing, yet adopts a pivotal modification: only the six features deemed most influential (FV, SF, FL, FA, w/b, and a₀) guided both the model training and the evaluation routines. By intentionally reducing dimensionality, we aim to observe how each algorithm responds, given that sensitivity to uninformative or only marginally relevant predictors can vary widely among them. Figure 10 presents the predicted CMOD values plotted against the measured CMOD for all algorithms under these restricted feature conditions, facilitating a direct comparison with the corresponding results illustrated in Fig. 9, which utilized the entire feature set.

The overall impact of reducing the feature set is mixed. The DTR posted a small accuracy dip, with the a20-index sliding from 0.71 in Fig. 9 to 0.68 in Fig. 10. This pattern suggests that some of the dropped variables still contained information that improved the effectiveness of the decision splits. Ensemble tree algorithms recorded diverging trends: the RFR rose slightly from 0.83 to 0.85 (possibly a small improvement from alleviated overfitting), while the GBR fell more sharply from 0.80 to 0.78, pointing to its reliance on a broader feature set.

Kernel-based methods exhibited more differentiated behavior. The NuSVR held its strong performance constant at an a20-index of 0.92, indicating that its kernel transformation still recovered the relevant signal from the compressed feature set. The SVR slipped from 0.90 to 0.87, and the GPR dropped from 0.90 to 0.85, showing these models derived some utility from the discarded dimensions, even if those were not strictly necessary for prediction accuracy. XGBoost shifted only mildly, from 0.81 to 0.80, while the ANN fell from 0.89 to 0.83. The ANN's steeper decline implies its capability to model subtle nonlinearities was more reliant on the expanded feature set offered before. The striking adjustment comes from the TabPFN. Whereas its prior performance in Fig. 9 reached 0.91, it rises to 0.93 in Fig. 10, claiming the top position among all contenders in both configurations. This counterintuitive gain suggests TabPFN capitalized on the dimensionally pruned input; its architecture and in-context learning appear attuned to resist overfitting when the input is reduced to the salient descriptors.

Across the two figures, the models most robust to the contraction are NuSVR and TabPFN, with the latter surpassing all previous scores. This observation implies that, for this particular dataset, streamlining to the dominant predictors can enhance performance for select sophisticated algorithms, while the net effect on others can be trivial. Such dynamics support the prospective merit of combining TabPFN and NuSVR in a layered meta-learning scheme, capitalizing on their respective advantages when feature selection is fine-tuned.

Table 6 provides a comparison of the predictive performance among the algorithms we evaluated. The TabPFN model stood out with the best point estimates, while GPR, NuSVR, and SVR followed closely behind. The bootstrap 95% CIs are quite narrow, which suggests that the estimates are stable. However, the significant overlap between TabPFN and the next-best models indicates that the practical improvement is only modest.

K-fold cross-validation is a standard practice for checking how well a machine learning model will perform on new data. The complete dataset is divided into K roughly equal-sized groups called folds. During each round, one fold is kept back for testing, while the model is trained on the remaining K-1 folds. This process is repeated K times, ensuring that each fold is used as the test set one time. The evaluation metrics from these K training-testing cycles are then averaged, providing a single score that is less influenced by the random quirks of any individual split. This technique effectively demonstrates the model's ability to generalize across the dataset, since every data point is both trained on and tested against the model throughout the K cycles. K-fold cross-validation helps guard against overfitting by presenting the model with several different training subsets. Rather than memorizing the peculiarities of a single split, the model must identify patterns that are consistent across all the folds. Consequently, the averaged performance reflects a sturdier estimate of the model's predictive power than what a single train-test split could offer.

For our study, we chose a fivefold cross-validation scheme (K = 5) to rigorously assess the machine learning models. The full dataset was partitioned into five equal segments; in each fold, one segment served as the validation set while the other four were merged to create the training set.
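A minimal sketch of this fivefold scheme, assuming numpy arrays X and y and a scikit-learn-style estimator model (illustrative names, not the paper's code):

```python
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)    # seed is illustrative
fold_rmse = []
for train_idx, val_idx in kf.split(X):
    model.fit(X[train_idx], y[train_idx])                # train on four segments
    resid = y[val_idx] - model.predict(X[val_idx])       # validate on the fifth
    fold_rmse.append(float(np.sqrt(np.mean(resid ** 2))))

print(np.mean(fold_rmse))                                # averaged fold estimate
```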

Results from evaluating a machine learning model can vary significantly depending on which performance metric is chosen for emphasis. Different metrics reveal distinct dimensions of performance -- some focus on predictive error, others on explained variance, yet others on error robustness. To counter the risk of bias from a single viewpoint, a multi-criteria scoring model is preferable. Here, R, RMSE, and variance accounted for (VAF) were chosen. Each model received a rank for each metric; the one with the best R earned a position of 9 (first among nine competitors), the second-best 8, and so on, down to the lowest, which received 1. Each model thus competed against the others within the same metric framework. No weighting was applied, so the overall rank results from the additive performance of these metric ranks.
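The VAF metric and the rank aggregation can be sketched as follows; this is a hedged illustration in which metrics_by_model is a hypothetical container of per-model scores:

```python
import numpy as np
from scipy.stats import rankdata

def vaf(y_true, y_pred):
    """Variance accounted for, in percent."""
    return (1.0 - np.var(y_true - y_pred) / np.var(y_true)) * 100.0

def total_rank_score(metrics_by_model):
    """metrics_by_model maps model name -> (R, RMSE, VAF).
    With nine models, the best entry per metric earns rank 9, the worst rank 1."""
    names = list(metrics_by_model)
    r    = np.array([metrics_by_model[n][0] for n in names])
    rmse = np.array([metrics_by_model[n][1] for n in names])
    v    = np.array([metrics_by_model[n][2] for n in names])
    # rankdata assigns 1 to the smallest value, so RMSE is negated (lower is better)
    score = rankdata(r) + rankdata(-rmse) + rankdata(v)
    return dict(zip(names, score))
```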

The fivefold cross-validation performance, evaluated across all features, is summarised in Table 7. Table 8 shows the fivefold cross-validation results considering the most influential features. The final columns of these tables show the total rank score, which is the simple sum of the ranks earned on each of the chosen metrics and indicates the overall standing of each model. This straightforward evaluation approach prevents any favoritism toward models by relying on a single performance metric.

By closely examining Tables 7 and 8, we see how switching from all available features in Table 7 to the six most influential features (FV, SF, FL, FA, w/b, and a₀) in Table 8 modifies the performance profile of each algorithm during fivefold cross-validation. Overall, the feature-reduced setup yields significant improvements alongside a few minor losses, varying by model. The cumulative ranking score, calculated from the ordered contributions to R, RMSE, and VAF, provides a straightforward metric for evaluating the overall impact of this feature pruning. A notable gain arises with TabPFN. Its ranking score ascends from 131 to 133, preserving its lead among the models. Beyond this numeric advancement, TabPFN achieves R values in Table 8 that peak at 0.942, eclipsing the former maximum of 0.912 recorded in Table 7, while the RMSE in the top folds consistently sinks below 0.072. These results suggest that TabPFN reaps the rewards of feature selection and then exploits the slimmer input set to boost generalization and stability across the cross-validation folds.

NuSVR stays at the front, with its ranking nudging from 102 to 109. The small slide in average R is offset by strong fold-to-fold steadiness, allowing it to still edge past most rivals. Its performance hints that the kernel-based method remains stable when weaker features drop, as long as the key variables still reveal the problem's main nonlinear patterns. GPR shows a nearly identical story, shifting from 111 to 108 with little real difference. The close score tells us that the GPR can comfortably adjust to the pared-down feature set without noticeable harm to performance, though its absolute accuracy is still a notch behind the top scorers. ANN loses ground, its ranking score falling from 74 to 63. The cut in feature variety seems to hamper the network's grasp of intricate interactions, and the widening R spread across folds indicates it is still a data-hungry architecture that thrives on abundance.

Among the tree-based techniques, RFR climbs from 51 to 66 while GBR slips from 54 to 51, reflecting moderate shifts in R and negligible RMSE changes. The pattern likely stems from decision-tree algorithms leveraging a wider array of predictors, including many weak ones, to refine split rules. In contrast, DTR stays anchored at 15 points in both tables, reiterating its struggle to grasp the problem's inherent complexity regardless of feature abundance. XGBoost's score holds steady at 30, with R and RMSE budging only slightly. This stability hints that the method's boosting process, which continually re-weights poorly predicted instances, offsets the departure of less informative predictors more effectively than the averaging strategy inherent in RFR.

Shifting to the overall order of models, three standout items from the feature selection exercise are:

This side-by-side evaluation clearly indicates that feature selection constitutes a powerful catalyst for improved accuracy in specific advanced architectures, particularly in TabPFN, while exerting little influence or even slight detriments in alternative models. These findings lend considerable weight to the strategy of integrating TabPFN with NuSVR and, potentially, GPR within a meta-stacking arrangement, harnessing the diverse robustness of each method and their aptitude for extracting value from a finely tuned subset of predictor variables.

The comparative analysis illustrated in Fig. 11 quantifies and characterizes how feature selection reshapes the predictive accuracy of several machine learning models over five repeated cross-validation partitions. Each algorithm exhibited a unique sensitivity to the reduction of input dimensions, underscoring the co-evolution of model architecture and the structure of the dataset.

Among the tested methods, the TabPFN architecture attained the strongest validation accuracy before any features were pruned. Once feature selection was performed, it manifested uniform but slight enhancements across R, RMSE, and VAF. Importantly, these gains were reproducible across all folds, suggesting that even transformer models with extensive representational power profit from excising irrelevant and weak predictors. The principal mechanism behind the increased robustness appears to be the removal of collinear features, which alleviates redundancy and streamlines the learning of the mapping from a compact feature space to the target outcome.

After the feature selection step, the ANN model exhibited inconsistent results, most notably a decline in most cross-validation folds. The mean R dropped (e.g., Fold 1: 0.832 → 0.723), RMSE rose (Fold 1: 0.099 → 0.128), and VAF lowered (Fold 1: 0.856 → 0.733). These shifts imply that beneficial variables for capturing the network's nonlinear interactions were eliminated, weakening the model's capacity to resolve intricate dependencies. The outcome suggests that the ANN, with its flexible architecture, thrives on a more extensive feature collection to deliver the best generalization on the given dataset.

In contrast, the tree-based models, namely GBR and XGBoost, yielded only modest and variable shifts following the feature selection. The XGBoost implementation registered a tiny R bump in some folds (e.g., Fold 1: 0.690 → 0.705) while dropping slightly in others. GBR, however, typically recorded a lower R (Fold 3: 0.807 → 0.745) along with a higher RMSE, confirming that the excluded features offered only marginal enhancements in predictive power. RFR exhibited virtually constant performance across all measures, underscoring its resilience to redundant features due to the combined effects of bagging and the deliberate randomness in feature selection.

Kernel methods (GPR, SVR, and NuSVR) produced variable outcomes. GPR, for instance, recorded an across-the-board decline in R following variable pruning (Fold 1 0.884 falling to 0.821), accompanied by a heightened RMSE. This suggests that the pruned feature set limited GPR's capacity to fit the underlying function. SVR and NuSVR mirrored this pattern (Fold 1 R 0.871 dropping to 0.822), with RMSE worsening throughout every split. These results contradict the assumption that kernel methods profit from lowered dimensionality, implying that the eliminated features carried meaningful information rather than mere noise.

DTR delivered uniformly diminishing R (Fold 3 0.656 to 0.581) alongside stagnant RMSE, highlighting its pronounced vulnerability to reduced cue availability. Accuracy metrics confirmed it as the weakest performer, reaffirming the limitations of single-tree strategies in regression problems marked by complexity and noise.

The standout finding is that TabPFN preserved top R, minimal RMSE, and peak VAF after feature selection, improving in all folds (Fold 1 R increased from 0.890 to 0.918, while RMSE fell from 0.088 to 0.072). This consistency and modest performance gain in the face of feature compression signal TabPFN's architecture as adept at leveraging compact, high-quality inputs.

The results show that rather than universally enhancing model precision, feature selection led to small declines in nearly all models aside from TabPFN, which either preserved or boosted accuracy, and XGBoost, which gained minor improvements in a few folds. The persistent predictive power of the trimmed features implies that feature selection ought to be conducted judiciously and always validated against the model in use, lest valuable data be discarded.

Rank-based nonparametric tests are recommended instead of multiple pairwise t-tests when comparing several algorithms over a single set of problems or cross-validation folds. This controls the inflation of the Type I error probability and does not rely on assumptions of normality or equal variance, assumptions frequently violated in algorithm benchmarking. The Friedman test is the standard omnibus test for this situation, testing the null hypothesis that all algorithms share the same performance distribution against the alternative that at least one differs significantly. Once the Friedman test yields a significant result, post-hoc pairwise comparisons are made to ascertain which models differ. The Nemenyi test, and derivatives such as the Bonferroni-Dunn method, are most frequently used at this stage. A critical difference (CD) diagram summarizes these comparisons by plotting average ranks for the algorithms and connecting models that are not significantly different from one another.
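A hedged sketch of this procedure with SciPy is given below; errors is a hypothetical (folds × models) array of RMSE values, and the Nemenyi critical value is derived from the studentized range distribution:

```python
import numpy as np
from scipy.stats import friedmanchisquare, studentized_range

# errors: shape (n_folds, n_models), RMSE of each model on each fold
stat, p = friedmanchisquare(*errors.T)            # omnibus test over the models

n_folds, k = errors.shape
ranks = np.argsort(np.argsort(errors, axis=1), axis=1) + 1   # rank 1 = lowest RMSE
avg_ranks = ranks.mean(axis=0)

q_alpha = studentized_range.ppf(0.95, k, 1e6) / np.sqrt(2)   # Nemenyi quantile
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_folds))        # critical difference
# pairs whose average-rank gap is below cd are statistically indistinguishable
```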

In this study, we applied Friedman tests to the RMSE values across the five folds and obtained a significant result (p = 2.9 × 10⁻), signifying significant differences among the nine methods. Post-hoc Nemenyi testing was then carried out, with the results summarized in the CD diagram in Fig. 12. The computed CD value was 5.37, meaning that models whose average rank differences fall below this threshold cannot be considered significantly different at α = 0.05. TabPFN, with the lowest average rank, is identified as the best performer in Fig. 12. However, there is no statistically significant difference between TabPFN and NuSVR or GPR, as all three form one statistical group. Models such as DTR, XGBoost, and GBR had substantially lower rankings, with average rank differences exceeding the CD. These results show that although TabPFN leads in average performance, several high-grade models, especially NuSVR and GPR, produce statistically comparable results.

To assess the statistical significance of performance differences between models, the ranking scores were complemented with hypothesis testing. Model errors (RMSE values across cross-validation folds) were compared pairwise using both paired t-tests and Wilcoxon signed-rank tests, depending on distributional assumptions. Figure 13 shows the pairwise statistical significance of the differences in model performance based on RMSE distributions. TabPFN demonstrated statistically significant improvements over all other models (p < 0.05), highlighting its strength as the top-performing method. On the other hand, the differences among ANN, SVR, NuSVR, GBR, XGBoost, GPR, and DTR were mostly not statistically significant (p > 0.05 in most pairwise comparisons), suggesting that their predictive accuracy overlaps considerably. These findings imply that while TabPFN stands out as the clear winner, the rankings of the other models should be interpreted with caution.
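These pairwise tests take only a few lines in SciPy; the sketch below assumes a hypothetical dictionary rmse_by_model of per-fold RMSE arrays evaluated on the same folds:

```python
from itertools import combinations
from scipy.stats import ttest_rel, wilcoxon

for a, b in combinations(rmse_by_model, 2):
    t_p = ttest_rel(rmse_by_model[a], rmse_by_model[b]).pvalue
    w_p = wilcoxon(rmse_by_model[a], rmse_by_model[b]).pvalue
    # note: with only five folds the exact Wilcoxon p-value is coarse-grained
    print(f"{a} vs {b}: paired t p={t_p:.3f}, Wilcoxon p={w_p:.3f}")
```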

As shown in Fig. 5, a Pearson correlation of 0.85 between FV and CMOD indicates a strong, broadly linear positive relationship. In the context of fracture testing to EN 14651 or ASTM C1609, this can be interpreted to mean that specimens with higher FV tend to exhibit larger CMOD values during the post-cracking load phase. Statistically, this marks FV as one of the most important factors in post-cracking deformation behavior. From a mechanical viewpoint, the fiber-bridging effect explains this relationship. After the matrix has cracked, fibers within an FRC begin to apply bridging tensile forces. An increase in FV improves crack bridging because more fibers straddle the crack surfaces. This results in better post-crack load transfer, the mechanism behind increased crack-bridging effectiveness, which improves the material's ability to carry load and permits the crack to open further before the load falls to zero, increasing CMOD. This mirrors pull-out mechanics, where post-crack energy absorption and residual stress improve with increased FV.

Concrete without fibers will still crack, but it fractures suddenly, producing small CMOD values. With higher fiber content, crack localization is delayed, and fibers enable the cracks to open wider through pull-out. Thus, the FV-CMOD correlation is not a statistical coincidence. In summary, CMOD and FV are strongly and positively linked. This link, rooted in the mechanics of FRC and fracture theory, was corroborated in this study through experimental research.

To thoroughly quantify how the newly developed machine learning models react to systematic variations in FV, we designed a dedicated experimental campaign involving 16 distinct UHPC mixtures, summarized in Table 9. Within the campaign, every mixture component was kept consistent apart from FV, allowing the isolated impact of this parameter on CMOD to be examined cleanly. The FV value was stepped from 0 to 3% in increments of 0.2%, a span and resolution that replicates the conditions imposed during the model-training phase. To maintain a uniform w/b throughout, the water and binder contents were both adjusted upwards by 0.1% for every 0.2% rise in FV, thereby keeping the ratio fixed.

On the computational side, we focused on the six features that earlier analyses had confirmed as the main drivers: FV, SF, FL, FA, w/b, and a₀. By limiting the input to this minimized yet fully representative set, we fed each of the pre-trained models the variations of FV obtained in the experiments. This approach allowed us to juxtapose model predictions directly with the experimentally acquired CMOD results across the precisely controlled FV gradient.
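A hedged sketch of this sweep is shown below; the fixed base values are placeholders within plausible ranges, not the actual Table 9 proportions, and model stands for any of the pre-trained regressors:

```python
import numpy as np
import pandas as pd

# FV stepped 0 -> 3% in 0.2% increments, mirroring the Table 9 campaign
fv_grid = np.arange(0.0, 3.0 + 1e-9, 0.2)

base = {"SF": 0.5, "FL": 1.0, "FA": 0.7, "w/b": 0.20, "a0": 0.9}  # placeholders
X_sweep = pd.DataFrame([{**base, "FV": fv} for fv in fv_grid])

cmod_hat = model.predict(X_sweep[["FV", "SF", "FL", "FA", "w/b", "a0"]])
# cmod_hat can now be plotted against the measured CMOD-FV curve (cf. Fig. 14)
```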

Fig. 14 enables a straightforward juxtaposition of empirical CMOD data against the output of each machine learning model for the complete FV variation. The TabPFN configuration tracks the experimental curves without discernible divergence, maintaining a nearly indistinguishable slope and intercept for the entire FV span. The ANN counterpart displays similarly tight fits -- though it tends to slightly underestimate CMOD beyond FV ~ 2.4%, hinting at a subtle undercapture of fiber-bridging influences at the higher dosage tail.

The SVR, NuSVR, and GPR variants yield nearly identical linear regressions, attaining R values of 0.99 or above. Nevertheless, they consistently lie below the experimental data for all FV points. The identical vertical shifts across the domain signal a structural phenomenon (possibly the result of kernel-selection penalization or insufficient representation of the extreme-FV samples during training) rather than merely random scatter. Ensemble frameworks (GBR, RFR, and XGBoost) typically track the experimental trend with certain localized departures. GBR provides solid accuracy and limited fluctuation, while RFR shows a gentle saturation beyond FV ≈ 2.0%, resulting in underestimation at the upper tail. XGBoost, despite accurately reflecting the global trajectory, presents more visible oscillations in the mid-range (FV = 0.8%-2.2%), suggesting a heightened sensitivity to the local variance present in the training set. The DTR model registers sharp local oscillations that contrast with the expected experimental smoothness, especially within the low to mid FV range. This behavior aligns with the known overfitting risk in single-tree formulations, which lack the level of smoothing and generalization that ensemble and kernel methods achieve.

The majority of models underreport CMOD at FV values exceeding 2.4%. This recurring underprediction emphasizes the struggle encountered when algorithms seek to extrapolate the nonlinear effects of fiber-bridging at elevated reinforcement levels, where mechanisms hindering crack propagation increasingly dominate.

The thoughtful design outlined in Table 9, along with meticulous feature selection, established a robust and unbiased foundation for exploring how FV affects CMOD predictions within the model framework. Of all methods tested, TabPFN and ANN delivered the clearest reproduction of the experimental CMOD-FV curve, showing very small bias and consistent performance. Kernel methods and ensemble strategies adequately followed the broad trajectory but revealed persistent offsets or concentrated errors, and the DTR proved markedly volatile. These observations highlight the critical role of sophisticated architectures (especially probabilistic foundation networks and deep neural networks) in accurately tracing CMOD changes as FV varies in ultrahigh-performance concrete.

Complex machine learning models need interpretability, especially in engineering applications. SHAP offers a theoretically solid framework that's transparent and rigorous. SHAP builds on Shapley values from cooperative game theory, treating each feature as a "player" in a predictive "game" where the prediction becomes the "payout." Each feature's contribution gets quantified as its average marginal effect on model output across all possible feature coalitions. This approach guarantees two important properties: local accuracy (SHAP values for any instance sum to the model prediction) and consistency (if a model changes so a feature's marginal contribution increases, its SHAP value can't decrease).
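Written out, the average marginal contribution described above is the classical Shapley value; with F the feature set and v(S) the model's expected output given coalition S, local accuracy then decomposes each prediction exactly:

```latex
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}}
\frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}
\Bigl[\, v\bigl(S \cup \{i\}\bigr) - v(S) \,\Bigr],
\qquad
f(x) \;=\; \phi_0 \;+\; \sum_{i=1}^{|F|} \phi_i .
```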

Recent civil engineering studies show SHAP works well for understanding input variables in predicting mechanical and durability properties of concretes. Examples include electrical resistivity of fiber-reinforced coral aggregate concrete and compressive strength of coral aggregate concrete. These studies demonstrate that SHAP can connect data-driven predictions with physical understanding, making it well-suited for cementitious composite applications.

Here, SHAP was applied to the developed machine learning models to quantify each input parameter's contribution to predicted CMOD. Unlike conventional feature importance measures, SHAP breaks down predictions into exact feature contributions at both global (overall trends) and local (individual specimens) levels. This provides trustworthy interpretation, identifies the main drivers of fracture behavior, and reveals how their relationships influence predictions across different specimens. SHAP connects data-driven forecasts to mechanistic fracture processes, offering physically meaningful model validation.

Because the TabPFN model consistently delivered higher predictive accuracy and better generalization than other machine learning methods, the SHAP analysis was restricted to that architecture for a thorough examination of the reasoning behind its predictions.
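A model-agnostic sketch of this analysis with the shap library is shown below; the variable names are placeholders, and the explainer choice is an assumption (TabPFN is not a tree model, so a permutation-style explainer over the predict function is a natural fit):

```python
import shap

# Explain the fitted regressor through its predict function; a small
# background sample keeps the permutation-based attribution tractable.
background = shap.utils.sample(X_train, 100)
explainer = shap.Explainer(model.predict, background)
sv = explainer(X_test)

shap.plots.force(sv[0])     # local force plot for one specimen (cf. Fig. 15)
shap.plots.bar(sv)          # global mean |SHAP| ranking (cf. Fig. 16)
shap.plots.heatmap(sv)      # instance-level contributions (cf. Fig. 17)
shap.plots.beeswarm(sv)     # summary plot of sign and spread (cf. Fig. 18)
```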

Figure 15 shows SHAP force plots for three selected samples (A, B, and C), detailing how six important features (FA, SF, FV, FL, a₀, and w/b) each affect the forecasted CMOD values. In the plots, red bars represent features that raise the prediction above the baseline, while blue bars represent features that lower it. The length of a bar shows the size of the effect, indicating how strongly that feature influences the model for the particular sample.

Sample A has a predicted CMOD of 0.72 mm. The strongest influences come from SF at 0.983% and FL at 1.0 mm. SF decreases the CMOD, reflecting its success in limiting post-crack deformation and thus its contribution to stronger matrix confinement. At the same time, FL and FV raise the prediction: the full FL of 1 mm aids in bridging the crack, and the FV of 0.535% adds to that effect. FA and a₀ also slightly increase CMOD. The final CMOD prediction of roughly 0.7 mm shows how fiber arrangement and specimen geometry work together.

In Sample B, the predicted CMOD is 0.60 mm. Here, FL (0.5 mm) and a₀ (0.879 mm) rank as the crucial parameters, while FV and FA exert moderate positive influences. SF consistently shows a negative role. The relatively short FL = 0.5 mm limits the bridging action, and thus lowers the CMOD, whereas the larger a₀ pushes the predicted displacement upward. A modest SF (0.304%) weakens confinement, which only partly offsets the benefits of the other features. This example shows how the interplay of geometric and fiber characteristics shapes the post-cracking behaviour.

For Sample C, the predicted CMOD is 0.58 mm. FV (0.58%) and FA (0.720%) make the most substantial positive contributions, while SF (0.411%) actively detracts. FL and a₀ still help, but their effects come in at smaller magnitudes. The considerable FV boosts load transfer across the crack, accounting for its leading positive role. A moderate FA further encourages CMOD growth, and SF, while stiffening the matrix, keeps displacement in check. This scenario underscores that FV is vital for dictating post-cracking deformation, especially when FL is at its upper limit (1 mm).

Across all three samples, the FV and FL of the fibers consistently emerge as strong positive drivers of crack-bridging performance, backing earlier lab results that underscored their decisive roles. In contrast, SF tends to exert a negative pull, hinting that too much spacing or too little fiber interaction can hinder the capacity to resist further crack widening. Examination of the SHAP force visualizations uncovers fine, sample-specific dependencies, indicating that the relevance of each feature can shift even for the same model, dictated by the input feature values. Crucially, the TabPFN architecture captures these intertwined, nonlinear relationships: the varying roles of FV, FL, a₀, and SF across the three samples track closely with the core tenets of fracture mechanics.

Taken together, the local SHAP diagnostics yield a mechanistic lens on how each variable steers the CMOD for UHPC. The fiber indices (especially FV and FL) emerge as the primary drivers of crack-bridging capacity, while aggregate gradation and spacing terms adjust the intensity of that response. Beyond reinforcing the TabPFN model's explanatory power, these interpretations provide a systematic basis for refining fiber-reinforced concrete designs, leveraging transparent, evidence-based guidance grounded in the experimental data.

Expanding on the localized SHAP force plots, the global sensitivity ranking presented in Fig. 16 consolidates the influence of the six input variables onto a single axis. The analysis shows that FV significantly influences the model output, with a mean SHAP value of about 0.17, considerably higher than the impact of any other feature. FL takes the second position and yields a comparatively lower, yet still significant, impact. The contributions of SF and w/b are of moderate magnitude, effectively coupling mechanical and microstructural responses. Minimal yet positive SHAP values are attributed to FA and a₀, verifying their marginal effect on the computed CMOD within the TabPFN architecture. The obtained ordering aligns precisely with fracture mechanics doctrine, whereby the spatial density of the fibers controls the area available for bridging, the length influences the kinetics of pull-out processes, and spacing together with the w/b ratio mediate the confinement of the cement matrix and, consequently, its toughness.

The SHAP heatmap in Fig. 17 deepens our understanding of feature contributions at the instance level. The FV feature displays a clear monotonic behavior: when FV is low (blue), CMOD predictions drop, whereas at higher FV values (red), CMOD predictions rise sharply. This observation is consistent with tests showing that increased FV allows cracks to remain open longer without immediate coalescence, thereby permitting greater openings to develop. The FL feature, conversely, presents alternating zones of red and blue, suggesting effects that vary with the case and confirming the well-documented balance between fiber bridging and pull-out resistance provided by FL. Variables SF, w/b, and FA exert moderate influences, with patterns scattered across instances, indicating their conditional relevance that depends on overall mixture design. Finally, a₀ contributes little overall, though isolated cases reveal localized impacts on CMOD.

The joint evaluation of Figs. 16 and 17 indicates that FV and FL dominate CMOD predictions in fiber-reinforced UHPC, whereas SF and w/b remain subordinate yet structurally relevant ancillary variables. The TabPFN architecture accurately encodes these mechanistic links and, crucially, resolves the subtle interactions among the ensemble of mixture parameters. The consistent trends seen at both the population level and individual instances back up the model's statistical reliability, laying the groundwork for data-driven optimization of FR-UHPC.

Figure 18 presents a global SHAP summary plot in which each dot represents an individual prediction. The horizontal position stands for SHAP value (both size and sign of influence on CMOD), and color conveys actual feature value (blue for low and red for high). The layout thus permits a simultaneous view of feature significance, influence direction, and variation inside the same feature.

As before, FV remains the leading source of influence. High FV instances (red dots) congregate on the positive x-axis, indicating a substantial positive effect on CMOD predictions. Conversely, low FV cases (blue dots) cluster on the negative x-axis, suppressing CMOD outputs. The consistent separation between red and blue exemplifies the expected mechanism: increased FV improves crack bridging, leading to larger crack openings and consequently elevating CMOD.

FL exhibits a contrasting distribution. Both high and low FL values cluster close to zero, occasionally displaying slight negative SHAP values, which signals that length effect on CMOD varies with context rather than follows a strict trend. This variability can be interpreted in terms of the dual role that longer fibers can play: while their bridging capacity tends to reduce crack propagation, a weak matrix bond may provoke pull-out, thus countering the potential benefit and yielding smaller or even negative contributions in some cases.

SF shows a scattered distribution centered near zero: low values favor small negative effects, while higher (red) values, though sparse, incline toward small positive effects. This indicates that tighter spacing curtails crack advance by enhancing bridging, whereas expanded distances diminish reinforcement action and elevate CMOD values.

The w/b parameter is distributed more or less evenly about zero, though the excess water side (red) shifts the prediction lower. This behavior matches the mechanistic insight that surplus water dilutes the binder matrix, weakens interfacial attachment at the fiber-matrix junction, and subsequently allows wider crack openings.

FA shows a comparatively modest spread around the zero mean, and samples with both small and large aspect ratios yield only minor shifts away from zero. Hence, FA is comparatively subordinate relative to FV and FL, but the spread indicates that pull-out efficiency can still influence behavior under targeted design regimes.

Lastly, a₀ induces a small but uniformly negative contribution at its higher end (red): deeper notches reduce CMOD predictions, probably by intensifying stress concentration at the crack tip and lessening reinforcement effects along the bridging segment.

In Fig. 19 the SHAP dependency and interaction plots produced by the TabPFN model elucidate the role of FV and its coupling with other key variables (SF, FA, FL, a₀, and w/b) in the modelled CMOD outcome. Each subplot plots the SHAP value of FV against the FV magnitude, while a second feature introduces a graded interaction, represented by the background hue.

Fig. 19(a) plots FV's SHAP value against the measured FV magnitude, with color serving as a marker for SF. The scatter reveals a near-uniform linear ascent, indicating that increments in FV consistently drive the SHAP value higher; the robust rising slope conclusively affirms that FV exerts a predominant beneficial adjustment to CMOD. Superimposed color indicates that larger SF values compress the slope, thereby suggesting that more closely spaced or more abundant fibers temper the magnitude of the core FV effect without negating its direction.

Fig. 19(b) retains the FV axis and now overlays FA as the conditional feature. The same upward trend line is accompanied by a color gradient that deepens as FA increases, reinforcing the FV dividend. The color shift indicates that higher aggregate fractions complement fiber content, magnifying the incremental gain through a combined influence that ultimately appears to fortify fracture energy dissipation and post-failure dilation.

In Fig. 19(c) we compare FV to FL. Once again the relationship is linear; SHAP values rise steadily with increasing FV. The accompanying color scale reveals that longer fibers, corresponding to higher FL, intensify the beneficial response to FV. This observation agrees with fracture mechanics where longer fibers provide a more effective bridging mechanism, increasing the degree of crack deflection and absorbing more dissipated energy. The combined effect of increasing FV and FL illustrates a synergistic enhancement of toughness in the post-cracking stage.

Fig. 19(d) correlates FV with a₀. The linearity is retained, and FV still exerts the principal positive influence. The color gradient, however, identifies larger SHAP values as a₀ increases, signifying that the beneficial effect of fiber addition is accentuated in specimens already pre-cracked. This reinforced response arises because the same fiber distribution now encounters a wider opening, and fibers thus play a decisive role in restraining further extension of the crack and inhibiting unstable crack growth.

The analysis in Fig. 19(e) corroborates the role of FV when examined alongside the w/b. Although the overall trend remains broadly linear, the observed curvature indicates a distinct nonlinear coupling. Low-FV contexts exhibit negative or nearly neutral SHAP values, shifting sharply to a dominant positive influence at higher FV. Furthermore, the color gradient establishes that elevated w/b ratios amplify the effect of FV, clarifying that in less compact, more permeable matrices, supplementary fiber is requisite for efficient micro-crack control.

Collectively, the SHAP dependency and interaction illustrations consistently confirm FV as the foremost positive driver of CMOD across all tested couplings. Nevertheless, the strength of that advantage is systematically modulated by additional constituents (SF, FA, FL, a₀, and w/b) whose effects, though secondary, remain significant. The integrated findings validate the primacy of FV and FL in generating crack-bridging force, while the aggregate and matrix terms (FA, w/b), pre-existing micro-geometry (a₀), and SF govern overall reinforcement performance. This thorough interpretability examination supplies compelling evidence that the TabPFN framework not only assimilates linear relationships but also accurately captures the nonlinear and interaction rules prescribed by fracture mechanics, thereby furnishing a physically consistent and data-validated reference for fine-tuning fiber-reinforced UHPC formulations.

The SHAP analysis not only sheds light on how model predictions work but also pinpoints the most crucial parameters for managing cracks in UHPC. To make these findings more useful, we translated the key features identified by SHAP into practical advice for mix design, which you can find summarized in Table 10. The analysis shows that fiber-related factors are the main drivers for controlling CMOD. It turns out that increasing the FV has the most significant impact, enhancing crack bridging and toughness after cracking. However, using too much fiber can make the mix less workable and increase the chances of fiber clumping, which is why incorporating admixtures like superplasticizers and adding fibers in stages is essential for practical use.

Additionally, FL and aspect ratio play a vital role, especially when it comes to bridging larger cracks. While longer fibers boost toughness, they can also make the mix harder to work with, suggesting that a combination of short and long fibers in hybrid systems might strike a better balance. The SHAP results also emphasize the significance of the bond properties between the fibers and the matrix. A strong bond improves load transfer, but if the bond is too strong, it can lead to brittle pull-out or fiber breakage. Therefore, selecting the right surface treatments and coatings is crucial to encourage stable, energy-absorbing pull-out behavior.

Beyond the fibers, matrix properties such as the w/b ratio, fines content, and the addition of SF also play a role in cracking, mainly through shrinkage and stress development. Lowering the w/b ratio and optimizing fines can help reduce shrinkage, but these changes need to be paired with suitable admixtures to keep the mix workable. Likewise, proper curing and early-age practices are vital: extended moist curing or the use of curing membranes can significantly cut down on shrinkage-induced cracking, which in turn boosts the effectiveness of the fibers.

These findings show that while SHAP effectively points out the most significant predictors from a statistical standpoint, their true engineering value comes from helping navigate practical trade-offs. Table 10 provides a clear summary of these implications, connecting each key parameter to its mechanistic impact, possible limitations, and suggested engineering actions.

Table 11 presents the average absolute SHAP values along with their standard deviations for all samples, offering a numerical complement to the visual SHAP plots. The findings clearly indicate that the FV is the most significant predictor of CMOD, boasting a mean SHAP value of 0.142 ± 0.088, which is notably higher than any other feature. This prominence underscores the crucial role that fiber dosage plays in influencing crack bridging and post-cracking behavior in UHPC. Following FV, the next key contributors are FL at 0.036 ± 0.033 and SF at 0.030 ± 0.022, both of which impact the refinement of microstructure and the interactions between the matrix and fibers. Other features like CA, w/b, FD, and FA seem to have a secondary effect, each showing mean SHAP values in the range of 0.019 to 0.026. On the lower end of the influence spectrum, we find predictors such as ST, SP, and SD, all of which contribute SHAP values below 0.01. While these factors do help fine-tune the model's output, their impact pales in comparison to the dominant influence of FV.

In summary, the quantitative SHAP analysis reveals a clear ranking of feature importance that aligns well with the principles of fracture mechanics: fiber dosage stands out as the primary factor, followed by fiber geometry and binder composition, with other mix parameters having a minimal effect. This hierarchy remains consistent across different folds and is in agreement with previous experimental results.

Uncertainty quantification plays a pivotal role in enhancing both the reliability and the practical deployment of machine learning approaches in engineering and materials science. Conventional models produce single-valued responses, yet they often omit any indication of the associated confidence or dispersion. In critical fields like fracture mechanics and structural longevity, relying solely on these deterministic figures can expose designers to undue risk or unrecognized failure pathways. Uncertainty quantification remedies this by introducing statistically valid confidence bounds around the outputs, thereby upgrading predictions from mere accuracy to actionable trustworthiness. By systematically quantifying the uncertainty in predictions, machine learning frameworks gain a level of interpretability that is indispensable in risk-governed environments, ensuring that probabilistic safety margins and variations in material behavior are appropriately integrated.

Figure 20 illustrates the diagnostic evaluation of the TabPFN architecture specifically tuned to estimate the CMOD in concrete samples. The TabPFN's predictions (depicted as an orange line) are juxtaposed against experimentally acquired values (shown in blue), with bootstrapped 95% CIs cast in light gray bands. This incorporation of uncertainty quantification transitions the evaluation from accuracy alone to a dual exposition of both predictive capability and reliability, clearly illuminating statistical dispersion alongside the mean estimate.

Evaluating prediction accuracy, the TabPFN model performs impressively, with predicted values tracking measured CMOD data very closely. It follows the data's fluctuations and nonlinear trends, showing that the model effectively learns the underlying physics. The overlap between the predicted and observed values across nearly all test samples underscores the model's broad generalization ability. While a few large deviations appear at the extremes, the strong overall alignment suggests that the model consistently manages the noisy and nonlinear properties of the CMOD measurements.

Bootstrapped 95% CIs shed light on the model's reliability. The narrow confidence bands observed throughout most test samples signal high stability and strong confidence in the point predictions. Wider bands at select points, however, indicate regions of elevated uncertainty, where data variability or model constraints exert more influence. Crucially, the true measurements lie within the confidence bands in nearly every instance, confirming that the uncertainty estimates are both realistic and well calibrated. Bootstrapping strengthens model robustness by repeatedly resampling the training dataset, thus better capturing the variability intrinsic to the prediction task. This method ensures that the resulting CIs reflect both the uncertainty introduced by the model and the variability of the data itself, an essential capability in engineering fields where minute errors can compromise safety or the longevity of structures. Consequently, the TabPFN framework not only delivers precise predictions but also offers a clear, uncertainty-aware output that enhances its trustworthiness for real-world applications.
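The resampling scheme described here can be sketched in a few lines; this is a generic percentile-bootstrap illustration assuming a scikit-learn-style estimator and numpy arrays, not necessarily the paper's exact protocol:

```python
import numpy as np
from sklearn.base import clone

def bootstrap_ci(model, X_train, y_train, X_test, n_boot=200, alpha=0.05, seed=0):
    """Percentile 95% CI from refitting on resampled training sets."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # resample rows
        fitted = clone(model).fit(X_train[idx], y_train[idx])
        preds[b] = fitted.predict(X_test)
    lo, hi = np.percentile(preds, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return preds.mean(axis=0), lo, hi   # mean curve plus the gray CI band
```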

From a scientific viewpoint, the addition of uncertainty quantification makes TabPFN both more interpretable and more relevant to engineering practice. In structural engineering applications involving fiber-reinforced UHPC, CMOD serves as a critical indicator of fracture toughness and overall serviceability. Being able to furnish not only precise CMOD forecasts but also reliable uncertainty bands is vital for verifying safety margins and for the ongoing refinement of design standards. In contrast to techniques like SVR or RFR, TabPFN uses prior domain knowledge along with a Bayesian-inspired framework, enabling it to perform competently with limited training data while still quantifying uncertainty with confidence.

Figure 21 offers a direct comparison between the widths of bootstrap CIs and the experimental scatter observed in CMOD measurements. For models like TabPFN, ANN, NuSVR, SVR, and XGBoost, the bootstrap CI widths (ranging from 9 to 12%) closely match the experimental scatter (between 9 and 10%). This suggests that the statistical resampling method effectively captures the variability seen in experiments. The alignment is particularly notable for TabPFN, where both metrics are identical at 9%, showcasing a strong agreement between the uncertainty derived from the model and the variability found in physical measurements. On the other hand, GBR, RFR, GPR, and DTR show bootstrap CI widths (14% to 18%) that are considerably larger than the experimental scatter (10% to 12%). This indicates that these models are more sensitive to how the training data is divided, resulting in predictive uncertainties that surpass what is typically observed in repeated physical tests. Overall, these results underscore that while bootstrap-based uncertainty estimation can align well with experimental scatter for certain types of models, it tends to provide more conservative (wider) bounds for others. This highlights the importance of weighing both statistical and experimental perspectives when evaluating model reliability.

