Example Result


1. The optimal model




All-model-result.csv
This file contains detailed performance metrics for all models.
● train, cv, test: Training set, cross-validation set, and test set, respectively.
● mcc: Matthews Correlation Coefficient.
● support: The number of samples.
MissingValueProcessing Standarization Selection Modeling model_ordering train_sensitivity train_f1 train_specificity train_auc train_mcc train_accuracy train_support cv_sensitivity cv_f1 cv_specificity cv_auc cv_mcc cv_accuracy cv_support test_sensitivity test_f1 test_specificity test_auc test_mcc test_accuracy test_support
0 Mean StandardScaler Ensemble LogisticRegression 0 1.0 1.0 1.0 1.0 1.0 1.0 455 0.9789473684210526 0.9738812103030924 0.9470588235294116 0.9702786377708978 0.9295621218707822 0.9670329670329672 91.0 0.9861111111111112 0.9726027397260274 0.9285714285714286 0.95734126984127 0.9245181185940876 0.9649122807017544 114
1 Mean StandardScaler Ensemble ElasticLogit 8 0.9824561403508772 0.9824561403508772 0.9705882352941176 0.9978947368421052 0.9530443756449948 0.978021978021978 455 0.9789473684210526 0.9789443978190556 0.9647058823529412 0.9961816305469556 0.9441732684993444 0.9736263736263736 91.0 0.9861111111111112 0.9793103448275862 0.9523809523809524 0.988425925925926 0.9433397594898876 0.9736842105263158 114
2 Mean StandardScaler Ensemble RandomForest 10 0.9894736842105264 0.9877408056042032 0.976470588235294 0.9986790505675954 0.9671026949575336 0.9846153846153848 455 0.9543859649122808 0.9610888755169956 0.9470588235294116 0.9902992776057792 0.8977253857835974 0.9516483516483516 91.0 0.9861111111111112 0.9861111111111112 0.9761904761904762 0.99239417989418 0.9623015873015872 0.9824561403508772 114
3 Mean StandardScaler Ensemble Support Vector Machine(SVM) 8 0.9824561403508772 0.9824561403508772 0.9705882352941176 0.99890608875129 0.9530443756449948 0.978021978021978 455 0.9859649122807016 0.9808927906007444 0.9588235294117646 0.9973168214654284 0.9489754912939038 0.975824175824176 91.0 0.9861111111111112 0.9793103448275862 0.9523809523809524 0.988095238095238 0.9433397594898876 0.9736842105263158 114
4 Mean StandardScaler Ensemble XGBoost 7 1.0 0.9982486865148862 0.9941176470588236 1.0 0.9953098568937948 0.9978021978021978 455 0.9578947368421052 0.9663672447656166 0.9588235294117646 0.9927760577915375 0.9120484041752498 0.9582417582417582 91.0 1.0 0.993103448275862 0.9761904761904762 0.992063492063492 0.9812328999345132 0.9912280701754386 114
5 Mean StandardScaler Ensemble LightGBM 7 1.0 1.0 1.0 1.0 1.0 1.0 455 0.9614035087719298 0.9664321201233173 0.9529411764705882 0.993498452012384 0.9114341893931424 0.9582417582417582 91.0 1.0 0.9863013698630136 0.9523809523809524 0.9791666666666666 0.962621902223779 0.9824561403508772 114
6 Mean StandardScaler Ensemble AdaBoost 6 1.0 1.0 1.0 1.0 1.0 1.0 455 0.9789473684210526 0.97720474133151 0.9588235294117646 0.9920536635706916 0.9396125985535256 0.9714285714285712 91.0 1.0 0.9863013698630136 0.9523809523809524 0.9854497354497356 0.962621902223779 0.9824561403508772 114
7 Mean StandardScaler Ensemble DecisionTree 5 0.9192982456140352 0.9527272727272728 0.9823529411764704 0.9907946336429309 0.8844753054423304 0.9428571428571428 455 0.8947368421052632 0.9236194778672656 0.9294117647058824 0.9594943240454076 0.8120972401052647 0.9076923076923076 91.0 0.9027777777777778 0.9420289855072465 0.9761904761904762 0.980654761904762 0.858759385846922 0.9298245614035088 114
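
The metric columns above are related through the confusion matrix, and a short sketch can make that relationship concrete. The counts used below (71, 1, 39, 3) do not appear in any output file; they are inferred from the test-set scores of the LogisticRegression row, under the assumption that class B is treated as the positive class. The function name `binary_metrics` is illustrative only.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)          # recall of the positive class
    specificity = tn / (tn + fp)          # recall of the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Inferred counts (an assumption, see the note above): 72 B samples, 42 M samples.
metrics = binary_metrics(tp=71, fn=1, tn=39, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the results agree with the test_sensitivity, test_specificity, test_accuracy, and test_mcc values in row 0.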
features.csv
● These are the feature columns selected by different feature selection methods.
● The earlier a feature appears, the more important it is in that feature selection method.
DT_score_c45 RandomForest_gini LassoLars multi_Lasso SVM
radius3 0.9462138563710176 2.806396967481542 0.9433843070512326 -0.4752190560911419 0.2125168422429794
concave_points3 0.6348524547499099 0.6190475127471899 2.0433037634653486 1.0478461375689012 -0.9971233386334518
perimeter3 1.157919364488023 -0.2161239327723453 1.4936844969948433 0.2867930633704256 -0.3418100042954625
area3 1.108379881075629 0.6569065511072884 nan 0.541911988183617 0.0697113408868436
perimeter1 -0.0319553685639685 -0.00006850463470163166 nan 1.0439241504285617 nan
area1 0.0102207497203871 -0.3466821946424107 nan 1.3178009216929685 nan
concavity1 -1.0294849953250471 0.4695027669065759 nan -0.2068832751035996 1.5802308963292075
fractal_dimension2 nan nan nan nan 0.6503240379417228
radius1 -0.189922846236511 -0.5740952348523398 -0.730262979509028 1.3178009216929685 nan
symmetry1 nan nan nan -0.3566352773752282 nan
concave_points1 0.2454136529647115 0.2207120698229523 0.4873284870098482 -1.1066974323379015 -0.2945788134442916
concave_points2 nan nan -0.7488613427945731 nan nan
texture3 nan -0.9543722223125854 -0.0639063115983443 -1.8699939099710825 2.08914789955742
area2 -0.7637578171079085 -0.3364553737845795 nan nan 0.1825945529919905
smoothness1 nan -1.107953377083674 nan 0.1264841862628401 nan
texture2 nan nan -0.7633072192783248 nan -0.6039313484491982
smoothness2 nan nan -0.6674558388837003 nan -1.4250387182112492
fractal_dimension3 nan nan -0.6945485213585393 -1.2967707655971557 -0.1857001401707758
compactness2 nan -1.105943411731276 -0.7082171353369598 nan -0.9363432067457306
concavity3 -2.087878932136224 -0.1308716162516349 -0.5911417057618045 -0.370361652724176 nan
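
As a rough sketch of how this file might be read, the snippet below lists the features each method selected, in row order. It assumes that `nan` means the method did not select that feature, that the file is comma-separated, and that the first column (named `feature` here, a hypothetical name) holds the feature names. The sample is a small excerpt of the table above with values rounded.

```python
import csv
import io
import math

# Hypothetical excerpt of features.csv (rounded values, assumed comma separator).
SAMPLE = """feature,DT_score_c45,RandomForest_gini,LassoLars
radius3,0.9462,2.8064,0.9434
area3,1.1084,0.6569,nan
fractal_dimension2,nan,nan,nan
concave_points2,nan,nan,-0.7489
"""

def selected_features(text):
    """Map each selection method to the features it kept, preserving row order."""
    reader = csv.DictReader(io.StringIO(text))
    methods = reader.fieldnames[1:]       # every column after the feature name
    selected = {method: [] for method in methods}
    for row in reader:
        for method in methods:
            # float("nan") parses the literal "nan"; treat it as "not selected".
            if not math.isnan(float(row[method])):
                selected[method].append(row["feature"])
    return selected

result = selected_features(SAMPLE)
for method, features in result.items():
    print(method, features)
```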

feature_importance.html


shap_plot.html

model_parameter.csv
● This file contains the parameters of the best-performing classification model.
model  parameter                            result
SVM    Kernel Type                          linear
SVM    Inverse Regularization Strength (C)  0.4111755686415492
optimal-model-scores.csv
● support: The number of samples.
● macro avg: The unweighted average of a metric across all classes, calculated as (class1 + class2) / 2.
● weighted avg: The support-weighted average of a metric across all classes, calculated as (class1 × support(class1) + class2 × support(class2)) / (support(class1) + support(class2)).
precision recall f1-score support
B 0.9594594594594594 0.9861111111111112 0.9726027397260274 72.0
M 0.975 0.9285714285714286 0.9512195121951219 42.0
macro avg 0.9672297297297296 0.9573412698412699 0.9619111259605746 114.0
weighted avg 0.9651849217638692 0.9649122807017544 0.9647247085304307 114.0
accuracy 0.9649122807017544 114.0
sensitivity 0.9861111111111112 72.0
specificity 0.9285714285714286 72.0
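
The two averages can be reproduced from the per-class rows. The following is a minimal sketch using the B and M values from the table above; the dictionary layout and the helper names `macro_avg` and `weighted_avg` are illustrative assumptions, not part of the tool's output.

```python
# Per-class rows copied from optimal-model-scores.csv.
report = {
    "B": {"precision": 0.9594594594594594, "recall": 0.9861111111111112,
          "f1-score": 0.9726027397260274, "support": 72},
    "M": {"precision": 0.975, "recall": 0.9285714285714286,
          "f1-score": 0.9512195121951219, "support": 42},
}

def macro_avg(rows, metric):
    """Unweighted mean of a metric over all classes."""
    return sum(r[metric] for r in rows.values()) / len(rows)

def weighted_avg(rows, metric):
    """Mean of a metric over all classes, weighted by each class's support."""
    total = sum(r["support"] for r in rows.values())
    return sum(r[metric] * r["support"] for r in rows.values()) / total

for metric in ("precision", "recall", "f1-score"):
    print(metric, macro_avg(report, metric), weighted_avg(report, metric))
```

Because class B has more samples than class M, the weighted average sits closer to the B row than the macro average does.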

2. Data Overview



Correlation Heatmap.png

The correlation heatmap visualizes correlation coefficients, which measure the strength of the linear relationship between two variables. Coefficients range from -1 to 1:
● Closer to 1: Indicates a strong positive correlation.
● Closer to 0: Indicates little or no linear relationship.
● Closer to -1: Indicates a strong negative correlation.
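
To illustrate the three cases, here is a small sketch of the Pearson correlation coefficient in plain Python. It assumes the heatmap uses Pearson correlation, which the source does not state explicitly; the toy data are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

x = [1, 2, 3, 4]
positive = pearson(x, [2, 4, 6, 8])    # y rises with x
negative = pearson(x, [8, 6, 4, 2])    # y falls as x rises
unrelated = pearson(x, [1, -1, -1, 1]) # no linear trend
print(positive, negative, unrelated)
```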

3. Dimensionality Reduction



PCA.png

● pc_1, pc_2, pc_3, pc_4 represent different principal components.
● The diagonal plots show the distribution of data for individual components.
● The off-diagonal scatter plots display the relationship between two components (e.g., pc_1 against pc_2).
● More concentrated distributions indicate components with lower variance.
● More dispersed distributions indicate components capturing higher variance.
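
The link between spread and variance can be sketched with a two-feature toy example, where the variances along the principal components are the eigenvalues of the 2×2 sample covariance matrix and have a closed form. This is a simplified illustration with invented data, not the tool's implementation, which reports four components.

```python
import math

def pca_2d_variances(points):
    """Return the variances along pc_1 and pc_2 for 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return centre + radius, centre - radius

# Toy points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
var_pc1, var_pc2 = pca_2d_variances(points)
ratio = var_pc1 / (var_pc1 + var_pc2)
print(f"pc_1 variance: {var_pc1:.4f}, pc_2 variance: {var_pc2:.4f}")
print(f"share of variance explained by pc_1: {ratio:.4f}")
```

In this toy case nearly all of the variance falls along pc_1, so a plot of pc_1 would look dispersed while pc_2 would look tightly concentrated.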


PLS.png

● PLS component 1 and PLS component 2 are the primary components derived from PLS (Partial Least Squares) analysis.
● PLS component 1 captures the variation in the features that is most strongly associated with the class label, while PLS component 2 adds supplementary information that further improves class separation.


UMAP.png

● If the colored clusters are distinct and well-separated (e.g., forming clearly defined blocks), it indicates that UMAP has successfully preserved the high-dimensional clustering structure in the lower-dimensional space.
● UMAP prioritizes preserving local proximity in the data. Points that are close in the plot are likely to share similar features in the high-dimensional space.
● Points that are far apart generally differ substantially in their features, although distances between clusters in a UMAP plot are less reliable than local neighborhoods.
