Example Result


1. The Optimal Model




All-model-result.csv
This file contains detailed performance metrics for all models.
● train, cv, test: Training set, cross-validation set, and test set, respectively.
● mcc: Matthews Correlation Coefficient.
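● support: The number of samples in each set. For cv, it is the size of one validation fold (91 = 455 / 5, consistent with 5-fold cross-validation).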
MissingValueProcessing Standarization Selection Modeling train_sensitivity train_f1 train_specificity train_auc train_mcc train_accuracy train_support cv_sensitivity cv_f1 cv_specificity cv_auc cv_mcc cv_accuracy cv_support test_sensitivity test_f1 test_specificity test_auc test_mcc test_accuracy test_support
0 Mean StandardScaler Ensemble LogisticRegression 1.0 1.0 1.0 1.0 1.0 1.0 455 0.982456140350877 0.9773581350917462 0.9529411764705882 0.9745098039215684 0.9392219149103654 0.9714285714285716 91.0 0.9861111111111112 0.9793103448275862 0.9523809523809524 0.9758597883597884 0.9433397594898876 0.9736842105263158 114
1 Mean StandardScaler Ensemble ElasticLogit 0.9894736842105264 0.986013986013986 0.9705882352941176 0.9990505675954592 0.9623887672055268 0.9824175824175824 455 0.9859649122807016 0.9825155563811014 0.9647058823529412 0.9957688338493292 0.9533559450619948 0.978021978021978 91.0 0.9722222222222222 0.9722222222222222 0.9523809523809524 0.988095238095238 0.9246031746031746 0.9649122807017544 114
2 Mean StandardScaler Ensemble RandomForest 0.9789473684210528 0.984126984126984 0.9823529411764704 0.9991124871001033 0.957984245918847 0.9802197802197802 455 0.9578947368421054 0.9629194655054804 0.9470588235294116 0.9890092879256964 0.9023247036261156 0.9538461538461538 91.0 0.9583333333333334 0.971830985915493 0.9761904761904762 0.992063492063492 0.9259891667402068 0.9649122807017544 114
3 Mean StandardScaler Ensemble Support Vector Machine(SVM) 0.9824561403508772 0.984182776801406 0.976470588235294 0.9987616099071208 0.9578005664494464 0.9802197802197802 455 0.9789473684210526 0.980685418818304 0.9705882352941176 0.9966976264189886 0.9485312720391456 0.9758241758241756 91.0 1.0 0.9863013698630136 0.9523809523809524 0.986111111111111 0.962621902223779 0.9824561403508772 114
4 Mean StandardScaler Ensemble XGBoost 0.9929824561403509 0.9947275922671354 0.9941176470588236 0.9999380804953562 0.9859408542928276 0.9934065934065934 455 0.9543859649122808 0.9627533525597356 0.9529411764705882 0.9920536635706914 0.903429848725782 0.9538461538461538 91.0 1.0 0.9863013698630136 0.9523809523809524 0.9917328042328042 0.962621902223779 0.9824561403508772 114
5 Mean StandardScaler Ensemble LightGBM 1.0 1.0 1.0 1.0 1.0 1.0 455 0.9719298245614034 0.9719290145332552 0.9529411764705882 0.9919504643962848 0.924972538208797 0.9648351648351648 91.0 1.0 0.9863013698630136 0.9523809523809524 0.992063492063492 0.962621902223779 0.9824561403508772 114
6 Mean StandardScaler Ensemble AdaBoost 1.0 1.0 1.0 1.0 1.0 1.0 455 0.9789473684210526 0.9755236037742152 0.9529411764705884 0.9951496388028896 0.9350053911151622 0.9692307692307692 91.0 0.9861111111111112 0.9793103448275862 0.9523809523809524 0.9904100529100528 0.9433397594898876 0.9736842105263158 114
7 Mean StandardScaler Ensemble DecisionTree 0.936842105263158 0.9451327433628318 0.923529411764706 0.9869762641898864 0.8555292751024219 0.931868131868132 455 0.887719298245614 0.9098900796795212 0.8941176470588236 0.9629514963880288 0.7737487974271353 0.8901098901098902 91.0 0.9583333333333334 0.9517241379310344 0.9047619047619048 0.978670634920635 0.8675534786006366 0.9385964912280702 114
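The MissingValueProcessing and Standarization columns appear to list the preprocessing steps applied before selection and modeling. "Mean" and "StandardScaler" most likely correspond to scikit-learn's SimpleImputer(strategy="mean") and StandardScaler, though the report does not say so. The pure-Python sketch below uses hypothetical mean_impute and standard_scale helpers to show what those two steps compute for a single feature column.

```python
import math


def mean_impute(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def standard_scale(column):
    """Return z-scores: (x - mean) / std, using the population std (ddof=0)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    # Avoid dividing by zero on a constant column: treat its std as 1,
    # which is also how StandardScaler handles zero-variance features.
    if std == 0:
        std = 1.0
    return [(v - mean) / std for v in column]


# Hypothetical feature column with one missing value.
raw = [1.0, None, 3.0]
filled = mean_impute(raw)        # [1.0, 2.0, 3.0]
scaled = standard_scale(filled)  # approx. [-1.2247, 0.0, 1.2247]
print(filled, scaled)
```

In a proper pipeline the mean and standard deviation are learned from the training portion only and then reused on held-out data, which is what scikit-learn's fit/transform split is for.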
features.csv
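● Each column is a feature selection method and each row is a feature selected by at least one method. Within each method, scores appear to be normalized so that the top feature scores 1.0; nan means the method did not select that feature.
● Rows are ordered by overall importance across all methods combined: the earlier a feature appears, the more important it is overall. The order does not reflect importance within any single method; to rank features within one method, compare the values in that method's column.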
feature DT_score_c45 RandomForest_gini LassoLars multi_Lasso SVM
concave_points3 0.8828167374411843 0.357923605513389 1.0 1.0 0.2219054228564644
radius3 0.957453440363296 1.0 0.5446949706969855 0.4261199973934185 0.4877951267999562
perimeter3 1.0 0.3145010231032062 0.675714633556796 0.591261077821637 0.3315275684824252
concavity1 0.5607267550410223 0.2555495841195829 nan 0.7437220862669903 1.0
area3 0.9999449404977488 0.2790686411950271 nan 0.8315525447256111 0.4386243526469404
concave_points1 0.7544990950932068 0.486659465756002 0.3299291828461251 0.1554228098830404 0.4470074264739125
perimeter1 0.8319863435519131 0.2386422650392634 nan 1.0 nan
radius1 0.7830498605494521 0.0874500204596779 nan 1.0 nan
area1 0.6820071651194491 0.1149889113209174 nan 1.0 nan
area2 0.7781433865588415 0.2293096625262687 nan nan 0.5951119626940901
texture3 0.3203053196139422 0.0009825565176997 0.1398572964362158 nan 0.8632809592607773
concavity3 0.4316868639886629 0.1966655445507771 0.0309058937991895 0.2993126230390918 nan
fractal_dimension2 nan nan nan nan 0.7072479247610157
fractal_dimension3 nan nan 0.0156667398266548 0.2118782941922499 0.4504573121129444
texture2 nan nan nan nan 0.3988497286054553
perimeter2 0.3388245229483302 nan nan nan nan
smoothness2 nan nan 0.0195428603992855 nan 0.1930372131224798
compactness2 nan nan 0.0055233166595647 nan 0.1958287112242678
symmetry1 nan nan nan 0.0363678687106132 nan
fractal_dimension1 nan 0.0 0.0294091824545277 nan nan
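The row order in features.csv appears consistent with a simple rule: sum each feature's scores across the five methods, counting nan as 0, and sort in descending order (for example, concave_points3 sums to about 3.46 and radius3 to about 3.42). This rule is inferred from the numbers in the table, not documented by the tool. The sketch below, with a hypothetical overall_ranking helper, applies it to five rows listed out of order and recovers their order in the file.

```python
# A subset of rows from features.csv, listed out of order on purpose.
# None stands for nan (the method did not select the feature).
# Column order: DT_score_c45, RandomForest_gini, LassoLars, multi_Lasso, SVM
SCORES = {
    "texture3": [0.3203053196139422, 0.0009825565176997, 0.1398572964362158, None, 0.8632809592607773],
    "radius3": [0.957453440363296, 1.0, 0.5446949706969855, 0.4261199973934185, 0.4877951267999562],
    "fractal_dimension1": [None, 0.0, 0.0294091824545277, None, None],
    "concave_points3": [0.8828167374411843, 0.357923605513389, 1.0, 1.0, 0.2219054228564644],
    "concavity1": [0.5607267550410223, 0.2555495841195829, None, 0.7437220862669903, 1.0],
}


def overall_ranking(scores):
    """Rank features by the sum of their scores across methods (nan counted as 0)."""
    totals = {
        feature: sum(v for v in values if v is not None)
        for feature, values in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


print(overall_ranking(SCORES))
```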

feature_importance.html

optimal-model-scores.csv
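● support: The number of samples in each class.
● macro avg: The unweighted average of each metric across all classes, calculated as (class1 + class2) / 2.
● weighted avg: The support-weighted average of each metric across all classes, calculated as (class1 × support(class1) + class2 × support(class2)) / (support(class1) + support(class2)).
● sensitivity, specificity: The recall of the positive class and of the negative class, respectively. Here they equal the recall of B and of M, so B is treated as the positive class.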
precision recall f1-score support
B 0.972972972972973 1.0 0.9863013698630136 72.0
M 1.0 0.9523809523809523 0.975609756097561 42.0
macro avg 0.9864864864864865 0.9761904761904762 0.9809555629802873 114.0
weighted avg 0.9829302987197724 0.9824561403508771 0.9823623542652152 114.0
accuracy 0.9824561403508771 114.0
sensitivity 1.0 72.0
specificity 0.9523809523809523 72.0
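The macro avg and weighted avg rows can be reproduced from the per-class values. This sketch uses hypothetical macro_avg and weighted_avg helpers with the B and M precision, recall, and support values copied from optimal-model-scores.csv.

```python
def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean over classes, weighting each class by its support."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Per-class values for B and M, copied from optimal-model-scores.csv.
support = [72, 42]
precision = [0.972972972972973, 1.0]
recall = [1.0, 0.9523809523809523]

print(macro_avg(precision))              # about 0.98649
print(weighted_avg(precision, support))  # about 0.98293
print(weighted_avg(recall, support))     # about 0.98246, the same as accuracy
```

The weighted-average recall always equals accuracy: weighting each class's recall by its share of the samples reduces to the total number of correct predictions divided by the total number of samples.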
Confusion Matrix.png
● A higher number of true positive and true negative results is better.
● True positive result: the actual classification is positive and the predicted classification is positive.
● False negative result: the actual classification is positive and the predicted classification is negative.
● False positive result: the actual classification is negative and the predicted classification is positive.
● True negative result: the actual classification is negative and the predicted classification is negative.
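The four confusion-matrix counts determine the headline metrics. The counts used below are an assumption reconstructed from optimal-model-scores.csv with B as the positive class: all 72 B samples were predicted correctly (72 TP, 0 FN) and 40 of the 42 M samples were (40 TN, 2 FP). The hypothetical binary_metrics helper then reproduces the reported sensitivity, specificity, and accuracy, and its MCC matches the test_mcc of about 0.9626 in All-model-result.csv for the rows with the same test results.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Derive common metrics from the four confusion-matrix counts.

    No guard for zero denominators; this is only an illustration.
    """
    return {
        "sensitivity": tp / (tp + fn),  # recall of the positive class
        "specificity": tn / (tn + fp),  # recall of the negative class
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Counts reconstructed from optimal-model-scores.csv, with B as the positive class.
metrics = binary_metrics(tp=72, fn=0, fp=2, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC is symmetric in the two classes, so treating M as the positive class instead would give the same value.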
ROC Curve.png
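The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the classification threshold varies. A curve closer to the upper-left corner, with a larger area under the curve (AUC), indicates better model performance: an AUC of 1.0 means perfect separation, and 0.5 means no better than random guessing.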
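A ROC curve is traced by lowering a decision threshold over the model's predicted scores and recording the false and true positive rates at each step; the AUC is then the area under those points. This is a minimal sketch with hypothetical roc_points and trapezoid_auc helpers and made-up labels and scores, not data from the report. Samples with tied scores are processed together so they move the curve in a single step.

```python
def roc_points(labels, scores):
    """Sweep the threshold from the highest score down; return (FPR, TPR) points.

    labels are 1 (positive) or 0 (negative); tied scores are processed together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Every sample scoring >= threshold is now predicted positive.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and scores for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(labels, scores))
print(trapezoid_auc(roc_points(labels, scores)))  # 0.75
```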

2. Data Overview



Correlation Heatmap plot.png

The correlation heatmap visualizes correlation coefficients, which measure the strength and direction of the linear relationship between two variables. Correlation coefficients range from -1 to 1:
● Closer to 1: Indicates a strong positive correlation.
● Closer to 0: Indicates a weak or no linear relationship.
● Closer to -1: Indicates a strong negative correlation.
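The heatmap's coefficient is presumably Pearson's r, the standard measure of linear correlation; the report does not name it. This sketch computes it with a hypothetical pearson helper and builds the pairwise table a heatmap displays with a hypothetical correlation_matrix helper. The last example shows why a value near 0 means no linear relationship rather than no relationship: y = x² is fully determined by x, yet r = 0 on symmetric data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients, as laid out in a correlation heatmap."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


x = [-2, -1, 0, 1, 2]
print(pearson(x, [2 * v + 1 for v in x]))  # about 1.0: strong positive
print(pearson(x, [-v for v in x]))         # about -1.0: strong negative
print(pearson(x, [v * v for v in x]))      # 0.0: y depends on x, but not linearly
```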

3. Dimensionality Reduction



PCA plot.png

● pc_1, pc_2, pc_3, pc_4 represent different principal components, ordered by the variance they capture (pc_1 captures the most).
● The diagonal plots show the distribution of data for individual components.
● Scatter plots display the relationships between two components (e.g., the plot where the pc_1 row meets the pc_2 column shows the relationship between pc_1 and pc_2).
● More concentrated distributions indicate components with lower variance.
● More dispersed distributions indicate components capturing higher variance.
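For two features, PCA reduces to finding the eigenvalues of a 2 × 2 covariance matrix, which have a closed form. This minimal sketch (a hypothetical pca_2d helper using the sample covariance with n - 1) shows that the variance along each principal component is an eigenvalue and that pc_1 is the direction of greatest spread. The report's plot has four components and presumably relies on a library implementation such as scikit-learn's PCA.

```python
import math


def pca_2d(points):
    """Variances along the two principal components of 2-D data.

    Uses the sample covariance (divides by n - 1) and the closed-form
    eigenvalues of a symmetric 2 x 2 matrix [[sxx, sxy], [sxy, syy]].
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    center = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    var_pc1, var_pc2 = center + radius, center - radius  # pc_1 has the larger variance
    total = var_pc1 + var_pc2
    return (var_pc1, var_pc2), (var_pc1 / total, var_pc2 / total)


# Hypothetical points: widely spread along x, tightly concentrated along y.
variances, ratios = pca_2d([(-2, 0), (2, 0), (0, -1), (0, 1)])
print(variances)  # about (2.667, 0.667)
print(ratios)     # about (0.8, 0.2): pc_1 captures 80% of the variance
```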


PLS plot.png

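● PLS component 1 and PLS component 2 are the first two components derived from PLS (Partial Least Squares) analysis.
● PLS component 1 is the direction in feature space whose projection has the greatest covariance with the class labels, which makes it the most informative component for classification. PLS component 2 captures the greatest remaining covariance after component 1 is removed, providing supplementary information.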
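The first PLS component can be sketched for a single 0/1-coded class label (PLS1): center the features and the label, take the weight vector proportional to Xᵀy (the unit direction whose projection has maximal covariance with y), and project each sample onto it. This is a pure-Python illustration with a hypothetical first_pls_component helper, not the report's implementation; library versions such as scikit-learn's PLSRegression also scale features by default and compute further components after deflating X.

```python
import math


def first_pls_component(X, y):
    """First PLS weight vector and scores for one 0/1-coded class label (PLS1)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # w is proportional to Xc^T y: the unit direction whose projection
    # has the largest covariance with the label.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: each centered sample projected onto w (PLS component 1 values).
    t = [sum(row[j] * w[j] for j in range(p)) for row in Xc]
    return w, t


# Hypothetical data: feature 0 tracks the class, feature 1 is unrelated noise.
X = [[0, 1], [1, -1], [2, 1], [3, -1]]
y = [0, 0, 1, 1]
w, t = first_pls_component(X, y)
print(w)  # [1.0, 0.0]: all weight on the informative feature
print(t)  # [-1.5, -0.5, 0.5, 1.5]
```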


UMAP plot.png

● If the colored clusters are distinct and well-separated (e.g., forming clearly defined blocks), it indicates that UMAP has successfully preserved the high-dimensional clustering structure in the lower-dimensional space.
● UMAP prioritizes preserving local proximity in the data. Points that are close in the plot are likely to share similar features in the high-dimensional space.
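● Points that are far apart are generally dissimilar, but UMAP does not faithfully preserve global distances, so the size of the gap between distant clusters should not be over-interpreted.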
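UMAP itself is too involved for a short example, but the claim that nearby points in the plot are nearby in the original feature space can be checked directly. The sketch below, with hypothetical nearest_neighbors and neighbor_preservation helpers, measures the fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the embedding. scikit-learn's sklearn.manifold.trustworthiness is an established metric in the same spirit.

```python
import math


def nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance)."""
    ranked = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return {j for _, j in ranked[:k]}


def neighbor_preservation(high, low, k):
    """Average fraction of each point's k nearest neighbors kept by the embedding."""
    kept = sum(
        len(nearest_neighbors(high, i, k) & nearest_neighbors(low, i, k))
        for i in range(len(high))
    )
    return kept / (len(high) * k)


# Hypothetical 3-D data with two tight pairs, and two 2-D embeddings of it.
high = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
faithful = [(0, 0), (1, 0), (5, 0), (6, 0)]
scrambled = [(0, 0), (10, 0), (1, 0), (11, 0)]
print(neighbor_preservation(high, faithful, k=1))   # 1.0
print(neighbor_preservation(high, scrambled, k=1))  # 0.0
```

The "faithful" embedding roughly halves the gap between the two pairs yet still scores 1.0, because the check only looks at local neighborhoods; this mirrors why distances between far-apart UMAP clusters carry less meaning than proximity within them.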
