PineBioML - Home

Example Result

This example uses the Breast_Cancer_Wisconsin dataset.


1. The optimal model

To determine the optimal model, each model's scores on the CV (cross-validation) and test sets are ranked from best to worst, with the best score receiving rank 1. The ranks are then summed for each model, and the model with the smallest rank sum is selected as the optimal model.
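
The rank-sum selection described above can be sketched as follows. This is only an illustration, not PineBioML's actual implementation: the helper names (`rank_desc`, `pick_optimal`), the model names, and the scores are all hypothetical, and the sketch assumes one CV score and one test score per model, where higher is better.

```python
def rank_desc(values):
    """Rank values so the highest gets rank 1; tied values share the best rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def pick_optimal(scores):
    """scores maps model name -> (cv_score, test_score).

    Returns the model with the smallest rank sum and the rank sums themselves.
    """
    models = list(scores)
    cv_ranks = rank_desc([scores[m][0] for m in models])
    test_ranks = rank_desc([scores[m][1] for m in models])
    rank_sums = {m: c + t for m, c, t in zip(models, cv_ranks, test_ranks)}
    best = min(rank_sums, key=rank_sums.get)
    return best, rank_sums


# Hypothetical scores, for illustration only (not taken from the result files).
scores = {
    "model_a": (0.95, 0.93),
    "model_b": (0.97, 0.94),
    "model_c": (0.94, 0.98),
}
best, rank_sums = pick_optimal(scores)
print(best, rank_sums)
```

Here model_b ranks 1st on CV and 2nd on test (rank sum 3), which beats model_c (3rd and 1st, rank sum 4) and model_a (2nd and 3rd, rank sum 5). How ties in the rank sum are broken is not stated in the description above; this sketch simply keeps the first model encountered.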

All-model-result.csv

This file contains detailed performance metrics for all models.
● train, cv, test: Training set, cross-validation set, and test set, respectively.
● mcc: Matthews Correlation Coefficient.
● support: The number of samples.

	MissingValueProcessing	Standarization	Selection	Modeling	model_ordering	train_sensitivity	train_f1	train_specificity	train_auc	train_mcc	train_accuracy	train_support	cv_sensitivity	cv_f1	cv_specificity	cv_auc	cv_mcc	cv_accuracy	cv_support	test_sensitivity	test_f1	test_specificity	test_auc	test_mcc	test_accuracy	test_support
0	Mean	StandardScaler	Ensemble	LogisticRegression	0	1.0	1.0	1.0	1.0	1.0	1.0	455	0.9789473684210526	0.9738812103030924	0.9470588235294116	0.9702786377708978	0.9295621218707822	0.9670329670329672	91.0	0.9861111111111112	0.9726027397260274	0.9285714285714286	0.95734126984127	0.9245181185940876	0.9649122807017544	114
1	Mean	StandardScaler	Ensemble	ElasticLogit	8	0.9824561403508772	0.9824561403508772	0.9705882352941176	0.9978947368421052	0.9530443756449948	0.978021978021978	455	0.9789473684210526	0.9789443978190556	0.9647058823529412	0.9961816305469556	0.9441732684993444	0.9736263736263736	91.0	0.9861111111111112	0.9793103448275862	0.9523809523809524	0.988425925925926	0.9433397594898876	0.9736842105263158	114
2	Mean	StandardScaler	Ensemble	RandomForest	10	0.9894736842105264	0.9877408056042032	0.976470588235294	0.9986790505675954	0.9671026949575336	0.9846153846153848	455	0.9543859649122808	0.9610888755169956	0.9470588235294116	0.9902992776057792	0.8977253857835974	0.9516483516483516	91.0	0.9861111111111112	0.9861111111111112	0.9761904761904762	0.99239417989418	0.9623015873015872	0.9824561403508772	114
3	Mean	StandardScaler	Ensemble	Support Vector Machine(SVM)	8	0.9824561403508772	0.9824561403508772	0.9705882352941176	0.99890608875129	0.9530443756449948	0.978021978021978	455	0.9859649122807016	0.9808927906007444	0.9588235294117646	0.9973168214654284	0.9489754912939038	0.975824175824176	91.0	0.9861111111111112	0.9793103448275862	0.9523809523809524	0.988095238095238	0.9433397594898876	0.9736842105263158	114
4	Mean	StandardScaler	Ensemble	XGBoost	7	1.0	0.9982486865148862	0.9941176470588236	1.0	0.9953098568937948	0.9978021978021978	455	0.9578947368421052	0.9663672447656166	0.9588235294117646	0.9927760577915375	0.9120484041752498	0.9582417582417582	91.0	1.0	0.993103448275862	0.9761904761904762	0.992063492063492	0.9812328999345132	0.9912280701754386	114
5	Mean	StandardScaler	Ensemble	LightGBM	7	1.0	1.0	1.0	1.0	1.0	1.0	455	0.9614035087719298	0.9664321201233173	0.9529411764705882	0.993498452012384	0.9114341893931424	0.9582417582417582	91.0	1.0	0.9863013698630136	0.9523809523809524	0.9791666666666666	0.962621902223779	0.9824561403508772	114
6	Mean	StandardScaler	Ensemble	AdaBoost	6	1.0	1.0	1.0	1.0	1.0	1.0	455	0.9789473684210526	0.97720474133151	0.9588235294117646	0.9920536635706916	0.9396125985535256	0.9714285714285712	91.0	1.0	0.9863013698630136	0.9523809523809524	0.9854497354497356	0.962621902223779	0.9824561403508772	114
7	Mean	StandardScaler	Ensemble	DecisionTree	5	0.9192982456140352	0.9527272727272728	0.9823529411764704	0.9907946336429309	0.8844753054423304	0.9428571428571428	455	0.8947368421052632	0.9236194778672656	0.9294117647058824	0.9594943240454076	0.8120972401052647	0.9076923076923076	91.0	0.9027777777777778	0.9420289855072465	0.9761904761904762	0.980654761904762	0.858759385846922	0.9298245614035088	114

features.csv

● These are the feature columns selected by different feature selection methods.
● The earlier a feature appears, the more important it is in that feature selection method.

	DT_score_c45	RandomForest_gini	LassoLars	multi_Lasso	SVM
radius3	0.9462138563710176	2.806396967481542	0.9433843070512326	-0.4752190560911419	0.2125168422429794
concave_points3	0.6348524547499099	0.6190475127471899	2.0433037634653486	1.0478461375689012	-0.9971233386334518
perimeter3	1.157919364488023	-0.2161239327723453	1.4936844969948433	0.2867930633704256	-0.3418100042954625
area3	1.108379881075629	0.6569065511072884	nan	0.541911988183617	0.0697113408868436
perimeter1	-0.0319553685639685	-0.00006850463470163166	nan	1.0439241504285617	nan
area1	0.0102207497203871	-0.3466821946424107	nan	1.3178009216929685	nan
concavity1	-1.0294849953250471	0.4695027669065759	nan	-0.2068832751035996	1.5802308963292075
fractal_dimension2	nan	nan	nan	nan	0.6503240379417228
radius1	-0.189922846236511	-0.5740952348523398	-0.730262979509028	1.3178009216929685	nan
symmetry1	nan	nan	nan	-0.3566352773752282	nan
concave_points1	0.2454136529647115	0.2207120698229523	0.4873284870098482	-1.1066974323379015	-0.2945788134442916
concave_points2	nan	nan	-0.7488613427945731	nan	nan
texture3	nan	-0.9543722223125854	-0.0639063115983443	-1.8699939099710825	2.08914789955742
area2	-0.7637578171079085	-0.3364553737845795	nan	nan	0.1825945529919905
smoothness1	nan	-1.107953377083674	nan	0.1264841862628401	nan
texture2	nan	nan	-0.7633072192783248	nan	-0.6039313484491982
smoothness2	nan	nan	-0.6674558388837003	nan	-1.4250387182112492
fractal_dimension3	nan	nan	-0.6945485213585393	-1.2967707655971557	-0.1857001401707758
compactness2	nan	-1.105943411731276	-0.7082171353369598	nan	-0.9363432067457306
concavity3	-2.087878932136224	-0.1308716162516349	-0.5911417057618045	-0.370361652724176	nan

feature_importance.html


shap_plot.html


model_parameter.csv

● This file contains the parameters of the best-performing classification model.

model	parameter	result
SVM	Kernel Type	linear
SVM	Inverse Regularization Strength (C)	0.4111755686415492

optimal-model-scores.csv

● support: The number of samples
● macro avg: The unweighted average of a metric across all classes, calculated as (class1 + class2) / 2
● weighted avg: The support-weighted average of a metric across all classes, calculated as (class1 × support(class1) + class2 × support(class2)) / (support(class1) + support(class2))

	precision	recall	f1-score	support
B	0.9594594594594594	0.9861111111111112	0.9726027397260274	72.0
M	0.975	0.9285714285714286	0.9512195121951219	42.0

macro avg	0.9672297297297296	0.9573412698412699	0.9619111259605746	114.0
weighted avg	0.9651849217638692	0.9649122807017544	0.9647247085304307	114.0
accuracy			0.9649122807017544	114.0
sensitivity			0.9861111111111112	72.0
specificity			0.9285714285714286	72.0
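
The macro and weighted averages can be reproduced from the per-class rows of the table above. The sketch below does this for the precision column; the function names `macro_avg` and `weighted_avg` are hypothetical helpers, not part of PineBioML.

```python
def macro_avg(values):
    """Unweighted mean of a metric across classes."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean of a metric across classes, weighted by each class's support."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Per-class precision and support for classes B and M, copied from the table.
precision = [0.9594594594594594, 0.975]
support = [72, 42]

macro = macro_avg(precision)
weighted = weighted_avg(precision, support)
print(f"macro avg precision:    {macro:.4f}")
print(f"weighted avg precision: {weighted:.4f}")
```

The results agree with the macro avg and weighted avg rows (about 0.9672 and 0.9652). The weighted average lies closer to the precision of class B because B has the larger support.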

2. Data Overview

Correlation Heatmap.png

The correlation heatmap visualizes correlation coefficients, which measure the linear relationship between two variables. The range of correlation coefficients is from -1 to 1:

● Closer to 1: Indicates a strong positive correlation.
● Closer to 0: Indicates no linear relationship.
● Closer to -1: Indicates a strong negative correlation.
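
To make the three cases concrete, the sketch below computes a Pearson correlation coefficient on small toy vectors. It assumes the heatmap uses Pearson correlation, which the description of a linear relationship suggests but does not state outright; the function name `pearson` and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4]
r_pos = pearson(x, [2, 4, 6, 8])     # y rises linearly with x
r_neg = pearson(x, [8, 6, 4, 2])     # y falls linearly with x
r_zero = pearson(x, [1, -1, -1, 1])  # y depends on x, but not linearly
print(r_pos, r_neg, r_zero)
```

The last case is a reminder that a coefficient near 0 only rules out a linear relationship: the two variables can still be related in a non-linear way.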

3. Dimensionality Reduction

PCA.png

● pc_1, pc_2, pc_3, pc_4 represent different principal components.
● The diagonal plots show the distribution of data for individual components.
● Scatter plots display the relationships between two components (e.g., the top-left plot shows the relationship between pc_1 and pc_2).

● More concentrated distributions indicate components with lower variance.
● More dispersed distributions indicate components capturing higher variance.


PLS.png

● PLS component 1 and PLS component 2 are the primary components derived from PLS (Partial Least Squares) analysis.

● PLS component 1 captures the variation most strongly associated with the class labels, while PLS component 2 provides supplementary information that further helps to separate the classes.


UMAP.png

● If the colored clusters are distinct and well-separated (e.g., forming clearly defined blocks), it indicates that UMAP has successfully preserved the high-dimensional clustering structure in the lower-dimensional space.

● UMAP prioritizes preserving local proximity in the data. Points that are close in the plot are likely to share similar features in the high-dimensional space.

● Points that are far apart generally represent significant differences in features.

