Ensemble Methods
Ensemble methods are powerful techniques that combine multiple machine learning models to create a stronger, more accurate predictor.
The Wisdom of Crowds
Wisdom of the crowd is the process of taking into account the collective opinion of a group of individuals, rather than relying on a single expert, to answer a question.
Dr. Randy Forrest works at a teaching hospital and has a team with a diverse set of skills. Each resident has their own strengths.
Whenever Dr. Forrest gets a new case, he asks his residents for their opinions and then democratically picks the final diagnosis: the most common diagnosis among all those proposed.
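In code, that democratic vote is simply the mode of the proposed answers. A minimal sketch, with made-up diagnoses for illustration:

```python
from collections import Counter

# Hypothetical diagnoses proposed by the residents for one case
diagnoses = ["flu", "pneumonia", "flu", "flu", "bronchitis"]

# The democratic vote: pick the most common diagnosis
final_diagnosis = Counter(diagnoses).most_common(1)[0][0]
print(final_diagnosis)  # flu
```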
How This Relates to Ensemble Methods
Ensemble methods rely on the concept of the “wisdom of the crowd”: the combined answer of many models is often better than the answer of any single model.
Behind the success of ensemble methods are:
- Ensemble diversity (a variety of opinions to choose from)
- Model aggregation into a single final decision
Diversity and independence matter because the best collective decisions arise from disagreement and the contest of ideas, not from consensus or compromise.
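The effect of independence can be made concrete with a little probability. A minimal sketch: if each of n independent voters is correct with probability p = 0.7, the accuracy of an odd-sized majority vote follows the binomial distribution and climbs well above any single voter. Real models are only approximately independent, so the gain in practice is smaller:

```python
from math import comb

# Probability that a majority of n independent voters (n odd, so no ties),
# each correct with probability p, reaches the right answer.
def majority_accuracy(n: int, p: float) -> float:
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

for n in (1, 5, 15, 51):
    print(n, round(majority_accuracy(n, 0.7), 4))
# Accuracy rises from 0.70 for one voter toward 1.0 as voters are added
```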
Ensemble Methods
Ensemble methods work by combining the predictions of several models. The core idea is that a group of weak learners can be combined to form a strong learner.
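To make "weak learners into a strong learner" concrete, here is a minimal sketch using scikit-learn's VotingClassifier to majority-vote three deliberately modest base models (the dataset and model choices are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Three modest base models, each with its own strengths
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
]
for name, model in base_models:
    print(name, model.fit(X_train, y_train).score(X_test, y_test))

# Combine them by majority ("hard") vote
voter = VotingClassifier(estimators=base_models, voting="hard")
voter.fit(X_train, y_train)
print("ensemble", voter.score(X_test, y_test))
```

The ensemble is not guaranteed to beat every base model on every dataset, but it is typically at least as good as the average of its members.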
Types of Ensemble Methods
Bagging (Bootstrap Aggregating)
- Creates multiple versions of a model using bootstrap sampling
- Example: Random Forest
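A minimal bagging sketch using scikit-learn's BaggingClassifier (the `estimator` keyword assumes scikit-learn >= 1.2; older versions call it `base_estimator`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# 100 trees, each fit on a bootstrap sample (drawn with replacement)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    random_state=42,
)
print(cross_val_score(bagging, X, y, cv=5).mean())
```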
Boosting
- Sequentially trains models, with each new model focusing on the errors of previous ones
- Examples: AdaBoost, Gradient Boosting, XGBoost
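A minimal boosting sketch using AdaBoost, which by default boosts shallow decision stumps (the dataset here is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each round upweights the training points earlier rounds got wrong
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
print(f"AdaBoost accuracy: {ada.score(X_test, y_test):.3f}")
```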
Stacking
- Combines predictions from multiple models using a meta-learner
- Can use different types of base models
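A minimal stacking sketch using StackingClassifier: heterogeneous base models whose cross-validated predictions are combined by a logistic-regression meta-learner (the model choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Different kinds of base models feed a logistic-regression meta-learner
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print(f"Stacking accuracy: {stack.score(X_test, y_test):.3f}")
```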
Why Use Ensemble Methods?
- Improved Accuracy: Combining models often leads to better predictions than any single model
- Reduced Overfitting: Averaging reduces variance in predictions (see the sketch after this list)
- Robustness: Less sensitive to outliers and noise in the data
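A minimal sketch of the variance-reduction claim: averaging n independent, equally noisy predictions shrinks the error variance by a factor of n (correlated models, as in a real ensemble, shrink it less):

```python
import numpy as np

rng = np.random.default_rng(42)

# 10_000 trials: each "model" predicts the true value 0 plus unit-variance noise
single = rng.normal(0.0, 1.0, size=10_000)
averaged = rng.normal(0.0, 1.0, size=(10_000, 25)).mean(axis=1)

print(single.var())    # ~1.0
print(averaged.var())  # ~1/25 = 0.04
```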
Practical Implementation
Here's a simple example using scikit-learn:
```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Random Forest (bagging)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print(f"Random Forest Accuracy: {rf.score(X_test, y_test):.3f}")

# Gradient Boosting
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
print(f"Gradient Boosting Accuracy: {gb.score(X_test, y_test):.3f}")
```

Key Considerations
- Computational Cost: Ensemble methods require training multiple models
- Interpretability: Can be harder to understand than single models
- Hyperparameter Tuning: More parameters to optimize
Conclusion
Ensemble methods are essential tools in the machine learning toolkit. By combining multiple models, we can achieve better performance and more robust predictions.