Ensemble Methods
Ensemble methods are powerful techniques that combine multiple machine learning models to create a stronger, more accurate predictor.
The Wisdom of Crowds
Wisdom of the crowd is the process of taking into account the collective opinion of a group of individuals, rather than relying on a single expert, to answer a question.
Dr. Randy Forrest works at a teaching hospital and has a team with a diverse set of skills. Each resident has their own strengths.
Whenever Dr. Forrest gets a new case, he asks his residents for their opinions and then democratically picks the final diagnosis: the most common diagnosis among all those proposed.
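In code, that democratic vote is simply the mode of the proposed answers. A minimal sketch, with made-up diagnoses for illustration:

```python
from collections import Counter

# Hypothetical diagnoses proposed by the residents for one case
diagnoses = ["flu", "pneumonia", "flu", "flu", "bronchitis"]

# The democratic vote: pick the most common diagnosis
final_diagnosis = Counter(diagnoses).most_common(1)[0][0]
print(final_diagnosis)  # flu
```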
How This Relates to Ensemble Methods
Ensemble methods rely on the concept of the “wisdom of the crowd”: the combined answer of many models is often better than the answer of any single model.
Behind the success of ensemble methods are:
- Ensemble diversity (a variety of opinions to choose from)
- Model aggregation into a single final decision
Diversity and independence matter because the best collective decisions arise from disagreement and the contest of ideas, not from consensus or compromise.
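The effect of independence can be made concrete with a little probability. A minimal sketch: if each of n independent voters is correct with probability p = 0.7, the accuracy of an odd-sized majority vote follows the binomial distribution and climbs well above any single voter. Real models are only approximately independent, so the gain in practice is smaller:

```python
from math import comb

# Probability that a majority of n independent voters (n odd, so no ties),
# each correct with probability p, reaches the right answer.
def majority_accuracy(n: int, p: float) -> float:
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

for n in (1, 5, 15, 51):
    print(n, round(majority_accuracy(n, 0.7), 4))
# Accuracy rises from 0.70 for one voter toward 1.0 as voters are added
```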
Ensemble Methods
Ensemble methods work by combining the predictions of several models. The core idea is that a group of weak learners can be combined to form a strong learner.
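To make "weak learners into a strong learner" concrete, here is a minimal sketch using scikit-learn's VotingClassifier to majority-vote three deliberately modest base models (the dataset and model choices are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Three modest base models, each with its own strengths
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
]
for name, model in base_models:
    print(name, model.fit(X_train, y_train).score(X_test, y_test))

# Combine them by majority ("hard") vote
voter = VotingClassifier(estimators=base_models, voting="hard")
voter.fit(X_train, y_train)
print("ensemble", voter.score(X_test, y_test))
```

The ensemble is not guaranteed to beat every base model on every dataset, but it is typically at least as good as the average of its members.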
Types of Ensemble Methods
Bagging (Bootstrap Aggregating)
- Creates multiple versions of a model using bootstrap sampling
- Example: Random Forest
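A minimal bagging sketch using scikit-learn's BaggingClassifier (the `estimator` keyword assumes scikit-learn >= 1.2; older versions call it `base_estimator`):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# 100 trees, each fit on a bootstrap sample (drawn with replacement)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    random_state=42,
)
print(cross_val_score(bagging, X, y, cv=5).mean())
```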
Boosting
- Sequentially trains models, with each new model focusing on the errors of previous ones
- Examples: AdaBoost, Gradient Boosting, XGBoost
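A minimal boosting sketch using AdaBoost, which by default boosts shallow decision stumps (the dataset here is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each round upweights the training points earlier rounds got wrong
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
print(f"AdaBoost accuracy: {ada.score(X_test, y_test):.3f}")
```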
Stacking
- Combines predictions from multiple models using a meta-learner
- Can use different types of base models
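A minimal stacking sketch using StackingClassifier: heterogeneous base models whose cross-validated predictions are combined by a logistic-regression meta-learner (the model choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Different kinds of base models feed a logistic-regression meta-learner
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print(f"Stacking accuracy: {stack.score(X_test, y_test):.3f}")
```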
Why Use Ensemble Methods?
- Improved Accuracy: Combining models often leads to better predictions than any single model
- Reduced Overfitting: Averaging reduces variance in predictions (see the sketch after this list)
- Robustness: Less sensitive to outliers and noise in the data
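A minimal sketch of the variance-reduction claim: averaging n independent, equally noisy predictions shrinks the error variance by a factor of n (correlated models, as in a real ensemble, shrink it less):

```python
import numpy as np

rng = np.random.default_rng(42)

# 10_000 trials: each "model" predicts the true value 0 plus unit-variance noise
single = rng.normal(0.0, 1.0, size=10_000)
averaged = rng.normal(0.0, 1.0, size=(10_000, 25)).mean(axis=1)

print(single.var())    # ~1.0
print(averaged.var())  # ~1/25 = 0.04
```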
Practical Implementation
Here's a simple example using scikit-learn:
```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Random Forest (bagging)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print(f"Random Forest Accuracy: {rf.score(X_test, y_test):.3f}")

# Gradient Boosting
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
print(f"Gradient Boosting Accuracy: {gb.score(X_test, y_test):.3f}")
```

Key Considerations
- Computational Cost: Ensemble methods require training multiple models
- Interpretability: Can be harder to understand than single models
- Hyperparameter Tuning: More parameters to optimize
Conclusion
Ensemble methods are essential tools in the machine learning toolkit. By combining multiple models, we can achieve better performance and more robust predictions.