Introduction to ML
Contributors: Razi
Machine learning is a subset of artificial intelligence (AI): rather than being explicitly programmed for a task, a machine-learning system improves at the task by learning from data.
Definition
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Tom Mitchell, 1998)
An automated process that extracts patterns from data. (Kelleher et al., 2015)
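For example, in a spam filter the task T is classifying incoming emails, the performance measure P is the fraction of emails classified correctly, and the experience E is a corpus of emails already labeled as spam or not.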
Types of Learning
- Supervised Learning: the model learns from labeled examples (input-output pairs), as in classification and regression; see the sketch after this list.
- Unsupervised Learning: the model finds structure in unlabeled data, as in clustering and dimensionality reduction.
- Reinforcement Learning: an agent learns by trial and error, choosing actions to maximize a reward signal from its environment.
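To make the first of these concrete, here is a minimal supervised-learning sketch (our illustration, not part of the original notes), fitting a classifier to scikit-learn's bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Supervised learning: each row of X is paired with a label in y.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Demo only: predicting on (training) samples the model has seen.
print(clf.predict(X[:3]))
```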
One family of techniques worth knowing in this space is ensemble methods, which work by aggregating predictions from multiple models. The key insight is that a group of weak learners can be combined to form a strong learner.
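As a back-of-the-envelope illustration (ours, and it assumes the voters err independently, which real models never quite do), the probability that a majority vote of weak classifiers is correct grows quickly with the number of voters:

```python
from math import comb

def majority_accuracy(n, p):
    """Probability that a majority vote of n independent classifiers,
    each correct with probability p, gives the right answer."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Each voter is only 60% accurate, but the ensemble approaches certainty.
for n in (1, 11, 101):
    print(f"{n:>3} voters: {majority_accuracy(n, 0.6):.3f}")
```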
Types of Ensemble Methods
Bagging (Bootstrap Aggregating)
- Creates multiple versions of a model using bootstrap sampling
- Example: Random Forest
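To show what bootstrap sampling actually does, here is a small NumPy sketch (an illustration of ours, not from the original):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10  # pretend the training set has 10 rows, indexed 0..9

# Each bootstrap sample draws n rows *with replacement*, so some rows
# repeat and roughly a third are left out ("out-of-bag") each time.
for i in range(3):
    sample = rng.choice(n, size=n, replace=True)
    print(f"bootstrap sample {i}: {sorted(sample)}")
```

Each bagged model is trained on one such resample, and their predictions are then averaged or voted on.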
Boosting
- Sequentially trains models, with each new model focusing on the errors of previous ones
- Examples: AdaBoost, Gradient Boosting, XGBoost
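A minimal AdaBoost sketch with scikit-learn (ours; the dataset and hyperparameters are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost trains weak learners (decision stumps by default) in sequence,
# upweighting the points that previous rounds got wrong.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
print(f"AdaBoost accuracy: {ada.score(X_test, y_test):.3f}")
```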
Stacking
- Combines predictions from multiple models using a meta-learner
- Can use different types of base models
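scikit-learn ships a StackingClassifier that implements this pattern; here is a minimal sketch (ours, with arbitrarily chosen base models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two different kinds of base model; a logistic-regression meta-learner
# combines their (internally cross-validated) predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(random_state=0))],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print(f"Stacking accuracy: {stack.score(X_test, y_test):.3f}")
```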
Why Use Ensemble Methods?
- Improved Accuracy: Combining models often leads to better predictions than any single model
- Reduced Overfitting: averaging reduces the variance of predictions (see the toy simulation after this list)
- Robustness: Less sensitive to outliers and noise in the data
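A toy simulation (ours, assuming independent, unbiased prediction errors) makes the variance-reduction point concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value, noise = 1.0, 0.5

# 10,000 trials: one noisy prediction vs. the mean of 25 noisy predictions.
single = rng.normal(true_value, noise, size=10_000)
averaged = rng.normal(true_value, noise, size=(10_000, 25)).mean(axis=1)

print(f"variance of a single model: {single.var():.4f}")   # ~ 0.25
print(f"variance of 25-model mean : {averaged.var():.4f}")  # ~ 0.25 / 25
```

Real ensemble members are correlated, so the reduction is smaller in practice, but the direction of the effect is the same.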
Practical Implementation
Here's a simple example using scikit-learn:

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed for reproducibility
)

# Random Forest (bagging)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print(f"Random Forest Accuracy: {rf.score(X_test, y_test):.3f}")

# Gradient Boosting
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
print(f"Gradient Boosting Accuracy: {gb.score(X_test, y_test):.3f}")
```

Key Considerations
- Computational Cost: Ensemble methods require training multiple models
- Interpretability: Can be harder to understand than single models
- Hyperparameter Tuning: More parameters to optimize
Conclusion
Ensemble methods are essential tools in the machine learning toolkit. By combining multiple models, we can achieve better performance and more robust predictions.