Course Outline
Introduction to AIOps with Open Source Solutions
- Overview of AIOps concepts and advantages
- The role of Prometheus and Grafana in the observability stack
- The place of Machine Learning in AIOps: predictive versus reactive analytics
Setting Up Prometheus and Grafana
- Installation and configuration of Prometheus for time series data collection
- Designing Grafana dashboards from real-time metrics
- Exploring exporters, relabeling techniques, and service discovery
Data Preprocessing for Machine Learning
- Extracting and transforming Prometheus metrics
- Preparing datasets for anomaly detection and forecasting tasks
- Leveraging Grafana’s transformation features or Python pipelines
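As a rough illustration of the extraction step, the sketch below flattens a response from the Prometheus HTTP API range query endpoint (`/api/v1/query_range`) into plain rows that could feed a dataset. The sample payload, the metric and instance names, and the helper `matrix_to_rows` are made up for illustration; only the response layout (a `matrix` result whose `values` are `[timestamp, "value"]` pairs) follows the Prometheus API.

```python
import json

# Hypothetical sample shaped like a Prometheus /api/v1/query_range response.
SAMPLE = """
{"status": "success",
 "data": {"resultType": "matrix",
          "result": [{"metric": {"__name__": "node_load1",
                                 "instance": "host-a:9100"},
                      "values": [[1700000000, "0.50"],
                                 [1700000060, "0.75"]]}]}}
"""


def matrix_to_rows(payload):
    """Flatten a matrix result into (instance, timestamp, value) tuples."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    rows = []
    for series in body["data"]["result"]:
        instance = series["metric"].get("instance", "")
        for ts, raw in series["values"]:
            # Prometheus encodes sample values as strings; convert to float.
            rows.append((instance, float(ts), float(raw)))
    return rows


rows = matrix_to_rows(SAMPLE)
```

In practice the payload would come from an HTTP request to a running Prometheus server rather than an inline string, and the rows would typically be loaded into a dataframe for further cleaning.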
Applying Machine Learning for Anomaly Detection
- Fundamental ML models for outlier detection (e.g., Isolation Forest, One-Class SVM)
- Training and evaluating models on time series datasets
- Visualizing detected anomalies within Grafana dashboards
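To give a feel for outlier detection on a metric series, here is a minimal sketch that uses a trailing-window z-score. This is a deliberately simpler stand-in for the models named above (Isolation Forest, One-Class SVM), not one of them. The function name `zscore_anomalies`, the window size, the threshold, and the sample CPU values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change from the constant as a deviation.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilization samples with one obvious spike at index 6.
cpu = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.95, 0.51, 0.50]
anomalies = zscore_anomalies(cpu)
```

The flagged indices could then be written back as a separate series or annotation so that they appear alongside the original metric on a dashboard.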
Forecasting Metrics with Machine Learning
- Constructing basic forecasting models (ARIMA, Prophet, introduction to LSTM)
- Predicting system load and resource utilization
- Utilizing predictions to inform early alerting and scaling strategies
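The idea of using a prediction to alert early can be sketched with a plain least-squares trend line. This is a much simpler stand-in for the forecasting models listed above (ARIMA, Prophet, LSTM), and the helper names `fit_line` and `steps_until`, the 90% threshold, and the sample disk readings are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until(threshold, xs, ys):
    """Estimate how many x-units after the last sample the fitted trend
    reaches `threshold`; returns None if the trend is flat or falling."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - xs[-1])


# Hypothetical hourly disk usage readings, in percent.
hours = [0, 1, 2, 3, 4]
disk_pct = [60.0, 62.0, 64.0, 66.0, 68.0]
remaining = steps_until(90.0, hours, disk_pct)
```

With these sample readings the trend grows by two percentage points per hour, so the estimate is 11 hours until the 90% mark, which leaves room to alert or scale before the threshold is actually reached.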
Integrating Machine Learning with Alerting and Automation
- Defining alert rules based on ML outputs or predefined thresholds
- Configuring Alertmanager and notification routing
- Triggering scripts or automation workflows upon anomaly detection
Scaling and Operationalizing AIOps
- Integrating external observability tools (e.g., ELK stack, Moogsoft, Dynatrace)
- Deploying ML models within observability pipelines
- Best practices for implementing AIOps at scale
Summary and Next Steps
Requirements
- A foundational understanding of system monitoring and observability principles
- Practical experience working with Grafana or Prometheus
- Proficiency in Python and familiarity with core machine learning concepts
Target Audience
- Observability engineers
- Infrastructure and DevOps teams
- Monitoring platform architects and Site Reliability Engineers (SREs)
Duration: 14 hours