Course Outline

Introduction to AIOps with Open Source Solutions

  • Overview of AIOps concepts and advantages
  • The role of Prometheus and Grafana in the observability stack
  • The place of Machine Learning in AIOps: predictive versus reactive analytics

Setting Up Prometheus and Grafana

  • Installation and configuration of Prometheus for time series data collection
  • Designing Grafana dashboards with real-time metrics
  • Exploring exporters, relabeling techniques, and service discovery

Data Preprocessing for Machine Learning

  • Extracting and transforming Prometheus metrics
  • Preparing datasets for anomaly detection and forecasting tasks
  • Leveraging Grafana’s transformation features or Python pipelines
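
As a rough illustration of the extraction and preparation steps above, the sketch below parses the JSON body of a Prometheus range query (the `matrix` result type) and aligns the samples to a fixed grid with forward fill. The function names, the sample payload, and the forward-fill choice are assumptions made for this example, not part of the course material.

```python
import json


def matrix_to_series(response_text):
    """Convert a Prometheus range-query JSON body into {label_key: [(ts, value), ...]}."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    data = payload["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a matrix result")
    series = {}
    for item in data["result"]:
        # Use the sorted label set as a stable key for each series.
        key = tuple(sorted(item["metric"].items()))
        # Prometheus encodes sample values as strings; convert them to floats.
        series[key] = [(float(ts), float(v)) for ts, v in item["values"]]
    return series


def resample_forward_fill(samples, start, end, step):
    """Align samples to a fixed grid, carrying the last seen value across gaps."""
    aligned = []
    last = None  # stays None until the first sample has been seen
    idx = 0
    steps = int((end - start) // step)
    for i in range(steps + 1):
        t = start + i * step
        while idx < len(samples) and samples[idx][0] <= t:
            last = samples[idx][1]
            idx += 1
        aligned.append((t, last))
    return aligned


# Hypothetical response body with one missing scrape at t=30.
SAMPLE = (
    '{"status":"success","data":{"resultType":"matrix","result":['
    '{"metric":{"__name__":"up","job":"node"},'
    '"values":[[0,"1"],[15,"1"],[45,"0"]]}]}}'
)

series = matrix_to_series(SAMPLE)
key = (("__name__", "up"), ("job", "node"))
aligned = resample_forward_fill(series[key], start=0, end=45, step=15)
```

Forward fill is only one option for handling missed scrapes; interpolation or dropping incomplete windows may suit some models better.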

Applying Machine Learning for Anomaly Detection

  • Fundamental ML models for outlier detection (e.g., Isolation Forest, One-Class SVM)
  • Training and evaluating models on time series datasets
  • Visualizing detected anomalies within Grafana dashboards
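
The outline names Isolation Forest and One-Class SVM as example models. The sketch below deliberately swaps in a simpler baseline, a trailing-window robust z-score based on the median absolute deviation (MAD), to show the general shape of an outlier detector on a metric series using only the standard library. The window size, threshold, and sample values are assumptions chosen for illustration.

```python
from statistics import median


def mad_anomalies(values, window=10, threshold=3.5):
    """Return indices whose value deviates strongly from the trailing window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        if mad == 0:
            # Flat history: as a simplifying choice, flag any departure
            # from the constant level.
            if values[i] != center:
                flagged.append(i)
            continue
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(values[i] - center) / mad
        if score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilisation samples (percent) with one spike at index 11.
cpu = [50, 51, 49, 50, 52, 50, 49, 51, 50, 50, 51, 95, 50, 49]
anomalies = mad_anomalies(cpu)
```

A baseline like this is often useful as a reference point when evaluating a trained model: if the model does not beat it on labeled incidents, the added complexity may not be justified.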

Forecasting Metrics with Machine Learning

  • Constructing basic forecasting models (ARIMA, Prophet, introduction to LSTM)
  • Predicting system load and resource utilization
  • Utilizing predictions to inform early alerting and scaling strategies
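
To make the link between forecasting and early alerting concrete, the sketch below fits a least-squares line to recent samples and estimates how long until a threshold is crossed. This is a simple linear stand-in, not ARIMA, Prophet, or an LSTM; the function names, the 90% limit, and the sample data are assumptions for the example.

```python
def fit_line(points):
    """Ordinary least-squares fit of value against time; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / var_t
    intercept = mean_v - slope * mean_t
    return slope, intercept


def seconds_until(points, limit):
    """Estimate seconds from the last sample until the fitted line reaches `limit`.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - points[-1][0])


# Hypothetical disk usage (percent) sampled hourly, growing 2 points per hour.
disk = [(0, 60.0), (3600, 62.0), (7200, 64.0), (10800, 66.0)]
eta = seconds_until(disk, limit=90.0)  # roughly 12 hours of headroom
```

An estimate like `eta` could feed an early warning ("disk full within 24 hours") or a scaling decision, long before a static threshold would fire.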

Integrating Machine Learning with Alerting and Automation

  • Establishing alert rules based on ML outputs or predefined thresholds
  • Configuring Alertmanager and notification routing
  • Triggering scripts or automation workflows upon anomaly detection
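
One possible way to connect a detector to notification routing is to push alerts to Alertmanager, which accepts a JSON array of alert objects with `labels`, `annotations`, `startsAt`, and `endsAt` fields. The sketch below only builds that payload and does not send it; the alert name, label values, and five-minute hold time are assumptions for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def build_alert(metric, score, detected_at, hold=timedelta(minutes=5)):
    """Build one alert object for an anomaly found by an ML detector."""
    return {
        "labels": {
            "alertname": "MLAnomalyDetected",  # hypothetical alert name
            "metric": metric,
            "severity": "warning",
        },
        "annotations": {
            "summary": f"Anomaly score {score:.2f} on {metric}",
        },
        # RFC 3339 timestamps; the alert expires at endsAt unless re-sent.
        "startsAt": detected_at.isoformat(),
        "endsAt": (detected_at + hold).isoformat(),
    }


detected = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
alert = build_alert("node_cpu_utilisation", 30.35, detected)
# Alertmanager expects a list of alerts, even when sending a single one.
body = json.dumps([alert])
```

The labels set here are what Alertmanager routing rules would match on, so a label such as `severity` determines which receiver is notified.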

Scaling and Operationalizing AIOps

  • Integrating external observability tools (e.g., ELK stack, Moogsoft, Dynatrace)
  • Deploying ML models within observability pipelines
  • Best practices for implementing AIOps at scale

Summary and Next Steps

Requirements

  • A foundational understanding of system monitoring and observability principles
  • Practical experience working with Grafana or Prometheus
  • Proficiency in Python and familiarity with core machine learning concepts

Target Audience

  • Observability engineers
  • Infrastructure and DevOps teams
  • Monitoring platform architects and Site Reliability Engineers (SREs)

Duration: 14 Hours
