Advanced Docker Enterprise Troubleshooting
In this service- and SRE-focused course, you will learn broadly applicable techniques for diagnosing platform and application failures in Docker Enterprise. We will cover first-response strategies for Swarm and Kubernetes applications, see how to identify and avoid the most common cluster failure modes, and practice troubleshooting and disaster recovery actions for UCP and DTR. This course is designed to help experienced Docker Enterprise operators handle a broad range of support needs themselves, reducing time to resolution and accelerating service case outcomes.
- CODE: CN310
- CATEGORIES: Mirantis CNA
DESCRIPTION
Who should participate
- Anyone looking to provide day-2 operations and support for production-grade Docker Enterprise clusters hosting mission-critical applications.
- SREs, support teams, or operators who manage Docker Enterprise.
Laboratory requirements
- Laptop with WiFi connectivity
- The latest version of Chrome or Firefox and a free account on strigo.io.
COURSE OBJECTIVES
- Containerized application diagnostic strategies
  - Container auditing and tracing tools
  - Workload tracing and troubleshooting
  - Network tracing
  - Triage and identification of real problems
- Logging & Monitoring Strategies
  - Platform and application data sources
  - Container log data manipulation and ingestion
- Docker Enterprise Documentation
  - Orient yourself in the documentation
  - Find documentation on usage, troubleshooting, and best practices
- UCP Support Dumps
  - Generate support dumps automatically and manually
  - Interpret the contents of support dumps
- Troubleshooting Resource Problems
  - Detect memory, CPU, and I/O constraints
  - Mitigate excessive resource consumption
- Troubleshooting Networking Problems
  - Review the Swarm network implementation
  - Common Swarm networking issues and mitigations
  - UCP network requirements, failures, and mitigations
  - Swarm and Kube DNS troubleshooting
- Troubleshooting UCP
  - Correlate UCP errors with UCP components and logs
  - Investigate state reconciliation errors with etcd and RethinkDB
- Troubleshooting DTR
  - Correlate DTR errors with DTR components and logs
  - DTR resourcing and sizing to mitigate performance errors
  - Audit DTR job logs and activity monitors
  - Automatic reset of DTR
- Disaster Recovery
  - Back up Swarm, UCP, and DTR
  - Restore from backups
ADDITIONAL INFORMATION
Duration – 3 days
Delivery – classroom, on site, or remote
PC and SW requirements:
- Internet connection
- Web browser (Google Chrome)
- Zoom
Language
Instructor: English
Workshops: English
Slides: English