From time to time I’m being asked, “What do you do?” or “What is SRE?”
I always start by saying “Site Reliability Engineer” or “DevOps, ya know?”
yea – it cost nothing, easy, and very self-explanatory… 😂
NOT.
SRE (Site Reliability Engineers) from my perspective are the natural evolution of system admin engineers, where we solve and research issues on operation, performance, observability, tooling, and platform from the engineering point of view – writing design docs and executing with code.
Besides executing our own projects in the reliability domain, we also mentor and guide the culture of the developers teams – providing them tools and documentation for a self-service approach.
The typical area of operations for the SRE that I’ve witnessed would be:
- Reliability (duh)
- Infrastructure (Terraform, Ansible, Rundeck, etc…)
- Observability (Logs, Metrics and Tracing)
- Performance (eBPF)
- Tooling (Bash, Python and GoLang)
- On-Call (First line of defense)
- Incident Management
No Comments
Leave a comment Cancel