Description of assignment 43920 Göteborg
We are seeking a Senior Site Reliability Engineer who excels on the operational side of DevOps. Attention to detail, proactivity, and problem-solving skills are key, as is the ability to communicate and collaborate effectively.
Job description
Position: Senior SRE within Platform Operations and Support
• A service-minded team player with a quality-driven approach.
• Manage and dispatch incidents and service requests.
• Provide high-quality support, drive troubleshooting and root cause analyses (RCAs), and act as an advisor to development teams.
• Be responsible for maintaining platform availability, shortening time to market for new features, and improving performance.
• Play a crucial role in troubleshooting and quality assurance from an end-to-end perspective.
• Focus on understanding, monitoring, and improving the production system, actively preventing future incidents.
• Lead the way in continuous improvement and innovation.
Overview of responsibilities
System Support and Troubleshooting
• Guide and coordinate junior colleagues within the team.
• Assist with initial technical analysis of production incidents.
• Support development teams in building alerting and monitoring capabilities.
• Conduct code reviews for reported cases and support the development and delivery of fixes.
Infrastructure Automation and Configuration Management
• Develop and maintain automation tools, scripts, and configuration management systems.
• Implement Infrastructure as Code (IaC) practices using tools like Ansible, Terraform, or Kubernetes.
• Collaborate with development and operations teams to automate build, test, and deployment processes.
Reliability Engineering and Resilience
• Design and implement systems and processes to enhance infrastructure reliability and resilience.
• Continuously improve system reliability by analyzing logs and trends, identifying areas for improvement, and implementing preventative measures.
System Monitoring and Incident Response
• Develop and manage monitoring tools and systems to track software and infrastructure health, performance, security, and availability.
• Set up alerts, dashboards, and metrics for proactive detection and response to incidents.
• Investigate and diagnose the root causes of incidents and resolve them in a timely manner.
Continuous Improvement and Collaboration
• Drive a culture of continuous improvement by identifying areas for automation and efficiency.
• Document procedures, incidents, and best practices for knowledge sharing and team efficiency.
• Stay updated on industry trends and emerging technologies to propose innovative solutions.
• Collaborate closely with cross-functional teams to ensure smooth operation of systems.
Required skills & experience
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience), with 5+ years of DevOps/SRE work.
• Proficient in scripting/programming languages such as Python and Bash.
• Experience with cloud platforms (AWS preferred).
• Experience with DevOps practices, CI/CD, and monitoring tools.
• Experience with automation tools and configuration management frameworks such as Terraform, AWS CDK, Puppet, or Ansible.
• Strong troubleshooting and problem-solving skills with keen attention to detail.
• Excellent communication and collaboration skills to work effectively in a cross-functional team environment.
• Strong experience in system administration, infrastructure management, or site reliability engineering.
Location: Gothenburg, minimum 3 days on site
Language: Fluent English
Date: 31 May 2024 to 31 May 2025