Description of assignment 43920, Gothenburg

We are seeking a Senior Site Reliability Engineer who excels at working on the operational side of DevOps. Attention to detail, proactivity, and problem-solving skills are key, as is the ability to communicate and collaborate effectively.

Job description
Position: Senior SRE Engineer within Platform Operations and Support
• Be a service-minded team player with a quality-driven approach.
• Manage and dispatch incidents and service requests.
• Provide high-quality support, drive troubleshooting and root cause analyses (RCAs), and act as an advisor to development teams.
• Be responsible for maintaining platform availability, shortening time to market for new features, and improving performance.
• Play a crucial role in troubleshooting and quality assurance from an end-to-end perspective.
• Focus on understanding, monitoring, and improving the production system, actively preventing future incidents.
• Lead the way on continuous improvement and innovation.

Overview of responsibilities

System support & troubleshooting
• Guide and coordinate junior colleagues within the team.
• Assist in the initial technical analysis of production incidents.
• Support the development team in building capabilities for alerts and monitoring.
• Conduct code reviews for reported cases and for the development and delivery of fixes.

Infrastructure Automation and Configuration Management
• Develop and maintain automation tools, scripts, and configuration management systems.
• Implement Infrastructure as Code (IaC) practices using tools like Ansible, Terraform, or Kubernetes.
• Collaborate with development and operations teams to automate build, test, and deployment processes.

Reliability Engineering and Resilience
• Design and implement systems and processes to enhance infrastructure reliability and resilience.
• Continuously improve system reliability by analyzing logs and trends, identifying areas for improvement, and implementing preventative measures.
System Monitoring and Incident Response
• Develop and manage monitoring tools and systems to track software and infrastructure health, performance, security, and availability.
• Set up alerts, dashboards, and metrics for proactive detection of and response to incidents.
• Investigate and diagnose the root causes of incidents and work towards timely resolution.

Continuous Improvement and Collaboration
• Drive a culture of continuous improvement by identifying areas for automation and efficiency.
• Document procedures, incidents, and best practices for knowledge sharing and team efficiency.
• Stay updated on industry trends and emerging technologies to propose innovative solutions.
• Collaborate closely with cross-functional teams to ensure smooth operation of systems.

Required skills & experience
• Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience) with 5+ years of DevOps/SRE work.
• Proficiency in scripting/programming languages such as Python and Bash.
• Experience with cloud platforms (AWS preferred).
• Experience with DevOps practices, CI/CD, and monitoring tools.
• Experience with automation tools and configuration management frameworks such as Terraform, AWS CDK, Puppet, or Ansible.
• Strong troubleshooting and problem-solving skills with keen attention to detail.
• Excellent communication and collaboration skills to work effectively in a cross-functional team environment.
• Strong experience in system administration, infrastructure management, or site reliability engineering.

Location: Gothenburg, minimum 3 days on site
Language: Fluent English
Date: 31 May 2024 to 31 May 2025