Lead Infrastructure Developer job in Houston, TX | Recruit Arrow




Lead Infrastructure Developer
Location: Houston, Texas
Job reference #: PJQD339285
 
Job Responsibilities and Requirements:
In the role of Site Reliability Engineer, you will be responsible for:
- Automating operational tasks: analyzing existing business-as-usual (BAU) tasks and ensuring that 80% of them are automated through self-healing
- Automating incident resolution: reviewing the root cause analysis of all major incidents and ensuring that 60% are resolved by automation after their first occurrence
- Developing and maintaining runbooks that fully enable automated failover of an application
- Analyzing the system with frameworks such as Chaos Monkey to identify weak points in the architecture, and working with development and engineering teams to handle all incident situations
- Developing a Continuous Integration and Continuous Delivery model with products such as Jenkins, so that delivery of new content can be fully automated through pipelines that perform automated integration and functional testing
- Leveraging application performance monitoring (APM) products such as AppDynamics, Dynatrace, and Splunk to analyze production code, and partnering with engineering and development teams to proactively identify gaps in production releases and prevent incidents
- Partnering with application and engineering teams to identify all monitoring requirements and ensure coverage at every stack of the application (UX, Web, App, Middleware, DB) using the firm's standard monitoring solutions
- Working with capacity management tools to proactively monitor performance and identify gaps, so that tuning or capacity increases are completed before incidents occur
- Providing 24x5 coverage, with on-call support on weekends, in a follow-the-sun model for support and remediation of all major incidents
- Partnering with application owners to understand all audit/CSA/RCSA requirements and to deliver solutions, both proactively and reactively, that ensure satisfactory and compliant results

Qualifications (Site Reliability Engineer):
- Software expertise, with the ability to code and script in multiple languages (PowerShell, Python, Ansible, Puppet, Shell, etc.) and other open-source technologies and tools
- Hands-on experience with, and a strong understanding of, infrastructure (operating systems, web, middleware, and networking)
- Background in an engineering or support role managing infrastructure
- Strong knowledge of Continuous Integration and Continuous Delivery
- Experience with Scrum/Agile development methodologies
- Ability to deliver on multiple competing priorities with little supervision
- Excellent verbal and written communication skills
- Computer Science or similar degree, with experience in the following software/tools:
  - Infrastructure automation technologies: Ansible, Puppet, Chef, etc.
  - Declarative programming languages: YAML, Ruby, etc.
  - Scripting languages: Python, Perl, Shell, etc.

This particular job is currently not active. However, since our clients regularly share similar and other job openings with us, we strongly recommend that you submit your resume. We will review it and contact you as soon as a suitable vacancy opens, to discuss your interest in exploring the opportunity. Assisting you is our highest priority.

Please be assured that none of your materials will be forwarded to any employer without your consent. Of course, all inquiries are kept strictly confidential.

