HartfordRecruiter Since 2001
the smart solution for Hartford jobs

Cloud Service Reliability Engineer

Company: Data Robot
Location: Hartford, Connecticut
Posted on: May 3, 2021

Job Description:

As a Senior Service Reliability Engineer, you will own and improve the service reliability and availability of DataRobot's AI platform. You will be tasked with making DataRobot's AI/ML platform more reliable, efficient, and scalable. You will play a key role in how DataRobot's tools and practices enable seamless scale while preventing failures. As an SRE, you will be part of the team that builds and enables the DevSecOps toolchain while continuously improving our ML/AI platform at scale. You will contribute to the full service lifecycle, from service development to live service response, as we continuously deploy new and innovative functionality for our customers.

Responsibilities:
- Be familiar with AWS, GCP, and Azure architecture patterns and capabilities
- Be well versed in Software-Defined Networking definitions, capabilities, and limitations
- Handle high-pressure situations in a calm and professional manner
- Lead resolution efforts for complex service problems at scale, from the network layer to the application
- Motivate, encourage, and provide technical leadership to team members
- Work hand-in-hand with software developers to facilitate the adoption of "Paved Road" solutions
- Build and support large-scale services across multiple platforms (Azure, AWS, and GCP)
- Diagnose and repair issues by editing Node.js code, modifying MongoDB, Postgres, and Redis, and making configuration changes in cloud service providers
- Create, edit, and maintain ad hoc scripts to resolve issues quickly with minimal user impact
- Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention
- Support periodic on-call duty

Technologies:
- MongoDB, Mongo MMS, Node.js/IIS on AWS/GCP/Azure
- Demonstrable experience in one or more of Python, Perl, or PHP is a plus
- Strong knowledge of TCP/IP networking, SMTP, HTTP, load balancers, and highly available network servers
- GitHub/Artifactory/RabbitMQ, Application Performance Monitoring principles, CDN, DNS
- Knowledge of IP networking, network analysis, and diagnosing performance and application issues using tools such as Fiddler and Wireshark

Requirements:
- A passion for automating everything
- A passion for collaborating and tearing down communication silos
- Experience maintaining large-scale infrastructure (100+ servers minimum)
- 5+ years of experience with AWS
- 3+ years of experience with Terraform or CloudFormation
- 5+ years of experience with Linux (Ubuntu, Red Hat, or similar)

Qualifications:
- Bachelor's degree in CS, MIS, or equivalent experience
- 6+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, databases, and applications
- Solid communication skills

Individuals seeking employment at DataRobot are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.

Keywords: Data Robot, Hartford, Cloud Service Reliability Engineer, Other, Connecticut