StackOverdrive provides a great alternative to maintaining a large infrastructure and operations team in your organization. Whether your team is under temporary pressure or you prefer to keep your engineering team focused on application development rather than operations, StackOverdrive can step into the breach.
We offer two Managed DevOps Support options, and we recommend both for clients without their own infrastructure teams.
Proactive Support – we stop problems before they start
StackOverdrive works with many clients, so you benefit from the experience we gain, and the developments we monitor, across many infrastructures. Our proactive support covers two broad categories of work:
- Fixing accidents waiting to happen, such as out-of-date software or software with a newly found security vulnerability; and
- Updating your infrastructure to meet your changing needs, from big-ticket items like commissioning new servers to relatively simple but important tweaks like updating firewall rules.
We work both as a second string for dedicated operations teams and as support for application development teams without dedicated operations specialists. Among the things we can do for you:
- Updating and patching your infrastructure as new vulnerabilities are found – dealing with so-called “zero-day vulnerabilities” and making sure you’re on the latest versions for security and bug fixes
- Running health checks daily (or at another agreed frequency) to make sure that no issues are building up
- Monitoring new additions to your infrastructure until they have settled down
- Ensuring backups happen
- Upgrading outdated software smoothly
- Setting up new hardware, and setting up software after the hardware is installed
- Routinely auditing your infrastructure
- Systematically monitoring your systems
- Updating firewall policies to fit your changing needs
- Managing PagerDuty and similar on-call communication tools
- Updating existing infrastructure-as-code scripts
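As an illustration of what a routine health check from the list above might look like, here is a minimal Python sketch. It is not StackOverdrive's actual tooling: the `CheckResult` type, the function names, the 90% disk threshold, and the 24-hour backup window are all assumptions chosen for the example.

```python
import os
import shutil
import time
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of a single health check (hypothetical structure)."""
    name: str
    ok: bool
    detail: str


def check_disk_usage(path: str, max_used_fraction: float = 0.9) -> CheckResult:
    """Flag a filesystem that is filling up. The 90% threshold is an assumption."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    return CheckResult(
        name=f"disk:{path}",
        ok=used_fraction <= max_used_fraction,
        detail=f"{used_fraction:.0%} used",
    )


def check_backup_age(backup_file: str, max_age_hours: float = 24.0) -> CheckResult:
    """Flag a backup that is missing or older than the allowed age."""
    if not os.path.exists(backup_file):
        return CheckResult(f"backup:{backup_file}", False, "backup file missing")
    age_hours = (time.time() - os.path.getmtime(backup_file)) / 3600
    return CheckResult(
        name=f"backup:{backup_file}",
        ok=age_hours <= max_age_hours,
        detail=f"last modified {age_hours:.1f} hours ago",
    )


def run_health_checks(data_path: str, backup_file: str) -> list[CheckResult]:
    """Run every check and return the results for reporting or alerting."""
    return [check_disk_usage(data_path), check_backup_age(backup_file)]
```

A scheduler such as cron could call `run_health_checks` at the agreed frequency and forward any result with `ok` set to `False` to the on-call tool.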
Reactive Support – 24×7 support for crises
Unfortunately, things sometimes fail unpredictably. StackOverdrive is available to help when things go down, at any time of day or night. We can fill gaps in your support schedule, take the pressure of providing 24×7 support off your team, and act as escalation support when your team needs a second opinion in a crisis. We work well with teams who know the specifics of their own infrastructure setup, while we contribute experience with the failure modes of the infrastructure you use.
Incident Reporting – ongoing monitoring and resolution of issues
We can provide first- or second-line handling of issues as they come up, taking the following steps to get things back on track:
- Identify that an incident is occurring
- Log the incident with details including date, time, and category
- Prioritize the incident as Low, Medium or High
- Respond based on our Service Level Agreement with you
- Fix the issue ourselves or escalate to the correct person as needed
- Once the issue is resolved, work out how to prevent the incident from happening again, either by leading postmortems or by conducting our own investigation on your behalf
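The steps above can be sketched as a small incident record in Python. This is an illustrative model only, not StackOverdrive's actual system: the `Incident` class, its field names, and the response targets in `SLA_RESPONSE` are hypothetical, and real targets would come from the Service Level Agreement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Priority(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Hypothetical response targets; real values come from the agreed SLA.
SLA_RESPONSE = {
    Priority.HIGH: timedelta(minutes=15),
    Priority.MEDIUM: timedelta(hours=4),
    Priority.LOW: timedelta(hours=24),
}


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Incident:
    # Logged details: category, priority, and the date and time it was opened.
    category: str
    priority: Priority
    summary: str
    opened_at: datetime = field(default_factory=_utc_now)
    escalated_to: Optional[str] = None
    resolved_at: Optional[datetime] = None
    follow_up: Optional[str] = None

    def respond_by(self) -> datetime:
        """Deadline for the first response under the assumed SLA targets."""
        return self.opened_at + SLA_RESPONSE[self.priority]

    def escalate(self, owner: str) -> None:
        """Record who the incident was handed to when we cannot fix it alone."""
        self.escalated_to = owner

    def resolve(self, follow_up: str, when: Optional[datetime] = None) -> None:
        """Close the incident and note the action meant to prevent a repeat."""
        self.resolved_at = when or _utc_now()
        self.follow_up = follow_up
```

For example, `Incident("database", Priority.HIGH, "primary unreachable").respond_by()` returns a deadline 15 minutes after the incident was opened under these assumed targets.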