97 Things Every SRE Should Know
Site reliability engineering (SRE) is more relevant than ever. Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE. You'll get actionable advice on several topics, including how to adopt SRE, why SLOs matter, when y...
SLO Adoption and Usage in Site Reliability Engineering
To realize the full benefits of SRE, organizations need well-thought-out reliability targets known as service level objectives (SLOs) that are measured by service level indicators (SLIs), a quantitative measure of an aspect of the service. As detailed in the following section, the measurable goals set forth in an organization's SLOs eliminate t...
Training Site Reliability Engineers
This book discusses how to train Site Reliability Engineers, or SREs. Before we go any further, we'd like to clarify the term "SRE." "SRE" means a variety of things:
- Site Reliability Engineer or a Site Reliability Engineering team, based on the context (singular, SRE, or plural, SREs)
- Site Reliability Engineering concep...
A Case Study in Community-Driven Software Adoption
Within an SRE organization, teams usually develop very different automation tools and processes for accomplishing similar tasks. Some of this can be explained by the software they support: different systems require different reliability solutions. But many SRE tasks are essentially the same across all software: compiling, building, deploying, canar...
Engineering Reliable Mobile Applications
Imagine a situation where your services report healthy and serving but you receive multiple user reports of poor availability. How are these users accessing your service? Most likely, they are using your service through a client application, such as a mobile application on their phone. SRE has traditionally supported only systems and services run i...
Building Secure and Reliable Systems
Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best prac...
Case Studies in Infrastructure Change Management
The Infrastructure Change Management (ICM) program at Google drives migrations, deprecations, and other large-scale infrastructure changes. Case studies in this book explore how infrastructure change projects are managed at Google. From these case studies, we'll provide insight into lessons learned from these different approaches, and provide ...
RubyFu
This book is a great collection of ideas, tricks, and skills that could be useful for hackers. It's a unique extraction reference that summarizes a lot of research and experience to help you achieve your w00t in the shortest and smartest way. Rubyfu is where you'll find plug-n-hack code. Rubyfu is a book to use not only to read, it's whe...