At first glance, LightStep [x]PM is just another Application Performance Management tool. It checks all the boxes in terms of features, speeds and feeds. It makes it easy for customers that run large-scale, complex software systems to understand how the system is behaving, enables root-cause analysis during incidents and firefighting efforts, and helps make the system faster in a steady state. It’s a great tool to get a grip on ‘what talks to what’ and see how things depend on each other.

Something’s different about [x]PM, though, and it is not that easy to pinpoint. Maybe because even LightStep itself struggles with the APM label? Let’s dive into a number of things that stand out.

The people

The founders of LightStep designed and deployed massive-scale monitoring technologies, called Dapper and Monarch, at Google. They have hands-on experience building APM tooling on a global scale.

This also means they have first-hand experience with microservices, moving away from monoliths, containers and all these other ‘new stack’ things that organizations struggle with.

Conway’s Law

They’ve seen the impact of moving towards a microservices architecture on organizations, as well as the impact of the organizational chart on the architecture (known as Conway’s Law). This goes far beyond tech alone, and it shows. Ben and Spoons are both exceptional conversationalists who talk with ease about context, business models, the theory behind monitoring and much more, telling a compelling story about where LightStep fits in. They are able to educate the market, positioning [x]PM in the APM field in a new and exciting way.

The mission

This all indicates that they know which issues they’re trying to solve. They’ve experienced those issues first-hand. In production. At scale. Massively distributed across many services. With insane concurrency.

Now, this is not to say customers need to do things The Google Way, but a team that understands the extremes of that global scale has an advantage in creating a product for us mere mortals. Tom Hollingsworth posted about the Cargo Cult of Google as a result of how LightStep approaches this problem.

Because, you know, ‘distributed’ is a way of saying ‘x, but harder’, they made solving performance issues across those massively distributed architectures their hyper-focused mission.

Microservice transactions feel like Rube Goldberg machines in practice
Ben Sigelman, LightStep

The reality is that, while moving towards a microservices architecture, visibility is usually reduced. Teams developing and operating don’t know what’s going on in their systems and have little to no control; asynchronous concurrency and massive distribution pose huge problems for organizations.

This usually results in overcompensation with logging and metrics (and high SIEM cost to boot), finger-pointing between teams, public outages and a declining number of new releases.

This is where [x]PM aims to solve issues: telling a coherent story on transactions across the distribution and concurrency, abstracting away those complexities and surfacing the right information and insights.

The product

So they know how to build an APM product and they know all about microservices (‘the new stack’). That’s basically the ‘what’ of their [x]PM product: APM for microservices. But they still do APM for monoliths, too, as most customers are somewhere on the journey from monoliths to microservices. A journey that never really ends, too.

But it goes a lot further than that. They cleverly use a decentralized architecture, optimizing for scale on the SaaS-part, while optimizing for cost and efficiency on the on-prem part.

This architecture allows them to process and analyze 100% of transactions across all services in production without massive storage requirements. This is no small feat, and makes LightStep unique, because APM solutions usually generate massive amounts of data with a very negative effect on cost and usability (ever tried searching multiple terabytes for a specific trace?).

To prevent sending the data LightStep collects to the far-away SaaS-service (and possibly incurring extra network traffic and associated cost), the architecture leverages ‘on-prem’ satellites (emphasis on ‘on-prem’ as this means ‘running near the application’, not necessarily ‘in a customer-local datacenter’) to temporarily store trace information and process it locally, before sending the processed results to the SaaS.
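To make the idea of local processing concrete, here is a minimal sketch in Python. It is not LightStep's implementation: the `Span` and `Satellite` names, the bounded buffer, and the choice of summary statistics are all assumptions made purely for illustration. It only shows the general pattern of holding raw trace data near the application for a short while and shipping a compact summary upstream instead of every span.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified record of one operation in a transaction."""
    service: str
    duration_ms: float
    error: bool = False


class Satellite:
    """Hypothetical local collector: buffers recent spans, summarizes them."""

    def __init__(self, max_spans=10_000):
        # Bounded buffer: once full, the oldest span is dropped for each new one,
        # so raw data is only stored temporarily.
        self.buffer = deque(maxlen=max_spans)

    def record(self, span):
        self.buffer.append(span)

    def summarize(self):
        """Reduce the buffered spans to small per-service statistics."""
        by_service = defaultdict(list)
        for span in self.buffer:
            by_service[span.service].append(span)
        return {
            service: {
                "count": len(spans),
                "errors": sum(s.error for s in spans),
                "max_ms": max(s.duration_ms for s in spans),
            }
            for service, spans in by_service.items()
        }
```

In this sketch, only the dictionary returned by `summarize()` would cross the network to the central service, which is what keeps traffic and central storage small even though every span is looked at locally.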

The customers

LightStep came out of stealth mode in 2017 with [x]PM, and had a number of big launch customers, from Lyft to Yext to DigitalOcean. These are not small customers by any stretch of the imagination, and having them on record talking about their experience is unique in the space.

There are multiple reasons their customers use LightStep, but a thing called ‘Developer Velocity’ seems to be a major concern in an increasingly complex distributed environment. Having a single source of truth to see a complete, reliable picture of the system in real time, across geo-distributed teams, helps in discovering and triaging performance problems quickly. This fast and easy root cause analysis reduces the time-to-fix and improves availability, which is obviously a major consideration for customers.

The common theme is that moving from monoliths to a distributed microservices architecture is hard. The exponential growth of service relations, growing transaction and data volumes and a growing number of development teams call for end-to-end visibility to optimize performance and minimize errors and downtime.

Using LightStep allows engineering teams to spend less time on operational issues around performance, availability or reliability, freeing up time to add more features or remove technical debt, and generally being more productive and efficient.

Many of these customers share a certain profile, too. They are companies that bet big on digital business, contribute to open source projects and are becoming ‘software companies’.

Different, but good

So LightStep are different. They have a fresh take on the APM-field and focus on issues real-world customers have when moving from monoliths to a microservices architecture. This makes them uniquely positioned to help a growing number of customers that start on their journey towards microservices and the cloud. 

Clearly, the makeup of modern applications is different from those in the past, but it’s not like existing APM vendors are ignoring that change. Datadog, New Relic, AppDynamics and others are all promising results similar to LightStep’s. That said, companies like Twilio don’t leverage tools that don’t deliver outcomes, so clearly LightStep is doing something right. The APM space is pretty busy, so it will be interesting to see how big a niche LightStep can carve out for itself.