Jamstack Radio
28 MIN

Ep. #132, Open Source Observability with Pranay Prateek of SigNoz

about the episode

In episode 132 of Jamstack Radio, Brian speaks with Pranay Prateek of SigNoz. This talk explores application performance monitoring, insights on utilizing OpenTelemetry, and reasons to consider open source observability solutions instead of those from SaaS vendors.

Pranay Prateek is the Co-Founder and a maintainer of SigNoz. He previously held the position of product manager at both Microsoft and DocsApp.

transcript

Brian Douglas: Welcome to another installment of Jamstack Radio. On the line we've got Pranay. Pranay, you're with SigNoz. How are you doing?

Pranay Prateek: Hey, BDougie, thanks for inviting me. Doing good.

Brian: Do you want to introduce yourself to the audience and tell us what you do and how you got here?

Pranay: Yeah, sure. So I'm Pranay, I'm one of the co-founders and maintainers at SigNoz. SigNoz is an open source observability platform. You can think of us as an alternative to Datadog, New Relic. We started two years back as part of Y Combinator, with the '21 batch. Before that, I'm an engineer by training, worked at Microsoft in the product management team, and before starting SigNoz I was leading a product team.

That's where we faced this problem of observability: if you're running decently complicated infrastructure and things start to go wrong, you really need tools to monitor it well. When such problems happen, it's often a problem you haven't identified yet, and you want to figure out what's going wrong, and observability tools are a great way to solve that.

The way we started was that at the company where I was working, I was heading the product team and the engineering team was under me, and we would have these incident reports: "Hey, something is going wrong, why don't you look into it?" We would get on the case, but we didn't have good observability tools. We had some Prometheus and Elastic, but not the complete toolset that links everything together so you can solve problems quickly.

Hence, we thought, hey, there are these open source tools like Prometheus and Elastic which each do part of the job, but there are closed source tools like Datadog which are much superior, right?

So we thought, hey, why isn't there an open source version of it, very similar to Datadog? Being developers ourselves, that seemed a natural thing that should exist in the world, because such tools are used by developers and DevOps engineers. So why not have a tool which has all the signals, metrics, traces and logs, which are the key things in observability that help you solve issues faster?

That's where I started thinking about it, and I sat with my co-founder, who was leading engineering teams at other startups. He was facing similar problems too, and we thought, hey, let's start something. So yeah, that's how we got into building SigNoz.

Brian: It's the age-old problem: you work at a large enterprise or a company, you see a problem, and it's pretty consistent. Usually a lot of the bigger companies have the problem solved pretty well; you have some of the smartest engineers working on these problems, so observability is more of an afterthought at a Microsoft.

When you look at up-and-coming startups like SigNoz (I work at Open Sauce, and Heavybit has a ton of dev tools startups), you always reach for the thing that solves the problem, because you don't want to solve that yourself or pivot your entire company into solving observability just so you can ship for enterprises.

Honestly, when I take a step back after talking to so many folks, founders, maintainers and dev tools companies, it's a common occurrence. I think the world needs SigNoz, to be able to reach for this but also have the community and the open source angle as well. I'm curious, with SigNoz today, what's the focus, or what's the ideal use case for folks to reach for it?

Pranay: Yeah. So the key value prop of SigNoz is that we have... Okay, so stepping back. The core problem we are trying to solve is observability, right? Which is, if you run something in production, you have applications running in AWS, and things go wrong, how do you monitor it? How do you get alerts proactively if you think something is going wrong?

For example, you have APIs which your applications expose, and the APIs are suddenly becoming slow. How do you monitor that? How do you monitor that, hey, the CPU level on my EC2 instance is getting above 80%, and how can we be told about that, right? So the key use case we're trying to solve is to help people who are running applications in production get proactive alerts much faster, and if some issues happen, to be able to solve them much more easily.

Generally, this problem is called observability and it has three signals: metrics, traces and logs. Metrics are essentially time series which tell you, "Okay, this is the state of my CPU at this point, this is my memory at this point, et cetera." Logs are logs, like, say, events that get captured. Then traces capture a request across the stack and tell you, "Hey, this request is taking this much time in this service and hence you should look into this service."

Or, "This is taking this much time and hence you should look into this database." So the key use case we're focusing on now is really tying these three signals together. There are open source tools currently, for example Prometheus which solves for metrics, and Elasticsearch which solves for logs. But I think we are the first one trying to solve all these things together in an open source way.

We have all the metrics, traces and logs, and we tie these things in closely and do it in an open source way. So that's the core focus. The second key thing which we're doing: there's a project called OpenTelemetry, which is a CNCF project, and it basically standardizes the instrumentation layer, which is how you send data from applications; they provide SDKs for that, right? And we are based on OpenTelemetry natively, so we are working on top of this open source community that provides the SDKs, and building the backend and visualization layer for that.
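To make the proactive-alerting idea above concrete: conceptually, an observability backend evaluates rules like "CPU above 80%" against incoming metric samples. The sketch below is a hypothetical Python illustration, not SigNoz's actual alert evaluator; the function name and the sustained-breach rule are assumptions made for the example.

```python
# Hypothetical sketch of a threshold alert rule like the one Pranay
# describes ("CPU on my EC2 instance is above 80%"). This is an
# illustration, not SigNoz's actual alerting engine.

def should_alert(cpu_samples, threshold=80.0, min_breaches=3):
    """Fire only when the last `min_breaches` samples all exceed the
    threshold, so a single momentary spike does not page anyone."""
    recent = cpu_samples[-min_breaches:]
    return len(recent) == min_breaches and all(s > threshold for s in recent)

# A sustained breach fires; an isolated spike does not.
print(should_alert([40.0, 85.0, 90.0, 95.0]))  # True
print(should_alert([40.0, 95.0, 60.0, 85.0]))  # False
```

Requiring several consecutive breaches is one common way real alerting systems reduce noise; production rules usually also add evaluation windows and cooldowns.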

Brian: Cool. So I guess the question I have now is, why open source this? You had alluded to this earlier, but Datadog is not open source, and a ton of these other products are not open source. Usually the sales process starts in open source, where someone uses a thing, and SigNoz is open source, so have you seen any benefit from that? And what was the purpose of starting that way first?

Pranay: Yeah. So like I said earlier, we started with open source because we believe that these tools are used by developers, and as developers ourselves we always start with an open source tool. One, you can get started much quicker; you can just start running it on your laptop or your machine and you're set up. And then you also generally have a huge community around open source projects where you can go ask questions. It's much easier to get help.

When we were starting out and evaluating products, that's where we were getting frustrated with Datadog and other closed source vendors: hey, you can start with the free trial, but there's no community around it which can tell you how to start using it, or answer specific questions about how something works. So we found that pretty appealing, that you have a community around a tool and can start learning it much faster.

You can ask questions and they get answered pretty quickly. And what we've seen is that being open source just helps you get feedback from the community much faster. Even if people are not contributing code, many people are just kicking the tires, using the product and sharing feedback, which I think is very valuable, especially for early projects, because you want feedback there. So that has been pretty helpful for us.

Open source also helps a lot, especially in our case, because we use OpenTelemetry for the SDKs, so being an open source backend tool which consumes and visualizes that data makes us a default tool that goes with it. You have an open source SDK, and being an open source tool helps us get embedded in the docs pretty easily; the community adopts it much faster. So the whole piece around getting feedback from people and then getting adopted by the community has been very helpful for us.

Brian: Yeah. That makes a ton of sense as well, because if someone can test drive this as fast as possible... The best way to discover your customers is through word of mouth and having them be able to try it. So my question to you is, how does this embed into their ecosystem and their deployment platforms? You had mentioned SDKs. Are we installing some sort of middleware in our servers? What's the approach?

Pranay: So the way people use SigNoz is that you need to send it data; SigNoz is the backend data store and the visualization platform. Your applications and infrastructure generate data, for example, if an API call comes in, how much time it took to respond, so suppose you are tracking API latency. The SDKs in your application code, for example your Python code, will capture that data, and it needs to be sent to a backend data store or platform.

So that backend platform is SigNoz. It listens for that data, stores it, and also gives you a frontend, the visualization layer, where you can play with it and create visualizations. The way we get embedded in the ecosystem is, as I mentioned, OpenTelemetry. They provide SDKs which, for example, Python developers can embed in their Python code, and then you need a backend which understands that data and shows you graphs based on it, et cetera.

That's where we come in. We give you docs for the OpenTelemetry SDKs and, if you're using Prometheus exporters, for how you can get your data into SigNoz and then store and visualize it.
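As a rough sketch of the pattern Pranay describes: an SDK wraps units of work in your application code, records how long they took, and hands the finished spans to an exporter that ships them to a backend like SigNoz. The following is a conceptual, hand-rolled illustration, not the real OpenTelemetry API; actual instrumentation would use the opentelemetry-sdk packages and an OTLP exporter pointed at your backend.

```python
import time
from contextlib import contextmanager

# Finished spans collected in memory; a real SDK would batch-export
# these to a backend such as SigNoz over OTLP instead.
finished_spans = []

@contextmanager
def span(name):
    """Time a unit of work, the way a tracing SDK wraps a request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        finished_spans.append({
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000.0,
        })

# Wrap an API handler the way instrumentation would.
with span("GET /users"):
    with span("db.query"):
        time.sleep(0.01)  # stand-in for a real database call

# The inner span finishes first, so it is recorded first.
print([s["name"] for s in finished_spans])  # ['db.query', 'GET /users']
```

The nesting is the key idea: because the outer span's duration includes the inner one's, a backend can reconstruct where the request's time actually went.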

Brian: This is fascinating, because I don't spend a lot of time on the server side or the cloud infrastructure of products, but I'm aware of the space. I think what we're actually observing, because you've mentioned OpenTelemetry and Prometheus a couple of times, even just now in how you install it, is that we have standards for how folks have been building software, building cloud and building infrastructure.

Correct me if I'm wrong, but it sounds like you were able to adhere to standards that folks are already using, the standards underlying some of the best cloud infrastructure that we all love and use. They can all use SigNoz if they're using OpenTelemetry, or if they've leveraged Prometheus in their setup. We can think of Kubernetes, obviously, giving us a standard for orchestration.

OpenTelemetry and Prometheus are both CNCF projects, so now we have standards that came through open source, that are now building our cloud infrastructure, and we have tools like SigNoz that can play very nicely with that. I assume you've been involved in this space and seen the rise of these projects. What's your take on foundations like the CNCF? Did they set you up for success, to knock your product out of the park and have pretty much everyone adopt it?

Pranay: Yeah. So OpenTelemetry as a project is very interesting. Before OpenTelemetry came along, what had been happening is that each vendor like Datadog would have their own SDK, and people would use that SDK to send data to Datadog. New Relic would have their own SDK, and they would use that SDK to send data to New Relic. A few things happen in this setup.

One, everybody is creating their own SDK, so you are investing that R&D multiple times, essentially doing the same thing. Second, once a company starts using one SDK, it's very difficult to move out. So if somebody gets locked into Datadog's SDK, it's very difficult for them to move to New Relic's SDK, and that acts as lock-in from a customer perspective.

It had become a very monopolistic environment; essentially you're locked into a particular vendor. What OpenTelemetry is doing, because it standardizes the data format, is saying, "Hey, this is the data format in which you export, and any vendor or open source project that supports that data format should work."

And that has been very powerful. As OpenTelemetry gets more and more mature, and we adopted OpenTelemetry in 2021 when it was still pretty early, it is seeing much more adoption; bigger and bigger companies are using it. So if somebody instruments with OpenTelemetry, they can choose SigNoz or any other vendor which supports OpenTelemetry's format, and that leads to much better innovation in this layer, which is the backend, data store and visualization layer.

Rather than saying, "Hey, you also need to have your own SDK to understand that data," which is a much bigger effort. So what the CNCF, especially around OpenTelemetry, has done is commoditize the SDK layer, commoditize the instrumentation layer, and then open the platform up for innovation by different players on the visualization and data store front, which I think is pretty interesting. Today OpenTelemetry is the second most active project in the CNCF community after Kubernetes.

Lots of big companies are picking it up, we are seeing lots of development on it, and we think it has the potential to unlock a lot more innovation going forward. As the data part gets standardized, more and more tools can get built on it. There are startups coming up which build test cases automatically based on this telemetry data, which you can't do if you don't have standardized data formats.

Brian: Yeah. This is true, and this podcast being Jamstack Radio, it's always about finding the thing to help you solve the problem so you can solve the harder problem. With this, I don't personally need to learn observability in as much detail. I know the value of it, but if I build a service to ingest data and then send data to other places or webhook servers for my software...

Actually, we just shipped a CLI built on Cobra, so it's a Go CLI with services in there, and we will eventually have some sort of telemetry in there as well. A challenge with CLI tools is telemetry; you've got to start early, because if you don't, people are going to question it when you add it later. But there are some clear standards for how you can do that now, and you can do it anonymized and not feel like you're collecting a bunch of personal information. Which, I guess with SigNoz, is the data that you stream...

It's all about observability into your infrastructure. Actually, let me build you a scenario and you pitch me on how I can leverage SigNoz in it. So LangChain is a thing where you can collect prompts and build some pretty cool products out of that. Sourcegraph has a tool called Cody, a very clear tool where you can take a repo, generate embeddings, and then ask questions about your repo against that code.

We're building something similar. It's an intern project; we're building the same thing, basically, using LangChain. The question we had was more "can we build it?", so we're building it, and we're about to stand it up on Azure. But now we need to figure something out, because when you do the embeddings for a GitHub repo, some repos will take about 45 minutes to index and some will take 36 seconds.

So using Rust, we figured out how to get it down to a couple of minutes. But now we want to be able to identify, okay, which projects are taking 60 minutes and which are taking 30 seconds, and how we can index and optimize and provide a better interaction for the user. The answer, I know, is observability; we just put something in there to find out where the pain points are. But my ask is: pitch me on using SigNoz in this situation.

Pranay: Yeah. So this is not the typical use case we serve. Our typical use case is around application monitoring; as of now we are more focused on backend services and frontend services, typical production services. But essentially, if it's an application, what you can do is start traces there, which track how much time a particular request is taking.

If you track that and send that data to SigNoz, you should be able to see in your code that, let's say, one process took this much time, a second process took this much time, a third process took this much time. So if you start sending that tracing data, you can see in SigNoz where the time went, and which queries took how much time.

Then you should be able to get that. It's not a typical use case which we support, but I guess you could use tracing in this way. We are seeing users who use tracing in very interesting ways, like using tracing in their dev environment to test the performance of a compiler. So I'm guessing that's similar to what you are doing.

Each compiler execution would spawn out a set of traces, and then they would monitor why it took time and add events in those traces: this took time, this span got fired, this event was fired, and then monitor that more closely. So I'm guessing you can do that with tracing and even attach events to that.
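Applied to the repo-indexing scenario Brian describes, the tracing idea amounts to one parent span per indexing job with a child span per stage, so a trace view can show which stage dominates for slow repos. Below is a hypothetical, stdlib-only sketch; the stage names ("clone", "embed", "store") are made up for the example and are not from SigNoz or the project discussed.

```python
import time
from contextlib import contextmanager

stages = []  # (name, seconds) for each completed stage

@contextmanager
def traced(name):
    """Record how long a named stage of the job takes."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        stages.append((name, time.perf_counter() - t0))

# One indexing job, broken into child stages, so a trace view can
# show where a 45-minute repo differs from a 36-second one.
with traced("index-repo"):
    with traced("clone"):
        time.sleep(0.005)
    with traced("embed"):
        time.sleep(0.02)   # typically the dominant stage in this sketch
    with traced("store"):
        time.sleep(0.005)

# The slowest child stage is the optimization target.
children = [s for s in stages if s[0] != "index-repo"]
slowest = max(children, key=lambda s: s[1])[0]
print(slowest)  # embed
```

With real tracing, each stage would also carry events (repo size, file count) so slow traces can be correlated with repo characteristics rather than just durations.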

Brian: Okay. Yeah, I think that sounds like a good enough approach, because we have a frontend interaction where you submit a repo and we send it to our server. The server does the work, and then you can ask questions of it, and as you ask questions we can trace that experience. But it's that initial "go find the embeddings" step when the project hits the server.

We haven't implemented it yet; we've just been testing the past couple of weekends. We've come to the solution of a serverless function to tell users when it's done, and that's the best we could do. It takes 25 minutes, then you get a little notification: "Hey, it's ready for your prompts." Because that's the other challenge, keeping a captive audience.

You know it's going to take a while for them to go index this repo, but they've already walked away, started something else, opened a new tab, and we've got to let them know, "Hey, thanks for trying us. I know it took a long time. We're ready for you to start doing some work with us." It's an interesting situation for us because we don't have a lot of products and services that do this heavy lifting.

Mostly we're using the GitHub API and existing services to power the product. But this is where we have to do some computationally heavy lifting, and this is one of the first places we'll try to figure out some hand-holding onboarding experience, but also be able to track and trace this so we can say, "Okay, it's getting better. It's not the best yet."

I'll save how we're solving this for the picks section. But I do want to say, Pranay, I've enjoyed the conversation. If folks want to get started, you alluded to this earlier, but to reiterate for those who've listened to this whole conversation: I want to try out SigNoz, I want to read some documentation, where would they go?

Pranay: Yeah, sure. I think the first place would be to just visit our GitHub repo. We have a very well written README, so just go to github.com/SigNoz, that's S-I-G-N-O-Z. We have a pretty active Slack community, so just go to signoz.io/slack if you have any questions about whether your use case makes sense for SigNoz, or if you're trying to learn and facing some issues. Just drop a comment there; we have around 2,600 members in our Slack community today, so somebody should be able to help you out. And go to signoz.io/docs if you just want to see how to get started.

Brian: Sounds good. Thanks for that conversation. I want to transition us to picks. These are things that we're jamming on, things that are keeping us going throughout the day. Could be music, food, whatever, even your tips for working remotely. All of it's valid, and if you don't mind, I'll go first. I was going to mention, we have this Octernship program. If you haven't checked out the Octernship program, folks who are listening, I had a great experience.

I don't think I actually shared this as a pick before, but I think it's education.com/octerns. Every, I guess, spring at this point, and this is the first year they've done it, they get a bunch of college students to apply to do internships with open source projects. They reach out to maintainers and companies that have open source, and you do a proper interview process. We had about 500 applicants for our internship program, and the way I approached it was I had every single applicant build the same project.

We all had the same structure, and the goal for everyone who built the project was to open a PR with the project working, and then the ask was to describe what you built, just like a normal code review. Because I had the same context across 500 students, well, technically about 300; 500 applied, but only 300 did the project. I got to see communication style, the ability to solve the coding problem, and code quality.

It was just a great experience to hang out with a bunch of students, get to know them and share our product with them. We chose two interns who have joined us, and they've been working specifically on AI features. The reason for that is everyone has got to do AI, so I figured let's just go after some features that I would love to build but don't have the bandwidth to shift the team onto.

It's been a perfect fit, because OpenAI has so many great integrations. I mentioned LangChain is the thing we started using just recently. Then one of the interns introduced Rust, one of our first Rust projects, into our ecosystem. The first version was built in TypeScript, which is the one that took 60 minutes to do the embedding, and then they introduced Rust and now it's down to like 45 seconds to index the same repos.

Pranay: Interesting. Do you use the WASM thing? WASM? Or do you just convert?

Brian: No, it's just straight Rust. We haven't done the WASM stuff, WebAssembly and converting, or anything like that. It's a server application; we're still testing, and we don't have a frontend for it yet. But what's been pretty amazing is just seeing the benefit of Rust and doing systems design and systems development.

I just haven't been as up close and personal with Rust until very recently. So the Octernship program and the Rust language, definitely worth a try if you're going to solve some of these computationally heavy situations. Highly recommend it. We didn't try Python; Python probably would've been pretty approachable as well, but they went straight to Rust and it's been interesting.

Pranay: I think Rust is really much more performance oriented; people have been reporting much better performance with it. My pick would be that I recently saw Figma has introduced this developer mode, and I spend a lot of time designing products, et cetera. That was pretty amazing. If you go to that developer mode in Figma, you can see the CSS, et cetera, generated statically from the design, and you can just start coding.

Brian: Yes. I did see an email come through about this.

Pranay: Yeah, that was pretty cool. They give you the CSS files, et cetera, directly, so there's no need to hand-translate things and figure out the CSS yourself. I think this will be loved by frontend developers: going from design to CSS, you don't need to do much, you can just copy it. I'm just wondering, with AI, does going from a design to a website directly become much easier? Because if you can go from design to CSS in Figma, can you also create React code from those inputs as prompts?

Brian: Yeah. I think it's very cool what Figma has been doing with developers. We've had all these interactions and plugins to convert code from Figma, and now they've just embraced the experience. For the longest time it was like... I worked at GitHub, so we always tried to get Figma integrations working on GitHub, and it does work today. But now you see a trend of trying to get developers inside of Figma.

Which my thing has always been, "Ah, give me a screenshot with all the pixel stuff like that and a PDF or something, and I'll work from there." I just never found design tools intuitive for me as a developer. I'm getting better. I do like Figma today. But when I was full time engineering, yeah, more of a struggle to try to get me in there. But yeah, very intrigued by this dev mode. I definitely want to try it out.

Pranay: I think they also added direct integrations with GitHub repos, so you can download those CSS files and add them directly to the repo. You can sync it directly rather than manually copy-pasting, so I think they are going in that direction.

Brian: Cool. Well, Pranay, thanks so much for the conversation. I'm definitely going to check out SigNoz. I don't know if I have the use case for it, but I'm also not building our systems and our servers.

Pranay: You should try that tracing use case. I think the use case you mentioned should be solvable through traces. I would point one of your smart interns at this; I think they should be able to figure it out, because we have seen some companies try SigNoz for performance optimization, which is a problem we are trying to solve.

Brian: Cool. Well, I'm looking forward to diving in and, listeners, keep spreading the Jam.