1. Library
  2. Podcasts
  3. Open Source Ready
  4. Ep. #20, Exploring AI Memory with Vasilije Markovic of Cognee
Open Source Ready
50 MIN

Ep. #20, Exploring AI Memory with Vasilije Markovic of Cognee

light mode
about the episode

In episode 20 of Open Source Ready, Brian and John sit down with data engineering and cognitive science expert Vasilije Markovic to explore AI memory and how we can build more intelligent systems. From the challenges of "context rot" to the practical applications of AI memory in construction, education, and finance, this conversation covers how to give your AI the context it truly needs.

Vasilije Markovic is an expert in big data and data engineering with a decade of experience in Berlin. After working as a consultant, he pursued studies in cognitive sciences and clinical psychology, which led him to found an open-source project, Cognee, a Python SDK focused on building the next generation of AI memory. His team of 10 is dedicated to solving complex data challenges by combining LLMs with structured, multilayered memory systems.

transcript

Brian Douglas: Welcome to another installment of Open Source Ready.

John, how are you doing?

John McBride: I'm doing good. I am now on the East Coast.

Brian: Yeah, different time zone.

John: Different time zone. It's wild how much of a difference even just two hours can make.

But it's nice because I have a bunch of European colleagues so I can connect with them in the mornings.

Brian: Yeah, not bad. You don't get the benefit of having West Coast sports time.

So NBA, you'll have to stay up till like 11:00PM to watch West Coast games.

So I'm not sure if you'd still be a Denver Nuggets fan at this point.

John: Always.

Brian: Always?

John: Go Jokic! I think they're going to rebuild. I can tell.

Brian: Yeah, "Go Jokic" is like pretty clear 'cause he's the only one that's going to be going this season.

John: Yeah.

Brian: Everyone else will probably end up on new teams. Cool.

So yeah, we didn't come here to talk about basketball and the NBA. We actually came to talk about AI and AI memory.

So we have a special guest, which is Vasilije Markovic. And you want to say hello?

Vasilije Markovic: Hey, guys. Nice to talk to you. I also know Jokic from my home country.

So everyone's very connected, I would say. So, you know, happy to be here.

Brian: Excellent. Are you also a horse racing fan as well?

Vasilije: No, I have missed the horse racing trend, but I'm happy to jump on one if you guys give me a good intro.

Brian: Well, yeah, maybe if OpenAI invests in your startup, and then you can buy a horse.

Vasilije: It's a dream of my ancestors. So maybe, you know, who knows?

Brian: Excellent.

So yeah, you want to give us an intro?

Who are you? What do you do?

Vasilije: Sure. So I'm Vasilije, originally from the Balkans.

I've been in Berlin for around 10 years. I mostly worked in big data, data engineering batch streaming, everything in between managing these data warehouses for larger companies here.

After around 10 years of that, I was working as a consultant building projects for companies that haven't solved their data issues in time, putting out fires and you helping everyone in need.

Not very, you know, Robin Hood of me. I got paid for that.

But effectively, as I was finishing these consulting gigs, I went back to uni, studying cognitive sciences, clinical psychology.

And as I was back to the studies, LLMs appeared or got to prominence, and I effectively started building and experimenting, which led me to build a Python SDK for AI memory.

We are an open source project, and definitely we've been building in this space for around two and a half years. We are now a team of 10.

So yeah, happy to tell you more, happy to tell you more about the AI memory, about the technology, about myself. Yeah.

Brian: Honestly, like let's go into AI memory, 'cause you're the first conversation I had around this concept.

And this was probably like maybe six months ago, we had met originally.

And then I started immediately seeing this AI memory becoming more and more of a thing within LLMs and stuff like that.

So you want to give us like a quick little rundown?

Vasilije: Sure.

So what we usually do when we want to talk to ChatGPT and give it our data is we copy paste emails or ask it to actually infer something on the topics it knows about from our own data. And to do that effectively, we need some type of memory store.

People started using vector stores, building what's called a RAG, retrieval augmented generation. So they turn this data into mathematical representation story there, search for the most similar items.

So apples and pears are going to be together, MacBooks and telephones, and then you get something, hopefully the right thing.

That didn't really work. So what we effectively ended up with is now building this next generation tooling to enable us to put in this LLM context only the data we need at scale to retrieve and organize that data similar to how human brain works, and that's what we call AI memory.

It's also got a new name now. Dexter, I think, named it context engineering.

There is competing names, there is competing subreddits, AI memory, context engineering.

All of these terms are pretty much the same thing.

How do we put stuff into the context that matters and how do we do that in a self-improving way so we don't really have to rely on outdated information.

Brian: Yeah, and I'm excited to...

John, I worked with you for a few years at OpenSauced and we had built a RAG pipeline to chat with GitHub repos.

And I'm curious your thought because you built that. We were building stuff live as people were figuring things out.

We were on the r/LocalLLaMA Reddit and were figuring out what papers were dropping and what questions were being answered.

So I'm curious your thought. Have you thought about the AI memory since then? Or maybe you should block out all of OpenSauced.

John: I haven't blocked it out, no.

I have wondered, it's curious how early we seem to be in that space, at least in the sense that there was a lot still being figured out with RAG.

And then I remember getting more into agentic flows and thinking about how can you optimize for some of the context that you're sending these LLMs?

'Cause it seemed that as context windows keep getting bigger and bigger, the idea with agentic flows has just been like, just chuck it all in the context window.

Now there's even more research coming out about context rot and how doing that seems to degrade performance.

So I haven't looked at AI memory or context engineering specifically, but I am curious if you could give us the quick explainer on how that differs from RAG.

And maybe some of the slightly technical details and the difference between vectorizing a bunch of content and then retrieving it, and maybe how AI memory differs from that.

Vasilije: Sure, so let's say we have some text and we want to create a RAG.

So we'll split this text into chunks of certain, let's say, words, chapters, whatever we define chunks to be.

And we'll store them and embed them using open AI embeddings, let's say, and put them in a vector store.

When we search for it, we're going to search for the most similar term to the search term we are using, let's say, an apple or a horse to come back to the original topic.

And then I'm going to get all the information about, let's say, a book about horses and horse breeding indexes.

So let's say I get that information. I might be searching for a particular horse in that book, let's say, a horse called Moses.

And then that horse called Moses, when I'm looking for him, I'm going to get all the information about all the horses that are mentioned in the book, horse breeding in general, and I won't be able to really isolate and find the information I need.

So this is pretty much the problem of RAG. And let's now kind of see how we would have solved it with this, let's say, memory approach is that we would try to create and extract entities and understand who are the main characters mentioned in the book, how they relate to each other.

You know, maybe there is the horse Moses, it's a friend with another horse called Peter, I don't know.

And then these horses, effectively, I don't know much about horse names, so sorry about that.

John: It's perfect, yeah.

Vasilije: Yeah, so horse Peter and horse Moses can have a relationship, being friends, and they can belong to a stable, and the stable can belong to a farmer.

And, we can create this type of a narrative that effectively lets us have some type of a taxonomy, ontology, how it's called.

This is a rule set or rule representation that whenever we go back, we always will know the context of where is this information about the horse coming.

And that's the context of how AI memory works.

John: Yeah, this was a problem that we ran into a lot with open source because the taxonomy of just repositories those live in orgs.

Those have code or like a type of language that they've been implemented in.

Those have users or maintainers or more in-depth maintainer or governance structures, and that was really hard to represent in RAG.

So I could definitely see the benefit of trying to represent something like that a little more with like some taxonomy.

Yeah, how do you go about that? Is that giving semantic meaning in those individual pieces and then like creating a graph or?

Vasilije: Yeah, good question.

So when I started with this around two years ago, I based my first prototype on this concept in psycholinguistics called multilayer models of language.

So this concept is effectively talking about that we as humans, when we store words or phrases or sentences, we store them in different layers of the memory.

So individual words will be stored in one layer, then phrases in another layer above, and then whole sentences above.

And then when you're retrieving, you might just trigger the word or a phrase or a sentence and things just pop up in your memory.

And all of these layers are cross-connected and kind of storing the information retrieved across multiple layers, triggering at the same time.

So what I tried to build is this multilayer memory that effectively stores the data in any way we want to analyze it.

So maybe, from one scientific article, I might extract the abstract, and then that will be one layer.

And then let's say authors will be another layer. And then the third layer will be the reviews of that paper by other people.

And then I will cross-connect, if a certain person referred to a part of the paper, to the abstract from the reviews, I will have a connection there, and it goes on and goes on like that.

So these connections and this idea came also from Microsoft around the same time we were starting with that.

We were very early with that. It didn't have a name.

Then it got the name GraphRAG, and then it got the name AI Memory because GraphRAG is just one implementation of the concept.

And then the whole broader field is now AI Memory Context Engineering.

So how we did that is we asked the LLM to extract the entities from text.

So we ask it, hey, tell me who are the people in this text, how they relate to each other.

That works well enough, but not really that good because LLMs just regurgitate the context they got from someone else or what they know they know.

What might happen is that I have some business knowledge that, let's say my horse Moses went on a race 22 years ago and won, and I want to feed that.

So what we allow people to do is to define also their own ontologies and taxonomies that are based on all RDFs.

So this is the semantic web concept that existed around 10, 15 years ago and was very popular.

And then we merged those with this LLM-generated context.

So LLM-generated context allows us the scale, but these RDF ontologies, they ground it into the real information, real-world data, and let us actually know what is the facts and we know what is the LLM-generated facts, which are a bit less reliable.

Brian: Yeah, I'm finally impressed that you're able to use RAG to remember all these horse names, 'cause I'm like Moses and who's on first and who's Peter.

But what I'm getting at is I'm curious 'cause you're building a product around this, and I think it's like valuable context for LLMs to have this context specifically at time where you're trying to ask questions and trying to get all this data.

And I've been in multiple positions where I'm just like that oracle that kind of knows all the stuff. I've seen things, I've talked to folks, and folks will come to me.

And I see that super valuable where I don't need the staff engineer to come in and unblock me on a thing that I broke.

I can chat with my internal assistant to then pull up the docs that can make that happen.

So that's all valuable. But then we've got, as of today, of this recording, GPT-5 is launched.

And I think last Sunday, the reasoning model for GeminI launch, like we're getting longer and longer context windows.

And I think I was actually watching Claude Code's, one of their announcements, they have these monthly sit downs with Claude and the community, and they're recommending don't add context.

Don't add all that stuff on demand. Rather, let Claude use as minimal as possible, for Claude Code specifically , so it can go reason and find it itself.

So there's competing conversations happening where people want to build that context engineering experience and AI memory.

But then you've got these model builders who are like, "You know what, foundational model dropped today. Don't worry about making it smarter. We're smart already."

Vasilije: I've seen those discussions come up, and they've been there since this thing started and LangChain kind of showed up.

And everything was supposed to be solved with LangChain and all types of tools like this.

And I don't think anything has been solved. So the average company has a data warehouse of a size of 250 terabytes. Good luck giving that to LangChain to ingest and reason on that.

So the problem is much bigger than just feed a database to a model, it's about these human things.

Like if I name my revenue, Revenue2_XYZ, and that is actual revenue in the table in SQL in someone's garage somewhere. No one even knows where to find it.

So I've seen a lot of data modeling issues through my career, and it was never about the technical challenge of, "Can we actually add this data to a system?" It was more about actually modeling and understanding what is what and what belongs where, and that's what the difficult problem is.

And I think that one we are trying effectively to solve, and it's not even close to being solved now because all of these reasoning models, they're not really reasoning.

They're just kind of trying to attack the problem from different approaches.

And even with reinforcement learning approaches, I don't feel like it's still getting there.

John: Yeah.

This makes me think of something kind of random, but I had a crazy experience years and years ago when I was learning this technology called Bosh, which is B-O-S-H.

And it's an underlying technology that works for Cloud Foundry and all this VM technology and stuff.

But what Bosh stands for is the "Bosh Outer Shell." So it's one of those weird recursive jokes inside of the name of the actual thing itself.

And I remember thinking, oh, that's kind of funny. Like, ha, ha, ha, cool programmer joke.

And then the person that I was working with that day, they said, it's cool, but it's really hard for people who aren't English native speakers to understand that weird recursive joke.

And maybe they understand recursion. Maybe they understand SH is shell.

But it sort of creates this weird semantic disconnect that if you hadn't been working in Cloud Foundry for long enough, you just wouldn't get.

And it almost instantly makes me think of just how some of these semantic things that we live with every day, especially in technology, in data, in these people orgs of engineers and people trying to do stuff and get stuff done and do business.

It's so much less like, can we do it? And so much more like you said, where it's the people problems, really.

It's like the weird semantic things that maybe we don't understand.

What this crazy file is called or this insider joke from five or 10 years ago that now people who onboard to the company need to understand, right?

Can you model that inside of AI systems? Are we getting to a place with AI memory or context engineering where we can start to do that?

Vasilije: Yes. So what we have is, let's say, our approach is currently looking at experimenting heavily with that.

And I would say we are not fully there yet, but there is some promising signs.

So let's talk about how we treat memory. So in which ways.

So let's say, a relational database. We know that doesn't change. We know what's inside.

So we can ingest it into the memory as set of schemas and maybe examples of these fields and put that in one layer of memory.

And we can always access it. We always know what it is. We don't really need AI for that.

Then we have some PDFs that mention these files or like Notion documents.

And we put that in in a second layer and we cross-connect that to the relational database information and now we can search and understand what was the revenue mentioned and what was the yearly, I don't know, churn rate or things like that that people usually analyze.

And then what we have on top of that is, and what we call is the memory pipeline.

So it's a pipeline that just runs and you can treat it as a post-processing pipeline effectively that goes and applies this type of reasoning and connects the dots.

So if Moses the horse mentioned revenue from last year or something like that, and he mentioned it on three meetings, let's assume Moses goes to meetings now, then effectively, we would have like some type of a connection being created and more, let's say, semantics being added to the system.

So when we were searching for it, we got the information.

Also, let's say Moses was unhappy with the performance of the marketing department and wanted to change the strategy.

So he gave the feedback to the system and the feedback got attached to their KPIs and their strategy plan.

And the next time there was some interaction with the system that also came in and the proposal for, let's say, the next steps was different compared to before.

So we have this, and we've been building these types of, let's say, algorithm by algorithm memory approaches, and we're trying to kind of connect these dots and build on top of that.

But we think that a lot of that is still going to be a difficult problem to fully solve.

And that's why we're trying to kind of narrow down the problems we're solving because solving for cultural sensitivity is a whole another startup effectively and I've seen some that try to do that.

For us, it's more about, let's solate the exact thing we can cross-connect, connect the domains, get the meaning right, and then move that forward.

John: Yeah, makes a ton of sense.

Brian: John had mentioned something in passing, which is context rot.

So there's cases where opportunity for us to reestablish the context for the next conversation and pick and choose things on demand.

And I've been messing around with Cognee, your tool, we didn't spend a lot of time talking about Cognee.

But messing around with your tool, and what's really cool about this is, and I wrote a blog post around this, is understanding the idea of rules.

And if you're caching or storing your prompts in context or in memory, that gives you an opportunity to then generate, hey, read that back to me.

Like, we have all these, like, Granola, I think, is probably the one I first started using.

But now Gemini, records your meetings, like all these meeting notes.

And then we have these Notion docs that you just never know what you need when you need it.

And I guess what I'm getting at is to be able to basically say, hey, read that back to me or tell me something important that I should do, that becomes super valuable.

So I'm curious, what are some use cases for folks using AI memory in production?

Enterprises, small companies, all the above, would love to ground the conversation to something actionable.

Vasilije: Sure.

So let's talk about developers because we're talking about developers. So that's probably the most interesting to them.

We can effectively, via our MCP, we can load all types of information from, let's say, CICD systems, index all repositories, index, you know, GitHub issues, whatever you need to build, you can just build it as a data pipeline and put it all in different memory domains and then cross-connect those.

So when you had, like, a GitHub action failing somewhere, and that was caused by a certain function in the code, we would be able to connect that, navigate you right away to what is the thing that needs to be fixed versus you telling cursor, hey, copying it over, saying, like, this failed, like, why did this fail?

It doesn't know, it doesn't have the full logs, things like that.

So effectively, one thing is helping developers improve the workflows and ingesting the data that they usually deal with across multiple repositories, multiple source systems, and everything in between that we can do.

Also, even tool calls, we just released last Thursday, POC or experimental pipeline on our community repo that allows you to cache tool calls.

So let's say you have tool calls many times, like you don't need to really recalculate or spend that much money, we're going to store that in memory, we're going to retrieve it.

So we call it the muscle memory. And then on top of that, we have all this, let's say, memory of the repo and information about it, then we have also the reasoning.

So it becomes powerful. In terms of the other use cases we've seen, we work with a company here in Germany in the education space that was trying to connect all the learners on their platform and understand how these people relate to each other by just kind of creating a mapping in space of people visiting certain IP addresses.

That was an interesting one because then we could like profile these users, map them, connect them, and then effectively transition the knowledge from the better students in a class to the worse off because we now know they are sitting in the same class.

There we'll publish a short case study. It's going to be live, I think, by the time people listen to this.

But on top of that, we had all kinds of various cases like construction industry, a company in Singapore that's trying to connect the supplied material to the construction yard and to what project managers are installing, and they're calling the same thing in different names.

So we can connect that and then that same thing now is finally tracked from the beginning to the end, and you can imagine on the supply chain issues how that can influence, let's say, the cost.

So you can order just in time versus being late, delaying the construction or ordering something that's going to sit for six months just there on the construction yard.

So that's another one. And we've seen things like banks, for example, they need to give to their clients exact information of what is the interest rate that they're going to charge for a particular type of a card.

But with vector store, you might get all kinds of interest rates from all kinds of cards you can't really guarantee it.

So we can create a model and an ontology of all cards of all payment products in the system, always be able to navigate to the one they need.

And then all these shred bots that they need to implement can actually be used in production.

So a lot deals with accuracy, but a lot also deals with really finding that needle in the haystack and also being able to showcase that that needle, how it came to that result versus the pure LLM.

John: Yeah, one thing that's really interesting about that, is in a past life, it really wasn't that long ago, but I was working with a bunch of data teams on top of Snowflake and these data lakes that we were trying to implement these semantic layers where those data teams were going to define those really pieces of what the LLM could use.

And they were really having to go and define those in the tables. And then they had built all these medallion layers on top of it and stuff.

And a lot of the feedback we had gotten was like, oh, this is just a lot of work to then enable these chatbots to work.

Where's the pieces of automation that make this a little more seamless, a little more possible versus data teams having to go in there and define each of those pieces of the credit card stuff for the LLM to slurp up?

Vasilije: Yeah, good question.

So what we've done now is effectively built these basic pipelines that can do most of the heavy lifting for typical document use cases.

We have now a time graph pipeline which allows you to also reason on time and find what happened yesterday, what are all the events that happened since 1980, if it's mentioned somewhere, which is not a trivial problem to solve.

We also have other pipelines that allow you to look at the problem from different perspectives.

And that all comes prepackaged, so you can just start using that.

And then what we let people do is just define a set of Python functions that are tied into some type of a framework or let's say a data pipeline, and they can add their own custom logic.

Maybe you want to filter the IPs. Maybe you want to change the phone numbers.

Maybe you want to add certain pieces of information or data.

So our approach is like modular, build it yourself.

Here is a couple of things that we know they work, but you can just kind of continue and go along and use this and add more things of your own.

Where that is a problem is, let's say, on this purely vertical side where you really have something very specific to your company, you still will need to kind of model the data a bit and play with it, but that is effectively just a pedantic structure you need to put into the system and that everything is going to work after that.

So we made it as easy as we could, but there is still some work, of course, there to do, but we also think that human in the loop is needed to a degree in the systems a bit for now, and we are automating some of this, let's say, rule generation and this ontology generation, and we also have a system in the background that goes and tunes all those things like chunk sizes and all of those boring things that you don't want to spend months optimizing for.

So half optimized, half automated, and half is, let's say, still something that you need to kind of keep an eye on.

John: Yeah, it makes a ton of sense. I think the human in the loop thing, especially for now with current models and stuff, it's still definitely critical, but even just a little bit of getting some of that semantic meaning in there, yeah, it makes a ton of sense.

Vasilije: Yeah.

I feel that there is a problem with unrealistic expectations, especially when I go to San Francisco and I'm there on these meetups, it becomes this self-fulfilling prophecy that we need to do certain things, but the problem is these agents, once they start going off rails, they're compounding the error, and that error compounding needs to get stopped somehow.

And I haven't seen a better way to stop it than to either ground it in reality with good data and pretty much keep an eye on it, but on the other hand, always have someone who can flip the switch when needed, because simply, they might just wipe the system to solve the problem, because it is solving the problem in a way.

Brian: Yeah, there's a good article from GitHub, actually. The GitHub Next team put together an article, and I think this might have been, was this a doc?

Actually, I guess a few of the team put it together, talking about continuous AI.

And what's cool about this, is what we were just talking about there's a name to it, where you kind of measure what's the time between humans have to be in the loop, and there are certain things that you can really sort of set and forget, and then you sort of get pings when things happen.

So, I say this because this is actually going to be my read, which was going to be a tweet about this.

I'll share the tweet later, but I've been really going ham on how much can I get done without jumping into the editor.

And I take like a 25-minute bus ride from my house. I got a five-minute walk, and then take a bus ride to the office, and I work like three days a week in San Francisco.

And what I've been doing is I pay for Cloud AI, the desktop, and I just ask questions.

I'll grab a GitHub issue where I've half thought, like, hey, here's the thing I want to work on.

Claude, use your reasoning, your deep opus reasoning model to break this into scope of phases and how you would build it.

And so, I've been basically, since Monday, I've been working on SEO and LLM infrastructure improvements for my side project.

And it's broken into like these are things you should think about, things that I've actually never done for myself ever for any side projects, like set up a sitemap.

Like, why do I need that? Turns out it's really important. I'll never build that myself ever again because I can get Claude to actually give me a scoped plan to build it.

And then the other thing I've been doing is, I've been using Copilot, and now Gemini's got their own asynchronous agent that can go just take issues.

And because I don't have credits for all of them, I'll just have all of them take an issue.

And if I have enough in my memory, which is my docs and my test, they can just take it and open up PR.

And I do have to jump in and be like, "hey, this is bad." Or I got to clean up this thing and this code broke my entire database or whatever.

This is literally since Monday and we're recording on a Thursday, and I think I've crushed through like 32 tickets on building small, scopable things to ship to improve my site speed, caching service workers, et cetera.

Vasilije: Yeah, so just to maybe add a quick thing on that, I've seen, and I think I've talked to a lot of people...

There is like two types of companies, the ones that started before 2022 and the ones that started after 2022. And in terms of how you approach tech, I've seen huge marketing departments producing some copies, doing things in the way you were doing them 10 years ago, but now it just doesn't make any sense.

Or having all of these automated workflows, triggers, notifications, compounding, just the ability to move faster and adding more and more things.

And we do the same, like we have these n8n workflows and release notes to Discord, to here, to there, like giving updates to people, trigger the updates to the documentation based on the release notes and just draft information based on what's in the code base.

Integrate Cognee as a memory for all the team meetings where all of the transcripts are going into it and then when you need to ask something or what was discussed then, there's no suspicion or doubt and no one can say they forgot or adding Cognee on local devices so you can effectively just send all that to the cloud memory.

And then if a developer worked on an issue X, you can always ask like what was his flow and what was the question he asked and where did he get stuck because I'm seeing a buck here potentially and that is the valuable context.

You don't have to jump on a call with someone, I know you have their alter ego in a way of their interactions.

Of course hopefully they don't search for anything problematic, but I think Cursor is going to stop you anyhow with that.

I think we need to adopt these approaches you're mentioning faster and more and we shouldn't be stuck in before 2022 times.

John: I mean, speaking of the things that maybe would be more nefarious, something I think about a lot is like prompt injection attacks and ways that like I get pull down a bunch of changes and then I'm just like feeding that into an agent.

It would not be that hard to just get a bunch of bad instructions into something.

Is like a smarter memory model one way to solve for this where you could start to kind of have better semantic meaning around stuff that might be slightly more nefarious that's getting fed into AI agents?

Vasilije: Great question. I was actually thinking about that last night.

I was watching "Foundation," the TV show and I like Isaac Asimov how he talks about the three laws of robotics so the robots shall not harm humans and but do the action or inaction and all of that.

And I was thinking how do you actually implement like something, you know, they had these positronic brains in the sixties that was the component and that was like hardware solution that they implemented.

But how do you actually stop something like that from happening with one potential level injection that can change the nature of reality that, you know, this artificial memory represents?

At this point, I don't have a good answer. I can only tell you that we've been thinking a lot about like the injections and kind of sanitizing the inputs and storing the data in different, let's say namespace as you call it, schemas, whatever, just to kind of have ability to always know what's reliable, what's modifiable and they always trace back to where things are coming from.

So I feel we'll need effectively to have instant evaluations that will be happening on both the injection and to search so we can actually validate and guarantee these types of systems so we can trigger other mechanisms that can go and prune and deal with this context rot because it's not a thing you can avoid.

I feel it's like pen testing or something like that. It's something that's just going to go into different directions over time.

Brian: Cool, so speaking of different directions, I do want to be conscious of time, we want to wind down the conversation.

But I want to ask like for folks who are just now thinking about the idea of AI memory, like what are some like quick wins and things they should be thinking about in their projects, whether a greenfield or brownfield or whatnot.

Like how could folks start bringing this up internally as they're now being dictated to build out perhaps some pipelines or maybe they're looking to clean up some old projects and bring them into the forefront of like people will care about 'em soon.

Vasilije: Yeah, great question.

So I would say my first answer to people asking about AI memory, I'm like, do you really need it? Do you need a complex system that is going to be managed by a team of engineers and it's going to serve like an ingest terabytes of data? Most likely you don't.

So then just start with something small. Learn how vector stores work, build a basic rack pipeline, understand the basics and then as that's not working out for you, move towards context engineering and the AI memory field and see what's there out of tooling that you can use.

To get started quickly. I would say check Reddit. There is always a lot of good resources there.

We have this AI memory subreddit with a lot of, let's say tutorials and information.

There is a context engineering one, there is a lot of content.

We also on our side publish in notebooks and blogs and we try to really do our max to onboard people to these new topics.

But then as you are effectively kind of leveling up, you might need to have these questions on how do I manage users?

How do I isolate memory for different people in the organization? How do I move and secure this, can it run with my local models?

All of that we also cover and we have tutorials for those and I would suggest to check also r/ollama subreddits and these other communities that really focus on enabling an individual to do this themselves.

And then finally--

When you come to like a very complex domain modeling exercise, always doubt the people that are going to tell you it's going to be easy.

So that's something that, I would always advise if someone tells you like, no, no, this is not that big of a deal, multiply that by 10. Usually that is, where you should be and you know, this is still a relatively complex in a new field and there is no easy quick solutions for very specific vertical domain issues.

John: Yeah, plus one to that, looping it back to the open source rag ginormous data pipelines. It's no small undertaking any size of data pipeline like this.

brian: Excellent. Well I've got a question for you Vasilije, are you ready to read?

Vasilije: Sure.

Brian: Excellent.

So reads, these are basically our picks and I explained this offline, but for the listener we've got a couple of reads and actually I'll go first since I actually closed my show notes, but I already know what it is.

It's a tweet from Simon Willison who is very excited about the idea of asynchronous coding, who I guess we're peas in a pod. I'm also super excited and they kind of sort of spilled the beans and what my read was going to be prior.

But yeah, so Simon, he also wrote a blog post around continuous AI.

It's kind of nestled in a couple other of his long talking points.

And the world is crazy to think like a year ago we were this happy for auto complete and now we're now doing asynchronous coding where you don't need to actually watch it do a thing as long as you give it enough context, it could do the thing within rails or within boundaries.

And then what's beautiful about this is I've been playing with a tool called Jules, which is from Google and it's very similar to all the other ones, but it also, you chat with it, you give it an issue, it works through the issue, it gives you a plan, you implement the plan.

If it breaks the test suite or the build, you could go back to Jules, you give it more context or you could also add mention as well.

And it is 100% you interacting with a junior team member that you give it something, a task to go do.

And I'm curious, like I spent a lot of time like talking about open source and getting people to do good first issues and making contributions and I feel like there's a world where I don't know if the good first issues is like as necessary.

I think it's like good first prompt if there's something that you want to see how the project works or you want to build able an extension or you want to interact with it, it feels like that's a thing people are going to do open.

Like we have Hacktoberfest coming in like a month or so, like that's a thing people just going to do table stakes.

So I'm curious like your thought John, 'cause I know you've spent a lot of time maintaining large projects in the open.

Do you think the world is, like people are just going to be handing junior agents things to do and then we're all going to have to level up pretty soon?

John: I think yes.

I think some projects will continue sort of to resist for certain reasons one way or another, maybe licensing reasons, maybe there's GPL code in there that they don't want accidentally injected into their projects.

But I mean I think inevitably, yeah, like these AI agent coding tools are just a little too good to ignore.

I think though where the level up has to happen is really like as a product minded engineer, like probably less on the code side where I think some project maintainers or project people could like show up to a project and be like, I'm really, really, really good with go routines or you know, go concurrency or something and I can just show up to a project, not know anything about what it does or what it's like real use case is, but, oh, here's an issue around like Go concurrency, sure, I can just cut it.

I think though that with these agent coding tools, sure they can cut the code but if you're not really sure what it's doing and why and you're not able to like speak intelligently to the project maintainers and to the agents about like really what the end use case should be really, again, that product minded focus, I think again, yeah, that's just where it's got to be.

That's sort of where I've kind of been trying to put more focus in is like, there was this great thing that somebody said that, oh, 99% of my skills have gotten worse and I need to invest in that 1% or those things have gotten like so devalued that now that 1% that product mindedness.

And I think a lot of people have started calling this like taste, like taste going into the future is going to be super important.

That's my hot take. Yeah, I'm curious what you all think though.

Vasilije: Yeah, I think we are seeing the increase in these, let's say options now on the market to move everything forward in a way where your creativity comes to forefront.

And I feel that in cognitive psychology and clinical psychology, there is a lot of this research on like adolescence and where do people move in certain stages of life.

And the thing is that they did analysis around people from 12 to 24 to see how they choose their life path.

And I think what they found is 33% of people are actually creative and they go and try to define their own new thing, 33% take the stuff from their parents.

So if their father was a bank teller, they'll be a bank teller or a baker or whatever. And then the rest, they have no actual idea what they need to do.

They just kind of follow whatever people tell them to do.

You know, they just pick whatever is the most opportunistic thing at that moment.

And what I'm afraid of and what is also going to be exciting is that if you never really took it on upon yourself to be creative, but you just had to kind of perform in a certain societal duty or role, how are you going to actually find yourself now when you know the tools and the output of your work is kind of meaningless in the end because a machine can do it.

And I think that's going to be a very interesting thing for the future, but I'm getting now into a bit broader topic, but for me that's something I kind of wonder about.

brian: Yeah, I was going to say like, so I've been doing a lot more engineering work than I've done previously in this current role.

So I sit in the engineering standups and we have these longer conversations after stand up about structure and like linting and testing and best practices, and like it hit me where all these questions that like at me when I was a junior engineer, like you'd bring up the stuff because I read a blog post and this is the way to do it.

And then you read another blog post, like this is the way to do it.

And then you realize everyone's going to write a blog post about whatever best practices and like approach and not everything fits for every situation, but at the end of the day it's like you don't have to worry about junior engineers bringing a random blog post and trying to implement or...

There was like library driven development where I did a lot of front end and angular and then React came out and then it was very clear if you did React, you have your promised jobs in San Francisco and like maybe that's not true today, but like, you don't need to chase the dragon out the next greatest thing.

You just sort of let the machine make the decisions and like whether you should be using this library or net library, you could scope out things in isolation and staging.

I have like all the confidence in the world to go set up side projects inside of the main project.

Because I know the amount of time it takes me to go spin it up and like a quick little PR is minutes instead of like the hours that it would've took me to like, oh let me go like read the docs here and do all that.

So yeah, it's like, I think the barrier of entry is higher, but the folks who don't mind deep diving and having the context and reading the books honestly...

Actually, I've got this book right here "Designing Data-Intensive Applications."

Vasilije: Oh, I read that. That's a great one.

Brian: Yeah, and I think this is the first chapter I'm like, boom, I get it already.

Like I know what to prompt, I'm good.

And I'm like a couple chapters in, but now I'm like, oh man, what if I could like read this faster and like give me the cliff notes of like?

Okay, what part should I look like? Obviously there's a table of contents in there, but now I'm like, what if I can just chat with the book at the time that I need to like open up the issue and like page here or chapter here.

I dunno, I feel like if you change your perspective of like approaching hard problems, I approach it where I'm learning how to learn always.

So if I don't know how to do a thing and I'm prompting, I'm also prompting it to also ask me questions back.

So then I can also level up alongside of this sort of this memory with my agent.

So I know it was long-winded, but yeah, super excited about in the outlook of this.

Vasilije: Yeah, maybe just one, last note on this one is someone compared it, and this is not my thought, but to photography and what it did to art in the beginning of the 20th century.

So you had all these academic painters that their whole job would be to draw a faithful representation of some rich man from New York, and his family and like some oil baron or something, and then photography came and you know, they all lost their jobs.

But what came out of that was abstract art because now we needed to get creative, we got like cubism, we got all of these things that no one thought about because there wasn't no space in this Paris Grand exhibitions in 1879 to show something new and they were all very stuck to their ways because that was profitable.

So let's see. It's going to be fun.

Brian: So you're saying if I want some abstract code, I should go to Fiverr now 'cause that's where they're going to go.

Vasilije: I think you should sell it as an NFT and NFTs might come back. Who knows.

Brian: You know, the ceiling has been risen, so like I'm all for it. NFTs, if they make sense, let's do it.

Vasilije: Sounds like a plan.

Brian: Yeah. John, did you have a pick for us?

John: I did. So this one was from CloudFlare, from their blog and I just found it so fascinating.

I'll read the title. "Perplexity is using stealth undeclared crawlers to evade website, no crawl directives."

And this kind of lands in that intersection of kind of cybersecurity and AI and just really like the web at large for those unaware CloudFlare, really is a network.

Like that's the company and that's the business is, you put some stuff on Cloudflare's network and their WAF and DNS protections and a lot of that they'll fight North Korean hackers for you basically.

And here's Perplexity, one of the biggest AI companies doing all this stuff with search for AI.

They're like rotating residential IP addresses, which is like something you would see these sketchy, you know, nation state backed hacker groups doing in order to get around, pretty advanced protections that CloudFlare can set up.

So it's a fascinating read just how they went about even detecting that this was happening based on certain telltale signs, just kind of crazy.

It really also speaks to me just about how the gold right now is really like all that data, like all that data that they have to go to such ridiculous measures to get around no crawl directives.

I don't know if there's much else beyond that, just that my jaw was on the floor.

Brian: Yeah, and I was actually Perplexity user most of this year recently I've kind of switched between other ones and found like Claude's going to do it better for me for what I'm actually trying to get it done.

But anyway, not a dig against Perplexity, but they're definitely doing some digging in other people's sites.

But when I was using it, like I'd use it during... we were joking about the Denver nuggets, like I'd be watching basketball.

And like while watching basketball, I remember during the playoffs like a couple achilles were like torn.

So I'm like immediately like asking Perplexity of like, hey, how long is an achilles injury?

Or like think Steph Curry had a hamstring strain. It's like what's the ramifications?

And then I think it was that question about Steph Curry that then Perplexity came back and was like, Steph Curry, X, Y, Z here's all this former injuries, this has never happened before.

And it was very clear that like it was doing some real time crawling and scraping on news sites as well and it was like, oh this is interesting and this like type of stuff as a front engineer, like we would definitely do this for testing, like change your user agent and do sort of the trickery to like make sure you could test on mobile and whatnot, but that they're just using like that sort of basic approach to then also like trick the robot that you're specifically saying, hey, don't crawl the site, but they're still going to crawl the site so it makes sense.

They're pretty edgy and we'll see how fast they sprint towards whatever they're trying to get to.

John: Yeah, it's interesting I think Cloudflare is also kind of emerging like this like champion of the people sometimes how it feels.

'Cause again, really they don't have much of a much skin in the fight around AI and all this data.

Brian: Yeah. Vasilije you're actually based in Europe where GDPR is like, hmm, it's a thing. And so, what are your thoughts?

Vasilije: Yeah, I mean my pick on this side would be a tweet recently I read from one of my former managers who says, "I feel like most tech and content is psychologically priced for the US consumers in mind."

Even in Western Europe, spending 2K on a gadget, hobby, course, conference is a lot of money.

And I do definitely feel that, especially when you go to US, you know, things are bigger, more expensive and people are willing to pay.

And in Germany, every euro accounted for, and like every data point needs to be monitored.

So I think with that in mind, we'll see this evolve in various ways.

We haven't, I feel imagined it, especially from this perspective of like feeding all of your data into some systems that you'd have no control over that you don't know what's the data used for.

I feel there is going to be a big battle of how that's going to work in the next couple of years, and where Europe is going to end up there, and I don't think it's going to end up in the right place because they tend to overregulate sometimes on these.

But also I think there is good to have pulls in all directions. So, something equalizes to the right place.

All in all in Europe right now, I feel everyone's asleep. No one's aware of what's happening.

And you know, I was just on a Langfuse event here in Berlin and there was a Vercel AI SDK guy and you know, a couple of people, but the audience, there was no one over the age of 35 in the audience.

You know, maybe one, two people and then everyone's like in their early twenties.

And I think, this is the part of the future revolution, but I feel like the general population still very unaware.

Brian: Yeah, I believe it.

And this is some data that I got earlier last week was the majority of the folks who are writing code in AI, and I say majority as in like 80% plus, is still copy and pasting from ChatGPT, and it just comes down to like, if you're kind of in the scene or you're paying attention or you're in Hacker News, like it feels like everything's amazing and like guns are blazing between all the model builders, but in reality it's like most folks are like, " Oh, yeah, ChatGPT like made this macaroni and cheese for me. Like I got the recipe from there."

It's not the same. So like us as engineers and like data focused, like we're seeing it right in our faces, but at the end of the day, it's like we're still got to come back to reality and like we get pay the bills and get the mail out the mailbox.

So I appreciate Vasilije, you coming in and talking about AI memory. This was a fascinating conversation.

And also your take from the European mind. Sometimes we forget to check in with the EU.

Vasilije: Thanks for having me on. And I'm always happy to be in US. I think the culture's amazing and what we lack in EU, US has to offer.

So therefore, I appreciate you guys taking the time and also opening a lot of these interesting discussions.

Brian: Cool. Well listeners, stay ready