Incongruent

What Your Movement Reveals: The Future of Computer Vision

The Incongruables Season 5 Episode 1

Spill the tea - we want to hear from you!

Galvin Widjaja, founder and CEO of Lauretta AI, shares his journey from restaurant management to leading a privacy-first computer vision company that predicts human behavior without using biometrics. His unconventional path provides unique insights into building AI that respects privacy while effectively understanding human intent.

Here are the seven biggest takeaways from the interview:

1. AI Should Predict Human Intent — Without Biometrics

“Lauretta’s fundamental technology is the ability to do non-biometric tracking across CCTV cameras… I can do that right now with a 95% to 98% accuracy across like a quarter of a million people walking through a shopping mall at the same time.”


2. Security vs Retail: Same Tech, Different Purpose

“Security is the industry of inconsistent purpose. And retail is a business of understanding the level of desire behind your purpose.”


3. From Fried Noodles to Frontier Tech

Galvin’s entrepreneurial roots run deep, from his father’s fried noodle shop in Jakarta to the three migrant-serving restaurants he managed in Singapore. That hands-on experience with human behavior in physical spaces now powers his AI vision.

4. The YouTube Problem in AI Training

“The problem that we have right now is that almost all AI is trained on YouTube videos… it has a director and a cameraman. And a director and a cameraman means there is an implicit intent of what the purpose of the video is.”


5. Privacy by Design, Not by Patch

“We captured no biometric from the beginning… it’s built in such a way that at every stage of the system, there is transparency on the information that we capture, the way that we use it, the way we encode it, the way that we transfer it, and the way that we connect it with other information.”


6. Winning Homeland Security Without a US Office

Despite being headquartered in Singapore at the time, Lauretta AI won its first Homeland Security contract in 2019 — a signal that vision and trust can matter more than geography.

7. AI’s Evolution: From Detection to Understanding

Galvin sees AI as progressing in stages: inference → classification → aggregation → higher-order thinking. Lauretta’s bet is on this next frontier: AI that doesn’t just classify behavior, but understands context.


If you're enjoying these conversations on AI and the interviews, please subscribe, like, and comment where possible. We really appreciate your support!

Links:
https://lauretta.io/

Arjun
Hello and welcome to our brand new season of the Incongruent podcast, season five. It's a full new lineup. And today we are joined by Galvin Widjaja, who is the founder and CEO of Lauretta AI, a privacy-first computer vision company that helps organizations understand human behavior in physical spaces without relying on identity or biometrics.

Galvin's unconventional journey into AI began not in a lab, but behind the counter of a quick-service restaurant that he ran for his father in Singapore. After early career stints in consulting, banking, and tech, a pivotal decision to reconnect with his estranged father led him to managing three migrant-serving restaurants. Born in Singapore and raised partly in Indonesia, Galvin grew up in an entrepreneurial household that saw both massive growth and total collapse. Those early lessons in volatility, resilience, and human complexity now inform his mission to build AI that doesn't just react, it understands. Galvin, welcome to the show. How are you doing today?

Galvin Widjaja
Hey Arjun, yeah, I'm doing really well today. It is a very exciting time to be in the AI space. A lot of people talk about AI in a commercial way. I think the thing that's really exciting about AI is that it is now taking very meaningful steps in terms of its increasing understanding of the world. And every point of understanding is a new data point.

And I think data is a narrative that has run through different parts of our lives, through very different things. And every new data point is an awakening of a different industry and a different way of working. I think that's a big part of the world we are in right now.

Arjun
Amazing. So, Galvin, just to begin with, could you introduce yourself by sharing some insights from your nonlinear journey, from consulting and finance to running restaurants and now leading an AI startup?

Galvin Widjaja
Yeah, so I am fortunate to have lived a full and maybe slightly different life from the average AI founder. My mother's side of the family were contractors, so physical contractors: they built buildings, old 1960s-style army camps, which are kind of like what you see in "Mash", aluminum foil, semi-circle, barrack-style buildings. And my dad's side of the family was in the restaurant business. And it was not the high-end restaurant business. It was the 45-cent fried noodles in Jakarta restaurant business, which is the best kind for sure. And so that's kind of what I started with.

I wasn't a very good student, but I had a very unique break in my life, a really big lucky break, when I was nine years old. I was selected to be part of a program in Singapore called the Gifted Education Program. Essentially, it's a program where selection is driven by IQ and nothing else. And the best thing about IQ is that nobody can test it well. So they can say that even though your exam grades are terrible, which they were, I was still in the same band, because they hadn't updated their IQ number, right?

So I was in a good class for many years despite being an absolutely trash student, and from two families that had no education and no emphasis on education whatsoever. During that time, I moved into a more and more technical part of education. I started to do science, and I loved math; that was where I was naturally inclined. But at the same time, just before choosing my major, I had an epiphany that I wanted to do something like what my parents did, which was what my grandparents did, which was to be an entrepreneur. And so I took a business degree at the start of my university days in 2007. Immediately after I did that, I regretted it immensely, because it involved none of the things that I was good at. Being good at technical things does not allow you to explain what you do. A lot of a business degree is communication and understanding of the world. I didn't know any of that.

And so I became a quant, the most technical business degree you can possibly get, which is essentially high-speed trading: how to build trading algorithms, high-speed trading models. And I was very fortunate in that, because 10 years down the line, the same mathematical structures of quant finance, which is matrix multiplication, became the foundation for AI in 2017, a full decade later. And that's kind of what happened with my life.

I started in very stable jobs. I did five years in management consulting, and then I was poached by a client, as many management consultants are, and went into banking. I did that for another five years. And in that time, a couple of things happened. My parents divorced, and my dad had, secretly from Indonesia, started a restaurant chain in Singapore to kind of be closer to his kids. And then he lost his visa. And then he told me one day: hi Galvin, I haven't talked to you in a long time, but I need you to sign some papers for me and you need to take over a restaurant chain. And so I took over a restaurant chain.

I did that every evening. And just like a lot of restaurant chains run like this, things were very iffy. It was kind of on the border of legal. And there were a lot of dependents and people who were... The best cooks, right, I always say, come from two places. One is grandmas and grandpas, right? That's where you get the old-timey feel. And the top big-kitchen chefs in the world are often found in prison. That may sound really terrible, but it's actually a natural extension of what you would have heard in Kitchen Confidential, right? You always hear about kitchen crews hanging out together and taking drugs together. That's the reason why they're in prison. So that's why we hire from these places.

And so I had to take it over, because you cannot have a place like this run by itself. And that was the start of my journey. That journey allowed me to go from digital data, which is where everybody lived, into the physical world of data. So while most AI founders in the world are people in the technical space looking at how to expand the digital realm, I had taken the step of migrating my data expertise into the physical realm to take over the restaurant chain. And I took that same trade, physical data, and expanded it into what the business is today.

Stephen King
That's fantastic. So you weren't just a techie building technology for technology's sake. You were actually identifying problems and you were able to apply your technical background to address those problems. Fantastic. I'm assuming that's where Lauretta came from. Could you perhaps now just explain what Lauretta does, if that's not too big a jump from the street shop to the AI machine?

Galvin Widjaja
You know, I know I always spend a lot of time talking about my history, and one of the reasons for that is because I think it is almost my responsibility to tell you that the people behind some of what can often be called the most soulless technologies are real human beings. And what Lauretta is, is a company that is used by people like the police and the TSA, as well as retail spaces, so large buildings, to predict human intent. And human intent means that the technology's final aim is to guess what you want and what you are thinking. So what is the path, and what does Lauretta do, to do that?

Lauretta's fundamental technology is the ability to do non-biometric tracking across CCTV cameras. Essentially, we decided that CSI should be a real technology. That is to say, if I see a little photo of your shoes next to your socks, and then I see another photo of your shoulder, I'm like, this is a picture of the same person: he's walking the same way and his style is the same. And I can do that right now with 95% to 98% accuracy across something like a quarter of a million people walking through a shopping mall at the same time.
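For readers who want a mental model of how this kind of cross-camera, non-biometric matching can work, here is a minimal illustrative sketch. It is not Lauretta's pipeline; the appearance encoder, similarity threshold, and gallery structure below are all placeholder assumptions.

```python
# Minimal illustrative sketch of appearance-based, non-biometric re-identification
# across cameras. This is NOT Lauretta's pipeline: the encoder, threshold, and
# gallery structure are placeholder assumptions made for this example.
import numpy as np
from typing import Dict

def embed_appearance(crop: np.ndarray) -> np.ndarray:
    """Stand-in for a learned appearance encoder (clothing, posture, gait cues).
    Here we just resize, flatten, and L2-normalize pixels so the example runs."""
    v = np.resize(crop, (64, 32, 3)).astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

class ReIdGallery:
    """Keeps one embedding per anonymous track ID; no faces, no biometric keys."""
    def __init__(self, match_threshold: float = 0.9):
        self.match_threshold = match_threshold
        self.embeddings: Dict[int, np.ndarray] = {}
        self._next_id = 0

    def match_or_register(self, crop: np.ndarray) -> int:
        """Return the ID of the closest known track, or start a new anonymous one."""
        query = embed_appearance(crop)
        best_id, best_sim = None, -1.0
        for track_id, ref in self.embeddings.items():
            sim = float(query @ ref)          # cosine similarity (unit-norm vectors)
            if sim > best_sim:
                best_id, best_sim = track_id, sim
        if best_id is not None and best_sim >= self.match_threshold:
            return best_id                    # same anonymous person seen again
        self.embeddings[self._next_id] = query
        self._next_id += 1
        return self._next_id - 1              # brand new anonymous track
```

In a setup like this, each camera would feed person crops into match_or_register, so the same shopper keeps the same anonymous ID as they move from camera to camera without any face template ever being stored.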

So every single person has a consistent journey. And once we have the journey, your behavior through the mall, we then add information on the things that you do: every time I see you, where you are looking, the places where you pause, the people that you're with, the people that you're friendly with versus the guy you're kind of pushing out of the way because he's standing between you and the TV screen. We take all of that information and then guess, to some extent, whether you are there for the purpose of being there or you're there for something else.

And security is the industry of inconsistent purpose. So you are where you are not supposed to be, or you are there for a reason you are not supposed to be there for.

And retail is a business of understanding the level of desire behind your purpose. That is to say, I bought a $5 watch, but I would have bought a $10 one if a $10 one was good. And so the same technology is used across multiple physical spaces by property managers and governments across the world right now.

Stephen King
I'll ask a follow up on that one then. So this is, I'm assuming, what we mean by computer vision, is it?

Galvin Widjaja
So computer vision, simply, is the idea of taking images from a camera and interpreting the data in a different way. Computer vision has been around for something like 80 years. And for the last 60 or 70 years, it has largely been driven by identifying a phenomenon. And identifying a phenomenon kind of means, like, when somebody is moving through a screen...

But if I take that screen and reduce it to just 10 pixels, and let's say I'm wearing a red shirt, what I'll see is a red dot moving across the screen, because that is me. And a red dot moving across the screen in a way that's very human-like is probably a person walking. That's kind of what computer vision looked like. Deep-learning computer vision, so 2017 onwards, has largely been driven by detection of things.

This is a person, and then it goes deeper and says this is a white male, or it could be this is a person walking versus running. So it starts to identify and classify behaviors. And that's kind of the basis of the technology. Then you fast-forward to 2022 and 2023, with ChatGPT and all of that, and now it's like: this is a person, and these are the 17 things he did, and why do you think he's there? And then maybe I'll say he's searching, or he's purchasing; he's got no time, so he's gone directly to the shop he wanted to go to, he bought something and he left, versus he's shopping and he's browsing. And maybe you would think to yourself that a browsing person is more susceptible to marketing. Like a doom-scroller is much more interesting than a person who just opens the one YouTube channel he wants to watch and leaves.
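To make the jump from "this is a person" to "these are the 17 things he did, so why is he here" a bit more concrete, here is a deliberately simple, hypothetical sketch that turns a tracked journey into a coarse intent label. The event names, thresholds, and labels are invented for illustration; a real system would learn this from data rather than hand-written rules.

```python
# Made-up illustration of moving from detected events to an intent guess.
# The event names, thresholds, and labels are invented; this is not Lauretta's model.
from typing import Dict, List

def summarize_journey(events: List[Dict]) -> Dict[str, float]:
    """Aggregate per-sighting observations into journey-level features."""
    stores_visited = {e["zone"] for e in events if e["type"] == "enter_store"}
    pauses = [e["seconds"] for e in events if e["type"] == "pause"]
    purchases = sum(1 for e in events if e["type"] == "purchase")
    return {
        "stores_visited": float(len(stores_visited)),
        "total_pause_time": float(sum(pauses)),
        "purchases": float(purchases),
    }

def guess_intent(features: Dict[str, float]) -> str:
    """Crude rule of thumb: one quick, targeted purchase looks 'direct';
    many stops and long pauses looks like 'browsing' (more marketable)."""
    if features["purchases"] >= 1 and features["stores_visited"] <= 2:
        return "direct_purchase"
    if features["stores_visited"] >= 3 or features["total_pause_time"] > 300:
        return "browsing"
    return "passing_through"

journey = [
    {"type": "enter_store", "zone": "electronics"},
    {"type": "pause", "seconds": 40, "zone": "tv_wall"},
    {"type": "purchase", "zone": "electronics"},
]
print(guess_intent(summarize_journey(journey)))   # -> "direct_purchase"
```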

Arjun
Cool. You know, sorry. You've recently received backing from a good number of organizations. What is driving this interest and support from the Department of the Air Force's AFRICS program, the TSA in Texas, Australia's market accelerator program, where you guys are amongst the top 10, a project from Homeland Security, and more?

Galvin Widjaja
So I think there's one thing about our business that kind of drove this, and I think it's very interesting. We won our first Homeland Security contract in 2019, when we had no US citizens and no US presence. And Homeland Security is essentially the ministry that does security in the United States, the one that kind of oversees things like the police forces. Why would they give it to an organization that was, at that point in time, based in Singapore? I think a lot of that is vision-based more than technology-based.

Over the last 10 years or so, AI could be seen as a continuum, but I think it is wiser to think of AI as having gone through three specific stages. There is a stage of inference, where AI supports an inference: maybe that is a bee, 75% chance it's a bee, right? And then it went to a point of classification. That is when you completely trust the AI's ability to decide that it's a bee, and you can now automate based on "if bee, then".

If bee, then find honey, or whatever. So you have now moved away from AI supporting you to AI as a data input. And then there was that third point where, because AI is an input, the next question was: can we aggregate AI inputs, because they are automated? Can we aggregate them and create higher-order thinking? And so we believed, even when we were pitching this in 2019...

Arjun
Mm-hmm.

Galvin Widjaja 

That AI will inevitably reach higher-order thinking. And so because of that, in many ways, they decided that we were the appropriate people to build these projects. And yeah, it was really interesting, because we are not a company that invents; we're not OpenAI, right? We do not build the frontier models. We just believed, from around 2014, that a frontier model would come around. And we operated that way, and that's kind of how we work.
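One way to picture the three stages he describes, inference, then classification you trust enough to automate, then aggregation into higher-order reasoning, is this small hypothetical sketch. The labels, confidence numbers, and aggregation rule are invented; it simply mirrors the "if bee, then find honey" example.

```python
# Hypothetical illustration of the three stages described above; none of this
# is Lauretta's code, and the labels, numbers, and rules are made up.
from typing import List, Tuple

def stage1_inference(image) -> Tuple[str, float]:
    """Stage 1: AI supports a human inference with a confidence score."""
    return "bee", 0.75          # "maybe that's a bee, 75% chance it's a bee"

def stage2_classification(image) -> str:
    """Stage 2: the AI's label is trusted outright, so you can automate on it."""
    label, confidence = stage1_inference(image)
    if label == "bee" and confidence >= 0.7:
        return "find_honey"     # "if bee, then find honey"
    return "do_nothing"

def stage3_aggregation(images: List) -> str:
    """Stage 3: many automated AI outputs become inputs to higher-order reasoning."""
    actions = [stage2_classification(img) for img in images]
    bee_rate = actions.count("find_honey") / max(len(actions), 1)
    # A judgment made over aggregated outputs, not over any single detection.
    return "hive_nearby" if bee_rate > 0.5 else "no_hive"
```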

Stephen King
Yeah, so that's really quite interesting, because we're seeing lots of these different technologies coming out, but they only have one use. And once you're finished with that use, or ChatGPT or OpenAI decide to create something similar, that purpose no longer exists and the technology is defunct, because it's already been absorbed. That's really quite...

When you say it, it's quite obvious how you've gone through it. Now, I mentioned earlier that I came across computer vision a few years ago. It was being used in news in particular, whereby, I don't know whether you'd consider this computer vision, but news journalists were able to find clips to support whatever it was they were reporting on, and they were able to fill out their stories quite nicely and easily. There were also cases where, they only had still pictures at that point, not moving pictures, but from a still image they could determine what the people were likely doing.

Galvin Widjaja
Yes. Yeah.

Stephen King
and that was mind-blowing, and that was about five years ago. Where have we reached since then, and where do you think we're gonna go?

Galvin Widjaja
So this is a really interesting thing, because if you were a practitioner and you were very honest with yourself, you would probably say that it hasn't changed at all. And the way that I would think about it is this. Let's say I had a picture where, let's say, I was being interviewed. So I'm standing in front of a TV, it's got my name there, and then behind me, in the black background, slightly blurred to be a nice background for me, two guys start to fight; they start to beat each other up violently in the background. And you take that video and you send it to an AI system. The AI system, in almost every case until maybe three months ago, would just say: this is an image of an interview with an exciting background. Whereas a human would say: the thing that is really happening is that there is a fight going on.

And the guy in front is now the background for the fight that's going on behind him. A big part of that is because the behavior algorithms are largely what is called a classification algorithm. The way to think about it is that deep-learning AI can be split into two kinds of models: one is to detect and one is to classify. A detection model says: is there one of these 50 things in this image? And it often comes out with the answer: none of these 50 things are in the image. A classification model is the opposite. A classification model says: here are 50 options, MCQ, a multiple-choice question, right? Here are the 50 options, here is the image, you have to choose an answer. The result is that a model like that will definitely take the most significant thing in that image and make that the answer. In my business, this is actually the unsolved part of the question, because with a CCTV camera, and in real life, and if you imagine planning out what's happening in the world with your robots and all of this stuff, you're not always looking at the big thing. You're looking at the context and the richness of activity that is happening.

The problem that we have right now is that almost all AI is trained on YouTube videos. Now, YouTube videos have something very important that the world does not always have: a director and a cameraman. And a director and a cameraman means there is an implicit intent of what the purpose of the video is.
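As a concrete illustration of the two model types he contrasts, here is a toy sketch (stand-in code, not a real vision model): a detector is allowed to report that none of its target objects are present, while a classifier is a forced multiple-choice question and will crown whatever dominates the frame.

```python
# Toy contrast between the two model types; both are fake stand-ins whose point
# is the shape of the output, not real vision code.
from typing import Dict, List

def detection_model(image) -> List[str]:
    """A detector answers: 'are any of these N things in the image?'
    It is allowed to come back empty: none of the N things are present."""
    found: List[str] = []
    # ... a real detector would run here; finding nothing is a valid answer ...
    return found

def classification_model(image) -> str:
    """A classifier is a forced multiple-choice question: it must pick exactly
    one of its labels, and it tends to pick whatever dominates the frame."""
    scores: Dict[str, float] = {
        "interview": 0.65,   # big, sharp foreground subject
        "fight": 0.20,       # small, blurred background activity
        "person": 0.10,
        "bag": 0.05,
    }
    return max(scores, key=scores.get)   # the most prominent thing wins
```

In the interview example above, the sharp foreground wins the forced choice ("interview") and the background fight simply disappears from the answer.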

And so that is the reason why, if you type into ChatGPT something like "I want to see an exciting, busy airport", it's going to show you an image of what a YouTuber would call a busy airport. It is not going to show you a Richard Scarry busy, busy airport, which is kind of what you really want, right? Which is all the different parts going on and something interesting happening. It's not there. And that's a limitation that cannot be solved immediately, not without many more things happening.

Stephen King
This all brings up a whole issue, because whenever you're talking about photography there are a lot of privacy issues. Where Arjun is, in the UAE, there are strict laws on photography, and videography is even more sensitive. How do you address those kinds of issues? Specifically, I'm in the European Union, where there's the EU GDPR on data, and you're collecting significant amounts of data on people's body shapes and forms, et cetera. How are you addressing those, or are you addressing those at this moment?

Galvin Widjaja
So I think there are two ways to answer this question. One is I can tell you what we are doing, and the answer to what we are doing is that we captured no biometric from the beginning. For example, let's say I take a photo of you. There are two things you can do. One is I can take your whole image, so that's including the very good-looking shirt you have on today and all of your other clothing. I can combine that into a single image and say: this is the feature of Stephen that I'm using today. And this is what we do. And the reason we do that is because if tomorrow you come in with a different shirt on and different pants on, or even wearing them differently, maybe your belt a bit lower, it's a different person to that image, because it's matching all of you against all of you.

The second thing you could have done is we could have said: let's do that, but we can also do facial recognition. Facial recognition has a problem. The facial recognition key is a transferable identity of your face. It is much less likely that you change what your face looks like than that you change what the rest of your body looks like.

And I have tried this myself, just to terrorize myself, just to give myself, you know, night sweats a little bit. I copied and pasted the facial recognition from one of my cheapo little cameras into my own PC. So essentially, I hacked my lousy Chinese camera, a facial recognition door thing, took out the facial encoding and put it in my PC, and my PC could recognize me immediately. Which essentially means this is not only a key, it is a stealable, transferable key. Your privacy is gone once that thing is created, right? So we said: let's not even build those components in. It's built in such a way that at every stage of the system, there is transparency on the information that we capture, the way that we use it, the way we encode it, the way that we transfer it, and the way that we connect it with other information. That's kind of the way that we do things.
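A hypothetical sketch of what "privacy by design, not by patch" can look like at the data-model level: the record that gets stored simply has no field that could hold a face template or any other reusable biometric key, only an ephemeral appearance vector and deployment-scoped identifiers. The field names here are invented for illustration and are not Lauretta's schema.

```python
# Hypothetical data model illustrating "privacy by design": there is simply no
# place to store a face template or any other reusable biometric key.
# All names are invented for illustration; this is not Lauretta's schema.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

@dataclass
class AnonymousTrack:
    track_id: int                      # meaningful only within one deployment/session
    site_id: str                       # which building the cameras belong to
    appearance_embedding: List[float]  # whole-appearance vector (clothes, posture);
                                       # expires with the session, useless elsewhere
    dwell_points: List[Tuple[str, float]] = field(default_factory=list)  # (zone, seconds)
    created_at: datetime = field(default_factory=datetime.utcnow)
    # Deliberately absent: face template, name, phone/MAC address, or any other
    # field that could act as a stealable, transferable identity key.
```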

I think there's a bigger question that we also need to look at very briefly. If you look at ChatGPT today, we often talk about how ChatGPT has a problem with trademarks. When ChatGPT first came out, one of the famous things people used to do is they would say: make me something like a Studio Ghibli image, right? And then it looks like a Studio Ghibli image.

The phrase "Studio Ghibli" was built into the model. There's no way to get rid of that. You can hide it. You can have a little prompt that says: if "Studio Ghibli" is in the prompt, then remove it and say "Japanese animation" or something like that. But you can't really get rid of it unless you had gotten rid of it at the training-data level, so that it had never entered the system.

One of the things that we believe is that a lot of decision-making and computer-vision data today, like I said, is trained on CCTV and trained on YouTube videos. Now, once you start to train on the rest of the world, to create understanding of video that can be used to, for example, command and control embedded AI robots and so on, which will live in the real world and not in the YouTube-video world, that information hasn't been trained on yet, because it doesn't really exist yet.

Which means that we are once again at that crossroads where we can choose to eliminate Studio Ghibli and all of our identities and all the branding on our cars from the training data. We're kind of at that point again. So we probably have only a couple more chances before AI kind of takes over the world.

Stephen King
So hopefully the regulators will take note. We're almost at the end. Arjun, do we have any last questions?

Arjun 
So now that you're in Singapore and the US, and Australia appears to be part of your growth strategy, are there any other regions you guys are exploring, to expand the technology out to?

Galvin Widjaja
So the interesting thing about our technology is that we work with property managers and property owners. And with property managers and property owners, the way to think about it is that the tracking system works best within a jurisdiction, and the best jurisdiction is a big owned location, which is a big building. So property owners make a lot of sense. And the thing about property owners is that almost every property owner is cross-border in nature.

Arjun
Mm-hmm.

Galvin Widjaja
And so our expansion has been very organic, right? We have Jones Lang LaSalle and CBRE, two property managers, as people who are using our technology. CBRE starts in Singapore and they expand westward; the HQ is in India, so they're going westward in that direction. JLL's Southeast Asia practice is in Hong Kong, and so they're moving northward from it. So we have work happening in Thailand and Vietnam and Malaysia and Indonesia, and it kind of expands out that way. In the US it's the same way; we just signed, for example, our first partner in Canada. Yeah, so it's an interesting space.

Stephen King
Mabrouk, mabrouk, that's... congratulations. Although I'm in the UK now, I keep slipping a little bit into Arabic, for no reason at all. That sounds like a good way of finishing for today. Thank you very much indeed, Galvin. Arjun, do you want to close us down?

Arjun
Well, once again, Galvin, as Steve said, thank you so much for joining us, sharing your experience, and giving us more insight into Lauretta. So yeah, thank you once again for joining us. And this is the entire crew from the Incongruables, Stephen King and myself, Arjun Radish, signing off. Good night.