Innovative Conversational AI - Kao Isla Case Study

Written by
Eddie Caldwell

Kao Isla is an advanced AI and data visualisation project developed in collaboration with Point 3 Co. for their client, Kao Corporation. The project took place over the course of 5 months and was showcased at the Consumer Goods Forum 2023 in Kyoto, Japan. The system features:

  • 3D AI Avatar: Capable of natural conversation with users through intelligent conversational systems.

  • Gait Metric Dashboard: Providing a visual representation of the user's gait tracking data as well as a 3D visualisation of their walking pattern.

Project Overview

Our objective was to create a more natural and intuitive way for users to interact with their gait tracking data and resulting metrics.

Users performed a short gait tracking measurement using Kao's proprietary gait tracking mobile app. The results of this test were then sent to, and processed by, our system.

As an innovative company, Kao wanted to make use of the latest AI systems to create one of the world's first conversational AI systems to complement their gait tracking application.

Because there were no existing systems that facilitated end-to-end conversational AIs with realistic 3D avatars, we were tasked with building the system from scratch.

Challenges

Our largest challenge was that there were no existing products or solutions that matched our needs. No one had developed an end-to-end conversational AI solution.

The closest solution was Nvidia Tokkio. However, at the time of project development, Tokkio was still in development itself and hadn't been released yet, so we had to go custom.

Another major challenge was latency. For a conversational AI to be convincing, we needed minimal delay between the user asking a question and the AI responding vocally. This meant using the most powerful tech we could get our hands on and writing custom software to connect it all seamlessly, with as little latency as possible.

Finally, we were faced with the hurdle of bringing everything together. How could we bring multiple innovative systems into one, with each constituent part working together and communicating rapidly?

How did we do it?

How do you go from idea to solution? The first step is to truly understand the concept you're trying to bring to life.

Conceptualisation

Nothing will sink a great idea more quickly than not fully understanding what the goal is. Our first step in the process was to pin down the objectives for the project:

  1. Natural conversation

  2. Low latency

  3. Interconnected systems

  4. Life-like avatar

  5. Engaging data dashboard

We held collaborative brainstorming sessions to understand the specifics of our requirements. We then formalised our requirements as goals and target milestones - another great way to ensure success is highlighting what you want to have accomplished, and when.

Once we had a solid idea of what we wanted to build, it was time to start designing.

Design

During the design phase, we experimented with different technologies to create prototypes. Once we had a good idea of what technology we wanted to use, we went on to design the architecture.

We knew that we had to achieve a high level of connectivity and reduce latency as much as possible, so we designed a system architecture that uses WebSockets to let our different system modules communicate.
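As a rough illustration of this kind of module-to-module messaging, here is a minimal Python sketch. It is not the project's code: the `Message` envelope, the topic names, and the in-memory `Hub` (which stands in for a real socket server so the example runs without a network) are all assumptions made for the example.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, asdict
from typing import Callable, DefaultDict, List


@dataclass
class Message:
    """Hypothetical envelope shared by every module."""
    source: str    # module that sent the message, e.g. "dashboard"
    topic: str     # what the message is about, e.g. "speech.transcribed"
    payload: dict  # topic-specific data

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(raw: str) -> "Message":
        return Message(**json.loads(raw))


class Hub:
    """In-memory stand-in for the socket server: routes messages by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, raw: str) -> int:
        """Decode a JSON frame and deliver it; returns the number of handlers called."""
        message = Message.from_json(raw)
        handlers = self._subscribers.get(message.topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

Keeping every message in one small JSON envelope means each module only needs to know the topics it cares about, not which other modules exist.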

The final step was to mock up our user interfaces and decide how we were going to lay everything out. We had to create a design that would allow a large amount of data to be represented without feeling cluttered or overwhelming. This step involved plenty of internal discussion and back-and-forth with the client to ensure they were happy with the look and feel of the UI.

We now had a concrete design for our system, so the next (and most exciting) step was to make it a reality.

Bringing Isla to life using Metahuman and JALI

Isla (named after the Scottish island "Islay") was our virtual assistant for the system. We decided to use Metahuman to create a 3D model as close to life-like as possible. Metahuman is a photorealistic avatar creation framework developed for Unreal Engine, and it allowed us to create a custom model suitable for our needs.

Now, a life-like model becomes immediately unconvincing if the lip-sync isn't just as high-quality. We experimented with multiple solutions (including a custom-written lip-sync animation package) but we found that nothing looked realistic enough to match the high-quality Metahuman model.

Thankfully, JALI had us covered. JALI Research provides an extremely realistic lip-sync solution that could be used directly with our Metahuman character. In fact, JALI's lip-sync solution is so good it was actually used in Cyberpunk 2077.

The final step in bringing Isla to life was to create some custom animations for her different actions: a thoughtful pose whilst the AI was generating a response, a gentle sway when awaiting user input, and some engaging hand movements when speaking, to name a few. As 3D application enthusiasts, we had a lot of fun with this step.
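The switching between these animations can be pictured as a small state machine. The Python sketch below is only illustrative: the state and event names are hypothetical, and the real avatar logic ran inside the game engine rather than in Python.

```python
from enum import Enum


class AvatarState(Enum):
    IDLE = "idle"          # gentle sway while awaiting user input
    THINKING = "thinking"  # thoughtful pose while a response is generated
    SPEAKING = "speaking"  # hand movements while the reply is spoken


# Hypothetical event names; each entry maps (current state, event) to the next state.
TRANSITIONS = {
    (AvatarState.IDLE, "user_finished_speaking"): AvatarState.THINKING,
    (AvatarState.THINKING, "response_audio_ready"): AvatarState.SPEAKING,
    (AvatarState.SPEAKING, "playback_finished"): AvatarState.IDLE,
}


def next_state(current: AvatarState, event: str) -> AvatarState:
    """Return the next animation state, staying put on unexpected events."""
    return TRANSITIONS.get((current, event), current)
```

Ignoring unexpected events, rather than raising an error, keeps the avatar in a sensible pose if messages arrive late or out of order.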

Giving Isla a voice with GPT and Nvidia Riva

The key thrust of this project was the newly released (at the time) GPT-3.5 Turbo, a generative AI model that can be used to understand and produce human language. Our goal was to make use of it as the "brain" of Isla.

We couldn't just use GPT as it comes off the shelf, however. We had to create a wrapper around it that would allow us to moderate its language and replies - we didn't want it saying anything inappropriate or talking about irrelevant topics. So, we created a system that used the following steps for dialogue:

  1. Interpret the intent behind the user's input (asking a question about their metrics, requesting more information, agreeing, denying, making general conversation, etc.).

  2. Provide either a pre-defined response such as: "I'm sorry, I can't talk about that" in the case of inappropriate user input, or generate an appropriate response using GPT.

  3. (In the case of a newly generated response) moderate the response to ensure it is also appropriate and doesn't contain hallucinations.
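The three steps above can be sketched roughly as follows. This is a simplified stand-in, not the production wrapper: the keyword lists, the `classify_intent` and `is_acceptable` helpers, and the `generate` callback (which takes the place of the call to the language model) are all hypothetical.

```python
from typing import Callable

FALLBACK = "I'm sorry, I can't talk about that."

# Hypothetical keyword lists; a real system would use a trained intent classifier.
BLOCKED_WORDS = {"politics", "religion"}
METRIC_WORDS = {"speed", "stride", "balance", "gait", "walk"}


def classify_intent(user_text: str) -> str:
    """Step 1: crude keyword-based intent detection (stand-in for a real classifier)."""
    words = set(user_text.lower().replace("?", " ").split())
    if words & BLOCKED_WORDS:
        return "inappropriate"
    if words & METRIC_WORDS:
        return "metric_question"
    return "small_talk"


def is_acceptable(reply: str) -> bool:
    """Step 3: placeholder moderation check on the generated reply."""
    return bool(reply.strip()) and not (set(reply.lower().split()) & BLOCKED_WORDS)


def respond(user_text: str, generate: Callable[[str, str], str]) -> str:
    """Run the three steps; `generate(intent, text)` stands in for the language model call."""
    intent = classify_intent(user_text)
    if intent == "inappropriate":
        return FALLBACK                  # step 2a: pre-defined response
    reply = generate(intent, user_text)  # step 2b: generated response
    return reply if is_acceptable(reply) else FALLBACK
```

Passing the generator in as a function keeps the moderation logic testable without calling a real model.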

Our custom conversational system involves techniques such as: intent and sentiment analysis, retrieval augmented generation (RAG), and vector databases.
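To give a feel for how retrieval augmented generation and a vector database fit together, here is a toy Python sketch. Everything in it is an assumption for illustration: the bag-of-words `embed` function replaces a real embedding model, and `TinyVectorStore` replaces a real vector database.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, store: TinyVectorStore) -> str:
    """Prepend the most relevant stored text so the model can answer from it."""
    context = "\n".join(store.search(question, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Grounding the prompt in retrieved text is one common way to reduce hallucinations, because the model is asked to answer from supplied material rather than from memory.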

With Isla's brain implemented, we needed to allow her to understand users and speak her mind.

Equipped with high-powered Nvidia RTX graphics cards, we made use of Nvidia Riva. Riva is a toolset that transcribes audio into text (for understanding user input) and generates audio from text (for producing spoken responses). We chose Riva because it was significantly faster than the other software we had experimented with.
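Comparing speech tools on speed comes down to timing each stage of the pipeline. The Python sketch below shows one simple way to do that; the `LatencyLog` helper and the stage names are hypothetical, and the placeholder lines mark where real transcription and synthesis calls would go (no Riva API is shown here).

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class LatencyLog:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        started = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - started
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def total(self) -> float:
        """Total time across all recorded stages, in seconds."""
        return sum(self.seconds.values())


log = LatencyLog()
with log.stage("speech_to_text"):
    transcript = "placeholder transcript"  # a real transcription call would go here
with log.stage("text_to_speech"):
    audio = b""                            # a real synthesis call would go here
```

Recording each stage separately makes it clear whether transcription, response generation, or speech synthesis is contributing most to the delay the user hears.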

Building an interactive dashboard using Unity

One of the most common challenges you face when developing dashboard-type user interfaces is: how do we represent all of this data in a neat and intuitive way? The key is to understand what data is vital, what data can be secondary, and to make good use of things like iconography and whitespace.

We decided to make use of the Unity engine for our dashboard because:

  1. It allowed us to embed 3D graphics in the dashboard

  2. It allowed us a high degree of interactivity

  3. It allowed us to build robust API wrappers and modules in C#

  4. We have years of experience building graphical applications using Unity

We made sure to make smart use of prefabs - essentially user interface blueprints that can be reused in different locations. We also used iconography wherever we could to reduce text and provide some engaging visuals.

We knew that we had a large amount of data to visualise, so we decided to keep the essentials visible at all times and hide the details in popup windows. One feature we're very proud of is the system's ability to use sentiment analysis to work out which data point Isla is currently talking to the user about. We could then use this information to automatically show the appropriate popup without the user having to click anything.
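As a simplified picture of the idea, the Python sketch below picks a popup from the text of a reply. Note that it swaps in plain keyword matching as a stand-in for the analysis described above, and the popup identifiers and trigger words are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical popup identifiers and trigger words; the real metric names are not given.
POPUP_KEYWORDS: Dict[str, Set[str]] = {
    "walking_speed": {"speed", "pace"},
    "stride_length": {"stride"},
    "balance": {"balance", "symmetry"},
}


def popup_for_reply(reply: str) -> Optional[str]:
    """Return the popup whose keywords appear most often in the reply, or None."""
    words = [word.strip(".,!?") for word in reply.lower().split()]
    scores = {
        popup: sum(word in keywords for word in words)
        for popup, keywords in POPUP_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Returning `None` when nothing matches lets the dashboard leave its current view alone during general conversation.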

As an extra flourish, we decided to add a 3D representation of the user's physical walking pattern. By processing the data outputted by the client's gait-tracking application, we were able to map the positional data points onto a custom 3D model - demonstrating, visually, how the user walks.
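One step such a mapping typically needs is turning irregularly timed position samples into one position per rendered frame. The Python sketch below does this with linear interpolation; the `(time, (x, y, z))` sample format and the function names are assumptions, since the actual output format of the gait-tracking app is not described here.

```python
from bisect import bisect_right
from typing import List, Tuple

Position = Tuple[float, float, float]
Sample = Tuple[float, Position]  # (time in seconds, (x, y, z)); hypothetical format


def position_at(samples: List[Sample], t: float) -> Position:
    """Linearly interpolate a position at time t from time-sorted samples."""
    times = [time for time, _ in samples]
    if t <= times[0]:
        return samples[0][1]
    if t >= times[-1]:
        return samples[-1][1]
    i = bisect_right(times, t)  # index of the first sample strictly after t
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    w = (t - t0) / (t1 - t0)    # 0 at p0, 1 at p1
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def resample(samples: List[Sample], fps: float) -> List[Position]:
    """Produce one position per rendered frame between the first and last sample."""
    start, end = samples[0][0], samples[-1][0]
    frame_count = int((end - start) * fps) + 1
    return [position_at(samples, start + n / fps) for n in range(frame_count)]
```

The resulting per-frame positions could then drive a joint or marker on the 3D model, one entry per frame.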

Our next step was to tie the systems together by implementing wrappers for our custom modules. We developed a wrapper for our WebSocket server to allow the dashboard to quickly communicate with other systems, and we built wrappers for our API-based systems to facilitate operations like transcribing user speech and generating AI responses and audio.

Showcasing Isla at the Consumer Goods Forum

The Consumer Goods Forum is a yearly international summit held in different locations all around the world. It brings together lots of big-name brands and companies and offers an opportunity to learn and network.

We were lucky enough to have our very own booth at the 2023 CGF in Kyoto. The booth consisted of a futuristic-looking platform for users to perform their walk analysis.

Image - CGF Booth Walkway

Once the user completed their walking analysis, they were led directly from the platform to our star of the show - Isla.

Image - CGF Booth

We manned the booth for two days, showcasing it to a variety of interesting people. We're thankful that we only ran into a couple of technical issues and they fortuitously only seemed to occur during the less busy periods.

Reception

We were extremely pleased with the turnout over the two-day event. We found that people often investigated the system themselves only to return with a group of friends/colleagues to have them use it as well.

Image - CGF Booth Crowd

People were very engaged with their personal metrics and often spent quite a while investigating their results. One unexpected outcome was how much people enjoyed comparing their results to others - almost competitively.

Image - CGF People Using System

Post-CGF: Kao Museum

Following the excitement and success of the CGF event, we were brought on for further work on the system.

Kao wanted to exhibit Isla in their company museum in Tokyo and add a couple of upgrades:

  1. Multilingual Support: We gave Isla the ability to speak in both Japanese and Chinese, and updated the dashboard to support both languages textually.

  2. Using the Latest AI: We made use of more recent and more impressive GPT models, as well as improving our custom wrapper to further reduce latency.

The system now sits proudly in their museum, highlighting what's possible with creative ideas and smart technology.

Conclusion

The Isla AI system was a challenging project. Being tasked with creating a completely new technology in only 5 months certainly had its stressful moments.

That being said, we're incredibly proud of - and satisfied with - the work we put in, especially after seeing how much people enjoyed interacting with it.

To go from an idea to a real, tangible product is always an exhilarating and gratifying process and we love getting to do it.

Massive thanks to Point 3 and their expert management of the project from start to finish - we couldn't have done it without them!

Image - CGF Team Photo

Westland and the Point 3 Team at CGF 2023

Let us empower your innovation journey

Have you got a creative vision you'd like to bring to life? As experts in innovative software, we can make that happen.

Use the link below to schedule a free consultation and kick-start your innovation.

Schedule your free consultation