LLM with a RAG system for financial data

GraphRAG – a joint research initiative

Our LAB team is always delighted to collaborate with our clients’ AI, Digital and Innovation teams on research projects and publications.

🙌 This research publication was the result of a joint initiative with BNP Paribas, Neo4j and Lingua Custodia

👩‍🔬 The research project titled ‘GraphRAG: Leveraging Graph-Based Efficiency to Minimize Hallucinations in LLM-Driven RAG for Finance Data’ has been accepted for publication at the GenAIK 2025 workshop in Abu Dhabi in January 2025

ℹ️ The research explores different ways to introduce structured knowledge from Knowledge Graphs (KG) in a RAG pipeline.

💁 While it is not straightforward to convert graph-based knowledge into text that an LLM can use, we show that, if done correctly, it can help to reduce hallucinations and the cost of inference by drastically reducing token consumption. 💪
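To make the idea of converting graph-based knowledge into text more concrete, here is a minimal sketch. It is not the method from the publication: the triples, the entity names, the `verbalise` helper and the whitespace-based token count are all assumptions made purely for illustration. It simply shows how selecting only the facts linked to one entity can give an LLM a shorter context than passing everything.

```python
# Hypothetical toy knowledge graph: (subject, relation, object) triples.
# All entities and figures are invented for illustration only.
TRIPLES = [
    ("ACME Corp", "reported_revenue_2023", "EUR 120m"),
    ("ACME Corp", "headquartered_in", "Paris"),
    ("ACME Corp", "audited_by", "Example Audit LLP"),
    ("Beta Bank", "headquartered_in", "Frankfurt"),
]


def triples_about(entity, triples):
    """Keep only the triples whose subject or object is the entity."""
    return [t for t in triples if entity in (t[0], t[2])]


def verbalise(triples):
    """Turn triples into short plain-text facts that a prompt can include."""
    return "\n".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples)


def rough_token_count(text):
    """Whitespace split, used as a crude stand-in for a real tokenizer."""
    return len(text.split())


# Context restricted to one entity versus the whole graph.
graph_context = verbalise(triples_about("ACME Corp", TRIPLES))
full_context = verbalise(TRIPLES)

print(graph_context)
print(rough_token_count(graph_context), "vs", rough_token_count(full_context))
```

In a real pipeline the graph would be queried from a graph database and the token count would come from the model's own tokenizer; both are simplified here.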

Well done to all the authors! This is a fantastic achievement 🤩

Buy vs Build Software – ‘To be or not to be, that is the question’!

The Buy vs Build software dilemma for AI is a very pressing challenge for our financial clients. Skills and software are needed to build an in-house solution, whereas purchasing a solution can be faster, though it might be more difficult to customise it to fully meet business requirements and there might be concerns about data security and privacy.

We asked our Head of Sales, Frédéric Moioli, for his thoughts on the Buy vs Build Software debate.

1) Lingua Custodia’s solutions are a ‘buy’ option, so how can we ensure our solutions match our clients’ needs?

Lingua Custodia’s solutions are uniquely positioned to match client needs due to several key factors:

Expertise and Innovation

Lingua Custodia leverages extensive expertise in Natural Language Processing (NLP) and AI, ensuring that solutions meet specific client challenges. Our dedicated Research & Development department, The LAB, drives innovation by developing new applications and keeping our products at the forefront of technology.

End-to-End Control

We maintain control over the entire value chain, from creating custom Large Language Models (LLMs) to rigorous data management and training. Our secure Retrieval-Augmented Generation (RAG) tool addresses crucial data security concerns.

Verto Platform: the one-stop-shop platform

Our advanced platform, Verto, serves over 10,000 users, integrating AI-powered tools for translation, transcription, data extraction, and efficient document analysis. By combining expertise, continuous innovation, comprehensive control of AI development, and a powerful platform, Lingua Custodia effectively aligns its solutions with the evolving needs of financial sector clients.

2) How does Lingua Custodia help its clients considering a ‘build’ option?


At Lingua Custodia, we’re not a consulting house; instead, we leverage our extensive expertise in AI technologies to support clients in the financial sector. Our LAB partners with the innovation labs of our clients on AI research projects.

Since our founding in 2011 by finance professionals, we have developed a deep understanding of our clients’ pain points. This allows us to customize AI models to meet their unique requirements effectively. The LAB’s innovations include the development of our cutting-edge Document Analyser, a generative AI tool designed for efficiency without requiring massive investments.

By providing secure, innovative solutions, we empower clients to enhance their operational efficiency while ensuring data security and compliance. This approach enables us to deliver high-quality, domain-focused solutions that align with the needs of the financial industry.

3) What is Lingua Custodia’s competitive advantage?


Lingua Custodia’s competitive advantage stems from several key factors:


At Lingua Custodia, we offer unparalleled security for our clients. Unlike many competitors, we don’t rely on public cloud services. Instead, our solutions are hosted on physical servers located in Europe, ensuring the highest level of data protection and compliance with strict financial industry regulations.


Our team consists of dedicated and versatile professionals with deep expertise in both finance and AI technologies. This unique combination allows us to understand the intricacies of financial operations and develop tailored AI solutions that address specific industry challenges. We are ultra-specialized in finance, having been founded by finance professionals in 2011. This means we don’t just understand AI; we intimately know how the financial sector works. Our deep industry knowledge allows us to create solutions that seamlessly integrate into existing financial workflows and address real-world pain points.


Our LAB, our dedicated Research & Development department, keeps us at the forefront of AI innovation. It continuously develops cutting-edge applications tailored to the financial industry’s needs.


By combining our secure infrastructure, specialized team, deep financial expertise, and continuous innovation through our LAB, we offer a unique value proposition that addresses the specific needs of financial institutions in today’s rapidly evolving technological landscape.

How to understand the core concepts of AI, LLMs and RAG!

If you find some of the different terminology used for Large Language Models (LLMs) and AI confusing, you are not alone!

This is the first in a series of articles about AI, LLMs and Retrieval Augmented Generation (RAG), where we aim to explain, clearly and succinctly, some of the key terminology you might be hearing about. We hope you find these posts helpful!


What are foundation models?


A foundation model is an AI model trained on huge amounts of data (documents, audio, images, text, etc.). It is trained to ‘generate’ the next word as it ‘learns’ the language. It should then be specialised and fine-tuned for a wide variety of applications and tasks, at which point it is no longer a foundation model!


What are LLMs?


LLM is an umbrella term used for all foundation and specialised models.

For example:

In the case of Llama, the foundation model is not usable directly but serves as the foundation for all the subsequent specialised models. Llama Instruct is a question-answering model and Code Llama is a coding assistant.

All three models are LLMs.

What are the benefits and challenges of a foundation model?

In terms of benefits: 


Flexibility and adaptability

Foundation models are flexible and adaptable as they can be fine-tuned for a wide range of tasks, saving time and resources compared to building new models from scratch for each specific task.

Cost efficient

While foundation models are costly to build, once you have one, you can adapt it to new tasks as many times as you want.

Accessibility

Open source foundation models are accessible: smaller companies with less access to computational resources can leverage these models to create innovative AI applications. (Note that there are many closed models which are not accessible!)

(Note: with open source foundation models, almost anyone can use the model, access the source code and customise it, which in theory improves accessibility and transparency. Meta’s Llama 2 is an open source foundation model. ChatGPT is not open source.)

As for the challenges: 

Bias

Foundation models are trained on large and diverse data sets which may contain biases present in the data, and which will be mirrored in the model’s outputs.

Security and privacy

The huge amounts of data needed to train a foundation model naturally raise security and privacy concerns. The data should be secure and handled responsibly.

Lack of transparency

Foundation models can be a ‘black box’. The issue with data has already been highlighted. In addition, it is important to understand how the foundation model generates its outputs to identify any potential errors or bias. This is a hot topic with ongoing empirical studies.

Lingua Custodia wins the Large AI Grand Challenge Award organised by the European Commission!

The French Fintech company Lingua Custodia, a specialist in Natural Language Processing (NLP) applied to Finance since 2011, was delighted to receive an award in Brussels yesterday. This award, which was presented by EU Commissioner Thierry Breton, is designed to reward innovative start-ups and SMEs for devising ambitious strategies and making commitments to develop large-scale AI foundation models that will provide a competitive edge for Europe.

Together with 3 other technology SMEs, Lingua Custodia will share a total prize of €1 million and 8 million hours of access to two of Europe’s world-leading supercomputers, LUMI and LEONARDO. This challenge was highly competitive and received 94 proposals.

Lingua Custodia’s AI foundation models

Lingua Custodia’s winning proposal focused on developing a series of AI foundation models with 3 major objectives, using the company’s existing skills and known expertise in the AI arena:

  • Build very cost-effective, fast and efficient models to run on smaller servers and democratize the technology while reducing energy consumption
  • Ensure the models can handle multilingual queries and make them available to non-English speakers
  • Tune the models for the retrieval of information (RAG) to enhance the usage of generative AI for multilingual knowledge management

Lingua Custodia’s focus on cost and energy efficient AI foundation models


Olivier Debeugny, CEO of Lingua Custodia, declared to Thierry Breton: “Lingua Custodia is an AI company that has raised a modest amount of capital since its launch. This has been a catalyst for our creativity and resourcefulness and we therefore have the skills to optimize everything we develop. This is why we have been working on the design of multilingual, extremely cost and energy efficient models to be applied to an AI use case with a high Return on Investment.”

About Lingua Custodia

Lingua Custodia is a Fintech company and a leader in Natural Language Processing (NLP) for Finance. It was created in 2011 by finance professionals to initially offer specialised machine translation.

Leveraging its state-of-the-art NLP expertise, the company now offers a growing range of applications, including speech-to-text automation and linguistic data extraction from unstructured documents, and achieves superior quality thanks to highly domain-focused machine learning algorithms.

Its cutting-edge technology has been regularly rewarded and recognised by both the industry and clients: Investment houses, global investment banks, private banks, financial divisions within major corporations and service providers for financial institutions.

LLMs – Generative AI is not Sci-fi!

Lingua Custodia was delighted to co-host this event with Cosmian, a company specialised in cybersecurity, at Le Village by CA Paris.


What are LLMs?

Gaëtan Caillaut’s presentation for Lingua Custodia focused on Large Language Models (LLMs) and aimed to ‘demystify’ the engineering and science behind them. He highlighted that LLMs are a type of AI program able to recognise and generate text. These models are trained on large sets of data, which allow the models to learn the probability of generating the next word, based on the context of the word or phrase.
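As a rough illustration of ‘learning the probability of the next word’, here is a minimal sketch of a bigram counter. It is a deliberately tiny stand-in and not how a real LLM works internally: real models use neural networks and much longer contexts. The three-sentence corpus and the function names are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probs(follows, word):
    """Convert the follow-up counts for a word into probabilities."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Invented toy corpus, for illustration only.
corpus = [
    "the fund reported strong growth",
    "the fund reported a loss",
    "the bank reported strong results",
]
model = train_bigram(corpus)
probs = next_word_probs(model, "reported")
print(probs)
```

Here the word following ‘reported’ is ‘strong’ in two of the three sentences, so the model gives it the highest probability. An LLM does the same kind of estimation, but conditioned on the whole preceding context rather than on a single word.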

What are the limitations of LLMs?

The limitations of LLMs were also discussed. The quality of the generated text depends heavily on the underlying data, and there is also a risk that these models misinterpret the context of a word or phrase. An LLM hallucination occurs when the model generates text that is irrelevant or inconsistent with the input data.
LLMs are also very expensive to run and complicated to train.

Retrieval Augmented Generation and RLHF for finetuning

He highlighted the benefit of RAG (Retrieval Augmented Generation) which references an external knowledge base to improve the accuracy and reliability of LLMs. RAG helps to enhance LLM capabilities and has the advantage of not requiring particular training.
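The retrieval step of RAG can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Lingua Custodia’s implementation: the knowledge base sentences are invented, retrieval uses a basic bag-of-words cosine similarity rather than learned embeddings, and the prompt template is hypothetical. The call to an actual LLM is left out.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, strip basic punctuation and count the words."""
    cleaned = text.lower().replace("?", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Hypothetical prompt template: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Invented external knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "The fund charges an annual management fee of 0.5 percent.",
    "The fund is domiciled in Luxembourg.",
    "Redemptions are processed within three business days.",
]

question = "What is the annual management fee?"
top = retrieve(question, KNOWLEDGE_BASE, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The model itself is untouched: only the prompt changes, which is why RAG does not require particular training. Updating the knowledge base is enough to update what the model can draw on.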

RLHF (Reinforcement Learning from Human Feedback) is one of the most widely used fine-tuning approaches. It uses human feedback to make the model’s outputs more efficient, logical and helpful.

Lingua Custodia’s Generative AI Multi-Document Analyser


Olivier Debeugny, Lingua Custodia’s CEO, then presented the multi-document data extraction technology, which uses RAG to optimise the data extraction quality.

Please note that Lingua Custodia now has a new address in Paris: Le Village by CA Paris, 55 Rue La Boétie, 75008. We are delighted with our new offices and thrilled to be part of this dynamic ecosystem, which prioritises supporting start-ups and SMEs.

Digital Finance – TQ Accelerator

Lingua Custodia is delighted to have been accepted to participate in the Digital Finance program at Tech Quartier, Frankfurt.

Lingua Custodia is one of 15 start-ups selected to network and connect at the Frankfurt financial centre. The program brings together corporates and start-ups in the digital finance domain with the goal of driving innovation to create tangible solutions for the finance industry.

The program runs over a 6-week period from 28 May to 11 July 2024.

Our CEO, Olivier Debeugny, has just completed the first 2 weeks of this program and has found the experience to be insightful and very interesting. The selected start-ups have been encouraged to collaborate on specific finance use cases, with each start-up able to share its experience and knowledge of digital AI solutions.

The networking opportunities have also helped Lingua Custodia to meet potential clients and broaden its contacts in Germany.

Olivier Debeugny participated in the discussion panel at the Network Fintech Start Up event on 6 June, which focused on ‘Open Finance & AI: Shaping the Future’ for digital finance. Some of the key points highlighted during this event were:

AI and productivity

The expectation is that the implementation of AI in the financial domain will help to boost productivity.  So, the priority use cases are for better managing risk and compliance as well as the simplification of decision making and improved forecasting accuracy. AI will also be used for optimising the client experience, though the main focus for financial companies is really productivity gains.

The challenges and risk of AI for finance

There is clear recognition of the regulatory and ethical implications, as well as data security.  The panel underlined the importance of critical thinking when reviewing AI processes.

The future outlook for AI for finance

The future is promising! 

AI should help to reduce the repetitive and low value-added aspects of individuals’ roles. The skills which will be important are adaptability and flexibility, as well as critical thinking. Individuals should embrace new AI technologies while aiming to understand them and being aware of the importance of ethics, diversity and security.

Generative AI Data Extraction for Due Diligence Reviews

Due Diligence Review

Solution

The Multi-Document Analyser enables the legal team to upload and analyse these legal documents. They can input queries like “Identify non-disclosure agreements.” The tool extracts relevant sections, translates them if necessary, and presents clear answers, helping lawyers expedite the review process and identify critical legal information.

Generative AI Finance Document Processing

The challenge

A law firm is conducting due diligence for a merger or acquisition, involving a vast number of legal documents, contracts, and agreements in various languages. Legal experts need to quickly identify clauses related to liabilities, intellectual property, and compliance issues to assess potential legal risks.

Client

Legal and Compliance Team

Services provided

Multi-Document Analyser
Translation technologies 

Financial Document Analysis

Financial Document Analysis for Investment Decisions

Solution

Lingua Custodia’s Generative AI Multi-Document Analyser allows the analyst to upload these documents, input specific queries (e.g., “Show me revenue growth trends”), and receive concise answers in natural language. The tool’s multilingual support and data extraction capabilities streamline the analysis process: analysts can ask their queries and receive the responses in their native language, even if the documents are in other languages. This enables them to make more timely and data-driven investment choices.

The challenge

A financial analyst at a large investment firm needs to quickly extract critical information from a diverse range of financial documents, including annual reports, earnings transcripts, and regulatory filings, in multiple languages. They need to identify key financial metrics, such as revenue growth, profitability, and risk factors, to make informed investment decisions.

Client

Financial Products Team

Services provided

Multi-Document Analyser
Translation technologies 

Lingua Custodia’s Generative AI Multi-Document Analyser

 

Multi-Document and Multi-Lingual Data Extraction 

Lingua Custodia’s Multi-Document Analyser is now in production and can be accessed via the secure platform or through an API.
This means that a group of documents can be uploaded at the same time, with key information extracted within seconds. As this technology is also integrated with our machine translation engines, it is possible to upload your documents in different languages and extract your data using a prompt in a different language.

As an example, French, Chinese and Spanish documents can be uploaded and you can ask your queries in English or any other language which is supported by the platform.

You can upload up to 15 PDF documents together. The technology has been optimised to read and extract information in tables and to respond swiftly to queries.

What are the use cases for the Multi-Document Analyser?

The use cases for the Multi-Document Analyser include financial product and regulation queries, client support queries, requests for proposals and due diligence questionnaires. So, it is possible to upload a group of Key Investment Documents and extract the risk performance indicators and other details.

Lingua Custodia focuses on innovation

Lingua Custodia is very proud of this highly innovative technology, which is part of a suite of financial processing services it provides. The company was set up by 2 financial professionals in 2011, originally to provide specialised machine translation services, having identified a clear use case for this service. Since 2020, new technologies, such as speech-to-text and data extraction, have been added progressively.

 Its aim is to be the market leader in financial document processing for financial institutions, and Lingua Custodia is distinguished by its focus on data security as it recognises that this is a priority for its clients.  Data is stored on bare metal servers in Europe.

 

Why AI will not be replacing humans anytime soon!

The last 18 months have seen dramatic developments in the arena of Artificial Intelligence (AI). The emergence of Large Language Models such as ChatGPT, which can analyse, respond to and generate text, was a major event. This then led to the rapid emergence of other models, focused on sentiment analysis, image and voice recognition.

This has understandably led to concerns about the impact of these innovations on the human workforce. Will AI innovations make humans redundant?!

At Lingua Custodia, we feel strongly that the response is no. These technologies will boost productivity and create new job opportunities.  AI is to be embraced rather than feared!


AI and humans learn differently

Large language models use mathematical formulas to process and identify patterns in a huge volume of data. Given a query or prompt, they then use these patterns to produce a text output.

These models learn by correlation, so for example, they can link 2 variables – such as studying and grades, but a human brain learns by causation – that the change in one variable can impact the other one – so if you study, you might get better grades, whereas if you do not study, your grades might suffer. 

So, AI and human brains do not learn in the same way – they are different.  The AI may well be able to process huge volumes of data faster than a human brain, but the human brain can identify causation as well as adding layers of creative thought, consciousness, and ethics. 
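The ‘studying and grades’ example can be made concrete with a short sketch. The numbers below are invented for illustration, and the Pearson coefficient is used here only as one common way to measure correlation. The point it shows is that a correlation score is symmetric: it is the same whichever variable is listed first, so on its own it cannot say which one drives the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented data: hours studied and the grade obtained by five students.
hours_studied = [1, 2, 3, 4, 5]
grades = [52, 58, 61, 70, 74]

r = pearson(hours_studied, grades)
r_swapped = pearson(grades, hours_studied)

print(round(r, 2), round(r_swapped, 2))
```

Both calls return the same strong positive value. Deciding that studying improves grades, rather than the reverse, takes the causal reasoning described above.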

AI should be used to boost productivity

Future job roles will use AI as a tool to boost productivity. So, an engineer might use AI to check their code for potential errors. In the financial industry, which is championing AI, it can be used to identify risk, rapidly analyse investment opportunities and optimise client services through the use of chatbots.

The Lingua Custodia platform, which is specialised for the financial services sector, contains several AI technologies which are all focused on adding value for our clients. Our secure platform allows the rapid communication, extraction and analysis of data in different languages. For example, machine translation technology translates documents and text within seconds, while our Document Analyser rapidly extracts key data from large PDF documents.