If a mid-size bank wants to create its own ChatGPT with its own data (policies, procedures, etc.) loaded into it for fast retrieval and search, what advice would folks have on the reference architecture and the components/tools needed to accomplish that? Any thoughts on resources and cost are also welcome.
Two words: 'explainability' and 'HITL (human-in-the-loop)'. Make sure you have enough guidelines and guardrails to explain any outcome that might be used for decisioning (ideally, refrain from using it for decisioning at all). Onboard your GRC criteria and feed them into your plans.
You will not be able to create an AI-based solution efficient enough to positively impact your opex, while satisfying security and compliance, without building a top-tier AI/ML team.
Instead, you may wish to rethink which problems you intend to address and leverage appropriate COTS technology, matching outcomes to the OKR/KPI structure of the organization involved.
Thank you very much.
You will need at least the following infrastructure:
-> RAG (Retrieval-Augmented Generation) is the default: do not fine-tune initially.
-> Orchestration: LangChain, LlamaIndex, Semantic Kernel, or Guidance
--> Route queries (FAQ vs policy vs procedure).
--> Generate queries for multi-step retrieval (expansion, re-ranking, de-dup).
--> Insert citations and quoted snippets in answers.
-> Models (pick 1–2, keep pluggable):
--> Hosted API (enterprise controls): Azure OpenAI, OpenAI, Anthropic on AWS Bedrock, Google Vertex.
--> Self/Private-host (compliance/data residency): Llama 3.1/3.2 variants, Mistral Large, Qwen, etc. via NVIDIA NIM, vLLM, or TGI.
-> Embedding model: use a modern embedding model that supports multilingual + long context. Keep an embeddings layer you can re-run offline if you swap models.
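As an illustrative sketch of the retrieval + citation steps above (the toy bag-of-words "embedding", the chunk IDs, and all function names here are made-up stand-ins; a real system would use a proper embedding model, a vector database, and an LLM to compose the answer):

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words term counts (stand-in for a real embedding model).
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Document store: chunks of policy/procedure text with source metadata for citations.
CHUNKS = [
    {"id": "policy-12#3", "text": "wire transfers above 10000 USD require dual approval"},
    {"id": "procedure-7#1", "text": "password resets must be verified by the help desk"},
]
INDEX = [(chunk, embed(chunk["text"])) for chunk in CHUNKS]

def retrieve(query: str, k: int = 1) -> list:
    qv = embed(query)
    ranked = sorted(INDEX, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def answer_with_citations(query: str) -> str:
    # A real RAG system would feed the retrieved chunks to the LLM as context;
    # here we just quote the top chunk with its source ID as the "citation".
    hits = retrieve(query)
    return "; ".join(f'"{h["text"]}" [{h["id"]}]' for h in hits)

print(answer_with_citations("what approvals do wire transfers need?"))
```

The key design point is that the citation ID travels with every chunk from ingestion through retrieval, so the answer can always be traced back to a source document (the explainability point raised earlier in the thread).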
Plus, you will have to add all the application layers (depending on your use cases).
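The query-routing step mentioned above (FAQ vs policy vs procedure) can be sketched as a minimal keyword router. In practice you would more likely use an LLM classifier or embedding similarity; the route names and keywords below are made-up examples:

```python
# Minimal keyword-based query router (illustrative only).
ROUTES = {
    "faq": ["how do i", "what is", "hours", "contact"],
    "policy": ["policy", "limit", "allowed", "compliance"],
    "procedure": ["procedure", "steps", "process", "how to"],
}

def route(query: str) -> str:
    q = query.lower()
    # Score each route by how many of its keywords appear in the query.
    scores = {name: sum(kw in q for kw in kws) for name, kws in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "faq"  # fall back to a default route

print(route("What is the policy limit for wire transfers?"))
```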
Cost-wise:
**** NB: it really depends on the application type and on how many users will use it. ****
Based on my personal experience:
CAPEX (infrastructure setup, data ingestion and cleaning, vectorization and embeddings, app development, security and compliance setup): it really depends on in-house skills and the type of application; roughly $50k to $150k.
OPEX, recurring (LLM hosting, vector database and storage, data pipeline ingestion, monitoring infrastructure, human oversight): roughly $30k - $50k/month.
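A quick back-of-the-envelope calculation from the CAPEX/OPEX ranges above (these are the rough estimates quoted in this post, not vendor pricing):

```python
# First-year total cost range from the rough CAPEX/OPEX figures above.
capex_low, capex_high = 50_000, 150_000  # one-time setup ($)
opex_low, opex_high = 30_000, 50_000     # recurring, per month ($)

year1_low = capex_low + 12 * opex_low
year1_high = capex_high + 12 * opex_high
print(f"Year 1: ${year1_low:,} - ${year1_high:,}")  # → Year 1: $410,000 - $750,000
```

So even at the low end, the recurring opex dominates the one-time setup cost within the first year.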
Thank you.
When you say "its own ChatGPT", that is a bit ambiguous. If you'd like to host your own ChatGPT-style model, there is an option to do this through Ollama using an offline OpenAI model, but it is somewhat limited to roughly GPT-4 mini level. It was just made available a few months ago.
If you are looking for an AI solution in a regulated industry (I am in insurance, similar to banking), I suggest utilizing a secure AI environment like Copilot, which guarantees data safety (when used appropriately). We implemented a solution with grounded AI (policies/procedures), and depending on how it is implemented you would pay for each query. Costs vary depending on the query input and the models used; GPT-4 mini, for instance, is about 100 times cheaper compared to GPT-5 models.