AI projects are multiplying, and with them come risks, some specific to this technology and some not.
Some organizations are jumping straight to “plug the LLM into a vector database” (like Pinecone, Qdrant, pgvector…), at the risk of repeating the lift-and-shift mistakes made on cloud projects.
This time, hiding years of data-quality, governance, code, and architectural debt behind a shiny new chatbot…
The impact: data sprawl, hallucinations, security gaps, and non-linear operational costs.
LLM-and-shift
The main error made in cloud projects was to re-host monolithic architectures in the cloud and then be surprised when costs exploded. Production workloads were never optimized because resources had always been plentiful, and refactoring looked costly in management's eyes. Legacy debt was declared the number-one blocker to cloud ROI.
The AI trend is repeating the same pattern: studies show that most organizations are running pilot projects, but fewer than 5% actually reach scale because the old data plumbing can't keep up.
Unless we treat AI adoption as an evolution and a transformation, not a simple relocation, we will pile fresh layers of debt on top of the old ones.
New pitfalls
| Debt layer | How it breaks AI | Example failure |
|---|---|---|
| Siloed or undocumented data | RAG can't retrieve ground truth; embeddings miss whole domains | Stale answers in customer-support bots |
| Batch ETL & slow refresh | Vectors drift away from source truth | “Zombie” docs in the vector index (bad RAG) |
| Monolithic DBs | No horizontal scaling for ANN search; resource contention | Query timeouts |
| Legacy access control | Embeddings leak sensitive text | Vector row-level breaches, prompting vendors to add queryable encryption |
| Thin observability | Drift or skew hides for weeks | Model accuracy falls silently |
| Ad-hoc GPU spend | Run-time costs dwarf ROI | Surprise six-figure invoices from SaaS vendors |
Real-world & Technical problems
LLM hallucinations can largely be mitigated with advanced RAG search techniques such as hybrid sparse-dense retrieval, which has come a long way since its first iterations. Organizations that implemented what we now call “naive RAG” ran into legal liability when the information their chatbot provided was “invented”. Lawsuits on the matter are flourishing (e.g. “Air Canada ordered to pay customer who was misled by airline’s chatbot”, The Guardian).
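To make “hybrid sparse-dense” concrete, here is a minimal sketch of one common way to combine the two signals: Reciprocal Rank Fusion (RRF). The ranked lists below are hypothetical outputs of a keyword (BM25-style) index and a vector (ANN) index; in a real system they would come from your search engine and vector store.

```python
# Sketch: hybrid retrieval via Reciprocal Rank Fusion (RRF).
# `sparse_ranking` and `dense_ranking` are hypothetical ranked lists of doc IDs
# from a keyword (BM25) index and a vector (ANN) index respectively.

def rrf_fuse(rankings, k=60):
    """Merge ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k dampens the weight of the very top positions (60 is a common default).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse_ranking = ["doc_refund_policy", "doc_baggage", "doc_pets"]
dense_ranking = ["doc_baggage", "doc_refund_policy", "doc_checkin"]
fused = rrf_fuse([sparse_ranking, dense_ranking])
print(fused)  # documents found by both indexes rise to the top
```

Documents retrieved by both indexes accumulate score from both lists, so they outrank documents found by only one: a cheap way to let exact keywords rescue the dense retriever (and vice versa).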
From a DBA point of view, old design choices are also fighting back. Without CDC or event streams, every content update can trigger a full-table re-embedding or introduce consistency bugs, because nobody ever intended to trace all changes in the first place. Security gaps are another source of concern: DBAs have historically been the gatekeepers, but sending data to a cloud-hosted LLM pushes the boundaries of the security wall and forces DBAs to find technical solutions without adequate guidelines from internal policy, as governance lags behind the technical application of such solutions.
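Short of real CDC, one stopgap for the full-table re-embedding problem is content hashing: store a hash next to each embedding and only re-embed chunks whose hash changed. A minimal sketch, with illustrative chunk IDs and content:

```python
# Sketch: cheap change detection when no CDC stream exists.
# Hash each chunk's content; only chunks whose hash changed (or that are new)
# need to be sent to the embedding model again.

import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def chunks_to_reembed(current_chunks: dict, stored_hashes: dict) -> list:
    """Return chunk IDs whose content changed or that are new."""
    return [
        chunk_id
        for chunk_id, text in current_chunks.items()
        if stored_hashes.get(chunk_id) != content_hash(text)
    ]

stored = {
    "faq-1": content_hash("Refunds take 5 days."),
    "faq-2": content_hash("Pets fly in the cabin."),
}
current = {
    "faq-1": "Refunds take 10 days.",   # edited -> re-embed
    "faq-2": "Pets fly in the cabin.",  # unchanged -> skip
    "faq-3": "New lounge rules.",       # new -> embed
}
print(chunks_to_reembed(current, stored))  # -> ['faq-1', 'faq-3']
```

This does not catch deletions or fix the underlying debt, but it turns “re-embed everything nightly” into “re-embed what actually changed”, which is often the difference between a tolerable GPU bill and a surprise one.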
Some tools offer DLP solutions that connect to your LLM provider (like VARIOS AI by Sequotech), but new query-time encryption layers can represent a challenge at scale.
Some advice on how to avoid the data storm
First, ask yourself some important questions:
- What debt are we sweeping under the RAG carpet?
- Do we need another vector store, or can we govern a single Postgres leveraging pgvector and pgvectorscale with DiskANN indexes?
- How will we roll embeddings when source data changes?
- Who owns prompt security reviews?
- Can we quantify GPU and index costs at 10× scale? In other words: is our architecture relevant to our use case?
- How do we prevent ACL bypass through retrieval?
- What metrics shout “drift” before users notice?
- What’s the rollback plan after the data hits the fan?
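On the drift question, one crude but cheap metric is to compare the centroid of recent query embeddings against a baseline centroid: if the cosine similarity between them drops, your traffic has moved away from what the index was built for. The threshold and toy 2-D vectors below are illustrative assumptions, not recommendations:

```python
# Sketch: a crude drift signal -- cosine similarity between the centroid of
# recent query embeddings and a baseline centroid. Threshold is illustrative.

import math

def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def drift_alert(baseline_vecs, recent_vecs, threshold=0.9):
    """Alert when recent traffic has shifted away from the baseline centroid."""
    return cosine(centroid(baseline_vecs), centroid(recent_vecs)) < threshold

baseline = [[1.0, 0.0], [0.9, 0.1]]
same = [[1.0, 0.1], [0.95, 0.0]]
shifted = [[0.0, 1.0], [0.1, 0.9]]
print(drift_alert(baseline, same))     # similar traffic -> False
print(drift_alert(baseline, shifted))  # shifted traffic -> True
```

A production version would use real query embeddings, rolling windows, and an alerting pipeline, but the point stands: drift can be measured before users notice it.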
Then maybe look into the following:
- Tackle the beast and modernize – you might want to transition from a nightly ETL to streaming CDC, and implement staging areas or tools that track schema changes and can trigger automatic embedding jobs. Understand how likely each field is to be refreshed, and at what level.
- Refactor the storage layer – split hot vectors into dedicated partitions/tablespaces, or onto a dedicated host, to decouple them from OLTP locks.
- Implement the proper RAG search techniques for your use case. Yes, there may be several.
- Secure the vector path and benchmark to know your limits – encrypting chunks and embeddings, plus live data masking, can prevent data leaks (this requires an up-to-date, adequate security policy).
- Plan time in your sprints to actually avoid technological debt.
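On the “secure the vector path” point, one concrete guardrail is to enforce ACLs at retrieval time, so the RAG pipeline never hands the LLM a chunk the caller cannot read. The schema and role names below are illustrative assumptions:

```python
# Sketch: ACL enforcement at retrieval time.
# Each retrieved chunk carries the roles allowed to read it; we drop any chunk
# whose allowed_roles do not intersect the caller's roles BEFORE it reaches
# the prompt. Roles and IDs are hypothetical.

def acl_filter(candidates, user_roles):
    """Keep only chunks whose allowed_roles intersect the user's roles."""
    user = set(user_roles)
    return [c for c in candidates if set(c["allowed_roles"]) & user]

candidates = [
    {"id": "hr-salaries", "allowed_roles": ["hr"]},
    {"id": "public-faq", "allowed_roles": ["everyone"]},
    {"id": "exec-minutes", "allowed_roles": ["exec", "hr"]},
]
visible = acl_filter(candidates, ["everyone"])
print([c["id"] for c in visible])  # -> ['public-faq']
```

Filtering after ANN search is the simplest option; at scale you would push the role predicate into the vector query itself (e.g. a metadata filter) so restricted rows never leave the database.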
Depending on your environment, you may have other points to address, but these first steps should lead you to ask more questions and perhaps succeed where others might fail. I would also add that it is usually a good time to call a consultant on the matter 😉 (they sometimes have their use cases).
The geek in me says that AI with pgvector/pgvectorscale is a really fun toy. The professional says it should not be used blindly: it is a helpful tool that can be leveraged, but it is not relevant for every use case.
Happy hallucinations !
Hallucination: the experience of seeing, hearing, feeling, or smelling something that does not exist, usually because of a health condition or because you have taken a drug.