
End-to-End Data Quality Management Framework (DQMF) in Banking with GenAI Integration
Data quality is mission-critical in banking: poor data erodes trust and even impacts revenue (businesses reported an average 31% revenue loss due to bad data in 2023). Banks handle diverse data (customer information, transactions, risk metrics, etc.), and regulators (e.g. BCBS 239, GDPR) demand that this data be accurate, complete, timely, and well-governed.
Generative AI (GenAI) offers new ways to automate and enhance data quality management across these phases. Modern AI can summarize and generate documents, extract and classify information, and even assist in detecting data issues, thereby accelerating data governance and compliance tasks. Below, we break down the key DQMF phases – from data creation through storage, processing, and usage to archival and deletion – highlighting critical activities and how GenAI can realistically improve or streamline outcomes in each. For every phase we list the GenAI application, implementation steps, and example prompt templates.
1. Data Creation
DQMF Phase/Activity:
Input Validation – Ensuring data entered (e.g. customer sign-ups, transaction details) is complete and correct at source.
GenAI Application:
Intelligent Data Entry Assistant – LLM validates and cleans entries in real-time, and suggests corrections or completions.
Type: Word/Excel automation (text validation)
Steps to Implement:
1. Integrate LLM API with front-end forms or data intake pipeline.
2. Define rules/prompts for common checks (format, required fields, consistency).
3. On data entry, send field values to LLM with prompt asking for validation/errors.
4. Receive suggestions (e.g. “Did you mean…?”) and present to user or auto-correct if high confidence.
5. Log issues flagged by AI for review (to improve rules or for audit trail).
Prompt:
“Review the new customer input for errors: Name: ‘J@n3 Doe’, DOB: ‘32/13/1980’, Address: ‘123 Main St, ???’. Identify issues and suggest corrections.”
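Steps 2–3 above can be sketched in code. Below is a minimal, hypothetical pre-check that runs cheap deterministic validations before (or alongside) the LLM call; the function name and rules are illustrative, not a specific product's API.

```python
import re
from datetime import datetime

def validate_customer_input(record: dict) -> list[str]:
    """Run cheap deterministic checks before sending fields to an LLM.
    Returns human-readable issues that can be embedded in the prompt."""
    issues = []
    name = record.get("name", "")
    # Flag digits/symbols in names (e.g. 'J@n3 Doe')
    if re.search(r"[^A-Za-z \-'.]", name):
        issues.append(f"Name contains unexpected characters: {name!r}")
    dob = record.get("dob", "")
    try:
        datetime.strptime(dob, "%d/%m/%Y")  # assumes DD/MM/YYYY convention
    except ValueError:
        issues.append(f"DOB is not a valid DD/MM/YYYY date: {dob!r}")
    address = record.get("address", "")
    if not address or "???" in address:
        issues.append("Address is missing or incomplete")
    return issues

issues = validate_customer_input(
    {"name": "J@n3 Doe", "dob": "32/13/1980", "address": "123 Main St, ???"}
)
```

Only records that fail these checks need to be escalated to the LLM for suggested corrections, which keeps API costs down and gives the audit trail (step 5) a deterministic baseline.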
DQMF Phase/Activity:
Metadata & Minimal Capture – Documenting new data elements and avoiding unnecessary data.
GenAI Application:
Auto-Documentation & Minimization Checker – GenAI generates metadata descriptions and flags extraneous data collection.
Type: Compliance check (policy vs input)
Steps to Implement:
1. Provide LLM with context: data definitions and the privacy policy (or GDPR guidelines for data minimization).
2. When designing forms or new fields, prompt AI to describe the field and justify its necessity.
3. AI output used to populate data catalog (field descriptions) and alert if a field seems excessive (e.g. collecting Social Security Number without clear need).
4. Review AI suggestions with Data Governance Council before finalizing the data collection design.
Prompt:
“We plan to collect ‘Favorite Color’ during account opening. Our policy says only collect necessary data. Explain if ‘Favorite Color’ is needed for banking services, or if it should be excluded under data minimization.”
2. Storage
DQMF Phase/Activity:
Data Classification & Cataloging – Tagging data by sensitivity and type; updating data catalog.
GenAI Application:
Automated Data Classifier – LLM reads schema or sample data to classify PII, financial data, etc., and generate catalog entries.
Type: Data extraction & classification
Steps to Implement:
1. Compile schema info: table/column names, sample values, and existing glossary definitions.
2. Prompt LLM to classify each field (e.g. “personal identifier”, “transaction detail”, “public info”) based on name and content.
3. LLM also produces a brief description of each field for the data catalog.
4. Integrate with data catalog tool: populate the classifications and descriptions, then have data stewards review/approve.
5. Use classification to apply controls (e.g. encryption for sensitive fields).
Prompt:
“Here are database fields: Name (Jane Doe), Account_No (123-456), Balance (1000.00). Classify each as personal data, financial, etc., and draft a one-line description.”
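A simple keyword pre-classifier (step 2) can triage obvious fields so the LLM only has to confirm or refine ambiguous ones. This is a sketch with made-up rules; a real deployment would tune the patterns to the bank's own naming conventions.

```python
import re

# Illustrative heuristics; ambiguous fields fall through to the LLM.
SENSITIVITY_RULES = [
    (r"name|dob|birth|address|email|phone|ssn", "personal identifier"),
    (r"account|balance|amount|txn|transaction|iban", "financial"),
]

def classify_field(field_name: str) -> str:
    lowered = field_name.lower()
    for pattern, label in SENSITIVITY_RULES:
        if re.search(pattern, lowered):
            return label
    return "unclassified (send to LLM)"

label_account = classify_field("Account_No")
label_name = classify_field("Name")
label_other = classify_field("Branch_Code")
```

The heuristic labels can be attached to the prompt in step 2 so the LLM's job shifts from raw classification to review, which stewards tend to trust more in step 4.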
DQMF Phase/Activity:
Metadata & Lineage Management – Documenting where data comes from and how it’s transformed; keeping metadata updated.
GenAI Application:
GenAI for Lineage Documentation – AI “harvests” metadata from code and logs to map data flow and generate lineage docs.
Type: Document analysis (technical logs/code)
Steps to Implement:
1. Gather metadata inputs: ETL scripts, data flow diagrams, log files describing jobs.
2. Feed to LLM with prompts to trace lineage (e.g. “trace Customer_ID from system A to report B”).
3. LLM outputs a structured lineage (source → intermediate → output), including transformations in plain language.
4. Embed AI in data governance tools to auto-update lineage when processes change (LLM reads new script versions and highlights lineage changes).
5. Data owners/stewards validate the lineage drafts and finalize in the lineage repository.
Prompt:
“Analyze the following ETL script. Summarize the data lineage for ‘Risk_Exposure’ data: where does it originate and how is it transformed before the final report?”
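Before the LLM narrates transformations (steps 2–3), a regex pre-pass can extract the structural skeleton of a simple INSERT…SELECT so the model receives the source/target tables as grounding. The script below is a toy example with hypothetical table names; production lineage tools parse SQL properly rather than with regexes.

```python
import re

ETL_SCRIPT = """
INSERT INTO risk_mart.risk_exposure
SELECT t.account_id, t.exposure_amt * f.fx_rate AS exposure_usd
FROM core.trades t
JOIN ref.fx_rates f ON t.ccy = f.ccy;
"""

def extract_lineage(sql: str) -> dict:
    """Pull source and target tables out of a simple INSERT...SELECT
    as grounding context for the LLM's lineage narrative."""
    target = re.search(r"INSERT\s+INTO\s+([\w.]+)", sql, re.I)
    sources = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.I)
    return {"target": target.group(1) if target else None, "sources": sources}

lineage = extract_lineage(ETL_SCRIPT)
```

Feeding this structured skeleton alongside the raw script reduces the chance the LLM hallucinates a table that is not actually referenced.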
3. Data Processing
DQMF Phase/Activity:
Data Cleansing & Standardization – Cleaning data (removing errors, standardizing formats) and handling missing or duplicate data.
GenAI Application:
AI-Powered Data Cleaner – GenAI suggests and automates data cleansing steps (deduplication, filling blanks, format fixes).
Type: Excel/Code automation (error detection & correction)
Steps to Implement:
1. Select target dataset (e.g. an Excel sheet or database table) and identify known issues (nulls, dupes, etc.).
2. Prompt LLM with dataset summary or sample and ask for cleaning steps or even code. (E.g. “How to standardize address formats and remove duplicates in this data?”)
3. LLM returns a list of actions or a script (Python/SQL).
4. Execute the suggested script in a test environment; review changes (ensure no over-correction).
5. Incorporate into pipeline: e.g. embed the AI generation in an ETL job – the job sends data profile to LLM, gets cleaning code, and runs it (with human oversight on updates).
Prompt:
“We have a dataset of customer addresses with inconsistent formats (some all-caps, abbreviations like ‘St.’ vs ‘Street’). Provide steps or code to standardize these and remove any exact duplicate addresses.”
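The kind of script the LLM would return in step 3 for the address prompt above might look like this. The abbreviation map is a small illustrative subset; a real run would review the LLM's full output in a test environment first (step 4).

```python
import re

# Partial abbreviation map for illustration only.
ABBREVIATIONS = {r"\bSt\b\.?": "Street", r"\bAve\b\.?": "Avenue", r"\bRd\b\.?": "Road"}

def standardize(address: str) -> str:
    addr = address.strip().title()          # fix all-caps / mixed casing
    for pattern, full in ABBREVIATIONS.items():
        addr = re.sub(pattern, full, addr)  # expand 'St.' -> 'Street', etc.
    return re.sub(r"\s+", " ", addr)        # collapse repeated whitespace

addresses = ["123 MAIN ST.", "123 Main Street", "45 oak ave"]
# Deduplicate only after standardizing, so format variants collapse together.
cleaned = sorted({standardize(a) for a in addresses})
```

Note the ordering: standardize first, deduplicate second, otherwise "123 MAIN ST." and "123 Main Street" survive as separate records.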
DQMF Phase/Activity:
Anomaly Detection & Reconciliation – Identifying data errors or mismatches during integration (e.g. unmatched records, outliers).
GenAI Application:
Intelligent Anomaly Detector – LLM analyzes data profiles or reconciliation reports and points out likely issues and reasons.
Type: Data summarization & analysis
Steps to Implement:
1. Generate data summary: e.g. stats from profiling (min/max, counts) or a diff report between two systems.
2. Ask LLM to interpret these results: find anomalies (out-of-range values, unexpected gaps) and suggest potential causes.
3. LLM responds with human-like analysis: e.g. “5% of records have null credit score – likely a recent feed failure” or “System A has 100 more records than System B; possibly missing entries in B”.
4. Alert data engineers/stewards with AI’s findings for investigation.
5. Optionally, feed corrective suggestions from AI into issue management (e.g. “recommend reloading feed for dates X-Y”).
Prompt:
“Dataset profile: ‘Transaction Amount’ field – min: -$5, max: $100K (5 records have negative values). Identify any anomalies and likely causes.”
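The profiling summary in step 1 can be generated deterministically, with only the interpretation left to the LLM. A minimal sketch, with an assumed valid range for the field:

```python
def profile_anomalies(values, field, min_ok=0.0, max_ok=50_000.0):
    """Deterministic range checks; the findings are handed to the LLM,
    which supplies the human-like explanation of likely causes (step 3)."""
    findings = []
    low = [v for v in values if v < min_ok]
    high = [v for v in values if v > max_ok]
    if low:
        findings.append(f"{field}: {len(low)} record(s) below {min_ok}, e.g. {low[0]}")
    if high:
        findings.append(f"{field}: {len(high)} record(s) above {max_ok}, e.g. {high[0]}")
    return findings

findings = profile_anomalies([-5.0, 120.0, 100_000.0], "Transaction Amount")
```

Splitting the work this way keeps the alerting in step 4 reproducible: the numbers come from code, only the narrative comes from the model.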
4. Data Usage
DQMF Phase/Activity:
Report Generation – Producing reports (risk reports, KPI dashboards, etc.) with narrative explanations.
GenAI Application:
Automated Report Writer – GenAI creates executive summaries, explanations, and even slide content from raw data and analytics.
Type: Word/PPT automation (summarization & generation)
Steps to Implement:
1. Compile report inputs: e.g. a set of metrics or a table of results that need explanation.
2. Design prompts for each section of the report. (E.g. “Summarize key changes in risk metrics quarter-over-quarter and their causes.”)
3. Run LLM to draft each section’s text. It will turn data into narrative (ensuring it’s accurate – possibly using few-shot examples to ground it).
4. Generate visual aids: One can also ask AI to suggest charts or create slide outlines (which can be fed into PowerPoint AI or done via tools).
5. Review by analysts: human experts edit the AI-generated content for correctness and tone. Incorporate into final Word report or PowerPoint deck.
Prompt:
“Using the quarterly risk data below (credit risk up 10%, market risk steady), draft a 200-word summary for the executive risk report. Highlight why credit risk increased and any action plans.”
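Step 2's section prompts can be templated so the metrics are injected programmatically rather than pasted by hand. A minimal sketch (the function name and wording are illustrative):

```python
def build_summary_prompt(metrics: dict, word_limit: int = 200) -> str:
    """Render a report-section prompt from a metrics dict, so every
    quarterly run feeds the LLM identical structure with fresh numbers."""
    lines = [f"- {name}: {value}" for name, value in metrics.items()]
    return (
        f"Using the quarterly risk data below, draft a {word_limit}-word summary "
        "for the executive risk report. Highlight key drivers and action plans.\n"
        + "\n".join(lines)
    )

prompt = build_summary_prompt({"credit risk": "+10% QoQ", "market risk": "steady"})
```

Templating also makes the analyst review in step 5 easier, since every draft follows the same structure.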
DQMF Phase/Activity:
Compliance & Document Analysis – Ensuring data use (sharing, new analytics) complies with policies & contracts.
GenAI Application:
Policy/Contract Analyst (GenAI Copilot) – LLM cross-references usage scenarios with internal policies or legal documents to find constraints or required controls.
Type: Compliance check (RAG: retrieval augmented generation)
Steps to Implement:
1. Index relevant documents: data privacy policy, BCBS 239 guidelines, data-sharing contracts, etc., so the LLM can retrieve exact clauses.
2. Describe the scenario to the LLM: e.g. “We want to send encrypted customer data to a third-party for marketing analysis.”
3. Ask questions via LLM: “Is this allowed under our policy and GDPR? What terms in the contract apply?”
4. LLM with RAG finds relevant text (policy says “no direct marketing without consent”, contract says “third party must delete data after use”). It summarizes compliance requirements and any red flags.
5. Implement recommendations: e.g. ensure consent is obtained, have third party sign required clauses – as identified by the AI. Document the AI’s findings in approval records.
Prompt:
“Our bank plans to share transaction data with Fintech XYZ for analysis. Internal Policy excerpt: ‘customer data cannot be shared externally without anonymization unless under contract & customer consent’. Contract excerpt: ‘All customer data must be deleted post-analysis’. Question: Identify compliance requirements for this data sharing arrangement.”
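The retrieval half of step 4 can be illustrated with a deliberately naive keyword-overlap ranker; production RAG systems use embeddings and a vector store, but the flow is the same: score clauses against the scenario, pass the top hits to the LLM. Clause texts below are the policy excerpts from the prompt plus one irrelevant distractor.

```python
POLICY_CLAUSES = [
    "Customer data cannot be shared externally without anonymization "
    "unless under contract and customer consent.",
    "All customer data must be deleted post-analysis.",
    "Branch opening hours are set by regional management.",
]

def retrieve(query: str, clauses, top_k: int = 2):
    """Naive keyword-overlap retrieval; a real deployment would use
    embeddings, but the score-and-select flow is identical."""
    q_terms = set(query.lower().split())
    scored = sorted(
        clauses,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

hits = retrieve("share customer transaction data with third party", POLICY_CLAUSES)
```

Only the retrieved clauses (not the whole policy corpus) are placed in the LLM's context, which is what keeps RAG answers anchored to exact policy text.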
DQMF Phase/Activity:
Sensitive Data Leakage Check – Preventing unauthorized personal data exposure in reports or outputs.
GenAI Application:
Output Privacy Scanner – LLM scans reports, documents, or datasets to detect personal/sensitive information that shouldn’t be there.
Type: Document analysis (PII detection)
Steps to Implement:
1. Define PII patterns & examples for the LLM (names, emails, account numbers, etc., as well as contextual clues like “Mr./Ms.”).
2. Before publishing a report or dataset, feed its text or a sample to the LLM asking to highlight any personal data or confidential info.
3. LLM returns with flagged items: e.g. “Contains what looks like 5 customer names and 3 account numbers in section 2.”
4. Remediate: Analyst either removes or anonymizes those entries, or justifies their inclusion if necessary (and allowed).
5. Final check: Optionally, run the LLM check again on the revised output for any missed items.
Prompt:
“Review the following report draft for any personal or sensitive data. Flag anything like individual names, addresses, account #s, etc., that appear.”
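Step 1's pattern definitions can double as a deterministic pre-scan that runs before the LLM review: regexes catch structured identifiers reliably, while the LLM handles names and contextual clues. The patterns below are simplified examples, not an exhaustive PII ruleset.

```python
import re

# Simplified example patterns; real rulesets are broader and locale-aware.
PII_PATTERNS = {
    "account number": r"\b\d{3}-\d{3,6}\b",
    "email": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}

def scan_for_pii(text: str) -> dict:
    """Regex pre-scan run before the LLM review (step 2); returns
    matched snippets per category for the remediation step."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        found = re.findall(pattern, text)
        if found:
            hits[label] = found
    return hits

report = "Contact jane.doe@example.com regarding account 123-456."
flags = scan_for_pii(report)
```

Running the regex scan both before and after remediation implements the "final check" in step 5 cheaply, reserving the LLM for the fuzzier cases.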
5. Data Archival
DQMF Phase/Activity:
Retention Compliance & Archiving Plan – Determining what data to archive or delete per retention schedule.
GenAI Application:
Retention Policy Assistant – LLM translates retention rules and data last-used info into an actionable archiving/deletion list.
Type: Document analysis & classification
Steps to Implement:
1. Provide LLM with retention rules (e.g. “Client data: delete 7 years after relationship ends; Transaction data: archive after 1 year, delete after 5 years.”).
2. Provide summary of data inventory with last modified dates or ages.
3. Ask LLM to identify which datasets are due for archival or deletion this cycle, based on the rules.
4. LLM outputs, e.g.: “Archive: Transactions_2017 (5 years old); Delete: ClosedAccounts_2015 (retained 10 years, now expired)” along with reasoning.
5. Implement actions: feed this list to IT workflows for archiving/deletion. Data officers sign off, with the AI report as supporting documentation.
Prompt:
“According to policy, user activity logs should be kept 1 year then deleted. We have logs from Jan 2023. It’s Feb 2025 now. What should be done with these logs to comply with policy?”
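The rule evaluation in steps 3–4 is deterministic once the retention schedule is encoded, so the LLM's role is really the translation of policy text into rules and the explanation of the resulting list. A sketch with hypothetical rule values:

```python
from datetime import date

# Hypothetical rules: (archive after N years, delete after M years).
RULES = {
    "transactions": (1, 5),
    "client_data": (None, 7),
    "activity_logs": (None, 1),
}

def retention_action(category: str, created: date, today: date) -> str:
    """Decide archive/delete/retain for a dataset based on its age."""
    archive_after, delete_after = RULES[category]
    age_years = (today - created).days / 365.25
    if delete_after is not None and age_years >= delete_after:
        return "delete"
    if archive_after is not None and age_years >= archive_after:
        return "archive"
    return "retain"

# The Jan 2023 logs from the prompt, evaluated in Feb 2025:
action = retention_action("activity_logs", date(2023, 1, 1), date(2025, 2, 1))
```

Keeping the decision in code and the reasoning narrative in the LLM output gives data officers (step 5) both an auditable rule and a readable justification.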
DQMF Phase/Activity:
Archive Documentation – Summarizing archived data sets for future reference.
GenAI Application:
Archive Summarizer – GenAI generates brief documentation for archived data (content, date range, any identifiers removed).
Type: Word automation (summarization)
Steps to Implement:
1. Before archiving a dataset, feed a sample or metadata to LLM and prompt for a summary of its contents and significance.
2. LLM produces a concise description (e.g. “Contains 50k retail customer records from 2010-2015, including names and account balances; data frozen upon account closure”).
3. Store this summary in an archive register or in the metadata of the archive file.
4. If needed, include any special instructions (e.g. “Personal identifiers were removed prior to archive” if that was done – AI can note this if informed).
5. Use the summary later to decide if data needs restoration or if queries arise (“What data do we have from 2014?”).
Prompt:
“Summarize the dataset ‘CustAcct_2010_2015’ that we are archiving. Include what it contains, date range, volume, and any data fields of note. This will go into our archive log.”
6. Data Deletion
DQMF Phase/Activity:
Right to Erasure Request – Processing a customer’s request to delete their personal data (GDPR).
GenAI Application:
Erasure Request Orchestrator – GenAI interprets customer deletion requests and helps compile all relevant data sources, then drafts confirmation.
Type: Document analysis & generation (compliance workflow)
Steps to Implement:
1. Input the customer’s request to the LLM (email/letter content) and possibly the customer’s profile info.
2. LLM extracts key details: customer identity, request scope (e.g. delete all data vs specific accounts).
3. Query LLM (with knowledge base): “List all systems where data for Customer X might reside.” The LLM, augmented with the bank’s data map, identifies systems/tables (accounts, loans, CRM, marketing, etc.).
4. Generate deletion plan: LLM outputs a list: e.g. “Core Banking: delete customer ID 12345 record; CRM: delete lead entry; Data Lake: delete analytics records; Backups: flag for removal.”
5. After execution, prompt LLM to draft response to the customer confirming deletion (and noting any exceptions). Have compliance team review the letter, then send.
Prompt:
“Customer message: ‘I withdraw my consent and request deletion of all my personal data. My name is Jane Doe, client ID 556677.’ Task: Identify all places Jane Doe’s data exists in our systems for deletion.”
DQMF Phase/Activity:
Verification & Audit – Verifying that all targeted data was deleted and documenting compliance.
GenAI Application:
Deletion Audit Summarizer – LLM reviews system deletion logs and produces a human-readable verification report.
Type: Document/Excel automation (summarization)
Steps to Implement:
1. Collect logs/results from deletion jobs (e.g. a log file: “Deleted 100 records from DB X, 50 files from archive storage Y…”).
2. Feed log text to LLM and ask for a summary that indicates completeness and any issues.
3. LLM produces a brief report: e.g. “Deletion completed on 2025-06-01 for Jane Doe: removed 3 records in System A, 2 files in System B archive; no remaining references found. One backup file will expire in next cycle (not immediately deletable).”
4. Attach this summary to the internal ticket or compliance file for the deletion request as evidence.
5. In audits or regulatory inquiries, present this AI-generated summary along with raw logs if needed, to demonstrate thorough action.
Prompt:
“Summarize the following deletion log to confirm all personal data for account 556677 was removed and note any exceptions.”
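Steps 1–2 can be supported by aggregating the raw log deterministically before the LLM phrases the verification report, so the counts in the compliance file never depend on the model. The log format and system names below are invented for illustration.

```python
import re

LOG = """\
2025-06-01 10:00 Deleted 3 records from CoreBanking for client 556677
2025-06-01 10:02 Deleted 2 files from ArchiveStorage for client 556677
2025-06-01 10:05 SKIPPED backup tape B-17 (expires next cycle)
"""

def summarize_deletion_log(log: str) -> dict:
    """Aggregate deletion counts and exceptions; the LLM then turns
    this structured summary into the human-readable report (step 3)."""
    deleted = sum(int(n) for n in re.findall(r"Deleted (\d+)", log))
    exceptions = [line for line in log.splitlines() if "SKIPPED" in line]
    return {"items_deleted": deleted, "exceptions": exceptions}

summary = summarize_deletion_log(LOG)
```

Attaching both the structured summary and the raw log to the ticket (steps 4–5) means auditors can verify the AI-generated narrative against machine-counted figures.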
Conclusion
Generative AI brings practical value to DQMF by enhancing data accuracy, streamlining compliance, and enabling smarter governance in banking. Embracing these tools can significantly strengthen data trust, reduce regulatory risk, and unlock business value across the data lifecycle.