Naga Prem Sai Pendela — AI Engineer

Work

Featured Projects

Live · AWS EC2 · Docker · CI/CD · May 2026

NexusIQ AI

A 4-agent business intelligence platform that routes questions across SQL, documents, and live web data — then cross-validates answers from different sources and tells you how confident it is.

Live Demo ↗ GitHub ↗

97.7%

RAG Hit@5 retrieval accuracy

0.03%

SQL ↔ PDF cross-validation delta

5–12s

Multi-source validated query time

~60%

Latency cut via parallel agents

The problem

Before

An analyst answering "Why did West region revenue drop?" manually queries the database (45 min), reads PDF reports (2 hrs), checks competitor sites (1.5 hrs), and writes a summary. That is over 4 hours of hands-on work, and 2–3 days end to end.

→

After

NexusIQ routes the question to all 3 sources simultaneously, cross-validates that database and PDF numbers agree within 1%, and returns a cited, confidence-scored answer in 5–12 seconds.

Architecture

Query router → SQL agent · RAG agent · Web agent (in parallel) → Fusion agent → Cited answer

Agents run in parallel via ThreadPoolExecutor. The Fusion agent cross-validates SQL numbers against PDF text: a match within 1% is scored HIGH confidence and formatted deterministically, with no LLM call. Results are traced to AWS CloudWatch and local JSONL. Data: 100,000-row Supabase DB · 43 PDFs (425 ChromaDB chunks) · 5 live product categories via 4 scraping techniques.
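The local half of that tracing can be sketched as an append-only JSONL log: one JSON object per query, one line per object. This is a minimal illustration, not NexusIQ's code; the function names and the trace fields (`trace_id`, `agents`, `confidence`, `latency_s`) are hypothetical, and shipping the same records to CloudWatch is not shown.

```python
import json
import time
import uuid
from pathlib import Path


def append_trace(path: Path, query: str, agents: list[str],
                 confidence: str, latency_s: float) -> dict:
    """Append one query trace as a single JSON line and return the record.

    The field names are illustrative assumptions, not NexusIQ's schema.
    """
    record = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "query": query,
        "agents": agents,
        "confidence": confidence,
        "latency_s": round(latency_s, 3),
    }
    # One object per line: cheap to append, easy to grep or stream to a log service.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_traces(path: Path) -> list[dict]:
    """Load every trace back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending whole lines keeps each trace independent, so a crash mid-run loses at most the last record instead of corrupting one large JSON document.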

Hard problems — what failed, what worked

Vector search ranked the wrong document #1

❌ Failed

A "Q3 revenue" query returned a "Q4 Outlook" chunk ranked first: the Q4 section mentioned revenue more often, so cosine similarity put it on top. Tried ChromaDB metadata filtering ($contains), which was not supported. Tried filename-based filtering, which missed data inside differently-named files.

✓ What worked

Hybrid BM25 + vector search. BM25 catches exact tokens ("Q3", "2024"); vector search catches meaning. Accuracy jumped from 33% to 100% on the benchmark set. A hybrid_search() function already existed in the codebase but was never called, so wiring it in was the fix.
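A minimal sketch of hybrid scoring, assuming a weighted sum of min-max-normalised BM25 and vector scores. That is one common fusion choice; the page does not say how hybrid_search() combines the two. The vector scores are passed in precomputed (made-up values in which the Q4 chunk wins, as in the failure above) so the example needs no embedding model, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens, so "Q3" and "2024" survive as exact terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


def min_max(xs: list[float]) -> list[float]:
    """Rescale to [0, 1] so BM25 and cosine scores are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query: str, docs: list[str], vector_scores: list[float],
                alpha: float = 0.5) -> list[int]:
    """Document indices, best first, by alpha * BM25 + (1 - alpha) * vector."""
    lexical = min_max(bm25_scores(query, docs))
    semantic = min_max(vector_scores)
    combined = [alpha * lex + (1 - alpha) * sem for lex, sem in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


docs = [
    "Q4 Outlook: revenue growth, revenue targets and revenue risks for next year",
    "Q3 2024 results: revenue was 15.4M",
    "Office relocation notice",
]
vector_scores = [0.91, 0.88, 0.10]  # vector-only ranking puts the Q4 chunk first
print(hybrid_rank("Q3 revenue", docs, vector_scores))  # → [1, 0, 2]
```

The exact token "q3" appears only in the Q3 chunk, so BM25 ranks it clearly first, and that lexical signal outweighs the small cosine gap in favour of the Q4 chunk.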

Parallel agents: asyncio vs. multiprocessing vs. ThreadPoolExecutor

❌ asyncio

All agent methods are synchronous. A full async rewrite meant changing 500+ lines and introduced an event-loop conflict with Streamlit.

❌ multiprocessing

LangChain objects can't be pickled, and each process re-initialises the SentenceTransformer model (80MB, 5–10s per process).

✓ ThreadPoolExecutor

All agents are I/O-bound, and the GIL is released while a thread waits on I/O, so the agents' network calls overlap. Zero agent code changes. ~60% latency reduction on multi-source queries.
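A minimal sketch of the ThreadPoolExecutor fan-out, assuming each agent is a plain synchronous function. The three agent functions are hypothetical stand-ins that sleep to imitate waiting on a database, vector store, or HTTP call; `run_agents_parallel` is an illustrative name, not the project's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the SQL, RAG and web agents. Each sleeps to
# simulate blocking on a network call; sleeping releases the GIL like I/O does.
def sql_agent(question: str) -> str:
    time.sleep(0.3)
    return f"sql answer to {question!r}"


def rag_agent(question: str) -> str:
    time.sleep(0.3)
    return f"rag answer to {question!r}"


def web_agent(question: str) -> str:
    time.sleep(0.3)
    return f"web answer to {question!r}"


def run_agents_parallel(question: str, agents: dict) -> dict:
    """Run unchanged synchronous agents concurrently; return {name: result}."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in agents.items()}
        # .result() waits for each agent and re-raises any exception it hit.
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    agents = {"sql": sql_agent, "rag": rag_agent, "web": web_agent}
    start = time.perf_counter()
    results = run_agents_parallel("Why did West region revenue drop?", agents)
    print(results, f"{time.perf_counter() - start:.2f}s")
```

The three 0.3 s waits overlap, so the call takes about 0.3 s of wall time rather than 0.9 s. CPU-bound agents would not overlap this way under the GIL, which is why the I/O-bound property is what makes threads the right fit here.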

HIGH confidence answers — skip the LLM entirely

✓ Deterministic formatting

When SQL and PDF numbers match within 1%, the Fusion agent formats the answer directly — 0 LLM calls, ~0.3ms. LLM synthesis only fires for conflicts or low-confidence cases. This made answers faster, more stable, and ~40% cheaper in token cost. The insight: the LLM is for resolving uncertainty, not for presenting facts that are already validated.
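The confidence gate can be sketched as a single branch: within tolerance, fill a template; otherwise, call the LLM. This is an illustration under assumptions: `fuse`, `llm_synthesize` (any callable that takes a prompt and returns text), the `"LOW"` label for the conflict path, and the answer template are all hypothetical. Only the 1% tolerance comes from the text.

```python
from typing import Callable


def relative_delta(reference: float, other: float) -> float:
    """|reference - other| as a fraction of the reference value."""
    if reference == 0:
        return 0.0 if other == 0 else float("inf")
    return abs(reference - other) / abs(reference)


def fuse(metric: str, sql_value: float, pdf_value: float,
         llm_synthesize: Callable[[str], str],
         tolerance: float = 0.01) -> tuple[str, str]:
    """Return (confidence, answer); the LLM is called only when sources disagree."""
    if relative_delta(sql_value, pdf_value) <= tolerance:
        # Sources agree: deterministic template, zero LLM calls.
        answer = f"{metric}: ${sql_value:,.0f} (database and PDF agree within {tolerance:.0%})"
        return "HIGH", answer
    # Sources conflict: hand both numbers to the LLM to explain the discrepancy.
    prompt = (f"The database says {metric} is {sql_value:,.0f}, but the report says "
              f"{pdf_value:,.0f}. Explain the most likely cause of the conflict.")
    return "LOW", llm_synthesize(prompt)
```

Passing the LLM in as a callable keeps the HIGH path free of any network dependency, which is what makes it fast and repeatable.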

Results

Retrieval accuracy

97.7% Hit@5

0.919 context recall. Cross-encoder reranker on top of hybrid BM25 + vector search, benchmarked on a 43-query golden eval set.
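Hit@5 counts a query as a hit when at least one relevant chunk appears among its top 5 retrieved results. A minimal sketch of the metric, assuming each query has a set of gold chunk IDs (the IDs below are made up; context recall is not shown). On a 43-query set, 97.7% corresponds to 42 hits (42 / 43 ≈ 0.977).

```python
def hit_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant chunk in the top-k results."""
    if len(retrieved) != len(relevant):
        raise ValueError("need exactly one relevant set per query")
    hits = sum(
        1
        for ranked, gold in zip(retrieved, relevant)
        if any(chunk_id in gold for chunk_id in ranked[:k])
    )
    return hits / len(retrieved)


retrieved = [
    ["c1", "c7", "c3"],                    # relevant chunk at rank 3
    ["c9", "c2"],                          # relevant chunk never retrieved
    ["c5", "c6", "c8", "c4", "c0", "c2"],  # relevant chunk at rank 6
]
relevant = [{"c3"}, {"c4"}, {"c2"}]
print(hit_at_k(retrieved, relevant, k=5))  # → 0.3333333333333333
```

The third query shows why the cutoff matters: its relevant chunk is retrieved, but at rank 6, so it only counts under Hit@6.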

Cross-validation precision

0.03% delta

SQL $15,399,999 vs PDF "$15.4M" — validated HIGH confidence. Eliminates manual fact-checking across sources.
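Cross-validating a pair like this requires turning the PDF's "$15.4M" into a number first. A minimal sketch, assuming figures use an optional `$`, thousands separators, and an optional K/M/B or thousand/million/billion suffix; `parse_money` and `within_tolerance` are hypothetical names, and real report text will need more formats than this regex handles.

```python
import re

_MULTIPLIER = {"": 1.0, "k": 1e3, "thousand": 1e3, "m": 1e6, "million": 1e6,
               "b": 1e9, "billion": 1e9}
# Longer unit words are listed first so "million" is not cut short to "m".
_AMOUNT = re.compile(
    r"\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(?:(thousand|million|billion|k|m|b)\b)?",
    re.IGNORECASE,
)


def parse_money(figure: str) -> float:
    """Parse one money figure such as "$15.4M", "$1,234" or "15.4 million".

    Returns the first amount found, so pass an extracted figure, not a whole
    sentence (in "Q3 revenue" the "3" would be read as an amount).
    """
    match = _AMOUNT.search(figure)
    if match is None:
        raise ValueError(f"no amount in {figure!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _MULTIPLIER[(match.group(2) or "").lower()]


def within_tolerance(sql_value: float, pdf_figure: str, tolerance: float = 0.01) -> bool:
    """True when the PDF figure is within `tolerance` (relative) of the SQL value."""
    pdf_value = parse_money(pdf_figure)
    return abs(sql_value - pdf_value) <= tolerance * abs(sql_value)
```

With the pair above, `within_tolerance(15_399_999, "$15.4M")` is true, while a report saying "$14.9M" would fall outside 1% and be routed to the conflict path.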

Test coverage

194 unit tests

+ 12 golden eval cases, 7 offline evals, 43-query RAG benchmark. Tested like a production service, not a demo.

Fault tolerance

Circuit breaker

Gemini → Groq automatic fallback. Zero query failures during recruiter testing. Degraded mode discloses which source was unavailable.
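A circuit breaker stops sending requests to a provider that keeps failing and routes them to a backup until a cool-down expires. A minimal sketch, assuming providers are plain callables; `primary` and `fallback` stand in for the Gemini and Groq clients (no real SDK calls are shown), and the thresholds are illustrative defaults, not the project's settings.

```python
import time
from typing import Callable, Optional, Tuple


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; while open, skip the
    primary provider until `reset_after` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Half-open: allow one trial call; a single failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


def call_with_fallback(prompt: str,
                       primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       breaker: CircuitBreaker) -> Tuple[str, str]:
    """Return (provider_used, text) so a degraded answer can say who served it."""
    if not breaker.is_open():
        try:
            text = primary(prompt)
        except Exception:
            breaker.record_failure()
        else:
            breaker.record_success()
            return "primary", text
    return "fallback", fallback(prompt)
```

Returning the provider name alongside the text is one way to support the degraded-mode disclosure described above.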

What I'd do next

→ Add a FastAPI layer. The current Streamlit UI is demo-grade; a REST API would make NexusIQ embeddable in enterprise tools without the Streamlit dependency.

→ Compute cost-per-query from the LLM gateway ledger. The token data exists and needs only a dollar conversion; a figure like "$0.003 per multi-source BI query" is a CFO-readable metric.

→ Add OCR support for scanned PDFs. The current pipeline handles text-native PDFs only; OCR would open up an entire class of real enterprise documents.

→ Migrate from ChromaDB to pgvector on Supabase, eliminating a second infrastructure dependency and simplifying the production stack.

Stack

LangGraph, ChromaDB, Hybrid BM25 + vector, Cross-encoder reranker, Gemini 2.5 Flash, Groq LLaMA 3.3-70B, PostgreSQL · Supabase, Python · Pandas, AWS EC2 · ECR · S3, Docker, GitHub Actions CI/CD, Caddy · HTTPS, CloudWatch, LLM gateway telemetry, JSONL traces

Live · Render · Streamlit Cloud · March 2026

RevenueIQ AI

An ML-powered decision intelligence platform that transforms 534K e-commerce transactions into actionable retention, growth, and forecasting insights — in 30 seconds instead of 4 hours.

Live Demo ↗ GitHub ↗

95%

Churn prediction F1-score

$4M+

Revenue opportunities surfaced

30s

Full executive report time

7.74×

Query speedup over Pandas

The problem

Before

E-commerce teams waste 4–6 hours every week manually building revenue reports in spreadsheets. By the time the report is written, the opportunities it identifies have already been missed.

→

After

RevenueIQ runs 5 ML models on 534K transactions, identifies 978 at-risk customers, forecasts revenue for the next 30 days, and generates a plain-English executive summary, all in 30 seconds.

Architecture

534K transactions → DataCleaner → 5 ML models → DuckDB → Groq LLM → Streamlit

5 models: Random Forest churn (95% F1) · KMeans segmentation (k=5) · Isolation Forest anomalies · Prophet + ARIMA/ETS forecasting · RFM scoring. ML results are stored back in DuckDB as queryable SQL tables, so the entire team can query segments without touching Python. The Groq LLM (LLaMA 3.3-70B) auto-generates 5 executive insight types from the model outputs.
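RFM scoring is the simplest of the five models. A minimal plain-Python sketch, assuming transactions arrive as `(customer_id, invoice_id, invoice_date, amount)` tuples and that each dimension is scored 1 to 5 by rank; the page does not say how RevenueIQ bins its scores, and the function names are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

Txn = Tuple[str, str, date, float]  # (customer_id, invoice_id, invoice_date, amount)


def rfm_table(transactions: Iterable[Txn], as_of: date) -> Dict[str, dict]:
    """Recency (days since last purchase), frequency (distinct invoices), monetary (total spend)."""
    stats: dict = {}
    for cust, invoice, day, amount in transactions:
        s = stats.setdefault(cust, {"last": day, "invoices": set(), "spend": 0.0})
        s["last"] = max(s["last"], day)
        s["invoices"].add(invoice)
        s["spend"] += amount
    return {c: {"recency": (as_of - s["last"]).days,
                "frequency": len(s["invoices"]),
                "monetary": s["spend"]}
            for c, s in stats.items()}


def rank_score(values: Dict[str, float], higher_is_better: bool, bins: int = 5) -> Dict[str, int]:
    """Score 1..bins by rank (quintiles when bins=5); worst gets 1, ties keep input order."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {key: 1 + (i * bins) // n for i, key in enumerate(ordered)}


def rfm_scores(transactions: Iterable[Txn], as_of: date) -> Dict[str, Tuple[int, int, int]]:
    """(R, F, M) scores per customer; lower recency is better, higher F and M are better."""
    table = rfm_table(transactions, as_of)
    r = rank_score({c: v["recency"] for c, v in table.items()}, higher_is_better=False)
    f = rank_score({c: v["frequency"] for c, v in table.items()}, higher_is_better=True)
    m = rank_score({c: v["monetary"] for c, v in table.items()}, higher_is_better=True)
    return {c: (r[c], f[c], m[c]) for c in table}
```

The fixed thresholds baked into a scheme like this are exactly the limitation the KMeans approach below avoids.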

Hard problems — what failed, what worked

Churn model showed 100% accuracy — which was wrong

❌ First version

The initial Random Forest hit 100% accuracy on the test set, which was immediately suspicious. Investigation found data leakage: recency features were computed using the full dataset before the train/test split, so the model saw future information during training.

✓ Fixed version

Recomputed recency features strictly from training-period data. Performance dropped to 95% F1: still strong, and now trustworthy. The lesson: a model that looks too good is usually wrong, not exceptional. 95% on a clean split is worth more than 100% on a leaky one.
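One way to enforce this is a time-based split: every feature is computed only from rows before a cutoff date, and the churn label only from rows after it. A minimal sketch under that assumption; the page does not give RevenueIQ's cutoff, horizon, or label definition, so the 90-day horizon and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def build_churn_rows(transactions: Iterable[Tuple[str, date]], cutoff: date,
                     horizon_days: int = 90) -> list[dict]:
    """Leakage-safe churn rows from (customer_id, purchase_date) pairs.

    Features use ONLY purchases before `cutoff`; the label looks only at
    [cutoff, cutoff + horizon_days). Purchases after that window are ignored.
    """
    window_end = cutoff + timedelta(days=horizon_days)
    last_seen: dict = {}
    n_orders: dict = {}
    came_back = set()
    for cust, day in transactions:
        if day < cutoff:                # training period: feeds the features
            last_seen[cust] = max(last_seen.get(cust, day), day)
            n_orders[cust] = n_orders.get(cust, 0) + 1
        elif day < window_end:          # label window: feeds the target only
            came_back.add(cust)
    return [
        {
            "customer": cust,
            "recency_days": (cutoff - last_seen[cust]).days,  # measured from the cutoff
            "frequency": n_orders[cust],
            "churned": cust not in came_back,
        }
        for cust in sorted(last_seen)   # only customers with pre-cutoff history
    ]
```

A leaky version would measure recency against the dataset's last date using every row, so a customer who returns after the cutoff looks recent and the feature effectively encodes the label.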

KMeans over RFM — and why k=5 not k=7

Traditional RFM requires manually set thresholds and uses only 3 features. KMeans discovers natural clusters across 6 features: recency, frequency, monetary value, average order value (AOV), lifespan, and quantity. This separated 18 ultra-VIP customers worth $96,614 each on average, customers RFM would have grouped with ordinary Champions.

k=5 rationale

k=7 had a 5.5% better silhouette score, but going from k=2 to k=5 had already gained 50%, so the extra clusters showed diminishing returns. Five segments map cleanly to actionable personas: VIPs, Loyal, Potential Growers, At-Risk, Lost. Seven would split similar groups that no marketing team would act on differently.
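The diminishing-returns argument can be written as a selection rule: take the smallest k whose silhouette score is within some tolerance of the best. A sketch, assuming scikit-learn for the scores; `silhouette_by_k` and `pick_k` are hypothetical helpers, the 6% tolerance is an assumption chosen to reproduce the k=5 decision, and the scores in the usage line are made up to match the 50% and 5.5% ratios quoted above, not measured values.

```python
def silhouette_by_k(X, ks=range(2, 9), seed: int = 0) -> dict[int, float]:
    """Silhouette score per k (needs scikit-learn, imported lazily here)."""
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    scores = {}
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


def pick_k(silhouette: dict[int, float], tolerance: float = 0.06) -> int:
    """Smallest k whose score is within `tolerance` (relative) of the best score."""
    best = max(silhouette.values())
    return min(k for k, s in silhouette.items() if s >= best * (1 - tolerance))


# Made-up scores matching the ratios above: 0.45 / 0.30 = 1.5, 0.4748 / 0.45 ≈ 1.055.
print(pick_k({2: 0.30, 5: 0.45, 7: 0.4748}))  # → 5
```

The rule only formalises the trade-off; the persona argument (five segments a marketing team will treat distinctly) is what justifies the tolerance.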

DuckDB over Pandas — and why it mattered

❌ Pandas on repeat dashboard loads

Every filter change or refresh re-read and re-aggregated the full 534K-row CSV, so every dashboard load was a 45-second cold load. No columnar storage, no query cache, no optimisation.

✓ DuckDB + st.cache_data

DuckDB is OLAP-optimised: columnar storage, vectorised execution, and embedded (no server). Combined with @st.cache_data, repeat loads dropped from 45s to ~3s, and queries ran 7.74× faster than with Pandas.
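The caching half of this fix is memoisation keyed on the function's arguments: the first call with a given filter does the work, and repeats reuse the result. A minimal sketch using `functools.lru_cache` as a stand-in for `@st.cache_data` (Streamlit's decorator differs in details such as handing each caller a copy of the cached value), with a tiny inline CSV in place of the 534K-row file. The DuckDB query itself is not shown, and all names here are hypothetical.

```python
import csv
import io
from functools import lru_cache

# Tiny stand-in for the raw transactions CSV.
RAW_CSV = """country,revenue
United Kingdom,100.0
France,40.0
United Kingdom,60.0
Germany,25.0
"""

LOADS = {"count": 0}  # counts real (uncached) executions


@lru_cache(maxsize=32)
def revenue_by_country(min_revenue: float = 0.0) -> tuple:
    """Aggregate once per distinct argument; repeat calls are served from the cache."""
    LOADS["count"] += 1
    totals: dict = {}
    for row in csv.DictReader(io.StringIO(RAW_CSV)):
        totals[row["country"]] = totals.get(row["country"], 0.0) + float(row["revenue"])
    # Return an immutable tuple so callers cannot mutate the shared cached value.
    return tuple(sorted((c, t) for c, t in totals.items() if t >= min_revenue))
```

Calling `revenue_by_country()` twice runs the aggregation once; a new argument such as `revenue_by_country(50.0)` is a cache miss and runs it again.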

Results

Customer segmentation

5 KMeans personas

18 VIP Champions ($1.74M revenue, averaging $96K each). 1,692 Potential Growers. 978 At-Risk customers flagged for win-back.

Revenue forecasting

$1.61M / 30 days

Prophet model, 9% MAPE (91% accuracy). Exponential Smoothing MAE $18,365. CFO-ready 4-week revenue outlook.
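MAPE averages the absolute error as a percentage of each actual value; MAE averages the absolute error in currency units. A minimal sketch of both, assuming the "91% accuracy" figure follows the common 100 − MAPE convention; the numbers below are made up.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent; periods with zero actuals are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def mae(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute error, in the same units as the data (e.g. dollars)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(round(mape(actual, forecast), 2), mae(actual, forecast))  # → 6.67 10.0
```

MAPE is scale-free, which suits a headline "accuracy" figure; MAE stays in dollars, which is why the Exponential Smoothing error is quoted as an amount.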

Anomaly detection

5,249 flagged

Isolation Forest. Flagged transactions averaged 230 units vs. 8 for normal ones, which identified them as B2B/wholesale orders. Wholesale pricing opportunity surfaced.

Report generation

2 hrs → 30 sec

Groq LLM auto-generates executive summary, customer personas, forecast commentary, anomaly explanation, and churn playbook.

What I'd do next

→ Connect to a live Shopify or WooCommerce API. The current pipeline reads from a static CSV; one file-path change connects it to live data.

→ Add cohort-based revenue forecasting. The current Prophet model forecasts total revenue; per-segment forecasting would show which cohort is driving growth vs. decline.

→ Build a campaign ROI tracker that feeds win-back campaign results back into the churn model to measure actual vs. predicted recovery.

→ Replace Streamlit with a FastAPI + React frontend to make the platform embeddable in existing business tool stacks.

Stack

Random Forest, KMeans, Isolation Forest, Prophet, ARIMA / ETS, Scikit-learn, Python · Pandas · NumPy, DuckDB, Groq LLaMA 3.3-70B, Streamlit, Plotly, Render
