NirajanKhadka/RetailIQ

> Natural language analytics assistant over 500K+ UK retail transactions

RAG-based retail analytics assistant: LangChain, FAISS, Qwen2.5, Azure

★ 0 · ⑂ 0 · Jupyter Notebook · Pushed 17d ago · Listed 7d ago · No license on GitHub

retailiq-app.lemonwater-fbf339a7.eastus.azurecontainerapps.io

No GitHub topics on this repo.

Jupyter Notebook 98.1%
Python 1.9%
Dockerfile 0.0%

1 Review

thejaycampbell, 7d ago

RetailIQ is a focused and well-scoped RAG project that does a good job showing the difference between generic document retrieval and analytics-oriented retrieval. The strongest part is the design choice in src/data_prep.py: instead of expecting FAISS to magically answer aggregate questions, the project pre-computes product summaries, top revenue lists, country rankings, return summaries, and dataset KPIs before indexing them. That makes the README’s examples, like “top 5 products by revenue” or “best performing month,” feel technically grounded rather than just demo prompts.
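The pre-computation idea can be illustrated in plain Python. This is a minimal sketch, not the code in src/data_prep.py: `build_aggregate_chunks` is a hypothetical helper, and the column names (`Description`, `Quantity`, `UnitPrice`, `Country`) plus the rule that non-positive quantities are returns are assumptions modeled on a typical invoice-line schema. The point it shows is that totals and rankings are computed once, up front, and stored as short text chunks, so retrieval only has to find the right chunk rather than aggregate anything.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def build_aggregate_chunks(rows: Iterable[Mapping], top_n: int = 5) -> list[dict]:
    """Turn raw transaction rows into text chunks that already contain the answer.

    Hypothetical helper: column names and the returns rule are assumptions.
    """
    revenue_by_product = defaultdict(float)
    revenue_by_country = defaultdict(float)
    total_revenue = 0.0
    sale_lines = 0
    returned_lines = 0

    for row in rows:
        quantity = int(row["Quantity"])
        if quantity <= 0:
            # Assumption: non-positive quantities are returns; keep them out of revenue.
            returned_lines += 1
            continue
        revenue = quantity * float(row["UnitPrice"])
        revenue_by_product[row["Description"]] += revenue
        revenue_by_country[row["Country"]] += revenue
        total_revenue += revenue
        sale_lines += 1

    def ranked(totals):
        # Revenue descending, ties broken by name so the chunk text is deterministic.
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

    def numbered(pairs):
        return "; ".join(
            f"{i}. {name} ({rev:,.2f})" for i, (name, rev) in enumerate(pairs, start=1)
        )

    top_products = ranked(revenue_by_product)[:top_n]
    return [
        {"type": "top_products_by_revenue",
         "text": f"Top {len(top_products)} products by revenue: {numbered(top_products)}"},
        {"type": "country_ranking",
         "text": f"Countries ranked by revenue: {numbered(ranked(revenue_by_country))}"},
        {"type": "return_summary",
         "text": f"Return summary: {returned_lines} line(s) with non-positive quantity "
                 "excluded from revenue totals."},
        {"type": "dataset_kpis",
         "text": f"Dataset KPIs: {sale_lines} sale lines, total revenue {total_revenue:,.2f}."},
    ]
```

For three rows (two mugs at 3.00 in the UK, one lantern at 10.00 in France, and one returned mug), the ranking chunk reads `Top 2 products by revenue: 1. LANTERN (10.00); 2. MUG (6.00)`. Strings like that, rather than raw rows, are what would be embedded and indexed, so a question such as "top 5 products by revenue" can match a chunk that already holds the ranking.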

The Streamlit app is also practical: it includes example questions, source chunk visibility, simple input validation, session history, and MLflow logging for latency and retrieved chunks. Those details make the project easier to evaluate as a working assistant rather than just a notebook experiment. The README is unusually complete, with architecture, setup, Docker, Azure deployment steps, screenshots, and an explanation of why pre-computed aggregate chunks matter.
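The validate, time, and log flow described above might look roughly like the sketch below. It is illustrative only: `validate_question`, `answer_question`, `MAX_QUESTION_CHARS`, and the injected `retrieve`, `generate`, and `log` callables are hypothetical names, not taken from the repo. Logging goes through a plain callback so the example runs without MLflow; in an MLflow setup that callback could forward the latency to `mlflow.log_metric` and the retrieved chunks to `mlflow.log_text`.

```python
import time
from typing import Callable, Sequence

MAX_QUESTION_CHARS = 500  # hypothetical limit, not taken from the repo


def validate_question(question: str) -> str:
    """Reject empty or oversized input before it reaches the retriever."""
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("Please enter a question.")
    if len(cleaned) > MAX_QUESTION_CHARS:
        raise ValueError(f"Questions are limited to {MAX_QUESTION_CHARS} characters.")
    return cleaned


def answer_question(
    question: str,
    retrieve: Callable[[str], Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
    log: Callable[[dict], None],
    history: list,
) -> str:
    """Validate, retrieve, generate, then record latency and the retrieved chunks."""
    cleaned = validate_question(question)
    start = time.perf_counter()
    chunks = list(retrieve(cleaned))
    answer = generate(cleaned, chunks)
    latency_s = time.perf_counter() - start
    # One record per query; an experiment tracker could consume the same dict.
    log({"question": cleaned, "latency_s": latency_s,
         "n_chunks": len(chunks), "chunks": chunks})
    # Session history; in Streamlit this list would typically live in st.session_state.
    history.append((cleaned, answer))
    return answer
```

Keeping retrieval, generation, and logging as injected callables is a design choice that also makes the flow testable with stubs, which ties into the testing point below.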

The main thing I would improve is reproducibility polish. The README and source currently show some encoding artifacts, the live demo link text appears to point to a placeholder, and the repo does not include a license, CI workflow, or real automated tests beyond evaluation scripts. src/test.py and src/evaluation.py could become a small pytest suite that verifies data prep outputs, retrieval behavior for known aggregate chunks, and failure handling when the FAISS index or Hugging Face token is missing. I’d also consider documenting exactly which files are generated and which are intentionally excluded, since the Dockerfile expects faiss_index/ and data/processed/chunks.json to exist.
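As a sketch of the failure-handling part of such a suite, a small preflight check plus two pytest-style tests could look like this. Assumptions: `preflight_problems` is a hypothetical helper, the two required paths are the ones the Dockerfile is said to expect, and `HF_TOKEN` is a guessed environment variable name that the repo may not use. Tests for retrieval of known aggregate chunks would follow the same deterministic pattern against a small fixture index.

```python
from pathlib import Path
from typing import Mapping

# Artifacts the Dockerfile is described as expecting to exist.
REQUIRED_PATHS = ("faiss_index", "data/processed/chunks.json")
# Assumption: the Hugging Face token is read from HF_TOKEN; the repo may differ.
TOKEN_ENV_VAR = "HF_TOKEN"


def preflight_problems(root: Path, env: Mapping[str, str]) -> list[str]:
    """Return readable problems up front instead of failing deep inside the app."""
    problems = [
        f"missing {rel} (run data prep and indexing first)"
        for rel in REQUIRED_PATHS
        if not (root / rel).exists()
    ]
    if not env.get(TOKEN_ENV_VAR):
        problems.append(f"{TOKEN_ENV_VAR} is not set")
    return problems


# pytest-style tests; pytest injects its built-in tmp_path fixture.
def test_reports_missing_index_and_token(tmp_path: Path) -> None:
    problems = preflight_problems(tmp_path, env={})
    assert "missing faiss_index (run data prep and indexing first)" in problems
    assert f"{TOKEN_ENV_VAR} is not set" in problems


def test_passes_when_artifacts_and_token_exist(tmp_path: Path) -> None:
    (tmp_path / "faiss_index").mkdir()
    chunks = tmp_path / "data" / "processed" / "chunks.json"
    chunks.parent.mkdir(parents=True)
    chunks.write_text("[]")
    assert preflight_problems(tmp_path, env={TOKEN_ENV_VAR: "dummy"}) == []
```

Calling the same check at app startup would let the Streamlit page show a clear message instead of a stack trace when the index or token is missing.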

Overall, this is a strong portfolio-style applied ML project. It shows clear thinking about RAG limitations in business analytics and makes the architecture understandable to someone reviewing the code. A license, CI, cleaner encoding, and a few deterministic tests would make it much easier for other developers to trust, run, and extend.