On this page
Assignment #3 - Capstone Project Proposal
View this page on GitHub
Homework #3
Brief Notes
- See the zip for the the proposal text (
.md) and system context diagram
Zach AI Agent feedback
The feedback is detailed and may be helpful in the future
Dan — strong submission with a real, personal use case, a concrete current pipeline, and a thoughtful set of enhancements. You clearly understand the data flow, external dependencies, and asynchronous processing you’ll need. The safety section is unusually thorough for a capstone; nice work.
### Rubric check (pass criteria)
* Diagram present as an image: Yes. Clear context diagram with many subsystems (Web Client, App Server, App DB/Postgres+pgvector, S3/Minio, Async Job Engine, Redis, Job Server, Deepgram, OpenAI, Kaspa Node/Explorer). More than 3 subsystems: Pass.
* Diagram shows subsystems: Pass.
* AI guardrails for reliability included: Your Markdown includes a well-scoped Guardrails & Safety Controls section: Pass.
* “How you will make your AI reliable” explicitly in the diagram: Not explicitly shown. The diagram does not label guardrail/safety components or their data flow. The architecture includes them in text, but they should be visualized. See “Required changes” below.
* Markdown includes:
* Why someone should use it: Yes (students, knowledge workers, consultants; recurring problem of unstructured, agenda-less meetings and poor follow-up).
* Business problem: Yes (capture/recall of decisions, owners, follow-ups; search/query over meeting content).
* Next steps: Yes (Web app, richer transcription, summaries, PII suppression, embeddings in Postgres+pgvector, follow-up generation, collaboration, experimental crypto payments).
### What’s strong
* Architecture clarity and completeness. You covered server, DB, object storage, queues, worker processes, and third-party APIs.
* Asynchrony and reliability. Async job engine, Redis, retries, dead-letter, timeouts.
* Data persistence and retrieval with pgvector; RAG stability guardrails and similarity thresholds are good moves.
* Safety/guardrail depth: PII scrubbing, toxicity filtering, role-based access control, secret management, cost controls, fallback behaviors, CI evaluation with a standard test set.
* Thoughtful experimental track for payments, with transaction guardrails.
### Gaps and risks to address
1. Reliability/guardrails not visible in the diagram
* The rubric specifically asks for “How you are going to make your AI reliable” in the diagram. You have this in text, but it needs to be diagrammed.
* Add and label:
* Safety middleware in the App Server (Prompt builder + policy engine).
* Output Filter service (PII scrubber, toxicity filter).
* Retrieval service with similarity-threshold gate before LLM calls.
* Secrets/Vault component for keys.
* Observability stack (structured logging, metrics, alerts).
* Dead-letter queue and retry policies attached to job stages.
2. Differentiation and value clarity
* This space is crowded (Zoom AI Companion, Otter, Fireflies, Fathom). Your proposal is good, but you should sharpen “why this vs. incumbents.”
* Suggestions to improve value:
* Domain-specific meeting packs: e.g., Agile sprint reviews, sales calls, research interviews, legal consults—each with tuned prompts, artifacts, and benchmarks.
* Action item auto-sync to systems of record (Jira, Linear, Asana, HubSpot) with two-way status updates.
* Evidence-grounded answers: show citations with speaker/timecode snippets and confidence scores.
* Meeting hygiene scores (agenda adherence, owner/next steps coverage) and trend analytics across a team.
* Consent-first workflows: capture, store, and surface consent for all participants. This is a competitive and compliance differentiator.
3. Crypto payments scope and fit
* Interesting experiment; however, it risks distracting from the core product. Ensure a clean abstraction so the main experience works perfectly without crypto.
* If kept, clarify:
* Why Kaspa vs. Stripe/usage metering or other L2s; what specific capability does Kaspa unlock for agent-to-agent?
* Compliance (jurisdictions, KYC/AML considerations if you ever touch fiat, ToS restrictions).
* Strict key isolation (HSM/KMS, signing service), observable transaction lifecycle, and strong refund/failure stories.
4. Privacy, consent, and governance
* You mention redaction and RBAC, which is great. Add:
* Per-meeting consent capture and storage.
* Org/tenant boundaries and default retention windows.
* Audit trails that log who viewed/downloaded transcripts.
* Optional on-prem or VPC isolation story for sensitive clients.
5. Evaluation and SLOs
* You list a 20-meeting test set—great start. Add:
* Concrete target metrics: WER threshold from Deepgram; RAG hit rate/precision@k; summary factuality score; latency SLO per stage; cost per processed hour.
* Prompt/versioning and canary releases for model changes.
* Robust prompt-injection defenses (e.g., delimiter enforcement, instruction segregation, context hashing).
6. Product surface
* Add UI details: speaker diarization labels, searchable timeline with audio-linked captions, selectable redaction levels, and chat answers with citations.
* Collaboration: Roles (owner, editor, viewer), share links with expiry.
### Suggested “next steps” milestones (with acceptance criteria)
* Milestone 1: Reliable ingestion pipeline v1
* Accept uploads to S3, validate audio, Deepgram WER < X% on test set, diarization working for ≥2 speakers.
* Milestone 2: RAG QA with citations
* Embeddings in pgvector, similarity threshold gating, answers show top-3 citations with timecodes.
* Milestone 3: Safety middleware
* Prompt guard + output filters (PII, toxicity), redaction modes (strict/standard), end-to-end tests for injection attacks.
* Milestone 4: Action items to system-of-record
* User-confirmed tasks pushed to Jira/Asana with links to transcript segments; round-trip status visible in app.
* Milestone 5: Observability and ops
* Logs/metrics/dashboards, dead-letter handling, alerting on error-rate spikes and cost ceilings.
* Milestone 6: Optional payments experiment
* Abstract billing interface; Kaspa path as a plugin behind a policy engine; keys in KMS/HSM; spend caps and rate limits enforced.
### What to update to fully satisfy the rubric and strengthen your grade
* Update the system diagram to explicitly show the reliability and safety components and their data flows:
* Safety middleware/policy engine in the App Server.
* Output filtering pipeline.
* Retrieval service with similarity threshold.
* Secrets vault.
* Observability components.
* Retry/backoff and dead-letter queue.
* Include one sequence diagram (upload → ASR → embed → RAG QA → filters → persist) with failure paths and idempotency strategy.
### If anything in this review setup was unclear and you need to resubmit
* Provide: updated diagram image(s), your revised Markdown, and optional sequence diagram. If you want targeted feedback on differentiation, include a 1-paragraph positioning statement against a named competitor (e.g., Otter) and your top 3 differentiators.
* If you want validation of your evaluation plan, share sample metrics and a snippet of your 20-meeting test set outline.
### Overall
This is a strong, thoughtfully engineered proposal that hits the main rubric items and demonstrates mature thinking about safety, reliability, and operations. The main change needed for completeness is to make the reliability/guardrail path explicit in the diagram and tighten the differentiation story.
View this page on GitHub