View this page on GitHub

Homework #3

Brief Notes

  • See the zip for the the proposal text (.md) and system context diagram

Zach AI Agent feedback

The feedback is detailed and may be helpful in the future

  Dan — strong submission with a real, personal use case, a concrete current pipeline, and a thoughtful set of enhancements. You clearly understand the data flow, external dependencies, and asynchronous processing you’ll need. The safety section is unusually thorough for a capstone; nice work.

### Rubric check (pass criteria)

* Diagram present as an image: Yes. Clear context diagram with many subsystems (Web Client, App Server, App DB/Postgres+pgvector, S3/Minio, Async Job Engine, Redis, Job Server, Deepgram, OpenAI, Kaspa Node/Explorer). More than 3 subsystems: Pass.
* Diagram shows subsystems: Pass.
* AI guardrails for reliability included: Your Markdown includes a well-scoped Guardrails & Safety Controls section: Pass.
* “How you will make your AI reliable” explicitly in the diagram: Not explicitly shown. The diagram does not label guardrail/safety components or their data flow. The architecture includes them in text, but they should be visualized. See “Required changes” below.
* Markdown includes:
* Why someone should use it: Yes (students, knowledge workers, consultants; recurring problem of unstructured, agenda-less meetings and poor follow-up).
* Business problem: Yes (capture/recall of decisions, owners, follow-ups; search/query over meeting content).
* Next steps: Yes (Web app, richer transcription, summaries, PII suppression, embeddings in Postgres+pgvector, follow-up generation, collaboration, experimental crypto payments).

### What’s strong

* Architecture clarity and completeness. You covered server, DB, object storage, queues, worker processes, and third-party APIs.
* Asynchrony and reliability. Async job engine, Redis, retries, dead-letter, timeouts.
* Data persistence and retrieval with pgvector; RAG stability guardrails and similarity thresholds are good moves.
* Safety/guardrail depth: PII scrubbing, toxicity filtering, role-based access control, secret management, cost controls, fallback behaviors, CI evaluation with a standard test set.
* Thoughtful experimental track for payments, with transaction guardrails.

### Gaps and risks to address

1. Reliability/guardrails not visible in the diagram
  * The rubric specifically asks for “How you are going to make your AI reliable” in the diagram. You have this in text, but it needs to be diagrammed.
  * Add and label:
    * Safety middleware in the App Server (Prompt builder + policy engine).
    * Output Filter service (PII scrubber, toxicity filter).
    * Retrieval service with similarity-threshold gate before LLM calls.
    * Secrets/Vault component for keys.
    * Observability stack (structured logging, metrics, alerts).
    * Dead-letter queue and retry policies attached to job stages.
2. Differentiation and value clarity
  * This space is crowded (Zoom AI Companion, Otter, Fireflies, Fathom). Your proposal is good, but you should sharpen “why this vs. incumbents.”
  * Suggestions to improve value:
    * Domain-specific meeting packs: e.g., Agile sprint reviews, sales calls, research interviews, legal consults—each with tuned prompts, artifacts, and benchmarks.
    * Action item auto-sync to systems of record (Jira, Linear, Asana, HubSpot) with two-way status updates.
    * Evidence-grounded answers: show citations with speaker/timecode snippets and confidence scores.
    * Meeting hygiene scores (agenda adherence, owner/next steps coverage) and trend analytics across a team.
    * Consent-first workflows: capture, store, and surface consent for all participants. This is a competitive and compliance differentiator.
3. Crypto payments scope and fit
  * Interesting experiment; however, it risks distracting from the core product. Ensure a clean abstraction so the main experience works perfectly without crypto.
  * If kept, clarify:
    * Why Kaspa vs. Stripe/usage metering or other L2s; what specific capability does Kaspa unlock for agent-to-agent?
    * Compliance (jurisdictions, KYC/AML considerations if you ever touch fiat, ToS restrictions).
    * Strict key isolation (HSM/KMS, signing service), observable transaction lifecycle, and strong refund/failure stories.
4. Privacy, consent, and governance
  * You mention redaction and RBAC, which is great. Add:
    * Per-meeting consent capture and storage.
    * Org/tenant boundaries and default retention windows.
    * Audit trails that log who viewed/downloaded transcripts.
    * Optional on-prem or VPC isolation story for sensitive clients.
5. Evaluation and SLOs
  * You list a 20-meeting test set—great start. Add:
    * Concrete target metrics: WER threshold from Deepgram; RAG hit rate/precision@k; summary factuality score; latency SLO per stage; cost per processed hour.
    * Prompt/versioning and canary releases for model changes.
    * Robust prompt-injection defenses (e.g., delimiter enforcement, instruction segregation, context hashing).
6. Product surface
  * Add UI details: speaker diarization labels, searchable timeline with audio-linked captions, selectable redaction levels, and chat answers with citations.
  * Collaboration: Roles (owner, editor, viewer), share links with expiry.


### Suggested “next steps” milestones (with acceptance criteria)

* Milestone 1: Reliable ingestion pipeline v1
  * Accept uploads to S3, validate audio, Deepgram WER < X% on test set, diarization working for ≥2 speakers.
* Milestone 2: RAG QA with citations
  * Embeddings in pgvector, similarity threshold gating, answers show top-3 citations with timecodes.
* Milestone 3: Safety middleware
  * Prompt guard + output filters (PII, toxicity), redaction modes (strict/standard), end-to-end tests for injection attacks.
* Milestone 4: Action items to system-of-record
  * User-confirmed tasks pushed to Jira/Asana with links to transcript segments; round-trip status visible in app.
* Milestone 5: Observability and ops
  * Logs/metrics/dashboards, dead-letter handling, alerting on error-rate spikes and cost ceilings.
* Milestone 6: Optional payments experiment
  * Abstract billing interface; Kaspa path as a plugin behind a policy engine; keys in KMS/HSM; spend caps and rate limits enforced.


### What to update to fully satisfy the rubric and strengthen your grade

* Update the system diagram to explicitly show the reliability and safety components and their data flows:
  * Safety middleware/policy engine in the App Server.
  * Output filtering pipeline.
  * Retrieval service with similarity threshold.
  * Secrets vault.
  * Observability components.
  * Retry/backoff and dead-letter queue.
* Include one sequence diagram (upload → ASR → embed → RAG QA → filters → persist) with failure paths and idempotency strategy.

### If anything in this review setup was unclear and you need to resubmit

* Provide: updated diagram image(s), your revised Markdown, and optional sequence diagram. If you want targeted feedback on differentiation, include a 1-paragraph positioning statement against a named competitor (e.g., Otter) and your top 3 differentiators.
* If you want validation of your evaluation plan, share sample metrics and a snippet of your 20-meeting test set outline.

### Overall

This is a strong, thoughtfully engineered proposal that hits the main rubric items and demonstrates mature thinking about safety, reliability, and operations. The main change needed for completeness is to make the reliability/guardrail path explicit in the diagram and tighten the differentiation story.
  

View this page on GitHub