The Dev's Story Behind InterviewPilot
The Feedback Problem
Preparing for technical interviews is a strangely feedback-starved process. You can find thousands of practice questions online, but what you can't find is someone to listen to your answer, tell you what you missed, and then push deeper with a follow-up — the way a real interviewer would. I wanted to build a platform that closes that gap: not just a question bank, but a full interview loop that listens, evaluates, and adapts.
That's how InterviewPilot started. The idea was simple — describe a target role, get a tailored set of interview questions with model answers generated by AI, then actually do the interview in your browser with speech, video, and real-time feedback. After each answer, the system generates a contextual follow-up question that probes vague claims or explores trade-offs, simulating the back-and-forth rhythm of a real interview. When you're done, you get a structured review with video playback and a downloadable PDF report.
What It Actually Does
InterviewPilot is a full-stack AI mock interview platform built with Next.js, React, OpenAI, and Supabase. Users pick an interview type — behavioral, technical, or system design — set a difficulty level, and optionally upload a resume or reference material so the AI can personalize its questions. The platform reads each question aloud using text-to-speech, starts a visual countdown, captures the candidate's spoken response via the Web Speech API, and records webcam video for every question. Each answer gets scored by GPT-4o-mini across four competency dimensions, and the AI immediately generates a follow-up question based on what you said. Results are stored per session, displayed in a collapsible review interface with video playback, and exportable as a formatted PDF with embedded QR codes linking to each recording.
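As a rough sketch of how that per-question capture loop can be wired with nothing but browser APIs (the hook name and return shape below are my own illustration, not the project's code):

```typescript
// A minimal sketch of the per-question capture step, assuming only browser APIs.
// Hook name and shapes are illustrative, not taken from the InterviewPilot source.
import { useRef, useState } from "react";

export function useAnswerCapture() {
  const [transcript, setTranscript] = useState("");
  const recognitionRef = useRef<any>(null);
  const recorderRef = useRef<MediaRecorder | null>(null);
  const chunksRef = useRef<Blob[]>([]);

  async function start() {
    // Speech-to-text: the Web Speech API is still vendor-prefixed in Chromium.
    const SR = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
    const recognition = new SR();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.onresult = (event: any) => {
      let text = "";
      for (const result of event.results) text += result[0].transcript;
      setTranscript(text);
    };
    recognition.start();
    recognitionRef.current = recognition;

    // Video: record the webcam per question for later playback on the review page.
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    const recorder = new MediaRecorder(stream);
    chunksRef.current = [];
    recorder.ondataavailable = (e) => chunksRef.current.push(e.data);
    recorder.start();
    recorderRef.current = recorder;
  }

  function stop(): Promise<Blob> {
    recognitionRef.current?.stop();
    return new Promise((resolve) => {
      const recorder = recorderRef.current;
      if (!recorder) return resolve(new Blob());
      recorder.onstop = () => {
        recorder.stream.getTracks().forEach((t) => t.stop());
        resolve(new Blob(chunksRef.current, { type: "video/webm" }));
      };
      recorder.stop();
    });
  }

  return { transcript, start, stop };
}
```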
Fire-and-Forget Video Uploads
The trickiest engineering problem was keeping the interview flow fast while doing expensive work in the background. After each answer, the system needs to do three things: submit the response for AI scoring, upload the recorded video to Supabase Storage, and generate a follow-up question. The video upload alone can take one to five seconds depending on recording length, and blocking on it would create an awkward pause between questions. Worse, a network hiccup during upload would break the entire interview flow for a non-critical feature.
The solution was a three-phase pipeline. First, submit the answer synchronously to get immediate AI feedback and a database record ID. Second, kick off the video upload as an unlinked promise chain that patches the video URL column on the already-inserted row when it eventually resolves — no awaiting, no blocking. Third, generate the follow-up question in parallel with the upload. If the upload fails, it fails silently; the answer and feedback are already saved. On the review page, questions without a video URL simply don't show a player, and the PDF generator only renders QR codes where a URL exists. This fire-and-forget pattern trades strict consistency for perceived speed — the user might briefly see a question without its video if they navigate to the review page before uploads finish, but the interview itself never stalls.
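A minimal sketch of that pipeline, assuming a Supabase client with conventional env vars and hypothetical helpers (`scoreAnswer`, `generateFollowUp`, `patchAnswerVideoUrl`) standing in for the real server actions:

```typescript
// Sketch of the three-phase answer pipeline. scoreAnswer, generateFollowUp and
// patchAnswerVideoUrl are hypothetical stand-ins for the project's server actions.
import { createClient } from "@supabase/supabase-js";

declare function scoreAnswer(questionId: string, transcript: string): Promise<{ feedback: string; answerId: string }>;
declare function generateFollowUp(questionId: string, transcript: string): Promise<string>;
declare function patchAnswerVideoUrl(answerId: string, url: string): Promise<void>;

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

export async function handleAnswer(sessionId: string, questionId: string, transcript: string, video: Blob) {
  // Phase 1 (synchronous): get AI feedback and the inserted row's id right away.
  const { feedback, answerId } = await scoreAnswer(questionId, transcript);

  // Phase 2 (fire-and-forget): upload the recording and patch the row when it resolves.
  // Deliberately not awaited: a failed upload must never stall or break the interview.
  const path = `${sessionId}/${questionId}.webm`;
  supabase.storage
    .from("recordings")
    .upload(path, video, { contentType: "video/webm" })
    .then(({ error }) => {
      if (error) return; // silent failure: the review page simply omits the player
      const { data } = supabase.storage.from("recordings").getPublicUrl(path);
      return patchAnswerVideoUrl(answerId, data.publicUrl);
    })
    .catch(() => {
      /* non-critical path: swallow network errors */
    });

  // Phase 3: generate the contextual follow-up while the upload runs in the background.
  const followUp = await generateFollowUp(questionId, transcript);

  return { feedback, followUp };
}
```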
Teaching AI to Grade on a Curve
The second challenge was making feedback feel fair. A junior candidate who mentions the right keywords in a loosely structured answer deserves encouragement, not the same harsh rubric applied to a senior engineer. On top of that, answers come in through speech recognition, which means they're full of filler words, grammar artifacts, and repeated phrases — noise from the input channel, not from the candidate's actual knowledge.
I solved this with a leniency tier system baked into the AI prompt. Each difficulty level — junior, mid, senior — maps to a calibration label with specific instructions. The junior tier is marked “supportive” and sets a floor: if the answer is on-topic and shows basic understanding, the minimum overall rating is three out of five. The senior tier is “strict” and demands depth, precision, and real-world examples. Orthogonally, the prompt separates technical knowledge from communication clarity and explicitly instructs the model that poor grammar or filler words from speech recognition must not reduce either score. Feedback follows the sandwich method — praise, then correction, then an actionable tip — framed as coaching rather than judgment. The trade-off is that floor constraints compress the scoring range at the lower end, but the alternative — no floors — produced discouraging scores for entry-level candidates who gave reasonable but incomplete answers, which defeats the purpose of a practice tool.
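A sketch of how those tiers might be encoded when building the feedback prompt; the labels and wording are illustrative rather than the exact production prompt:

```typescript
// Illustrative leniency tiers baked into the feedback prompt.
type Difficulty = "junior" | "mid" | "senior";

const LENIENCY: Record<Difficulty, { label: string; instructions: string }> = {
  junior: {
    label: "supportive",
    instructions:
      "If the answer is on-topic and shows basic understanding, the minimum overall rating is 3/5. Encourage first, then correct.",
  },
  mid: {
    label: "balanced",
    instructions: "Expect correct fundamentals and some structure; reward partial depth.",
  },
  senior: {
    label: "strict",
    instructions: "Demand depth, precision, trade-off analysis, and real-world examples.",
  },
};

export function buildFeedbackPrompt(difficulty: Difficulty, question: string, transcript: string): string {
  const tier = LENIENCY[difficulty];
  return [
    `You are an interview coach. Calibration: ${tier.label}. ${tier.instructions}`,
    "Score technical knowledge and communication clarity separately, each from 1 to 5.",
    "The answer was transcribed from speech: filler words, grammar artifacts, and repeated phrases must NOT reduce either score.",
    "Structure the feedback as praise, then correction, then one actionable tip.",
    `Question: ${question}`,
    `Candidate answer (transcript): ${transcript}`,
  ].join("\n");
}
```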
Speaking Two Languages
Bilingual support turned out to be more than just translating UI strings. The entire AI pipeline — question generation prompts, feedback rubrics, follow-up generation — needed parallel versions in English and Korean. Text-to-speech required a dual pipeline: browser-native synthesis works well for English at zero cost, but Korean voices in most browsers are unreliable or nonexistent, so Korean TTS routes through OpenAI's cloud voices with gendered voice selection. The PDF export needed CJK font injection to render Korean characters correctly. Each of these was a small problem, but together they touched nearly every layer of the stack.
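A sketch of the TTS routing, assuming a hypothetical `/api/tts` route that proxies OpenAI's speech endpoint for Korean; the voice mapping is illustrative:

```typescript
// Sketch of the dual TTS pipeline: browser-native synthesis for English,
// a server-proxied OpenAI voice for Korean. The /api/tts route is an assumption.
export async function speakQuestion(
  text: string,
  locale: "en" | "ko",
  voiceGender: "male" | "female" = "female"
): Promise<void> {
  if (locale === "en") {
    // Free and reliable enough for English in most browsers.
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.lang = "en-US";
    window.speechSynthesis.speak(utterance);
    return;
  }

  // Korean: browser voices are unreliable or missing, so route through the cloud.
  const response = await fetch("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Illustrative gender-to-voice mapping using OpenAI's stock voice names.
    body: JSON.stringify({ text, voice: voiceGender === "male" ? "onyx" : "nova" }),
  });
  const audio = new Audio(URL.createObjectURL(await response.blob()));
  await audio.play();
}
```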
Where It Stands
InterviewPilot runs end-to-end today: create an interview, execute it with speech and video, receive difficulty-calibrated AI feedback with multi-turn follow-ups, and export a polished PDF report. The architecture leans on Next.js server actions for all AI and database mutations, Drizzle ORM with PostgreSQL for type-safe data access, Clerk for authentication with defense-in-depth checks, and Supabase Storage for video hosting. Every layer is stateless and independently evolvable — swapping the AI provider, adding new competency dimensions, or extending the speech pipeline doesn't cascade into rewrites elsewhere. The project sits at the intersection of a real product and an engineering exercise, and building it reinforced a lesson I keep relearning: the hardest problems in user-facing AI are not the AI calls themselves, but the UX choreography around them.
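As an illustration of that shape, here is a minimal server action with the authentication-plus-ownership check pattern described above; the table, schema path, and action name are hypothetical, not lifted from the repo:

```typescript
"use server";

// Sketch of a Next.js server action with defense-in-depth checks,
// assuming Clerk for auth and Drizzle ORM over PostgreSQL.
import { auth } from "@clerk/nextjs/server";
import { drizzle } from "drizzle-orm/node-postgres";
import { and, eq } from "drizzle-orm";
import { Pool } from "pg";
import { interviews } from "@/db/schema"; // hypothetical Drizzle table definition

const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

export async function deleteInterview(interviewId: string): Promise<void> {
  // Check 1: the caller must be signed in.
  const { userId } = await auth();
  if (!userId) throw new Error("Unauthorized");

  // Check 2: the row must belong to the caller; never trust the id alone.
  await db
    .delete(interviews)
    .where(and(eq(interviews.id, interviewId), eq(interviews.userId, userId)));
}
```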