Building a Podcast Automation Platform with Ittybit
Part 3 - Video Transcoding & AI Transcription
In the last article, we built the processing pipeline - creating tasks, handling webhooks, tracking progress. But I glossed over something important: we told Ittybit to "transcode this video to 1080p MP4," and I never really explained why I chose those settings, or what other options you might want.
See, here's the thing about video: it's complicated. I mean really complicated. There are codecs (H.264, H.265, VP9), containers (MP4, MOV, MKV), resolutions (720p, 1080p, 4K), frame rates (24fps, 30fps, 60fps), bitrates (constant, variable, adaptive), color spaces... it goes on forever. And every platform has different requirements. YouTube wants one thing. Twitter wants another. Your website wants something else entirely.
When I first started working with video, I spent weeks reading FFmpeg documentation, trying to figure out the right incantation of flags to get a video that looked good but didn't bloat to 10GB. I'd transcode something, upload it, and YouTube would just... re-encode it anyway because I got the profile wrong. Or the audio would be out of sync. Or it would look pixelated on mobile. Video encoding is one of those things where you think you understand it, and then you realize you don't, and then you spend three days debugging why 23.976fps is different from 24fps.
This is exactly the kind of complexity that Ittybit abstracts away. And in this article, we're going to dive deep into video transcoding options and AI-powered transcription. Let's get into it.
Understanding Video Transcoding: Why It Matters
Before we write any code, I want you to understand what transcoding actually does and why you need it.
The problem: Your user records a podcast video in StreamYard. StreamYard gives them a 2.5GB MP4 file at 1920x1080, H.264 codec, 8Mbps bitrate. Sounds great, right?
Wrong. That file is:
- Too big for web streaming - 2.5GB for 60 minutes means long load times
- Not the right bitrate everywhere - 8Mbps matches YouTube's 1080p recommendation, but web embeds want 3Mbps or less
- Possibly wrong container - Some players prefer fragmented MP4
- Unoptimized audio - Might have 256kbps AAC when 192kbps would sound identical
You need to transcode: take the input video and re-encode it with optimal settings for its destination.
The YouTube Optimization Strategy
YouTube is our primary video destination, so let's start there. YouTube has official encoding recommendations, but here's what actually matters in practice:
For 1080p uploads:
- Container: MP4
- Video codec: H.264 (not H.265, despite what you might think)
- Resolution: 1920x1080
- Frame rate: Match source (typically 30fps or 24fps)
- Bitrate: 8Mbps for 30fps, 12Mbps for 60fps
- Audio codec: AAC-LC
- Audio bitrate: 192kbps
- Color space: BT.709
Why H.264 and not the newer H.265 (HEVC)? Because H.265 requires more CPU to decode, and not all devices support it. H.264 is universally supported and YouTube will happily accept it.
Let's update our IttybitService with more sophisticated video transcoding:
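Here's a sketch of what those methods might look like. The payload keys (`video`, `codec`, `maxrate`, and friends) are my assumptions about how the task options are spelled, not the verbatim Ittybit schema, so treat this as a shape to adapt against the API docs:

```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class IttybitService
{
    // NOTE: the payload keys below are illustrative assumptions about
    // Ittybit's task schema - verify the exact field names against the
    // current API reference before shipping.
    public function createYouTubeVideoTask(string $fileUrl, array $source): array
    {
        // Match the source frame rate - never convert 24fps footage to 30fps.
        $fps = $source['fps'] ?? 30;

        // Never upscale: a 720p source stays 720p.
        $height = min($source['height'] ?? 1080, 1080);
        $width  = intdiv($height * 16, 9);

        // YouTube's 1080p guidance: ~8Mbps at 30fps, ~12Mbps at 60fps.
        $bitrate = $fps > 30 ? '12M' : '8M';

        return $this->createTask([
            'url'    => $fileUrl,
            'kind'   => 'video',
            'format' => 'mp4',
            'video'  => [
                'codec'   => 'h264',
                'profile' => 'high',   // maximum-quality H.264 profile
                'width'   => $width,
                'height'  => $height,
                'fps'     => $fps,
                'bitrate' => $bitrate,
                'maxrate' => $bitrate, // cap bitrate peaks...
                'bufsize' => '16M',    // ...for steadier quality control
            ],
            'audio'  => ['codec' => 'aac', 'bitrate' => '192k'],
        ]);
    }

    public function createWebVideoTask(string $fileUrl): array
    {
        return $this->createTask([
            'url'    => $fileUrl,
            'kind'   => 'video',
            'format' => 'mp4',
            'video'  => [
                'codec'   => 'h264',
                'width'   => 1280,
                'height'  => 720,          // 720p is plenty for embeds
                'bitrate' => '3M',         // web viewers feel every extra megabyte
                'preset'  => 'medium',     // faster encode than 'slow'
                'flags'   => '+faststart', // metadata first = progressive playback
            ],
            'audio'  => ['codec' => 'aac', 'bitrate' => '128k'],
        ]);
    }

    public function createMultiQualityVideoTasks(string $fileUrl): array
    {
        // One task per rendition, for adaptive-streaming setups.
        $renditions = [
            ['width' => 1920, 'height' => 1080, 'bitrate' => '5M'],
            ['width' => 1280, 'height' => 720,  'bitrate' => '3M'],
            ['width' => 640,  'height' => 360,  'bitrate' => '1M'],
        ];

        return array_map(fn (array $r) => $this->createTask([
            'url'    => $fileUrl,
            'kind'   => 'video',
            'format' => 'mp4',
            'video'  => ['codec' => 'h264'] + $r,
        ]), $renditions);
    }

    protected function createTask(array $payload): array
    {
        return Http::withToken(config('services.ittybit.key'))
            ->post('https://api.ittybit.com/tasks', $payload)
            ->throw()
            ->json();
    }
}
```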
Let me break down what's happening in these methods:
The createYouTubeVideoTask() method - This is YouTube-specific. Notice how we:
- Check the source frame rate and match it (you don't want to convert 24fps to 30fps)
- Adjust bitrate based on frame rate (60fps needs more bits)
- Use the 'high' H.264 profile for maximum quality
- Set maxrate and bufsize for better quality control
The createWebVideoTask() method - For website embedding, we optimize differently:
- Lower resolution (720p is fine for most websites)
- The `movflags: '+faststart'` is crucial - it moves metadata to the beginning of the file so videos can start playing before fully downloading
- Lower bitrate (web viewers are more sensitive to load times)
- Faster encoding preset (medium vs slow)
The createMultiQualityVideoTasks() method - This is for advanced use cases. If you're building a video platform with adaptive streaming, you'd generate multiple quality versions. A user on mobile gets 360p. A user on desktop WiFi gets 1080p. But for our podcast automation platform, we probably don't need this. I'm showing it to illustrate the flexibility.
Updating ProcessEpisodeJob: Smarter Video Handling
Now let's update our processing job to use these new methods and handle different video scenarios:
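A sketch of the updated job. `getFileInfo()` is a hypothetical helper wrapping whatever file-metadata endpoint you use, and the `Episode` fields and task-tracking relation follow the models assumed in the last article:

```php
<?php

namespace App\Jobs;

use App\Models\Episode;
use App\Services\IttybitService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessEpisodeJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public Episode $episode)
    {
    }

    public function handle(IttybitService $ittybit): void
    {
        // Fetch source metadata first so we can make smarter decisions:
        // match frame rate, never upscale. getFileInfo() is a hypothetical
        // helper around Ittybit's file-metadata endpoint.
        $source = $ittybit->getFileInfo($this->episode->source_file_id);

        $youtubeTask = $ittybit->createYouTubeVideoTask($this->episode->source_url, $source);
        $webTask     = $ittybit->createWebVideoTask($this->episode->source_url);

        $this->episode->tasks()->createMany([
            ['ittybit_task_id' => $youtubeTask['id'], 'type' => 'youtube_video'],
            ['ittybit_task_id' => $webTask['id'],     'type' => 'web_video'],
        ]);
    }
}
```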
The key change here is that we're now fetching the source file information before creating tasks. This lets us make smarter decisions. For example, if the source is 720p, we don't upscale to 1080p. If it's 60fps, we match that frame rate.
AI-Powered Transcription: Speech to Text
Now let's talk about transcription. This is where things get really cool.
Ittybit uses advanced speech recognition models (think Whisper-level quality) to generate transcripts. But transcripts aren't just a wall of text - they come with timestamps, speaker detection, and word-level confidence scores.
Here's what a transcript task can give you:
WebVTT format (subtitles):
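For example, a couple of cues in standard WebVTT (the cue text here is invented):

```
WEBVTT

00:00:01.000 --> 00:00:04.500
Welcome back to the show. Today we're
talking about video encoding.

00:00:04.500 --> 00:00:08.200
And why it's so much harder than it looks.
```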
JSON format (structured data):
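The exact schema is Ittybit's to define - treat the field names below as an educated guess at the shape (segments with timestamps, speakers, and per-word confidence):

```json
{
  "language": "en",
  "segments": [
    {
      "start": 1.0,
      "end": 4.5,
      "speaker": "speaker_1",
      "text": "Welcome back to the show. Today we're talking about video encoding.",
      "words": [
        { "word": "Welcome", "start": 1.0, "end": 1.3, "confidence": 0.99 }
      ]
    }
  ]
}
```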
Let's enhance our transcript task creation:
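Something along these lines - again, the `kind` and option names are assumptions about the task schema, not verbatim API fields:

```php
// In App\Services\IttybitService. 'speech', 'speakers', and the format
// values are assumptions about the task schema - check the API docs.
public function createTranscriptTasks(string $fileUrl, array $formats = ['vtt', 'json']): array
{
    // One task per requested output format.
    return array_map(fn (string $format) => $this->createTask([
        'url'      => $fileUrl,
        'kind'     => 'speech',  // assumed name for the transcription task
        'format'   => $format,   // 'vtt', 'srt', 'json', or 'txt'
        'speakers' => true,      // turn on speaker detection
    ]), $formats);
}
```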
Why multiple formats?
- VTT/SRT: For video subtitles (YouTube, your website)
- JSON: For your application (searchable, parseable)
- TXT: For blog posts, show notes, AI processing
In practice, I usually request VTT and JSON. VTT for YouTube (which accepts WebVTT subtitles), JSON for everything else.
Parsing and Using Transcripts
When the transcript task completes, we receive a file. Let's build a service to parse and use it:
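Here's a minimal sketch. It only handles plain WebVTT with `HH:MM:SS.mmm` cue timings (no styling or voice tags), which is enough for our pipeline:

```php
<?php

namespace App\Services;

class TranscriptService
{
    /**
     * Parse a WebVTT string into [['start' => float, 'end' => float, 'text' => string], ...].
     */
    public function parseVtt(string $vtt): array
    {
        $cues = [];

        // Cue blocks are separated by blank lines; the first block
        // is the WEBVTT header and won't match the timing regex.
        foreach (preg_split('/\R\R+/', trim($vtt)) as $block) {
            if (preg_match(
                '/(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\R(.+)/s',
                $block,
                $m
            )) {
                $cues[] = [
                    'start' => $this->toSeconds($m[1]),
                    'end'   => $this->toSeconds($m[2]),
                    'text'  => trim(preg_replace('/\R/', ' ', $m[3])),
                ];
            }
        }

        return $cues;
    }

    /** Build simple show notes: one timestamped line per cue. */
    public function toShowNotes(array $cues): string
    {
        $lines = array_map(
            fn (array $cue) => sprintf('[%s] %s', gmdate('H:i:s', (int) $cue['start']), $cue['text']),
            $cues
        );

        return implode("\n", $lines);
    }

    protected function toSeconds(string $timestamp): float
    {
        [$h, $m, $s] = explode(':', $timestamp);

        return ($h * 3600) + ($m * 60) + (float) $s;
    }
}
```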
Now let's update our CompleteEpisodeProcessingJob to use this service:
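A sketch of the relevant part of the job. `transcriptFileUrl()` is a hypothetical accessor for the output URL our webhook handler stored on the task record:

```php
// Inside CompleteEpisodeProcessingJob. transcriptFileUrl() is a hypothetical
// accessor for the file URL the webhook handler stored on the task record.
public function handle(TranscriptService $transcripts): void
{
    $vtt = \Illuminate\Support\Facades\Http::get(
        $this->episode->transcriptFileUrl('vtt')
    )->body();

    $cues = $transcripts->parseVtt($vtt);

    $this->episode->update([
        'transcript' => $cues,                            // JSON column
        'show_notes' => $transcripts->toShowNotes($cues),
        'status'     => 'processed',
    ]);
}
```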
Advanced Feature: Chapter Detection
Here's something really cool: Ittybit can automatically detect chapters in your video using scene detection and content analysis. This is perfect for podcasts where topics change throughout the episode.
Let's add chapter detection to our pipeline:
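A sketch of the task creation. Whether chapter detection is its own task kind or an option on another task is an assumption on my part - check the docs for how it's actually expressed:

```php
// In App\Services\IttybitService. The 'chapters' kind is an assumption
// about how the chapter-detection task is expressed in the API.
public function createChapterDetectionTask(string $fileUrl): array
{
    return $this->createTask([
        'url'    => $fileUrl,
        'kind'   => 'chapters',
        'format' => 'json',
    ]);
}
```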
Update the ProcessEpisodeJob to include chapter detection:
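Just a few more lines alongside the video and transcript tasks:

```php
// Added inside ProcessEpisodeJob::handle(), next to the other tasks.
$chapterTask = $ittybit->createChapterDetectionTask($this->episode->source_url);

$this->episode->tasks()->create([
    'ittybit_task_id' => $chapterTask['id'],
    'type'            => 'chapters',
]);
```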
Don't forget to update your migration to include the new task type:
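If your tasks table stores `type` as a plain string, no change is needed. If you used an enum (as I'm assuming here), something like this widens it - note the raw statement is MySQL-specific:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        // MySQL-specific: widen the assumed enum to accept the new task type.
        DB::statement("ALTER TABLE episode_tasks MODIFY type ENUM('youtube_video', 'web_video', 'transcript', 'chapters') NOT NULL");
    }

    public function down(): void
    {
        DB::statement("ALTER TABLE episode_tasks MODIFY type ENUM('youtube_video', 'web_video', 'transcript') NOT NULL");
    }
};
```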
When the chapter detection task completes, you'll get a JSON file with chapters:
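Shaped something like this - timestamps in seconds, titles generated from the content (all values here invented for illustration):

```json
{
  "chapters": [
    { "start": 0,    "end": 312,  "title": "Intro and catching up" },
    { "start": 312,  "end": 1480, "title": "Why video encoding is hard" },
    { "start": 1480, "end": 2955, "title": "Transcription workflows" },
    { "start": 2955, "end": 3600, "title": "Wrap-up and next steps" }
  ]
}
```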
These chapters are perfect for:
- YouTube chapter markers
- Podcast apps that support chapters (Overcast, Pocket Casts)
- Your website's episode page
- Allowing users to jump to specific topics
Enhanced Episode Status Endpoint
Let's update our episode show endpoint to include all this rich data:
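A sketch of the controller. `outputUrl()` is a hypothetical model helper that looks up the finished file URL for a given task type; the other columns follow the models we've been assuming:

```php
<?php

namespace App\Http\Controllers;

use App\Models\Episode;
use Illuminate\Http\JsonResponse;

class EpisodeController extends Controller
{
    public function show(Episode $episode): JsonResponse
    {
        // outputUrl() is a hypothetical helper that resolves the
        // finished file URL for a given task type.
        return response()->json([
            'id'         => $episode->id,
            'title'      => $episode->title,
            'status'     => $episode->status,
            'videos'     => [
                'youtube' => $episode->outputUrl('youtube_video'),
                'web'     => $episode->outputUrl('web_video'),
            ],
            'transcript' => $episode->transcript,
            'show_notes' => $episode->show_notes,
            'chapters'   => $episode->chapters,
        ]);
    }
}
```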
Now when you hit GET /episodes/{id}, you get everything:
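With the sketch above, the response would look something like this (all values invented for illustration):

```json
{
  "id": 42,
  "title": "Episode 17: Video Encoding Deep Dive",
  "status": "processed",
  "videos": {
    "youtube": "https://cdn.example.com/episodes/17-youtube.mp4",
    "web": "https://cdn.example.com/episodes/17-web.mp4"
  },
  "transcript": [
    { "start": 1.0, "end": 4.5, "text": "Welcome back to the show..." }
  ],
  "show_notes": "[00:00:01] Welcome back to the show...",
  "chapters": [
    { "start": 0, "end": 312, "title": "Intro and catching up" }
  ]
}
```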
Beautiful.
The DIY Reality Check
Let me tell you what this would look like if you were doing it yourself:
Video transcoding:
- Set up FFmpeg servers (install, configure, secure)
- Write wrapper scripts for different use cases
- Handle edge cases (corrupt files, unsupported codecs, etc.)
- Implement progress tracking
- Manage compute resources (don't let one job starve others)
- Test on dozens of input formats
- Update when YouTube changes recommendations
Time investment: 2-3 weeks, plus ongoing maintenance.
Speech-to-text:
- Choose a provider (AWS Transcribe, Google Speech-to-Text, Azure, OpenAI Whisper)
- Integrate their API
- Handle audio pre-processing (some services require specific formats)
- Parse their output formats (they're all different)
- Implement retry logic for failures
- Manage costs (some charge per minute)
- Handle language detection
Time investment: 1-2 weeks, plus API cost management.
Chapter detection:
- Scene detection algorithms (comparing frames)
- Content analysis (what's changing?)
- NLP on transcript (topic modeling)
- Heuristics for good chapter breaks
- Testing and tuning
Time investment: 2-4 weeks if you're good. More if you're not.
Total DIY investment: 5-9 weeks of development, ongoing maintenance, infrastructure costs.
With Ittybit: The code we've written today. Maybe 2-3 days of work. And it's production-ready.
The time savings are obvious. But here's the part people often miss: the cognitive load savings. I don't have to think about FFmpeg flags. I don't have to debug why scene detection is failing on certain videos. I don't have to worry about keeping up with YouTube's evolving recommendations. I just tell Ittybit what I want, and it handles the complexity.
What We've Built
Let's take stock:
✅ YouTube-optimized video transcoding - Proper bitrates, codecs, frame rates
✅ Web-optimized video - Smaller, faster, with progressive download
✅ Multi-quality support - Ready for adaptive streaming if needed
✅ AI-powered transcription - Speech-to-text with timestamps
✅ Transcript parsing - VTT to structured data
✅ Show notes generation - Automatic from transcript
✅ Chapter detection - AI-powered topic segmentation
✅ Rich API responses - Everything a client needs
We now have processed videos, pristine audio, accurate transcripts, and even chapters. In the next article, we'll take all this goodness and distribute it to the world: uploading to Transistor and YouTube, handling OAuth, managing metadata, and making sure everything publishes correctly.
What's Next
In Part 4, we'll tackle automated distribution:
- Uploading audio to Transistor (podcast hosting)
- Uploading video to YouTube with chapters and subtitles
- OAuth flows for user authentication
- Handling rate limits and quotas
- Retry strategies for failed uploads
- Notification systems
We're almost there. Our episodes are processed. Now we just need to publish them. See you in the next one.