      Published by David on November 26, 2025

      MSc. Thesis – Multimodal Video Skimming in Low Production Educational Environments

      ♢
      Summary

      I created the MMVSkim architecture, a lightweight ML system that automatically trims educational videos by detecting the most important moments using both audio and visual cues.

      It generates accurate transcripts, scores each sentence for relevance, and removes filler sections, making long lecture videos shorter while keeping the key learning content intact.

      This helps educators save time on editing and deliver clearer, more focused video lessons.

      Skills

      Machine Learning • MSc. Thesis • Video Skimming • Video Processing

      Key Highlights

      Abstract

      The explosive growth of educational video content has created both opportunities and challenges for educators and learners. While educational videos have become a cornerstone of modern education, particularly in asynchronous environments, the time-intensive nature of video editing and the cognitive load of processing lengthy content remain significant barriers. This thesis introduces MMVSkim, a novel multimodal video skimming architecture specifically designed for low-production educational environments, where critical information is often conveyed verbally rather than through sophisticated visual production. 

      The MMVSkim architecture employs a dual-metric relevance assessment approach that integrates both textual and visual features. The system leverages advanced speech recognition for accurate transcript generation, deep learning-based text summarization as a reference point for sentence-level relevance evaluation, and shot detection for structural analysis. 

      By assigning weighted importance scores to each sentence based on its semantic significance and visual context, the MMVSkim architecture identifies and preserves essential educational content while eliminating low-information segments such as filler speech, tangential discussions, and extended pauses. Empirical evaluation on the TVSum50 dataset, which is not specifically suited to the educational domain, demonstrates the effectiveness of our approach, which achieves an average AUC-ROC score of 0.6 for lecture-style videos.
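
      As a rough illustration of weighted sentence scoring, the sketch below blends a textual relevance score with a visual score for each sentence. It is not the MMVSkim implementation: the bag-of-words cosine similarity stands in for the deep learning-based summarization comparison, the visual scores are assumed to be precomputed values in [0, 1], and the names score_sentences and text_weight are hypothetical.

```python
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(sentences, reference_summary, visual_scores, text_weight=0.7):
    """Return one combined relevance score per sentence.

    text_weight is a hypothetical parameter: 1.0 uses only the transcript,
    0.0 uses only the (assumed, precomputed) visual score.
    """
    if len(sentences) != len(visual_scores):
        raise ValueError("need exactly one visual score per sentence")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    summary_vec = _bag_of_words(reference_summary)
    combined = []
    for sentence, visual in zip(sentences, visual_scores):
        textual = cosine_similarity(_bag_of_words(sentence), summary_vec)
        combined.append(text_weight * textual + (1.0 - text_weight) * visual)
    return combined


if __name__ == "__main__":
    transcript = [
        "Gradient descent updates the weights using the gradient of the loss.",
        "Um, let me just find my slides.",
    ]
    summary = "Gradient descent updates weights along the loss gradient."
    print(score_sentences(transcript, summary, visual_scores=[0.5, 0.5]))
```

      In this toy transcript the filler sentence shares no words with the summary, so its score comes only from the visual term and it ranks below the content sentence.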

      This performance, while modest in absolute terms, represents meaningful discrimination ability in educational contexts compared to random selection (0.5). Further analysis reveals that MMVSkim’s specialized design favours educational content, performing significantly better on lecture-style videos than on other video categories.
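
      To make the comparison with random selection concrete, here is a minimal sketch of AUC-ROC in its pairwise form: the probability that a randomly chosen important segment receives a higher score than a randomly chosen unimportant one. The binary labels and scores are made up for illustration, and how the thesis derived labels from the dataset annotations is not stated here, so treat that part as an assumption.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for an important segment, 0 otherwise (assumed binarization).
    scores: predicted relevance, higher meaning more important.
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(auc_roc(labels, [0.9, 0.8, 0.2, 0.1]))  # perfect ranking
    print(auc_roc(labels, [0.5, 0.5, 0.5, 0.5]))  # uninformative scores
```

      A perfect ranking gives 1.0 and a scorer that assigns every segment the same value gives 0.5, which is why a score above 0.5 indicates some ability to separate important from unimportant segments.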
       
      The primary contributions of this research include:
      1. A lightweight multimodal architecture tailored specifically to educational video content
      2. An adaptive weighting mechanism that balances textual and visual features according to content characteristics
      3. A flexible deletion metric that allows for variable compression ratios while maintaining instructional coherence

      By providing both automation and customizable parameters, MMVSkim eliminates the time-consuming task of manual video editing while empowering educators to optimize the framework for diverse educational video styles.
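
      One way to picture a deletion step with a variable compression ratio is the greedy sketch below: given scored, timestamped sentences, it keeps the highest-scoring ones until a target share of the original duration is used, then restores chronological order. This is an assumption about how such a metric could work, not the thesis's actual metric; Segment, select_segments, and keep_ratio are hypothetical names, and the sketch makes no attempt to enforce instructional coherence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    score: float  # relevance, higher means more important

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_segments(segments, keep_ratio):
    """Greedily keep top-scoring segments within keep_ratio of total duration.

    keep_ratio is a hypothetical parameter in (0, 1]; 1.0 keeps everything.
    Segments that would exceed the budget are skipped, and lower-scoring
    segments that still fit may be kept instead.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in (0, 1]")
    budget = keep_ratio * sum(seg.duration for seg in segments)
    kept, used = [], 0.0
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if used + seg.duration <= budget + 1e-9:  # tolerance for float error
            kept.append(seg)
            used += seg.duration
    # Return the surviving segments in playback order.
    return sorted(kept, key=lambda s: s.start)


if __name__ == "__main__":
    lecture = [
        Segment(0, 10, 0.9),
        Segment(10, 15, 0.1),
        Segment(15, 25, 0.6),
        Segment(25, 30, 0.2),
    ]
    for seg in select_segments(lecture, keep_ratio=0.7):
        print(seg.start, seg.end, seg.score)
```

      Lowering keep_ratio produces a shorter cut, which is the kind of adjustable trade-off between length and coverage that a customizable deletion parameter would expose.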

      This contribution to educational technology allows instructors to focus on content creation and student engagement rather than post-production tasks, addressing a critical pain point in increasingly video-centric educational environments.

      Interested in the full paper?

      View Publication
      Project Demo