
Sapphire: An NLP-based YouTube video scoring model
  • Srujan Murthy

Abstract

This whitepaper introduces Sapphire, a novel ranking model that evaluates YouTube videos by analyzing the full text of their transcripts. By applying regex operations and identifying the most distinctive and significant keywords throughout the video content, Sapphire takes a more analytical approach to evaluation that accounts for the relative importance of individual terms. Its primary objective is to rank transcripts by their rigor, independently of video viewership, in contrast to the conventional viewership-based approach of the YouTube Watch Time algorithm[1]. Sapphire also applies unique-identifier keyword weighting strategies to the transcriptions. This paper details Sapphire's key components: YouTube transcription, text preprocessing, Term Frequency-Inverse Document Frequency (TF-IDF) evaluators, and score assessment.
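
To illustrate the TF-IDF component named in the abstract, the following is a minimal sketch, not Sapphire's actual implementation. The function names `tokenize` and `tfidf_scores`, the tokenizing regex, the unsmoothed logarithmic IDF, and the choice to sum per-term TF-IDF weights into a single transcript score are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical preprocessing step: lowercase, then extract words via regex."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(transcripts):
    """Return one score per transcript: the sum of its terms' TF-IDF weights.

    Assumed scoring rule for illustration only; the paper may combine
    term weights differently.
    """
    docs = [tokenize(t) for t in transcripts]
    n_docs = len(docs)

    # Document frequency: number of transcripts that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)  # an empty transcript carries no information
            continue
        counts = Counter(doc)
        total = len(doc)
        # TF = relative frequency in this transcript; IDF = log(N / df).
        score = sum(
            (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        )
        scores.append(score)
    return scores


if __name__ == "__main__":
    sample = [
        "the cat sat",
        "the dog sat",
        "the gradient descent optimizer converges",
    ]
    for text, score in zip(sample, tfidf_scores(sample)):
        print(f"{score:.3f}  {text}")
```

Under this sketch, a term that appears in every transcript has an IDF of zero and contributes nothing, so a transcript dominated by distinctive vocabulary scores higher than one made up of common words, regardless of how many views the video has.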
05 Jun 2024: Submitted to Advance
13 Jun 2024: Published in Advance