Contents are licensed under CC BY 4.0 if not specified.

Artificial Intelligence

Last updated 1 month ago

CVSA's automated workflow relies heavily on artificial intelligence for information extraction and classification.

The AI systems we currently use are:

The Filter (codename Akari)

Located at /ml/filter/ under the project root directory, it classifies each video in category 30 into one of the following classes:

  • 0: Not related to Chinese vocal synthesis

  • 1: An original song with Chinese vocal synthesis

  • 2: A cover/remix song with Chinese vocal synthesis
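
The label scheme above can be written down as a small enum. This is only a minimal sketch: the names `VideoLabel` and `label_from_scores` are hypothetical and not taken from the CVSA code base, and it assumes the classifier emits one score per label.

```python
from enum import IntEnum
from typing import Sequence


class VideoLabel(IntEnum):
    """Hypothetical names for the three labels listed above."""

    NOT_RELATED = 0     # not related to Chinese vocal synthesis
    ORIGINAL = 1        # original song with Chinese vocal synthesis
    COVER_OR_REMIX = 2  # cover/remix song with Chinese vocal synthesis


def label_from_scores(scores: Sequence[float]) -> VideoLabel:
    """Pick the label with the highest score (assumes one score per label)."""
    if len(scores) != len(VideoLabel):
        raise ValueError(f"expected {len(VideoLabel)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return VideoLabel(best)
```

For example, `label_from_scores([0.1, 0.7, 0.2])` yields `VideoLabel.ORIGINAL`. Using an `IntEnum` keeps the stored integer values (0, 1, 2) interchangeable with the named labels.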

We also have some experimental work that is not yet in production:

The Predictor

Located at /ml/pred/ under the project root directory, it predicts a video's future view count. It is a regression model whose input features are the video's historical view trend, other contextual information (such as the current time), and the future time point to predict. Its output is the increment in the video's view count from "now" to the specified future time point.
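
To make the input/output shape of such a model concrete, here is a minimal sketch of how one training example might be assembled. The feature layout, the `Snapshot` type, and the function names `build_features` and `view_increment` are all assumptions made for illustration; the actual features used in /ml/pred/ may differ.

```python
from datetime import datetime
from typing import List, Tuple

# (time of observation, cumulative view count); a hypothetical record type
Snapshot = Tuple[datetime, int]


def build_features(history: List[Snapshot], now: datetime, target: datetime) -> List[float]:
    """Assemble one illustrative feature vector; the layout is an assumption."""
    if target <= now:
        raise ValueError("target time must be after now")
    past = sorted(s for s in history if s[0] <= now)
    if not past:
        raise ValueError("need at least one snapshot at or before now")
    first_time, first_views = past[0]
    latest_time, latest_views = past[-1]
    span_hours = (latest_time - first_time).total_seconds() / 3600
    # Historical trend: average growth over the observed history (views per hour).
    rate = (latest_views - first_views) / span_hours if span_hours > 0 else 0.0
    # Future time point, expressed as hours from now.
    horizon_hours = (target - now).total_seconds() / 3600
    # Contextual information: hour of day and day of week of the current time.
    return [float(latest_views), rate, float(now.hour), float(now.weekday()), horizon_hours]


def view_increment(views_now: int, views_at_target: int) -> int:
    """Regression target: growth in views from 'now' to the future time point."""
    return views_at_target - views_now
```

Predicting the increment rather than the absolute count keeps the target on a comparable scale for videos with very different totals, which is one plausible reason for the design described above.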

Lyrics Alignment

Located at /ml/lab/ under the project root directory, it uses MMS wav2vec and Whisper models for phoneme-level and line-level alignment, respectively. This work was originally intended to drive the live lyrics feature in our other project, AquaVox.