Large-Scale Video Classification Challenge

27 Oct. 2017, ACM Multimedia, Mountain View, CA, USA

Welcome to the website of the Large-Scale Video Classification Challenge workshop. The workshop and challenge aim to explore new problems and approaches for large-scale video classification with a large number of classes from open-source videos in a realistic setting, based upon an extension of the Fudan-Columbia Video Dataset (FCVID).

This newly collected dataset contains over 8000 hours of video data from YouTube and Flickr, annotated into 500 categories. The categories cover a wide range of popular topics like social events (e.g., “tailgate party”), procedural events (e.g., “making cake”), objects (e.g., “panda”), scenes (e.g., “beach”), etc. Compared with FCVID, new categories are added to enrich the original hierarchy. For example, 76 new categories are added to "cooking", for a total of 93 classes, and 75 new classes are added to "sports". During annotation, multiple labels were considered for each video wherever possible. When labeling a particular category, categories that are not likely to co-occur are filtered out manually, and the remaining labels are considered for annotation.

The following components will be publicly available under this challenge:

  • Training Set: over 62,000 temporally untrimmed videos from 500 classes. We also provide pre-extracted features and frames (1 fps).
  • Validation Set: around 15,000 videos with annotations of classes.
  • Test Set: over 78,000 temporally untrimmed videos with withheld ground truth.

We will evaluate the success of the proposed methods based on mean Average Precision (mAP) across all categories. Participants may either submit a notebook paper that briefly describes their system, or a research paper detailing their approach. Notebook papers submitted before Aug. 15 will be included in the workshop proceedings.
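
The mAP metric mentioned above can be illustrated with a minimal sketch. It assumes the common non-interpolated definition of Average Precision (the mean of precision values at the rank of each positive video) and assumes that a category without positives contributes an AP of 0. The function names are hypothetical, and the official evaluation server may use a different variant, such as truncating the ranked list.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one category: mean of precision@k over the ranks k of the positives."""
    # Rank video indices by descending prediction score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: a category with no positive videos gets AP = 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    scores: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> float:
    """mAP: unweighted mean of per-category AP.

    scores[c][v] is the score of video v for category c;
    labels[c][v] is 1 if video v belongs to category c, else 0.
    """
    aps = [average_precision(s, l) for s, l in zip(scores, labels)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy example: 2 categories, 3 videos.
    toy_scores = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.6]]
    toy_labels = [[1, 0, 1], [0, 1, 1]]
    print(mean_average_precision(toy_scores, toy_labels))
```

Because each video can carry multiple labels, AP is computed independently per category and the categories are averaged with equal weight, so rare classes count as much as frequent ones.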

Awards will be given based on evaluation performance, and the top performers will be invited to give oral presentations at the workshop.

We greatly appreciate Kwai for providing the awards for our challenge.


14:00 – 18:00, Oct. 27, 2017, Room Lovelace, Computer History Museum

14:00 Overview of the data and challenge results
14:10 Keynote: Zero-example multimedia event search by Prof. Chong-Wah Ngo
15:00 Champion: iDST VENUS
15:15 Team: SARI & QINIU
15:30 Tea break
16:00 Team: ECNU
16:15 Team: CUHK
16:30 Team: Sogang University
16:45 Conclusion and Award Ceremony

Invited Talk: Zero-example multimedia event search

Chong-Wah Ngo, City University of Hong Kong

Unlike in supervised learning, the queries in zero-example search come with no visual training examples. The queries are described in text, with a few keywords or a paragraph. Zero-example search depends heavily on the scale and accuracy of concept classifiers in interpreting the semantic content of videos. The general idea is to annotate and index videos with concepts during offline processing, and then to retrieve videos whose concepts match the query description. Zero-example search dates back to the very beginning of TRECVid in 2003. Since then, search systems have grown from indexing around twenty concepts (high-level features) to today’s more than ten thousand classifiers. The queries have also evolved from finding a specific thing (e.g., find shots of an airplane taking off) to detecting a complex and generic event (e.g., wedding shower), while dataset size has expanded yearly from less than 200 hours to more than 5,000 hours of video. This talk presents an overview of zero-example search for multimedia events. Interesting problems include (i) how to determine the number of concepts for searching multimedia events, (ii) how to identify query-relevant video fragments for feature pooling, and (iii) whether the result of zero-example search can complement supervised learning.


Chong-Wah Ngo is a professor in the Dept. of Computer Science at the City University of Hong Kong. He received his PhD in Computer Science from Hong Kong University of Science & Technology, and his MSc and BSc, both in Computer Engineering, from Nanyang Technological University of Singapore. Before joining City University of Hong Kong, he was a postdoctoral scholar at the Beckman Institute at the University of Illinois at Urbana-Champaign. His main research interests include large-scale multimedia information retrieval, video computing, multimedia mining and visualization. He is the founding leader of the video retrieval group (VIREO) at City University, a research team that releases open-source software, tools and datasets widely used in the multimedia community. He was an associate editor of IEEE Trans. on Multimedia, and has served as guest editor of IEEE MultiMedia, Multimedia Systems, and Multimedia Tools and Applications. Chong-Wah has organized and served as program committee member of numerous international conferences in the area of multimedia. He is on the steering committee of TRECVid and ICMR (Int. Conf. on Multimedia Retrieval). He was conference co-chair of ICMR 2015, and program co-chair of ACM MM 2019, MMM 2018, ICMR 2012, MMM 2012 and PCM 2013. He also served as the chairman of ACM (Hong Kong Chapter) during 2008-2009. He was named an ACM Distinguished Scientist in 2016.


The evaluation server is now online! Please follow the instructions to submit your results. You are strongly advised to get familiar with the submission process using the validation set during the model development phase.

Download links have been sent out.

Please email us (zxwu AT …) with your affiliation information in order to receive the link for downloading frames and pre-extracted features. If you need the original videos (around 3 TB) for training, please attach a scanned copy of the signed Agreement form to the email; we will then send you the download instructions at our discretion.

If you want to cite the challenge, please use the following reference:

   author = "Wu, Zuxuan and Jiang, Y.-G. and Davis, Larry S and Chang, Shih-Fu",
   title = "{LSVC2017}: Large-Scale Video Classification Challenge",
   howpublished = "\url{}",
   Year = {2017}} 

If you find the data useful, please use the following reference:

  title={Exploiting feature and class relationships in video categorization with regularized deep neural networks},
  author={Jiang, Yu-Gang and Wu, Zuxuan and Wang, Jun and Xue, Xiangyang and Chang, Shih-Fu},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},

Important dates

23 July Development kit, training and validation subsets, and test data without ground-truth annotations will be available.
15 August Evaluation server will be available. Notebook submission deadline (optional).
18 September Submission Deadline
20 September Winners Announcement
27 October Workshop Presentation

Paper submission

Submissions may be up to 8 pages long, plus one additional page for references (i.e., the references-only page is not counted toward the 8-page limit), formatted according to the ACM MM 2017 guidelines for regular papers (using the acm-sigconf template, which can be obtained from the ACM proceedings style page).

Submission site: please select the track "Large-Scale Video Classification Challenge".

About external data

External data (except FCVID) can be used to train the algorithms; however, each submission should explicitly cite the data used for model training.


Zuxuan Wu University of Maryland
Yu-Gang Jiang Fudan University
Larry Davis University of Maryland
Shih-Fu Chang Columbia University