1: Overview

UQLIPS is a near-duplicate video clip detection and retrieval system. It is designed and implemented to support accurate real-time near-duplicate retrieval from large-scale video databases and accurate online near-duplicate detection from continuous video streams.

With recent advances in video processing technologies in both hardware (e.g., the wide availability of webcams) and software (e.g., video editing software), the amount of video data has grown rapidly in many fields, such as broadcasting, advertising, filming, personal video archives, and scientific video repositories. In addition, as the Internet audience grows rapidly and technologies become more video friendly (e.g., increasing bandwidth and near-ubiquitous Internet access), video publishing and sharing in various forms, such as social networking websites, blogs, IPTV, and mobile TV, have become popular. The mainstream media is moving to the Web, which is giving rise to many novel applications that produce and consume videos in creative ways.

The open nature of online video publishing and sharing, together with its huge popularity, gives rise to a sizeable percentage of near-duplicate videos, which have tolerable content variations introduced during video acquisition, transformation, and editing. This phenomenon places heavy demands on near-duplicate detection and retrieval in many applications, such as online copyright enforcement, broadcast monitoring/filtering, content tracking/management, and Web search result cleansing. For example, online near-duplicate detection can flag possible copyright infringement immediately, while the video is streaming, for intellectual property protection. In the broadcast domain, advertising clients can monitor TV channels or Web streaming videos to check whether their commercials are actually broadcast as contracted, in the right time slots and with the right frequencies. Offending content can also be filtered based on near-duplicate detection.

2: Definitions

2.1 Near-duplicate vs. Copy

We define a near-duplicate video clip as follows: a video Vi is a near-duplicate of another video Vj if the two have nearly the same content but appear different due to various changes introduced during acquisition, transformation, and editing.

The definition of a near-duplicate video is not completely unambiguous, and the judgement can be subjective. For example, the decision on two videos taken at the same venue but at different times could be ambiguous, since the background and foreground scenes could change greatly. It is also possible that, after substantial transformation and editing, a video becomes more visually similar to a non-duplicate than to its near-duplicate. However, in many applications, such as TV commercials and news videos, the ambiguity can be largely reduced. For example, TV commercials for the same product (or breaking news on the same event) often show strong visual similarity. Figure 1 shows two different versions of a TV commercial, in which the foregrounds differ at acquisition and the contact numbers are also different. They are near-duplicates, but not copies.

In practice, there is always a subjective limit to the tolerable difference between two video clips. The content difference between two near-duplicate videos can arise for various reasons, which can be grouped into the following categories:

(1) Acquisition with different camera settings, viewpoint, lighting conditions, background, foreground, etc.
(2) Transformation at sequence level with different encoding parameters, such as frame format, frame rate, resize, shift, flip, crop, color (contrast, brightness, saturation, etc.), and effects (blur, sharpen, etc.).
(3) Editing at frame level with insertion, deletion, swap, and content modification.
(4) A mixture of (1), (2), and (3).

Category (1) introduces near-duplicates that are different acquisitions or takes. In Category (2), the transformation typically happens at the video sequence level, i.e., the change is applied to the whole video. For example, a different encoding scheme is usually applied uniformly across the sequence. Category (3) usually occurs at the frame level, e.g., deleting some frames from a TV commercial to fit a required time slot, or modifying the contact number on the relevant frames so that the same commercial can be broadcast in other countries. Category (4) combines all three categories and can produce very large and diverse collections of near-duplicates.
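The difference between a sequence-level transformation and a frame-level edit can be illustrated with a minimal sketch. It assumes a toy video model (a list of frames, each a flat list of 8-bit intensities); the function names are hypothetical and are not part of UQLIPS.

```python
from typing import List, Set

Frame = List[int]    # toy frame: a flat list of 8-bit pixel intensities
Video = List[Frame]  # toy video: an ordered list of frames


def adjust_brightness(video: Video, delta: int) -> Video:
    """Category (2): a sequence-level change applied uniformly to every frame."""
    return [[min(255, max(0, p + delta)) for p in frame] for frame in video]


def delete_frames(video: Video, drop: Set[int]) -> Video:
    """Category (3): a frame-level edit that removes only the selected frames."""
    return [frame for i, frame in enumerate(video) if i not in drop]


def mixed_variant(video: Video, delta: int, drop: Set[int]) -> Video:
    """Category (4): a mixture, here frame deletion followed by a brightness shift."""
    return adjust_brightness(delete_frames(video, drop), delta)


original = [[10, 20], [30, 40], [50, 60], [250, 255]]
brighter = adjust_brightness(original, 10)  # same length, every frame changed
shorter = delete_frames(original, {1})      # one frame fewer, others untouched
mixed = mixed_variant(original, 10, {1})    # both kinds of variation at once
```

Under these assumptions, the sequence-level variant keeps the frame count but alters every frame, while the frame-level variant changes the length and leaves the surviving frames intact, so a detector has to tolerate both kinds of variation.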

By definition, copies must be transformed from the same original. Near-duplicate is a more general concept and a superset of copy: in addition to standard video transformations (format, resize, shift, crop, etc.), it covers the greater diversity of variations introduced at capture time (viewpoint and setting, background, foreground, etc.) and by editing operations (frame insertion, deletion, swap, and modification). For example, breaking news videos of the same scene and event taken by different broadcasting corporations are not copies, but they are near-duplicates without much controversy. Furthermore, existing work mainly focuses on transformations at the video level, i.e., category (2). The effect of frame-level editing operations (e.g., frame deletion to produce a shorter version of a TV commercial or news item) has not been tested. As a result, we regard copies as a subset of near-duplicates. We will test our detection methods on various near-duplicates with a much greater diversity of variations.

2.2 Detection vs. Retrieval

Near-duplicate retrieval aims to retrieve results from an existing database that can be pre-processed and indexed, such as the YouTube database. Detection, in contrast, aims to detect near-duplicate subsequences on the fly. Its main distinctive characteristics include, but are not limited to, the following. First, online video streams usually arrive continuously and are not maintained in the database; only the detected subsequences are kept for future analysis. Retrieval, by contrast, is performed on an existing database. Second, online detection requires video signatures to be generated on the fly and compared in real time over continuous video streams, whereas all video signatures in a database can be pre-generated and indexed for retrieval. Third, detection is a binary problem with an answer of 1 or 0 (i.e., a near-duplicate is identified if the similarity exceeds a predetermined tolerance value), whereas retrieval often involves ranking to return the top-K results. Clearly, online detection places more emphasis on sequence matching, where compact signatures, efficient similarity measures, and pruning techniques play the most important roles.
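The third distinction, a binary decision versus a ranked top-K list, can be shown with a small sketch. The similarity scores, clip names, and threshold below are made-up illustrative values, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def detect(similarity: float, threshold: float) -> int:
    """Binary detection: 1 if the similarity exceeds the tolerance value, else 0."""
    return 1 if similarity > threshold else 0


def retrieve_top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Ranked retrieval: the k database clips with the highest similarity."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical similarities between one query clip and four candidate clips.
scores = {"clip_a": 0.92, "clip_b": 0.40, "clip_c": 0.75, "clip_d": 0.10}

decisions = {name: detect(s, threshold=0.7) for name, s in scores.items()}
top2 = retrieve_top_k(scores, k=2)
```

Detection needs only a per-candidate comparison against the threshold, which is why it can run over a stream without an index, whereas retrieval needs the scores of all candidates (or an index that prunes them) before it can rank.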

3: UQLIPS system features

UQLIPS has 3 major functionalities: Upload Search, Query-by-clip Search, and Online Detection.


Fig 2: UQLIPS system.

3.1 Upload Search: Users can upload query video clips to the system and search the database for their near-duplicates. The features and signatures of a query video are generated while it is being uploaded, and the search follows. Ranked results are displayed page by page, and users can compare two videos by clicking on them to view the original videos (Fig 3). Typically, upload time and feature extraction time depend on the network bandwidth and on the video length and resolution. Searching the database itself takes only milliseconds, i.e., it runs in real time.

3.2 Query-by-clip Search: Users can click on sample query videos, randomly selected from the database, to find their near-duplicates in the database. Video features and signatures are pre-processed, and the search is performed by comparing signatures. Ranked results are displayed page by page, and users can compare two videos by clicking on them to view the original videos (Fig 4). Searching the database takes only milliseconds, i.e., it runs in real time.

3.3 Online Detection: Users can upload query videos to the system and provide video streams; the system then detects near-duplicates of the query videos in the streams while they are being streamed continuously. Near-duplicate candidates for each query video are returned and can be compared (Fig 5). The system currently supports online detection of multiple query videos over a single stream.
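Online detection of several queries over one stream can be sketched with a sliding window. This is only an illustration under strong assumptions: each frame is reduced to a single feature value in [0, 1], and the similarity is a toy measure (one minus the mean absolute difference), not the Video Distance Trajectory signature that UQLIPS uses. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple


def window_similarity(window: List[float], query: List[float]) -> float:
    """Toy similarity in [0, 1]: 1 minus the mean absolute feature difference."""
    diff = sum(abs(w - q) for w, q in zip(window, query)) / len(query)
    return max(0.0, 1.0 - diff)


def detect_stream(
    stream: Iterable[float],
    queries: Dict[str, List[float]],
    threshold: float,
) -> Iterator[Tuple[str, int]]:
    """Yield (query name, start position) for each window exceeding the threshold."""
    longest = max(len(q) for q in queries.values())
    buffer: Deque[float] = deque(maxlen=longest)  # only recent frames are kept
    for position, feature in enumerate(stream):
        buffer.append(feature)
        for name, query in queries.items():
            if len(buffer) < len(query):
                continue  # not enough frames seen yet for this query
            window = list(buffer)[-len(query):]
            if window_similarity(window, query) > threshold:
                yield name, position - len(query) + 1


# Hypothetical per-frame features for one stream and two query clips.
stream = [0.1, 0.9, 0.8, 0.2, 0.5, 0.52, 0.48, 0.1]
queries = {"ad_1": [0.9, 0.8], "ad_2": [0.5, 0.5, 0.5]}
hits = list(detect_stream(stream, queries, threshold=0.95))
```

The buffer holds only as many frames as the longest query, which mirrors the point that the stream itself is not stored; only the reported subsequences would be kept for later analysis.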

Fig 3: Upload search.

Fig 4: Query-by-clip search.

Fig 5: Online detection.

4: Further notes

(1) The system has been tested in both IE and Firefox. The interface shown above is an old version; users can experience the new version by trying the system.
(2) The test database in this version contains thousands of TV commercials and many YouTube clips, ranging from a few seconds to a few minutes in length.
(3) The video signatures used for detection (named Video Distance Trajectory) and for retrieval (named Bounded Coordinate System) are different, but both are extracted from the frames' color histogram features. Other feature spaces can also be used.
(4) Recently, new features, such as online video editing and diverse views, have been added to the system.
(5) The technologies behind this system have been protected by US patents.
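As an illustration of note (3), the sketch below computes a normalized color histogram for one frame, the kind of per-frame feature from which both signatures are said to be extracted. It does not implement Video Distance Trajectory or Bounded Coordinate System. The quantization scheme (a joint RGB histogram with a configurable number of bins per channel) and the histogram-intersection comparison are assumptions chosen for simplicity, and the names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(frame: List[Pixel], bins_per_channel: int = 2) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel ** 3 bins."""
    if not frame:
        raise ValueError("frame must contain at least one pixel")
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in frame:
        # Map each 8-bit channel value to a bin index in 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1.0
    return [count / len(frame) for count in hist]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# A tiny 4-pixel frame: two reddish pixels, one blue pixel, one black pixel.
frame = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 0)]
hist = color_histogram(frame)
self_similarity = histogram_intersection(hist, hist)
```

Because the histogram discards pixel positions, it is fairly tolerant of small shifts and crops, which is one plausible reason for building signatures on it; a clip-level signature would then summarize the sequence of such per-frame vectors.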

5: References

1: Zi Huang, Heng Tao Shen, Jie Shao, Xiaofang Zhou and Bin Cui. "Bounded Coordinate System Indexing for Real-time Video Clip Search". ACM Transactions on Information Systems (TOIS), 2009.
2: Heng Tao Shen, Jie Shao, Zi Huang and Xiaofang Zhou. "Effective and Efficient Query Processing for Video Subsequence Identification". IEEE Transactions on Knowledge and Data Engineering (TKDE), 2009.
3: Jie Shao, Zi Huang, Heng Tao Shen, Xiaofang Zhou, Ee-Peng Lim and Yijun Li. "Batch Nearest Neighbor Search for Video Retrieval". IEEE Transactions on Multimedia (TMM), 10(3):409-420, 2008.
4: Jie Shao, Zi Huang, Heng Tao Shen, Jialie Shen and Xiaofang Zhou. "Distribution-based Similarity Measures for Multi-dimensional Point Set Retrieval Applications". In Proceedings of the 16th ACM International Conference on Multimedia (ACM MM), 2008.
5: Heng Tao Shen, Xiaofang Zhou, Zi Huang and Jie Shao. "Statistical Summarization of Content Features for Fast Near-duplicate Video Detection". In Proceedings of the 15th ACM International Conference on Multimedia (ACM MM), pages 164-165, 2007. (Demo)
6: Heng Tao Shen, Xiaofang Zhou, Zi Huang, Jie Shao and Emily Zhou. "UQLIPS: A Real-time Near-duplicate Video Clip Detection System". In Proceedings of the 33rd VLDB, pages 849-850, 2007. (Demo)
7: Jie Shao, Zi Huang, Heng Tao Shen, Xiaofang Zhou and Yijun Li. "Dynamic Batch Nearest Neighbour Search in Video Retrieval". In Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE), pages 1395-1399, 2007. (Poster)
8: Heng Tao Shen, Beng Chin Ooi, Xiaofang Zhou and Zi Huang. "Towards Effective Indexing for Very Large Video Sequence Database". In Proceedings of the 24th ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 730-741, 2005.

Experience UQLIPS now!