How I Created a Smart Video Clip Extractor

Travel and daily-life vlogs are popular among app users: such videos are expressive, capturing the most attractive moments of a journey or a day. Creating one, however, first requires a great deal of editing work to cut the trivial and meaningless segments out of the original footage, which used to be the preserve of video editing pros.

This is no longer the case. We now have an array of intelligent mobile apps that can automatically extract the highlights from a video, letting us focus on spicing the video up with, say, special effects. I opted to use the highlight capability from Video Editor Kit to create my own vlog editor.

How It Works

This capability assesses how appealing each video frame is and then extracts the most suitable ones. To do this, it considers the video properties users care about most, identified through user surveys and experience assessments. On this basis, the highlight capability applies a comprehensive frame assessment scheme that covers several aspects. For example:

Aesthetics evaluation. Frames are scored against a data set built on composition, lighting, color, and more. This is the essential part of the capability.

Tags and facial expressions. Frames containing detected subjects such as people, animals, and laughter are more likely to be extracted.

Frame quality and camera movement mode. The capability discards low-quality frames that are blurry, out of focus, overexposed, or shaky, to ensure such frames do not impact the quality of the finished video.

Impressively, despite all these checks, the highlight capability completes the extraction process in just 2 seconds.

See for yourself how a video finished by the highlight capability compares with the original.

[Demo GIF: the original video compared with the extracted highlight]

Backing Technology

The highlight capability stands out from the crowd by adopting iteratively optimized models and a frame assessment scheme. Technically speaking:

The capability uses AMediaCodec for hardware decoding and the Open Graphics Library (OpenGL) for rendering frames and automatically adapting the frame dimensions to the screen dimensions. Its algorithm runs multiple neural network models: the capability checks which device model it is running on and automatically chooses the NPU, CPU, or GPU accordingly, delivering higher runtime performance.

To provide the extraction result more quickly, the highlight capability uses a two-stage algorithm that goes from sparse sampling to dense sampling, analyzes how content is distributed within the video, and adopts a frame buffer. All of this makes determining the most attractive video frames more efficient. To keep the algorithm performant, the capability adopts thread pool scheduling and a producer-consumer model so that the video decoder and the models can run at the same time, as the sketch below illustrates.
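
To make the producer-consumer idea concrete (this is an illustration, not the kit's internals), here is a minimal Java sketch in which a decoder thread and a model thread communicate through a bounded frame buffer; all class and method names are hypothetical:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FramePipeline {
    // Bounded buffer between the decoder (producer) and the assessment model (consumer).
    private final BlockingQueue<byte[]> frameBuffer = new ArrayBlockingQueue<>(16);
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public void start() {
        // Producer: decodes frames into the buffer; put() blocks when the buffer is full.
        pool.submit(() -> {
            byte[] frame;
            while ((frame = decodeNextFrame()) != null) {
                frameBuffer.put(frame);
            }
            return null;
        });
        // Consumer: scores frames from the buffer; take() blocks when the buffer is empty.
        pool.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                scoreFrame(frameBuffer.take());
            }
            return null;
        });
    }

    private byte[] decodeNextFrame() { return null; }  // stands in for an AMediaCodec-based decoder
    private void scoreFrame(byte[] frame) { }          // stands in for neural network inference
}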

During the sparse sampling stage, the capability decodes and processes up to 15 key frames in a video, with an interval of no less than 2 seconds between key frames. During the dense sampling stage, the algorithm picks out the best key frame and then extracts the frames before and after it to further analyze the highlight of the video.
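
The following simplified sketch illustrates these two stages (at most 15 sparse points at least 2 seconds apart, then denser points around the best key frame); it is an illustration under stated assumptions, not the kit's actual algorithm:

import java.util.ArrayList;
import java.util.List;

public class SamplingSketch {
    // Stage 1: up to 15 sample points, spaced at least 2 seconds apart.
    static List<Long> sparseSample(long videoDurationMs) {
        long intervalMs = Math.max(2000, videoDurationMs / 15);
        List<Long> points = new ArrayList<>();
        for (long t = 0; t < videoDurationMs && points.size() < 15; t += intervalMs) {
            points.add(t);
        }
        return points;
    }

    // Stage 2: denser sample points in a window around the best-scoring key frame.
    static List<Long> denseSample(long bestKeyFrameMs, long windowMs, long stepMs) {
        List<Long> points = new ArrayList<>();
        for (long t = Math.max(0, bestKeyFrameMs - windowMs); t <= bestKeyFrameMs + windowMs; t += stepMs) {
            points.add(t);
        }
        return points;
    }
}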

The extraction result is closely related to the key frame positions. When the sampling points are not dense enough, for example because the video has too few key frames or is too long (over 1 minute), the processing result will not be ideal. For the capability to deliver optimal performance, it is recommended that the input video be shorter than 60 seconds.
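
As a hypothetical pre-check, an app could read the duration with Android's standard MediaMetadataRetriever and warn before running extraction on an overlong video:

import android.media.MediaMetadataRetriever;
import android.util.Log;

// Hypothetical helper: returns true when the video is within the recommended 60-second limit.
static boolean isWithinRecommendedDuration(String filePath) {
    MediaMetadataRetriever retriever = new MediaMetadataRetriever();
    try {
        retriever.setDataSource(filePath);
        String duration = retriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION);
        long durationMs = duration == null ? 0 : Long.parseLong(duration);
        if (durationMs > 60_000) {
            Log.w("HighlightCheck", "Video longer than 60 s; extraction may be less accurate.");
            return false;
        }
        return true;
    } finally {
        retriever.release();
    }
}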

Let's now move on to how this capability can be integrated.

Integration Process

Preparations

Make the necessary preparations before moving on. The required steps are:

i. Configure the app information in AppGallery Connect.

ii. Integrate the SDK of HMS Core.

iii. Configure obfuscation scripts.

iv. Declare necessary permissions (see the sketch after this list).
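
As one illustration of step iv, below is a minimal sketch of requesting storage access at runtime so the app can read the source video. The exact permission set depends on your target API level and the kit version, so treat this as an assumption and check the official documentation:

import android.Manifest;
import android.content.pm.PackageManager;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

// Hypothetical helper inside an Activity. READ_EXTERNAL_STORAGE is assumed here;
// newer Android versions may require READ_MEDIA_VIDEO instead.
private static final int REQUEST_STORAGE = 1;

private void ensureStoragePermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.READ_EXTERNAL_STORAGE)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.READ_EXTERNAL_STORAGE}, REQUEST_STORAGE);
    }
}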

Setting up the Video Editing Project

i. Configure the app authentication information by using either an access token or an API key; a startup sketch follows the two methods below.

  • Method 1: Call setAccessToken to set an access token, which is required only once during app startup.
MediaApplication.getInstance().setAccessToken("your access token");
  • Method 2: Call setApiKey to set an API key, which is required only once during app startup.
MediaApplication.getInstance().setApiKey("your ApiKey");
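
Since each call is needed only once during app startup, a natural home is a custom Application subclass. A minimal sketch, assuming this import path for MediaApplication (verify it against the SDK version you integrated):

import android.app.Application;
import com.huawei.hms.videoeditor.sdk.MediaApplication;  // package name assumed; check your SDK

public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        // Use either the access token or the API key, not both.
        MediaApplication.getInstance().setAccessToken("your access token");
    }
}

Remember to register the class in AndroidManifest.xml through the android:name attribute of the <application> element.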

ii. Set a License ID.

This ID is used to manage the usage quotas of Video Editor Kit and must be unique.

MediaApplication.getInstance().setLicenseId("License ID");
iii. Initialize the runtime environment of HuaweiVideoEditor.

When creating a video editing project, we first need to create an instance of HuaweiVideoEditor and initialize its runtime environment. When exiting the project, the instance must be released.

  • Create an instance of HuaweiVideoEditor.
HuaweiVideoEditor editor = HuaweiVideoEditor.create(getApplicationContext());
  • Determine the layout of the preview area.

This area renders video images, which is implemented by SurfaceView within the fundamental capability SDK. Before the area is created, we need to specify its layout.

<LinearLayout    
    android:id="@+id/video_content_layout"    
    android:layout_width="0dp"    
    android:layout_height="0dp"    
    android:background="@color/video_edit_main_bg_color"    
    android:gravity="center"    
    android:orientation="vertical" />
// Specify a preview area.
LinearLayout mSdkPreviewContainer = view.findViewById(R.id.video_content_layout);

// Set the layout as the editor's preview area.
editor.setDisplay(mSdkPreviewContainer);
  • Initialize the runtime environment. If the license verification fails, LicenseException will be thrown.

After the HuaweiVideoEditor instance is created, it does not yet occupy any system resources. We need to choose the time to initialize its runtime environment manually, at which point the fundamental capability SDK internally creates the necessary threads and timers.

try {
    editor.initEnvironment();
} catch (LicenseException error) {
    SmartLog.e(TAG, "initEnvironment failed: " + error.getErrorMsg());
    finish();
    return;
}

Integrating the Highlight Capability

// Create an object that will be processed by the highlight capability.
HVEVideoSelection hveVideoSelection = new HVEVideoSelection();
// Initialize the engine of the highlight capability.
hveVideoSelection.initVideoSelectionEngine(new HVEAIInitialCallback() {
        @Override
        public void onProgress(int progress) {
            // Callback when the initialization progress is received.
        }

        @Override
        public void onSuccess() {
            // Callback when the initialization is successful.
        }

        @Override
        public void onError(int errorCode, String errorMessage) {
            // Callback when the initialization fails.
        }
});

// After the initialization is successful, extract the highlighted video. filePath indicates the video file path, and duration indicates the desired duration for the highlighted video.
hveVideoSelection.getHighLight(filePath, duration, new HVEVideoSelectionCallback() {
        @Override
        public void onResult(long start) {
            // The highlight is successfully extracted; start indicates where the extracted clip begins.
        }
});

// Release the highlight engine.
hveVideoSelection.releaseVideoSelectionEngine();
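
Putting these pieces together, here is one way to wire the calls up, using only the APIs shown above; filePath and duration are placeholders you supply, and starting extraction inside onSuccess ensures the engine is ready:

final HVEVideoSelection videoSelection = new HVEVideoSelection();
videoSelection.initVideoSelectionEngine(new HVEAIInitialCallback() {
        @Override
        public void onProgress(int progress) {
            // Optionally surface initialization progress in the UI.
        }

        @Override
        public void onSuccess() {
            // The engine is ready; start extraction now.
            videoSelection.getHighLight(filePath, duration, new HVEVideoSelectionCallback() {
                @Override
                public void onResult(long start) {
                    // Use the extracted highlight, then free the engine.
                    videoSelection.releaseVideoSelectionEngine();
                }
            });
        }

        @Override
        public void onError(int errorCode, String errorMessage) {
            // Initialization failed; release to avoid leaking resources.
            videoSelection.releaseVideoSelectionEngine();
        }
});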

Conclusion

Vlogs have played a vital part in the we-media era since they first appeared. In the past, only a handful of people could create one, because picking out the most interesting parts of the original footage was so demanding.

Thanks to smart mobile apps, even video editing amateurs can now create a vlog, because much of the process can be completed automatically by an app with a highlight extraction function.

The highlight capability from Video Editor Kit is one such function. It combines a set of technologies, such as AMediaCodec, OpenGL, neural networks, and a two-stage (sparse-to-dense sampling) algorithm, to deliver impressive results. With it, you can build either a standalone highlight extractor or a highlight extraction feature within an existing app.