Tag Index

2D keypoint detection (1)

(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

December 12, 2023

2D pose estimation (1)

(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

December 12, 2023

3D deep learning (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

AIOps (4)

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

January 15, 2025

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

(LLMSense) Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

January 14, 2025

(LogParser-LLM) Advancing Efficient Log Parsing with Large Language Models

January 13, 2025

AWQ (1)

(AWQ) ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION

December 6, 2024

Artifacts (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

Attention Module (1)

Coordinate Attention for Efficient Mobile Network Design

December 5, 2022

Audio Fingerprinting (2)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

February 19, 2024

(SpectroMap) Peak detection algorithm for audio fingerprinting

February 19, 2024

CLIP (1)

(SemPLeS) Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

December 13, 2024

Conditional GAN (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

Consistency Regularization (1)

Improved Consistency Regularization for GANs

December 13, 2022

Continual Learning (1)

(Continual Learning of Large Language Models) A Comprehensive Survey

December 3, 2024

DAPT (2)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

DPO (1)

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

Data augmentation (1)

Image Augmentations for GAN Training

December 12, 2022

Deep Fake (1)

GHOST — A New Face Swap Approach for Image and Video Domains

September 20, 2022

Deep Fakes (1)

Region-Aware Face Swapping

December 26, 2022

Diffusion Model (2)

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Discriminator (1)

Improving GANs with A Dynamic Discriminator

October 4, 2022

Distillation (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

E5 (1)

Text Embeddings by Weakly-Supervised Contrastive Pre-training

December 15, 2024

Face Swap (3)

Region-Aware Face Swapping

December 26, 2022

(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness

July 25, 2022

(A new face swap method for image and video domains) a technical report

July 20, 2022

Face Swapping (1)

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

March 31, 2022

Face attribute editing (1)

Adaptive semantic attribute decoupling for precise face image editing

September 6, 2021

FaceSwap (2)

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

Facial Animation (2)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Facial Attribute Editing (1)

(Arbitrary Facial Attribute Editing) Only Change What You Want

July 5, 2022

FastGAN (1)

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

September 27, 2021

GAN (29)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Region-Aware Face Swapping

December 26, 2022

(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks

December 23, 2022

Improved Consistency Regularization for GANs

December 13, 2022

Image Augmentations for GAN Training

December 12, 2022

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

(Pros and Cons of GAN Evaluation Measures) New Developments

November 18, 2022

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Improving GANs with A Dynamic Discriminator

October 4, 2022

GHOST — A New Face Swap Approach for Image and Video Domains

September 20, 2022

(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness

July 25, 2022

(A new face swap method for image and video domains) a technical report

July 20, 2022

(Arbitrary Facial Attribute Editing) Only Change What You Want

July 5, 2022

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

March 31, 2022

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

September 27, 2021

Adaptive semantic attribute decoupling for precise face image editing

September 6, 2021

Spatially-invariant Style-codes Controlled Makeup Transfer

September 6, 2021

GAN Compression (3)

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

GAN Evaluation (1)

(Pros and Cons of GAN Evaluation Measures) New Developments

November 18, 2022

GPT (1)

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Generated Image (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

Gradient Normalization (1)

(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks

December 23, 2022

Image Animation (1)

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Image Classification (1)

(Background Splitting) Finding Rare Classes in a Sea of Background

April 5, 2022

Image Editing (1)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

Image Generation (3)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Image Synthesis (2)

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Image Translation (1)

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

March 31, 2022

Image-based rendering (1)

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

November 22, 2023

Image-to-Image Translation (2)

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

Image-to-image Translation (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

Knowledge Distillation (2)

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

LLM (29)

(DeepRAG) Thinking to Retrieval Step by Step for Large Language Models

February 12, 2025

(Towards Large Reasoning Models) A Survey of Reinforced Reasoning with Large Language Models

February 10, 2025

(DeepSeek-R1) Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

February 2, 2025

DeepSeek-V3 Technical Report

January 17, 2025

Speculative Decoding

January 16, 2025

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

January 15, 2025

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

(LLMSense) Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

January 14, 2025

(LogParser-LLM) Advancing Efficient Log Parsing with Large Language Models

January 13, 2025

Nemotron-4 340B Technical Report

January 8, 2025

(Pix2Struct) Screenshot Parsing as Pretraining for Visual Language Understanding

January 2, 2025

(MATCHA) Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

December 30, 2024

(DEPLOT) One-shot visual language reasoning by plot-to-table translation

December 30, 2024

(Retrieval-Augmented Generation for Natural Language Processing) A Survey

December 23, 2024

(DRAGIN) Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

December 22, 2024

(Retrieval Augmented Generation (RAG) and Beyond) A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

December 16, 2024

Text Embeddings by Weakly-Supervised Contrastive Pre-training

December 15, 2024

(Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training

December 13, 2024

(AWQ) ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION

December 6, 2024

(Continual Learning of Large Language Models) A Comprehensive Survey

December 3, 2024

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

(ChipNeMo) Domain-Adapted LLMs for Chip Design

November 24, 2024

(Large Language Models) A Survey

November 18, 2024

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

March 18, 2024

A Survey on Large Language Models for Recommendation

March 7, 2024

LLM. SteerLM (1)

(SteerLM) Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

December 10, 2024

LLaVA (1)

(VILA) On Pre-training for Visual Language Models

December 6, 2024

LVLM (3)

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

March 18, 2024

Language Model (4)

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

May 19, 2023

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

May 2, 2023

(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

April 21, 2023

Large Language Model (6)

(Continual Learning of Large Language Models) A Comprehensive Survey

December 3, 2024

(Large Language Models) A Survey

November 18, 2024

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

March 18, 2024

A Survey on Large Language Models for Recommendation

March 7, 2024

Large Vision Language Model (1)

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

March 18, 2024

Latent Diffusion (1)

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

Light Weight (1)

Coordinate Attention for Efficient Mobile Network Design

December 5, 2022

Light weight (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

Light weight model (2)

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

Lip Sync (2)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Lipsync (1)

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

Log Parsing (1)

(LogParser-LLM) Advancing Efficient Log Parsing with Large Language Models

January 13, 2025

Logs (2)

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

January 15, 2025

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

MLM (1)

(Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training

December 13, 2024

Mobile Network (3)

Coordinate Attention for Efficient Mobile Network Design

December 5, 2022

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

Music denoising (1)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

February 19, 2024

Music recognition (2)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

February 19, 2024

(SpectroMap) Peak detection algorithm for audio fingerprinting

February 19, 2024

Neural Architecture Search (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

Penetrative AI (1)

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

Quantitative control (1)

Adaptive semantic attribute decoupling for precise face image editing

September 6, 2021

Quantization (1)

(AWQ) ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION

December 6, 2024

RAG (2)

(DeepRAG) Thinking to Retrieval Step by Step for Large Language Models

February 12, 2025

(DRAGIN) Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

December 22, 2024

Reasoning Models (1)

(Towards Large Reasoning Models) A Survey of Reinforced Reasoning with Large Language Models

February 10, 2025

Reinforced Reasoning (1)

(Towards Large Reasoning Models) A Survey of Reinforced Reasoning with Large Language Models

February 10, 2025

Retrieval Augmented Generation (1)

(DRAGIN) Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

December 22, 2024

Retrieval Augmented Generation (RAG) (2)

(Retrieval-Augmented Generation for Natural Language Processing) A Survey

December 23, 2024

(Retrieval Augmented Generation (RAG) and Beyond) A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

December 16, 2024

Retrieval-based Language Models (1)

Retrieval-based Language Models and Applications

December 13, 2023

SCGAN (1)

Spatially-invariant Style-codes Controlled Makeup Transfer

September 6, 2021

SFT (1)

(SteerLM) Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

December 10, 2024

Sensor (2)

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

(LLMSense) Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

January 14, 2025

Singing Synthesizers (1)

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

Speculative Decoding (1)

Speculative Decoding

January 16, 2025

Speech Synthesis (3)

(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion

November 7, 2023

LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU

November 6, 2023

Voice Conversion With Just Nearest Neighbors

June 14, 2023

Stable Diffusion (3)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Style GAN (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

Synthetic data (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

Talking Face Generation (1)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

Talking head video generation (1)

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

Text Embedding (1)

Text Embeddings by Weakly-Supervised Contrastive Pre-training

December 15, 2024

Text-guided Diffusion model (1)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

Text-to-Speech (4)

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

May 19, 2023

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

May 2, 2023

(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

April 21, 2023

VILA (2)

(NVILA) Efficient Frontier Visual Language Models

December 10, 2024

(VILA) On Pre-training for Visual Language Models

December 6, 2024

VLM (8)

(Pix2Struct) Screenshot Parsing as Pretraining for Visual Language Understanding

January 2, 2025

(MATCHA) Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

December 30, 2024

(DEPLOT) One-shot visual language reasoning by plot-to-table translation

December 30, 2024

(Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training

December 13, 2024

(NVILA) Efficient Frontier Visual Language Models

December 10, 2024

(VILA) On Pre-training for Visual Language Models

December 6, 2024

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

ViT (1)

Sigmoid Loss for Language Image Pre-Training

December 9, 2024

Video Generation (1)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

Video Synthesis (1)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

Virtual try-on (1)

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

November 22, 2023

Vision Encoder (1)

Sigmoid Loss for Language Image Pre-Training

December 9, 2024

Vision Language Model (2)

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

Visual Language Reasoning (3)

(Pix2Struct) Screenshot Parsing as Pretraining for Visual Language Understanding

January 2, 2025

(MATCHA) Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

December 30, 2024

(DEPLOT) One-shot visual language reasoning by plot-to-table translation

December 30, 2024

Voice Conversion (7)

(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion

November 7, 2023

LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU

November 6, 2023

Voice Conversion With Just Nearest Neighbors

June 14, 2023

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

May 19, 2023

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

May 2, 2023

(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

April 21, 2023

WSSS (1)

(SemPLeS) Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

December 13, 2024

Zero-shot speech (1)

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

ai (104)

(DeepRAG) Thinking to Retrieval Step by Step for Large Language Models

February 12, 2025

(Towards Large Reasoning Models) A Survey of Reinforced Reasoning with Large Language Models

February 10, 2025

(DeepSeek-R1) Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

February 2, 2025

DeepSeek-V3 Technical Report

January 17, 2025

Speculative Decoding

January 16, 2025

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

January 15, 2025

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

(LLMSense) Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

January 14, 2025

(LogParser-LLM) Advancing Efficient Log Parsing with Large Language Models

January 13, 2025

Nemotron-4 340B Technical Report

January 8, 2025

(Pix2Struct) Screenshot Parsing as Pretraining for Visual Language Understanding

January 2, 2025

(MATCHA) Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

December 30, 2024

(DEPLOT) One-shot visual language reasoning by plot-to-table translation

December 30, 2024

(Retrieval-Augmented Generation for Natural Language Processing) A Survey

December 23, 2024

(DRAGIN) Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

December 22, 2024

(Retrieval Augmented Generation (RAG) and Beyond) A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

December 16, 2024

Text Embeddings by Weakly-Supervised Contrastive Pre-training

December 15, 2024

(Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training

December 13, 2024

(SemPLeS) Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

December 13, 2024

(NVILA) Efficient Frontier Visual Language Models

December 10, 2024

(SteerLM) Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

December 10, 2024

Sigmoid Loss for Language Image Pre-Training

December 9, 2024

(AWQ) ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION

December 6, 2024

(VILA) On Pre-training for Visual Language Models

December 6, 2024

(Continual Learning of Large Language Models) A Comprehensive Survey

December 3, 2024

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

(ChipNeMo) Domain-Adapted LLMs for Chip Design

November 24, 2024

(Large Language Models) A Survey

November 18, 2024

Torch To TensorRT using Dynamic Batch Size

August 27, 2024

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

March 18, 2024

A Survey on Large Language Models for Recommendation

March 7, 2024

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

February 19, 2024

(SpectroMap) Peak detection algorithm for audio fingerprinting

February 19, 2024

LLM Prompt Engineering

January 3, 2024

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

Retrieval-based Language Models and Applications

December 13, 2023

(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

December 12, 2023

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

November 22, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion

November 7, 2023

LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU

November 6, 2023

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

September 13, 2023

(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

September 10, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

Voice Conversion With Just Nearest Neighbors

June 14, 2023

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

May 19, 2023

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

May 2, 2023

(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

April 21, 2023

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Region-Aware Face Swapping

December 26, 2022

(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks

December 23, 2022

Improved Consistency Regularization for GANs

December 13, 2022

Image Augmentations for GAN Training

December 12, 2022

Coordinate Attention for Efficient Mobile Network Design

December 5, 2022

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

(Pros and Cons of GAN Evaluation Measures) New Developments

November 18, 2022

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Improving GANs with A Dynamic Discriminator

October 4, 2022

GHOST — A New Face Swap Approach for Image and Video Domains

September 20, 2022

Supervised Contrastive Learning

July 25, 2022

(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness

July 25, 2022

(A new face swap method for image and video domains) a technical report

July 20, 2022

(Arbitrary Facial Attribute Editing) Only Change What You Want

July 5, 2022

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

April 11, 2022

(Background Splitting) Finding Rare Classes in a Sea of Background

April 5, 2022

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

March 31, 2022

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

September 27, 2021

Adaptive semantic attribute decoupling for precise face image editing

September 6, 2021

Spatially-invariant Style-codes Controlled Makeup Transfer

September 6, 2021

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

June 16, 2021

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

April 21, 2021

Searching for MobileNetV3

April 14, 2021

A Survey on Visual Transformer

April 2, 2021

Recurrent Feature Reasoning for Image Inpainting

March 3, 2021

(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting

March 2, 2021

(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

February 26, 2021

(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

February 22, 2021

Real-time Hair Segmentation and Recoloring on Mobile GPUs

February 19, 2021

(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device

February 18, 2021

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

February 17, 2021

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

February 16, 2021

augmented reality (1)

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

November 22, 2023

automatic speech recognition (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

September 13, 2023

(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

September 10, 2023

background classification (1)

(Background Splitting) Finding Rare Classes in a Sea of Background

April 5, 2022

computer vision (34)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Region-Aware Face Swapping

December 26, 2022

(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks

December 23, 2022

Improved Consistency Regularization for GANs

December 13, 2022

Image Augmentations for GAN Training

December 12, 2022

Coordinate Attention for Efficient Mobile Network Design

December 5, 2022

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

(Pros and Cons of GAN Evaluation Measures) New Developments

November 18, 2022

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Improving GANs with A Dynamic Discriminator

October 4, 2022

GHOST — A New Face Swap Approach for Image and Video Domains

September 20, 2022

Supervised Contrastive Learning

July 25, 2022

(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness

July 25, 2022

(A new face swap method for image and video domains) a technical report

July 20, 2022

(Arbitrary Facial Attribute Editing) Only Change What You Want

July 5, 2022

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

April 11, 2022

(Background Splitting) Finding Rare Classes in a Sea of Background

April 5, 2022

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

March 31, 2022

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

June 16, 2021

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

April 21, 2021

A Survey on Visual Transformer

April 2, 2021

content-based music source retrieve (2)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

February 19, 2024

(SpectroMap) Peak detection algorithm for audio fingerprinting

February 19, 2024

continual learning (3)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

continued pretraining (2)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

contrastive learning (1)

Supervised Contrastive Learning

July 25, 2022

data privacy (1)

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

April 21, 2021

domain adaption (3)

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

(ChipNeMo) Domain-Adapted LLMs for Chip Design

November 24, 2024

dynamic batch (1)

Torch To TensorRT using Dynamic Batch Size

August 27, 2024

face generation (4)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

face swap (1)

GHOST — A New Face Swap Approach for Image and Video Domains

September 20, 2022

facial animation (5)

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

federated learning (1)

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

April 21, 2021

finetune llm (4)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

(ChipNeMo) Domain-Adapted LLMs for Chip Design

November 24, 2024

fundamental (1)

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

February 17, 2021

generative adversarial networks (2)

Recurrent Feature Reasoning for Image Inpainting

March 3, 2021

(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting

March 2, 2021

hair (1)

Real-time Hair Segmentation and Recoloring on Mobile GPUs

February 19, 2021

image classification (2)

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

June 16, 2021

image-based rendering (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

imbalanced classification (1)

(Background Splitting) Finding Rare Classes in a Sea of Background

April 5, 2022

inpainting (2)

Recurrent Feature Reasoning for Image Inpainting

March 3, 2021

(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting

March 2, 2021

instruction template (1)

(Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training

December 13, 2024

large language model (2)

LLM Prompt Engineering

January 3, 2024

Retrieval-based Language Models and Applications

December 13, 2023

lifelong learning (1)

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

light-weight (1)

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

September 27, 2021

lightweight model (1)

Searching for MobileNetV3

April 14, 2021

lip sync (2)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

make-up filter (2)

Adaptive semantic attribute decoupling for precise face image editing

September 6, 2021

Spatially-invariant Style-codes Controlled Makeup Transfer

September 6, 2021

ml (104)

(DeepRAG) Thinking to Retrieval Step by Step for Large Language Models

February 12, 2025

(Towards Large Reasoning Models) A Survey of Reinforced Reasoning with Large Language Models

February 10, 2025

(DeepSeek-R1) Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

February 2, 2025

DeepSeek-V3 Technical Report

January 17, 2025

Speculative Decoding

January 16, 2025

Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies

January 15, 2025

(Penetrative AI) Making LLMs Comprehend the Physical World

January 14, 2025

(LLMSense) Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces

January 14, 2025

(LogParser-LLM) Advancing Efficient Log Parsing with Large Language Models

January 13, 2025

Nemotron-4 340B Technical Report

January 8, 2025

(Pix2Struct) Screenshot Parsing as Pretraining for Visual Language Understanding

January 2, 2025

(MATCHA) Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

December 30, 2024

(DEPLOT) One-shot visual language reasoning by plot-to-table translation

December 30, 2024

(Retrieval-Augmented Generation for Natural Language Processing) A Survey

December 23, 2024

(DRAGIN) Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

December 22, 2024

(Retrieval Augmented Generation (RAG) and Beyond) A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

December 16, 2024

Text Embeddings by Weakly-Supervised Contrastive Pre-training

December 15, 2024

(Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training

December 13, 2024

(SemPLeS) Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

December 13, 2024

(NVILA) Efficient Frontier Visual Language Models

December 10, 2024

(SteerLM) Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

December 10, 2024

Sigmoid Loss for Language Image Pre-Training

December 9, 2024

(AWQ) ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION

December 6, 2024

(VILA) On Pre-training for Visual Language Models

December 6, 2024

(Continual Learning of Large Language Models) A Comprehensive Survey

December 3, 2024

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models

December 2, 2024

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

November 30, 2024

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

(ChipNeMo) Domain-Adapted LLMs for Chip Design

November 24, 2024

(Large Language Models) A Survey

November 18, 2024

Torch To TensorRT using Dynamic Batch Size

August 27, 2024

A Survey of Resource-Efficient LLM and Multimodal Foundation Models

June 17, 2024

An Introduction to Vision-Language Modeling

June 13, 2024

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

March 18, 2024

A Survey on Large Language Models for Recommendation

March 7, 2024

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

February 19, 2024

(SpectroMap) Peak detection algorithm for audio fingerprinting

February 19, 2024

LLM Prompt Engineering

January 3, 2024

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

Retrieval-based Language Models and Applications

December 13, 2023

(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

December 12, 2023

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

November 22, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion

November 7, 2023

LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU

November 6, 2023

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

October 31, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

September 13, 2023

(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

September 10, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

Voice Conversion With Just Nearest Neighbors

June 14, 2023

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

May 19, 2023

(NaturalSpeech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

May 2, 2023

(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

April 21, 2023

Null-text Inversion for Editing Real Images using Guided Diffusion Models

February 22, 2023

High-Resolution Image Synthesis with Latent Diffusion Models

February 3, 2023

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

Region-Aware Face Swapping

December 26, 2022

(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks

December 23, 2022

Improved Consistency Regularization for GANs

December 13, 2022

Image Augmentations for GAN Training

December 12, 2022

Coordinate Attention for Efficient Mobile Network Design

December 5, 2022

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

December 2, 2022

(Pros and Cons of GAN Evaluation Measures) New Developments

November 18, 2022

(Teachers Do More Than Teach) Compressing Image-to-Image Models

November 4, 2022

(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

October 31, 2022

Thin-Plate Spline Motion Model for Image Animation

October 26, 2022

Improving GANs with A Dynamic Discriminator

October 4, 2022

GHOST — A New Face Swap Approach for Image and Video Domains

September 20, 2022

Supervised Contrastive Learning

July 25, 2022

(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness

July 25, 2022

(A new face swap method for image and video domains) a technical report

July 20, 2022

(Arbitrary Facial Attribute Editing) Only Change What You Want

July 5, 2022

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION

June 28, 2022

(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

June 13, 2022

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

June 7, 2022

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

April 26, 2022

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

April 11, 2022

(Background Splitting) Finding Rare Classes in a Sea of Background

April 5, 2022

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

March 31, 2022

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

September 27, 2021

Adaptive semantic attribute decoupling for precise face image editing

September 6, 2021

Spatially-invariant Style-codes Controlled Makeup Transfer

September 6, 2021

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

June 16, 2021

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

April 21, 2021

Searching for MobileNetV3

April 14, 2021

A Survey on Visual Transformer

April 2, 2021

Recurrent Feature Reasoning for Image Inpainting

March 3, 2021

(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting

March 2, 2021

(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

February 26, 2021

(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

February 22, 2021

Real-time Hair Segmentation and Recoloring on Mobile GPUs

February 19, 2021

(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device

February 18, 2021

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

February 17, 2021

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

February 16, 2021

mobile edge networks (1)

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

April 21, 2021

mobileNetv3 (1)

Searching for MobileNetV3

April 14, 2021

model alignment (1)

(SteerLM) Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

December 10, 2024

network architecture search (1)

Searching for MobileNetV3

April 14, 2021

noisy label (1)

(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

June 16, 2021

pix2pix (1)

(InstructPix2Pix) Learning to Follow Image Editing Instructions

January 30, 2023

portrait (2)

(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

February 26, 2021

(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device

February 18, 2021

prompt engineering (1)

LLM Prompt Engineering

January 3, 2024

real-time segmentation (1)

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

April 11, 2022

recommendation system (1)

A Survey on Large Language Models for Recommendation

March 7, 2024

robustness (1)

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

February 17, 2021

scene representation (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

segmentation (5)

(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

February 26, 2021

(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

February 22, 2021

Real-time Hair Segmentation and Recoloring on Mobile GPUs

February 19, 2021

(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device

February 18, 2021

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

February 16, 2021

self-attention (1)

A Survey on Visual Transformer

April 2, 2021

self-supervised learning (1)

Supervised Contrastive Learning

July 25, 2022

semantic segmentation (1)

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

April 11, 2022

speech-to-text (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

September 13, 2023

(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

September 10, 2023

supervised fine-tuning (1)

Compile Domain Adaptation LLMs Papers (2024)

November 28, 2024

task incremental learning (1)

(A Continual Learning Survey) Defying forgetting in classification tasks

August 24, 2021

tensorRT (1)

Torch To TensorRT using Dynamic Batch Size

August 27, 2024

torch to trt (1)

Torch To TensorRT using Dynamic Batch Size

August 27, 2024

transcription (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

September 13, 2023

(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

September 10, 2023

transformer (1)

A Survey on Visual Transformer

April 2, 2021

tts (4)

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

May 19, 2023

(NaturalSpeech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 12, 2023

(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

May 2, 2023

(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

April 21, 2023

video (3)

(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

February 22, 2021

Real-time Hair Segmentation and Recoloring on Mobile GPUs

February 19, 2021

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

February 16, 2021

video generation (4)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

video synthesis (4)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network

March 18, 2022

First Order Motion Model for Image Animation

February 16, 2022

(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering

October 26, 2021

(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

October 7, 2021

view synthesis (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

voice recognition (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

September 13, 2023

(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

September 10, 2023

volume rendering (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

December 18, 2023

(Re-ReND) Real-time Rendering of NeRFs across Devices

November 22, 2023

(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

September 14, 2023

(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

July 11, 2023

Real-Time Neural Light Field on Mobile Devices

June 30, 2023

(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

June 29, 2023

(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

June 27, 2023

weakly-supervised semantic segmentation (1)

(SemPLeS) Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

December 13, 2024