
 2D keypoint detection (1)

(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

 2D pose estimation (1)

(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

 3D deep learning (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
(Re-ReND) Real-time Rendering of NeRFs across Devices
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

 Artifacts (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

 Attention Module (1)

Coordinate Attention for Efficient Mobile Network Design

 Audio Fingerprinting (2)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING
(SpectroMap) Peak detection algorithm for audio fingerprinting

 Conditional GAN (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

 Consistency Regularization (1)

Improved Consistency Regularization for GANs

 DAPT (2)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
CONTINUAL PRE-TRAINING OF LANGUAGE MODELS

 DPO (1)

Compile Domain Adaptation LLMs Papers (2024)

 Data augmentation (1)

Image Augmentations for GAN Training

 Deep Fake (1)

GHOST — A New Face Swap Approach for Image and Video Domains

 Deep Fakes (1)

Region-Aware Face Swapping

 Diffusion Model (2)

High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions

 Discriminator (1)

Improving GANs with A Dynamic Discriminator

 Distillation (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

 Face Swap (3)

Region-Aware Face Swapping
(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness
(A new face swap method for image and video domains) a technical report

 Face Swapping (1)

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

 Face attribute editing (1)

Adaptive semantic attribute decoupling for precise face image editing

 FaceSwap (2)

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

 Facial Animation (2)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Thin-Plate Spline Motion Model for Image Animation

 Facial Attribute Editing (1)

(Arbitrary Facial Attribute Editing) Only Change What You Want

 FastGAN (1)

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

 GAN (29)

Null-text Inversion for Editing Real Images using Guided Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions
Region-Aware Face Swapping
(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks
Improved Consistency Regularization for GANs
Image Augmentations for GAN Training
Systematic Analysis and Removal of Circular Artifacts for StyleGAN
(Pros and Cons of GAN Evaluation Measures) New Developments
(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation
Thin-Plate Spline Motion Model for Image Animation
Improving GANs with A Dynamic Discriminator
GHOST — A New Face Swap Approach for Image and Video Domains
(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness
(A new face swap method for image and video domains) a technical report
(Arbitrary Facial Attribute Editing) Only Change What You Want
(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping
(GAN Compression) Efficient Architectures for Interactive Conditional GANs
(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation
(SimSwap) An Efficient Framework For High Fidelity Face Swapping
(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Adaptive semantic attribute decoupling for precise face image editing
Spatially-invariant Style-codes Controlled Makeup Transfer

 GAN Compression (3)

(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation
(GAN Compression) Efficient Architectures for Interactive Conditional GANs

 GAN Evaluation (1)

(Pros and Cons of GAN Evaluation Measures) New Developments

 GPT (1)

(InstructPix2Pix) Learning to Follow Image Editing Instructions

 Generated Image (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

 Gradient Normalization (1)

(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks

 Image Animation (1)

Thin-Plate Spline Motion Model for Image Animation

 Image Classification (1)

(Background Splitting) Finding Rare Classes in a Sea of Background

 Image Editing (1)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

 Image Generation (3)

Null-text Inversion for Editing Real Images using Guided Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions

 Image Synthesis (2)

High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions

 Image Translation (1)

(SimSwap) An Efficient Framework For High Fidelity Face Swapping

 Image-based rendering (1)

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

 Image-to-Image Translation (2)

(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

 Image-to-image Translation (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

 Knowledge Distillation (2)

(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation

 LLM (9)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
Compile Domain Adaptation LLMs Papers (2024)
(ChipNeMo) Domain-Adapted LLMs for Chip Design
(Large Language Models) A Survey
A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling
(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection
A Survey on Large Language Models for Recommendation

 LVLM (3)

A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling
(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

 Language Model (4)

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality
(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

 Large Language Model (5)

(Large Language Models) A Survey
A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling
(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection
A Survey on Large Language Models for Recommendation

 Large Vision Language Model (1)

(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection

 Latent Diffusion (1)

High-Resolution Image Synthesis with Latent Diffusion Models

 Light Weight (1)

Coordinate Attention for Efficient Mobile Network Design

 Light weight (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

 Light weight model (2)

(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

 Lip Sync (2)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Thin-Plate Spline Motion Model for Image Animation

 Lipsync (1)

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

 Mobile Network (3)

Coordinate Attention for Efficient Mobile Network Design
(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping

 Music denoising (1)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

 Music recognition (2)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING
(SpectroMap) Peak detection algorithm for audio fingerprinting

 Neural Architecture Search (1)

(GAN Compression) Efficient Architectures for Interactive Conditional GANs

 Quantitative control (1)

Adaptive semantic attribute decoupling for precise face image editing

 Retrieval-based Language Models (1)

Retrieval-based Language Models and Applications

 SCGAN (1)

Spatially-invariant Style-codes Controlled Makeup Transfer

 Singing Synthesizers (1)

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

 Speech Synthesis (3)

(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion
LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU
Voice Conversion With Just Nearest Neighbors

 Stable Diffusion (3)

Null-text Inversion for Editing Real Images using Guided Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions

 Style GAN (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

 Synthetic data (1)

Systematic Analysis and Removal of Circular Artifacts for StyleGAN

 Talking Face Generation (1)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

 Talking head video generation (1)

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation

 Text-guided Diffusion model (1)

Null-text Inversion for Editing Real Images using Guided Diffusion Models

 Text-to-Speech (4)

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality
(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

 VLM (2)

A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling

 Video Generation (1)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

 Video Synthesis (1)

(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

 Virtual try-on (1)

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

 Vision Language Model (2)

A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling

 Voice Conversion (7)

(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion
LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU
Voice Conversion With Just Nearest Neighbors
(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality
(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

 Zero-shot speech (1)

(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

 ai (79)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
Compile Domain Adaptation LLMs Papers (2024)
(ChipNeMo) Domain-Adapted LLMs for Chip Design
(Large Language Models) A Survey
Torch To TensorRT using Dynamic Batch Size
A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling
(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection
A Survey on Large Language Models for Recommendation
MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING
(SpectroMap) Peak detection algorithm for audio fingerprinting
LLM Prompt Engineering
(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
Retrieval-based Language Models and Applications
(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones
(Re-ReND) Real-time Rendering of NeRFs across Devices
(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion
LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU
(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio
(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis
Voice Conversion With Just Nearest Neighbors
(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality
(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Null-text Inversion for Editing Real Images using Guided Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions
Region-Aware Face Swapping
(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks
Improved Consistency Regularization for GANs
Image Augmentations for GAN Training
Coordinate Attention for Efficient Mobile Network Design
Systematic Analysis and Removal of Circular Artifacts for StyleGAN
(Pros and Cons of GAN Evaluation Measures) New Developments
(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation
Thin-Plate Spline Motion Model for Image Animation
Improving GANs with A Dynamic Discriminator
GHOST — A New Face Swap Approach for Image and Video Domains
Supervised Contrastive Learning
(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness
(A new face swap method for image and video domains) a technical report
(Arbitrary Facial Attribute Editing) Only Change What You Want
(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping
(GAN Compression) Efficient Architectures for Interactive Conditional GANs
(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation
(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model
(Background Splitting) Finding Rare Classes in a Sea of Background
(SimSwap) An Efficient Framework For High Fidelity Face Swapping
(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Adaptive semantic attribute decoupling for precise face image editing
Spatially-invariant Style-codes Controlled Makeup Transfer
(A Continual Learning Survey) Defying forgetting in classification tasks
(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey
(Federated Learning in Mobile Edge Networks) A Comprehensive Survey
Searching for MobileNetV3
A Survey on Visual Transformer
Recurrent Feature Reasoning for Image Inpainting
(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting
(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss
Real-time Hair Segmentation and Recoloring on Mobile GPUs
(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device
Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

 augmented reality (1)

(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

 automatic speech recognition (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio
(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

 background classification (1)

(Background Splitting) Finding Rare Classes in a Sea of Background

 computer vision (34)

Null-text Inversion for Editing Real Images using Guided Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions
Region-Aware Face Swapping
(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks
Improved Consistency Regularization for GANs
Image Augmentations for GAN Training
Coordinate Attention for Efficient Mobile Network Design
Systematic Analysis and Removal of Circular Artifacts for StyleGAN
(Pros and Cons of GAN Evaluation Measures) New Developments
(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation
Thin-Plate Spline Motion Model for Image Animation
Improving GANs with A Dynamic Discriminator
GHOST — A New Face Swap Approach for Image and Video Domains
Supervised Contrastive Learning
(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness
(A new face swap method for image and video domains) a technical report
(Arbitrary Facial Attribute Editing) Only Change What You Want
(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping
(GAN Compression) Efficient Architectures for Interactive Conditional GANs
(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation
(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model
(Background Splitting) Finding Rare Classes in a Sea of Background
(SimSwap) An Efficient Framework For High Fidelity Face Swapping
(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering
(A Continual Learning Survey) Defying forgetting in classification tasks
(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey
(Federated Learning in Mobile Edge Networks) A Comprehensive Survey
A Survey on Visual Transformer

 content-based music source retrieval (2)

MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING
(SpectroMap) Peak detection algorithm for audio fingerprinting

 continual learning (3)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
(A Continual Learning Survey) Defying forgetting in classification tasks

 continued pretraining (2)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
Compile Domain Adaptation LLMs Papers (2024)

 contrastive learning (1)

Supervised Contrastive Learning

 data privacy (1)

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

 domain adaptation (3)

CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
Compile Domain Adaptation LLMs Papers (2024)
(ChipNeMo) Domain-Adapted LLMs for Chip Design

 dynamic batch (1)

Torch To TensorRT using Dynamic Batch Size

 face generation (4)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

 face swap (1)

GHOST — A New Face Swap Approach for Image and Video Domains

 facial animation (5)

(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation
(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

 federated learning (1)

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

 finetune llm (4)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
Compile Domain Adaptation LLMs Papers (2024)
(ChipNeMo) Domain-Adapted LLMs for Chip Design

 fundamental (1)

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

 generative adversarial networks (2)

Recurrent Feature Reasoning for Image Inpainting
(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting

 hair (1)

Real-time Hair Segmentation and Recoloring on Mobile GPUs

 image classification (2)

(A Continual Learning Survey) Defying forgetting in classification tasks
(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

 image-based rendering (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
(Re-ReND) Real-time Rendering of NeRFs across Devices
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

 imbalanced classification (1)

(Background Splitting) Finding Rare Classes in a Sea of Background

 inpainting (2)

Recurrent Feature Reasoning for Image Inpainting
(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting

 large language model (2)

LLM Prompt Engineering
Retrieval-based Language Models and Applications

 lifelong learning (1)

(A Continual Learning Survey) Defying forgetting in classification tasks

 light-weight (1)

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

 lightweight model (1)

Searching for MobileNetV3

 lip sync (2)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation

 make-up filter (2)

Adaptive semantic attribute decoupling for precise face image editing
Spatially-invariant Style-codes Controlled Makeup Transfer

 ml (79)

(Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
Compile Domain Adaptation LLMs Papers (2024)
(ChipNeMo) Domain-Adapted LLMs for Chip Design
(Large Language Models) A Survey
Torch To TensorRT using Dynamic Batch Size
A Survey of Resource-Efficient LLM and Multimodal Foundation Models
An Introduction to Vision-Language Modeling
(Video-LLaVA) Learning United Visual Representation by Alignment Before Projection
A Survey on Large Language Models for Recommendation
MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING
(SpectroMap) Peak detection algorithm for audio fingerprinting
LLM Prompt Engineering
(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
Retrieval-based Language Models and Applications
(OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
(ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones
(Re-ReND) Real-time Rendering of NeRFs across Devices
(QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion
LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU
(SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio
(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis
Voice Conversion With Just Nearest Neighbors
(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality
(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Null-text Inversion for Editing Real Images using Guided Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
(InstructPix2Pix) Learning to Follow Image Editing Instructions
Region-Aware Face Swapping
(GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks
Improved Consistency Regularization for GANs
Image Augmentations for GAN Training
Coordinate Attention for Efficient Mobile Network Design
Systematic Analysis and Removal of Circular Artifacts for StyleGAN
(Pros and Cons of GAN Evaluation Measures) New Developments
(Teachers Do More Than Teach) Compressing Image-to-Image Models
(Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation
Thin-Plate Spline Motion Model for Image Animation
Improving GANs with A Dynamic Discriminator
GHOST — A New Face Swap Approach for Image and Video Domains
Supervised Contrastive Learning
(Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness
(A new face swap method for image and video domains) a technical report
(Arbitrary Facial Attribute Editing) Only Change What You Want
(MIGRATING FACE SWAP TO MOBILE DEVICES) A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
(MobileFaceSwap) A Lightweight Framework for Video Face Swapping
(GAN Compression) Efficient Architectures for Interactive Conditional GANs
(DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation
(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model
(Background Splitting) Finding Rare Classes in a Sea of Background
(SimSwap) An Efficient Framework For High Fidelity Face Swapping
(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Adaptive semantic attribute decoupling for precise face image editing
Spatially-invariant Style-codes Controlled Makeup Transfer
(A Continual Learning Survey) Defying forgetting in classification tasks
(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey
(Federated Learning in Mobile Edge Networks) A Comprehensive Survey
Searching for MobileNetV3
A Survey on Visual Transformer
Recurrent Feature Reasoning for Image Inpainting
(PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting
(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss
Real-time Hair Segmentation and Recoloring on Mobile GPUs
(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device
Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

 mobile edge networks (1)

(Federated Learning in Mobile Edge Networks) A Comprehensive Survey

 mobileNetv3 (1)

Searching for MobileNetV3

 noisy label (1)

(Image Classification with Deep Learning in the Presence of Noisy Labels) A Survey

 pix2pix (1)

(InstructPix2Pix) Learning to Follow Image Editing Instructions

 portrait (2)

(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device

 prompt engineering (1)

LLM Prompt Engineering

 real-time segmentation (1)

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

 recommendation system (1)

A Survey on Large Language Models for Recommendation

 robustness (1)

Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

 scene representation (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
(Re-ReND) Real-time Rendering of NeRFs across Devices
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

 segmentation (5)

(SINet) Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss
Real-time Hair Segmentation and Recoloring on Mobile GPUs
(PortraitNet) Real-time Portrait Segmentation Network for Mobile Device
Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

 self-attention (1)

A Survey on Visual Transformer

 self-supervised learning (1)

Supervised Contrastive Learning

 semantic segmentation (1)

(PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model

 speech-to-text (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio
(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

 supervised fine-tuning (1)

Compile Domain Adaptation LLMs Papers (2024)

 task incremental learning (1)

(A Continual Learning Survey) Defying forgetting in classification tasks

 tensorRT (1)

Torch To TensorRT using Dynamic Batch Size

 torch to trt (1)

Torch To TensorRT using Dynamic Batch Size

 transcription (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio
(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

 transformer (1)

A Survey on Visual Transformer

 tts (4)

(NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality
(Natural Speech 2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
(VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
(YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

 video (3)

(TTVOS) Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss
Real-time Hair Segmentation and Recoloring on Mobile GPUs
Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

 video generation (4)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

 video synthesis (4)

(Scaled-YOLOv4) Scaling Cross Stage Partial Network
First Order Motion Model for Image Animation
(MakeItTalk) Speaker-Aware Talking-Head Animation Rendering
(PIRenderer) Controllable Portrait Image Generation via Semantic Neural Rendering

 view synthesis (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
(Re-ReND) Real-time Rendering of NeRFs across Devices
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis

 voice recognition (2)

(WhisperX) Time-Accurate Speech Transcription of Long-Form Audio
(Whisper) Robust Speech Recognition via Large-Scale Weak Supervision

 volume rendering (7)

(BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis
(Re-ReND) Real-time Rendering of NeRFs across Devices
(MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
(Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Real-Time Neural Light Field on Mobile Devices
(R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
(NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis