SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, Arnaldo Candido Junior,
Anderson da Silva Soares, Sandra Maria Aluisio, Moacir Antonelli Ponti
1. Introduction
1.1 Motivation
– Recently, normalizing flows have been successfully applied in the TTS field: the flow-based models Flowtron (Valle et
al., 2020) and Glow-TTS (Kim et al., 2020) achieved state-of-the-art results. Despite this, current zero-shot multi-speaker
TTS models are still heavily based on the Tacotron 2 model.
1.2 Highlights
– As far as we know, this is the first work to explore flow-based models in a zero-shot multi-speaker TTS scenario.
– We show that fine-tuning a GAN-based vocoder on the Mel-spectrograms predicted by the TTS model for the training
speakers can significantly improve speech similarity and quality for new speakers.
– Our approach achieves promising results using only 11 speakers for training.
2. Methodology: Proposed Method and Dataset
2.1 Speaker Encoder
– Stack of 3 LSTM layers with a linear output layer.
– Trained using the Angular Prototypical loss function with approximately 25k speakers.
– Training datasets: LibriSpeech, VoxCeleb V1 and V2, the English version of Common Voice, and VCTK.
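The Angular Prototypical objective named above can be illustrated with a minimal pure-Python sketch. It assumes the common formulation in which, for each speaker, the last utterance embedding is the query and the mean of the remaining ones is the centroid, with cosine similarities scaled by `w` and shifted by `b` before a softmax cross-entropy. The function names and the fixed values of `w` and `b` are illustrative assumptions (in practice they are learned), and this is not the Coqui TTS implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angular_prototypical_loss(batch, w=10.0, b=-5.0):
    """batch[j] holds the utterance embeddings of speaker j (at least two each).

    The last embedding of each speaker is the query; the mean of the others
    is that speaker's centroid. w and b are fixed here for illustration only.
    """
    queries = [utts[-1] for utts in batch]
    centroids = []
    for utts in batch:
        support = utts[:-1]
        dim = len(support[0])
        centroids.append([sum(e[d] for e in support) / len(support) for d in range(dim)])

    total = 0.0
    for i, query in enumerate(queries):
        logits = [w * cosine(query, c) + b for c in centroids]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # cross-entropy with the true speaker i
    return total / len(queries)

# Two speakers whose queries match their own centroids: the loss is small.
separated = [[[1.0, 0.0], [1.0, 0.1], [1.0, 0.05]],
             [[0.0, 1.0], [0.1, 1.0], [0.05, 1.0]]]
print(angular_prototypical_loss(separated))
```

The loss falls when each query is closest in angle to its own speaker's centroid, which is the property the speaker embeddings need for zero-shot use.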
2.2 Vocoder: HiFi-GAN V2
− VCTK dataset for training and validation.
− Fine-tuning with Mel-spectrograms predicted by TTS models
(HiFi-GAN-FT).
2.3 SC-GlowTTS Model: Glow-TTS based
− Phonemes instead of graphemes as input.
− Explore 3 different encoders:
 The original transformer based encoder;
 Residual convolutional based;
 Gated convolutional based.
− External speaker embeddings conditioned in:
 Affine coupling layers in all decoder blocks;
 Duration predictor input.
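Conditioning an affine coupling layer on a speaker embedding can be sketched as follows. This is a toy illustration in pure Python: `toy_conditioner` is a hypothetical stand-in for the learned network that produces the log-scale and shift, and the way the embedding `g` enters it is an assumption, not the SC-GlowTTS architecture. The point is only that the layer stays exactly invertible for any fixed `g`.

```python
import math

def toy_conditioner(x_a, g, n_out):
    """Hypothetical stand-in for the learned network inside the coupling layer.

    It maps the untouched half x_a and the speaker embedding g to a
    log-scale and a shift for the other half.
    """
    h = sum(x_a) + sum(g)
    log_s = [math.tanh(0.1 * h + 0.01 * i) for i in range(n_out)]
    t = [0.5 * h - i for i in range(n_out)]
    return log_s, t

def coupling_forward(x, g):
    """Transform the second half of x, conditioned on the first half and g."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_conditioner(x_a, g, len(x_b))
    y_b = [xb * math.exp(ls) + tt for xb, ls, tt in zip(x_b, log_s, t)]
    log_det = sum(log_s)  # log-determinant of the Jacobian
    return x_a + y_b, log_det

def coupling_inverse(y, g):
    """Exact inverse: the first half is unchanged, so log_s and t can be recomputed."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_conditioner(y_a, g, len(y_b))
    x_b = [(yb - tt) * math.exp(-ls) for yb, ls, tt in zip(y_b, log_s, t)]
    return y_a + x_b

x = [0.3, -1.2, 0.7, 2.0]
speaker = [0.1, 0.2]
y, log_det = coupling_forward(x, speaker)
print(coupling_inverse(y, speaker))
```

Because the speaker embedding only affects the scale and shift, swapping in the embedding of an unseen speaker changes the output without breaking invertibility.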
2.4 Dataset: VCTK
− Training: composed of 97 speakers.
− Development: composed of samples from the 97 training speakers.
− Test: composed of 11 speakers not present in the training set.
[Figure: architecture diagram. Components shown: Input Text, Phonemizer, Encoder, Duration Predictor, Conv Projection, Speaker Embedding, Alignment Generation (Ceil), Flow-Based Decoder (Squeeze, ActNorm, Invertible 1x1 Conv, Affine Coupling Layer, UnSqueeze; x 12), Predicted Mel spectrogram, HiFi-GAN, Waveform.]
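The step from the Duration Predictor through Ceil to alignment generation can be sketched as a simple expansion: each input token's predicted duration is rounded up to a whole number of frames, and the token's encoder output is repeated that many times. The function name and the use of plain Python lists are illustrative assumptions; this is a sketch of the idea, not the model code.

```python
import math

def expand_by_duration(encoder_frames, predicted_durations):
    """Repeat each encoder output ceil(duration) times to build the alignment.

    encoder_frames: one entry per input phoneme (any Python object here).
    predicted_durations: positive floats, one per phoneme, in frames.
    """
    if len(encoder_frames) != len(predicted_durations):
        raise ValueError("need exactly one duration per encoder frame")
    frame_counts = [math.ceil(d) for d in predicted_durations]
    expanded = []
    for frame, count in zip(encoder_frames, frame_counts):
        expanded.extend([frame] * count)
    return expanded, frame_counts

expanded, counts = expand_by_duration(["h", "i"], [1.2, 2.0])
print(expanded, counts)
```

The expanded sequence has one entry per Mel-spectrogram frame, which is the length the flow-based decoder operates on.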
3. Experiments: Setup and Results
3.1 Proposed Experiments
1. Tacotron 2 baseline following Jia et al. (2018) and Cooper et al. (2020);
2. SC-GlowTTS with transformer based encoder;
3. SC-GlowTTS with residual convolutional based encoder;
4. SC-GlowTTS with gated convolutional based encoder.
3.2 Experiments Setup
– All experiments were implemented in Coqui TTS:
github.com/coqui-ai/TTS
– Coqui TTS is an open source TTS framework. Contributions are welcome.
– Audio samples and checkpoints of all experiments are available at:
github.com/Edresson/SC-GlowTTS
3.3 Results
Table 1. Real Time Factor (RTF), MOS and Sim-MOS with 95% confidence intervals, and the SECS for all our experiments.
Experiment - Model               | Vocoder     | RTF (CPU - GPU)  | SECS    | MOS           | Sim-MOS
Ground Truth                     | –           | –                | 0.9236  | 4.12 ± 0.06   | 4.127 ± 0.06
Attentron ZS (Choi et al., 2020) | WaveRNN     | –                | (0.731) | (3.86 ± 0.05) | (3.30 ± 0.06)
1 - Tacotron 2                   | HiFi-GAN    | 0.5782 - 0.2485  | 0.7589  | 3.57 ± 0.08   | 3.867 ± 0.08
1 - Tacotron 2                   | HiFi-GAN-FT | -                | 0.7791  | 3.74 ± 0.08   | 3.951 ± 0.07
2 - SC-GlowTTS-Trans             | HiFi-GAN    | 0.3612 - 0.1557  | 0.7641  | 3.65 ± 0.07   | 3.905 ± 0.07
2 - SC-GlowTTS-Trans             | HiFi-GAN-FT | -                | 0.8046  | 3.78 ± 0.07   | 3.999 ± 0.07
3 - SC-GlowTTS-Res               | HiFi-GAN    | 0.3597 - 0.1545  | 0.7440  | 3.45 ± 0.09   | 3.828 ± 0.08
3 - SC-GlowTTS-Res               | HiFi-GAN-FT | -                | 0.7969  | 3.70 ± 0.07   | 3.916 ± 0.07
4 - SC-GlowTTS-Gated             | HiFi-GAN    | 0.3474 - 0.1437  | 0.7432  | 3.55 ± 0.08   | 3.852 ± 0.08
4 - SC-GlowTTS-Gated             | HiFi-GAN-FT | -                | 0.7849  | 3.82 ± 0.07   | 3.952 ± 0.07
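Two of the objective metrics in Table 1 are simple to state in code. The sketch below assumes the usual definitions: the Real Time Factor is synthesis time divided by the duration of the generated audio, and SECS (Speaker Encoder Cosine Similarity) is the cosine similarity between speaker embeddings of reference and synthesized speech, averaged over pairs. The function names and the toy two-dimensional embeddings are illustrative assumptions; the poster does not specify how the pairs were formed.

```python
import math

def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF below 1.0 means synthesis runs faster than real time."""
    return synthesis_seconds / audio_seconds

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def secs(reference_embeddings, synthesized_embeddings):
    """Mean cosine similarity over paired reference/synthesized embeddings."""
    sims = [cosine_similarity(r, s)
            for r, s in zip(reference_embeddings, synthesized_embeddings)]
    return sum(sims) / len(sims)

# Toy values: 0.5 s of compute for 2 s of audio, and two embedding pairs.
print(real_time_factor(0.5, 2.0))
print(secs([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0]]))
```

Read this way, the CPU figures in Table 1 put SC-GlowTTS-Gated at roughly 1.66 times the speed of the Tacotron 2 baseline (0.5782 / 0.3474).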
4. SC-GlowTTS performance with few speakers
– To emulate a scenario with few speakers, we selected 11 speakers from the training subset of the VCTK dataset.
– We first trained the SC-GlowTTS-Trans model on the single-speaker LJ Speech dataset, then continued training on this
11-speaker dataset and calculated the metrics on the test set.
– The model achieved a similarity MOS of 3.93±0.08 and a MOS of 3.71±0.07. These results are comparable to those achieved
by the Tacotron 2 baseline trained with 98 speakers, which achieved a similarity MOS of 3.95±0.07 and a MOS of 3.74±0.08.
– We believe that this is an important step forward, especially for zero-shot multi-speaker TTS in
low-resource languages.

Poster SCGlowTTS Interspeech 2021
