
Coping with language data sparsity: semantic head mapping for compound words
- Author
- Joris Pelemans, Kris Demuynck (UGent), Hugo Van hamme and Patrick Wambacq
- Abstract
- In this paper we present a novel clustering technique for compound words. By mapping compounds onto their semantic heads, the technique is able to estimate n-gram probabilities for unseen compounds. We argue that compounds are well represented by their heads, which allows the clustering of rare words and reduces the risk of over-generalization. The semantic heads are obtained by a two-step process that consists of constituent generation and best head selection based on corpus statistics. Experiments on Dutch read speech show that our technique is capable of correctly identifying compounds and their semantic heads with a precision of 80.25% and a recall of 85.97%. A class-based language model with compound-head clusters achieves a significant reduction in both perplexity and WER.
- Keywords
- OOV, clustering, sparsity, n-grams, compounds
Downloads
- (...).pdf (full text, UGent only, 88.88 KB)
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-4404017
- MLA
- Pelemans, Joris, et al. “Coping with Language Data Sparsity: Semantic Head Mapping for Compound Words.” International Conference on Acoustics Speech and Signal Processing ICASSP, IEEE, 2014, pp. 141–45.
- APA
- Pelemans, J., Demuynck, K., Van hamme, H., & Wambacq, P. (2014). Coping with language data sparsity: semantic head mapping for compound words. International Conference on Acoustics Speech and Signal Processing ICASSP, 141–145. IEEE.
- Chicago author-date
- Pelemans, Joris, Kris Demuynck, Hugo Van hamme, and Patrick Wambacq. 2014. “Coping with Language Data Sparsity: Semantic Head Mapping for Compound Words.” In International Conference on Acoustics Speech and Signal Processing ICASSP, 141–45. IEEE.
- Chicago author-date (all authors)
- Pelemans, Joris, Kris Demuynck, Hugo Van hamme, and Patrick Wambacq. 2014. “Coping with Language Data Sparsity: Semantic Head Mapping for Compound Words.” In International Conference on Acoustics Speech and Signal Processing ICASSP, 141–145. IEEE.
- Vancouver
- 1. Pelemans J, Demuynck K, Van hamme H, Wambacq P. Coping with language data sparsity: semantic head mapping for compound words. In: International Conference on Acoustics Speech and Signal Processing ICASSP. IEEE; 2014. p. 141–5.
- IEEE
- [1] J. Pelemans, K. Demuynck, H. Van hamme, and P. Wambacq, “Coping with language data sparsity: semantic head mapping for compound words,” in International Conference on Acoustics Speech and Signal Processing ICASSP, Florence, Italy, 2014, pp. 141–145.
@inproceedings{4404017,
  abstract     = {{In this paper we present a novel clustering technique for compound words. By mapping compounds onto their semantic heads, the technique is able to estimate n-gram probabilities for unseen compounds. We argue that compounds are well represented by their heads, which allows the clustering of rare words and reduces the risk of over-generalization. The semantic heads are obtained by a two-step process that consists of constituent generation and best head selection based on corpus statistics. Experiments on Dutch read speech show that our technique is capable of correctly identifying compounds and their semantic heads with a precision of 80.25% and a recall of 85.97%. A class-based language model with compound-head clusters achieves a significant reduction in both perplexity and WER.}},
  author       = {{Pelemans, Joris and Demuynck, Kris and Van hamme, Hugo and Wambacq, Patrick}},
  booktitle    = {{International Conference on Acoustics Speech and Signal Processing ICASSP}},
  isbn         = {{9781479928934}},
  issn         = {{1520-6149}},
  keywords     = {{OOV,clustering,sparsity,n-grams,compounds}},
  language     = {{eng}},
  location     = {{Florence, Italy}},
  pages        = {{141--145}},
  publisher    = {{IEEE}},
  title        = {{Coping with language data sparsity: semantic head mapping for compound words}},
  year         = {{2014}},
}