From: CS2020_lrec2020
Subject: Your CALCS 2020 Submission (Number 5)
Date: 7 April 2020 at 10:39:05 CEST
To:
Reply-To:

Dear Els Lefever:

On behalf of the CS 2020 Program Committee, I am delighted to inform you that the following submission has been accepted to appear at the conference:

Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings

The Program Committee worked very hard to thoroughly review all the submitted papers. Please repay their efforts by following their suggestions when you revise your paper.

The camera-ready deadline is April 10th. Please upload your final manuscript at the following site:

https://www.softconf.com/lrec2020/CS2020/

You will be prompted to log in to your START account. If you do not see your submission, you can access it with the following passcode:

5X-P6E6G9G5P6

Alternatively, you can click on the following URL, which will take you directly to a form to submit your final paper (after logging into your account):

https://www.softconf.com/lrec2020/CS2020/user/scmd.cgi?scmd=aLogin&passcode=5X-P6E6G9G5P6

The reviews and comments are attached below. Again, try to follow their advice when you revise your paper.

Congratulations on your fine work. If you have any additional questions, please feel free to get in touch.

Best Regards,
Organizers 
CS 2020 

Meta review:

The paper investigates various unsupervised approaches towards sentiment analysis in code-switched/code-mixed data.

Issues with the paper: I do not see results for the Hindi monolingual baseline. I would have liked to see results of monolingual baselines vs. bilingual baselines, and more error analysis indicating, for example, the impact of intervening snippets of code-mixed data.

============================================================================
CS 2020 Reviews for Submission #5
============================================================================

Title: Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings

Authors: Pranaydeep Singh and Els Lefever

============================================================================
REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Appropriateness (1-5): 5
Clarity (1-5): 3
Originality / Innovativeness (1-5): 3
Soundness / Correctness (1-5): 3
Meaningful Comparison (1-5): 2
Thoroughness (1-5): 3
Impact of Ideas or Results (1-5): 3
Recommendation (1-5): 2
Reviewer Confidence (1-5): 4

Detailed Comments
---------------------------------------------------------------------------

The paper addresses sentiment identification in romanized Hinglish text from the SemEval 2020 shared task. The paper is for the most part clearly written, with reasonable evaluation and error analysis. The authors have validated the system using 5-fold cross-validation. However, I strongly believe that the authors could have taken part in the actual SemEval task, where the systems are currently under evaluation (using the actual test data from the shared task) and the results are due on 18 March 2020.

The authors have tried supervised sentiment classification using MUSE cross-lingual embeddings on Hinglish data, and a transfer learning approach using English sentiment data. It is surprising that the authors have not used multilingual BERT, which has shown reasonable success across multilingual/cross-lingual NLP tasks. Ideally, the authors could have used and compared some multilingual embeddings and zero-shot learning (tuned only with English sentiment data, as done for the transfer-based approach in the paper).

---------------------------------------------------------------------------
Questions for Authors
---------------------------------------------------------------------------

The details of the LSTM-based architecture are missing. Is it a 4-layer stacked LSTM with one MLP for classification? What is the input to the classification MLP? Why have you used an LSTM instead of a transformer network? A standard LSTM or transformer network will not be able to capture spelling variation and other ill-formed tokens, which can be captured by a char CNN. Why are you not leveraging both? It would be good to describe the hyperparameters and other details of the model so that researchers can replicate the process in the future.
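For illustration, a minimal sketch of the char-CNN + Bi-LSTM combination these questions point at, in PyTorch. The architecture and all hyperparameters are assumptions for illustration, not the authors' actual model:

    import torch
    import torch.nn as nn

    class CharCNNBiLSTMClassifier(nn.Module):
        """Hypothetical model: char CNN for spelling variation + Bi-LSTM encoder."""
        def __init__(self, n_chars, n_words, char_dim=32, word_dim=300,
                     char_filters=64, hidden=256, n_classes=3):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
            # a char CNN can capture spelling variation in ill-formed tokens
            self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
            self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
            self.lstm = nn.LSTM(word_dim + char_filters, hidden,
                                batch_first=True, bidirectional=True)
            self.mlp = nn.Linear(2 * hidden, n_classes)  # classification MLP

        def forward(self, words, chars):
            # words: (batch, seq)  chars: (batch, seq, max_word_len)
            b, s, c = chars.shape
            ch = self.char_emb(chars.view(b * s, c)).transpose(1, 2)
            ch = torch.relu(self.char_cnn(ch)).max(dim=2).values  # pool over characters
            x = torch.cat([self.word_emb(words), ch.view(b, s, -1)], dim=-1)
            out, _ = self.lstm(x)
            # the last-timestep Bi-LSTM output feeds the classification MLP;
            # softmax is applied implicitly by CrossEntropyLoss during training
            return self.mlp(out[:, -1, :])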
---------------------------------------------------------------------------

============================================================================
REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Appropriateness (1-5): 5
Clarity (1-5): 4
Originality / Innovativeness (1-5): 2
Soundness / Correctness (1-5): 3
Meaningful Comparison (1-5): 2
Thoroughness (1-5): 1
Impact of Ideas or Results (1-5): 1
Recommendation (1-5): 2
Reviewer Confidence (1-5): 4

Detailed Comments
---------------------------------------------------------------------------

This paper presents explorations on constructing a sentiment analyzer for Hindi/English code-mixed tweets. Specifically, the authors project separate English and "Hinglish" word embedding spaces, each induced from tweet corpora, into a shared embedding space, and then train a sentiment analysis model using either Hinglish or English-only supervised data.

Major comments:

* It was good to see that the token-level language annotations weren't used by the sentiment model. Not relying on these inputs will make the resulting model less brittle than if it were dependent on the output of a language identification model upstream in the pipeline.

* It's confusing to refer to "monolingual Hinglish", since Hinglish is, by definition, the mixing of two languages. Perhaps it would be better to refer to the "separate" English and Hinglish embeddings, to contrast with the "shared" embeddings?

* In Section 4.1, the paper says that Hinglish and English tweets were collected via the Twitter API. Is specifying a particular language something the API natively supports (as opposed to having to run your own language ID model on the tweets)? If so, presumably the language classification is done automatically? Do we have any idea of the quality of the Hinglish classification? Presumably Hinglish, containing both code-mixing and transliteration, is particularly hard for their models.

* The last paragraph of Section 4.2 explains that the "seed dict" approach to embedding alignment uses numerals and common tokens like "https", but since the two embedding spaces being combined are English and Hinglish, they must actually have quite a lot of lexical overlap in the form of all the English vocabulary. Presumably using all of those English vocabulary items would be valuable in performing the alignment (see the sketch after these comments). Were those really ignored, or is it just not mentioned in the paper?

* The supervised results seem to indicate that using only English embeddings (the first row of Table 3) yields a macro-average F1 score of 60.6, but using embeddings trained on in-domain code-mixed texts only increases that score to 61.6. Is this because there is so much English in the Hinglish test data that English embeddings are basically good enough? Or is it an indication that the quality of the pretrained embeddings doesn't actually matter much (which could be confirmed with an experiment that doesn't use any pretrained embeddings)? Either way, it's therefore not terribly surprising that the cross-lingual embeddings don't add too much beyond that, only bringing the F1 score up to 63.5.

* In all, the paper seems to apply an existing approach for inducing a shared embedding space and to train a standard text classifier in the usual manner. The comparison between the "unsupervised" and "seed dict" approaches is a nice touch, but more needs to be done in this project.
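For concreteness, a sketch of the identical-token idea raised in the seed-dictionary comment above: build the bilingual seed dictionary from tokens that appear verbatim in both vocabularies. The function and the toy vocabularies are hypothetical, not the authors' pipeline:

    # Identical strings shared by the two vocabularies can supervise a
    # VecMap/MUSE-style alignment; (t, t) pairs act as the seed dictionary.
    def build_seed_dict(en_vocab, hinglish_vocab, numerals_and_urls_only=False):
        shared = set(en_vocab) & set(hinglish_vocab)
        if numerals_and_urls_only:
            # roughly the paper's variant (2): numerals plus tokens like "https"
            shared = {t for t in shared if t.isdigit() or t in {"https", "http"}}
        return [(t, t) for t in sorted(shared)]

    seeds = build_seed_dict(["1", "https", "good", "film"],
                            ["1", "https", "good", "acha"])
    print(seeds)  # [('1', '1'), ('good', 'good'), ('https', 'https')]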
Minor comments:

* In the first paragraph of the Introduction, it says that this work considers code-mixing to be when Hindi is embedded into English text, but presumably a lot of the code-mixing on Twitter is also English constituents embedded in Hindi text?

* In Section 3, it would be good to give the sizes of the train, dev, and test sets. It would also be good to know how much monolingual English sentiment supervision was used.

* In Section 4.0, it's confusing to say "as a classification algorithm, we opted for a standard bi-directional LSTM", since an LSTM isn't really a classifier; it's more of an "encoder", turning the input text into a vector. Presumably your model uses the Bi-LSTM to encode the text, which is then fed into some classification layer(s) with a softmax?

* Tables 3 and 4: multiply all numbers by 100. The leading "0."s just make the numbers harder to read. Also, change "Average score" to "Macro-average score" to be clear about what kind of averaging is being done.

* The last paragraph of Section 5 describes the transfer learning results as having "acceptable accuracies", but a model that has an F1 score of ~56 on a 3-way classification problem doesn't seem very useful. Similarly, Section 6 describes the transfer-learning F1 score of 56 as being "comparable" to the ~61-62 of the supervised case, but that's really quite a big difference.

---------------------------------------------------------------------------

============================================================================
REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Appropriateness (1-5): 5
Clarity (1-5): 4
Originality / Innovativeness (1-5): 2
Soundness / Correctness (1-5): 3
Meaningful Comparison (1-5): 5
Thoroughness (1-5): 3
Impact of Ideas or Results (1-5): 3
Recommendation (1-5): 3
Reviewer Confidence (1-5): 4

Detailed Comments
---------------------------------------------------------------------------

Section 4.1 - I am not sure how the code-mixed tweets were collected, since Twitter does not provide any definite language tag for code-mixed tweets; Twitter assigns either 'hi' or 'in' to mark these tweets. The authors also mention separating out Devanagari tweets. Please report the total number of tweets obtained after the Devanagari separation. Was the search API or the streaming API used? And CMI statistics on the data would be useful too.

Section 4.2 - The authors report the following: "For our experiments, we tested two variants of both the VecMap and MUSE cross-lingual embeddings: (1) embeddings aligned with an entirely unsupervised dictionary induction method and (2) embeddings aligned using numerals and common tokens like 'https' as a bilingual seed dictionary." My suggestion is to elaborate on these two variants. It is not clear how method (2) has been applied, what the seeds are, and how they are chosen.

Section 4.3 (Transfer Learning with Cross-lingual Embeddings) - The motivation is quite clear: since cross-lingual embeddings have seen several recent advances, it is interesting to see how such methods perform on code-mixed data. But I wonder why no details are provided about the experiments, for example how the seed set is chosen, whether CCA-style techniques have been used, etc.
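On the CMI statistics requested in the Section 4.1 comment above: a sketch of one common formulation, the Code-Mixing Index of Gambäck and Das (2014), computed from token-level language tags. The tag names are assumptions:

    from collections import Counter

    def cmi(tags):
        # CMI = 100 * (1 - max_lang_tokens / (n - u)), where n is the total
        # token count and u the count of language-independent ("univ") tokens
        n = len(tags)
        u = sum(1 for t in tags if t == "univ")
        counts = Counter(t for t in tags if t != "univ")
        if not counts:  # no language-specific tokens at all
            return 0.0
        return 100.0 * (n - u - max(counts.values())) / (n - u)

    print(cmi(["en", "en", "en"]))                # 0.0 (monolingual)
    print(cmi(["hi", "en", "hi", "en", "univ"]))  # 50.0 (heavily mixed)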
I like the paragraph in Section 5 ("A tweet like 'One India…'"). It gives some intuition about which method performs better and how. I suggest writing a separate Discussion section in this paper, with more comparative analysis of the methods. Since more mixing introduces more challenges, as can also be seen in your analysis, a discussion and comparison based on CMI would be good. Although there is no comparable word similarity dataset available for Hindi or Hinglish, some examples of how the trained embeddings capture semantics would be a nice addition and would make the paper more interesting to read. You could also plan a t-SNE-style visualization.
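For illustration, one way such a t-SNE visualization might be produced, assuming the aligned embeddings are available as {word: vector} dicts per language (all names and plotting choices are illustrative):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_aligned_spaces(en_vecs, hing_vecs, n=200):
        # take up to n words from each aligned space
        en_words, hi_words = list(en_vecs)[:n], list(hing_vecs)[:n]
        X = np.array([en_vecs[w] for w in en_words] +
                     [hing_vecs[w] for w in hi_words])
        Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
        colors = ["tab:blue"] * len(en_words) + ["tab:orange"] * len(hi_words)
        plt.scatter(Y[:, 0], Y[:, 1], c=colors, s=8)
        for w, (x, y) in zip(en_words + hi_words, Y):
            plt.annotate(w, (x, y), fontsize=6)
        plt.title("t-SNE of aligned English (blue) and Hinglish (orange) embeddings")
        plt.savefig("tsne_aligned.png", dpi=200)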
---------------------------------------------------------------------------

--
CS 2020 - https://www.softconf.com/lrec2020/CS2020