paper – Pilsen Eyes

Jan Zelinka, Jakub Kanis

Our work deals with text-to-video sign language synthesis. Instead of producing video directly, we focused on producing a skeletal model. Our main goal was to design a fully automatic, end-to-end sign language synthesis system trained only on freely available data (daily TV broadcasts). We therefore excluded any manual video annotation. Furthermore, our approach does not even rely on any video segmentation. We investigated a proposed feed-forward transformer and a recurrent transformer. To improve the performance of our sequence-to-sequence transformer, soft non-monotonic attention was employed in the training process. The benefit of character-level features was compared with that of word-level features. We focused our experiments on a weather forecasting dataset in Czech Sign Language.

J. Zelinka and J. Kanis, “Neural Sign Language Synthesis: Words Are Our Glosses,” 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 2020, pp. 3384-3392, doi: 10.1109/WACV45572.2020.9093516.

Annotating a dataset for training a supervised machine learning algorithm demands considerable time and attention from the annotator. Our colleagues from the Department of Cybernetics (Miroslav Jirik, Ivan Gruber, and Milos Zelezny) developed a procedure for producing datasets for regression tasks from microscopic whole slide images. Together with colleagues from the Biomedical Center of Charles University, they prepared an open-source application for annotating such datasets with minimal demands on the expert's time. The work was published in Lecture Notes in Computer Science (SJR=0.283).

Jirik M. et al. (2020) MicrAnt: Towards Regression Task Oriented Annotation Tool for Microscopic Images. In: Lukić T., Barneva R., Brimkov V., Čomić L., Sladoje N. (eds) Combinatorial Image Analysis. IWCIA 2020. Lecture Notes in Computer Science, vol 12148. Springer, Cham. https://doi.org/10.1007/978-3-030-51002-2_15