TOME project 3

News from April 2024

TOME Corpus and Transcription Work

  • The transcription of the TOME corpus using TRANSKRIBUS, under human supervision, is ongoing. The team is currently establishing transcription conventions and standards so that the process runs more smoothly and the output is consistent and of high quality.

Advances in Computational Approaches

  • The Computational Group continues to make progress with token-based embeddings, currently testing this method on a Czech corpus of pre-1989 samizdat journals. These efforts aim to refine techniques for analyzing linguistic data in historical and cultural contexts.
  • Vojtěch Kaše is exploring a methodology for moving between type-based embeddings (one embedding per lemma) and token-based embeddings (one embedding per individual instance). Together with Jana Švadlenková, he is developing a publicly accessible tool that will let users visualize the semantic neighborhood and proximity of lemmata from NOSCEMUS, as well as their diachronic development, tailored to each user's specific inputs and needs.
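
To make the distinction between type-based and token-based embeddings more concrete, here is a minimal sketch of one common way to relate the two: averaging the vectors of all instances of a lemma to get a single lemma-level vector, then ranking other lemmata by cosine similarity. This is only an illustration, not the group's actual method or tool. The function names, the two-dimensional vectors, and the example lemmata are invented for the sketch and are not taken from NOSCEMUS.

```python
from collections import defaultdict
from math import sqrt


def type_embeddings(token_vectors):
    """Average token-level vectors per lemma to get one type-level vector each.

    token_vectors: iterable of (lemma, vector) pairs, one pair per instance.
    """
    grouped = defaultdict(list)
    for lemma, vector in token_vectors:
        grouped[lemma].append(vector)
    return {
        lemma: [sum(component) / len(vectors) for component in zip(*vectors)]
        for lemma, vectors in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_lemmata(target, embeddings, k=3):
    """Return the k lemmata most similar to `target`, as (lemma, score) pairs."""
    scores = [
        (other, cosine(embeddings[target], vector))
        for other, vector in embeddings.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy, hand-made token vectors (hypothetical data, for illustration only).
tokens = [
    ("scientia", [1.0, 0.0]),
    ("scientia", [0.8, 0.2]),
    ("ars", [0.9, 0.1]),
    ("bellum", [0.0, 1.0]),
]

lemma_vectors = type_embeddings(tokens)
print(nearest_lemmata("scientia", lemma_vectors, k=1))
```

In this toy setup, the semantic neighborhood of a lemma is simply the list returned by `nearest_lemmata`. Running the same steps separately on texts from different periods would be one simple way to approximate diachronic change.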

Talk at the Stavelot VERITRACE Workshop

  • Petr Pavlas delivered a talk at the Stavelot VERITRACE internal workshop in early March, invited by Cornelis J. Schilt. His presentation covered key aspects of ongoing research, sparking engaging discussions. You can read a detailed report on the meeting.
