Machine Learning

Cross-model Alignment ●●○

How similarly is structured information represented across models?

  • Data: Any linguistically-motivated dataset, ideally multilingual (e.g., Universal Dependencies for multilingual syntactic parsing; Nivre et al., 2020) + a diverse selection of pre-trained language models.
  • Method: Encode the same raw data with each model + probe for linguistically-motivated tasks; compare embeddings and probes for the same data across models (see the sketch below).
  • Evaluation: Quantitative metric used by the relevant dataset(s) / Qualitative analysis.
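A minimal sketch of the comparison step, assuming two Hugging Face checkpoints (the model names below are interchangeable placeholders) and linear CKA as the similarity metric:

```python
# Minimal sketch: compare how two pre-trained LMs embed the same sentences,
# using linear CKA as the similarity metric. Model names are placeholders;
# any Hugging Face encoder checkpoints work.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

def encode(model_name, sentences):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # [batch, seq, dim]
    mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling

def linear_cka(x, y):
    # Linear CKA: invariant to rotation and isotropic scaling of either space.
    x, y = x - x.mean(0), y - y.mean(0)
    return np.linalg.norm(x.T @ y) ** 2 / (
        np.linalg.norm(x.T @ x) * np.linalg.norm(y.T @ y))

sentences = ["The cat sat on the mat.", "Dependency trees are graphs.",
             "Wer reitet so spät durch Nacht?", "La plume de ma tante."]
emb_a = encode("bert-base-multilingual-cased", sentences)
emb_b = encode("xlm-roberta-base", sentences)
print(f"linear CKA: {linear_cka(emb_a, emb_b):.3f}")
```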
Circuit-based Transfer Learning ●●●

Can circuits in Transformer models be amplified for zero-shot transfer learning?

  • Data: Any large-enough dataset, which allows for the isolation of a specific property (e.g., language, task).
  • Method: Use mechanistic interpretability methods to extract the relevant circuits, and amplify them to induce positive transfer learning effects (see the sketch below).
  • Evaluation: Quantitative metrics depending on the task and dataset of choice.
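A minimal sketch of the amplification step only, assuming the relevant circuit has already been located (the layer index and gain below are purely illustrative); it scales a whole attention layer's output in GPT-2 via a forward hook, whereas head-level control would require hooking in before the output projection:

```python
# Minimal sketch: scale one attention layer's output in GPT-2 via a forward
# hook. The layer index and gain are purely illustrative; a real experiment
# would first locate the circuit (e.g., via activation patching), and
# head-level control would hook in before the output projection instead.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

LAYER, GAIN = 5, 2.0  # hypothetical circuit location and amplification factor

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def amplify(module, inputs, output):
    # GPT2Attention returns a tuple; output[0] is the attention output.
    return (output[0] * GAIN,) + output[1:]

hook = model.transformer.h[LAYER].attn.register_forward_hook(amplify)
with torch.no_grad():
    ids = tok("The capital of France is", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0]))
hook.remove()
```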

Natural Language Processing

Effects of Syntactic Complexity on LM Training Dynamics ●●○

Which types of linguistic information provide the most generalizable signal during LM training?

  • Data: Universal Dependencies (Nivre et al., 2020) filtered by dependency types.
  • Method: Train smaller LMs on data subsets with varying syntactic complexity (see the filtering sketch below).
  • Evaluation: Perplexity and standard linguistic benchmark metrics.
  • Bonus: Track the LM updates for different types of syntactic training data.
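A minimal filtering sketch using the conllu package; the relation set used as the complexity criterion and the file path are illustrative:

```python
# Minimal sketch: split a UD treebank into training subsets by syntactic
# complexity, approximated here by the presence of clausal relations.
# The relation set and file path are illustrative.
from conllu import parse_incr

COMPLEX_RELATIONS = {"acl", "acl:relcl", "advcl", "ccomp", "xcomp"}

simple, complex_ = [], []
with open("en_ewt-ud-train.conllu", encoding="utf-8") as treebank:
    for sentence in parse_incr(treebank):
        relations = {token["deprel"] for token in sentence}
        subset = complex_ if relations & COMPLEX_RELATIONS else simple
        subset.append(sentence.metadata["text"])

print(f"{len(simple)} simple vs. {len(complex_)} complex sentences")
# Train one small LM per subset and compare perplexity / benchmark scores.
```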
Syntactic Annotation of Lyrics ●○○

How are syntactic characteristics reflected in lyrics from different types of music?

  • Data: Various options available (e.g., DALI by Meseguer-Brocal et al., 2020).
  • Method: Joint annotation to create a novel dataset + extensive linguistic analysis.
  • Evaluation: Inter-annotator agreement (see the sketch below) / LAS / qualitative analysis.
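A minimal agreement sketch over token-level dependency labels; the two annotation sequences are toy examples:

```python
# Minimal sketch: token-level inter-annotator agreement via Cohen's kappa.
# The two annotation sequences below are toy examples.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["nsubj", "root", "det", "obj", "punct"]
annotator_b = ["nsubj", "root", "det", "obl", "punct"]

print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```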
Prompting for Dependencies ●○○

Can targeted prompting of pre-trained language models elicit graphical structures?

  • Data: Universal Dependencies (Nivre et al., 2020).
  • Method: Identify a scheme that allows prompting for dependency graphs from pre-trained masked language models (see the sketch below).
  • Evaluation: UAS/LAS.
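One hypothetical starting point for such a scheme, phrased as a cloze query to a masked LM; a real scheme would constrain predictions to tokens of the sentence and decode full graphs rather than single arcs:

```python
# Minimal sketch: a cloze-style query for a token's syntactic head using a
# pre-trained masked LM. The template is a hypothetical starting point.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "the cat sat on the mat"
word = "cat"
mask = fill.tokenizer.mask_token
template = (f'In the sentence "{sentence}", '
            f'the word "{word}" depends on the word "{mask}".')

for prediction in fill(template, top_k=3):
    print(f'{prediction["token_str"]:>8}  {prediction["score"]:.3f}')
```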
Storyline Coherence ●●○

Are long-sequence transformers able to accurately estimate the timeline coherence of a story?

  • Data: Crawl fanfiction archive including chapter and paragraph metadata.
  • Method: Ranking, regression, or classification using a pre-trained long-sequence transformer model (see the sketch below).
  • Alternative: Investigate the more recent ∞-former architecture (Martins et al., 2022).
  • Evaluation: Rank correlation.
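A minimal scoring sketch with an untrained regression head on top of Longformer; the head would be fine-tuned on (story, coherence) pairs, and the gold scores below are placeholders (e.g., derived from shuffled vs. original chapter order):

```python
# Minimal sketch: score timeline coherence with a long-sequence encoder and
# a scalar head, evaluated via rank correlation. The head is untrained here
# and the stories/gold scores are placeholders.
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
encoder = AutoModel.from_pretrained("allenai/longformer-base-4096").eval()
head = torch.nn.Linear(encoder.config.hidden_size, 1)

def coherence(story):
    batch = tok(story, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        pooled = encoder(**batch).last_hidden_state[:, 0]  # first-token pooling
    return head(pooled).item()

stories = ["Once upon a time ...", "Ten years earlier, meanwhile ..."]
gold = [0.9, 0.4]
predicted = [coherence(s) for s in stories]
print(f"Spearman's rho: {spearmanr(predicted, gold).correlation:.2f}")
```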

Accessibility

Contextual Augmentative Communication ●●○

Can LMs generate relevant outputs for AAC devices (text-to-speech systems for people with cognitive/motor disabilities) from the smallest possible set of keywords and context, potentially via iterative feedback?

  • Data: Conversational dataset (potentially synthetically abbreviated).
  • Method: Standard prompt engineering, parameter-efficient fine-tuning, or reinforcement learning from human feedback (see the sketch below).
  • Evaluation: Reconstruction loss / Qualitative evaluation.
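A minimal prompting sketch for the keyword-expansion step; the model choice and prompt template are illustrative starting points:

```python
# Minimal sketch: expand a small keyword set plus context into a natural
# utterance via prompting. Model choice and template are illustrative.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

keywords = ["water", "please", "cold"]
context = "at the dinner table"
prompt = (f"Keywords: {', '.join(keywords)}. Context: {context}. "
          "Write the short, natural sentence the speaker intends:")

print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
```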
Matching Songs and Artworks ●●○

Are matches in textual content or sentiment representative of audio-visual correspondence?

Artsy

Spectral Analysis of Music Embeddings ●●●

Do neuron activation frequencies in generative models for music correspond to long/short-term structure?

  • Data: Pre-trained Music Transformer model (Huang et al., 2018).
  • Method: Apply the discrete cosine transform to the model's hidden representations (Tamkin et al., 2020; Müller-Eberstein et al., 2023); see the sketch below.
  • Evaluation: Qualitative evaluation / Perplexity over time given hidden representations filtered at different frequencies.
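A minimal sketch of the band-filtering step, using random stand-ins for Music Transformer activations:

```python
# Minimal sketch: decompose hidden states into temporal frequency bands with
# the discrete cosine transform. The activations are random stand-ins for
# Music Transformer hidden states of shape [timesteps, hidden_dim].
import numpy as np
from scipy.fft import dct, idct

hidden = np.random.randn(512, 256)  # placeholder activations [T, D]
spectrum = dct(hidden, axis=0, norm="ortho")  # per-dimension DCT over time

def band(spectrum, lo, hi):
    # Low coefficient indices capture slow, long-term structure; high
    # indices capture fast, short-term structure.
    filtered = np.zeros_like(spectrum)
    filtered[lo:hi] = spectrum[lo:hi]
    return idct(filtered, axis=0, norm="ortho")

long_term, short_term = band(spectrum, 0, 8), band(spectrum, 64, 512)
print(long_term.shape, short_term.shape)  # both [512, 256], different bands
```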
Music Genre Classification ●○○

Which features or combinations thereof best predict musical genre?

  • Data: Million Song Dataset (Bertin-Mahieux et al., 2011) + tagtraum (Schreiber, 2015) or similar.
  • Method: Favourite supervised learning algorithm (see the sketch below).
  • Evaluation: F1-Score.
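A minimal sketch with synthetic stand-ins for the joined feature/label tables:

```python
# Minimal sketch: supervised genre classification with synthetic stand-ins
# for Million Song Dataset features joined with tagtraum genre labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))    # e.g., tempo, loudness, timbre averages
y = rng.integers(0, 5, size=1000)  # e.g., five genre labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"macro F1: {f1_score(y_te, clf.predict(X_te), average='macro'):.2f}")
```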
Departure Melody Generation ●●●

Are railway station properties sufficient to conditionally generate appropriate departure melodies?

  • Data: Custom MIDI dataset + crawling publicly available sources.
  • Method: Sequence generation model conditioned on, e.g., station names and locations from crawled data.
  • Evaluation: Qualitative evaluation / Output classification with regard to desired statistics.

Multimodality

Cross-modal Onomatopoeia ●●●

How much cross-modal information can be inferred from textual onomatopoeia?

  • Data: Japanese onomatopoeia dictionaries (e.g., learning resources) + annotated image data (e.g., COCO), speech data (e.g., Common Voice), or additional textual data (e.g., Multilingual Amazon Reviews).
  • Method: Extract pairs of relevant onomatopoeia and images/speech/text (e.g., 「凸凹」↔ unpaved road, 「ヒラリ」↔ feather); investigate whether relations between onomatopoeia in one latent space hold in the other (see the retrieval sketch below).
  • Evaluation: Predictive accuracy / Top-k retrieval accuracy.
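A minimal retrieval sketch in CLIP's shared latent space; pairing the onomatopoeia with an English gloss is an assumption, and a multilingual encoder (or translated glosses) would be needed in practice:

```python
# Minimal sketch: top-k image retrieval for an onomatopoeia in CLIP's shared
# latent space. Using an English gloss for 「凸凹」 is an assumption; the
# image files are placeholders (e.g., drawn from COCO).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["road.jpg", "feather.jpg"]  # placeholder image files
images = [Image.open(p) for p in paths]
query = "a bumpy, uneven surface"    # gloss for 「凸凹」

inputs = processor(text=[query], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_text[0]  # similarity to each image
for score, idx in zip(*logits.softmax(-1).topk(k=2)):
    print(f"{paths[idx]}: {score:.3f}")
```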
International Space Station Sightings ●○○

How accurately can a model fit the ISS' orbital path based on visibility data?

  • Data: Sighting data of the International Space Station for 6.8k locations across 3 years from the Flyover service.
  • Method: Favourite supervised time-series prediction algorithm (see the sketch below).
  • Addition: Incorporate cross-modal information sources to improve accuracy.
  • Evaluation: Absolute difference evaluated for unseen locations.
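A minimal sketch with synthetic stand-ins for the crawled sighting data; a faithful evaluation would hold out entire locations rather than random rows:

```python
# Minimal sketch: predict minutes until the next ISS sighting from simple
# location/time features with gradient boosting. All arrays are synthetic
# stand-ins for the crawled Flyover data, not its actual schema.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.uniform(-60, 60, n),     # latitude
    rng.uniform(-180, 180, n),   # longitude
    rng.integers(1, 366, n),     # day of year
    rng.uniform(0, 3000, n),     # minutes since the last sighting
])
y = rng.uniform(0, 1440, n)      # minutes until the next sighting

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
mae = np.abs(model.predict(X_te) - y_te).mean()
print(f"mean absolute difference: {mae:.1f} minutes")
```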

A few ideas that could be suitable as supervised projects. The ●○○ markers indicate the estimated difficulty, effort, and uncertainty of each project. If you have a project in mind that fits a similar profile, please feel free to reach out as well.