Machine Learning
- Cross-model Alignment ●●○
- How similarly is structured information represented across models?
- Data: Any linguistically-motivated dataset, ideally multilingual (e.g., multilingual syntactic parsing with Universal Dependencies; Nivre et al., 2020) + a diverse selection of pre-trained language models.
- Method: Encode the same raw data + probe for linguistically-motivated tasks. Compare embeddings and probes for the same data across models.
- Evaluation: Quantitative metric used by the relevant dataset(s) / Qualitative analysis.
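One lightweight starting point for the embedding comparison is representational similarity analysis: compute a pairwise similarity structure over the same sentences for each model and correlate those structures, which sidesteps the fact that the two embedding spaces are not directly comparable. A minimal stdlib-only sketch (the vectors below are toy placeholders; in practice they would be sentence embeddings from the pre-trained models):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def similarity_structure(embeddings):
    """Upper-triangular pairwise cosine similarities, flattened."""
    n = len(embeddings)
    return [cosine(embeddings[i], embeddings[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sqrt(sum((a - mx) ** 2 for a in x))
    vy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def rsa_score(embeddings_a, embeddings_b):
    """Correlation between two models' similarity structures over
    the same inputs; high values mean aligned geometry even when
    the embedding spaces themselves differ."""
    return pearson(similarity_structure(embeddings_a),
                   similarity_structure(embeddings_b))
```

A score near 1 indicates the two models arrange the same data similarly, which can then be contrasted with probe-level agreement on the linguistic tasks.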
- Circuit-based Transfer Learning ●●●
- Can circuits in Transformer models be amplified for zero-shot transfer learning?
- Data: Any large-enough dataset, which allows for the isolation of a specific property (e.g., language, task).
- Method: Use mechanistic interpretability methods to extract the relevant circuits, and amplify them to induce positive transfer learning effects.
- Evaluation: Quantitative metrics depending on the task and dataset of choice.
Natural Language Processing
- Effects of Syntactic Complexity on LM Training Dynamics ●●○
- Which types of linguistic information provide the most generalizable information during LM training?
- Data: Universal Dependencies (Nivre et al., 2020) filtered by dependency types.
- Method: Train smaller LMs on data subsets with varying syntactic complexity.
- Evaluation: Perplexity and standard linguistic benchmark metrics.
- Bonus: Track the LM updates for different types of syntactic training data.
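Building the data subsets amounts to filtering Universal Dependencies treebanks by the relations they contain. A minimal stdlib-only sketch of CoNLL-U parsing and relation-based filtering (the blocked-relation set is a hypothetical example; DEPREL is the eighth column of the format):

```python
def parse_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of
    token rows (tab-separated fields), skipping comment lines."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences

def filter_by_deprel(sentences, blocked):
    """Keep only sentences containing none of the blocked dependency
    relations (DEPREL, 0-based column 7, e.g. 'nsubj', 'acl:relcl')."""
    return [s for s in sentences
            if not any(tok[7] in blocked for tok in s)]
```

Varying the blocked set (e.g., removing all clausal relations) yields training corpora of graded syntactic complexity for the smaller LMs.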
- Syntactic Annotation of Lyrics ●○○
- How are syntactic characteristics reflected in lyrics from different types of music?
- Data: Various options available (e.g., DALI by Meseguer-Brocal et al., 2020).
- Method: Joint annotation to create a novel dataset + extensive linguistic analysis.
- Evaluation: Inter-annotator agreement / LAS / qualitative analysis.
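For the inter-annotator agreement part of the evaluation, Cohen's kappa over per-token labels is a standard choice. A minimal stdlib-only sketch for two annotators (the label sequences below are illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the
    same items: (observed - expected) / (1 - expected)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labelled at random
    # with their own observed label distributions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```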
- Prompting for Dependencies ●○○
- Can targeted prompting of pre-trained language models elicit graphical structures?
- Data: Universal Dependencies (Nivre et al., 2020).
- Method: Identify a scheme which allows prompting for dependency graphs from pre-trained masked language models.
- Evaluation: UAS/LAS.
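The UAS/LAS evaluation is straightforward to implement: unlabeled attachment score counts correct heads, labeled attachment score counts correct head-and-label pairs. A minimal sketch assuming trees are given as per-token (head index, relation) lists:

```python
def attachment_scores(gold, predicted):
    """UAS: fraction of tokens with the correct head;
    LAS: correct head AND correct dependency label.
    Both arguments are lists of (head_index, deprel) per token."""
    assert len(gold) == len(predicted)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return uas, las
```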
- Storyline Coherence ●●○
- Are long-sequence transformers able to accurately estimate the timeline coherence of a story?
- Data: Crawled fanfiction archive, including chapter and paragraph metadata.
- Method: Ranking, regression or classification using a pre-trained long-sequence transformer model.
- Alternative: Investigate the more recent ∞-former architecture (Martins et al., 2022).
- Evaluation: Rank correlation.
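For the rank-correlation evaluation, Spearman's rho (Pearson correlation over ranks, with ties averaged) compares the model's predicted coherence ordering against the gold ordering. A minimal stdlib-only sketch:

```python
def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```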
Accessibility
- Contextual Augmentative Communication ●●○
- Can LMs generate relevant outputs for AAC devices (text-to-speech systems for people with cognitive/motor disabilities) from the smallest possible set of keywords and context, potentially via iterative feedback?
- Data: Conversational dataset (potentially, synthetically abbreviated).
- Method: Standard prompt engineering, parameter-efficient fine-tuning, reinforcement learning from human feedback.
- Evaluation: Reconstruction loss / Qualitative evaluation.
- Matching Songs and Artworks ●●○
- Are matches in textual content or sentiment representative of audio-visual correspondence?
- Data: Behance Artistic Media Dataset (Wilber et al., 2017) + Million Song Dataset (Bertin-Mahieux et al., 2011).
- Method (1): Similarity measured by proxy of BAM captions and musiXmatch lyrics.
- Method (2): Discriminate between artwork and lyrics with matching/differing BAM emotion and MusicMood labels.
- Evaluation: Human preference / retrieval performance on MusicMood dev.
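Method (1)'s text-side similarity proxy could start as simply as a bag-of-words cosine between a BAM caption and a musiXmatch lyric excerpt, before moving to learned embeddings. A minimal stdlib-only sketch (whitespace tokenization and raw term counts are simplifying assumptions):

```python
from collections import Counter
from math import sqrt

def bow_cosine(text_a, text_b):
    """Cosine similarity between the token-count vectors of two texts."""
    ca = Counter(text_a.lower().split())
    cb = Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Ranking song-artwork pairs by this score gives a baseline to compare against the emotion-label discrimination of Method (2).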
Artsy
- Spectral Analysis of Music Embeddings ●●●
- Do neuron activation frequencies in generative models for music correspond to long/short-term structure?
- Data: Pre-trained Music Transformer model (Huang et al., 2018).
- Method: Apply the discrete cosine transform to the model's hidden representations (Tamkin et al., 2020; Müller-Eberstein et al., 2023).
- Evaluation: Qualitative evaluation / Perplexity over time given hidden representations filtered at different frequencies.
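The frequency-filtering step can be prototyped on a single neuron's activation trace over time: take the DCT, zero out coefficients above (or below) a cutoff, and invert. Low frequencies then capture long-term structure, high frequencies short-term structure. A minimal stdlib-only sketch using the plain DCT-II and its inverse:

```python
from math import cos, pi

def dct(signal):
    """DCT-II: X_k = sum_n x_n * cos(pi * k * (2n + 1) / (2N))."""
    n_len = len(signal)
    return [sum(x * cos(pi * k * (2 * n + 1) / (2 * n_len))
                for n, x in enumerate(signal))
            for k in range(n_len)]

def idct(coeffs):
    """Inverse of the DCT above (a scaled DCT-III)."""
    n_len = len(coeffs)
    return [(coeffs[0] + 2 * sum(c * cos(pi * k * (2 * n + 1) / (2 * n_len))
                                 for k, c in enumerate(coeffs) if k > 0)) / n_len
            for n in range(n_len)]

def lowpass(signal, keep):
    """Reconstruct the signal from its `keep` lowest-frequency components."""
    coeffs = dct(signal)
    return idct([c if k < keep else 0.0 for k, c in enumerate(coeffs)])
```

In practice one would apply this per hidden dimension across the sequence axis and feed the filtered representations back into the model for the perplexity-over-time evaluation.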
- Music Genre Classification ●○○
- Which features or combinations thereof best predict musical genre?
- Data: Million Song Dataset (Bertin-Mahieux et al., 2011) + tagtraum (Schreiber, 2015) or similar.
- Method: Favourite supervised learning algorithm.
- Evaluation: F1-Score.
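Since genre distributions are typically skewed, the macro-averaged F1 (unweighted mean of per-class F1) is the usual variant here. A minimal stdlib-only sketch:

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores."""
    classes = set(gold) | set(predicted)
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, predicted))
        fp = sum(g != c and p == c for g, p in zip(gold, predicted))
        fn = sum(g == c and p != c for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```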
- Departure Melody Generation ●●●
- Are railway station properties sufficient to conditionally generate appropriate departure melodies?
- Data: Custom MIDI dataset + crawling publicly available sources.
- Method: Sequence generation model conditioned on, e.g., station names and locations from crawled data.
- Evaluation: Qualitative evaluation / Output classification with regard to the desired statistics.
Multimodality
- Cross-modal Onomatopoeia ●●●
- How much cross-modal information can be inferred from textual onomatopoeia?
- Data: Japanese onomatopoeia dictionaries (e.g., learning resources) + annotated image data (e.g., COCO), speech data (e.g., Common Voice), or additional textual data (e.g., Multilingual Amazon Reviews).
- Method: Extract pairs of relevant onomatopoeia and images/speech/text (e.g., 「凸凹」↔ unpaved road, 「ヒラリ」↔ feather); investigate whether relations between onomatopoeia in one latent space hold in the other.
- Evaluation: Predictive accuracy / Top-k retrieval accuracy.
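Top-k retrieval accuracy only needs a similarity function between the two modalities: for each onomatopoeia query, rank all candidate images/speech/text items and check whether the paired one lands in the top k. A minimal sketch with a pluggable scoring function (the score used in the test is a toy stand-in for a learned cross-modal similarity):

```python
def top_k_accuracy(queries, candidates, gold_index, score, k):
    """Fraction of queries whose gold candidate is among the k
    highest-scoring candidates. gold_index[i] is the index of the
    correct candidate for queries[i]; score(q, c) is any similarity."""
    hits = 0
    for i, q in enumerate(queries):
        ranked = sorted(range(len(candidates)),
                        key=lambda j: score(q, candidates[j]),
                        reverse=True)
        hits += gold_index[i] in ranked[:k]
    return hits / len(queries)
```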
- International Space Station Sightings ●○○
- How accurately can a model fit the ISS' orbital path based on visibility data?
- Data: Sighting data of the International Space Station for 6.8k locations across 3 years from the Flyover service.
- Method: Favourite supervised time-series prediction algorithm.
- Addition: Incorporate cross-modal information sources to improve accuracy.
- Evaluation: Absolute difference evaluated for unseen locations.
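The evaluation reduces to mean absolute error on held-out locations, so a useful first step is a trivial baseline any time-series model should beat. A minimal stdlib-only sketch (a per-location mean predictor is an assumed baseline choice, not part of the original task description):

```python
def mean_absolute_error(gold, predicted):
    """Average absolute difference between gold and predicted values."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)

def mean_baseline(train_values, n_test):
    """Predict the training mean for every test point."""
    mean = sum(train_values) / len(train_values)
    return [mean] * n_test
```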
A few ideas that could be suitable as supervised projects. The dots (●○○) indicate the estimated difficulty, effort, and uncertainty. If you have a project that fits a similar profile, please feel free to reach out as well.