Machine Learning
- Cross-model Alignment ●●○
- How similarly is structured information represented across models?
- Data: Any linguistically-motivated dataset, ideally multilingual (e.g., multilingual syntactic parsing with Universal Dependencies; Nivre et al., 2020) + a diverse selection of pre-trained language models.
- Method: Encode the same raw data + probe for linguistically-motivated tasks. Compare embeddings and probes for the same data across models.
- Evaluation: Quantitative metric used by the relevant dataset(s) / Qualitative analysis.
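One lightweight starting point for the embedding comparison is representational similarity analysis: compute a pairwise similarity structure over the same sentences for each model and correlate those structures, which sidesteps the fact that the two embedding spaces are not directly comparable. A minimal stdlib-only sketch (the vectors below are toy placeholders; in practice they would be sentence embeddings from the pre-trained models):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def similarity_structure(embeddings):
    """Upper-triangular pairwise cosine similarities, flattened."""
    n = len(embeddings)
    return [cosine(embeddings[i], embeddings[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sqrt(sum((a - mx) ** 2 for a in x))
    vy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def rsa_score(embeddings_a, embeddings_b):
    """Correlation between two models' similarity structures over
    the same inputs; high values mean aligned geometry even when
    the embedding spaces themselves differ."""
    return pearson(similarity_structure(embeddings_a),
                   similarity_structure(embeddings_b))
```

A score near 1 indicates the two models arrange the same data similarly, which can then be contrasted with probe-level agreement on the linguistic tasks.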
- Circuit-based Transfer Learning ●●●
- Can circuits in Transformer models be amplified for zero-shot transfer learning?
- Data: Any large-enough dataset, which allows for the isolation of a specific property (e.g., language, task).
- Method: Use mechanistic interpretability methods to extract the relevant circuits, and amplify them to induce positive transfer learning effects.
- Evaluation: Quantitative metrics depending on the task and dataset of choice.
Natural Language Processing
- Effects of Syntactic Complexity on LM Training Dynamics ●●○
- Which types of linguistic information provide the most generalizable information during LM training?
- Data: Universal Dependencies (Nivre et al., 2020) filtered by dependency types.
- Method: Train smaller LMs on data subsets with varying syntactic complexity.
- Evaluation: Perplexity and standard linguistic benchmark metrics.
- Bonus: Track the LM updates for different types of syntactic training data.
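Building the data subsets amounts to filtering Universal Dependencies treebanks by the relations they contain. A minimal stdlib-only sketch of CoNLL-U parsing and relation-based filtering (the blocked-relation set is a hypothetical example; DEPREL is the eighth column of the format):

```python
def parse_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of
    token rows (tab-separated fields), skipping comment lines."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(line.split("\t"))
    if current:
        sentences.append(current)
    return sentences

def filter_by_deprel(sentences, blocked):
    """Keep only sentences containing none of the blocked dependency
    relations (DEPREL, 0-based column 7, e.g. 'nsubj', 'acl:relcl')."""
    return [s for s in sentences
            if not any(tok[7] in blocked for tok in s)]
```

Varying the blocked set (e.g., removing all clausal relations) yields training corpora of graded syntactic complexity for the smaller LMs.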
- Syntactic Annotation of Lyrics ●○○
- How are syntactic characteristics reflected in lyrics from different types of music?
- Data: Various options available (e.g., DALI by Meseguer-Brocal et al., 2020).
- Method: Joint annotation to create a novel dataset + extensive linguistic analysis.
- Evaluation: Inter-annotator agreement / LAS / qualitative analysis.
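For the inter-annotator agreement part of the evaluation, Cohen's kappa over per-token labels is a standard choice. A minimal stdlib-only sketch for two annotators (the label sequences below are illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the
    same items: (observed - expected) / (1 - expected)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labelled at random
    # with their own observed label distributions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```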
- Prompting for Dependencies ●○○
- Can targeted prompting of pre-trained language models elicit graphical structures?
- Data: Universal Dependencies (Nivre et al., 2020).
- Method: Identify a scheme which allows prompting for dependency graphs from pre-trained masked language models.
- Evaluation: UAS/LAS.
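The UAS/LAS evaluation is straightforward to implement: unlabeled attachment score counts correct heads, labeled attachment score counts correct head-and-label pairs. A minimal sketch assuming trees are given as per-token (head index, relation) lists:

```python
def attachment_scores(gold, predicted):
    """UAS: fraction of tokens with the correct head;
    LAS: correct head AND correct dependency label.
    Both arguments are lists of (head_index, deprel) per token."""
    assert len(gold) == len(predicted)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return uas, las
```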
- Storyline Coherence ●●○
- Are long-sequence transformers able to accurately estimate the timeline coherence of a story?
- Data: Crawled fanfiction archive, including chapter and paragraph metadata.
- Method: Ranking, regression or classification using a pre-trained long-sequence transformer model.
- Alternative: Investigate the more recent ∞-former architecture (Martins et al., 2022).
- Evaluation: Rank correlation.
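For the rank-correlation evaluation, Spearman's rho (Pearson correlation over ranks, with ties averaged) compares the model's predicted coherence ordering against the gold ordering. A minimal stdlib-only sketch:

```python
def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```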
Accessibility
- Contextual Augmentative Communication ●●○
- Can LMs generate relevant outputs for AAC devices (text-to-speech systems for people with cognitive/motor disabilities) from the smallest possible set of keywords and context, potentially via iterative feedback?
- Data: Conversational dataset (potentially, synthetically abbreviated).
- Method: Standard prompt engineering, parameter-efficient fine-tuning, reinforcement learning from human feedback.
- Evaluation: Reconstruction loss / Qualitative evaluation.
- Matching Songs and Artworks ●●○
- Are matches in textual content or sentiment representative of audio-visual correspondence?
- Data: Behance Artistic Media Dataset (Wilber et al., 2017) + Million Song Dataset (Bertin-Mahieux et al., 2011).
- Method (1): Similarity measured by proxy of BAM captions and musiXmatch lyrics.
- Method (2): Discriminate between artwork and lyrics with matching/differing BAM emotion and MusicMood labels.
- Evaluation: Human preference / retrieval performance on MusicMood dev.
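Method (1)'s text-side similarity proxy could start as simply as a bag-of-words cosine between a BAM caption and a musiXmatch lyric excerpt, before moving to learned embeddings. A minimal stdlib-only sketch (whitespace tokenization and raw term counts are simplifying assumptions):

```python
from collections import Counter
from math import sqrt

def bow_cosine(text_a, text_b):
    """Cosine similarity between the token-count vectors of two texts."""
    ca = Counter(text_a.lower().split())
    cb = Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Ranking song-artwork pairs by this score gives a baseline to compare against the emotion-label discrimination of Method (2).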
Artsy
- Spectral Analysis of Music Embeddings ●●●
- Do neuron activation frequencies in generative models for music correspond to long/short-term structure?
- Data: Pre-trained Music Transformer model (Huang et al., 2018).
- Method: Apply the discrete cosine transform to the model's hidden representations (Tamkin et al., 2020; Müller-Eberstein et al., 2023).
- Evaluation: Qualitative evaluation / Perplexity over time given hidden representations filtered at different frequencies.
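The frequency-filtering step can be prototyped on a single neuron's activation trace over time: take the DCT, zero out coefficients above (or below) a cutoff, and invert. Low frequencies then capture long-term structure, high frequencies short-term structure. A minimal stdlib-only sketch using the plain DCT-II and its inverse:

```python
from math import cos, pi

def dct(signal):
    """DCT-II: X_k = sum_n x_n * cos(pi * k * (2n + 1) / (2N))."""
    n_len = len(signal)
    return [sum(x * cos(pi * k * (2 * n + 1) / (2 * n_len))
                for n, x in enumerate(signal))
            for k in range(n_len)]

def idct(coeffs):
    """Inverse of the DCT above (a scaled DCT-III)."""
    n_len = len(coeffs)
    return [(coeffs[0] + 2 * sum(c * cos(pi * k * (2 * n + 1) / (2 * n_len))
                                 for k, c in enumerate(coeffs) if k > 0)) / n_len
            for n in range(n_len)]

def lowpass(signal, keep):
    """Reconstruct the signal from its `keep` lowest-frequency components."""
    coeffs = dct(signal)
    return idct([c if k < keep else 0.0 for k, c in enumerate(coeffs)])
```

In practice one would apply this per hidden dimension across the sequence axis and feed the filtered representations back into the model for the perplexity-over-time evaluation.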
- Music Genre Classification ●○○
- Which features or combinations thereof best predict musical genre?
- Data: Million Song Dataset (Bertin-Mahieux et al., 2011) + tagtraum (Schreiber, 2015) or similar.
- Method: Favourite supervised learning algorithm.
- Evaluation: F1-Score.
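Since genre distributions are typically skewed, the macro-averaged F1 (unweighted mean of per-class F1) is the usual variant here. A minimal stdlib-only sketch:

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores."""
    classes = set(gold) | set(predicted)
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, predicted))
        fp = sum(g != c and p == c for g, p in zip(gold, predicted))
        fn = sum(g == c and p != c for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```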
- Departure Melody Generation ●●●
- Are railway station properties sufficient to conditionally generate appropriate departure melodies?
- Data: Custom MIDI dataset + crawling publicly available sources.
- Method: Sequence generation model conditioned on, e.g., station names and locations from crawled data.
- Evaluation: Qualitative evaluation / Output classification with regard to the desired statistics.
Multimodality
- Cross-modal Onomatopoeia ●●●
- How much cross-modal information can be inferred from textual onomatopoeia?
- Data: Japanese onomatopoeia dictionaries (e.g., learning resources) + annotated image data (e.g., COCO), speech data (e.g., Common Voice), or additional textual data (e.g., Multilingual Amazon Reviews).
- Method: Extract pairs of relevant onomatopoeia and images/speech/text (e.g., 「凸凹」↔ unpaved road, 「ヒラリ」↔ feather); investigate whether relations between onomatopoeia in one latent space hold in the other.
- Evaluation: Predictive accuracy / Top-k retrieval accuracy.
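Top-k retrieval accuracy only needs a similarity function between the two modalities: for each onomatopoeia query, rank all candidate images/speech/text items and check whether the paired one lands in the top k. A minimal sketch with a pluggable scoring function (the score used in the test is a toy stand-in for a learned cross-modal similarity):

```python
def top_k_accuracy(queries, candidates, gold_index, score, k):
    """Fraction of queries whose gold candidate is among the k
    highest-scoring candidates. gold_index[i] is the index of the
    correct candidate for queries[i]; score(q, c) is any similarity."""
    hits = 0
    for i, q in enumerate(queries):
        ranked = sorted(range(len(candidates)),
                        key=lambda j: score(q, candidates[j]),
                        reverse=True)
        hits += gold_index[i] in ranked[:k]
    return hits / len(queries)
```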
- International Space Station Sightings ●○○
- How accurately can a model fit the ISS' orbital path based on visibility data?
- Data: Sighting data of the International Space Station for 6.8k locations across 3 years from the Flyover service.
- Method: Favourite supervised time-series prediction algorithm.
- Addition: Incorporate cross-modal information sources to improve accuracy.
- Evaluation: Absolute difference evaluated for unseen locations.
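The evaluation reduces to mean absolute error on held-out locations, so a useful first step is a trivial baseline any time-series model should beat. A minimal stdlib-only sketch (a per-location mean predictor is an assumed baseline choice, not part of the original task description):

```python
def mean_absolute_error(gold, predicted):
    """Average absolute difference between gold and predicted values."""
    return sum(abs(g - p) for g, p in zip(gold, predicted)) / len(gold)

def mean_baseline(train_values, n_test):
    """Predict the training mean for every test point."""
    mean = sum(train_values) / len(train_values)
    return [mean] * n_test
```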
A few ideas that could be suitable as supervised projects. The dots (●○○) indicate the estimated difficulty, effort, and uncertainty. If you have a project that fits a similar profile, please feel free to reach out as well.