I'm a data scientist and AI engineer based in Amsterdam with a background in information retrieval and applied ML research.
The interesting problem is rarely which model to use—it's how to structure the problem so the model is actually useful. That means knowing when a language model is the right tool, how to build retrieval systems that retrieve the right thing, and how to diagnose why a model that worked in development stops working in production.
A sales or support agent needs two things to work well: a clear model of the conversation it is meant to have, and a reliable way to interpret what the user actually means at each step. I work with clients to build both: translating business logic into a structured system, and implementing the AI layer that makes it responsive to real language.
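The two parts can be sketched in a few lines: a scripted conversation model (states and transitions) plus an intent layer that interprets free text. Everything below is illustrative, not a client implementation: `classify_intent` is a keyword stand-in for an LLM call, and the flow, state, and intent names are hypothetical.

```python
# A conversation model as explicit state transitions: state -> intent -> next state.
FLOW = {
    "greeting": {"pricing": "quote", "complaint": "escalate"},
    "quote": {"accept": "close", "objection": "handle_objection"},
}

def classify_intent(utterance: str) -> str:
    """Stand-in for the LLM intent classifier; keyword rules for illustration."""
    text = utterance.lower()
    if "price" in text or "cost" in text:
        return "pricing"
    if "too expensive" in text:
        return "objection"
    return "unknown"

def next_state(state: str, utterance: str) -> str:
    """The model names the intent; the scripted flow decides what happens next."""
    intent = classify_intent(utterance)
    # Fallback behaviour: an unrecognised intent keeps the conversation in place.
    return FLOW.get(state, {}).get(intent, state)
```

The point of the split is that the language model only has to answer a narrow question (which intent is this?), while the business logic stays inspectable and editable without retraining anything.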
Deliverables:
The part of a RAG system most likely to fail is retrieval. A model can only work with what it's given: if the retrieved context is wrong or poorly ranked, the answer will be too, regardless of the model. I build retrieval pipelines where chunking strategy, embedding selection, and reranking are treated as the core engineering problems. The backend is built in Python with FastAPI, with support for local models (Ollama) and commercial providers (OpenAI, Mistral), and vector storage via ChromaDB or pgvector.
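The retrieval stage on its own looks roughly like this: chunk, embed, rank. This is a minimal sketch, with a bag-of-words cosine standing in for a real embedding model; in an actual pipeline the vectors would come from an embedding provider and the ranking from a vector store query (e.g. ChromaDB or pgvector), usually followed by a reranker.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector. A real system calls a model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Each of the three knobs (chunk size/overlap, the embedding, the ranking) is a design decision with measurable impact on answer quality, which is why I treat them as the engineering core rather than defaults to accept.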
Deliverables:
When a model degrades in production, there's usually a specific, isolatable reason — a shift in input distribution, a data quality issue, a pattern that didn't generalise. I run controlled ablation studies to identify and rank contributing factors, so fixes can be prioritised by evidence rather than intuition.
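A minimal version of that workflow: hold everything else fixed, disable one candidate factor at a time, and rank factors by the metric drop they cause. The `evaluate` function and the factor names below are placeholders for a real evaluation harness, not a specific client setup.

```python
def run_ablation(evaluate, factors: list[str]) -> list[tuple[str, float]]:
    """Rank candidate factors by how much the metric drops when each is removed.

    `evaluate` takes the set of enabled factors and returns a scalar metric.
    """
    baseline = evaluate(enabled=set(factors))
    impact = []
    for f in factors:
        score = evaluate(enabled=set(factors) - {f})
        impact.append((f, baseline - score))  # drop attributable to f alone
    # Largest drop first: the prioritised fix list, backed by evidence.
    return sorted(impact, key=lambda x: x[1], reverse=True)
```

The output is exactly what prioritisation needs: an ordered list of factors with an effect size attached to each, rather than a hunch about what went wrong.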
Deliverables:
Sales scripts encode something specific: a model of the conversation, and the logic for navigating it based on what the customer says. Translating that into a reliable automated system is mostly an engineering problem — the challenge isn't generating responses, it's making sure the right response is selected consistently.
This project involved taking a client's sales methodology and restructuring it into a configurable system: dialogue flows, branching logic, objection handling at different stages of the conversation, and fallback behaviour for unexpected inputs. The language model handles intent recognition at each step; the business logic handles everything else. Responses come from the script, not from the model.
Because the business logic is fully separated from the model layer, the same system can be configured for different clients and verticals without rebuilding anything from scratch — which matters for an agency managing multiple accounts.
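That separation can be illustrated with a toy config: the engine is generic, and each client supplies scripted responses keyed by conversation state and detected intent. All state names, intents, and replies below are made up for illustration; the actual system's configuration schema is richer.

```python
# Per-client configuration: scripted replies keyed by (state, intent).
# Swapping clients or verticals means swapping this dict, not the engine.
CLIENT_A = {
    ("greeting", "pricing"): "Our plans start at a fixed monthly rate.",
    ("greeting", "unknown"): "Could you tell me a bit more about what you need?",
}

def respond(config: dict, state: str, intent: str) -> str:
    """Select the scripted reply; the model never generates the response text."""
    # Fallback chain: state-specific catch-all, then a global handover line.
    fallback = config.get((state, "unknown"), "Let me connect you with a colleague.")
    return config.get((state, intent), fallback)
```

Because every reply is looked up rather than generated, the agency can review and sign off the exact wording a client's customers will see, per account, without touching code.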
A desktop application for semantic search over Zotero research libraries. Designed for researchers who need to navigate large literature collections and get answers with clear source attribution — not summaries that obscure where the information came from. Supports local LLMs for full data privacy. Open source.
View on GitHub →
An empirical study of whether multi-task learning can improve NLP model robustness when input or label distributions shift between training and deployment. The experimental setting was persuasion detection across two domains — Ukraine conflict and climate change — confirmed to be meaningfully different before any modelling began.
The central finding: multi-task learning helps most in low-resource, class-imbalanced settings, where the auxiliary task provides indirect data augmentation. In stable, data-rich settings, single-task models remain competitive.