Majid Khoshrou

Data Scientist | Forecasting & Probabilistic Modeling | Generative AI

Projects

This is a selection of projects showcasing my work in machine learning, forecasting, and applied AI - including generative AI applications like Mr M, a personalized assistant grounded in my own research and project data.

Talk to Mr M

Mr M - Majid's AI Assistant | Generative AI | LLM + RAG | Knowledge Engineering

Mr M is a custom-built AI assistant grounded entirely in my own academic and professional materials - including research papers, personal website content, and relevant project documentation. Its purpose is to make my expertise, background, and ongoing work easily accessible through natural language queries. All underlying data sources (HTML, PDFs, external links) are processed, cleaned, and embedded using OpenAI's embedding models, then indexed with FAISS for retrieval.

One of the major challenges in building Mr M was designing a robust pipeline that could unify information from diverse formats while preserving context fidelity. I implemented automatic text extraction, chunking with overlap, and semantic embedding, all wrapped in a context-restricted QA system to ensure trustworthy answers. The result is an explainable, domain-specific AI system that can accurately respond to questions like "What are Majid’s latest projects?" or "Tell me about his academic research" - with traceability to the original sources.
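
The chunking-with-overlap and retrieval steps described above can be illustrated with a minimal sketch. It is not Mr M's actual code: the function names (`chunk_text`, `embed`, `retrieve`) and the chunk sizes are hypothetical, and a toy bag-of-words vector with a linear scan stands in for the OpenAI embeddings and the FAISS index so that the example runs with the standard library only.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into word-based chunks; neighbouring chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(word.strip(".,?!\"'").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k (score, chunk_index, chunk) triples for a question.

    A linear scan replaces the vector index here; the chunk index is kept
    so an answer can be traced back to its source passage.
    """
    q = embed(question)
    scored = [(cosine(q, embed(c)), i, c) for i, c in enumerate(chunks)]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
```

In a context-restricted QA setup, only the retrieved chunks would be passed to the language model, and the returned chunk indices are what make source traceability possible.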

Beyond the AI layer, the system is engineered for scalability and reliability. The application runs locally with Flask but is fully deployable to the cloud using AWS SAM and CloudFormation, following an Infrastructure-as-Code approach. This enables seamless deployment to AWS Lambda and API Gateway, with all resources versioned and automated. The architecture also includes Dockerized builds, production-ready environment configurations, and modular services for analytics, search, and security (Cloudflare Turnstile for bot protection and abuse prevention).

In short, Mr M is not just an AI experiment - it is a production-grade, cloud-native assistant that combines personal knowledge management with professional software engineering practices.

Short Term Forecasts

Improving Day-Ahead Load Forecasts Under Delayed Feedback - Alliander | Time Series | Energy Forecasting | Open Source

Since June 2024, I've been part of the Short-Term Energy Forecasts (STEF) team at Alliander, where I work on improving allocation forecasts for the day-ahead energy market. Unlike typical forecasting problems, our target variable (Total Load) only becomes available in delayed stages: first after 3 days, then revised at 5 and 8 days, and finalized after 10 days. This delay makes accurate training, evaluation, and monitoring significantly more complex.
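
To make the staged release concrete, here is a minimal sketch of how one might check which version of the target is available for a given delivery day. The 3/5/8/10-day delays come from the description above; the stage labels and the function names (`target_status`, `usable_training_days`) are hypothetical and not taken from the production pipeline.

```python
from datetime import date, timedelta

# Days after delivery -> stage of the published measurement.
# The delays follow the text above; the labels are illustrative assumptions.
RELEASE_STAGES = [(3, "preliminary"), (5, "revision 1"), (8, "revision 2"), (10, "final")]


def target_status(delivery_day, as_of):
    """Return the latest stage of the target available on `as_of`, or None."""
    age = (as_of - delivery_day).days
    status = None
    for delay, label in RELEASE_STAGES:
        if age >= delay:
            status = label
    return status


def usable_training_days(days, as_of, require_final=False):
    """Keep only days whose target is already published (optionally: finalized)."""
    wanted = {"final"} if require_final else {label for _, label in RELEASE_STAGES}
    return [d for d in days if target_status(d, as_of) in wanted]
```

A check like this makes the trade-off explicit: requiring finalized values gives cleaner labels but discards the most recent ten days, while accepting preliminary values keeps recent data at the cost of later revisions.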

To overcome these challenges, I fixed long-standing bugs in the model pipeline, introduced features that better capture seasonality and holiday effects, and implemented a new concept we call data balancing. Instead of training on an entire year of historical data, we now focus the training set on the periods most relevant to the forecast horizon, which improves signal quality while reducing noise. These improvements led to a ~30% increase in forecast accuracy and are projected to generate seven-figure annual cost savings. All enhancements have been contributed to OpenSTEF, Alliander’s open-source forecasting library, benefiting both internal teams and the wider energy forecasting community.
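
One possible reading of data balancing is sketched below. The text does not specify how relevance is measured, so the seasonal-window criterion, the `window_days` parameter, and the function names are all assumptions made for illustration; this is not the OpenSTEF implementation.

```python
from datetime import date, timedelta


def seasonal_distance(a, b):
    """Approximate circular distance in days between two calendar positions.

    Treats the year as 365 days, so results around leap days can be off by one.
    """
    diff = abs(a.timetuple().tm_yday - b.timetuple().tm_yday)
    return min(diff, 365 - diff)


def select_training_days(history, target_day, window_days=45):
    """Keep historical days that fall in the same part of the year as the target.

    Assumption: relevance is defined as seasonal proximity to the forecast day.
    """
    return [d for d in history if seasonal_distance(d, target_day) <= window_days]
```

With a full year of history and a 30-day window, this keeps roughly two months of data centred on the target's position in the year, including days on the other side of the year boundary.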

Risk Assessment

Operational Risk Framework for MV Grids - Alliander | Reliability Engineering | Decision Intelligence | Data Strategy

From January 2023 to June 2024, I worked as a Data Expert in the Power Flow Analysis team at Alliander, where I led the development of a data-driven framework to assess risk across medium voltage (MV) grid routes. My main focus was on analyzing capacity-related data and translating it into actionable insights to support both planning and operational decisions.

One of the biggest challenges was the lack of clear KPIs and a consistent definition of "risk" in MV networks, which made it difficult to prioritize interventions. To address this, I initiated and facilitated cross-functional discussions with stakeholders to align on a transparent risk scoring system, while also identifying measurable and widely available risk factors across the grid. The resulting model enabled systematic monitoring of grid vulnerability, informed more targeted maintenance and sensor deployment strategies, and established a shared language around reliability risk across teams.
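
A transparent scoring system of this kind is often expressed as a weighted combination of normalized factors. The sketch below shows that general pattern only: the factor names, the weights, and the functions `risk_score` and `rank_routes` are hypothetical and do not reflect the factors or weights actually agreed on at Alliander.

```python
def risk_score(factors, weights):
    """Weighted average of risk factors that are already scaled to [0, 1]."""
    missing = set(weights) - set(factors)
    if missing:
        raise KeyError(f"missing factors: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= factors[name] <= 1.0:
            raise ValueError(f"factor {name!r} must be scaled to [0, 1]")
    total_weight = sum(weights.values())
    return sum(weights[name] * factors[name] for name in weights) / total_weight


def rank_routes(routes, weights):
    """Return route ids sorted from highest to lowest risk score."""
    return sorted(routes, key=lambda r: risk_score(routes[r], weights), reverse=True)


# Hypothetical factor names and weights, for illustration only.
example_weights = {"loading": 2.0, "asset_age": 1.0, "redundancy_gap": 1.0}
example_routes = {
    "route_A": {"loading": 0.9, "asset_age": 0.5, "redundancy_gap": 0.2},
    "route_B": {"loading": 0.3, "asset_age": 0.4, "redundancy_gap": 0.1},
}
```

Keeping the weights in a plain, inspectable mapping is what makes such a score easy to discuss and adjust with stakeholders.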

AUV GMM Sampling

Real-Time Unsupervised Motion Learning for AUVs | Unsupervised Learning | Robotics | Marine Systems

During my Master's studies at FEUP, I worked as a researcher at the C2SR Lab, focusing on adaptive motion planning for coordinated fleets of Autonomous Underwater Vehicles (AUVs) in uncertain marine environments. My thesis introduced a real-time unsupervised learning framework based on Gaussian Mixture Models (GMMs), capable of incrementally estimating parameters without requiring a fixed number of clusters or a pre-existing dataset.
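
The idea of estimating mixture parameters incrementally, without a fixed number of clusters or a stored dataset, can be illustrated with a deliberately simplified sketch. It is not the thesis algorithm: it works in one dimension, uses hard assignments instead of soft responsibilities, and opens a new component whenever a sample is too far from all existing ones. The class name `OnlineGMM1D` and the parameters `novelty_sigmas` and `initial_var` are hypothetical.

```python
import math


class OnlineGMM1D:
    """Simplified online 1-D mixture: one pass, no stored data, growing component count."""

    def __init__(self, novelty_sigmas=3.0, initial_var=1.0):
        self.novelty_sigmas = novelty_sigmas
        self.initial_var = initial_var
        self.counts, self.means, self.m2s = [], [], []

    def _var(self, k):
        # Variance shrunk toward initial_var (one pseudo-observation) so that
        # young components are not over-confident.
        return (self.initial_var + self.m2s[k]) / self.counts[k]

    def update(self, x):
        """Absorb one sample and return the index of the component it joined."""
        best, best_z = None, float("inf")
        for k in range(len(self.means)):
            z = abs(x - self.means[k]) / math.sqrt(self._var(k))
            if z < best_z:
                best, best_z = k, z
        if best is None or best_z > self.novelty_sigmas:
            # Sample is novel: start a new component centred on it.
            self.counts.append(1)
            self.means.append(x)
            self.m2s.append(0.0)
            return len(self.means) - 1
        # Welford's running update of the mean and sum of squared deviations.
        self.counts[best] += 1
        delta = x - self.means[best]
        self.means[best] += delta / self.counts[best]
        self.m2s[best] += delta * (x - self.means[best])
        return best

    def parameters(self):
        """Return (weight, mean, variance) triples for all components."""
        total = sum(self.counts)
        return [
            (self.counts[k] / total, self.means[k], self._var(k))
            for k in range(len(self.means))
        ]
```

Because each component is summarized by a count, a mean, and a sum of squared deviations, the whole model stays small enough to be updated sample by sample on board a vehicle.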

The algorithm was deployed in a leader-follower fleet structure, enabling AUVs to dynamically adjust their formation in response to local variations in Conductivity-Temperature-Depth (CTD) measurements. By exchanging GMM parameters during coordinated resurfacing, the system quantified distributional differences using variational distance and adapted formation geometry accordingly - allowing the fleet to “zoom in” on regions with high environmental variability. Extensive simulations confirmed the method’s robustness and efficiency across both uniform and complex oceanographic conditions.
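
As an illustration of how two exchanged models could be compared, the sketch below approximates the variational (total variation) distance between two 1-D Gaussian mixtures by numerical integration. It assumes the convention 0.5 * ∫|p(x) - q(x)| dx, which maps the distance to [0, 1]; the thesis may use a different normalization or estimator, and the function names and the (weight, mean, variance) layout are assumptions.

```python
import math


def gmm_pdf(x, components):
    """Density of a 1-D Gaussian mixture given (weight, mean, variance) triples."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in components
    )


def variational_distance(p, q, n_steps=20000, pad_sigmas=8.0):
    """Approximate 0.5 * integral |p(x) - q(x)| dx with the midpoint rule."""
    comps = list(p) + list(q)
    # Integrate over a range wide enough to cover every component's tails.
    lo = min(m - pad_sigmas * math.sqrt(v) for _, m, v in comps)
    hi = max(m + pad_sigmas * math.sqrt(v) for _, m, v in comps)
    dx = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = lo + (i + 0.5) * dx
        total += abs(gmm_pdf(x, p) - gmm_pdf(x, q))
    return 0.5 * total * dx
```

A value near 0 means two vehicles observed nearly identical distributions, while a value near 1 signals strongly different conditions, which is the kind of signal a fleet could use to tighten its formation around a variable region.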