Nusrat Lia

I picked this photo since it has Schrödinger’s dataset vibes: simultaneously well-lit and shadowed entries.

Hi, I'm Lia

I am a researcher and AI engineer working in natural language processing, human-centered applications, and secure decentralized systems, with experience in large-scale software and LLM development.

Currently, I am a Research Intern at Aramco-Ithra, collaborating with global institutions including WHO, UN, Stony Brook Medicine, University of Washington, University of Geneva, Western University, University of Tokyo, and research institutes from 35 countries. Previously, I worked with the United States Department of Justice - ICITAP, where I designed a platform for secure crowdsourced wildlife crime reporting in low-connectivity areas, using custom NLP pipelines and geospatial and predictive models to analyze environmental and crime data. There, I also worked on a gaming application to educate international youth about wildlife crime and biodiversity conservation.

I am currently a final-year software engineering undergraduate at the University of Dhaka, where I work in the BARTA Lab. There, I focus on low-resource and small-language-model development, designing datasets, techniques, and educational resources. I also serve as an instructor at BARTA, where I teach a language model building course. At BanglaLLM, I work with amazing researchers and developers building open-source language models for the low-resource Bangla language. This year, I am also serving as an Instructor for the International AI Olympiad, teaching AI Recommender Systems.

As a Contractual LLM Engineer at Global MicroLearning Solutions, I am designing and deploying large-scale LLM solutions that support engineering teams in the field with intelligent, context-aware systems.

Entrepreneurially, I am a founding researcher of Perspectivity - Drishtikon, the first real-time AI news aggregator for Bangla, featuring multi-axis bias detection, news summarization, and interactive bots that empower citizens with nuanced, research-backed insights.

And... I paint. Some like to call me an artist, but I am just someone who expresses herself this way.

Courses & Teaching

Educational courses I designed, developed, and teach as an instructor.

Building Small Language Model: From Foundations to Bangla Financial Text Generation

Learn the core principles and techniques of language models while building a small Bangla language model that generates financial articles. The course is taught in person at the Institute of Information Technology, University of Dhaka.

Module 1: Introduction to Language Models

History of NLP • Transformer Architecture • Tokenization

Module Slides:
Module 1: Introduction to Language Models

Module 2: Data Preparation Pipeline

Text Preprocessing • Underfitting, Overfitting & Just-Right Fitting • Tokenization Fundamentals • Bangla Tokenization Challenges

Module Slides:
Module 2: Data Preparation Pipeline

Module 3: Transformer Architecture

Token and positional embeddings • Self-attention mechanism with causal masking • Multi-Head Attention • Cross-Attention

Module Slides:
Module 3: Transformer Architecture
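
To give a flavor of the "self-attention mechanism with causal masking" topic in Module 3, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the course slides; the function names and the toy input are assumptions made for this example. The causal mask is implemented by scoring only the current and earlier positions, which has the same effect as setting the scores of future positions to negative infinity before the softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors, one per token.
    Token i may only attend to tokens 0..i.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask: only positions 0..i are scored, so future
        # tokens contribute nothing to the output for token i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: three tokens with 2-dimensional vectors (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_self_attention(x, x, x)
```

In this toy run the first token can only see itself, so its output equals its own value vector; later tokens blend the vectors of everything up to their own position.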

Module 4: Model Components

Feed-Forward Networks • Multi-Layer Perceptrons (MLPs) • Forward Pass • Gradient Explosion / Vanishing

Module Slides:
Module 4: Model Components

Module 5: Training, Evaluation, Generation

AdamW Optimizer Configuration • Learning Rate Scheduling • Gradient Clipping • Training Loop Implementation • Autoregressive Decoding • Temperature Tuning

Module Slides:
Module 5: Training, Evaluation, Generation
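
As a small illustration of the "Temperature Tuning" and "Autoregressive Decoding" topics in Module 5, the sketch below shows how a temperature value reshapes next-token probabilities before sampling. It is not taken from the course materials; the function names and the example logits are assumptions chosen for this illustration.

```python
import math
import random


def temperature_softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
sharp = temperature_softmax(logits, temperature=0.5)  # more peaked
flat = temperature_softmax(logits, temperature=2.0)   # more uniform
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it out, which is the usual trade-off between predictable and varied generated text.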

Research

Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles

Nusrat Jahan Lia; Shubhashis Roy Dipta, PhD; Dr. Abdullah Khan Zehady; Naymul Islam; Madhusodan Chakraborty; Abdullah Al Wasif
Associated Organizations:
University of Dhaka
University of Maryland
Perspectivity
Accepted: AACL IJCNLP BLP; To be published: ACL Anthology, 2025

Exploring Cross-Lingual Knowledge Transfer via Transliteration-Based MLM Fine-Tuning for Critically Low-resource Chakma Language

Adity Khisa; Nusrat Jahan Lia; Tasnim Mahfuz Nafis; Zarif Masud; Tanzir Pial, PhD; Dr. Shebuti Rayana; Dr. Ahmedul Kabir
Associated Organizations:
University of Dhaka
BARTA
State University of New York, Old Westbury
Stony Brook University
Toronto Metropolitan University
Accepted: AACL IJCNLP BLP; To be published: ACL Anthology, 2025

Adult Attitudes about School Smartphone Bans: A Global Survey of 35 Countries

Dimitri A. Christakis, MD, MPH; Nusrat Jahan Lia; Lauren Hale, PhD; Md Mamunur Rashid
Associated Organizations:
Renaissance School of Medicine, Stony Brook University
Seattle Children's Research Institute, Seattle Children's Hospital
University of Dhaka
ITHRA, King Abdulaziz Center for World Culture, Dammam
Accepted, in press: The Journal of the American Medical Association, 2025

Does Gaming Disorder Symptom Status Predict Poorer Sleep Quality?

Nusrat Jahan Lia; Lauren Hale, PhD; Justin Thomas, PhD; Dimitri A. Christakis, MD, MPH; Mamunar Rashid, PhD
Associated Organizations:
University of Dhaka
Renaissance School of Medicine, Stony Brook University
Seattle Children's Research Institute, Seattle Children's Hospital
ITHRA, King Abdulaziz Center for World Culture, Dammam
Accepted: World Sleep 2025, Singapore, 2025

Does Spending "Too Much Time Online" Predict Sleep Health and Mental Health?

Lauren Hale, PhD; Nusrat Jahan Lia; Sohailul Islam Alvi; Gina Marie Mathew, PhD; Dimitri A. Christakis, MD, MPH; Mamunar Rashid, PhD; Yasmin Aljedawi; Melisa Valle, PhD
Associated Organizations:
Renaissance School of Medicine, Stony Brook University
Seattle Children's Research Institute, Seattle Children's Hospital
University of Dhaka
ITHRA, King Abdulaziz Center for World Culture, Dammam
Accepted: Association of Professional Sleep Societies, Seattle, Washington, USA, 2025

International Public Opinion on Digital Media Use for Youth and Schools

Lauren Hale, PhD; Nusrat Jahan Lia; Sohailul Islam Alvi; Gina Marie Mathew, PhD; Dimitri A. Christakis, MD, MPH; Mamunar Rashid, PhD; Yasmin Aljedawi; Melisa Valle, PhD
Associated Organizations:
Renaissance School of Medicine, Stony Brook University
Seattle Children's Research Institute, Seattle Children's Hospital
University of Dhaka
ITHRA, King Abdulaziz Center for World Culture, Dammam
Accepted: Digital Media and Developing Minds International Scientific Congress, Washington DC, 2025

Evaluating the inclusivity and accessibility of educational apps (games) on the Google Play Store.

Nusrat Jahan Lia; Nahida Sultana; Sabrina Shajin Alam, PhD; Mamunar Rashid, PhD; Aymaan Islam
Associated Organizations:
University of Dhaka
Western University, CA
ITHRA, King Abdulaziz Center for World Culture, Dammam
Reviewed; In Progress: American Educational Research Association (AERA), 2026

A Comprehensive Evaluation of the Educational Apps in the Google Play Store: An Exploratory Study

Nahida Sultana; Sabrina Shajin Alam, PhD; Nusrat Jahan Lia; Mamunar Rashid, PhD; Aymaan Islam
Associated Organizations:
University of Dhaka
Western University, CA
ITHRA, King Abdulaziz Center for World Culture, Dammam
Reviewed; In Progress: American Educational Research Association (AERA), 2026

Work Experience

Aramco-Ithra
Academic
Current
Research Intern
Mar 2025 - Present
  • Led projects in collaboration with WHO, Stony Brook Medicine, McGill University, University of Geneva, University of Tokyo, and other institutions from around 35 countries.
  • Engineered Knowledge Graphs integrating worldwide data on Digital Health and Technology Usage to enable semantic analysis and cross-country insights.
  • Developed coding schemes and agentic LLMs to evaluate educational games on the Google Play Store.
  • Co-authored six research studies on digital well-being, education, and technology usage.
International AI Olympiad
Academic
Current
Instructor: AI Recommender Systems
Sep 2025 - Present
  • Teaching collaborative and content-based recommendation systems, including similarity metrics, feature engineering, hybrid methods, matrix factorization, and deep learning approaches for personalized recommendations.
Global MicroLearning Solutions
Industrial
Current
Contractual LLM Engineer
Aug 2025 - Present
  • Building large-scale LLM and AI solutions that support engineering teams in the field.
United States Department of Justice - ICITAP
Government-Affiliated
WPA Software Engineer Intern
Jul 2024 - Dec 2024
  • Designed a mobile-first, crowdsourced wildlife crime reporting platform tailored for rural and low-connectivity environments in the Sundarbans.
  • Handled sparse and noisy community reports by developing custom NLP pipelines and geospatial models optimized for low-resource inputs.
  • Leveraged machine learning to analyze spatial crime data and forecast environmental degradation hotspots.
  • Designed and developed a gaming application to educate Bangladeshi and international youth about wildlife and biodiversity conservation, with an emphasis on long-term stewardship ethics.
Bangla Artificial Intelligence Research, Tools and Application (BARTA)
Academic
Current
Undergrad Researcher
Oct 2024 - Present
  • Co-authored the first paper on Chakma-language knowledge transfer using MLM. Developing datasets and techniques for models of indigenous languages such as Chakma.
  • Designed and directed a small-language-model building course as an instructor at BARTA.
  • Developed an Educational Resource Allocation AI agent for the Government of Bangladesh.
Perspectivity - Drishtikon
Entrepreneurial
Current
Founding Researcher
Sep 2024 - Present
  • The first news aggregation AI agent for Bangla news, with plans for future expansion to other low-resource languages.
  • Offers research-backed, multi-axis bias analysis to empower citizens to make informed decisions.
  • Includes a built-in news-summarizer agent and an interactive chatbot for exploring news in detail.
  • Shows local and international news trends in real time.
BanglaLLM
Academic
Current
Member
Jul 2024 - Present
  • BanglaLLM introduced many of the first open-source Bangla language models.

Selected Projects

Bangla - Small Language Model
completed
AI/ML
2025
A small, GPT-style Bangla language model built and trained from scratch on financial news articles using custom tokenization and efficient dataset preprocessing.
LLM
PyTorch
Hugging Face
Llama-3.2-Medico-BD
completed
AI/ML
2024
Fine-tuned version of Llama-3.2-3B for medical product information retrieval in Bangladesh, optimized for brand/generic medicine recommendations and dosage guidance.
LLM
Fine-Tuning
QLoRA
PorTech: A Blockchain-based shipping supply chain solution
completed
Blockchain
2024
A hybrid blockchain-based platform that optimizes foreign trade and shipping supply chain document workflows, providing seamless, secure transactions, precise access control, real-time tracking, and enhanced efficiency.
Zero-Knowledge Proof
IPFS
Solidity
+1 more
DefTax: A Blockchain-based E-Governance System
completed
Blockchain
2023
A hybrid Blockchain-based Government-to-Corporation integrated transaction tracking e-governance system with fine-grained access control that facilitates secure and accurate tax collection.
Hyperledger Fabric
Zero-Knowledge Proof
Hyperledger Indy
+3 more
StackOverflow-Lite: A Microservices-based StackOverflow Clone
completed
Web Development
2024
A full-stack implementation of StackOverflow, decomposed into microservices, containerized, and orchestrated.
React
FastAPI
Python
+6 more
MEG: Mars Exploration Game
completed
EdTech
AI/ML
Game Development
VR/AR
2024
An immersive 3D game that simulates Mars exploration with realistic physics and educational content, along with real-time weather updates predicted using ML models.
Unity
C#
Python
+2 more
Kook-wa: 3D Action-Adventure Game
completed
Game Development
2024
A 3D game built in Unity, where players explore a procedurally generated mystical island filled with AI-driven enemies, environmental challenges, and magical artifacts.
Unity
C#
3D Rendering
+2 more
BlockChainEd: Educational Blockchain Implementation
completed
Blockchain
EdTech
Systems Programming
2023
An educational project that empowers users to create and explore their own blockchain networks.
C++
Cryptography
Data Structures

Blogs

Just my random thoughts, opinions, curiosities, and questions. My blogs are pretty conversational... I write just how I would speak. It's a rubber-ducking session for me.

The Art of Knowing When to Stop: Early Stopping in AI and Life

Consider stopping soon. How many times have we all needed that exact warning in our lives?

Aug 20, 2025

The Canvas of Resistance: On Differentiation, Algorithms, and the Mathematics of Justice

Thinking aloud: in a world that systematically flattens difference into hierarchy, what does it mean to be a differentiator?

Aug 7, 2025

Breaking Down Language Barriers: How AI Can Learn to Fix Bangla Grammar

Exploring how synthetic data and AI can bridge the grammar gap for Bangla speakers.

Aug 3, 2025

How AI is Learning to Spot Bias (And Why It Matters More Than Ever)

We’re living through what researchers call “hyperpartisan” news: content written with such extreme ideological manipulation that it barely resembles reality.

Jul 24, 2025

Can Diffusion Models Reshape Privacy Boundaries?

Exploring How Diffusion Models Challenge and Redefine Privacy in AI-Generated Data

Mar 12, 2025

Life Events

A limited subset of spatiotemporal phenomena was logged and archived.