PhD/MS Opening — Online Game Theory with Partial Observability

About the Project

Classical game theory assumes that players know the rules of the game, even if they do not know each other's moves. In many real adversarial settings, this assumption breaks down: an agent must act without knowing what options are even available to the opponent. This project studies zero-sum repeated games where each round involves a simultaneous move under imperfect information, states are fully revealed after each round, but the opponent's feasible action set is unknown at decision time and only revealed ex-post. This compound uncertainty, over both actions and the space of possible actions, falls outside the scope of both standard extensive-form game theory and classical online learning, creating a rich and largely unexplored theoretical frontier. The formal model we work with is called an Online Imperfect-Information Extensive-Form Game (OIIEFG), and the questions it raises are both theoretically deep and practically urgent.

The project draws on three converging lines of research. From extensive-form game theory, we inherit the tools of Counterfactual Regret Minimization (CFR) and the goal of converging to Nash and Subgame Perfect Nash Equilibria. From online learning and no-regret algorithms, we borrow Online Mirror Descent and EXP3 to handle the adversarial, sequential nature of the interaction. And from the recent literature on Stackelberg Security Games (Balcan et al. 2015, Blum et al. 2014), we take inspiration from how regret-based techniques handle unknown opponent behavior in repeated adversarial settings — while moving beyond the leader-follower Stackelberg structure to simultaneous-move games where both action sets and utilities are partially hidden. The central algorithmic contribution will be Online CFR with lazy infoset expansion (O-CFR), a new algorithm that expands the game tree incrementally as new opponent actions are discovered, with provable Revelation-Adjusted Regret guarantees and convergence to equilibrium.

Applications of this framework are wide-ranging. In adversarial cybersecurity, a defender must respond to an attacker whose exploit toolkit is revealed only through attempted intrusions. In operational planning, a decision-maker faces an adversary whose available maneuvers are unknown until observed in the field. In adaptive auction markets, a bidder does not know the full set of instruments available to competing algorithms. In each case, the key challenge is the same: learning to play well in a game whose rules are themselves being discovered through play. The project is funded by the ARG grant and is looking for motivated PhD and MS students who want to work at the intersection of theory, algorithms, and impactful applications.

Open Positions

🎓

PhD Position

Own the theoretical foundations: convergence proofs, regret bounds, complexity results
Develop and analyze Online CFR with lazy infoset expansion (O-CFR)
Characterize computational complexity of RF-NE and RF-SPNE
Establish minimax-optimal sample complexity bounds
Publish at AAMAS, IJCAI, EC, NeurIPS, or equivalent

💻

MS / Research Position

Implement O-CFR, CFR+, and Belief-BCBR algorithms
Build benchmarking suite on security games and network routing instances
Run experiments comparing convergence rates to known baselines
Develop simulation environments for adversarial action-revelation games
Co-author on experimental papers; thesis credit for strong contributions

What We Expect from Applicants

📋

Requirements & Profile

Must Have — Theory

Solid mathematical maturity: probability, linear algebra, convex analysis. Comfort with proof-writing and formal reasoning in algorithms or game theory.

Must Have — Coding

Strong programming skills in Python. Ability to implement and debug iterative algorithms, run experiments, and produce reproducible results.

Good to Have — Game Theory

Prior exposure to Nash equilibrium, extensive-form games, or mechanism design (e.g., via a course like CS 6001 / CS 405 at IIT Bombay).

Good to Have — ML/Optimization

Familiarity with online learning, bandit algorithms, or no-regret methods. Exposure to CFR or reinforcement learning is a strong plus.

For PhD Applicants Specifically

Curiosity-driven, self-motivated, comfortable with open-ended problems and long research cycles.
Ideally: read at least one paper from the reading list below before contacting us.
We value depth over breadth — focused expertise in one area of theory is more important than broad familiarity.

Concrete Research Problems to Work On

🔬

Selected Open Problems from the Thesis Proposal

Algorithm Design. Design Online-CFR with lazy infoset expansion and prove Revelation-Adjusted Regret of O(|A_T|·√T). Is this rate minimax-optimal?

Belief Concentration. How many rounds until the posterior over opponent action sets concentrates to mass ≥ 1–δ on the true set? Derive sample complexity bounds.

Complexity Theory. Is computing an ε-RF-NE in a two-player zero-sum OIIEFG PPAD-complete, or does zero-sum structure allow polynomial-time computation?

Exploration for Action Discovery. Design an explore-then-exploit policy achieving minimax-optimal sample complexity for revealing the opponent's full action set in a zero-sum game.

SPNE via Progressive Backward Induction. Does backward induction over a dynamically revealed game tree converge to Subgame Perfect NE, and at what rate?

Applications. Implement and validate O-CFR on adversarial network routing and security games. Measure convergence to RF-NE against CFR+ and DeepCFR baselines.

Papers to Read Before Applying

📄

Core Reading List

Commitment Without Regrets: Online Learning in Stackelberg Security Games

Balcan, Blum, Haghtalab, Procaccia — EC 2015

Establishes no-regret algorithms for a defender in repeated Stackelberg games where the attacker's utility function is unknown. The algorithmic template — OMD over strategy polytopes with adversarial feedback — directly informs O-CFR design.

Why read this: Closest existing work to our setting; understand the distinction between Stackelberg (leader-follower) commitment and the simultaneous setting we study.

EC 2015

Lazy Defenders Are Almost Optimal Against Diligent Attackers

Blum, Haghtalab, Procaccia — AAAI 2014

Shows that a defender who does not update strategy every round loses only a constant factor in utility. Motivates the lazy infoset expansion in O-CFR and gives intuition for why computational simplicity can preserve strategic optimality.

Why read this: Provides the theoretical justification for lazy update schedules, a core feature of the algorithms in this project.

AAAI 2014

Learning Optimal Commitment to Overcome Insecurity

Blum, Haghtalab, Procaccia — NeurIPS 2014

Studies a learning agent who must discover the optimal Stackelberg commitment through play, without knowing attacker utilities. Introduces UCB-style exploration of attacker best responses and achieves O(T^−1/3) convergence.

Why read this: The exploration-of-opponent-responses problem here directly parallels the action-discovery exploration problem in our OIIEFG setting.

NeurIPS 2014

Regret Minimization in Games with Incomplete Information (CFR)

Zinkevich, Johanson, Bowling, Piccione — NeurIPS 2007

The foundational paper on Counterfactual Regret Minimization. Introduces the decomposition of regret across information sets and proves O(1/√T) convergence to NE in two-player zero-sum EFGs. Essential reading for understanding what O-CFR extends.

Why read this: This is the algorithm our project extends. You must understand it deeply before working on O-CFR.

NeurIPS 2007

Partial Monitoring — Classification, Regret Bounds, and Algorithms

Bartók, Foster, Pál, Rakhlin, Szepesvári — Mathematics of Operations Research, 2014

Classifies partial monitoring games by their minimax regret (O(√T) vs O(T^2/3) vs linear). Our setting falls in the locally observable class; this paper is essential for understanding the information-theoretic limits we work within.

Why read this: Establishes the regret-theoretic landscape that bounds what our algorithms can achieve.

MOR 2014

Bandit Algorithms (Chapters 1–5, 30)

Lattimore & Szepesvári — Cambridge University Press, 2020

The standard reference for bandit and online learning theory. Chapters 1–5 cover the basics; Chapter 30 covers partial monitoring. Available free online at tor-lattimore.com.

Why read this: Background reference for all algorithmic and regret-theoretic tools used in this project.

Textbook

Recommended Course & Video Lectures

🎬

CS 6001 / CS 405: Game Theory and Algorithmic Mechanism Design

Course by Prof. Swaprava Nath, IIT Bombay. Full lecture videos, boardwork, and slides are publicly available. Course webpage →

Weeks 1–3 Foundations — Normal Form Games, Equilibria, Mixed Strategies

01–05Introduction to Game Theory, strategies, rationality, common knowledge, Normal Form Games

06–12Dominance, Nash Equilibrium, best response, maxmin strategies, matrix games

13–17Mixed Strategies, MSNE characterization theorem, algorithm to find MSNE, Nash's theorem

Week 4 — Required Correlated Equilibrium & Extensive Form Games (Perfect Information)

18–19Correlated Equilibrium — definition, example, comparison with NECore

20–21Extensive Form Games — formal definition, transformation to NFG, Subgame Perfect NECore

22Limitations of SPNE — centipede game, backward induction complexityCore

Week 5 — Required Imperfect Information Extensive Form Games (IIEFG)

23IIEFG — definition, information sets, examplesCore

24–25Randomized & behavioral strategies in IIEFGs — equivalence conditionsCore

26Perfect recall — definition, Kuhn's theorem, implications for strategy equivalenceCore

Week 6 — Required Equilibrium in IIEFGs & Bayesian Games

27Equilibrium in IIEFGs — beliefs, sequential rationality, Perfect Bayesian Equilibrium (PBE)Core

29–31Bayesian Games — definition, strategy, equilibrium (BNE), equivalence with NECore

32Examples of Bayesian Equilibrium — first-price and second-price auctionsCore

28Game Theory in Practice — P2P file sharing and BitTorrent protocol

Weeks 7–12 cover mechanism design and auctions — useful background but not required for this project. We strongly recommend the lecture notes and boardwork PDFs linked on the course page alongside the videos.

Additional Resources

📚

Textbooks

Game Theory — Maschler, Solan, Zamir (Cambridge UP). Primary reference for EFG and Bayesian game foundations.
Multiagent Systems — Shoham & Leyton-Brown. Free online. Good for Chapter 5 on EFGs.
Bandit Algorithms — Lattimore & Szepesvári. Free PDF. Ch. 1–5 for online learning basics; Ch. 30 for partial monitoring.
Introduction to Online Convex Optimization — Hazan. Free PDF. Essential for OMD and regret theory.

🛠

Tools & Codebases to Explore

OpenSpiel (DeepMind) — open-source library for game-theoretic algorithms including CFR variants. Start here for implementation.
Gambit — software tools for computing equilibria in normal and extensive form games.
regretlib (Python) — lightweight regret minimization library useful for prototyping OMD and EXP3.
NetworkX — for constructing and analyzing adversarial routing game instances.

Interested? Get in Touch

Read at least one paper from the reading list and the course videos for Weeks 4–6 before reaching out. In your email, briefly describe your background, which open problem excites you most, and attach your CV. Write to swaprava@cse.iitb.ac.in or sujoy@cse.iitb.ac.in or surajs@cse.iitb.ac.in.

Funding: ARG Grant — positions available for immediate start · Inquiries: include [OIIEFG Research] in subject line

PhD & MS Positions inOnline Game Theory with Partial Observability

Interested? Get in Touch

PhD & MS Positions in
Online Game Theory with Partial Observability