ARG-Funded Research Opening

PhD & MS Positions in
Online Game Theory with Partial Observability

Designing algorithms for adversarial agents in zero-sum games where the opponent's available actions are unknown and revealed only through play.

Algorithmic Game Theory Online Learning Extensive-Form Games No-Regret Algorithms Stackelberg Security Games Belief Updating SPNE Computation
About the Project

Classical game theory assumes that players know the rules of the game, even if they do not know each other's moves. In many real adversarial settings, this assumption breaks down: an agent must act without knowing what options are even available to the opponent. This project studies zero-sum repeated games where each round involves a simultaneous move under imperfect information, states are fully revealed after each round, but the opponent's feasible action set is unknown at decision time and only revealed ex-post. This compound uncertainty, over both actions and the space of possible actions, falls outside the scope of both standard extensive-form game theory and classical online learning, creating a rich and largely unexplored theoretical frontier. The formal model we work with is called an Online Imperfect-Information Extensive-Form Game (OIIEFG), and the questions it raises are both theoretically deep and practically urgent.

The project draws on three converging lines of research. From extensive-form game theory, we inherit the tools of Counterfactual Regret Minimization (CFR) and the goal of converging to Nash and Subgame Perfect Nash Equilibria. From online learning and no-regret algorithms, we borrow Online Mirror Descent and EXP3 to handle the adversarial, sequential nature of the interaction. And from the recent literature on Stackelberg Security Games (Balcan et al. 2015, Blum et al. 2014), we take inspiration from how regret-based techniques handle unknown opponent behavior in repeated adversarial settings — while moving beyond the leader-follower Stackelberg structure to simultaneous-move games where both action sets and utilities are partially hidden. The central algorithmic contribution will be Online CFR with lazy infoset expansion (O-CFR), a new algorithm that expands the game tree incrementally as new opponent actions are discovered, with provable Revelation-Adjusted Regret guarantees and convergence to equilibrium.

Applications of this framework are wide-ranging. In adversarial cybersecurity, a defender must respond to an attacker whose exploit toolkit is revealed only through attempted intrusions. In operational planning, a decision-maker faces an adversary whose available maneuvers are unknown until observed in the field. In adaptive auction markets, a bidder does not know the full set of instruments available to competing algorithms. In each case, the key challenge is the same: learning to play well in a game whose rules are themselves being discovered through play. The project is funded by the ARG grant and is looking for motivated PhD and MS students who want to work at the intersection of theory, algorithms, and impactful applications.

Open Positions
🎓
PhD Position
  • Own the theoretical foundations: convergence proofs, regret bounds, complexity results
  • Develop and analyze Online CFR with lazy infoset expansion (O-CFR)
  • Characterize computational complexity of RF-NE and RF-SPNE
  • Establish minimax-optimal sample complexity bounds
  • Publish at AAMAS, IJCAI, EC, NeurIPS, or equivalent
💻
MS / Research Position
  • Implement O-CFR, CFR+, and Belief-BCBR algorithms
  • Build benchmarking suite on security games and network routing instances
  • Run experiments comparing convergence rates to known baselines
  • Develop simulation environments for adversarial action-revelation games
  • Co-author on experimental papers; thesis credit for strong contributions
What We Expect from Applicants
📋
Requirements & Profile
Must Have — Theory

Solid mathematical maturity: probability, linear algebra, convex analysis. Comfort with proof-writing and formal reasoning in algorithms or game theory.

Must Have — Coding

Strong programming skills in Python. Ability to implement and debug iterative algorithms, run experiments, and produce reproducible results.

Good to Have — Game Theory

Prior exposure to Nash equilibrium, extensive-form games, or mechanism design (e.g., via a course like CS 6001 / CS 405 at IIT Bombay).

Good to Have — ML/Optimization

Familiarity with online learning, bandit algorithms, or no-regret methods. Exposure to CFR or reinforcement learning is a strong plus.

For PhD Applicants Specifically
  • Curiosity-driven, self-motivated, comfortable with open-ended problems and long research cycles.
  • Ideally: read at least one paper from the reading list below before contacting us.
  • We value depth over breadth — focused expertise in one area of theory is more important than broad familiarity.
Concrete Research Problems to Work On
🔬
Selected Open Problems from the Thesis Proposal
Algorithm Design. Design Online-CFR with lazy infoset expansion and prove Revelation-Adjusted Regret of O(|AT|·√T). Is this rate minimax-optimal?
Belief Concentration. How many rounds until the posterior over opponent action sets concentrates to mass ≥ 1–δ on the true set? Derive sample complexity bounds.
Complexity Theory. Is computing an ε-RF-NE in a two-player zero-sum OIIEFG PPAD-complete, or does zero-sum structure allow polynomial-time computation?
Exploration for Action Discovery. Design an explore-then-exploit policy achieving minimax-optimal sample complexity for revealing the opponent's full action set in a zero-sum game.
SPNE via Progressive Backward Induction. Does backward induction over a dynamically revealed game tree converge to Subgame Perfect NE, and at what rate?
Applications. Implement and validate O-CFR on adversarial network routing and security games. Measure convergence to RF-NE against CFR+ and DeepCFR baselines.
Papers to Read Before Applying
📄
Core Reading List
Commitment Without Regrets: Online Learning in Stackelberg Security Games
Balcan, Blum, Haghtalab, Procaccia — EC 2015
Establishes no-regret algorithms for a defender in repeated Stackelberg games where the attacker's utility function is unknown. The algorithmic template — OMD over strategy polytopes with adversarial feedback — directly informs O-CFR design.
Why read this: Closest existing work to our setting; understand the distinction between Stackelberg (leader-follower) commitment and the simultaneous setting we study.
EC 2015
Lazy Defenders Are Almost Optimal Against Diligent Attackers
Blum, Haghtalab, Procaccia — AAAI 2014
Shows that a defender who does not update strategy every round loses only a constant factor in utility. Motivates the lazy infoset expansion in O-CFR and gives intuition for why computational simplicity can preserve strategic optimality.
Why read this: Provides the theoretical justification for lazy update schedules, a core feature of the algorithms in this project.
AAAI 2014
Learning Optimal Commitment to Overcome Insecurity
Blum, Haghtalab, Procaccia — NeurIPS 2014
Studies a learning agent who must discover the optimal Stackelberg commitment through play, without knowing attacker utilities. Introduces UCB-style exploration of attacker best responses and achieves O(T−1/3) convergence.
Why read this: The exploration-of-opponent-responses problem here directly parallels the action-discovery exploration problem in our OIIEFG setting.
NeurIPS 2014
Regret Minimization in Games with Incomplete Information (CFR)
Zinkevich, Johanson, Bowling, Piccione — NeurIPS 2007
The foundational paper on Counterfactual Regret Minimization. Introduces the decomposition of regret across information sets and proves O(1/√T) convergence to NE in two-player zero-sum EFGs. Essential reading for understanding what O-CFR extends.
Why read this: This is the algorithm our project extends. You must understand it deeply before working on O-CFR.
NeurIPS 2007
Partial Monitoring — Classification, Regret Bounds, and Algorithms
Bartók, Foster, Pál, Rakhlin, Szepesvári — Mathematics of Operations Research, 2014
Classifies partial monitoring games by their minimax regret (O(√T) vs O(T2/3) vs linear). Our setting falls in the locally observable class; this paper is essential for understanding the information-theoretic limits we work within.
Why read this: Establishes the regret-theoretic landscape that bounds what our algorithms can achieve.
MOR 2014
Bandit Algorithms (Chapters 1–5, 30)
Lattimore & Szepesvári — Cambridge University Press, 2020
The standard reference for bandit and online learning theory. Chapters 1–5 cover the basics; Chapter 30 covers partial monitoring. Available free online at tor-lattimore.com.
Why read this: Background reference for all algorithmic and regret-theoretic tools used in this project.
Textbook
Recommended Course & Video Lectures
🎬
CS 6001 / CS 405: Game Theory and Algorithmic Mechanism Design

Course by Prof. Swaprava Nath, IIT Bombay. Full lecture videos, boardwork, and slides are publicly available. Course webpage →

Weeks 1–3 Foundations — Normal Form Games, Equilibria, Mixed Strategies
01–05Introduction to Game Theory, strategies, rationality, common knowledge, Normal Form Games
06–12Dominance, Nash Equilibrium, best response, maxmin strategies, matrix games
13–17Mixed Strategies, MSNE characterization theorem, algorithm to find MSNE, Nash's theorem
Week 4 — Required Correlated Equilibrium & Extensive Form Games (Perfect Information)
18–19Correlated Equilibrium — definition, example, comparison with NECore
20–21Extensive Form Games — formal definition, transformation to NFG, Subgame Perfect NECore
22Limitations of SPNE — centipede game, backward induction complexityCore
Week 5 — Required Imperfect Information Extensive Form Games (IIEFG)
23IIEFG — definition, information sets, examplesCore
24–25Randomized & behavioral strategies in IIEFGs — equivalence conditionsCore
26Perfect recall — definition, Kuhn's theorem, implications for strategy equivalenceCore
Week 6 — Required Equilibrium in IIEFGs & Bayesian Games
27Equilibrium in IIEFGs — beliefs, sequential rationality, Perfect Bayesian Equilibrium (PBE)Core
29–31Bayesian Games — definition, strategy, equilibrium (BNE), equivalence with NECore
32Examples of Bayesian Equilibrium — first-price and second-price auctionsCore
28Game Theory in Practice — P2P file sharing and BitTorrent protocol

Weeks 7–12 cover mechanism design and auctions — useful background but not required for this project. We strongly recommend the lecture notes and boardwork PDFs linked on the course page alongside the videos.

Additional Resources
📚
Textbooks
  • Game Theory — Maschler, Solan, Zamir (Cambridge UP). Primary reference for EFG and Bayesian game foundations.
  • Multiagent Systems — Shoham & Leyton-Brown. Free online. Good for Chapter 5 on EFGs.
  • Bandit Algorithms — Lattimore & Szepesvári. Free PDF. Ch. 1–5 for online learning basics; Ch. 30 for partial monitoring.
  • Introduction to Online Convex Optimization — Hazan. Free PDF. Essential for OMD and regret theory.
🛠
Tools & Codebases to Explore
  • OpenSpiel (DeepMind) — open-source library for game-theoretic algorithms including CFR variants. Start here for implementation.
  • Gambit — software tools for computing equilibria in normal and extensive form games.
  • regretlib (Python) — lightweight regret minimization library useful for prototyping OMD and EXP3.
  • NetworkX — for constructing and analyzing adversarial routing game instances.

Interested? Get in Touch

Read at least one paper from the reading list and the course videos for Weeks 4–6 before reaching out. In your email, briefly describe your background, which open problem excites you most, and attach your CV. Write to swaprava@cse.iitb.ac.in or sujoy@cse.iitb.ac.in or surajs@cse.iitb.ac.in.

Funding: ARG Grant — positions available for immediate start  ·  Inquiries: include [OIIEFG Research] in subject line