Classical game theory assumes that players know the rules of the game, even if they do not know each other's moves. In many real adversarial settings, this assumption breaks down: an agent must act without knowing what options are even available to the opponent. This project studies zero-sum repeated games where each round involves a simultaneous move under imperfect information, states are fully revealed after each round, but the opponent's feasible action set is unknown at decision time and only revealed ex-post. This compound uncertainty, over both actions and the space of possible actions, falls outside the scope of both standard extensive-form game theory and classical online learning, creating a rich and largely unexplored theoretical frontier. The formal model we work with is called an Online Imperfect-Information Extensive-Form Game (OIIEFG), and the questions it raises are both theoretically deep and practically urgent.
The project draws on three converging lines of research. From extensive-form game theory, we inherit the tools of Counterfactual Regret Minimization (CFR) and the goal of converging to Nash and Subgame Perfect Nash Equilibria. From online learning and no-regret algorithms, we borrow Online Mirror Descent and EXP3 to handle the adversarial, sequential nature of the interaction. And from the recent literature on Stackelberg Security Games (Balcan et al. 2015, Blum et al. 2014), we take inspiration from how regret-based techniques handle unknown opponent behavior in repeated adversarial settings — while moving beyond the leader-follower Stackelberg structure to simultaneous-move games where both action sets and utilities are partially hidden. The central algorithmic contribution will be Online CFR with lazy infoset expansion (O-CFR), a new algorithm that expands the game tree incrementally as new opponent actions are discovered, with provable Revelation-Adjusted Regret guarantees and convergence to equilibrium.
Applications of this framework are wide-ranging. In adversarial cybersecurity, a defender must respond to an attacker whose exploit toolkit is revealed only through attempted intrusions. In operational planning, a decision-maker faces an adversary whose available maneuvers are unknown until observed in the field. In adaptive auction markets, a bidder does not know the full set of instruments available to competing algorithms. In each case, the key challenge is the same: learning to play well in a game whose rules are themselves being discovered through play. The project is funded by the ARG grant and is looking for motivated PhD and MS students who want to work at the intersection of theory, algorithms, and impactful applications.
🔬
Selected Open Problems from the Thesis Proposal
Algorithm Design. Design Online-CFR with lazy infoset expansion and prove Revelation-Adjusted Regret of O(|AT|·√T). Is this rate minimax-optimal?
Belief Concentration. How many rounds until the posterior over opponent action sets concentrates to mass ≥ 1–δ on the true set? Derive sample complexity bounds.
Complexity Theory. Is computing an ε-RF-NE in a two-player zero-sum OIIEFG PPAD-complete, or does zero-sum structure allow polynomial-time computation?
Exploration for Action Discovery. Design an explore-then-exploit policy achieving minimax-optimal sample complexity for revealing the opponent's full action set in a zero-sum game.
SPNE via Progressive Backward Induction. Does backward induction over a dynamically revealed game tree converge to Subgame Perfect NE, and at what rate?
Applications. Implement and validate O-CFR on adversarial network routing and security games. Measure convergence to RF-NE against CFR+ and DeepCFR baselines.
Commitment Without Regrets: Online Learning in Stackelberg Security Games
Balcan, Blum, Haghtalab, Procaccia — EC 2015
Establishes no-regret algorithms for a defender in repeated Stackelberg games where the attacker's utility function is unknown. The algorithmic template — OMD over strategy polytopes with adversarial feedback — directly informs O-CFR design.
Why read this: Closest existing work to our setting; understand the distinction between Stackelberg (leader-follower) commitment and the simultaneous setting we study.
EC 2015
Lazy Defenders Are Almost Optimal Against Diligent Attackers
Blum, Haghtalab, Procaccia — AAAI 2014
Shows that a defender who does not update strategy every round loses only a constant factor in utility. Motivates the lazy infoset expansion in O-CFR and gives intuition for why computational simplicity can preserve strategic optimality.
Why read this: Provides the theoretical justification for lazy update schedules, a core feature of the algorithms in this project.
AAAI 2014
Learning Optimal Commitment to Overcome Insecurity
Blum, Haghtalab, Procaccia — NeurIPS 2014
Studies a learning agent who must discover the optimal Stackelberg commitment through play, without knowing attacker utilities. Introduces UCB-style exploration of attacker best responses and achieves O(T−1/3) convergence.
Why read this: The exploration-of-opponent-responses problem here directly parallels the action-discovery exploration problem in our OIIEFG setting.
NeurIPS 2014
Regret Minimization in Games with Incomplete Information (CFR)
Zinkevich, Johanson, Bowling, Piccione — NeurIPS 2007
The foundational paper on Counterfactual Regret Minimization. Introduces the decomposition of regret across information sets and proves O(1/√T) convergence to NE in two-player zero-sum EFGs. Essential reading for understanding what O-CFR extends.
Why read this: This is the algorithm our project extends. You must understand it deeply before working on O-CFR.
NeurIPS 2007
Partial Monitoring — Classification, Regret Bounds, and Algorithms
Bartók, Foster, Pál, Rakhlin, Szepesvári — Mathematics of Operations Research, 2014
Classifies partial monitoring games by their minimax regret (O(√T) vs O(T2/3) vs linear). Our setting falls in the locally observable class; this paper is essential for understanding the information-theoretic limits we work within.
Why read this: Establishes the regret-theoretic landscape that bounds what our algorithms can achieve.
MOR 2014
Bandit Algorithms (Chapters 1–5, 30)
Lattimore & Szepesvári — Cambridge University Press, 2020
The standard reference for bandit and online learning theory. Chapters 1–5 cover the basics; Chapter 30 covers partial monitoring. Available free online at
tor-lattimore.com.
Why read this: Background reference for all algorithmic and regret-theoretic tools used in this project.
Textbook
🎬
CS 6001 / CS 405: Game Theory and Algorithmic Mechanism Design
Course by Prof. Swaprava Nath, IIT Bombay.
Full lecture videos, boardwork, and slides are publicly available.
Course webpage →
01–05Introduction to Game Theory, strategies, rationality, common knowledge, Normal Form Games
06–12Dominance, Nash Equilibrium, best response, maxmin strategies, matrix games
13–17Mixed Strategies, MSNE characterization theorem, algorithm to find MSNE, Nash's theorem
18–19Correlated Equilibrium — definition, example, comparison with NECore
20–21Extensive Form Games — formal definition, transformation to NFG, Subgame Perfect NECore
22Limitations of SPNE — centipede game, backward induction complexityCore
23IIEFG — definition, information sets, examplesCore
24–25Randomized & behavioral strategies in IIEFGs — equivalence conditionsCore
26Perfect recall — definition, Kuhn's theorem, implications for strategy equivalenceCore
27Equilibrium in IIEFGs — beliefs, sequential rationality, Perfect Bayesian Equilibrium (PBE)Core
29–31Bayesian Games — definition, strategy, equilibrium (BNE), equivalence with NECore
32Examples of Bayesian Equilibrium — first-price and second-price auctionsCore
28Game Theory in Practice — P2P file sharing and BitTorrent protocol
Weeks 7–12 cover mechanism design and auctions — useful background but not required for this project.
We strongly recommend the lecture notes and boardwork PDFs linked on the course page alongside the videos.
Interested? Get in Touch
Read at least one paper from the reading list and the course videos for Weeks 4–6 before reaching out.
In your email, briefly describe your background, which open problem excites you most, and attach your CV. Write to swaprava@cse.iitb.ac.in or sujoy@cse.iitb.ac.in or surajs@cse.iitb.ac.in.
Funding: ARG Grant — positions available for immediate start ·
Inquiries: include [OIIEFG Research] in subject line