INSTITUTIONAL REPOSITORY
Welcome to the Institutional Repository (IR) of the Indian Statistical Institute (ISI). Here you can find articles published by researchers of the Institute. The repository also preserves and provides access to other digital content, including dissertations, theses, convocation addresses, question papers, official records, and collections of special mention. Access to some materials is restricted; you can request the restricted materials you need for your research and development.

Recent Submissions
On the Jordan-Chevalley-Dunford Decomposition of Certain Classes of Operators and Convergence of Their Normalized Power Sequences
(Indian Statistical Institute, 2026-02-25) Shekhawat, Renu
The classical Jordan–Chevalley decomposition expresses a matrix $A \in M_n(\mathbb{C})$ as a unique commuting sum $A = D + N$, where $D$ is diagonalizable and $N$ is nilpotent. Although this decomposition is algebraic in origin, it encodes significant spectral information and, as shown by Nayak, has an important analytic consequence: the convergence of the normalized power sequence $\{|A^n|^{1/n}\}_{n \in \mathbb{N}}$, where $|A| := (A^*A)^{1/2}$. In this thesis we study Jordan–Chevalley-type decompositions in infinite-dimensional settings and their connection with the convergence behaviour of normalized power sequences. In particular, we discuss this phenomenon for Dunford’s spectral operators and compact operators on a complex Hilbert space, and further extend the theory to operators affiliated with finite type I von Neumann algebras.
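In the finite-dimensional (matrix) case, the normalized power sequence mentioned above can be explored numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `normalized_power` and the example matrix are illustrative choices and are not taken from the thesis. It computes |A^n|^(1/n) from a singular value decomposition of A^n.

```python
import numpy as np

def normalized_power(A, n):
    """Return |A^n|^(1/n), where |B| = (B* B)^(1/2) (hypothetical helper)."""
    B = np.linalg.matrix_power(A, n)
    # SVD: B = U diag(s) Vh, hence |B| = Vh^* diag(s) Vh.
    _, s, Vh = np.linalg.svd(B)
    return Vh.conj().T @ np.diag(s ** (1.0 / n)) @ Vh

# Illustrative non-normal matrix with eigenvalues 2 and 0.5.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
for n in (1, 5, 20):
    P = normalized_power(A, n)
    # Eigenvalues of the n-th term of the sequence.
    print(n, np.round(np.linalg.eigvalsh(P), 4))
```

For this example, the eigenvalues of successive terms move toward the moduli of the eigenvalues of A. Very large n is avoided here because the singular values of A^n then span more orders of magnitude than double precision can resolve.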
Innovations in Graph Neural Network Design: Addressing Oversmoothing, Heterophily, and Information Propagation
(Indian Statistical Institute, 2026-04-13) Bose, Kushal
In the unstructured learning paradigm, Graph Neural Networks (GNNs) adeptly handle graph data such as social networks, molecules, and transaction networks. Early GNNs were designed to be shallow, comprising two or three layers. Emulating the success of deep CNNs, deep GNNs have also been proposed by stacking multiple layers. These multi-layered GNNs are pivotal in enabling long-range interactions in tasks where multi-hop neighbors carry significant information, such as molecular property prediction. Yet multi-layered GNNs suffer from oversmoothing, where node features become indistinguishable due to the recursive nature of message passing. In the second chapter of the thesis, we propose a non-recursive message passing technique to address oversmoothing. Our method explores random paths and computes path features, which are subsequently aggregated to update the node features. Multi-hop message passing also depends on whether the network is homophilic or heterophilic. GNNs typically perform better in homophilic settings, where adjacent nodes share identical class labels. Conversely, the performance of GNNs deteriorates in heterophilic networks, where adjacent nodes may have different class labels. In the third chapter of the thesis, we address the challenges of graph heterophily by rewiring the graph topology. We learn similarity scores for the edges from autoencoder-based class representations. Strong performance on heterophilic benchmarks supports the effectiveness of our approach. We also study the effects of rewiring special edges such as self-loops and parallel edges. In the fourth chapter of the thesis, we investigate the effects of adding self-loops and parallel edges on the eigenvalues of the graph Laplacian. Empirically, we observe that the gradual addition of self-loops or parallel edges produces performance trends (either increasing or decreasing) on heterophilic graphs.
This work offers insights into the graph spectrum based on the observed performance trends, bypassing the need to execute an expensive eigenvalue decomposition. Deep GNNs also suffer from oversquashing, an information bottleneck that arises from the need to store exponentially growing information in fixed-capacity channels. In the fifth chapter of the thesis, we propose asynchronous message passing, which gives time-dependent access to the fixed-capacity channels. This relaxes the capacity constraints and ultimately overcomes oversquashing. We achieve commendable performance on the REDDIT-BINARY and Peptides-struct datasets. To mitigate both oversmoothing and oversquashing, Graph Transformers (GTs) enable pair-wise message passing across the network. Specifically, a GT incorporates structural information of the underlying graph via positional encodings. In the sixth chapter of the thesis, we design a novel and efficient positional encoding that is learnable and maps the encodings into hyperbolic space. Our positional encodings are expressive and efficiently capture hierarchical structures embedded in molecular graphs, which is supported by theoretical analysis. We further demonstrate that hyperbolic positional encodings, when added to the features in the final layers, diminish the effects of oversmoothing. We achieve superior performance on the MNIST and OGBG-MOLHIV graph datasets by employing hyperbolic positional encodings. In the seventh chapter of the thesis, we shed light on potential future research avenues in the domain of GNNs.
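The oversmoothing effect described in this abstract can be reproduced on a toy graph. The following is a minimal sketch, assuming NumPy and a deliberately simplified message-passing step (mean aggregation over neighbors with self-loops, with no learned weights or nonlinearity). The function names and the toy path graph are illustrative and do not reproduce the thesis's method.

```python
import numpy as np

def propagate(adj, features, num_layers):
    """Apply `num_layers` rounds of mean aggregation over neighbors plus self."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    p = a_hat / a_hat.sum(axis=1, keepdims=True)     # row-normalize
    h = features.copy()
    for _ in range(num_layers):
        h = p @ h                                    # one recursive message-passing step
    return h

def feature_spread(h):
    """Largest gap between any two nodes, over all feature dimensions."""
    return float((h.max(axis=0) - h.min(axis=0)).max())

# Toy path graph on 6 nodes with a single scalar feature per node.
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = np.arange(n, dtype=float).reshape(-1, 1)

for k in (0, 2, 10, 200):
    print(k, feature_spread(propagate(adj, x, k)))
```

As the number of rounds grows, the spread between node features shrinks toward zero, which is the sense in which deep recursive aggregation makes nodes indistinguishable.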
Essays on Monetary-Fiscal Interactions in Emerging Market and Developing Economies
(Indian Statistical Institute, 2025-07-17) Bahl, Ojasvita
This thesis contains three chapters on monetary-fiscal interactions in emerging market and developing economies (EMDEs). Governments in EMDEs frequently intervene in agricultural markets to stabilize food prices following adverse shocks. These interventions often take the form of large-scale food procurement and redistribution, which we define as a redistributive policy shock. The first chapter examines the effects of such shocks on inflation and the distribution of consumption between rich and poor households. We develop a tractable two-sector, two-agent New Keynesian DSGE model and estimate its parameters for the Indian economy using Bayesian methods. Our findings reveal that under an inflation-targeting regime, consumer heterogeneity plays a crucial role in determining whether monetary policy responses to various shocks enhance or reduce aggregate welfare. The second chapter evaluates the welfare implications of redistributive policy shocks under alternative monetary policy regimes. Building on Chapter 1, which finds that redistributive policy shocks are inflationary and expansionary in terms of aggregate output, we assess how different monetary responses alter welfare outcomes. Following Schmitt-Grohé and Uribe (2007), we compute consumption-equivalent welfare gains to compare the welfare cost of these shocks under the optimised simple monetary rule and the planner’s solution (Ramsey optimal monetary policy). The optimal rule features no interest rate smoothing, a strong response to inflation, and a limited reaction to output. Our findings demonstrate the critical role of monetary policy in shaping the welfare impact of redistributive shocks. We further compare these welfare effects to those of an agricultural productivity shock and show that the steady-state level of redistribution significantly affects the relative costs of redistribution-driven fluctuations.
We find that non-optimised rules lead to significantly higher welfare costs than optimised simple rules. In the third chapter, we study the interactions between informality, underdeveloped financial markets and fiscal consolidation by developing a two-sector, two-agent medium-scale NK-DSGE model that allows public expenditure and private consumption to be either substitutes or complements. While there is a large literature on the effects of fiscal consolidation in advanced economies (AEs), the literature on fiscal consolidation in EMDEs is relatively small. We find that greater informality dampens the reduction in public debt from a contractionary fiscal policy shock. We also find that tax-based shocks produce a greater decline in debt than spending-based shocks, at the cost of a greater contraction in output. Our analysis suggests that a fiscal consolidation shock can be expansionary when private consumption and public spending exhibit moderately high substitutability, consistent with the literature on expansionary fiscal consolidations.
Flexible Modeling of non-Gaussian Longitudinal Data: Some Approaches using Copula
(Indian Statistical Institute, Kolkata, 2026-03-16) Chattopadhyay, Subhajit
Longitudinal data are common in medical and biological sciences, where measurements are gathered from subjects over time to explore relationships with explanatory variables (covariates) and to uncover the underlying mechanisms of dependence among these measurements. The responses observed at each instance can be either discrete or continuous. One of the primary challenges in longitudinal data analysis lies in the non-Gaussian nature of the response variables. As a result, there are relatively few multivariate models in the literature that effectively address the specific characteristics observed in such datasets. In this dissertation, we address four problems concerning longitudinal data analysis by developing new statistical models. These models specifically address the time-related relationships found in various types of non-Gaussian longitudinal data by employing suitable classes of parametric copulas. In the third chapter of this dissertation, we examine a motivating dataset from a recent HIV-AIDS study conducted in Livingstone district, Zambia. The histogram plots of the repeated measurements at each time point reveal asymmetry in the marginal distributions, and pairwise scatter plots uncover non-elliptical dependence patterns. Traditional linear mixed models, typically used for longitudinal data, struggle to capture these complexities effectively. We introduce skew-elliptical copula based mixed models to analyze these continuous data, where we use generalized linear mixed models (GLMM) for the marginals (e.g., a Gamma mixed model) and address the temporal dependence of repeated measurements by utilizing copulas associated with skew-elliptical distributions (such as skew-normal/skew-t). The proposed class of copula-based mixed models addresses asymmetry, between-subject variability, and non-standard temporal dependence simultaneously, thereby extending beyond the limitations of standard linear mixed models based on multivariate normality. We estimate the model parameters using the IFM (inference functions for margins) method and outline the procedure for obtaining standard errors of the parameter estimates.
To evaluate the performance of this approach under finite sample conditions, rigorous simulation studies are conducted, encompassing skewed and symmetric marginal distributions along with various copula selections. Finally, we apply these models to the HIV dataset and present the insight gained from the analysis. In the fourth chapter of this dissertation, we introduce factor copula models tailored for unbalanced non-Gaussian longitudinal data. Modeling the joint distribution of such data, where subjects may have varying numbers of repeated measurements and responses can be continuous or discrete, poses practical challenges, especially with numerous measurements per subject. Factor copula models, which are canonical vine copulas, leverage latent variables to elucidate the underlying dependence structure of multivariate data. This approach aids in interpretation and implementation for unbalanced longitudinal datasets, enhancing our ability to model complex dependencies effectively. We develop regression models for continuous, binary and ordinal longitudinal data, incorporating covariates, using factor copula constructions with subject-specific latent variables. With consideration for homogeneous within-subject dependence, the proposed models enable feasible parametric inference in moderate to high dimensional scenarios, employing a two-stage (IFM) estimation method. We also present a method for evaluating the residuals of factor copula models to visually assess the goodness of fit. The performance of the proposed models in finite samples is assessed through extensive simulation studies. In empirical analyses, we apply these models to analyze various longitudinal responses from two real-world datasets. Furthermore, we compare the performance of these models with widely used random effects models using standard selection techniques, revealing significant improvements. 
Our findings suggest that factor copula models can serve as viable alternatives to random effect models, offering deeper insights into the temporal dependence of longitudinal data across diverse contexts.
In the fifth chapter of this dissertation, we address the issue of modeling complex and hidden temporal dependence in count longitudinal data. Multivariate elliptical copulas are typically preferred in the statistical literature for analyzing dependence between repeated measurements of longitudinal data, since they allow for different choices of the correlation structure. However, these copulas lack flexibility in modeling dependence, and inference is feasible only under parametric restrictions. In this chapter, we propose the use of finite mixtures of elliptical copulas to enhance the modeling of temporal dependence in discrete longitudinal data. This approach enables the utilization of distinct correlation matrices within each component of the mixture copula. We theoretically explore the dependence properties of finite mixtures of copulas before employing them to construct regression models for count longitudinal data. Inference for this proposed class of models is based on a composite likelihood approach, and we evaluate the finite sample performance of parameter estimates through extensive simulation studies. To validate the fitting of the proposed models, we extend traditional techniques and introduce the t-plot method to accommodate finite mixtures of elliptical copulas. Finally, we apply the proposed models to analyze the temporal dependence within two real-world count longitudinal datasets and demonstrate their superiority over standard elliptical copulas. In the final contributing chapter of this dissertation, we introduce a novel multivariate copula based on the multivariate geometric skew-normal (GSN) distribution. This asymmetric copula serves as an alternative to the skew-normal copula proposed by Azzalini. Unlike the standard skew-normal copula, the multivariate GSN copula retains closure properties under marginalization, which offers computational advantages for modeling multivariate discrete data. In this chapter, we outline the construction of the geometric skew-normal copula and its application in modeling the temporal dependence observed in non-Gaussian longitudinal data. We begin by exploring the theoretical properties of the proposed multivariate copula. Subsequently, we develop regression models tailored for both continuous and discrete longitudinal data using this framework. Notably, the quantile function of this copula remains independent of the correlation matrix of its respective multivariate distribution, offering computational advantages in likelihood inference compared to copulas derived from the skew-elliptical distributions proposed by Azzalini.
Furthermore, composite likelihood inference becomes feasible for this multivariate copula, allowing for parameter estimation from ordered probit models with the same dependence structure as the geometric skew-normal distribution. We conduct extensive simulation studies to validate the geometric skew-normal copula based models and apply them to analyze the longitudinal dependence of two real-world data sets. Finally, we present our findings in terms of the improvements over regression models based on multivariate Gaussian copulas.
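The general idea behind copula-based longitudinal models, separating the marginal distributions from the temporal dependence, can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the models of the dissertation. It uses a plain Gaussian copula with an AR(1) correlation matrix and exponential margins as stand-ins for the skew-elliptical, factor, mixture, and GSN copulas discussed above. All function names and parameter values are hypothetical.

```python
import math
import numpy as np

def ar1_correlation(num_times, rho):
    """AR(1) correlation matrix with entries rho^|s - t|."""
    idx = np.arange(num_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sample_gaussian_copula_exponential(num_subjects, num_times, rho, rate, seed=0):
    """Draw repeated measurements with exponential margins and Gaussian-copula dependence."""
    rng = np.random.default_rng(seed)
    corr = ar1_correlation(num_times, rho)
    z = rng.multivariate_normal(np.zeros(num_times), corr, size=num_subjects)
    # Probability integral transform: the standard normal CDF maps z to uniforms.
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # The inverse exponential CDF gives skewed, positive margins.
    return -np.log1p(-u) / rate

y = sample_gaussian_copula_exponential(num_subjects=5000, num_times=4, rho=0.7, rate=2.0)
print(y.shape, y.mean(axis=0))
print(np.round(np.corrcoef(y, rowvar=False), 2))
```

The margins are non-Gaussian, yet the correlation between measurements still decays with the time lag, because the dependence is carried entirely by the copula.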
Development of Some Scalable Pattern Recognition Algorithms for Real Life Data Analysis
(2017-11-20) Garai, Partha
A huge amount of data is being generated continuously as a result of recent advances in, and wide use of, high-throughput technologies. With the rapid increase in the size of data distributed worldwide, understanding the data has become critical. In this regard, dimensionality reduction and clustering have become necessary preprocessing steps in multiple research areas and applications. One of the important problems of real life large data sets is uncertainty. Some of the sources of this uncertainty include imprecision in computation and vagueness in class definitions. The uncertainty may also be present in the definition of the class membership function.
Against this background, the thesis addresses the problem of dimensionality reduction and clustering of real life data sets in the presence of noise and uncertainty. The thesis first presents the problem of feature selection using both type-1 and interval type-2 fuzzy-rough sets, which are effective for dimensionality reduction of real life data sets when uncertainty is present in the data set. The properties of fuzzy-rough sets allow greater flexibility in handling noisy and real valued data. While the concept of lower approximation and boundary region of rough sets deals with uncertainty, incompleteness, and vagueness in class definition, the use of either type-1 or interval type-2 fuzzy sets enables efficient handling of overlapping classes in an uncertain environment. Moreover, a new concept of “simultaneous attribute selection and feature extraction” is introduced for dimensionality reduction, integrating judiciously the merits of both feature selection and extraction.
A scalable rough-fuzzy clustering algorithm is introduced for large real life data sets, where the theory of rough hypercuboid approach, interval type-2 fuzzy sets, and the c-means algorithm are integrated judiciously to handle the uncertainty present in a data set. While the concept of rough hypercuboid approach deals with uncertainty, incompleteness, and vagueness in cluster definition, the use of fuzzy membership of interval type-2 fuzzy sets in the boundary region of a cluster enables efficient handling of overlapping partitions in an uncertain environment. Finally, the application of both clustering and feature selection algorithms is demonstrated by grouping functionally similar microRNAs from microarray data. The proposed approach can automatically select the optimum set of features while clustering the microRNAs, which lowers the complexity of the algorithm.
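For orientation, the fuzzy side of the c-means family mentioned above can be sketched with the standard type-1 fuzzy c-means updates. This is a minimal illustration assuming NumPy. It is not the rough-fuzzy, interval type-2 algorithm proposed in the thesis, and the function names, toy data, and initial centroids are hypothetical.

```python
import numpy as np

def fuzzy_memberships(points, centroids, m=2.0, eps=1e-12):
    """Type-1 fuzzy c-means memberships: u[i, j] for point i and cluster j."""
    # Pairwise Euclidean distances, shape (num_points, num_clusters).
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    d = np.maximum(d, eps)                      # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def update_centroids(points, u, m=2.0):
    """Centroids as membership-weighted means of the points."""
    w = u ** m
    return (w.T @ points) / w.sum(axis=0)[:, None]

def fuzzy_c_means(points, init_centroids, m=2.0, num_iters=50):
    """Alternate membership and centroid updates for a fixed number of rounds."""
    centroids = init_centroids.copy()
    for _ in range(num_iters):
        u = fuzzy_memberships(points, centroids, m)
        centroids = update_centroids(points, u, m)
    return centroids, fuzzy_memberships(points, centroids, m)

# Two well-separated toy groups in the plane.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, u = fuzzy_c_means(pts, init_centroids=pts[[0, 3]])
print(np.round(u, 3))
```

Each row of `u` sums to one, so a point lying between clusters receives graded membership in both. This is the property that makes fuzzy memberships suitable for overlapping partitions.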
