10:50 - 12:30 |
Parallel sessions -
Each parallel session includes one invited talk (40') and three contributed talks (20').
Session 1 - Machine Learning [Room: L3 Auditoire 0/13A]
Chair : Rainer Von Sachs
Robin Van Oirbeek - mCube: multinomial micro-level reserving model[Joint work with Emmanuel Jordy Menvouta, Jolien Ponnet and Tim Verdonck]
The estimation of the claims reserve, i.e. the remaining future claim cost of a given set of open claims, is a crucial exercise to ensure the financial viability of a non-life insurance company. Typically, the claims reserve is estimated for the entire portfolio simultaneously using a macro-level reserving model, of which the chain ladder is the best-known example. However, it is also possible to estimate the claims reserve on a claim-by-claim basis by means of a micro-level reserving model. This type of model captures the entire lifecycle of a claim by explicitly modelling the underlying time and payment processes separately. In this presentation, we will focus on the mCube or multinomial micro-level reserving model, an adaptation of [1], in which both processes, as well as the IBNR ('Incurred But Not Reported') component, are modelled using separate multinomial regression models. The predictive performance and the different components of the model will be discussed during the presentation.
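Purely as an illustration of the multinomial building block described above (not the authors' mCube implementation), the sketch below fits a single multinomial regression for a hypothetical next-state payment process; the covariates, state coding and parameter values are invented for the example.

```python
# Illustrative sketch only: a multinomial regression as a building block of a
# micro-level reserving model; states and covariates are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
# Hypothetical claim-period covariates: development period, time since
# reporting (years), incurred amount so far (in 1000 EUR).
X = np.column_stack([
    rng.integers(1, 10, n),
    rng.exponential(2.0, n),
    rng.lognormal(1.0, 1.0, n),
])
# Hypothetical next-state labels: 0 = no payment, 1 = partial payment, 2 = settlement.
y = rng.choice([0, 1, 2], size=n, p=[0.6, 0.3, 0.1])

# One multinomial regression per process (only the payment process is shown);
# with the lbfgs solver, sklearn fits a multinomial model for multiclass labels.
payment_model = LogisticRegression(max_iter=1000)
payment_model.fit(X, y)

# Predicted transition probabilities for a new open claim.
new_claim = np.array([[3, 1.5, 2.0]])
print(payment_model.predict_proba(new_claim))
```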
Sophie Mathieu - Monitoring of sunspot number observations based on neural networks[Joint work with Rainer Von Sachs, Christian Ritter, Laure Lefèvre and Véronique Delouille]
The observation of sunspots is one of the most important empirical data sources on long-term solar activity. Sunspot observations extend from the seventeenth century to the present day. Surprisingly, determining the number of sunspots consistently over time remains a challenging problem. The difficulty stems from the absence of stationarity, different types of correlation and many kinds of observational error.
In this work, we construct an artificial neural network for monitoring these important series. The network is trained on simulations that are sufficiently general to allow predictions for unseen deviations of various types. The procedure can efficiently detect when the observations are deviating and takes the autocorrelation of the data into account. The network has been compared to a more classical procedure based on the CUSUM chart and appears to be consistent with the latter. It can also predict the size of the encountered deviations over a large range of values.
Using this method allows us to detect and identify a wide range of deviations. Many of these deviations are observer or equipment related. Detecting and understanding them will help improve future observations. Eliminating or correcting them in past data will lead to a more precise reconstruction of the International Sunspot Number, the world reference for solar activity.
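For readers unfamiliar with the classical benchmark mentioned above, the following minimal sketch implements a basic one-sided CUSUM chart on a simulated standardized series; the allowance and threshold values are illustrative choices, not the settings used by the authors.

```python
# Minimal one-sided CUSUM chart for detecting an upward shift in a
# standardized series; parameters are illustrative, not the authors' settings.
import numpy as np

def cusum_upper(z, k=0.5, h=5.0):
    """Return the upper CUSUM path and the first alarm index (or None)."""
    s, alarm, path = 0.0, None, []
    for i, zi in enumerate(z):
        s = max(0.0, s + zi - k)   # accumulate deviations above the allowance k
        path.append(s)
        if alarm is None and s > h:
            alarm = i              # first time the chart signals a shift
    return np.array(path), alarm

rng = np.random.default_rng(0)
z = rng.normal(0, 1, 200)
z[120:] += 1.0                      # simulated upward drift after time 120
path, alarm = cusum_upper(z)
print("alarm at index:", alarm)
```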
Jakob Raymaekers - Regularized k-means Through Hard-Thresholding[Joint work with Ruben H. Zamar]
The k-means algorithm remains a very popular and widely used clustering method in a variety of scientific fields due to its intuitive objective function and relative ease of computation. Whereas in classical k-means, all p features are used to partition the data, it can be desirable to identify a subset of features that partitions the data particularly well. This feature selection may lead to a more interpretable partitioning of the data and more accurate recovery of the 'true' clusters.
We study a framework for performing regularized k-means, based on direct penalization of the size of the cluster centers. Different penalization strategies are considered and compared in a theoretical analysis and an extensive Monte Carlo simulation study. Based on the results, we propose a new method called hard-threshold k-means (HTK-means), which uses an L0 penalty to induce sparsity. HTK-means is a fast and competitive sparse clustering method which is easily interpretable, as is illustrated on several real data examples. In this context, new graphical displays are presented and used to gain further insight into the data sets.
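The following toy sketch conveys the flavour of hard-thresholding cluster centres with an L0-type rule; it is a simplified stand-in, not the HTK-means algorithm of the talk, and the penalty value is an arbitrary illustrative choice.

```python
# Rough sketch of sparse k-means via hard-thresholding of cluster centers.
# This is only a caricature of an L0 penalty on center entries, not the
# HTK-means algorithm itself; the threshold lam is a free illustrative choice.
import numpy as np
from sklearn.cluster import KMeans

def htk_sketch(X, k, lam):
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # work with standardized features
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    centers = km.cluster_centers_
    sizes = np.bincount(km.labels_, minlength=k)
    # Between-cluster sum of squares contributed by each feature.
    bcss = (sizes[:, None] * centers**2).sum(axis=0)
    # Keep a feature only if it "pays" for the penalty lam (L0-type rule).
    keep = bcss > lam
    sparse_centers = np.where(keep[None, :], centers, 0.0)
    return sparse_centers, keep

rng = np.random.default_rng(2)
# Two clusters that differ only in the first 2 of 10 features.
X = rng.normal(size=(200, 10))
X[:100, :2] += 3.0
centers, keep = htk_sketch(X, k=2, lam=50.0)
print("selected features:", np.where(keep)[0])
```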
Hugues Annoye - Statistical Matching using KCCA, Super-OM and Autoencoders-CCA[Joint work with Alessandro Beretta, Cédric Heuchenne and Ida-Marie Jensen]
The potential to study and improve different aspects of our lives is ever-growing thanks to the abundance of data available in today's society. Scientists and researchers often need to analyze data from different sources; the observations, which only share a subset of the variables, cannot always be paired to identify common individuals.
This is the case, for example, when the information required to study a certain phenomenon comes from different sample surveys. Statistical matching is a common practice to combine such data sets. In this talk, we investigate and extend to statistical matching three methods based on Kernel Canonical Correlation Analysis (KCCA), Super-Organizing Map (Super-OM) and Autoencoders-Canonical Correlation Analysis (A-CCA). These methods are designed to deal with various variable types, sample weights and incompatibilities among categorical variables. We use the 2017 Belgian Statistics on Income and Living Conditions (SILC) survey and compare the performance of the proposed statistical matching methods by means of a cross-validation technique, as if the data came from two separate sources.
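As a rough illustration of matching two files through canonical scores of their common variables, the sketch below uses plain linear CCA and nearest-neighbour hot-deck imputation on simulated data; it is only a simplified stand-in for the KCCA, Super-OM and A-CCA methods investigated in the talk.

```python
# Minimal linear-CCA hot-deck sketch of statistical matching: two files share
# the common variables X; file B additionally holds Z, which we impute onto
# file A by nearest-neighbour matching on canonical scores of X.
# Simplified stand-in only, not the KCCA / Super-OM / A-CCA methods of the talk.
import numpy as np
from sklearn.cross_decomposition import CCA
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
n_a, n_b, p, q = 300, 500, 5, 2
X_b = rng.normal(size=(n_b, p))                 # common variables, donor file
Z_b = X_b[:, :2] @ rng.normal(size=(2, q)) + 0.3 * rng.normal(size=(n_b, q))
X_a = rng.normal(size=(n_a, p))                 # common variables, recipient file

# Learn directions of X most related to Z on the donor file.
cca = CCA(n_components=2).fit(X_b, Z_b)
S_b = cca.transform(X_b)                        # canonical scores of donors
S_a = cca.transform(X_a)                        # canonical scores of recipients

# Nearest-neighbour donor in the canonical-score space.
donor_idx = cKDTree(S_b).query(S_a, k=1)[1]
Z_a_imputed = Z_b[donor_idx]
print(Z_a_imputed[:3])
```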
|
Session 2 - Asymptotics [Room: L3 (0/54)]
Chair : Amir Aboubacar
Eva Cantoni - Robust Fitting for Generalized Additive Models for Location, Scale and Shape[Joint work with William H. Aeberhard, Giampiero Marra and Rosalba Radice]
The validity of estimation and smoothing parameter selection for the wide class of generalized additive models for location, scale and shape (GAMLSS) relies on the correct specification of a likelihood function. Deviations from this assumption are known to mislead any likelihood-based inference and can hinder penalization schemes meant to ensure some degree of smoothness for non-linear effects. We propose a general approach to achieve robustness in fitting GAMLSSs by limiting the contribution of observations with low log-likelihood values. Robust selection of the smoothing parameters can be carried out either by minimizing information criteria that naturally arise from the robustified likelihood or via an extended Fellner-Schall method. The latter allows for automatic smoothing parameter selection and is particularly advantageous in applications with multiple smoothing parameters. We also address the challenge of tuning robust estimators for models with non-linear effects by proposing a novel median downweighting proportion criterion. This enables a fair comparison with existing robust estimators for the special case of generalized additive models, where our estimator competes favorably. The overall good performance of our proposal is illustrated by further simulations in the GAMLSS setting and by an application to functional magnetic resonance brain imaging using bivariate smoothing splines.
Lorenzo Tedesco - Estimation of a General Semiparametric Hazards Regression Model With Application to Change analysis Hazard[Joint work with Ingrid Van Keilegom]
We propose a method for the estimation of a general semiparametric hazards regression model that extends the accelerated failure time model and the Cox proportional hazards model in survival analysis. The method is based on a kernel-smoothed profile likelihood. The estimator is shown to be consistent and to achieve semiparametric efficiency. The method is then applied to a generalisation of the accelerated failure time model that considers a single change point in the hazard. The change point can depend not only on time but also on the covariate values in the case of a time-dependent covariate. Simulations are provided together with applications to a real data set and comparisons with alternative methods.
Guy Mélard - New method for the asymptotic properties of self-excited threshold models[Joint work with Marcella Niglio]
A new method for obtaining the asymptotic properties of self-excited threshold autoregressive (SETAR) or ARMA (SETARMA) models is introduced. Threshold models are non-linear models for time series with k regimes, where the regime depends on the value of a variable called the threshold variable, with respect to one (when k = 2) or several (when k > 2) threshold values. As for most non-linear models, the usual method for obtaining the asymptotic properties of such models consists in exhibiting a stationary and ergodic solution of the model equation and using ergodicity to prove the consistency and the asymptotic normality of an estimator of the model parameters. A new method for obtaining these asymptotic properties was recently proposed by the authors when the threshold variable is exogenous and independent of the innovations of the time series model. That method is based on an asymptotic theory for scalar or vector ARMA models, where the coefficients are not constant but are deterministic functions of time and a small number of parameters. The method is thus valid well beyond threshold autoregressive (TAR) models, for example for TARMA models and their multivariate counterparts, but still assuming an exogenous threshold variable. SETARMA models have random coefficients, so that the theory is not directly applicable. Nevertheless, it is possible to adapt these fundamental results to the case of the randomly varying coefficients that appear in SETARMA models and their vector generalization. The only problem with the new method is that the existence of the information matrix has to be assumed or proved.
Alexander Duerre - Depth conditioned functional curves[Joint work with Davy Paindaveine]
Statistical functionals are useful tools to extract essential information like location, scale or dependence from possibly complicated distributions. Popular examples are the expected value, the variance and the covariance matrix. A major appeal lies in their simplicity, which is also their major limitation. Sometimes the location of a distribution cannot be described solely by its expectation. Imagine a mixture distribution of a standard normal with large mixture weight and a point mass far away from 0. The expectation then captures neither the location of the standard normal nor the location of the 'outlying' probability mass. We develop the idea of conditional functional curves, which capture both the properties of the central probability mass and the properties of the outer or outlying probability mass.
For a subclass of functionals, we propose a consistent estimator for the corresponding conditional functional curve and derive pointwise asymptotic normality. We conclude with some examples underlining the various possibilities to apply this method.
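A toy one-dimensional computation of the motivating mixture example may help fix ideas: the plain mean describes neither component, whereas means conditioned on a crude notion of depth separate the central and outlying probability mass. The cut-off, mixture weights and point-mass location below are arbitrary illustrative choices, not part of the talk.

```python
# Tiny one-dimensional illustration of the motivating example: the plain mean
# of a normal-plus-point-mass mixture describes neither part, whereas means
# conditioned on a crude notion of depth (distance to the median) separate the
# two. Toy stand-in only for the depth-conditioned functional curves of the talk.
import numpy as np

rng = np.random.default_rng(4)
n = 10000
mass_far_away = rng.random(n) < 0.05           # 5% point mass far away from 0
x = np.where(mass_far_away, 20.0, rng.normal(0, 1, n))

dist = np.abs(x - np.median(x))                # crude "outlyingness": distance to the median
central = x[dist <= 4]                         # the deep, central part of the data
outlying = x[dist > 4]                         # the outlying probability mass

print("overall mean    :", x.mean())           # roughly 1.0, describes neither part
print("mean | central  :", central.mean())     # close to 0
print("mean | outlying :", outlying.mean())    # equal to 20
```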
|
Session 3 - Genetics [Room: L5 (1/1)]
Chair : Dirk Valkenborg
Kai Kammers - Transcriptional landscape of platelets and iPSC-derived megakaryocytes[Joint work with M.A. Taub, B. Rodriguez, L.R. Yanek, I. Ruczinski, J. Martin, K. Kanchan, A. Battle, L. Cheng, Z.Z. Wang, A.D. Johnson, J.T. Leek, N. Faraday, L.C. Becker and R.A. Mathias]
Genome-wide association studies have identified common variants associated with platelet-related phenotypes, but because these variants are largely intronic or intergenic, their link to platelet biology is unclear. Additionally, extensive missing heritability may be resolved by integrating genetics and transcriptomics. To better understand the transcriptome signature and its genetic regulatory landscape in platelets and induced pluripotent stem cell-derived megakaryocyte (MK) cell lines (platelet precursor cells), we performed expression quantitative trait locus (eQTL) analyses of whole-genome sequencing and RNA-sequencing data on both cell types in African American (AA) and European American (EA) subjects from the Genetic Studies of Atherosclerosis Risk (GeneSTAR) project.
By meta-analyzing the results of AAs and EAs and selecting the peak single-nucleotide polymorphism (SNP) for each expressed gene, we identified 946 cis-eQTLs in MKs and 1830 cis-eQTLs in platelets. Among the 57 eQTLs shared between the two tissues, the estimated directions of effect are very consistent (98.2% concordance). A high proportion of detected cis-eQTLs (74.9% in MKs and 84.3% in platelets) are unique to MKs and platelets compared with peak-associated SNP-expressed gene pairs of 48 other tissue types that are reported in version V7 of the Genotype-Tissue Expression Project. The locations of our identified eQTLs are significantly enriched for overlap with several annotation tracks highlighting genomic regions with specific functionality in MKs.
These results offer insights into the regulatory signature of MKs and platelets, with significant overlap in genes expressed, eQTLs detected, and enrichment within known superenhancers relevant to platelet biology.
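For readers unfamiliar with eQTL mapping, the minimal sketch below illustrates the generic idea of selecting a peak SNP for a gene by regressing expression on allele dosage; the data are simulated, and the simple linear model ignores the covariate adjustment and cross-population meta-analysis used in the GeneSTAR analysis.

```python
# Minimal cis-eQTL sketch: for one gene, regress expression on the allele
# dosage of each nearby SNP and keep the peak (smallest p-value) SNP.
# Data, SNP set and the plain linear model are purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_subjects, n_snps = 200, 50
dosage = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)  # 0/1/2 alleles
expression = 0.8 * dosage[:, 7] + rng.normal(0, 1, n_subjects)          # SNP 7 is causal

pvals = np.array([stats.linregress(dosage[:, j], expression).pvalue
                  for j in range(n_snps)])
peak = int(np.argmin(pvals))
print(f"peak SNP: {peak}, p-value: {pvals[peak]:.2e}")
```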
Yao Chen - A Bootstrap Method for Variance Estimation in dPCR Experiments[Joint work with Ward De Spiegelaere, Wim Trypsteen, David Gleerup and Olivier Thas]
Digital PCR (dPCR) is a highly sensitive technique for quantification of a target molecule copy number in a biological sample. It proceeds by massive partitioning of the sample, followed by PCR reactions in all partitions, and eventually classifying the individual partitions as positive or negative based on the end-point fluorescence intensities. The data are often analysed by relying on the binomial or Poisson distribution. However, these assumptions may not hold when there are other sources of variation than sampling error. Moreover, when more complicated statistics need to be computed (see below), these parametric methods cannot easily be used for statistical inference.
We have developed a bootstrap method that takes into account not only the sampling variability and the inherent partitioning variation, but also other sources of error, such as partition loss and pipetting error. Furthermore, the method is generic, so that it can easily be used for variance estimation of more complicated non-linear statistics, such as copy number variation and the DNA shearing index. The method can also be extended to work with multiplex dPCR. We have evaluated the performance of the method under various realistic simulation scenarios.
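A generic nonparametric bootstrap over partitions, shown below for the sampling-variability component only, conveys the basic mechanics; the authors' method additionally accounts for partition loss, pipetting error and other sources of variation, which this sketch does not.

```python
# Generic nonparametric bootstrap over dPCR partitions, shown only for the
# sampling-variability component; the method in the talk additionally models
# partition loss, pipetting error and other sources of variation.
import numpy as np

rng = np.random.default_rng(5)
n_partitions = 20000
true_lambda = 0.8                                    # mean copies per partition
# Simulated end-point classification: a partition is positive if it received
# at least one target molecule (Poisson-distributed copies per partition).
positive = rng.poisson(true_lambda, n_partitions) > 0

def copies_per_partition(pos):
    """Poisson-based dPCR estimator: lambda = -log(1 - fraction positive)."""
    return -np.log(1.0 - pos.mean())

B = 2000
boot = np.empty(B)
for b in range(B):
    resample = rng.choice(positive, size=n_partitions, replace=True)
    boot[b] = copies_per_partition(resample)

print("estimate:", copies_per_partition(positive))
print("bootstrap standard error:", boot.std(ddof=1))
```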
The simulation results demonstrate the capability of this new bootstrap method for variance estimation even when many sources of variation are present. Another strength is that it also works well for the variance estimation of non-linear statistics.
Leyla Kodalci - Simple and Flexible Sign and Rank-Based Methods for Testing for Differential Abundance in Microbiome Studies[Joint work with Olivier Thas]
Microbiome data obtained from high-throughput sequencing are considered compositional data, characterised by a sum constraint. Hence, only ratios of count observations are informative. Furthermore, microbiome data are overdispersed and have many zero abundances. Many compositional data analysis methods make use of log ratios of the components of the observation vector. However, the many zero abundances cause problems when calculating ratios and logarithms.
In this work, we focus on the identification of taxa that are differentially abundant between two groups. We have developed semiparametric methods targeting the probability that the outcome of one taxon is smaller than the outcome of another taxon. The methods rely on logistic and probabilistic index models and hence inherit the flexibility that comes with these modelling frameworks. The estimation of this probability only requires information about the pairwise ordering of the taxa, and hence zero observations cause no problems. We have constructed several estimators of the effect size parameters in the model, and hypothesis tests based on these estimators. Results from a simulation study indicate that our methods control the false discovery rate at the nominal level and have good sensitivity compared to competitors.
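The quantity targeted by these methods can be illustrated with a crude empirical stand-in: the probability that the count of one taxon falls below that of a reference taxon, compared between two groups. The sketch below uses simulated zero-inflated counts and a naive normal-approximation test; it is not the logistic or probabilistic index model estimators of the talk.

```python
# Empirical illustration of the pairwise-ordering quantity targeted in the talk:
# P(count of taxon j < count of reference taxon k), compared between two groups.
# Crude moment-based stand-in, not the probabilistic-index-model estimators.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n_per_group = 60

def simulate(mu_j):
    """Hypothetical zero-inflated counts for taxon j and a reference taxon k."""
    counts_j = rng.negative_binomial(2, 2 / (2 + mu_j), n_per_group)
    counts_k = rng.negative_binomial(2, 2 / (2 + 5.0), n_per_group)
    counts_j[rng.random(n_per_group) < 0.3] = 0        # excess zeros
    return counts_j, counts_k

def pairwise_prob(cj, ck):
    """Estimate P(taxon j < taxon k), counting ties as 1/2."""
    return np.mean((cj < ck) + 0.5 * (cj == ck))

j1, k1 = simulate(mu_j=2.0)     # group 1
j2, k2 = simulate(mu_j=8.0)     # group 2: taxon j more abundant
p1, p2 = pairwise_prob(j1, k1), pairwise_prob(j2, k2)
# Naive normal-approximation test for a difference in the two probabilities.
se = np.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
z = (p1 - p2) / se
print(f"P1={p1:.2f}, P2={p2:.2f}, z={z:.2f}, p={2 * stats.norm.sf(abs(z)):.3f}")
```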
Mohamad Zafer Merhi - Single Cell RNAseq data: Application of Clustering and Biclustering for structure Identification of Antigen Specificity[Joint work with Dan Lin, Ahmed Essaghir and Ziv Shkedy]
The single cell RNA-sequencing technology allows the assessment of heterogeneous cell-specific changes and their biological characteristics. In this study, we focus on single cell omics data for immune profiling purposes. T cells exhibit a unique behaviour referred to as cross-reactivity: the ability of T cells to recognize two or more peptide-MHC complexes through the TCR. Our work uses single cell RNA-seq data (publicly available at https://support.10xgenomics.com/single-cell-vdj/datasets/) consisting of CD8+ T cells obtained with a single cell omics technology from 10X Genomics. Our aim is to understand the heterogeneous characteristics and the binding specificities of these T cells, i.e., to identify the specificity of the CD8+ T cells to one (or more) antigen(s). For the identification of specific CD8+ T cells, we propose an unsupervised data analysis pipeline. Biclustering methods are applied to recover and explore the cross-reactive behaviour of T cells and to identify a subset of cells that are specific to a subset of antigens. Clustering methods are used to link these subsets to the RNA-seq data. Furthermore, we discuss the challenges of the application and evaluation of clustering algorithms on single cell RNA-seq data.
|
|
14:15 - 15:55 |
Session 4 - Covid -
This session includes one invited talk (40') and three contributed talks (20').
[Room: L3 Auditoire 0/13A]
Chair : Stijn Vansteelandt
Niel Hens - Lessons Learned, Remaining Challenges and Pandemic Preparedness
In this talk, I will reflect on the COVID-19 pandemic from both a national and an international perspective, focussing on the lessons learned, remaining challenges and pandemic preparedness. I will focus on the design and analysis of infectious disease studies and the importance of peacetime research.
Hege Michiels - Estimation and interpretation of vaccine efficacy in COVID-19 trials[Joint work with An Vandebosch and Stijn Vansteelandt]
An exceptional effort by the scientific community has led to the development of multiple vaccines against COVID-19. Efficacy estimates for these vaccines have been widely communicated to the general public, but may nonetheless be challenging to compare quantitatively. Indeed, the phase 3 trials performed differ in study design, in the definition of vaccine efficacy and in how cases arising shortly after vaccination are handled. In this work, we investigate the impact of these choices on the obtained vaccine efficacy estimates, both theoretically and by re-analysing the Janssen and Pfizer COVID-19 trial data using a uniform protocol. We moreover study the causal interpretation that can be assigned to per-protocol analyses typically performed in vaccine trials. Finally, we propose alternative estimands to measure vaccine efficacy in settings with delayed immune response and provide insight into the intrinsic effect of the vaccine after achieving adequate immune response.
Cécile Kremer - Quantifying superspreading using Poisson mixture distributions[Joint work with Andrea Torneri, Sien Boesmans, Hanne Meuwissen, Selina Verdonschot, Koen Vanden Driessche, Christian L. Althaus, Christel Faes and Niel Hens]
An important parameter for the control of infectious diseases is the number of secondary cases, i.e. the number of new infections generated by an infectious individual. When individual variation in disease transmission is present, the distribution of the number of secondary cases is skewed and often modeled using a negative binomial distribution. However, this may not always be the best distribution to describe the underlying transmission process. We propose the use of three other offspring distributions to quantify heterogeneity in transmission, and find that estimates of the mean and variance may be biased when there is a substantial amount of heterogeneity. In addition, we (re-)analyze three COVID-19 datasets and find that for two of these datasets the distribution of the number of secondary cases is better described by a Poisson-lognormal distribution. Since conclusions regarding the superspreading potential of a disease are made based on the distribution used for modeling the number of secondary cases, we recommend comparing different distributions and selecting the most accurate one before making inferences on superspreading potential.
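As a minimal illustration of fitting an offspring distribution to secondary-case data, the sketch below obtains maximum-likelihood estimates of the mean and dispersion of a negative binomial from simulated counts; the real datasets and the alternative Poisson mixtures compared in the talk are not reproduced here.

```python
# Illustrative maximum-likelihood fit of a negative binomial offspring
# distribution to secondary-case counts; the talk compares this with other
# Poisson mixtures such as the Poisson-lognormal. Data below are simulated.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
true_R, true_k = 2.0, 0.3          # mean and dispersion (small k = superspreading)
secondary = rng.negative_binomial(true_k, true_k / (true_k + true_R), size=400)

def neg_loglik(params):
    log_R, log_k = params          # work on the log scale to keep R, k positive
    R, k = np.exp(log_R), np.exp(log_k)
    p = k / (k + R)                # scipy's (n, p) parameterization of nbinom
    return -stats.nbinom.logpmf(secondary, k, p).sum()

fit = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
R_hat, k_hat = np.exp(fit.x)
print(f"estimated R = {R_hat:.2f}, dispersion k = {k_hat:.2f}")
```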
Lisa Hermans - Infectieradar.be, Crowdsourcing Surveillance System to Monitor the Spread of Infectious Diseases in Belgium[Joint work with Yannick Vandendijck, Sarah Vercruysse, Emiliano Mancini, Jakob Randa, Geert Jan Bex, Sajeeth Sadanand, Daniela Paolotti, Christel Faes, Philippe Beutels, Niel Hens and Pierre Van Damme]
Infectieradar monitors the spread of infectious diseases with the help of volunteers via the internet. On this platform, individuals can report symptoms and complaints related to their health condition and whether or not they seek medical care.
Traditional surveillance for respiratory infections (COVID-19, influenza, etc.) relies on patients who consult physicians. However, many individuals do not seek health care while ill. Also, not everyone is tested. On top of this, individuals may change their health-seeking behaviour over the course of an epidemic. Community participatory surveillance is therefore critical. Infectieradar receives the data directly from the population, creating a fast and flexible monitoring system. The platform is part of a larger network, Influenzanet, which allows the Belgian data to be placed in a European perspective.
It provides insights into the symptom burden and an estimate of the true number of infections, allowing us to monitor how health complaints are distributed in Belgium over time. These data are of utmost importance for scientific research into the spread of COVID-19, but also of other viruses and infectious diseases. The more people subscribe to the platform, the more accurate the predicted incidence of new infections will be.
|
17:15 - 18:15 |
Quetelet session [Room: L3 Auditoire 0/13A]
Chair : Beatrijs Moerkerke
Jelle Goeman - All-Resolutions Inference
Many fields of science nowadays gather data at a very fine resolution but do inference at a more aggregated level. For example, in neuroimaging, data are gathered at the level of 3 mm × 3 mm × 3 mm voxels, but the relevant biology happens at the level of cm-scale brain areas; in genetics, data are gathered at the level of single-DNA-base polymorphisms, but interesting questions arise at the level of genes or even gene groups; in spatial statistics, data may be gathered at street level, but interesting questions are about neighbourhoods or regions. Often, there is not just one natural way to aggregate data to prepare for inference. Multiple alternative criteria could be used to drive the grouping. Aggregation to large regions may give low specificity; more limited aggregation may give low power.
This talk presents how closed testing can be used to analyze this type of data at all resolutions simultaneously. The method allows how and how much to aggregate to be chosen freely by the researcher, in a data-dependent way, while still strictly controlling the probability of false positive findings. This allows researchers to adapt the inference to the amount and the shape of the signal present in the data: the stronger the signal, the better it will be pinpointed by the closed testing procedure.
I will review the general idea and theory of closed testing and recent progress in method development in this area. Several example contexts illustrate the wide applicability of all-resolutions inference.
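A toy closed-testing procedure with Bonferroni local tests (which, for elementary hypotheses, reduces to Holm's procedure) gives a flavour of the principle; the scalable shortcuts needed for genuine all-resolutions inference over aggregated hypotheses are not shown.

```python
# Toy closed-testing procedure with Bonferroni local tests, illustrating the
# principle behind all-resolutions inference: a hypothesis is rejected only if
# every intersection hypothesis containing it is rejected by its local test.
from itertools import combinations

def closed_testing(pvals, alpha=0.05):
    m = len(pvals)
    all_idx = range(m)

    def local_reject(S):
        # Bonferroni local test for the intersection hypothesis over subset S.
        return min(pvals[i] for i in S) <= alpha / len(S)

    rejected = []
    for i in all_idx:
        # Enumerate every subset of hypotheses that contains i.
        supersets = (set(S) | {i}
                     for r in range(m)
                     for S in combinations(all_idx, r))
        if all(local_reject(S) for S in supersets):
            rejected.append(i)
    return rejected

# Hypotheses 0 and 1 are rejected, matching Holm's procedure: prints [0, 1].
print(closed_testing([0.001, 0.02, 0.40]))
```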
Announcement of the Quetelet award winners |