Posts by Collection


Fleet Management Optimisation with Spatio-Temporal Demand Forecasting in MaaS for Free-floating Micro-mobility

Bachelor Thesis. Student(s): Almström Oscar ,Carlsson Erik, Cronqvist Daniel ,Karlsson Max, Lilliecreutz Fredrik, Viala Bellander Alexander, 2021

In recent years, micro-mobility services have grown rapidly. Companies such as Voi, Bolt and Lime are front figures in this development. Utilising their resources efficiently through fleet management has been an essential factor in reaching profitability. Gathering data is becoming more critical for companies, enabling them to use data science and mathematical models to leverage their business. This report aims to examine the opportunity to use different data-driven models to predict future demand for micro-mobility services. Voi Technology provided real market data for this paper, limited to Gothenburg. All geospatial data was aggregated to geospatial units defined by Uber’s H3 spatial index. Furthermore, the research focused on the following methods; Markov properties, Time Series Analysis and Poisson processes. These methods were implemented in the programming language Python. The Markov and Prophet models provided solid demand predictions. However, due to sparse data representation, for some areas of the city, the data was clustered to give more accurate forecasts at the cost of geospatial granularity. Comparing the two, the Prophet models gave better results alone, while the Markov models instead gave insight into how the supply moves around in the market. With further work, the Markov model could prove favourable for a fleet operator during rebalancing due to its ability to track the vehicle flow and predict lack and surplus of supply. The data indicated significant seasonality effects and sporadic behaviour. FBprophet showed decent results in analysing those characteristics, using a sliding window technique as a significant contributor. Supporting FBprophet with several features improved the prediction, where rain stood out as the most impactful feature. The Poisson proccess model interprets demand as non-homogeneous and stochastic with inherent temporal randomness. While the theory behind the Poisson process model seems relevant, the results remain inconclusive. FBprophet performed the best demand prediction in contrast to the other models. However, further research could contradict this, and there is room for deeper exploration. Onwards, Neural Networks and Deep Learning are also exciting subjects for further research in demand forecasting.

Multi-Armed Bandits with Clustered Arms

Master Thesis. Student(s): Emelie Lööf, 2022

The project presents an allocation strategy for the stochastic multi armed bandit when considering instances with a clustered structure. Using the architecture of the KL-UCB policy as a source of inspiration, an algorithm which exploits and takes advantage from a clustered structure is derived. Firstly, encouraged by previous work related to the subject, a multi-level structure approach will constitute as an initial examination. Secondly, the Cluster KL-UCB policy will be derived and evaluated considering three different approaches. It will be shown, both theoretically and empirically, that adapting to a clustered environment improves the performance compared to its non cluster-adapting ancestor. Both upper and lower bounds on the regret will be provided in order to theoretically ensure the performance of the algorithm. Lastly, a number of empirical experiments will be performed in order to further ensure the performance and validate the theoretical results.


A reinforcement-learning approach to efficient communication

Mikael Kågebäck, Emil Carlsson, Devdatt Dubhashi, Asad Sayeed. Published in PlosOne, 2020

We present a multi-agent computational approach to partitioning semantic spaces using reinforcement-learning (RL). Two agents communicate using a finite linguistic vocabulary in order to convey a concept. This is tested in the color domain, and a natural reinforcement learning mechanism is shown to converge to a scheme that achieves a near-optimal trade-off of simplicity versus communication efficiency. Results are presented both on the communication efficiency as well as on analyses of the resulting partitions of the color space. The effect of varying environmental factors such as noise is also studied. These results suggest that RL offers a powerful and flexible computational framework that can contribute to the development of communication schemes for color names that are near-optimal in an information-theoretic sense and may shape color-naming systems across languages. Our approach is not specific to color and can be used to explore cross-language variation in other semantic domains.

Learning Approximate and Exact Numeral Systems via Reinforcement Learning

Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson. Published in Proceedings of the Annual Meeting of the Cognitive Science Society, 43, 2021

Recent work (Xu et al., 2020) has suggested that numeral systems in different languages are shaped by a functional need for efficient communication in an information-theoretic sense. Here we take a learning-theoretic approach and show how efficient communication emerges via reinforcement learning. In our framework, two artificial agents play a Lewis signaling game where the goal is to convey a numeral concept. The agents gradually learn to communicate using reinforcement learning and the resulting numeral systems are shown to be efficient in the information-theoretic framework of Regier et al.(2015); Gibson et al. (2017). They are also shown to be similar to human numeral systems of same type. Our results thus provide a mechanistic explanation via reinforcement learning of the recent results in Xu et al. (2020) and can potentially be generalized to other semantic domains.

Thompson Sampling for Bandits with Clustered Arms

Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson. Published in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence Main Track. Pages 2212-2218., 2021

We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.

Towards Learning Abstractions via Reinforcement Learning

Erik Jergéus, Leo Karlsson Oinonen, Emil Carlsson, Moa Johansson. Published in AIC 2022, 8th International Workshop on Artificial Intelligence and Cognition, 2022

In this paper we take the first steps in studying a new approach to synthesis of efficient communication schemes in multi-agent systems, trained via reinforcement learning. We combine symbolic methods with machine learning, in what is referred to as a neuro-symbolic system. The agents are not restricted to only use initial primitives: reinforcement learning is interleaved with steps to extend the current language with novel higher-level concepts, allowing generalisation and more informative communication via shorter messages. We demonstrate that this approach allow agents to converge more quickly on a small collaborative construction task.

Pragmatic Reasoning in Structured Signaling Games

Emil Carlsson, Devdatt Dubhashi. Published in Proceedings of the Annual Meeting of the Cognitive Science Society, 44, 2022

In this work we introduce a structured signaling game, an extension of the classical signaling game with a similarity structure between meanings in the context, along with a variant of the Rational Speech Act (RSA) framework which we call structured-RSA (sRSA) for pragmatic reasoning in structured domains. We explore the behavior of the sRSA in the domainof color and show that pragmatic agents using sRSA on top of semantic representations, derived from the World Color Survey, attain efficiency very close to the information-theoretic limit after only 1 or 2 levels of recursion. We also explore the interaction between pragmatic reasoning and learning in multi-agent reinforcement learning framework. Our results illustrate that artificial agents using sRSA develop communication closer to the information theoretic frontier compared to agents using RSA and just reinforcement learning. We also find that the ambiguity of the semantic representation increasesas the pragmatic agents are allowed to perform deeper reasoning about each other during learning

Iterated learning and communication jointly explain efficient color naming systems

Emil Carlsson, Devdatt Dubhashi, Terry Regier. Published in Proceedings of the 45th Annual Conference of the Cognitive Science Society, 2023

It has been argued that semantic systems reflect pressure for efficiency, and a current debate concerns the cultural evolutionary process that produces this pattern. We consider efficiency as instantiated in the Information Bottleneck (IB) principle, and a model of cultural evolution that combines iterated learning and communication. We show that this model, instantiated in neural networks, converges to color naming systems that are efficient in the IB sense and similar to human color naming systems. We also show that iterated learning alone, and communication alone, do not yield the same outcome as clearly.

Pure Exploration in Bandits with Linear Constraints

Emil Carlsson, Debabrota Basu, Fredrik D Johansson, Devdatt Dubhashi. Published in Under review, earlier version presented at the European Workshop on Reinforcement Learning, 2023

We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when \emph{the arms are subject to linear constraints}. Unlike the standard best-arm identification problem which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both these algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.