Project Portfolio

Innovative solutions I helped my clients achieve

Tracking Technology Adoption Around the Globe

We created a real-time connected database (Knowledge Graph) to track the adoption of technologies by top companies. The information in this database was extracted fully automatically from unstructured text data.

Turning raw text into a living network allowed an unprecedented level of big-picture analysis on the technology landscape.

Knowledge Graphs | Information Extraction | Deep Syntax Analysis

The aim of the project was to extract relationships between companies and technologies from transcripts of earnings calls. The three major technical challenges were the high level of noise (the vast majority of technology mentions in the data are meaningless), the lack of labeled data, and a requirement for interpretability and robustness.

The solution we found was to use an OpenIE model based on dependency parsing to extract assertions and only keep those which expressed an explicit relationship between the company and the technology. Each assertion became an edge between two nodes in the Knowledge Graph.
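As a minimal illustration of this filtering step, the sketch below keeps only triples that explicitly link a known company to a known technology and turns each kept triple into a graph edge. The triples, entity lists, and names are invented for the example; the actual project extracted assertions with an OpenIE model built on dependency parsing.

```python
# Toy filtering step: keep only (subject, relation, object) assertions
# that explicitly link a known company to a known technology, then turn
# each kept assertion into a Knowledge Graph edge. All data is made up.
COMPANIES = {"Acme Corp", "Globex"}
TECHNOLOGIES = {"machine learning", "blockchain"}

def filter_assertions(triples):
    """Drop noisy mentions; keep explicit company->technology relations."""
    return [
        (subj, rel, obj)
        for subj, rel, obj in triples
        if subj in COMPANIES and obj in TECHNOLOGIES
    ]

def to_edges(triples):
    """Each kept assertion becomes a labeled edge between two nodes."""
    return [{"source": s, "target": o, "label": r} for s, r, o in triples]

triples = [
    ("Acme Corp", "invests in", "machine learning"),  # explicit: kept
    ("the weather", "was", "sunny"),                  # noise: dropped
    ("Globex", "abandoned", "blockchain"),            # explicit: kept
]
edges = to_edges(filter_assertions(triples))
```

In the real pipeline, the subjects and objects come from the parsed assertion itself rather than from fixed lists, but the keep-or-drop logic is the same idea.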

Then, a classification model (based on the TARS model, trained using transfer learning with weak supervision) classified each edge, thus allowing for various types of edges between the nodes.
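To give a flavor of the weak-supervision side, here is a toy sketch of keyword-based labeling functions that assign a noisy type label to each edge. The keywords, labels, and function names are invented for the example; in the project, this kind of weak signal served as training data for the TARS-based classifier rather than as the final classifier.

```python
# Toy weak supervision: each labeling function either votes for an
# edge type or abstains (returns None). The noisy labels produced this
# way can be used as training signal for a proper classifier.
def lf_adoption(relation):
    return "ADOPTS" if any(w in relation for w in ("invests", "deploys", "uses")) else None

def lf_abandonment(relation):
    return "ABANDONS" if any(w in relation for w in ("abandoned", "dropped")) else None

LABELING_FUNCTIONS = [lf_adoption, lf_abandonment]

def weak_label(relation):
    """Return the first non-abstaining vote, or None if all abstain."""
    for lf in LABELING_FUNCTIONS:
        label = lf(relation)
        if label is not None:
            return label
    return None

label = weak_label("invests heavily in")  # -> "ADOPTS"
```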

Tools and Libraries: Flair | spaCy | Vis.js

Detecting and Predicting Logistics Incidents

The client needed an incident detection system with a 0% false-negative rate, meaning it knew when to call upon a human for any ambiguous case. We achieved this thanks to a rigorous data processing, data quality control, and monitoring pipeline.
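The "call upon a human" behavior is a form of selective prediction: the system only commits to an answer when its confidence clears a threshold, and escalates everything else. A minimal sketch of that decision rule, with an illustrative threshold (not the project's actual value):

```python
# Selective prediction sketch: answer only when confident, otherwise
# escalate to a human reviewer so no incident can slip through as a
# silent false negative. The threshold is illustrative.
THRESHOLD = 0.9

def detect_incident(confidence_incident):
    """Return 'incident', 'no_incident', or 'escalate_to_human'."""
    if confidence_incident >= THRESHOLD:
        return "incident"
    if confidence_incident <= 1 - THRESHOLD:
        return "no_incident"
    return "escalate_to_human"  # ambiguous: never risk a false negative
```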

We also used the client's historical data to build predictive models and anticipate incidents before they happen, allowing the client to gain a competitive edge by having predictions tailored to their sector.

Data Quality Control | Database Design | Machine Learning | Data Engineering

To build a truly reliable system, we needed to ensure the quality of the incoming data by developing a robust processing pipeline for the data provided by the suppliers. This processing chain was based on:

  • A data schema developed for the project (6 new tables were added to the database).
  • A Python library providing an abstraction layer to interact with the raw data.
  • 5 atomic workers developed in Python to implement the different processing steps. This separation of tasks was designed to create loose coupling between the workers and maximize desirable properties such as involutivity and statelessness.
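To show what designing a worker around such algebraic guarantees looks like, here is a toy worker written as a pure, stateless function. The property demonstrated is idempotence (running the worker twice changes nothing), a close cousin of the involutivity mentioned above; the record fields and names are invented for the example.

```python
# Toy atomic worker: a pure, stateless function. Output depends only
# on its input (statelessness), and applying it twice gives the same
# result as applying it once (idempotence), so reruns are always safe.
def normalize_record(record):
    """Worker step: trim whitespace, lowercase supplier codes,
    and coerce quantities to integers."""
    return {
        "supplier": record["supplier"].strip().lower(),
        "quantity": int(record["quantity"]),
    }

raw = {"supplier": "  ACME ", "quantity": "42"}
once = normalize_record(raw)
twice = normalize_record(once)  # identical to `once`
```

Making every worker safe to rerun this way is what lets the workers stay loosely coupled: any step can be replayed after a failure without coordinating with the others.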

Additionally, we built robust, interpretable predictive models with automatic monitoring and retraining as new data comes in.
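The monitoring-and-retraining loop can be sketched as a simple drift check: compare the model's rolling performance on fresh labeled data against the accuracy measured at deployment, and trigger retraining when it degrades. The baseline and threshold below are illustrative, not the project's values.

```python
# Monitoring sketch: flag the model for retraining when its recent
# accuracy drops more than MAX_DROP below the deployment baseline.
BASELINE_ACCURACY = 0.92
MAX_DROP = 0.05

def should_retrain(recent_predictions, recent_labels):
    """Compare rolling accuracy on fresh data against the baseline."""
    correct = sum(p == y for p, y in zip(recent_predictions, recent_labels))
    accuracy = correct / len(recent_labels)
    return accuracy < BASELINE_ACCURACY - MAX_DROP
```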

Tools and Libraries: Python | MySQL | Docker | FastAPI | SQLAlchemy | Scikit-Learn | Pandas | Numpy

Optimizing Neural Networks for Large-scale Inference

Using state-of-the-art neural network optimization methods, optimizing data flow and minimizing idle time, we managed to divide the cost of inference by 6 for large NLP models.

We conducted an extensive study exploring a broad spectrum of techniques and providing guidance on how to optimize future models at the company.

MLOps | NLP | Deep Learning | Transformers

Techniques evaluated for this project include quantization, distillation, pruning, graph optimization, and several backends such as ONNX and TensorRT.
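To make one of these concrete, here is a toy illustration of post-training quantization: float32 weights are mapped to int8 with a scale factor, shrinking storage 4x at the cost of a small rounding error. Real deployments would use the quantization tooling in PyTorch, ONNX, or TensorRT rather than this hand-rolled version.

```python
# Toy symmetric int8 quantization: w ≈ scale * q, with q in [-127, 127].
def quantize_int8(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.03]
q, scale = quantize_int8(weights)   # q fits in int8, 4x smaller than float32
restored = dequantize(q, scale)     # close to the original weights
```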

Tools and Libraries: PyTorch | ONNX | TensorRT | Docker

Monitoring Tree Growth From Space

We built a prototype to remotely measure the growth of trees for a forestry company based in Uruguay. Using freely available radar satellite images, the company could go from measuring tree size by hand once every few years to having an estimate every 6 days.

This project was done in close partnership with a Computer Vision researcher from the Borelli Research Center.

Remote Sensing | Computer Vision | Time Series

The main technical challenge was that, past a certain age, trees become too dense for their volume to be accurately measured using radar signals (a problem known as backscatter saturation). Our goal was to quantify at what age this saturation would be reached depending on the radar frequency used (C-band or L-band).

By combining ground-truth measurements provided by the company, Sentinel-1 satellite images, daily weather data for the area, and the scientific literature on tree biology, we developed a method comprising a normalization scheme and a three-scale analysis to assess the presence of backscatter saturation at various tree ages. We estimated the saturation age at 2.5 years for the C-band and 6.2 years for the L-band.
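A schematic version of the saturation-age idea: given (age, backscatter) pairs, find the age past which backscatter stops responding to growth, i.e. where the local slope falls below a threshold. The numbers below are synthetic and the threshold is illustrative; the project's actual method also involved normalization and a three-scale analysis.

```python
# Schematic saturation detection: scan consecutive (age, backscatter)
# points and report the last age before the curve goes flat.
def saturation_age(ages, backscatter, slope_threshold=0.1):
    """Return the age at which backscatter saturates, or None."""
    for i in range(1, len(ages)):
        slope = (backscatter[i] - backscatter[i - 1]) / (ages[i] - ages[i - 1])
        if slope < slope_threshold:
            return ages[i - 1]
    return None  # no saturation observed in the data

ages = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]        # tree age in years (synthetic)
sigma = [-12.0, -11.0, -10.2, -9.6, -9.2, -9.18]  # backscatter in dB (synthetic)
```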

Tools and Libraries: Rasterio | GDAL | Scikit-learn

Simulating a Social Network

We built a system to simulate user messages on a social network. The text content of the posts was machine-generated using a novel algorithm allowing for unprecedented control over the nature, theme, and emotion of the text.

This system powered an innovative immersive experience, which was used in a crisis management course at a top French university.

Controllable Text Generation | BERT | Deep Learning

The project required generating text in French, which was not possible with existing models. Pretrained causal language models (such as GPT-2), which allow for text generation, did not exist in French at the time, and the training budget for such a model was beyond our reach. Furthermore, the generation process needed to be highly controllable (in terms of the emotion conveyed or the subjects mentioned), produce high-quality, realistic sentences, and achieve high diversity (no two generated sentences should be alike).

To this end, I reproduced a method from the literature for generating text with masked language models instead of causal ones. The method suffered from a number of shortcomings and was not controllable. I proposed, implemented, and tested 6 major improvements to the method, which allowed for much higher-quality outputs as well as high controllability.
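The core loop of generating with a masked language model can be sketched very simply: start from an all-[MASK] sequence and iteratively replace one mask at a time with the model's prediction. Here the model is a stub lookup so the sketch stays self-contained; the real project used a French BERT-style model plus constraints on emotion and topic.

```python
# Highly simplified masked-LM generation: fill an all-[MASK] sequence
# one position at a time, in random order. stub_predict stands in for
# a real masked-language-model forward pass.
import random

def stub_predict(tokens, position):
    """Stand-in for a masked LM: pick a token for the mask at
    `position` given the current (partially filled) sequence."""
    vocabulary = ["the", "network", "is", "down", "again"]
    return vocabulary[position % len(vocabulary)]

def generate(length, rng=None):
    tokens = ["[MASK]"] * length
    positions = list(range(length))
    (rng or random).shuffle(positions)  # fill masks in random order
    for pos in positions:
        tokens[pos] = stub_predict(tokens, pos)
    return " ".join(tokens)
```

With a real model, each fill-in conditions on the tokens already placed, which is what makes the iterative (rather than one-shot) filling matter.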

Tools and Libraries: PyTorch | Transformers | spaCy