Private AI

Customer trust starts with protecting privacy

De-identify data at source
Text, Image, and Video De-identification

De-identify your data wherever they are, with only 3 lines of code.


Chrome         Firefox         Safari

iOS         Android

  • Liability and AI: An AI System Developer's Perspective

    A brief review of the Legal Handbook of Artificial Intelligence and Machine Learning.

  • Accelerating TensorFlow Lite with XNNPACK

    The new TensorFlow Lite XNNPACK delegate enables best-in-class performance on x86 and ARM CPUs.

  • Perfectly Privacy-Preserving AI

    What is it and how do we achieve it? We identified four pillars of privacy-preserving machine learning.

  • NVIDIA DALI: Speeding up PyTorch

    Some techniques to improve DALI resource usage and create a completely CPU-based pipeline. Up to 4x faster PyTorch training.

  • Homomorphic Encryption for Beginners: A Practical Guide (Part 2)

    The Fourier Transform

  • Which privacy-preserving method should I use?

    A tentative decision tree for the privacy-conscious programmer

  • Homomorphic Encryption for Beginners: A Practical Guide (Part 1)

    The basics of homomorphic encryption, followed by a brief overview of the open source homomorphic encryption libraries that are currently available, ending with a tutorial on how to use one of those libraries (namely, PALISADE).

  • Why is Privacy-Preserving Natural Language Processing Important?

    Why we should bother creating natural language processing (NLP) tools that preserve privacy. Apparently not everyone spends hours upon hours thinking about data breaches and data privacy infringements.

  • A Brief Overview of Privacy-Preserving Software Methods

    Symmetric encryption, asymmetric encryption, homomorphic encryption, differential privacy, and secure multi-party computation.

  • Reasoning about Unstructured Data De-Identification

    We frame the problem of de-identifying unstructured text within the broader landscape of privacy-enhancing technologies. We then cover what sort of background knowledge can be gained from only the stylistic information in a written document, and how research on authorship attribution and author profiling can improve our understanding of the inferences that can be drawn from an otherwise de-identified text. Finally, we provide a risk score for determining the likelihood that a message will be attributed to a particular author within a dataset using only author profiling tools.

  • Extracting MFCCs and BFCCs from Encrypted Signals

    We describe a method for extracting mel-frequency cepstral coefficients (MFCCs) and Bark-frequency cepstral coefficients (BFCCs) from an encrypted signal without having to decrypt any intermediate values. To do so, we introduce a novel approach for approximating logarithms of encrypted input data. This method works over any interval on which the logarithm is defined and bounded. Extracting spectral features from encrypted signals is the first step towards secure end-to-end automatic speech recognition (ASR) over encrypted data. We experimentally determine the precision thresholds needed to maintain an accurate word error rate (WER) for ASR on the TIMIT dataset.

  • A Critical Reassessment of Evaluation Baselines for Speech Summarization

    We assess the current state of the art in speech summarization by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus. Our results cast significant doubt on the merits of this area's accepted evaluation standards in terms of the baselines chosen, the correspondence of results to our intuition of what "summaries" should be, and the value of adding speech-related features to summarizers that already use transcripts from automatic speech recognition (ASR) systems.

  • Convolutional Neural Networks for Speech Recognition

    We show that further error-rate reductions can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We then propose a limited-weight-sharing scheme that can better model speech features. The special structure of CNNs (local connectivity, weight sharing, and pooling) gives them some degree of invariance to small shifts of speech features along the frequency axis, which is important for handling speaker and environment variations.

  • Privacy-Preserving Character Language Modelling

    Some of the most sensitive information we generate is either written or spoken using natural language. Privacy-preserving methods for natural language processing are therefore crucial, especially considering the ever-growing number of data breaches. However, there has been little work in this area up until now. In fact, no privacy-preserving methods have been proposed for many of the most basic NLP tasks. We propose a method for calculating character bigram and trigram probabilities over sensitive data using homomorphic encryption.

  • Flexible Web document analysis for delivery to narrow-bandwidth devices

    We propose a set of baseline heuristics for identifying genuinely tabular information and news links in HTML documents. A prototype implementation of these heuristics is described for delivering content from news providers' home pages to a narrow-bandwidth device such as a portable digital assistant or cellular phone display. Its evaluation on 75 Web sites is provided, along with a discussion of topics for future research.

  • Web-Based Language Modelling for Automatic Lecture Transcription

    Universities have long relied on written text to share knowledge. As more lectures are made available on-line, these must be accompanied by textual transcripts in order to provide the same access to information as textbooks. While Automatic Speech Recognition (ASR) is a cost-effective method to deliver transcriptions, its accuracy for lectures is not yet satisfactory. One approach for improving lecture ASR is to build smaller, topic-dependent Language Models (LMs) and combine them (through LM interpolation or hypothesis space combination) with general-purpose, large-vocabulary LMs. In this paper, we propose a simple solution for lecture ASR with similar or better Word Error Rate reductions (as well as topic-specific keyword identification accuracies) than combination-based approaches. Our method eliminates the need for two types of LMs by exploiting the lecture slides to collect a web corpus appropriate for modelling both the conversational and the topic-specific styles of lectures.


Gerald Penn, PhD

Co-Founder and CSO

Gerald Penn is a Professor of Computer Science at the University of Toronto, where he studies spoken language processing and computational linguistics. He has over 100 publications, the most cited of which has accrued 1,581 citations. He is a senior member of IEEE and AAAI, and a past recipient of the Ontario Early Researcher Award. His lab revolutionized speech recognition with its work on neural networks, which received the IEEE Signal Processing Society's Best Paper Award. He has led numerous research projects, including ones funded by Avaya, Bell Canada, CAE, the Connaught Fund, Microsoft, NSERC, the German Ministry for Training and Research, SMART Technologies, the U.S. Army, and the U.S. Office of the Director of National Intelligence. Gerald has also worked at Bell Labs and NASA.

John Stocks, MA

Head of Sales

John has a master's in Journalism from Concordia and a BA in Russian History from UofT, and has tried to be the dumbest guy in the room ever since. With a decade of experience in software sales at Airbnb and Limelight, split between London and Toronto, he specializes in customer acquisition for start-ups, growth marketing, and scaling high-performance teams. He's extremely excited to be working with such a talented group of individuals and to be maintaining user privacy while unlocking more datasets for exploration. If you ever need to bribe him, beer and/or pancakes are your best friend.

Peizhao Hu, PhD

Faculty Affiliate, Security Research

Peizhao Hu is an Assistant Professor in the Department of Computer Science at Rochester Institute of Technology (RIT), New York. His research focuses on (1) privacy-preserving cloud data analytics, specifically homomorphic encryption and multiparty computation; and (2) distributed systems, including mobile and pervasive computing. Before joining RIT, he was a Senior Research Engineer at NICTA (Australia's centre of research excellence; now Data61@CSIRO).

Remi Daviet, PhD

Post-Doctoral Affiliate, Statistics and Quantitative Marketing

Remi Daviet is a post-doctoral researcher at the Wharton School of the University of Pennsylvania. His research focuses on marketing analytics and behavior modelling. He is interested in the development and promotion of privacy-respecting marketing practices.

DV         Acceleprise         NextAI         UofT ventureLab         GAN
Contact Us

Interested in a demo? Email us at