Source: Image by JuraHeep from Pixabay

Data Engineering

Scalable and efficient data pipelines are as important for the success of analytics, data science, and machine learning as reliable supply lines are for winning a war.

For deploying big-data analytics, data science, and machine learning (ML) applications in the real world, analytics tuning and model training are only around 25% of the work. Approximately 50% of the effort goes into making data ready for analytics and ML. The remaining 25% goes into making insights and model inferences easily consumable at scale. The big data pipeline puts it all together. It is the railroad on which the heavy and marvelous wagons of ML run. Long-term success depends on getting the data pipeline right.

This article gives an introduction to the data pipeline and an overview of big data architecture alternatives through the following four…


Photo by Victor Freitas from Pexels

Programming Tips

How to profile performance and balance it with ease of use

Applying a function to all rows in a Pandas DataFrame is one of the most common operations during data wrangling. The Pandas DataFrame apply function is the most obvious choice for this. It takes a function as an argument and applies it along an axis of the DataFrame. However, it is not always the best choice.

In this article, you will measure the performance of 11 alternatives. With a companion Colab, you can do it all in your browser. No need to install anything on your machine.
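A minimal sketch of the trade-off, assuming a hypothetical DataFrame with text_searches and voice_searches columns and a simplified two-way labeling rule (the article's actual four-way segmentation is not reproduced here). DataFrame.apply with axis=1 calls a Python function once per row, whereas numpy.where evaluates the same condition over whole columns at once.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical per-user search counts (illustrative data, not from the article).
df = pd.DataFrame({
    "text_searches": [0, 3, 0, 5],
    "voice_searches": [0, 0, 2, 4],
})

def label_row(row: pd.Series) -> str:
    """Hypothetical rule: label a user by whether they ever used voice search."""
    return "voice" if row["voice_searches"] > 0 else "no_voice"

# Option 1: DataFrame.apply calls label_row once per row (axis=1).
df["label_apply"] = df.apply(label_row, axis=1)

# Option 2: vectorized numpy.where evaluates the condition on the whole column.
df["label_vectorized"] = np.where(df["voice_searches"] > 0, "voice", "no_voice")
print(df)

# Rough timing on a larger synthetic frame; absolute numbers vary by machine.
rng = np.random.default_rng(0)
n = 50_000
big = pd.DataFrame({
    "text_searches": rng.integers(0, 10, n),
    "voice_searches": rng.integers(0, 10, n),
})
t_apply = timeit.timeit(lambda: big.apply(label_row, axis=1), number=1)
t_where = timeit.timeit(
    lambda: np.where(big["voice_searches"] > 0, "voice", "no_voice"), number=1
)
print(f"apply: {t_apply:.4f}s  np.where: {t_where:.4f}s")
```

Both columns hold the same labels; the vectorized form is usually much faster because it avoids building a Series for every row, but measuring on your own data is the only reliable guide.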

Problem

Recently, I was analyzing user behavior data for an e-commerce app. Depending on the number of times a user did text and voice searches, I assigned each user to one of four…


(Image by Slang Labs)

Conversational AI

A case for Bhārat Bhāṣā Stack

Bhārat Bhāṣā Stack will catalyze Voice Assistant and Conversational AI innovations for vernacular Indic languages as India Stack did for FinTech.

A decade ago, it was unimaginable.

That one would pay a street vendor in a nondescript small town in India by scanning, with a mobile phone, a QR code hung on his cart. Even for an amount as little as 50 rupees (less than a dollar).

That there would be many mobile apps and payment wallets from banks and non-banks. All seamlessly interoperable. Any two parties would transact by sharing an email-like wallet address. Without paying any transaction fee.

That a myriad of small businesses would send catalogs on WhatsApp. Deliver goods to your home. Accept digital payments at your doorstep. Without having to build a website or payment gateway. …


Photo by Vanessa Bucceri on Unsplash

Microservices

Distilled lessons from building microservices powering the Slang Labs platform. Presented in a PyCon India 2019 tutorial.

A data model organizes data elements and formalizes their relationships with one another. In database design, data modeling is the process of analyzing application requirements and designing conceptual, logical, and physical data models for storage. However, data storage is only one, albeit an important, aspect of microservices.

A microservice has three related but distinct data models:

  • API Data Model for interactions: It defines the schema of the data payload that can be sent to or received from the endpoints of a microservice. Also known as the communication or exchange data model.
  • Object Data Model for computations: It is designed for efficient business logic implementation. Also known as application data model or data structures. …
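To make the distinction concrete, here is a minimal sketch with Python dataclasses, assuming a hypothetical user-profile resource. UserPayload plays the role of the API data model (it mirrors the JSON exchanged at an endpoint), User plays the role of the object data model (it is shaped for business logic), and explicit converters keep the two decoupled. All class, field, and function names are illustrative, not taken from the Slang Labs codebase.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

@dataclass
class UserPayload:
    """API data model (hypothetical): mirrors the JSON payload at an endpoint."""
    user_id: str
    display_name: str
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, body: str) -> "UserPayload":
        data: Dict[str, Any] = json.loads(body)
        return cls(data["user_id"], data["display_name"], data.get("tags", []))

    def to_json(self) -> str:
        return json.dumps(
            {"user_id": self.user_id, "display_name": self.display_name, "tags": self.tags}
        )

@dataclass
class User:
    """Object data model (hypothetical): a set gives O(1) tag membership checks."""
    user_id: str
    name: str
    tags: Set[str] = field(default_factory=set)

    def has_tag(self, tag: str) -> bool:
        return tag in self.tags

# Explicit converters let the API schema evolve without touching business logic.
def payload_to_user(p: UserPayload) -> User:
    return User(user_id=p.user_id, name=p.display_name, tags=set(p.tags))

def user_to_payload(u: User) -> UserPayload:
    return UserPayload(user_id=u.user_id, display_name=u.name, tags=sorted(u.tags))

payload = UserPayload.from_json('{"user_id": "u1", "display_name": "Asha", "tags": ["voice", "beta"]}')
user = payload_to_user(payload)
print(user.has_tag("voice"))           # True
print(user_to_payload(user).to_json()) # tags come back sorted: ["beta", "voice"]
```

A storage model would typically get its own converters in the same way, so that a change in one layer stays contained.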


My trek mates and I climbing Mayali Pass in Uttarakhand Himalaya, India

Machine Learning for Developers

A map of the terrain and a compass for software developers embarking on an ML expedition.

You are a software engineer. You notice Artificial Intelligence, Machine Learning, Deep Learning, and Data Science buzzwords all around. You wonder what these phrases mean, and whether all this is real and useful or just another hype-driven, passing fad.

You want to figure out how it is changing, or will change, the computer/IT industry, and why you should care, if at all. You google it and read various articles, blogs, and tutorials. You get some idea but are also overwhelmed by the enormous wealth of math, tools, and frameworks you discover.

You wish someone could give you an overview, say, a map and compass suitable for an engineer, to help you embark on the journey of mastering it all. …


Seashells and tree annual rings are nature’s meticulous logs. Image by Friedrich Frühling from Pixabay

Microservices

Distilled lessons from building microservices powering the Slang Labs platform. Presented in a PyCon India 2019 tutorial.

Nature is a meticulous logger, and its logs are beautiful. Calcium carbonate layers in a seashell are nature’s log of ocean temperature, water quality, and food supply. Annual rings in a tree trunk are nature’s log of dry and rainy seasons and forest fires. Fossils in the layers of sedimentary rock are nature’s log of the flora and fauna that existed at the time.

In software projects, logs, like tests, are often afterthoughts. But at Slang Labs, we take inspiration from nature’s elegance and meticulousness.

We are building a platform for programmers to make interaction with their mobile and web apps more natural by adding Voice Augmented eXperiences (VAX). The platform is powered by a collection of microservices. Each log entry a microservice emits is a fossil record of a request. Logs are designed for timely and effective use in raising alerts and swiftly diagnosing issues. …
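One way to make each log entry a self-describing record of a request is structured logging. Below is a minimal sketch using only the standard library logging module: a custom Formatter emits one JSON object per line, and the extra argument attaches request-scoped fields. The field names (request_id, endpoint, status, latency_ms), the logger name, and the route are illustrative assumptions, not the Slang Labs log schema.

```python
import io
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Render each LogRecord as a single-line JSON object."""

    # Request-scoped fields callers may pass via `extra` (hypothetical names).
    EXTRA_FIELDS = ("request_id", "endpoint", "status", "latency_ms")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Copy only the request fields that were actually supplied.
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

stream = io.StringIO()  # stand-in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("vax.service")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines via the root logger

start = time.perf_counter()
# ... handle the request here ...
latency_ms = round((time.perf_counter() - start) * 1000, 2)
logger.info(
    "request served",
    extra={"request_id": "req-42", "endpoint": "/v1/intent", "status": 200, "latency_ms": latency_ms},
)
print(stream.getvalue())
```

Because every line is valid JSON, alerting rules and diagnosis queries can filter on fields such as status or latency_ms instead of parsing free text.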


Python Microservices: Build and Test REST endpoints with Tornado

Microservices

Distilled lessons from building microservices powering the Slang Labs platform. Presented in a PyCon India 2019 tutorial.

At Slang Labs, we are building a platform for programmers to easily and quickly add multilingual, multimodal Voice Augmented eXperiences (VAX) to their mobile and web apps. Think of an assistant like Alexa or Siri, but running inside your app and tailored for your app.

The platform is powered by a collection of microservices. We chose Tornado to implement these services because it has AsyncIO APIs. It is not heavyweight, yet it is mature and offers a number of configuration options, hooks, and a nice testing framework.

This blog post covers some of the best practices we learned while building these services; how…
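A minimal sketch of a Tornado REST endpoint and a test for it, assuming Tornado 6 is installed; the /health route, HealthHandler, and make_app are hypothetical names. The handler is a native async def method, and tornado.testing.AsyncHTTPTestCase starts the application on an ephemeral port so that self.fetch can exercise it end to end.

```python
import json
import unittest

import tornado.web
from tornado.testing import AsyncHTTPTestCase

class HealthHandler(tornado.web.RequestHandler):
    """Hypothetical endpoint returning a JSON status document."""

    async def get(self) -> None:
        # Writing a dict makes Tornado serialize it to JSON and set
        # Content-Type to application/json.
        self.write({"status": "ok"})

def make_app() -> tornado.web.Application:
    return tornado.web.Application([(r"/health", HealthHandler)])

class HealthHandlerTest(AsyncHTTPTestCase):
    def get_app(self) -> tornado.web.Application:
        return make_app()

    def test_health(self) -> None:
        response = self.fetch("/health")
        self.assertEqual(response.code, 200)
        self.assertTrue(response.headers["Content-Type"].startswith("application/json"))
        self.assertEqual(json.loads(response.body), {"status": "ok"})
```

Run the test with python -m unittest on the module. To serve the app outside tests, Tornado 6 lets you call make_app().listen(port) inside a coroutine started with asyncio.run.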


Python Microservices: Build and Test REST endpoints with Tornado

Microservices

Distilled lessons from building microservices powering the Slang Labs platform. Presented in a PyCon India 2019 tutorial.

At Slang Labs, we are building a platform for programmers to easily and quickly add multilingual, multimodal Voice Augmented eXperiences (VAX) to their mobile and web apps. Think of an assistant like Alexa or Siri, but running inside your app and tailored for your app.

The platform consists of:

This blog post shares the best practices and lessons we have learned while building the microservices. …


Automatic Speech Recognition with Python: survey and comparison of alternatives
Image by Slang Labs

Speech Recognition

Comparing the 9 most prominent alternatives.

Speech recognition technologies have been evolving rapidly for the last couple of years and are transitioning from the realm of science to engineering. With the growing popularity of voice assistants like Alexa, Siri, and Google Assistant, several apps (e.g., YouTube, Gaana, Paytm Travel, My Jio) are beginning to offer voice-controlled functionality. At Slang Labs, we are building a platform for programmers to easily augment existing apps with voice experiences.

Automatic Speech Recognition (ASR) is the necessary first step in processing voice. In ASR, an audio file or speech spoken into a microphone is processed and converted to text; therefore, it is also known as Speech-to-Text (STT). This text is then fed to a Natural Language Processing/Understanding (NLP/NLU) system to understand it and extract key information (such as intentions and sentiments), and then an appropriate action is taken. There are also stand-alone applications of ASR, e.g. …
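The voice-processing flow described above (speech to text, text to intent, intent to action) can be sketched as three composable stages. This is a toy illustration with hypothetical stubs: transcribe stands in for a real ASR engine and simply returns a canned transcript, and understand uses keyword matching where a real system would use a trained NLU model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Intent:
    name: str
    entities: Dict[str, str] = field(default_factory=dict)

def transcribe(audio: bytes) -> str:
    """Hypothetical ASR stub: a real system would run a speech-to-text model here."""
    return "book a train to pune"

def understand(text: str) -> Intent:
    """Toy NLU: keyword rules standing in for a trained intent/entity model."""
    words = text.lower().split()
    if "book" in words and "to" in words:
        destination = " ".join(words[words.index("to") + 1:])
        return Intent("book_ticket", {"destination": destination})
    return Intent("unknown")

# Action handlers keyed by intent name (hypothetical app actions).
ACTIONS: Dict[str, Callable[[Intent], str]] = {
    "book_ticket": lambda i: f"Opening booking screen for {i.entities['destination']}",
    "unknown": lambda i: "Sorry, I did not understand that.",
}

def handle_voice(audio: bytes) -> str:
    text = transcribe(audio)             # 1. ASR / STT
    intent = understand(text)            # 2. NLU
    return ACTIONS[intent.name](intent)  # 3. take the appropriate action

print(handle_voice(b""))  # Opening booking screen for pune
```

Keeping the stages behind plain function boundaries makes it easy to swap the stub ASR for any of the engines being compared without touching the NLU or action code.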


How to build Python transcriber using Mozilla DeepSpeech
Image by Slang Labs

Speech Recognition

Transcriber with PyAudio and DeepSpeech in 70 lines of Python code.

Voice assistants are one of the hottest technologies right now. Siri, Alexa, and Google Assistant all aim to help you talk to computers and not just touch and type. Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU/NLP) are the key technologies enabling it. If you are just-a-programmer like me, you might be itching to get a piece of the action and hack something. You are at the right place; read on.

Though these technologies are hard and the learning curve is steep, they are becoming increasingly accessible. Last month, Mozilla released DeepSpeech along with models for US English. It has smaller and faster models than ever before, and even has a TensorFlow Lite model that runs faster than real time on a single core of a Raspberry Pi 4. There are several interesting aspects, but right now I am going to focus on its refreshingly simple batch and stream APIs in C, .NET, Java, JavaScript, and Python for converting speech to text. By the end of this blog post, you will have built a voice transcriber. …
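The released DeepSpeech English models consume 16-bit mono PCM audio sampled at 16 kHz, and the batch and stream styles differ mainly in how that audio reaches the model: all at once, or in chunks as it arrives from a microphone. The sketch below covers only the audio-handling side, using the standard library wave module and NumPy. The recognize_batch and recognize_stream functions and their stt, feed, and finish callbacks are hypothetical placeholders for the model calls; in the Python package, batch recognition is Model.stt(), but check the streaming method names against the DeepSpeech version you install, since they changed between releases.

```python
import io
import wave
from typing import Callable, Iterator

import numpy as np

SAMPLE_RATE = 16000  # sample rate expected by the released English models

def read_wav_int16(wav_bytes: bytes) -> np.ndarray:
    """Load a 16-bit mono WAV into an int16 array."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        return np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)

def chunks(audio: np.ndarray, chunk_ms: int = 20) -> Iterator[np.ndarray]:
    """Split audio into fixed-duration chunks, as a microphone callback would deliver them."""
    step = SAMPLE_RATE * chunk_ms // 1000
    for start in range(0, len(audio), step):
        yield audio[start:start + step]

def recognize_batch(audio: np.ndarray, stt: Callable[[np.ndarray], str]) -> str:
    # Batch: hand the whole buffer to the model in one call (hypothetical callback).
    return stt(audio)

def recognize_stream(audio: np.ndarray, feed: Callable[[np.ndarray], None],
                     finish: Callable[[], str]) -> str:
    # Stream: feed chunks as they arrive, then request the final transcript.
    for chunk in chunks(audio):
        feed(chunk)
    return finish()

# Demo: one second of synthetic silence and a fake "model" that counts samples.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(SAMPLE_RATE)
    wf.writeframes(np.zeros(SAMPLE_RATE, dtype=np.int16).tobytes())
audio = read_wav_int16(buf.getvalue())

received = []
text = recognize_stream(audio, feed=received.append,
                        finish=lambda: f"{sum(len(c) for c in received)} samples")
print(text)           # 16000 samples
print(len(received))  # 50 chunks of 20 ms each
```

With a real model, the fake callbacks would be replaced by the model's batch or streaming calls; the chunking logic stays the same.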

About

Satish Chandra Gupta

Cofounder @SlangLabs. Ex Amazon, Microsoft Research. I built compilers for a decade. Now I make ML services handling a billion events/day in real time.
