About Me — ToC of Medium Articles
I look at Machine Learning through a software developer’s lens.
Hello and welcome to my corner @ Medium! Wish you a very happy new year!
My name is Satish Chandra Gupta, and I am a computer programmer.
I am a co-founder at Slang Labs, where we are building the world’s first Voice Assistant as a Service (VAaaS).
Before Slang Labs, I worked at Amazon, Microsoft Research, and IBM (Rational Software).
On Medium, I write about Machine Learning from a practitioner's perspective and share my experience and learning from building “data” applications. I also publish the Machine Learning for Developers (ML4Devs) newsletter.
In my past life, I crafted profilers, compilers, program analysis tools, and IDEs.
I grew up in the industrial town of Kanpur (India) on the banks of the river Ganga. I studied Chemical Engineering (with a minor in Computing) in my undergrad, and did an M.S. in Computer Science.
When not wrangling with data, I enjoy reading and writing Hindi poetry, and trekking at the high altitudes of the Himalayas.
My 5 Most Popular Medium Articles
In the last two years (2020 and 2021), I have written only 14 articles on Medium. Five of these got 19K-155K views and 100–550, which is not bad considering my very modest following of only 956 on 1 Jan 2022. I aim to write more regularly this year.
Scalable Efficient Big Data Pipeline Architecture on Cloud
12 Ways to Apply a Function to Each Row in Pandas DataFrame
SQL vs. NoSQL Database: When to Use, How to Choose
Top 10 Programming Languages Portfolio for 2022
Python Microservices: Choices, Key Concepts, and Project setup
List of My Medium Articles
Building successful Machine Learning products or product features requires five disciplines (product-data-ML-dev-ops) to come together:
- Product: Product Design & Management encompasses the whole gamut of things: identifying the user and business needs, designing the user experience (including implicitly or explicitly collecting feedback from the use of ML-assisted features), defining business success metrics, and guiding the whole journey from conception to delivery. It is one of the hardest parts and key to the success of the ML-assisted feature/product. Sadly, it is often ignored.
- Data: Data Engineering takes care of collecting, curating, storing, and managing the needed data at scale (aka Big Data). As Monica Rogati explained in the Data Science Hierarchy of Needs, data engineering covers the first 2 layers (out of 6) of the pyramid. The Anaconda State of Data Science 2021 report says that a good 39% of the effort goes into data cleaning and data preparation (page 14). Without good quality data, there is no ML. And without a solid Data Engineering foundation, there is no ML product. Data Pipelines are the railroads on which the heavy and marvelous wagons of ML run.
- ML: The spectrum of data analytics, data science, and machine learning, which is about designing statistical/probabilistic models rather than traditional deterministic algorithms/programs. In ML, data is logic, and some of the product features are implemented using statistical models.
- Dev: Developers knit an ML model seamlessly into the rest of the product, and continuously develop-test-deploy code to achieve business goals. They apply the rigor of software engineering principles to design, develop, test, evaluate, and maintain software systems. They scale a model for mass consumption.
- Ops: Operations (DevOps or DevSecOps) is the discipline of continuous integration and continuous delivery/deployment (CI/CD). In the case of ML, it becomes CT/CI/CD: continuous model training, integration, and delivery/deployment. The aim is to automate the process of training models, integrating and packaging them into software services (typically Docker containers), deploying them on the cloud, monitoring their performance in production (e.g. catching concept/data drift), firing alerts in case of issues, and triggering rollbacks or retraining as and when needed.
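To make the monitoring step in the Ops discipline a little more concrete, here is a minimal Python sketch of a data drift check that decides whether to do nothing, fire an alert, or trigger retraining. It is only an illustration under simple assumptions: the function names (`drift_score`, `decide_action`), the thresholds, and the sample numbers are all hypothetical, and the score is a plain mean-shift measure rather than a proper statistical test that a production system would likely use.

```python
from statistics import mean, stdev


def drift_score(reference, live):
    """Shift of the live mean from the reference mean,
    measured in reference standard deviations (a simple, assumed metric)."""
    spread = stdev(reference)
    if spread == 0:
        raise ValueError("reference data has zero variance")
    return abs(mean(live) - mean(reference)) / spread


def decide_action(score, alert_at=1.0, retrain_at=2.0):
    """Map a drift score to an action. Thresholds are hypothetical defaults."""
    if score >= retrain_at:
        return "retrain"
    if score >= alert_at:
        return "alert"
    return "ok"


# Hypothetical feature values: training-time reference vs. production batches.
reference = [10.0, 11.0, 9.0, 10.5, 9.5]
live_stable = [10.2, 9.8, 10.1, 10.4, 9.9]
live_drifting = [11.0, 10.8, 11.2, 11.1, 10.9]
live_shifted = [13.0, 12.5, 13.5, 12.8, 13.2]

for name, batch in [
    ("stable", live_stable),
    ("drifting", live_drifting),
    ("shifted", live_shifted),
]:
    score = drift_score(reference, batch)
    print(f"{name}: score={score:.2f}, action={decide_action(score)}")
```

In a real CT/CI/CD setup, a check like this would run on a schedule against production data, and the "alert" and "retrain" outcomes would be wired to the alerting and training pipelines.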
Here is the list of my published articles:
[Data] Data Engineering
- Scalable Efficient Big Data Pipeline Architecture on Cloud
- SQL vs. NoSQL Database: When to Use, How to Choose
- 12 Ways to Apply a Function to Each Row in Pandas DataFrame
[ML] Machine Learning
- An Engineer’s Trek into Machine Learning
- Actionable Insights from 4 Types of Data Analytics
- How to build Python transcriber using Mozilla DeepSpeech
- Speech Recognition with Python
- Indic Language Stack for Voice Assistants and Conversational AI
[Dev] Software Engineering
- Software Architecture Design and Engineering at a Startup
- Python Microservice Tutorial: part 1, part 2, part 3, part 4
- Top 10 Programming Languages Portfolio for 2022
[ML4Devs] Machine Learning for Developers — Biweekly Newsletter
- Issue 3 [11 Feb 2022]: To be agile, or not to be
- Issue 2 [28 Jan 2022]: ML Model Testing
- Issue 1 [15 Jan 2022]: Machine Learning for Developers