Hello and welcome to our first bi-weekly reading group kick-off! We’ll be reading Operationalizing Machine Learning: An Interview Study over the next two weeks. This topic is to help get us started, but please feel free to create your own topics as you read!
Some getting started prompts:
- What is the significance of this article?
- How does this article impact your current workflow?
- What’s one idea from the article that has changed the way you think?
- What’s something new that you learned from this article?
- Why should someone else read this article?
hi y’all! I hope everyone is off to a great start this week. I’m really struggling to find time to sit down and read this paper in one sitting, so I thought I’d break it up into smaller pieces and post as I go – definitely hop in on the conversation, because I’d love to know what you think as you read!
I’ve made it through the introduction of the paper, which was fascinating! it really reminded me of when I was working as a data scientist and trying to figure out how to do all of these things on my own, as a team of one
even though this paper focuses on MLEs (machine learning engineers) I’m curious if any of you who are not MLEs have any experience with some of the practices outlined in the introduction?
I skimmed the paper last week. I plan to read it in more detail, since I forget the details of a paper, book, or movie unless I purposefully take notes.
The introduction and the rest of the paper made me feel relieved. The machine learning process in my team requires significant human-in-the-loop work, especially during construction of the training dataset. This has always bothered me, and I am always looking for ways to minimize the effort. Recently, I introduced an anomaly detection algorithm that significantly reduced the time spent. Anyway, learning that many machine learning teams deal with this kind of problem makes me feel better.
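To make the idea concrete, here's a minimal sketch of the kind of anomaly filter I mean. The post doesn't name the actual algorithm, so this hypothetical z-score version is just a stand-in: flagged samples go to a human reviewer instead of reviewing everything.

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Return indices of samples whose z-score exceeds the threshold.

    A crude stand-in for the anomaly detector mentioned above: only the
    flagged candidates need human review when building the training set.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # no variation, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Example: one obviously odd sample among otherwise similar values.
durations = [1.1, 0.9, 1.0, 1.2, 0.95, 42.0, 1.05]
print(flag_anomalies(durations))  # -> [5] (the 42.0 outlier)
```

A real pipeline would likely use something richer (e.g. an isolation-forest-style detector over many features), but even this simple gate cuts review time when most samples are unremarkable.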
I am planning to share my further thoughts later on.
yes! I completely agree - I feel like sometimes we’re hard on ourselves when we use human-in-the-loop processes, because it feels like everything should be automated and effortless
after reading through the related work and methods sections, I feel like I have a better handle on just how vast the problem space is for MLOps, and how challenging it is to find a balance between creating tools that work for most people and creating tools to solve a very specific problem.
I’m really excited to dig into the findings section next!
I read the paper and saw much that seemed familiar. There are several points where my group has had to deal with similar problems and some problems that were different. For context, we are working on a battery-powered, wearable device to detect when people fall. The data is from an integrated circuit inertial measurement unit (IMU). Here are some things that are different and worth mentioning.
- Adding a rule-based layer on the front-end
ML processing uses energy, and we need to limit the number of times we invoke it. In our case, the rule-based layer runs before ML processing to minimize the energy expended on false alarms.
- Getting good labeled data is sometimes tough
Some university databases of activities of daily living help you understand what you don’t want to detect. You can also get data from martial arts folks tossing each other around – generally young and fit people who know how to fall. However, data sets of older adults falling are not readily available, even though older adults are the people at greatest risk.
- You may need to make your system observable
We have had to add telemetry to send back IMU data so we could learn about actual falls with real people. You may be surprised to learn that jumping off a horse and falling can look similar. Unfortunately, sending lots of IMU data back also takes a lot of energy. And you want to see actual falls and fall-like behaviors.
- Balance between data fidelity and performance
Actual data is sampled and comes in bits. There is a tradeoff to be made between the total amount of data (sample rate * bits per sample) and energy. More samples at finer resolution are really useful but expensive in terms of energy.
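To illustrate the rule-based front-end from the first bullet, here's a toy sketch. The threshold value and the accelerometer-magnitude rule are hypothetical – our real gate is more involved – but the shape is right: a cheap check decides whether to spend energy waking the ML classifier at all.

```python
import math

WAKE_THRESHOLD_G = 2.5  # hypothetical impact threshold, in g

def accel_magnitude(ax, ay, az):
    """Magnitude of the 3-axis IMU acceleration vector, in g."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def maybe_run_ml(sample, ml_model):
    """Cheap rule-based gate in front of ML inference.

    Only wake the ML classifier when the impact magnitude suggests a
    possible fall; ordinary-motion samples never pay the ML energy cost.
    """
    if accel_magnitude(*sample) < WAKE_THRESHOLD_G:
        return False  # below threshold: stay asleep, no ML invoked
    return ml_model(sample)  # candidate fall: pay for full inference

# Usage with a stand-in classifier that always confirms a fall:
classify = lambda s: True
print(maybe_run_ml((0.0, 0.0, 1.0), classify))  # quiet standing -> False
print(maybe_run_ml((3.0, 1.0, 2.0), classify))  # hard impact -> True
```

The design point is that the rule layer's false *positives* are cheap (one wasted inference) while its false *negatives* are not (a missed fall), so in practice you'd tune the threshold conservatively low.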
this is really interesting, thank you for sharing! in a lot of ways you’ve highlighted how there are general approaches to MLOps, but (maybe?) always a need for some level of personalization based on what you’re working on.
I always think that we’re absolutely swimming in data, but of course that doesn’t mean data for everything! I imagine that getting data around older adults falling is difficult, and I’m curious whether falls are kinematically different based on age.