Tags
adaptable, adaptible, AI, Artificial Intelligence, aws data engineer, AWS EMR, aws glue, aws ml engineer, aws redshift, champion, Claude, context, CoPilot, critical thinking, data engineer, Derivatives, DuckDuckGo, Google Data Engineer, Harvard Business Review, HBR, HuggingFace, Latent Dirichlet Allocation, lda, Linear Discriminant Analysis, machine learning, Machine Learning Engineer, mice, ML Engineer, Multivariate Imputation by Chained Equations, one hot encoding, OpenAI, Partial Derivatives, PCA, principal component analysis, roi, stakeholder, team work
NOTE:
First, this is not an Academic or Theoretical paper. Nor is this specifically about day-to-day efforts, per se, of a D.E. (Data Engineer) or ML (Machine Learning) Engineer. It is mainly about the path to Blasted Good Results & SUCCESS… using strategy and tactics.
Primarily, this material is for junior / mid-level folks. Even within this cadre, some (or maybe all) of the content will only be a refresher for those who are experienced and/or have already done a lot of digging and research.
For this little article, I also list 5 Caveats regarding these two (2) roles.
BUT. The “Caveats” are ‘not’ restricted only to these roles. They can be applied to ANYTHING else – ANY role for ANY work out there.
CAVEATS 1 – 5:
- deep dive studying costs
- deep dive study time
- SIMPLE
- Troubleshooting
- Context (I also talk a bit about Analytical & Strategic efforts via my MBA & role as a Federal Intelligence Analyst/Officer)
This paper is not about any kind of Diva posturing such as someone uttering (or exhibiting) “I’m better than everyone else”.
Two of the primary areas are on Teamwork and Critical Thinking, in addition to the Caveats.
- Teamwork is the primary here, with a good, successful end result for all, along with
- Critical Thinking: you ABSOLUTELY must do a great deal in this area, in any effort you do – D.E., ML Engineering, Cloud Architect/Engineer – AS MUCH as possible
- In addition to Critical Thinking, you SHOULD be skeptical about any and all solutions.
NOTE: This bit on skepticism came from Harvard Business Review some time ago.
- Not in the manner of “My answer is the best one and the only one”. NO. That is not how it should be.
You should be skeptical in the manner of, “Is this the best solution?”
End of HBR piece… Now… back to my content.
SOLUTION CHOICES:
Is there a better solution? Is there an easier solution? Should I chat with a team mate and see if there ‘IS’ a better solution? Will a better solution save time AND money? Will a better solution make the process easier?
Or, do some research and find out. Looking ‘only’ at your own solution may not be the best approach.
And just because someone in authority, with their power, says their solution is the right, best solution, that does not mean it is… In that kind of scenario, talk to that recalcitrant leader (or team Diva) in private. DO NOT do it publicly if at all possible. Having a private side bar keeps you from unintentionally embarrassing or humiliating that leader (or team mate)… or from being embarrassed yourself for questioning them.
- at the very least, ‘TRY’ critical thinking… and be skeptical too…
Of course, there are more strategic experts out there who do strategic thinking for a living. I am not (‘yet’) anywhere near that level. I am only bringing it to the forefront for D.E.’s and ML Engineers to really ponder over, in using these skills more on a daily basis.
To begin with, you have to think and consider how all of your efforts will be perceived at various levels – from Executive to Staff to User levels.
You need to understand all of what you are doing from the perspective of those various levels in order to be successful. The sad part is, not everyone has that kind of background/experience to do that kind of thinking.
But if you are willing and interested, there are many books (and courses) out there to help you at least try to gain these kinds of insights and be even more successful in your career. The three levels I mentioned:
Executive
- Business success overall – ROI (return on investment), budgets, corporate brand & reputation and success.
- This level basically contains all of the stakeholders/champions that may be necessary to approve or jettison projects.
- These leaders want to see results at their company/agency that will make their outfit shine with high hopes of bringing in more business (and accolades).
- Some of these leaders may not care ‘too’ deeply how you succeed, as long as it is ethical and meets all the various compliance guidelines for whatever business the company is doing. (And maybe even saving ‘more’ budget costs.)
- And let’s not forget, these leaders ALL care that the work remains consistently within specified budgets.
Staff (managers, supervisors, tech folks)
- Day-to-day, hour-by-hour efforts – diving as deep as possible to ensure that each piece of the pie (hardware & software) is working as it should, with fail safes built in, in case there are snags (failures) in the work that you and your team are doing. Of course, this is a no-brainer for anyone in the tech world.
- You definitely want to know how everything works, so that your efforts keep the executive leadership content, the shareholders satisfied, and the users satisfied and willing to return to do more business with your company / agency.
User
- This is the bread and butter of it all. The users – staff within the company OR customers of the company – primarily ONLY care whether your efforts help them, or make them angry enough to cut off any association with your company…
- The internal staff want tools that will possibly help them become rock stars at whatever work they are doing.
- Customers, well, they want software (web pages, research sites, stock trading, etc.) that will make their lives easier – saving time, money and removing stress whenever possible
- at the very least, ‘TRY’…
Combine Data Engineer with ML & AI efforts
Strategically, you ‘should’ learn and know enough coding to review any and all of the AI generated Python and SQL code content that you created via prompts or agents. I am talking about more than just one or two 20-minute YouTube videos… but if you have done some serious studying in these areas PLUS watched several YouTube videos, those videos help compound and cement that knowledge in your head, as long as you are applying some of the video content.
- Most of all, you MUST be adaptive – HIGHLY ADAPTABLE in learning multiple topics, from basic to complex. And if you are afraid of something that is not clear (as in working on something like “Min-Max Scaler” or “Robust Scaling [Z-Score might be better]” to handle outliers), just ask someone and learn, earnestly…
- Smarter, team oriented folks will gladly share knowledge in helping ‘YOU’ and the team succeed.
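To make the scaler question above a bit less scary, here is a minimal sketch (standard library only) that runs the three scaling approaches on the same numbers. The function names and the sample data are made up for illustration; in real work you would more likely reach for a library such as scikit-learn. In this particular sample, min-max and z-score both squeeze the ordinary values together because the single outlier dominates the range, the mean and the standard deviation, while median/IQR (robust) scaling keeps them spread out. Which one is ‘better’ still depends on your data and your model.

```python
import statistics


def min_max_scale(values):
    """Rescale to [0, 1] using the smallest and largest value."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical sample: six ordinary readings plus one extreme outlier.
data = [10, 12, 11, 13, 12, 11, 500]

min_max = min_max_scale(data)
z_scores = z_score_scale(data)
robust = robust_scale(data)

for name, scaled in [("min-max", min_max), ("z-score", z_scores), ("robust", robust)]:
    ordinary = scaled[:-1]  # everything except the outlier
    print(f"{name:8s} spread of ordinary values: {max(ordinary) - min(ordinary):.4f}")
```

Run it, look at how far apart the six ordinary values stay under each method, and then ask a team mate why. That is the kind of question smarter, team oriented folks will gladly answer.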
And in conjunction with the code, you ‘should’ learn as much as possible in the various fields (D.E., ML, AI) in order to make yourself, not indispensable but in demand, to handle multiple aspects/areas of the work you are doing.
CAVEAT 1 deep dive studying costs:
Bleeding savings… no lie, a huge factor for everyone. This matters and it hinges on how you are doing your studying to shift over to D.E. and/or ML Engr.
If you are going this route while working full time and commuting (and likely have a partner) — it likely means you are:
- studying at night and weekends, a path that will take far, FAR longer to reach a goal of becoming qualified enough to jump into one or both of the roles. The cost here – is ‘time‘…
But. If you are going FULL BORE in doing the studying FULL TIME, all day, every day until you believe you are qualified enough to switch gears and jump into one or both of those roles — well…. You had best buckle up, because you will be living on your savings (and your mates’ as well if you are part of a couple).
You had best be committed to this work role goal… to achieve the financial goals you desire and make up for what you lost in earnings during this deep dive.
My savings were/are drying up for me, but I was determined in continuing my FULL BORE deep dive technical sabbatical studies.
This was due to making a VERY serious, errant miscalculation (2025 ML Engr journey attempt) prior to this current administration taking over.
Of course, I had no prescient knowledge of how bad the market would become — none of us did… This economy is NO joke…
- See the “background on me” section below.
CAVEAT 2 deep dive study time:
Now, be aware of the time you HAVE to put in for your deep dive studying.
“IF” you truly are going the FULL BORE deep dive studying — this WILL mean a LOT of extra research / studying time and it WILL drain you.
Add to that, a significant loss of time with your mate while you are doing this deep dive. This is not a cake walk and you cannot simply just order a D.E. or ML Engr course, zip through it and attempt to gain the certification.
- IT – AIN’T – THAT – EASY…!!!
You WILL be:
a) spending a lot of time in learning the material and doing the various labs (especially if you screw up on a step and have to repeat everything to see where you missed something). And doing even a ‘little’ D.E. project (via AWS) can become convoluted if you do not want to spend much of your own money (AWS resources).
I did an AWS project but I veered and used other low or no cost open-source tools to complete the project.
b) cross-checking that course material content via other sources, for your knowledge, to see if it registers and sinks into your brain – OR to see if that instructor was correct…
c) using additional courses and videos to hammer in the knowledge. And it could be because the course instructor has a different manner of teaching. You may not understand the instructor’s method as easily as others do. You may not even like the speaking mannerisms of that instructor…
All the while hoping that that “new” knowledge remains IN your brain as you continue on your journey, because there will be a LOT more material to learn. This is true in either arena, but in the ML Engr world there is a GREAT deal more to learn. Trying to recall something you learned six (6) months ago, when you have not used that algorithm or sklearn tool very often – well, “that-ain’t-easy” either.
I’ve a great memory and I am not that worried about learning new content to put to use. BUT… yep, another ‘but’ – if we are not working with something, using our brain, that content becomes rusty, FAST.
But it still dwells within both hemispheres of your brain – you will just have to review and dredge it up again.
That is the king (or queen) of massive studying – ‘TIME‘. Time is not a forgiving adversary. Once you spend it on something, it is not easy to recoup it because you cannot gain time back….
However, tactically, trying to learn all of the various code tools out there today is becoming very difficult. For example, when I started learning more Python, I chatted with a couple of guys who “were” Python experts for their advice.
- They both told me, in order to become a Python expert, it would take 1 – 2 years of coding to do that… and that was just for Python
Today, there are many folks out there working to have AI generated content to complete (or do) their work.
But, once you get your generated output, you will have to be able to review the output to see if it is the data content results you wanted. You cannot just slough it all off (delegation) to AI….
- YOU are the one accountable for that content in order to achieve task success for you and your team.
And today, folks out there in the AI world, or most of them, should already know that ‘only’ settling in on one AI system (OpenAI, Claude, HuggingFace, DuckDuckGo, CoPilot, etc.) would be a fatal mistake.
No AI system is foolproof yet; there WILL be errors. There are AI systems out there that may not be up to date on what you are looking for.
- I’ve seen that more than once – I asked one for something, it came back as a failure – I asked another AI system, it came back with what I was looking for. Went back to the failed AI system, stated the ‘newer’ content and THAT system corrected itself.
For me, I will not take any AI generated output on its own without my own background knowledge, in conjunction with comparing the original AI content against one or two other AI systems.
Still speaking in this tactical arena, you have to learn many, many, many tools ‘IF’ you are working in both arenas of D.E. and ML Engineer. Learning them for the first time is tough and if you are not using the tools, it becomes very, VERY tough to retain all of the knowledge. Learning the many various D.E. / ML Engineer tools, in AWS alone, is mind-boggling. No one can learn all of the various tools, especially if they are not using them. Here are a few.
For ML Engineer:
- Take MICE for example – that is “Multivariate Imputation by Chained Equations”, used to fill in missing values across multiple columns by repeatedly modeling each incomplete column from the other columns. Think you will remember that if you do not use it…?
- How about PCA (Principal Component Analysis), an “unsupervised” algorithmic tool for reducing the dimensionality of data while preserving as much variance as possible…?
- Why not use LDA (Linear Discriminant Analysis), a ‘supervised’ algorithm that also reduces dimensionality, by finding the directions that best separate the known classes
- but NOT the other LDA (Latent Dirichlet Allocation), another “unsupervised” algorithmic tool which is primarily used to identify a specified number of topics within a set of text documents….
- As well as using SageMaker and the many various components and jobs it can do, in pre- and post-training, such as checking bias or running ML pipelines.
- And this is ONLY the tip of the iceberg (not iceberg storage okay…)
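As a refresher on the PCA item in the list above, here is a minimal sketch using NumPy. The `pca` helper and the synthetic data are my own illustrative assumptions, not a production implementation (in practice you would likely use a library such as scikit-learn). It centers the data, takes the SVD, and keeps the directions that carry the most variance.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components (hypothetical helper)."""
    # PCA works on centered data, so subtract each column's mean first.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by variance captured.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio


# Synthetic data (an assumption for the demo): two strongly correlated
# columns plus one low-variance noise column.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(200, 1)),
    rng.normal(scale=0.1, size=(200, 1)),
])

projected, components, ratio = pca(X, n_components=1)
print("shape before:", X.shape, "shape after:", projected.shape)
print("share of variance kept:", ratio)
```

Three columns go in, one comes out, and because the first two columns move together, that single component still keeps most of the variance. That is the “reduce dimensionality while preserving as much variance as possible” idea in a nutshell.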
For Data Engineer:
- Working with AWS EMR and AWS Glue (serverless).
- Working with Redshift or the AWS streaming services (Kinesis Data Streams, Firehose, Managed Service for Apache Flink [real-time], Managed Streaming for Apache Kafka).
- Working with One Hot Encoding sometimes for ETL workflows, even though it is more for ML Engineers.
- Etc., etc., etc…..
Primarily, building data pipelines and cleaning data so that it can be used by all without running into data quality or inconsistency issues.
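For the One Hot Encoding item in the Data Engineer list above, here is a minimal sketch in plain Python of what that step does inside an ETL workflow. The `one_hot_encode` function, the `order_id` / `region` columns and the sample rows are all hypothetical; a real pipeline would more likely lean on pandas, Spark or scikit-learn for this.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    # Sort the categories so the output columns come out in a stable order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records coming out of an extract step.
rows = [
    {"order_id": 1, "region": "east"},
    {"order_id": 2, "region": "west"},
    {"order_id": 3, "region": "east"},
]

encoded = one_hot_encode(rows, "region")
for row in encoded:
    print(row)
```

Each text category becomes its own 0/1 column, which is why this shows up more on the ML side: most algorithms want numbers, not labels.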
Overall, in day-to-day work, in the world of strategy and tactics, I tend to incorporate various areas of my previous knowledge / work into what I might be currently working on.
There is a lot of cross-over knowledge you can take from one job to another to be used – the “dual-purpose” effect in different work environments (as long as there are no conflicts of interest and/or NDAs involved….). A lot of that knowledge in my case came from various areas (see “Previous work roles” section below).
To be successful, you have to be able to see and comprehend what is in front of you, using as much information as you can gain, and be able to connect the dots of what goes where – for here and down the road (visualization kicks into gear at that point).
You have to have some pretty solid grounding in work experience at different levels, or at least exposure to them, to gain insights into what is happening.
This is where being Strategic comes into play along with thinking outside the cube (not box, not for me).
- You HAVE to be creative as well. Of course that comes with how much leeway you can grab… If the leadership is not really visionary, then you are kind of stuck on being highly traditional – unless you are one heck of a convincing, persuasive talker…?
CAVEAT 3 SIMPLE:
Whatever creation you are working on, infrastructure, algorithms, pipelines, etc., no matter what, “keep it as simple as possible”. You do not want too much complexity.
Complex works of art that MIGHT look pretty visually (or that try to be impressive) can become cost hogs, needlessly creating troubleshooting issues and, at times, making any job/task run longer.
This is when thinking creatively and outside the cube comes in handy.
Try to come up with solutions that have potentially high levels of success, while cutting as many corners as possible, saving money & time. But, in the process, you have to ensure you HIT ALL the necessary markers & compliance milestones.
Trust me, you do NOT want to skip compliance measures. You CANNOT cheat and be successful (legally) at the same time – that is, unless you are one of those with very, VERY poor Ethical boundaries.
CAVEAT 4 Troubleshooting:
There is and will always be “troubleshooting” involved in whatever technical work you undertake. The “WONDERFUL” detailed and sometimes VERY deep dives into solving problems and fixing whatever errors you come across. Always a fun time. For me, it actually is at times.
It may appear frustrating, which it is and will be. Whereas for me, that is just my determination to fix an issue and prevent it from occurring in the future. I may appear angry but it is just the determination, driving me on….
CAVEAT 5 Context:
Last but not least — CONTEXT. And it is a bit tricky.
Using ‘context’ depends on multiple variables.
- One of the variables is, it depends on how much real world experience you have. How much from how many varying work life situations have you been engaged in?
NOTE: Yeah, this can also encompass your time in college with differing class mates and the multiple varying courses – to gain VERY different perspectives from others.
- That you can draw from in crafting whatever work you are doing right now.
- That you can draw from in order to do more comprehensive/complete AI prompts to gain a better answer than a verbose, rambling answer that may STILL not get you what you are looking for.
How much experience with differing situations have you been able to learn from?
- Were you able to learn business/work lessons from your different roles, from different levels of responsibilities?
- And are you able to translate that knowledge from your past roles to current roles?
- Are you able to truly see how successful your work efforts could be at a staff member level? At a Manager level? Or even from an Executive level?
- Have you ever intermingled those different past experiences in order to see from MULTIPLE perspectives, what you need in order to do your job and achieve the expected outcome — or BETTER?
- Or to visualize how you could prevent mistakes or issues from occurring?
- Or visualize what ensuing ramifications might occur from ‘making’ a mistake and how you would/could correct/prevent the mistake?
If you do not have enough experience or information to visualize a solid working context, you will need to figure out a way to plug the holes, fill in the cracks. But in doing so, the information has to be as rock solid as possible.
You have to know as much as possible about whatever you are working on. Or at least as much as you can find or draw out of your leaders and other collaborative teams (or team mates).
You cannot work on something without being aware of any underlying snags (if you can that is, this is for anyone) or possible crippling branches off the path.
Context is a BIG DEAL in everything you do in life, especially where you are making a living…. If you have a good variety of work roles from the past that are highly complementary (hopefully) but also have some very good diversity (e.g.: doing intelligence work at one place or cybersecurity efforts with individuals overseas or overseeing a nationwide help desk or managing a data network), you can do a lot of intermixing of experiences and knowledge….
Background on me:
My “Strategic“ move to ML Engineering failed, due to that errant miscalculation I mentioned. I spent 2025 completely focused on learning and earning both of the AWS ML Engineer certifications and attempting to find an ML Engineer role. And now, making a run back to AWS D.E.
Late December 2024 time frame.
AFTER my AWS D.E. success — I had believed that if I spent most of 2025 in doing this deep dive study technical sabbatical, chasing those two AWS ML Engineer certifications – I would not have a problem switching over from AWS D.E. to ML Engineer.
But the late 2025 job market (a repeat of late 2023 when attempting to find a Google D.E. [GCP Pro D.E.] role after gaining that cert) well, sheesh…, the job market “again”, was rather lame.
Several months of job searches wasted. Gone. Each time:
- late 2023 (GCP D.E.) and
- late 2025 (AWS ML Engr)
— — And I wish a LOT of luck to those many out there searching for more than 6 months at a time.
To top it off, there are many MORE experienced ML Engineers out there looking for work that I cannot compete with. They have hands-on work experience; I do not.
This was even after delving into learning things such as Derivatives and Partial Derivatives (had not touched this stuff since college [late bloomer, Marines came first] in my advanced math and physics courses) and all of the other deep ML knowledge.
That ML journey came about after 2023/2024, learning and becoming certified in Google D.E. and AWS D.E. then starting work at Deloitte as a mid-level D.E. / manager (which I resigned from right away with regret – I should have taken up Booz Allen’s offer).
Now…. Now, my current strategy is (scrambling) to refresh my 2024 D.E. knowledge and return to a D.E. role.
In the meantime, in this job market, like many others – bleeding savings…
Previous work roles:
- AWS Cloud work (up to Tier III) and previous cyber security work in the past at an agency
- intelligence work/leadership role at another agency
- having learned to read, speak and write in other languages (college: Arabic, Japanese, Chinese & self-taught Russian) PRIOR to becoming a federal Intelligence Analyst/Officer did not hurt either
- point person at INS for cyber audits and one-on-one with Regional Commanders
- point person at State Dept in doing cybersecurity efforts at embassies overseas and one-on-ones with the Ambassador, with the Dep. Chief of Mission sitting in:
- my best visit was at one embassy (Morocco) where the Exec Secretary asked me beforehand to cut my 1-hour session short, to 20 minutes if possible
- that Ambassador kept me there for 1 hour & 20 minutes because he liked it and wanted to get into the weeds (the DCoM was lost)
- afterwards, the number 3 asked me if I knew that the Ambassador was G. W. Bush’s college roommate (I do not get impressed by that kind of thing – I just go by the mutual respect and collegial work efforts with anyone, famous or not)
- Would it impress you if I said that at one meeting, I once sat next to the Chief — head of a mega-agency? Would it?
- No, it would not matter to me either. It was just another close hold work effort, with 12 of us in the room.
- corporate data/network communication hands-on chief in the commercial world and
- plugging in as much of my strategic (MBA) / tactical thinking into everything I do.
So, strategically and tactically speaking – in my D.E. & M.L. certification journeys over the past couple of years, my goal was to attempt to learn as much as possible in D.E. & M.L. in order to do the work – not necessarily to be the top expert – but to have a clear vision of what:
a) was going on in order to UNDERSTAND the job,
b) was actually needed to ‘DO’ the job at hand &
c) the necessary, successful output should be and/or look like
Originally, this Big Data Journey for me started in 2021 when I was brought in for an AWS Data Analytics role – I stood up an AWS Redshift data warehouse and then taught myself SQL AND Python in order to help the team’s Data Scientists who did ‘not’ know Python to work on Redshift.
That contract was cancelled later that year, RIGHT AFTER I earned the AWS Data Analytics cert…. The agencies involved settled on NOT sharing their data with the other agencies and wanting to maintain their own data silos…
Waste, what a waste that was. In sharing the data, they could have had more collaborative successes.
While working in Intelligence and running the Intel / Cyber Watch standers, I had access to a great deal of data. That access allowed me to connect more dots globally. That kind of access ended when folks like Edward Snowden did what they did. CIA changed the caveat access, which reduced access to data for MANY of us…
That collaborative knowledge had allowed me to have a very good view point of what was happening in associated countries and/or other organizations around the globe.
So, data, data is a big deal to me. Good, comprehensive, clean data. But yes, it takes quite an effort to have clean, comprehensive data – data that shows many individuals from many areas what they want to see through the various lenses they are using.
You have to use your strategic and tactical knowledge to put together data that means something in achieving that end goal.
Another example – my team was also responsible for putting together daily charts (PPT) for the 2, 3 & 4 stars and equivalent executives at the Pentagon. Here is the catch: some of those execs liked to see their data in a particular format. My 1 star had instructed us to do high level charts to make the readability much better for those execs. But there were a few execs who wanted deep dive charts (more than what fits in the footnotes), so I had my teams create high level and deep dive versions for whenever an exec wanted more. And there are more out there than you think who want a very good grasp of what is happening – with more data….
Basically, you have to know your audience. You:
- Need to know what kind of data is coming in
- MUST know what the desired output is supposed to be
- For multiple levels of audiences
- Definitely ‘gotta’ know how to clean up bad or missing data
Lastly, I have continually attempted to broaden my scope of insights over many years, along with attempting to undertake roles that were/are highly complementary to each other. In doing the complementary work roles, the work effort knowledge becomes extended due to each one enhancing the other, causing one to think more broadly or even more narrowly at times when needed. All without going down a rabbit hole, which I had been very close to a few times in the past. I just had to force myself to back off and back out of whatever I was doing at the time.
Ultimately, it takes a lot of mental effort for all of the above to play out.