technology

Data Scientists

Develop and implement a set of techniques or analytics applications to transform raw data into meaningful information using data-oriented programming languages and visualization software. Apply data mining, data modeling, natural language processing, and machine learning to extract and analyze information from large structured and unstructured datasets. Visualize, interpret, and report data findings. May create dynamic data reports.

Median Annual Pay: $108,020
Range: $61,070 - $184,090
Training Time: 4-5 years
AI Resilience: 🟡 AI-Augmented
Education: Bachelor's degree


💡 Inside This Career

The data scientist extracts insights from data to inform business decisions—a role that emerged from the intersection of statistics, computer science, and domain expertise. A typical week involves exploring datasets, building and validating models, presenting findings to stakeholders, and collaborating with engineers on deploying solutions. Perhaps 40% of time goes to data preparation—the cleaning, transforming, and wrangling that consume more time than the glamorous modeling work. Another 30% involves analysis and modeling: applying statistical and machine learning techniques to extract patterns and predictions. The remaining time splits between communication—translating technical findings for business audiences—and collaboration with engineering teams. The work requires moving between coding, statistical thinking, and business context.

People who thrive in data science combine technical skills with curiosity and the ability to communicate findings to non-technical audiences. Successful data scientists develop the humility to question their own analyses and the persistence to work through messy data problems. They build relationships with business stakeholders that ground technical work in real needs. Those who struggle often focus too heavily on sophisticated techniques while neglecting the data quality and problem framing that determine whether analysis is useful. Others fail because they cannot communicate findings accessibly, producing impressive models that nobody acts upon. Burnout affects those who cannot manage the expectation that data science should magically produce insights from any data.

Data science has produced practitioners who shaped the field, from academic statisticians who developed methods to industry figures who built data organizations at major technology companies. DJ Patil, credited as the first Chief Data Scientist of the United States, elevated the profession's visibility. The role appears in contemporary culture as synonymous with modern technology work, though specific data scientists rarely achieve public recognition. The profession both benefits and suffers from AI hype that inflates expectations.

Practitioners cite the satisfaction of discovering patterns in data and seeing analyses influence decisions as primary rewards. The compensation for data scientists is strong, particularly in technology and finance. The intellectual variety—different problems require different approaches—prevents monotony. The intersection of technical work and business impact appeals to those who want both rigor and relevance. Common frustrations include the data quality issues that consume time before any analysis begins and the organizational resistance to data-driven decision making despite rhetoric to the contrary. Many resent the expectations that data science can answer any question when the reality involves substantial uncertainty. The rapid evolution of the field requires continuous learning.

This career typically develops through graduate education in statistics, computer science, or quantitative fields, though boot camps and self-study provide alternative paths. Python, R, and SQL skills are foundational, with machine learning and domain expertise building on this base. The role suits those who enjoy quantitative problem-solving and can tolerate the ambiguity of working with imperfect data. It is poorly suited to those who need clear answers, find data preparation tedious, or struggle to communicate with non-technical audiences. Compensation is strong but varies by industry and company, with technology companies and finance offering the highest salaries.

📈 Career Progression

1. Entry (0-2 years experience): $75,614 (range $42,749 - $128,863)
2. Early Career (2-6 years experience): $97,218 (range $54,963 - $165,681)
3. Mid-Career (5-12 years experience): $108,020 (range $61,070 - $184,090)
4. Senior (10-20 years experience): $135,025 (range $76,338 - $230,113)
5. Expert (15-30 years experience): $162,030 (range $91,605 - $276,135)
Data source: Levels.fyi (exact match)

📚 Education & Training

Requirements

  • Entry Education: Bachelor's degree
  • Experience: Several years
  • On-the-job Training: Several years
  • !License or certification required

Time & Cost

Education Duration: 4-5 years (typically 4)
Estimated Education Cost: $53,406 - $199,410
  • Public (in-state): $53,406
  • Public (out-of-state): $110,538
  • Private nonprofit: $199,410
Source: College Board (2024)

🤖 AI Resilience Assessment


Moderate human advantage with manageable automation risk

🟡 AI-Augmented

Task Exposure: Medium
How much of this job involves tasks AI can currently perform

Automation Risk: Medium
Likelihood that AI replaces workers vs. assists them

Job Growth: Stable, 0% over 10 years (BLS 2024-2034)

Human Advantage: Moderate
How much this role relies on distinctly human capabilities

Sources: AIOE Dataset (Felten et al. 2021), BLS Projections 2024-2034, EPOCH Framework. Updated: 2026-01-02

💻 Technology Skills

Python, SQL, TensorFlow/PyTorch, Cloud ML platforms (AWS SageMaker, GCP), Apache Spark, Jupyter notebooks, Tableau/visualization tools, Git

šŸ·ļøAlso Known As

Analytics Consultant, Applied Scientist, Data Analyst, Data Analytic Scientist, Data Analytics Manager, Data Analytics Scientist, Data Analytics Specialist, Data Architect, Data Consultant, Data Economist (+5 more)

🔗 Related Careers

Other careers in technology

💬 What Workers Say

54 testimonials from Reddit

r/datascience · 2,753 upvotes

Data Science Has Become a Pseudo-Science

I’ve been working in data science for the last ten years, both in industry and academia, having pursued a master’s and PhD in Europe. My experience in the industry, overall, has been very positive. I’ve had the opportunity to work with brilliant people on exciting, high-impact projects. Of course, there were the usual high-stress situations, nonsense PowerPoints, and impossible deadlines, but the work largely felt meaningful. However, over the past two years or so, it feels like the field has taken a sharp turn. Just yesterday, I attended a technical presentation from the analytics team. The project aimed to identify anomalies in a dataset composed of multiple time series, each containing a clear inflection point. The team’s hypothesis was that these trajectories might indicate entities engaged in some sort of fraud. The team claimed to have solved the task using “generative AI”. They didn’t go into methodological details but presented results that, according to them, were amazing. Curious, especially since the project was heading toward deployment, I asked about validation, performance metrics, or baseline comparisons. None were presented. Later, I found out that “generative AI” meant asking ChatGPT to generate code. The code simply computed the mean of each series before and after the inflection point, then calculated the z-score of the difference. No model evaluation. No metrics. No baselines. Absolutely no model criticism. Just a naive approach, packaged and executed very, very quickly under the label of generative AI. The moment I understood the proposed solution, my immediate thought was "I need to get as far away from this company as possible". I share this anecdote because it summarizes much of what I’ve witnessed in the field over the past two years. 
It feels like data science is drifting toward a kind of pseudo-science where we consult a black-box oracle for answers, and questioning its outputs is treated as anti-innovation, while no one really understands how the outputs were generated. After several experiences like this, I’m seriously considering focusing on academia. Working on projects like these is eroding any hope I have in the field. I know this won’t work and yet, the label generative AI seems to make it unquestionable. So I came here to ask: is this experience shared among other DSs?
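The naive method the poster describes fits in a few lines, which is part of their point. Below is a minimal sketch in Python (standard library only), assuming each entity's inflection index is already known and that the z-score is taken across entities; the post specifies neither, and `shift_scores` is a hypothetical name. Nothing in it checks whether a high score actually indicates fraud: that would need labeled cases, a metric, and a baseline to compare against, which are exactly the pieces the poster found missing.

```python
from statistics import mean, pstdev


def shift_scores(series_by_entity, inflection_by_entity):
    """Z-score of each entity's before/after mean shift at its inflection point.

    Assumptions (not stated in the post): inflection indices are already known,
    and the z-score is computed across entities. This reproduces the heuristic
    the post criticizes; it is not a validated fraud model.
    """
    diffs = {}
    for name, values in series_by_entity.items():
        k = inflection_by_entity[name]
        # Mean after the inflection point minus mean before it.
        diffs[name] = mean(values[k:]) - mean(values[:k])

    all_diffs = list(diffs.values())
    mu = mean(all_diffs)
    sigma = pstdev(all_diffs)
    if sigma == 0:
        # Every entity shifted by the same amount, so nothing stands out.
        return {name: 0.0 for name in diffs}
    # Standardize each entity's shift against the shifts of all entities.
    return {name: (d - mu) / sigma for name, d in diffs.items()}


if __name__ == "__main__":
    series = {
        "a": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
        "b": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        "c": [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],
    }
    scores = shift_scores(series, {"a": 3, "b": 3, "c": 3})
    for name, z in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(name, round(z, 2))
```

In the toy data, entity `c` has the largest shift and therefore the highest score; whether that makes it suspicious is a separate question the code cannot answer.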

r/datascience · 2,082 upvotes

My Data Science Manifesto from a Self Taught Data Scientist

**Background** I’m a self-taught data scientist, with about 5 years of data analyst experience and now about 5 years as a Data Scientist. I’m more math minded than the average person, but I’m not special. I have a bachelor’s degree in mechanical engineering, and have worked alongside 6 data scientists, 4 of whom have PhDs while the other 2 have a master’s. Despite being probably the 6th out of 7 in natural ability, I have been the 2nd most productive data scientist in the group. **Gatekeeping** Every day someone on this subreddit asks some derivative of “what do I need to know to get started in ML/DS?” The answers are always smug and give some insane list of courses and topics one must master. As someone who’s been on both sides, I find this attitude extremely annoying, and it is rampant in the industry. I don’t think you can be bad at math, have no prerequisite knowledge, and be successful, but the levels needed are greatly exaggerated. Most of the people telling you these things are just posturing due to insecurity. As a mechanical engineering student, I had at least 3 calculus courses, a linear algebra course, and a probability course, but it was 10+ years before I attempted to become a DS, and I didn’t remember much at all. This sub, and others like it, made me think I had to be an expert in all these topics and many more to even think about trying to become a data scientist. When I started my journey, I would take coding, calculus, stats, linear algebra, etc. courses. I’d take a course, do OK in it, and move on to the next thing. However, eventually I’d get defeated because I realized I couldn’t remember much from the courses I took 3 months prior. It just felt like too much information for me to hold at a single time while working a full-time job. I never got started on actually solving problems because the internet and industry told me I needed to be an expert in all these things. 
**What you actually need** The reality is, 95% of the time you only need a basic understanding of these topics. Projects often require a deeper dive into something else, but that's on a case-by-case basis, and you figure that out as you go. For calculus, you don't need to know how to integrate multivariable functions by hand. You need to know that the derivative is a function that represents the slope of the original function, and that where the derivative = 0 is a local min/max. You need to know integrals are area under the curve. For stats, you need to understand what a p value represents. You don't need to know all the different tests and when to use them. You need to know that they exist and why you need them. When it's time to use one, just google it, and figure out which one best suits your use case. For linear algebra, you don't need to know how to solve for eigenvectors by hand, or whatever other specific things you do in that class. You need to know how to ‘read’ it. It is also helpful to know properties of linear algebra, like the fact that the cross product of 2 vectors yields a vector perpendicular to both. For probability, you need to understand basic things, but again, just google your specific problem. You don't need to be an expert software dev. You need to write OK code, and be able to use ChatGPT to help you improve it little by little. You don't need to know how to build all the algorithms by hand. A general understanding of how they work is enough in 95% of cases. Of all of those things, the only thing you absolutely NEED to get started is basic coding ability. By far the number one technical ability needed to 'master' is understanding how to "frame" your problem, and how to test, evaluate, and interpret performance. If you can ensure that you're accurately framing the problem and evaluating the model or algorithm, with metrics that correctly align with the use case, that's enough to start providing some real value. 
I often see people asking things like "should I do this feature engineering technique for this problem?" or “which of these algorithms will perform best?”. The answer should usually be, "I don't know, try it, measure it, and see". Understanding how the algorithms work can give you clues about what you should try, but at the end of the day, you should just try it and see. Despite the posturing in the industry, very few people are actually experts in all these domains. Some people are better at talking the talk than others, but at the end of the day, you WILL have to constantly research and learn on a project-by-project basis. That’s what makes it fun and interesting. As you gain PRACTICAL experience, you will grow, you will learn, you will improve beyond what you could've ever imagined. Just get the basics down and get started; don't spin your wheels trying and failing to nail all these disciplines before ever applying anything. The reason I’m near the top in productivity while being near the bottom in natural and technical ability is my 5 years of experience as a data analyst at my company. During this time, I got really good at exploring my company’s data. When you are stumped on a problem, intelligently visualizing the data often reveals the solution. I’ve also had the luxury of analyzing our data from all different perspectives. I’d have assignments from marketing, product, tech support, customer service, software, firmware, and other technical teams. I understand the complete company better than the other data scientists. I’m also just aware of more ‘tips and tricks’ than anyone else. Good domain knowledge and data exploration skills with average technical skills will outperform good technical skills with average domain knowledge and data exploration almost every time. **Advice for those self taught** I’ve been on the hiring side of things a few times now, and the market is certainly difficult. 
I think it would be very difficult for someone to online-course and side-project themselves directly into a DS job. The side project would have to be EXTREMELY impressive to be considered. However, I think my path is repeatable. I taught myself basic SQL and Tableau and completed a few side projects. I accepted a job as a data analyst at a medium-sized company (100-200 total employees) on a team where DS and DA shared the same boss. The barrier to DA is likely higher than it was ~10 years ago, but it's definitely something achievable. My advice would be to find roles that you have some sort of unique experience with, and tailor your resume to that connection. No connection is too small. For example, my DA role required working with a lot of accelerometer data. In my previous job as a test engineer, I sometimes helped set up accelerometers to record data from the tests. This experience barely helped me at all when actually on the job, but it helped my resume actually get looked at. For entry-level jobs, employers are looking for ANY connection, because most entry-level resumes look the same. The first year or two I excelled at my role as a DA. I made my boss aware that I wanted to become a DS eventually. He started to make me a small part of some DS projects, running queries, building dashboards to track performance and things like that. I was also part of some of the meetings, so I got some insight into how certain problems were approached. My boss made me aware that I would need to teach myself coding and machine learning. My role in the data science projects grew over time, but I was ultimately blocked from becoming a DS because I kept trying and failing to learn to code and the 25 areas of expertise Reddit tells you that you need by taking MOOCs. Eventually, I paid up for Dataquest. I naively thought the course would teach me everything I needed to know. 
While you will not be proficient in anything DS upon completing it, the interactive format made it easy to jump into 30-60 minutes of structured coding every day. As with learning a real language, consistency is vital. Once I got to the point where I could do some basic coding, I began my own side project. THIS IS THE MOST IMPORTANT THING. ONCE YOU GET THE BASELINE KNOWLEDGE, JUST GET STARTED WORKING ON THINGS. This is where the real learning began. You'll screw things up, and that's OK. The Titanic problem is fine for day 1, but you really need a project of your own. I picked a project that I was interested in and that had a function I would personally use (I'm on V3 of this project and it's grown to a level that I never could've dreamed of at the time). This was crucial in ensuring that I stuck with the project, and had real investment in doing it correctly. When I didn’t know how to do something in the project, I would research it and figure it out. This is how it works in the real world. After 3 months of Dataquest and another 3 of a project (along with 4 years of being a data analyst) I convinced my boss to assign me a DS project. I worked alongside another data scientist, but I owned the project; they were mostly there for guidance, and coded some of the more complex things. I excelled at that project, was promoted to data scientist, and began getting projects of my own, with less and less oversight. We have a very collaborative work environment, and the data scientists are truly out to help each other. We present our progress to each other often, which allows us all to learn and improve. I have been promoted twice since I began DS work. I'd like to add that you can almost certainly do all this in less time than it took me. I wasted a lot of time spinning my wheels. ChatGPT is also a great resource that could increase your learning speed. Don't blindly use it, but it's a great resource. **Tldr:** Sir this is Wendy’s. 
**Edit:** I’m not saying to never go deeper into things; I’m literally always learning. I go deeper into things all the time, often in very niche domains, but you don't need to be a master in all things to get started or even excel. Be able to understand generalities of those domains, and dig deeper when the problem calls for it. Learning a concept when you have a direct application is much more likely to stick. I thought it went without saying, but I’m not saying those things I listed are literally the only things you need to know about those topics; I was just giving examples of where relatively simple concepts were way more important than specifics. **Edit #2:** I'm not saying schooling is bad. Yes, obviously having a master's and/or PhD is better than not. I'm directing this to those who are working a full-time job and want to break into the field, for whom taking years getting a master's while working full time and going another 50K into debt is unrealistic.
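Two of the facts the manifesto lists, that the cross product of two vectors is perpendicular to both and that the derivative is zero at a local minimum, are easy to check numerically, which is roughly the working level of understanding the poster argues for. A minimal sketch in Python using only built-ins; `cross`, `dot`, and `derivative` are illustrative helper names, not from the post, and `derivative` is a central-difference approximation, so it returns values close to, not exactly equal to, the true slope.

```python
def cross(u, v):
    """Cross product of two 3-D vectors given as tuples."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product; a result of zero means the vectors are perpendicular."""
    return sum(a * b for a, b in zip(u, v))


def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x), i.e. the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    u, v = (1, 2, 3), (4, 5, 6)
    w = cross(u, v)
    print(w, dot(w, u), dot(w, v))  # (-3, 6, -3) 0 0

    def f(x):
        return (x - 2) ** 2 + 1  # parabola with its minimum at x = 2

    print(derivative(f, 2.0))  # approximately 0 at the minimum
    print(derivative(f, 5.0))  # approximately 6, the slope at x = 5
```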

r/datascience · 1,899 upvotes

Am I the only one who truly loves this field? It sounds like everyone here is in it for the money and hates their job

It's funny because in real life most of the people I know in the field love it.

r/datascience · 1,824 upvotes

AI isn't taking your job. Executives are.

If AI is ready to replace developers, why aren't developers replacing themselves with AI and just taking it easy at work? I'm a Director at my company. I'm in the meetings and helping set up the tools that cost people their jobs. Here's how they work: 1. Claude AI writes some code 2. The code gets passed to a developer for validation 3. Since the developer's "just validating", he can be replaced with an overseas contractor that'll work for a fraction of the pay We've tracked the tools, and we haven't seen any evidence that having Claude take a crack at the code saves anybody any time - but it does let us justify replacing expensive employees with cheap overseas contractors. You're not getting replaced by AI. Your job's being outsourced overseas.

r/datascience · 1,584 upvotes

Companies are finally hiring

I applied to 80+ jobs before the new year and got rejected or didn’t hear back from most of them. A few positions were a level or two lower than my current level. I got only 1 interview and I did accept the offer. In the last week, 4 companies reached out for interviews. Just want to put this out there for those who are still looking. Keep going at it. Edit - thank you all for the congratulations and I’m sorry I can’t respond to DMs. Here are answers to some common questions. 1. The technical coding challenge was only SQL. Frankly, in my 8 years of analytics, none of my peers use Python regularly unless their role is automation or data engineering. You’re better off mastering SQL by using LeetCode and DataLemur. 2. Interviews at all the FAANGs are similar: a call with an HR rep, then a first round with 1 person that might be technical, then a final round with a bunch of individual interviews on the same day. Most of the questions will be STAR format. 3. As for my skill set, I advertise myself as someone who can build strategy, project manage, and do deep-dive analyses. I’m never going to compete against the recent grads and experts in ML/LLM/AI on technical skills; that’s just an endless grind to stay at the top. I would strongly recommend others sharpen their soft skills. A video I watched recently is from The Diary of a CEO with body language expert Vanessa Edwards. I legit used a few tips during my interviews and I think that helped.

r/datascience · 1,562 upvotes

Tired of everyone becoming an AI Expert all of a sudden

Literally every person who can type prompts into an LLM is now an AI consultant/expert. I’m sick of it. Today a sales manager literally said ‘oh I can get Gemini to make my charts from Excel directly with one prompt so ig we no longer require Data Scientists and their support hehe’. These dumbos think making basic-level charts equals DS work. Not even data analytics, literally data science? I’m sick of it. I hope each one of y'all causes a data leak, breaches confidentiality by voluntarily giving private info to Gemini/OpenAI, and finally creates immense tech debt by developing your vibe-coded projects. Rant over

r/datascience · 1,222 upvotes

I am a staff data scientist at a big tech company -- AMA

**Why I’m doing this** I am low on karma. Plus, it just feels good to help. **About me** I’m currently a staff data scientist at a big tech company in Silicon Valley. I’ve been in the field for about 10 years since earning my PhD in Statistics. I’ve worked at companies of various sizes — from seed-stage startups to pre-IPO unicorns to some of the largest tech companies. **A few caveats** * Anything I share reflects my personal experience and may carry some bias. * My experience is based in the US, particularly in Silicon Valley. * I have some people management experience but have mostly worked as an IC * Data science is a broad term. I’m most familiar with machine learning scientist, experimentation/causal inference, and data analyst roles. * I may not be able to respond immediately, but I’ll aim to reply within 24 hours. **Update:** Wow, I didn’t expect this to get so much attention. I’m a bit overwhelmed by the number of comments and DMs, so I may not be able to reply to everyone. That said, I’ll do my best to respond to as many as I can over the next week. Really appreciate all the thoughtful questions and discussions!

r/datascience · 1,180 upvotes

Saved $100k per year by explaining how AI/LLMs work.

I work in a data science field, and I bring this up because I think it's data science related. We have an internal website that is very bare-bones. It's made to be simplistic, because it's the reference document our end users (1,000 of them) use. Executives heard about a piece of software that would be completely AI-driven, build detailed statistical insights, and change the world as they know it. I had a demo with the company and they explained its RAG capabilities, but mentioned it doesn't really "learn" the way people assume AI does. Our repo is so small and has no real need for AI. We have used a fuzzy search that has worked for the past three years. Additionally, I have already built out dashboards that retrieve all the information executives have asked for via API (who's viewing pages, what they're searching, etc.). I showed the C-suite executives our current dashboards in Tableau, and how the actual search works. I also explained what RAG is, and how AI/LLMs work at a high level. I explained to them that AI is a fantastic tool, but I'm not sure we should be spending 100k a year on it. They also asked if I have built any predictive models. I don't think they quite understood what that was either, because we don't have the amount of data or the need to predict anything. Needless to say, they decided it was best not to move forward "for now". I am shocked, but also not, that executives want to change the structure of how my team and end users digest information just because they heard "AI is awesome!" They had zero idea how anything works in our shop. Oh yeah, our company has already laid off 250 people this year due to "financial turbulence", and now they're wanting to spend 100k on this?! It just goes to show you how deep the AI train runs. Did I handle this correctly and can I put this on my resume? LOL

r/datascience · 1,104 upvotes

Client told me MS Copilot replicated what I built. It didn’t.

I built three MVP models for a client over 12 weeks. Nothing fancy: an LSTM, a Prophet model, and XGBoost. The difficulty, as usual, was getting and understanding the data and cleaning it. The company is largely data illiterate. I turned in all 3 models, they loved them, then all of a sudden they canceled the pending contract to move them to production. Why? They had a DevOps person do it in MS Copilot Analyst (a new specialized version of MS Copilot Studio) and it took them 1 week! Would I like to sign a lesser contract to advise this person, though? I finally looked at their code and it’s 40 lines of code using a subset of the California housing dataset run through a Random Forest regressor. They had literally nothing. My advice to them: go f*%k yourself.

r/datascience · 901 upvotes

Data Science is losing its soul

DS teams are starting to lose the essence that made them truly groundbreaking: their mixed scientific and business core. What we’re seeing now is a shift from deep statistical analysis and business-oriented modeling to quick-and-dirty engineering solutions. Sure, this approach might give us a few immediate wins, but it leads to low-ROI projects and pulls the field further away from its true potential. One-size-fits-all programming just doesn’t work. It’s not the whole game.

r/datascience · 884 upvotes

NVIDIA's paid Generative AI courses for FREE (limited period)

NVIDIA has announced free access (for a limited time) to its premium courses, each typically valued between $30 and $90, covering advanced topics in Generative AI and related areas. The major courses made free for now are: * **Retrieval-Augmented Generation (RAG) for Production:** Learn how to deploy scalable RAG pipelines for enterprise applications. * **Techniques to Improve RAG Systems:** Optimize RAG systems for practical, real-world use cases. * **CUDA Programming:** Gain expertise in parallel computing for AI and machine learning applications. * **Understanding Transformers:** Deepen your understanding of the architecture behind large language models. * **Diffusion Models:** Explore generative models powering image synthesis and other applications. * **LLM Deployment:** Learn how to scale and deploy large language models for production effectively. **Note:** There are redemption limits to these courses. A user can enroll in any one specific course. **Platform Link**: [NVIDIA TRAININGS](https://nvda.ws/4jtHiOf)

r/datascience · 853 upvotes

My data science dream is slowly dying

I am currently studying Data Science and really fell in love with the field, but the more I progress the more depressed I become. Over the past year, after watching job postings, especially in tech, I’ve realized most Data Scientist roles are basically advanced data analysts, focused on dashboards, metrics, A/B tests. (It is not a bad job, don't get me wrong, but it is not the direction I want to take.) The actual ML work seems to be done by ML Engineers, which often requires deep software engineering skills, something I’m not passionate about. Right now, I feel stuck. I don’t think I’d enjoy spending most of my time on product analytics, but I also don’t see many roles focused on ML unless you’re already a software engineer (not talking about research but training models to solve business problems). Do you have any advice? **Also, will there ever be more space for Data Scientists to work hands-on with ML, or is that firmly in the engineer’s domain now? What is your view of the field?**

r/datascience · 851 upvotes

Is there a large pool of incompetent data scientists out there?

Having moved from academia to data science in industry, I've had a strange series of interactions with other data scientists that has left me very confused about the state of the field, and I am wondering if it's just by chance or if this is a common experience. Here are a couple of examples: I was hired to lead a small team doing data science in a large utilities company. The most senior person under me, who was referred to as the senior data scientist, had no clue about anything and was actively running the team into the dust. Could barely write a for loop, couldn't use git. Took two years to get other parts of the business to start trusting us. Had to push to get the individual made redundant because they were a serious liability. It was so problematic working with them I felt like they were a plant from a competitor trying to sabotage us. Started hiring a new data scientist very recently. Lots of applicants, some with very impressive CVs, PhDs, experience, etc. I gave a handful of them a very basic take-home assessment, and the work I got back was mind-boggling. The majority had no idea what they were doing, couldn't merge two data frames properly, didn't even look at the data by eye, just printed summary stats. I was and still am flabbergasted they have high-paying jobs in other places. They would need major coaching to do basic things in my team. So my question is: is there a pool of "fake" data scientists out there muddying the job market and ruining our collective reputation, or have I just been really unlucky?

r/datascience · 848 upvotes

Don’t be the data scientist who’s in love with models, be the one who solves real problems

I work at a company with around 100 data scientists, ML and data engineers. The most frustrating part of working with many data scientists, and honestly I see this on this sub all the time too, is how obsessed some folks are with using ML or whatever the latest SoTA causal inference technique is. Earlier in my career plus during my master’s, I was exactly the same, so I get it. But here’s the best advice I can give you: **don’t be that person.** Unless you’re literally working on a product where ML is the core feature, **your job is basically being an internal consultant.** That means understanding what stakeholders actually want, challenging their assumptions when needed, and giving them something useful, not just something that will disappear into a slide deck or notebook. Always try to make something run in production; don’t do endless proofs of concept. If you’re doing deep dives / analysis, define success criteria for your initiatives and try to measure them (e.g., some of my less technical but awesome DS colleagues made their careers out of finding drivers of key KPIs, reporting them to key stakeholders and measuring improvement over time). In short, **prove you’re worth it**. A lot of the time, that means building a dashboard. Or doing proper data/software engineering. Or using GenAI. Or whatever else some of my colleagues (and loads of people on this sub) roll their eyes at. Solve the problem. Use whatever gets the job done, not just whatever looks cool on a résumé.

r/datascience · 848 upvotes

AI isn’t evolving, it’s stagnating

AI was supposed to revolutionize intelligence, but all it’s doing is shifting us from discovery to dependency. Development has turned into a cycle of fine-tuning and API calls: just engineering. Let’s be real, the power isn’t in the models; it’s in the infrastructure. If you don’t have access to massive compute, you’re not training anything foundational. Google, OpenAI, and Microsoft own the stack; everyone else just rents it. This isn’t decentralizing intelligence; it’s centralizing control. Meanwhile, the viral hype is wearing thin. Compute costs are unsustainable, inference is slow, and scaling isn’t as seamless as promised. We are deep in Amara’s Law: overestimating short-term effects and underestimating long-term ones.

r/datascience · 839 upvotes

I have run DS interviews and wow!

Hey all, I have been responsible for technical interviews for a Data Scientist position and the experience was quite surprising to me. I thought some of you may appreciate some insights.

A few disclaimers: I have no previous experience running interviews and have had no training at all, so I have just gone with my intuition and any input from the hiring manager. As for my own competencies, I hold a Master’s degree that I only just graduated from and have no full-time work experience, so I went into this with severe imposter syndrome, as I already have just from holding a DS title myself. But after all, as the only data scientist, I was the most qualified for the task.

For the interviews I was basically just tasked with getting a feeling for the technical skills of the candidates. I decided to write a simple predictive modeling case with no real requirements besides the solution being a notebook. I expected to see some simple solutions that would focus on well-structured modeling and sound generalization. No crazy accuracy or super sophisticated models. In each interview the candidate would run through his/her solution, from loading the data to test accuracy. I would then shoot some questions related to the decisions that were made.

This is what stood out to me:

1. Very few candidates really knew of other approaches to handling missing values than whatever approach they had taken. They also didn’t really know the pros/cons of imputing rather than dropping data. Also, only a single candidate could explain why it is problematic to do the imputation before splitting the data.
2. Very few candidates were familiar with the concept of class imbalance.
3. For encoding categorical variables, most candidates knew only label or one-hot encoding and no alternatives, and they didn’t know of any potential drawbacks of either one.
4. Not all candidates were familiar with cross-validation.
5. For model training, very few candidates could really explain how they chose their optimization metric, what exactly it measured, or how different metrics could be used for different tasks.

Overall the vast majority of candidates had an extremely superficial understanding of ML fundamentals and didn’t really seem to have any sense of their lack of knowledge. I am not entirely sure what went wrong. My guesses are that either the recruiter who sent candidates my way did a poor job with the screening, or my expectations are just too unrealistic, though I really hope that is not the case. My best guess is that the Data Scientist title is rapidly being diluted to a state where it is perfectly fine to not really know any ML. I am not joking: only two candidates could confidently explain all of their decisions to me and demonstrate knowledge of alternative approaches while not leaking data. Would love to hear some perspectives. Is this a common experience?
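The leakage point in item 1 is easy to state and easy to get wrong, so here is a minimal sketch of leakage-free mean imputation in plain Python (standard library only, so it runs anywhere). It is an illustration, not the poster's take-home case: the toy `income` data and the helper names `split_rows`, `fit_mean`, and `impute` are hypothetical. The key idea is that the fill value is learned from the training split only and then reused, unchanged, on the test split.

```python
import random
from statistics import mean


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(rows, column):
    """Learn the fill value: the mean of the observed (non-missing) values."""
    return mean(r[column] for r in rows if r[column] is not None)


def impute(rows, column, fill_value):
    """Return copies of the rows with missing values in `column` replaced."""
    return [{**r, column: fill_value if r[column] is None else r[column]} for r in rows]


# Hypothetical toy data: None marks a missing income.
data = [{"income": v} for v in [40.0, None, 55.0, 61.0, None, 48.0, 70.0, 52.0]]

train, test = split_rows(data)                # 1. split first
fill = fit_mean(train, "income")              # 2. fit the imputer on training rows only
train_ready = impute(train, "income", fill)
test_ready = impute(test, "income", fill)     # 3. reuse the training statistic on test

# Leaky version, for contrast: this statistic has also "seen" the test rows.
leaky_fill = fit_mean(data, "income")
```

In scikit-learn the same discipline is usually enforced by putting the imputer and the model in one `Pipeline`, so that `cross_val_score` refits the imputer on each training fold rather than on the full dataset.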

r/datascience · 616 upvotes

The "Unicorn" is Dead: A Four-Era History of the Data Scientist Role and Why We're All Engineers Now

Hey everyone, I’ve been in this field for a while now, starting back when "Big Data" was the big buzzword, and I've been thinking a lot about how drastically our roles have changed. It feels like the job description for a "Data Scientist" has been rewritten three or four times over. The "unicorn" we all talked about a decade ago feels like a fossil today. I wanted to map out this evolution, partly to make sense of it for myself, but also to see if it resonates with your experiences. I see it as four distinct eras.

---

### **Era 1: The BI & Stats Age (The "Before Times," Pre-2010)**

Remember this? Before "Data Scientist" was a thing, we were all in our separate corners.

* **Who we were:** BI Analysts, Statisticians, Database Admins, Quants.
* **What we did:** Our world revolved around historical reporting. We lived in SQL, wrestling with relational databases and using tools like Business Objects or good old Excel to build reports. The core question was always, **"What happened last quarter?"**
* **The "advanced" stuff:** If you were a true statistician, maybe you were building logistic regression models in SAS, but that felt very separate from the day-to-day business analytics. It was more academic, less integrated.

The mindset was purely descriptive. We were the historians of the company's data.

### **Era 2: The Golden Age of the "Unicorn" (Roughly 2011-2018)**

This is when everything changed. *HBR* called our job the "sexiest" of the century, and the hype was real.

* **The trigger:** Hadoop and Spark made "Big Data" accessible, and Python with scikit-learn became an absolute powerhouse. Suddenly, you could do serious modeling on your own machine.
* **The mission:** The game changed from "What happened?" to **"What's *going* to happen?"** We were all building churn models and recommendation engines, trying to predict the future. The Jupyter Notebook was our kingdom.
* **The "unicorn" expectation:** This was the peak of the "full-stack" ideal. One person was supposed to understand the business, wrangle the data, build the model, and then explain it all in a PowerPoint deck. The *insight* from the model was the final product.

It was an incredibly fun, creative, and exploratory time.

### **Era 3: The Industrial Age & The Great Bifurcation (Roughly 2019-2023)**

This is where, in my opinion, the "unicorn" myth started to crack. Companies realized a model sitting in a notebook doesn't actually *do* anything for the business. The focus shifted from *building models* to *deploying systems*.

* **The trigger:** The cloud matured. AWS, GCP, and Azure became the standard, and the discipline of MLOps was born. The problem wasn't "can we predict it?" anymore. It was, **"Can we serve these predictions reliably to millions of users with low latency?"**
* **The splintering:** The generalist "Data Scientist" role started to fracture into specialists because no single person could master it all:
    * **ML Engineers:** The software engineers who actually productionized the models.
    * **Data Engineers:** The unsung heroes who built the reliable data pipelines with tools like Airflow and dbt.
    * **Analytics Engineers:** The new role that owned the data modeling layer for BI.
* The mindset became engineering-first. We were building factories, not just artisanal products.

### **Era 4: The Autonomous Age (2023 - Today and Beyond)**

And then, everything changed again. The arrival of truly powerful LLMs completely upended the landscape.

* **The trigger:** ChatGPT went public, GPT-4 was released, and frameworks like LangChain gave us the tools to build on top of this new paradigm.
* **The mission:** The core question has evolved again. It's not just about prediction anymore; it's about **action and orchestration**. The question is, **"How do we build a system that can understand a goal, create a plan, and execute it?"**
* **The new reality:**
    * **Prediction becomes a feature, not the product.** An AI *agent* doesn't just predict churn; it takes an *action* to prevent it.
    * **We are all systems architects now.** We're not just building a model; we're building an intelligent, multi-step workflow. We're integrating vector databases, multiple APIs, and complex reasoning loops.
    * **The engineering rigor from Era 3 is now the mandatory foundation.** You can't build a reliable agent without solid MLOps and real-time data engineering (Kafka, Flink, etc.).

It feels like the "science" part of our job is now less about statistical analysis (AI can do a lot of that for us) and more about the rigorous, empirical science of architecting and evaluating these incredibly complex, often non-deterministic systems.

So, that's my take. The "Data Scientist" title isn't dead, but the "unicorn" generalist ideal of 2015 certainly is. We've been pushed to become deeper specialists, and for most of us on the building side, that specialty looks a lot more like engineering than anything else. Curious to hear if this matches up with what you're all seeing in your roles. Did I miss an era? Is your experience different?

EDIT: In response to comments asking if this was written by AI: The underlying ideas are based on my own experience. However, I want to be transparent that I would not have been able to articulate my vague, intuitive thoughts about the changes in this field with such precision. I used AI specifically for structuring and organizing the content.

r/datascience · 600 upvotes

Tired of AI

One of the reasons I wanted to become an AI engineer was that I wanted to do cool and artsy stuff in my free time and automate away the menial tasks. But with the continuous advancements, I am finding that it is taking away the fun in doing stuff. A task I used to do meticulously over 2 hours, and feel a real sense of accomplishment for, can now be done by AI in seconds, and while that's pretty cool, it is also quite demoralising. The recent 'Ghibli-style photo' trend made me wanna vomit, because it's literally nothing but plagiarism and there's nothing novel about it. I used to marvel at the art created by Van Gogh or Picasso and always tried to analyse the thought process that might have gone through their minds when creating such pieces as The Starry Night (so much so that it was one of the first style transfer projects I did when learning Machine Learning). But the images generated now, while fun, seem soulless. And the hypocrisy of us using AI for such useless things. Oh my god. It boils my blood thinking about how much energy is being wasted to do some of the stupid stuff via AI, all the while energy shortages keep growing throughout the world. And the job shortage we are going to have in the near future is going to be insane! Not only is AI coming for software development, art generation, music composition, etc., it is also going to expedite the already flourishing robotics industry. Case in point: look at all the agentic, MCP, and self-prompting techniques that have come out in the last 6 months alone. I know that no one can stop progress, and neither should we, but sometimes I dread to imagine the future, not only for people like me but for the next generation itself. Are we going to need a universal basic income? How is innovation going to be shaped in the future? Apologies for the rant and for being a downer, but I needed to share my thoughts somewhere. PS: I am learning to create MCP servers right now, so I am a big hypocrite myself.

r/datascience · 569 upvotes

How I scraped 4.1 million jobs with GPT4o-mini

**Background**: During my PhD in Data Science at Stanford, I got sick and tired of ghost jobs & 3rd party offshore agencies on LinkedIn & Indeed. So I wrote a script that fetches jobs from 100k+ company websites' career pages and uses GPT4o-mini to extract relevant information (e.g. salary, remote, etc.) from job descriptions. I made it publicly available here: [https://hiring.cafe](https://hiring.cafe), and you can follow my progress and give me feedback at r/hiringcafe.

**Tech details (from a DS perspective)**

1. Verifying legit companies. This I did manually, but it was crucial that I exclude any recruiting firms, 3rd party offshore agencies, etc. I manually sorted through the ~100,000 company career pages (this took several weeks) and picked the ones that looked legit. At Stanford, we call this technique "ocular regression" :)
2. Removing ghost jobs. I discovered that a strong predictor of whether a job is a ghost job is that it keeps being reposted. I was able to identify reposting by doing an embedding-based text similarity search over jobs from the same company. If 2 job descriptions overlap too much, I only show the date posted for the *earliest* listing. This allowed me to weed out most ghost jobs simply by using a date filter (for example, excluding any jobs posted over a month ago).
3. Scraping fresh jobs 3x/day. To ensure that my database reflects each company's career page, I check each career page 3x/day. To avoid rate limits, I used a rotating proxy from Oxylabs for now.
4. Building advanced NLP text filters. After playing with the GPT4o-mini API, I realized I could effectively dump raw job descriptions (in HTML) and ask it to give me back formatted information in JSON (e.g. salary, yoe, etc.). I used this technique to extract a variety of information, including technical keywords, job industry, required licenses & security clearance, whether the company sponsors visas, etc.
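The repost heuristic in item 2 can be sketched as follows. This is not the author's code: they used text embeddings, while this sketch swaps in plain bag-of-words cosine similarity so it runs with the standard library alone. The 0.9 threshold, the field names (`company`, `posted`, `description`), and the sample postings are all hypothetical. Each posting gets a `first_seen` date taken from its earliest near-duplicate at the same company, and the date filter is then applied to `first_seen` instead of the posting's own date.

```python
import math
import re
from collections import Counter
from datetime import date


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def with_first_seen(postings, threshold=0.9):
    """Attach `first_seen`: the earliest date among near-duplicate postings
    (similarity >= threshold) from the same company."""
    vectors = [bag_of_words(p["description"]) for p in postings]
    result = []
    for i, p in enumerate(postings):
        first_seen = p["posted"]
        for j, q in enumerate(postings):
            same_company = q["company"] == p["company"]
            if same_company and cosine(vectors[i], vectors[j]) >= threshold:
                first_seen = min(first_seen, q["posted"])
        result.append({**p, "first_seen": first_seen})
    return result


# Hypothetical postings: the second is a repost of the first.
postings = [
    {"company": "Acme", "posted": date(2024, 1, 5),
     "description": "Senior data scientist. Python, SQL, experimentation."},
    {"company": "Acme", "posted": date(2024, 3, 1),
     "description": "Senior Data Scientist - Python, SQL, Experimentation"},
    {"company": "Acme", "posted": date(2024, 3, 2),
     "description": "Warehouse associate, forklift certification required."},
]

annotated = with_first_seen(postings)
cutoff = date(2024, 2, 15)  # e.g. "posted within roughly the last month"
fresh = [p for p in annotated if p["first_seen"] >= cutoff]  # the repost is filtered out
```

The all-pairs loop is quadratic, so at the scale described (millions of postings) one would presumably compare only within each company, use real embeddings with an approximate nearest-neighbour index, and tune the threshold against hand-labelled reposts.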
**Question for the DS community:** Beyond job search, one thing I'm really excited about with this 4.1 million job dataset is being able to do a yearly or quarterly trend report, for instance to look at which technical skills are growing in demand. What kinds of cool job trend analyses would you do if you had access to this data?

**Edit:** A few folks DMed asking to explore the data for job searching. I put together a minimal frontend to make the scraped jobs searchable: [https://hiring.cafe](https://hiring.cafe). Note that it's currently non-commercial, unsupported, and just a PhD side project until I graduate.

**Edit 2:** Thank you for all the super positive comments. You can follow my progress on scraping more jobs on r/hiringcafe. Also, to the comments saying this is an ad: my full-time job is my PhD, and this is just a fun side project before I get an actual job haha

r/datascience · 405 upvotes

[Official] 2024 End of Year Salary Sharing thread

This is the official thread for sharing your current salaries (or recent offers). See [last year's Salary Sharing thread here](https://www.reddit.com/r/datascience/comments/18tevwk/official_2023_end_of_year_salary_sharing_thread/). There was also [an unofficial one from an hour ago here](https://www.reddit.com/r/datascience/comments/1i9zcgm/unofficial_2024_salary_thread/). Please only post salaries/offers if you're including hard numbers, but feel free to use a throwaway account if you're concerned about anonymity. You can also generalize some of your answers (e.g. "Large biotech company"), or add fields if you feel something is particularly relevant.

* **Title:**
* **Tenure length:**
* **Location:**
* **$Remote:**
* **Salary:**
* **Company/Industry:**
* **Education:**
* **Prior Experience:**
* **$Internship**
* **$Coop**
* **Relocation/Signing Bonus:**
* **Stock and/or recurring bonuses:**
* **Total comp:**

Note that while the primary purpose of these threads is obviously to share compensation info, discussion is also encouraged.

r/datascience · 394 upvotes

What technical skills should young data scientists be learning?

Data science is obviously a broad and ill-defined term, but most DS jobs today fall into one of the following flavors:

- Data analysis (A/B testing, causal inference, experimental design)
- Traditional ML (supervised learning, forecasting, clustering)
- Data engineering (ETL, cloud development, model monitoring, data modeling)
- Applied Science (deep learning, optimization, Bayesian methods, recommender systems; typically more advanced and niche, requiring doctoral education)

The notion of a “full stack” data scientist has declined in popularity, and it seems that many entrants into the field need to decide on one of the aforementioned areas to specialize in to build a career. For instance, a seasoned product DS will be the best candidate for senior product DS roles, but not so much for senior data engineering roles, and vice versa. Since I find learning and specializing in everything to be infeasible, I am interested in figuring out which of these “paths” will equip one with the most employable skillset, especially given how fast “AI” is changing the landscape. For instance, when I talk to my product DS friends, they advise learning how to develop software and use cloud platforms, since these are essential in the age of big data, even though they rarely do this on the job themselves. My data engineer friends, on the other hand, say that data engineering tools are easy to learn, change too often, and are becoming increasingly abstracted, making developing a strong product/business sense a wiser choice. Is either group right? Am I overthinking this, and would I be better off just following whichever path interests me most?

EDIT: I think the essence of my question was to assume that candidates have solid business knowledge. Given this, which skillset is more likely to survive in today’s and tomorrow’s job market given AI advancements and market conditions? Saying all or multiple pathways will remain important is also an acceptable answer.

r/datascience · 386 upvotes

New Grad Data Scientist feeling overwhelmed and disillusioned at first job

Hi all, I recently graduated with a degree in Data Science and just started my first job as a data scientist. The company is very focused on staying ahead of / keeping up with the AI hype train and wants my team (which has no other data scientists except myself) to explore deploying AI agents for specific use cases. The issue is, my background, both academic and through internships, has been in more traditional machine learning (regression, classification, basic NLP, etc.), not agentic AI or LLM-based systems. The projects I’ve been briefed on have nothing to do with my past experience and are solely concerned with how we can infuse AI into our workflows and products. I’m feeling out of my depth and worried about the expectations being placed on me so early in my career. I was wondering if anyone had advice on how to quickly get up to speed with newer techniques like agentic AI, or how I should approach this situation overall. Any learning resources, mindset tips, or career advice would be greatly appreciated.

r/datascience · 342 upvotes

To the avid fans of R, I respect your fight for it but honestly curious what keeps you motivated?

I started my career as an R user and loved it! Then, some years in, I started looking for new roles and got a harsh reality check: no one asks for R. I gradually made the switch to Python and never looked back. I have nothing against R, and I still fend off unreasonable attacks on R by people who have never used it and call it only good for ad hoc academic analysis and bla bla. But is it still worth fighting for?

r/datascience · 310 upvotes

Where do you go to stay up to date on data analytics/science?

Are there any people or organizations you follow on YouTube, Twitter, Medium, LinkedIn, or some other website/blog/podcast that you always tend to keep going back to? My previous career absolutely lacked the professional "content creators" that data analytics has, so I was wondering what content you guys tend to consume, if any. Previously I'd go to two sources: one to stay up to date on semi-relevant news, and another that did high-level summaries of interesting research papers. Really, the kind of stuff I'm after would cover new tools/products that might be of use, tips and tricks, some re-learning of knowledge you might have learned 10+ years ago, deep dives on random but pertinent topics, or someone who consistently puts out unique visualizations and how to recreate them. You can probably see what I'm getting at: sources for stellar information.

r/datascience · 300 upvotes

Is studying Data Science still worth it?

Hi everyone, I’m currently studying data science, but I’ve been hearing that the demand for data scientists is decreasing significantly. I’ve also been told that many data scientists are essentially becoming analysts, while the machine learning side of things is increasingly being handled by engineers.

- Does it still make sense to pursue a career in data science, or should I switch to computer science? I mean, I don’t think I want to do just A/B tests for a living.
- Also, are machine learning engineers still building models, or are they mostly focused on deploying them?

r/datascience · 284 upvotes

Resources for Data Science & Analysis: A curated list of roadmaps, tutorials, Python libraries, SQL, ML/AI, data visualization, statistics, cheatsheets

Hello everyone! Staying on top of the constantly growing skill requirements in Data Science is quite a challenge. To manage my own learning and growth, I've been curating a list of useful resources and tools that cover the full spectrum of the field, from data analysis and engineering to deep learning and AI. I'd love to get your professional opinion. Could you please take a look? Have I missed anything crucial? What else would you recommend adding or focusing on? To give you an immediate sense of the list's scope and structure, I've attached screenshots of the table of contents below. The full version with all the active links and additional resources is available on GitHub. You can find the link at the end of the post. I'd be happy if this list is useful to others. You can view the full list here [View on GitHub](https://github.com/PavelGrigoryevDS/awesome-data-analysis?#awesome-data-analysis-) Thanks for your time! Your advice is invaluable!

r/datascience · 269 upvotes

Ridiculous offer, how to proceed?

Hello all, after a very long struggle to land my first data science job, I got a ridiculous offer and would like to know how to proceed. For context, I have 7 years of medtech experience (not specifically in data science, but similar), an undergrad degree in stats, and now a master's in data science. I am located in the US. I've been talking with a company for months now and had several interviews even without a specific position available. Well, they finally opened two positions, one associate and one senior, with salary ranges of $66k-99k and $130k-180k respectively. I applied for both, and when HR got involved for the offer they said they could probably just split the difference at $110k. Sure, that's fine. However, a couple of days later, they called again and offered $60k-70k, below even the lower limit of the associate range. So my question is: has this happened to anyone else? Is this HR's way of trying to get me to just go away? Maybe I'm just frustrated since HR said the salary range listed on the job req isn't actually what they are willing to pay.

r/datascience · 251 upvotes

What’s the best business book you’ve read?

I came across this question on a job board. After some reflection, I realized that some of the best business books helped me understand the strategy behind the company’s growth goals, empathize better with others, and get them to care about impactful projects like I do. What are some useful business-related books for a career in data science?

r/datascience · 242 upvotes

This is how I got a (potential) offer revoked: A learning lesson

I’m based in the Bay Area with 5 YOE. A couple of months ago, I interviewed for a role I wasn’t too excited about, but the pay was super compelling. In the first recruiter call, they asked for my salary expectations. I asked for their range; as an example here, let’s say they said $150K–$180K. I said, “That works, I’m looking for something above $150K.” I think this was my first mistake; more on that later. I am a person with low self-esteem (or serious imposter syndrome), so when I say I nailed all 8 rounds, I really must believe that. The recruiter followed up the day after the 8th round saying the team was interested in extending an offer. Then, on compensation expectations, the recruiter said, “You mentioned $150K earlier.” I clarified that I was targeting the upper end based on my fit and experience. They responded with, “So $180K?” and I just said yes. It felt a bit like they were putting words in my mouth. The next day, I got an email saying that I had to wait for the offer decision as they were interviewing other candidates. Haven’t heard back since. I don’t think I did anything fundamentally wrong, and I’m not sure whether I should have regrets, but I’m curious what others think. Edit: Just to clarify, in my mind I thought that’s how negotiations work: they would come back and say they can’t do 150 but can do 140. But I guess not.

r/datascience · 241 upvotes

Is it normal to be scared about finding a job in the future?

I am a rising senior at a large state school studying data science. I am currently working an internship as a software engineer for the summer, and I get my tickets done for the most part, albeit with some help from AI. But deep down I feel a pit in my stomach that I won’t end up employed after all of this. I plan to go for a master’s in applied statistics or data science after my bachelor’s, though I definitely don’t have great math grades from my first few semesters of college. After those semesters, though, all my upper-division math/stats/CS/data science courses have been A’s and B’s. And I feel like I know enough Python, R, and SAS to work through and build models for most problems I run into, as well as Tableau, SQL, and Alteryx. But I can’t shake the feeling that it won’t be enough, and that my rough math grades from my first few semesters will hold me back from getting into a master’s program. I have tried to supplement this by doing physics and applied math research. But I’m just not sure I’m doing enough, and I’m scared for what comes after I finish my education. I’m just venting here, but I’m hoping there are others in this sub who have been in similar positions and gotten employed, or who are currently in my shoes. I just need to hear from other people that it’s not as hopeless as it feels. I just want to get a job as a data analyst, scientist, or statistician working on interesting problems and have a decent career.

r/datascience · 241 upvotes

What’s next for an 11 YOE data scientist?

Hi folks, Hope you’re having a great day wherever you are in the world. Context: I’ve been in the data science industry for the past 11 years. I started my career in telecom, where I worked extensively on time series analysis and data cleaning using R, Java, and Pig. After about two years, I landed my first “data scientist” role in a bank, and I’ve been in the financial sector ever since. Over time, I picked up Python, Spark, and TensorFlow to build ML models for marketing analytics and recommendation systems. It was a really fun period; the industry wasn’t as mature back then. I used to get ridiculously excited whenever new boosting algorithms came out (think XGBoost, CatBoost, LightGBM) and spent hours experimenting with ensemble techniques to squeeze out higher uplift. I also did quite a bit of statistical A/B testing: not just basic t-tests, but full experiment design with power analysis, control-treatment stratification, and post-hoc validation to account for selection bias and seasonality effects. I enjoyed quantifying incremental lift properly, whether through classical hypothesis testing or uplift modeling frameworks, and working with business teams to translate those metrics into campaign ROI or customer conversion outcomes. Fast forward to today: I’ve been at my current company for about two years. Every department now wants to apply Gen AI (and even “agentic AI”) even though we haven’t truly tested or measured many real-world efficiency gains yet. I spend most of my time in meetings listening to people talk all day about AI. Then I head back to my table to do prompt engineering, data cleaning, testing, and evaluation. Honestly, it feels off-putting that even my business stakeholders can now write decent prompts. I don’t feel like I’m contributing much anymore. Sure, the surrounding processes are important, but they’ve become mundane, repetitive busywork.
I’m feeling understimulated intellectually and overstimulated by meetings, requests, and routine tasks. Anyone else in the same boat? Does this feel like the end of a data science journey? Am I too far gone? It’s been 11 years for me, and lately, I’ve been seriously considering moving into education, somewhere I might actually feel like I’m contributing again.

r/datascience · 233 upvotes

Since when did “meets” expectations become a bad thing in this industry?

I work at a pretty big-name company on the West Coast. It is pretty shocking to see that at my company anyone who gets “meets” expectations has not been getting any salary increase, not even a dollar each year. I’d think if you are meeting expectations, it means you are holding up your end of the deal and it shouldn’t be a bad thing. But now, you actually have to exceed expectations to get a measly 1% raise, and sometimes just to keep your job. Did this use to happen pre-COVID as well?

r/datascience · 222 upvotes

Ace The Interview - SQL Intuitively and Exhaustively Explained

SQL is easy to learn and hard to master. Realistically, the difficulty of the questions you get will largely be dictated by the job role you're trying to fill.

At its highest level, SQL is a "declarative language", meaning it doesn't define a set of operations, but rather a desired end result. This can make SQL incredibly expressive, but also a bit counterintuitive, especially if you aren't fully aware of its declarative nature.

SQL expressions are passed through an SQL engine, like PostgreSQL, MySQL, and others. These engines parse your SQL expressions, optimize them, and turn them into an actual list of steps to get the data you want.

While not as often discussed, for beginners I recommend SQLite. It's easy to set up in virtually any environment, and allows you to get rocking with SQL quickly. If you're working in big data, I recommend also brushing up on something like PostgreSQL, but the differences are not so bad once you have a solid SQL understanding.

In being a high level declaration, SQL’s grammatical structure is, fittingly, fairly high level. It’s kind of a weird, super rigid version of English. SQL queries are largely made up of:

* **Keywords:** special words in SQL that tell an engine what to do. Some common ones, which we’ll discuss, are `SELECT`, `FROM`, `WHERE`, `INSERT`, `UPDATE`, `DELETE`, `JOIN`, `ORDER BY`, `GROUP BY`. They can be lowercase or uppercase, but usually they’re written in uppercase.
* **Identifiers:** the names of database objects like tables, columns, etc.
* **Literals:** numbers, text, and other hardcoded values.
* **Operators:** special characters or keywords used in comparison and arithmetic operations. For example `!=`, `<`, `OR`, `NOT`, `*`, `/`, `%`, `IN`, `LIKE`. We’ll cover these later.
* **Clauses:** the major building blocks of SQL, which can be stitched together to define a query’s overall behavior. They usually start with a keyword, like
    * `SELECT` – defines which columns to return
    * `FROM` – defines the source table
    * `WHERE` – filters rows
    * `GROUP BY` – groups rows
    * etc.

By combining these clauses, you create an SQL query. There are a ton of things you can do in SQL, like create tables:

    CREATE TABLE People(first_name, last_name, age, favorite_color);

Insert data into tables:

    INSERT INTO People VALUES
        ('Tom', 'Sawyer', 19, 'White'),
        ('Mel', 'Gibson', 69, 'Green'),
        ('Daniel', 'Warfield', 27, 'Yellow');

Select certain data from tables:

    SELECT first_name, favorite_color FROM People;

Search based on some filter (`People` has no `id` column, so this uses the implicit `rowid` that SQLite gives every ordinary table):

    SELECT * FROM People WHERE rowid = 3;

And delete data:

    DELETE FROM People WHERE age < 30;

What was previously mentioned makes up the cornerstone of pretty much all of SQL. Everything else builds on it, and there is a lot.

**Primary and Foreign Keys**

A *primary key* is a unique identifier for each record in a table. A *foreign key* references a primary key in another table, allowing you to relate data across tables. This is the backbone of relational database design.

**Super Keys and Composite Keys**

A *super key* is any combination of columns that can uniquely identify a row. When a unique combination requires multiple columns, it’s often called a *composite key*, which is useful in complex schemas like logs or transactions.

**Normalization and Database Design**

Normalization is the process of splitting data into multiple related tables to reduce redundancy. First Normal Form (1NF) requires atomic values (one value per column in each row, no repeating groups), Second Normal Form (2NF) additionally requires every non-key column to depend on the whole primary key rather than on just part of a composite key, and Third Normal Form (3NF) additionally removes transitive dependencies, where a non-key column depends on another non-key column.

**Creating Relational Schemas in SQLite**

You can explicitly define tables with `FOREIGN KEY` constraints using `CREATE TABLE`. These relationships enforce referential integrity and enable behaviors like cascading deletes, but note that SQLite leaves foreign-key enforcement off by default: it only applies on connections that have run `PRAGMA foreign_keys = ON`. SQLite does enforce `NOT NULL` and `UNIQUE` constraints strictly, making your schema more robust.
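To see keys and constraints doing real work, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `departments`/`employees` schema and its rows are hypothetical. It shows a foreign key rejecting a row that points at a missing parent and `ON DELETE CASCADE` removing child rows; the `PRAGMA` line is what switches enforcement on for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless this is set on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL
            REFERENCES departments(dept_id) ON DELETE CASCADE
);
INSERT INTO departments VALUES (1, 'Analytics'), (2, 'Engineering');
INSERT INTO employees VALUES (10, 'Ana', 1), (11, 'Ben', 1), (12, 'Cy', 2);
""")

# Referential integrity: a row pointing at a missing department is rejected.
try:
    conn.execute("INSERT INTO employees VALUES (13, 'Dee', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Cascading delete: removing a department also removes its employees.
conn.execute("DELETE FROM departments WHERE dept_id = 1")
remaining = [name for (name,) in conn.execute("SELECT name FROM employees ORDER BY emp_id")]
print(rejected, remaining)  # True ['Cy']
```

Without the `PRAGMA`, both the bad insert and the department delete would succeed silently, leaving orphaned `employees` rows.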
**Entity Relationship Diagrams (ERDs)**
ERDs visually represent tables and their relationships. Dotted lines and cardinality markers like `{0,1}` or `0..N` indicate how many records in one table relate to records in another, which helps document and debug schema logic.

**JOINs**
JOIN operations combine rows from multiple tables, usually by matching a foreign key to the primary key it references. `INNER JOIN` includes only matched rows, `LEFT JOIN` includes all rows from the left table, and `FULL OUTER JOIN` includes all rows from both tables. Proper JOINs are critical for data integration.

**Filtering and LEFT/RIGHT JOIN Differences**
JOIN order affects which rows are preserved when there's no match. For example, using `LEFT JOIN` ensures all left-hand rows are kept, which is useful for identifying unmatched data. SQLite versions before 3.39.0 lack `RIGHT JOIN`, but you can simulate it by flipping the table order in a `LEFT JOIN`.

**Simulating FULL OUTER JOINs**
SQLite versions before 3.39.0 also don't support `FULL OUTER JOIN`, but you can emulate it with two `LEFT JOIN` queries, one in each direction, stitched together with `UNION ALL`, where the second query has a `WHERE ... IS NULL` clause so it only adds the rows that had no match. This approach ensures no records are lost from either table.

**The WHERE Clause and Filtration**
`WHERE` filters records based on conditions, supporting logical operators (`AND`, `OR`), numeric comparisons, membership tests with `IN`, and pattern matching with `LIKE` or `REGEXP` (in SQLite, `REGEXP` only works once a `regexp()` function has been defined). It's one of the most frequently used clauses in SQL.

**DISTINCT Selections**
Use `SELECT DISTINCT` to retrieve unique values from a column. You can also select distinct combinations of columns (e.g., `SELECT DISTINCT name, grade`) to avoid duplicate rows in the result.

**Grouping and Aggregation Functions**
With `GROUP BY`, you can compute metrics like `AVG`, `SUM`, or `COUNT` for each group. `HAVING` lets you filter grouped results, like showing only departments with an average salary above a threshold.

**Ordering and Limiting Results**
`ORDER BY` sorts results by one or more columns in ascending (`ASC`) or descending (`DESC`) order.
`LIMIT` restricts the number of rows returned, and `OFFSET` lets you skip rows, which is useful for pagination or ranked listings.

**Updating and Deleting Data**
`UPDATE` modifies existing rows using `SET`, while `DELETE` removes rows; both use a `WHERE` filter to pick which rows are affected (leave it out and every row is affected). These operations can be combined with other clauses to selectively change or clean up data.

**Handling NULLs**
`NULL` represents missing or undefined values. You can detect them using `IS NULL` (a comparison like `= NULL` never evaluates to true) or replace them with defaults using `COALESCE`. Aggregates like `AVG(column)` and `COUNT(column)` ignore NULLs, while `COUNT(*)` counts all rows.

**Subqueries**
Subqueries are nested `SELECT` statements used inside `WHERE`, `FROM`, or `SELECT`. They're useful for filtering by aggregates, making comparisons, or generating intermediate results for more complex logic.

**Correlated Subqueries**
These are subqueries that reference columns from the outer query, so conceptually the subquery is re-evaluated for each outer row. That makes them powerful but often slow, unless the engine optimizes them or you rewrite them as a JOIN.

**Common Table Expressions (CTEs)**
CTEs let you define temporary named result sets with `WITH`. They make complex queries readable by breaking them into logical steps, and a CTE can be referenced multiple times within the same query.

**Recursive CTEs**
Recursive CTEs solve hierarchical problems like org charts or category trees. A base case defines the starting rows, and a recursive step extends the output until no new rows are added. They're useful for generating sequences or computing reporting chains.

**Window Functions**
Window functions perform calculations across a set of rows related to the current row, without collapsing those rows into one the way `GROUP BY` does. Examples include `RANK()`, `ROW_NUMBER()`, `LAG()`, `LEAD()`, `SUM() OVER ()`, and moving averages computed over sliding windows.

These can all be combined to do a lot of different things. In my opinion, this is too much to learn efficiently all at once. It requires practice and the slow accumulation of concepts over many projects.
If you're new to SQL, I recommend studying the basics and learning through doing. However, if you're on the job hunt and you need to cram, you might find this breakdown useful: [https://iaee.substack.com/p/structured-query-language-intuitively](https://iaee.substack.com/p/structured-query-language-intuitively)
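To make the CTE and window-function descriptions above concrete, here's a minimal sketch that combines the two, run through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sales data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('Ana', 'East', 500), ('Ben', 'East', 300), ('Cy', 'East', 300),
        ('Dee', 'West', 700), ('Eli', 'West', 200);
    """
)

query = """
WITH region_totals AS (              -- CTE: a named intermediate result
    SELECT region, SUM(amount) AS region_total
    FROM sales
    GROUP BY region
)
SELECT
    s.rep,
    s.region,
    s.amount,
    -- window function: rank within each region without collapsing rows
    RANK() OVER (PARTITION BY s.region ORDER BY s.amount DESC) AS rank_in_region,
    t.region_total
FROM sales AS s
JOIN region_totals AS t ON t.region = s.region
ORDER BY s.region, rank_in_region, s.rep
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ben and Cy tie at 300, so `RANK()` gives both of them 2; swapping in `ROW_NUMBER()` would give them distinct numbers (2 and 3) in no guaranteed order.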

r/datascience · 215 upvotes

Mid career data scientist burnout

Been in the industry since 2012. I started out in data analytics consulting. The first 5 years were mostly that, and I didn't enjoy the work because I thought it wasn't challenging enough. In the last 6 years or so, I've moved into being a Senior Data Scientist, the type that's closer to a statistical modeller than a full-stack data scientist. I currently work in health insurance (fairly new, just over a year in my current role). I suck at comms and selling my work, and the higher up I go in the organization, the more I realize I need to be strategic about selling my work and about dealing with people. It has always been an energy drainer for me; I find I'm putting on a front. Of late, I feel 'meh' about everything. The changes in the industry and the amount of knowledge to keep up with, some technical, some industry-based, seem overwhelming. Overall, I attribute some of these feelings to a sense that I lack the capability to handle stakeholders, lack leadership skills in the role, and have difficulty meeting the role's expectations. (I also want to add that I have social anxiety.) Perhaps one thing that might help is upskilling on the social front. Does anyone have similar journeys or resources to share? I started working with a generic career coach, but haven't found it that helpful, as the nuances of crafting a narrative and selling aren't really coming up (the focus is much more on confidence and presence). Edit: Lots of helpful directions to move in, which has been energizing.

r/datascience · 187 upvotes

Europe Salary Thread 2025 - What's your role and salary?

The yearly Europe-centric salary thread. You can find the last one here: https://old.reddit.com/r/datascience/comments/1fxrmzl/europe_salary_thread_2024_whats_your_role_and/ I think it's worthwhile to learn from one another and see what different flavours of data scientists, analysts and engineers are out there in the wild. In my opinion, this is especially useful for the beginners and transitioners among us. So, do feel free to talk a bit about your work if you can and want to. 🙂 While not the focus, non-Europeans are of course welcome, too. Happy to hear from you!

* **Data Science Flavour:**
* **Location:**
* **Title:**
* **Compensation (gross):**
* **Education level:**
* **Experience:**
* **Industry/vertical:**
* **Company size:**
* **Majority of time spent using (tools):**
* **Majority of time spent doing (role):**

r/datascience · 165 upvotes

2 YOE Data Scientist [Unemployed in data field] Burnt out and feeling helpless.

Full resume [Link](https://drive.google.com/file/d/1PpN1hNRPFlGQ0vq6OwaJXZDZ2zMMp3MU/view?usp=sharing). Hello everyone. I am a 25-year-old international student in the UK who is struggling heavily to even land interviews and is drowning in debt. I have tried the retail/marketing industry and even the finance industry, as I have experience related to both. I also apply selectively; I do not spray and pray. I email hiring teams and people at the company after applying just to get on their radar. The freelancing job (the remote one) that I had came from my Fiverr gigs, and it was going pretty well. I had to stop it because I moved to the UK for further studies in the hope of better career progression. I think I kind of messed up too by not applying for internships or even graduate programs (as I had experience on my CV). My last job was also a 4-month contract, and it came from the same company where I was working as a store manager (retail). I have landed maybe 3 or 4 interviews in 3 years and am really, really, really struggling to understand what is going wrong. Is it my freelancing experience? I ask because I have learned a lot about CVs, applying to specific industries, and working on what a specific industry needs or wants. But I simply do not understand. I am just lost, literally lost. I would really appreciate any help and honest feedback or advice. I know I will be grilled, but bring it on; it might help me. Thank you so much.

r/datascience · 158 upvotes

Where to Go After Data Science: Unconventional / Weird Exits?

Data science careers often feel like they funnel into the same few paths (FAANG, ML/AI engineering, or analytics leadership), but people actually branch into wildly unexpected directions. I'm curious about those off-the-beaten-path exits: roles in unexpected industries, analytics-adjacent pivots, international moves, or entirely new ventures. Would love to hear some stories. P.S. Thread inspired by a thread in the consulting subreddit but adapted to DS.

r/datascience · 157 upvotes

Meta: Career Advice vs Data Science

I joined the thread to learn about Data Science. Something like 75 percent of the posts are people's resumes and requests for career advice. I thought these were supposed to go into a weekly thread or something; I'm getting a warning about the weekly thread even as I'm posting this comment. Can anyone suggest alternative subs with more educational content?

r/datascience · 148 upvotes

Research Data Scientists without heavy coding backgrounds (stats, econ, etc.), have LLMs improved your workflow?

I remember for a while there were many CS folks saying that Data Science has become software engineering, and that if you aren't fluent in software engineering fundamentals you're going to fall behind. It became popular enough rhetoric that people said they preferred to hire a coder with some math knowledge over a math person with some coding knowledge. I'm a Statistician working in Research Data Science with an average level of coding experience: enough to write my own code in notebooks, but translating it into a fully fleshed-out Python module with classes and functions was much more difficult for me. For a while I thought my lack of advanced software engineering knowledge would become a handicap in my career, and as someone with a busy personal life I didn't want to spend that much time learning these fundamentals. Then my company rolled out LLMs integrated into the software we use, like Visual Studio. Suddenly I'm able to create fully fleshed-out modules from my notebooks in a flash. I can ask the LLM to write unit tests to check how my code processes data or to test its various subfunctions. I can use it to quickly code up various types of models to compare results. Handing off my code to engineering in the form of a Python package isn't such a pain anymore. Sure, the LLM produces some weird results sometimes, and I do have to spend time making sure I ask it the right things and/or cleaning up the code so that it works properly. But now I feel like that handicap is no longer there.

r/datascience · 146 upvotes

Long-timers at companies — what’s your secret?

Hi everyone, I’ve been a job hopper throughout my career—never stayed at one place for more than 1-2 years, usually for various reasons. Now, I’m entering a phase where I want to get more settled. I’m about to start a new job and would love to hear from those who have successfully stayed long-term at a job. What’s the secret sauce besides just hard work and taking ownership? Lay your knowledge on me—your hacks, tips, rituals. Thanks in advance.

r/datascience · 122 upvotes

Should I invest time learning a language other than Python?

I finished my PhD in CS three years ago, and I've been working as a data scientist for the past two years, exclusively using Python. I love it, especially the statistical side and scripting capabilities, but lately, I've been feeling a bit constrained by only using one language. I'm debating whether it's worthwhile to branch out and learn another language to broaden my horizons. R seems appealing given my interests in stats, but I'm also curious about languages like Julia, Scala, or even something completely different. Has anyone here faced a similar decision? Did learning another language significantly boost your career, or was it just a nice-to-have skill? Or maybe this is just a waste of time? Thanks for any insights! Update: I'm not completely sure about my long-term goals, tbh. I do like statistics and stuff like causal inference, and Bayesian inference looks appealing. At the same time I feel that doing some DL might also be great and practical, as it's the most requested in the industry (I took some courses about NLP, but at my work we mostly do tabular data with classical ML). Those are the main directions, but I'm aware that they might be too broad.

r/datascience · 111 upvotes

Lost and Feel Like a Fraud

This might not be the appropriate place to say this, but I honestly feel like the biggest fraud ever. If I could go back, I don't think I would have gone into data science. I did my undergraduate degree in biology, and then did a master's in data science. I've continued to get better at coding (still not as good as a CS major), learning, and using AI, but I feel like I'm getting nowhere. In fact, I'm just getting more frustrated. My job is not related to data science AT ALL; I just analyze incoming live data. I've been polishing my resume, with no luck at all getting even 1 interview. I know the market is brutal, but even when you're lucky enough to land a job, the salary is horrible in Canada. I don't even think I enjoy doing data science work anymore, since it's becoming more and more dependent on AI. I'm too out of it to go back to school to do something else. In truth, I don't know what I'm doing. I don't even know why I'm writing this.

r/datascience · 92 upvotes

Can a PhD be harmful for your career?

I have my MS degree in a Data Science adjacent field. I currently work in a Data Science / Software Engineering hybrid role, but I also work a second job as an adjunct professor in data science/analytics. I find teaching unbelievably rewarding, but I could make more money being a cashier at Target. That's no exaggeration. Part of me thinks teaching is my calling. My workplace will pay for my PhD. However, if I receive my PhD and discover that I may not want to be a professor, would I have a hard time finding data science jobs that aren't solely research based? I try to think from the recruiter's perspective: if I applied to a job with a PhD, they may think I will be asking for too much money or am overqualified. I'm just wondering if anyone has been in the same scenario, or had thoughts on this. Thank you for your time!

r/datascience · 90 upvotes

Data analyst vs. engineer? At non-profit

Hi all, I am the only Data Analyst at a medium-sized company related to shared transportation (adjacent to Lime Scooter/Bike). I'm pretty early in my career (graduated from college 3 years ago). My role encompasses a LOT of responsibilities that aren't traditionally under "data analyst", the biggest of which is that I build and maintain all the data pipelines from our partner companies via API and webhooks to our own SQL database. This feels very much like the role of a Data Engineer. From there, I use the SQL data to build dashboards, do analyses, etc., which is what I usually think of as "Data Analyst". I am trying to argue for a raise (since data engineers are usually paid more than analysts), and I am trying to figure out if I should ask for a title change too. I'd like to have engineering somehow in it, but "Data Engineer and Analyst" doesn't sound great. Does anyone have any experience or advice with this? Thanks!!

r/datascience · 76 upvotes

What do you use to build dashboards?

Hi guys, I've been a data scientist for 5 years. I've done lots of different types of work and unfortunately that has included a lot of dashboarding (no offense if you enjoy making dashboards). I'm wondering what tools people here are using and if you like them. In my career I've used Mode, Looker, Streamlit and Retool off the top of my head. I think Mode was my favorite because you could type SQL right into it and get the charts you wanted, but I was still overall unsatisfied with it. Do the tools you use meet all your needs? One of my frustrations with these tools is that even platforms like Looker, which are designed to be self-serve for general staff, end up being confusing for people without a data science background. Are there any tools (maybe powered by LLMs now) that allow non-data-science people to write prompts that update production dashboards? A simple example: you have a revenue dashboard showing net revenue, and a PM, director, etc. wants you to add a gross revenue metric. With the tools I'm aware of, I would have to go into the BI tool and update the chart myself to show that metric. Are there any tools that allow you to just type in a prompt and make those kinds of edits?

r/datascience · 67 upvotes

Am I underpaid/underemployed at $65k for a Data Analyst position in a MCOL city?

I'm in a mcol city. I have a master's in Data Analytics that I finished in October 2024, and I've been working as a Data Analyst for 1.5 years. Before that, I was a study lead Clinical Data Manager for over a year (and before that I was a tax researcher and worked in HR). Currently, I make $65k base salary, but $85k total compensation. I keep getting interviews for Data Scientist positions that are well into the $100k+ base salary range, but I haven't landed an offer yet (it's really disheartening). Am I underpaid? P.S. I'm open to job suggestions lol

r/datascience · 62 upvotes

Would you move from DS to BI/DA/DE for a salary increase?

I’m a DS but salary is below average. Getting recruiters reaching out for other data roles though because my experience is broad. Sometimes these roles start at ~$40k over what I’m making now, and even over other open DS roles I see on LinkedIn in my area for my yoe. The issue is I love DS work, and don’t want to make it super difficult to get future DS jobs. But I also wouldn’t mind working in another data role for a bit to get that money though. What are everyone’s thoughts on this? Would you leave DS for more money?

r/datascience · 62 upvotes

Deciding on an offer: Higher Salary vs Stability

Trying to decide between staying in a stable but stagnating position or moving for higher pay and engagement with a higher risk of layoff. Would love to hear the subreddit's thoughts on a move in this climate. I currently work for a city as a Senior DS. The position has good WLB, early-retirement healthcare (in 5 years), and relative security. However, my role has shifted to mostly reporting in Tableau and Excel, with shrinking DS opportunities. There is no growth in terms of salary or position. I have an offer from a mature startup that would give me a large pay bump and allow me to work on DS projects with a more contemporary tech stack. However, their reviews mention recent layoffs and slow career growth. Below are some more specifics: I am 35 in a VHCOL city, DINK with a mortgage and student loans.

Current Job:
- $130k
- Okay pension with early-retirement healthcare in 5 years
- Good WLB, but non-DS work with an aging tech stack
- Raises and promotions are extremely rare (none for my team in the last 4 years)
- 2 days in office

New Job (same title):
- $170k
- DS work with a much more modern tech stack
- Fully remote
- 1st year coming off 2 years of layoffs
- Reviews frequently cite few raises and promotions; however, really good WLB

One nice thing is I don't lose my pension progress if I leave, so if I do end up in a city or state position again I start up where I left off.

r/datascience · 25 upvotes

How to tell the difference between whether managers are embracing reality of AI or buying into hype?

I work in data science with a skillset that comprises data science, data engineering and analytics. My team seems to want to eventually make my role completely non-technical (I'm not sure what a non-technical role would entail). The reason is a feeling that all the technical aspects will be completely eliminated by AI. The rationale, in theory, makes sense: we focus on the human aspects of our work, which is to develop solutions that can clearly be handed off to a fully technical team or to AI to do the job for us. The reality in my experience is that this makes a strong assumption that data processes can fit cleanly and neatly into something like a written prompt that can easily be given to somebody, or to AI, with no 'context' to develop. I don't feel like our processes are there yet in my work... like at all. Some things, maybe, but most things no. I also feel I'm navigating a lot of ever-evolving priorities, stakeholder needs, and conflicting advice (do this, no, revert this, do this, rinse, repeat). This is making my job honestly frustrating and burning me out FAST. I'm working 12-hour days, sometimes until 3 AM. My technical skills are deteriorating and I feel like my mind is turning into a fried egg. I don't have the time or energy to do anything to upskill. On one hand, I'm not sure if management has a point: if I let go of the 'technical' parts that I like because of AI and instead just focus more on the 'other stuff', would I have more growth, opportunity and salary increases in my career? Or is it better to have a balance between those skills and the technical aspects? In an ideal world, I want a good compromise between subject matter and technical skills, and a job where I get to do a bit of both. I'm not sure if the narrative I'm hearing is hype or reality. Would be interested in hearing thoughts.

r/datascience · 24 upvotes

Would you rather be comfortable or take risks moving around?

I recently received a job offer from a mid-to-large tech company in the gig economy space. The role comes with a competitive salary, offering a 15-20k increase over my current compensation. While the pay bump is nice, the job itself will be challenging, as it focuses on logistics and pricing. However, I do have experience in pricing and have demonstrated my ability to handle optimization work. This role would also provide greater exposure to areas like causal inference, optimization, and real-time analytics, which are areas I'd like to grow in. That said, I'm concerned about my career trajectory. I've moved around frequently in the past; for example, I spent 1.5 years at a big bank in my first role but left due to a toxic team. While I'm currently happy and comfortable in my role, I haven't been here for a full year yet. My current total compensation is $102k. While the work-life balance is great, my team is lacking in technical skills, and I've essentially been responsible for upskilling the entire practice. Another area of concern is that technically we are not able to keep up with bigger companies, and the work is highly regulated, so innovation isn't as easy. Given how often I've moved, what would you do in my shoes? Take it and try to improve my career opportunities for big tech?

r/datascience · 14 upvotes

Do these recruiters sound like a scam?

Hi all, unsure of where else to ask this so asking here. I had a recruiter (heavy Indian accent) call/email me with an interesting proposition. They work for the candidate rather than the company. If they place you in a job within 45 days they ask for 9% of your first year's salary. They claim their value add is in a couple of things. First they promise that they have advanced ATS software that will help tweak professional qualifications. Second, they say they will apply to approximately 50 JDs per day (I am skeptical this many relevant jobs are even being posted). I have never had luck with Indian recruiters before but I have had good experiences professionally in offshoring some repetitive tasks for cheap. This process sounds like it fits the bill. The part where it gets sketchy is they want either access to my LinkedIn/Gmail or they want me to create second LinkedIn/Gmail accounts that they would have control over. Access to my gmail is a nonstarter obviously. But creating spoof LinkedIn/Gmails feels a little sketchy. If we're living in a universe where these guys are simply trying to provide the service they've described, I'm all in. I just don't want to get soft-rolled into some sort of scam.

r/datascience · 12 upvotes

Got an offer: manager track at my smaller fintech, or go to a major retailer?

I have a job offer as a manager with a big retailer, around 160-170k total comp with all the benefits. I expect salary and bonus alone to be 143k; once we add in profit sharing, stocks and equity, and RRSP contributions, we expect the comp to reach that generous number. Currently I make 120.5k at a small niche fintech. I have 3 years of experience. I perform as a DS, but I did a pretty good job in my current role and I do genuinely innovate, so I am also on track to be a manager in my current role.

Type of work: The retailer is a lot of causal inference. I would have to manage 4 people, eventually 6, building a team from scratch in a pressure-cooker environment. The fintech is a lot of credit risk and end-to-end ownership + Docker + portfolio management + causal inference.

I am going to take it to my manager and see what offer is on the table. My big boss is super generous, so great salaries aren't off the table. Unprompted, I got an offer taking me from 102,500 total to 120.5k. So I am 100%.

Environment: Big retailer: 4 days in office. Fintech: 2-3 days in office, probably 3 by next year.

People: Big retailer: don't know, but I'd be going back to corporate. Fintech: we do have a bunch of idiots in the company, and the execs are not really my favorite. I do like some of our senior leadership, but other than 1 exec I don't really like the top execs.

Career outlook: I came from a bank originally, and I had more interviews with big tech while at the big bank than I have at the fintech. Most of my interviews came from the fact that I worked at a big bank. So maybe going to big tech might be the play. I am gunning for big tech roles, so I am pushing as much as possible to hit 180-200k comp so I can then climb the ladder.

Do note that I rejected the retailer's senior DS offer, as it matched my current comp. So they came back with manager, and then SVPs sought me out. I interviewed and left a strong impression with how I explain and scope things, since I do end-to-end ownership in my fintech role. Career insight is appreciated.

r/datascience · 9 upvotes

How do you calculate your hourly rate, if you were to consider contract over FTE?!

I have always been an FTE in this field, receiving compensation and benefits that extend far beyond the base salary. For many years now, every contract opportunity a recruiter presented never made financial sense to me, regardless of the level, and even for top FAANG employers known for generous pay packages. Is this really the case and contract workers are scammed in this field? Or is it just my luck? Or is it the recruiters robbing us? For reference, to estimate my hourly rate I take my annual TC and divide it by 48 × 40 (weeks times hours), because there will be at least 4 unpaid vacation weeks if I contract. That isn't even fair to me, because I am not factoring in benefits. Anyway, the value I get is always multiples more than the best contract offer a recruiter presented. So am I doing it wrong?!
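For reference, the conversion described in this post works out as follows. The $150,000 TC is a hypothetical figure for illustration, and, like the post, this ignores benefits:

$$
\text{hourly rate} \approx \frac{\text{TC}}{48 \times 40} = \frac{\text{TC}}{1920},
\qquad
\frac{\$150{,}000}{1920} = \$78.125 \approx \$78.13
$$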

r/datascience · 6 upvotes

Data Team Benchmarks

I put together some charts to help benchmark data teams: [http://databenchmarks.com/](http://databenchmarks.com/) For example:

* Average data team size as % of the company (hint: 3%)
* Median salary across data roles for 500 job postings in Europe
* Distribution of analytics engineers, data engineers, and analysts
* The data-to-engineer ratio at top tech companies

The data comes from LinkedIn, open job boards, and a few other sources.

🔗 Data Sources

Last updated: 2025-12-27
O*NET Code: 15-2051.00
