Sat. Jan 14, 2023 - Fri. Jan 20, 2023

Replacing Pandas with Polars. A Practical Guide.

Confessions of a Data Guy

I remember those days, oh so long ago, it seems like another lifetime. I haven’t used Pandas in many a year, decades, or whatever. We’ve all been there, done that. Pandas I mean. I would dare say it’s a rite of passage for most data folk. For those using Python, it’s probably one of the […]

Python 361

How To Hire Junior Data Engineers

Seattle Data Guy

With all the recent data events I have put together, I inevitably run into new data engineers who are either finishing up college or looking to transition into a data engineer or data scientist position. In fact, I have talked to several newly graduated engineers who are struggling to find work. A few told me…

What Big Tech layoffs suggest for the industry

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get the full issues, twice a week: subscribe here. Update on 20 January: less than a day after publishing this article, Google announced historic layoffs that will impact ~12,000 positions.

Banking 141

Data News — Week 23.03

Christophe Blefari

Summer is coming (credits). Hey, new Friday, new Data News edition. I'm so happy to see new people coming every week. Thank you for every recommendation you make about the blog or the Data News. This kindness for my content gives me wings. This week I don't want to be late, so let's start the weekly wrap-up. I got less inspired this week, which means a shorter edition.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Building Applications With Data As Code On The DataOS

Data Engineering Podcast

Summary The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics lifecycle through code, while providing automatic knowledge graphs and data discovery.

Coding 130

What Is The State Of Data Engineering And Infrastructure In 2023

Seattle Data Guy

2022 is coming to an end. What is the state of data infra? Are Snowflake and Databricks still fighting over total cost of ownership? Is everyone switching to DuckDB? Are data engineers all learning Rust? Let’s try to answer these questions. Our team is putting together an all-day event focused on helping answer some…

More Trending

Data News — Week 23.02

Christophe Blefari

Abandoned Pandas (credits). Hey. I have busy weeks, I'm sorry Data News is coming on Saturday again. It is a bit hard to travel by train, work and write at the same time. Plus I'm a fast context switcher, so it piles up. Also a few of you have sent me messages recently and I've not yet answered, I see you and I did not forget you.

Python 130

Driving Data, Delivering Value: Data Leaders to Watch in 2023

Snowflake

The Chief Data Officer is arguably one of the most important roles at a company, particularly those that aspire to be data-driven. CDO appointments and the elevation of data leaders have accelerated in recent years, and the role has morphed as perceptions of data have evolved. Responsibilities span strategy and execution, people and processes, and the technology needed to deliver on the promise of data.

Data 111

Why You Should Simplify Your Data Infrastructure

Seattle Data Guy

“Good Design Is Easier to Change Than Bad Design” – The Pragmatic Programmer. Programming is just one aspect of the difficulties of tech work for data engineers. Creating simple yet robust systems that help manage your data infrastructure is equally important. This challenge of building a simple yet robust data infrastructure remains even with no-code/low-code solutions.

Data 130

20 Questions (with Answers) to Detect Fake Data Scientists: ChatGPT Edition, Part 1

KDnuggets

Can ChatGPT answer data science questions to the same standard as humans? Check out this attempt to do so, and compare the answers to those from experts.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Devpod: Improving Developer Productivity at Uber with Remote Development

Uber Engineering

In this blog, we share how we improved the daily edit-build-run developer experience using DevPods, Uber’s remote development environment. We cover the challenges, pain points, our architecture, and lastly the future of remote development at Uber.

Data Integrity Trends for 2023

Precisely

For most enterprises, 2022 was a year of transition, as companies struggled to figure out how to accomplish more with fewer resources. Technology helped to bridge the gap, as AI, machine learning, and data analytics drove smarter decisions, and automation paved the way for greater efficiency. Data integrity trends for 2023 have agility topping the list of success factors for most firms, as business leaders focus on rapid time to value and an emphasis on responding quickly to emerging opportunities

Functional Python, Part II: Dial M for Monoid

Tweag

Tweagers have an engineering mantra — Functional. Typed. Immutable. — that begets composable software which can be reasoned about and avails itself to static analysis. These are all “good things” for building robust software, which inevitably lead us to using languages such as Haskell, OCaml and Rust. However, it would be remiss of us to snub languages that don’t enforce the same disciplines, but are nonetheless popular choices in industry.

Python 102

SQL and Data Integration: ETL and ELT

KDnuggets

In this article, we will discuss use cases and methods for using ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes along with SQL to integrate data from various sources.

SQL 129

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Reducing Logging Cost by Two Orders of Magnitude using CLP

Uber Engineering

Uber’s Data team discusses how they used CLP to scale log ingestion, retention, and analytics for Petabytes of Spark logs, reducing log storage and management costs by 169x.

New! Diversity, equity, and inclusion analysis SpotApp helps businesses improve employee diversity

ThoughtSpot

Tech has a diversity problem. As a veteran People leader, I see and hear about it all the time: in media, in the board room, and in my daily work. And yet, as much as our industry is known for solving large-scale problems and disrupting the status quo, improvement in this area doesn’t seem to be happening fast enough. Why not? When I look at companies leading our industry in DEI, there’s one thing that stands out: data.

What’s New With SQL User-Defined Functions

databricks

Since their initial release, SQL user-defined functions have become hugely popular among both Databricks Runtime and Databricks SQL customers. This simple yet powerful.

SQL 92

Fast-track your next move with in-demand data skills

KDnuggets

DataCamp offers over 400 interactive courses, projects, and career tracks in the most popular data technologies such as Python, SQL, R, Power BI, and Tableau. Start today and save up to 67% on career-advancing learning.

BI 129

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Uber’s Next Gen Push Platform on gRPC

Uber Engineering

Uber’s API platform team talks about how they built their Next Generation Push Platform on gRPC which helped improve the reliability and latency of messages significantly.

98

Leveraging Snowflake to Enable Genomic Analytics at Scale

Snowflake

Genomic data, which is the DNA data of organisms, is essential to life sciences companies. For population studies, anonymized data sets can link long-term health histories with treatment patterns and genomic variations, making it possible to analyze effective approaches for subpopulations. In clinical trials and drug discovery, pharmaceutical research that combines patient health data, drug effectiveness, and genomic variations can improve outcomes and speed time to market.

Easy Ingestion to Lakehouse With COPY INTO

databricks

A new data management architecture known as the data lakehouse emerged independently across many organizations and use cases to support AI and BI.

BI 96

How to Use Python and Machine Learning to Predict Football Match Winners

KDnuggets

We will be learning web scraping and training supervised machine-learning algorithms to predict winning teams.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Deduping and Storing Images at Uber Eats

Uber Engineering

Our engineers discuss how we dedupe and store millions of product images at Uber Eats using a content-addressable caching layer, which saves millions of image downloads every hour and ensures that every image is only stored once.

DevTernity conference 2022 by Robat Williams

Scott Logic

Late last year I had the chance to attend DevTernity, an all-remote generalist software development conference. The first day was the main conference day, with the second (optional) day offering a choice of workshops by some of the speakers. It was a great conference. In this post I’ll cover some points of interest from some of the talks I chose to attend, and reflect on the remote conference experience.

New Built-in Functions for Databricks SQL

databricks

Built-in functions extend the power of SQL with specific transformations of values for common needs and use cases. For example, the LOG10 function.

SQL 94

Data Lakes and SQL: A Match Made in Data Heaven

KDnuggets

In this article, we will discuss the benefits of using SQL with a data lake and how it can help organizations unlock the full potential of their data.

Data Lake 108

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

How Uber Optimizes the Timing of Push Notifications using ML and Linear Programming

Uber Engineering

The Uber Eats team shares how they built a novel system with machine learning and linear programming to send the right message at the right time to its users.

The Insurance Industry is Ready for a lot More Change

Teradata

The dwindling personal auto insurance market is a harbinger of a lot more change to come. Find out more.

Language Models, Explained: How GPT and Other Models Work

AltexSoft

In 2020, a remarkable AI took Silicon Valley by storm. Dubbed GPT-3 and developed by OpenAI in San Francisco, it was the latest and strongest of its kind — a “large language model” capable of producing fluent text after having ingested billions of words from books, articles, and websites. According to the paper “Language Models are Few-Shot Learners” by OpenAI, GPT-3 was so advanced that many individuals had difficulty distinguishing between news stories generated by the model and those written

Top Posts January 9-15: Python Matplotlib Cheat Sheets

KDnuggets

Python Matplotlib Cheat Sheets • How to Select Rows and Columns in Pandas • 7 Best Platforms to Practice SQL • How to Perform Unit Testing in Python? • Google Data Analytics Certification Review.

Python 101

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.