Sat. May 20, 2023 - Fri. May 26, 2023

7 Data Engineering Projects To Put On Your Resume

Seattle Data Guy

Starting new data engineering projects can be challenging. Data engineers can get stuck finding the right data for their project or picking the right tools. Many of my YouTube followers agree: in a recent poll, they confirmed that starting a new data engineering project was difficult. Here were the key…

Layoffs push down scores on Glassdoor: this is how companies respond

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and high-growth startups through the lens of engineering managers and senior engineers. In this issue, we cover one out of six topics from today’s subscriber-only The Scoop issue. To get full articles twice a week, subscribe here.

Conversation with Sumeet, Software Engineer at Natwest Group

Analytics Vidhya

Join us in this interview as Sumeet shares his background and his journey from Data Scientist to software engineer, and describes the captivating aspects of his current job. He provides insights into the future of data science and software engineering and offers valuable advice for career transitioners. Let’s dive into our conversation with […]

Keep Your Data Lake Fresh With Real Time Streams Using Estuary

Data Engineering Podcast

Batch vs. streaming is a long-running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batched workloads, but the reverse isn't true. The batch world has been the default for years because of the complexities of running a reliable streaming system at scale.

Data Lake 162

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

GPT-4 is Vulnerable to Prompt Injection Attacks on Causing Misinformation

KDnuggets

ChatGPT might have a loophole that lets it provide unreliable facts.

160

Neeva Acquired by Snowflake

Snowflake

144

More Trending

Data Modeling - The Unsung Hero of Data Engineering: Architecture Pattern, Tools and the Future (Part 3)

Simon Späti

Welcome to the third and final installment of our series “Data Modeling: The Unsung Hero of Data Engineering.” If you’ve journeyed with us from Part 1, where we dove into the importance and history of data modeling, or joined us in Part 2 to explore various approaches and techniques, I’m delighted you’ve stuck around. In this third part, we’ll delve into data architecture patterns and their influence on data modeling.

AI is Eating Data Science

KDnuggets

When it's all said and done, and AI has been universally recognized as our rightful overlords, the idea of data science as a standalone field will have been but a blip on our collective radar.

What's new in Apache Spark 3.4.0 - Structured Streaming and correctness issue

Waitingforcode

Apache Spark is infamous for its correctness issue with chained stateful operations. Fortunately, things improve with each release. The most recent one, 3.4.0, also brought some important changes in that area!

IT 130

Functional Python, Part III: The Ghost in the Machine

Tweag

Tweagers have an engineering mantra — Functional. Typed. Immutable. — that begets composable software which can be reasoned about and avails itself to static analysis. These are all “good things” for building robust software, which inevitably lead us to using languages such as Haskell, OCaml and Rust. However, it would be remiss of us to snub languages that don’t enforce the same disciplines, but are nonetheless popular choices in industry.

Python 113

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

The Future of AI: Exploring the Next Generation of Generative Models

KDnuggets

What is generative AI currently capable of, and what challenges must it overcome to reach the next wave of generative AI models?

IT 145

ArcGIS and Apache Log4j Vulnerabilities

ArcGIS

Esri's updated statement regarding Log4j vulnerabilities (Log4Shell) and ArcGIS products

113

Model Risk Management, a true accelerator to corporate AI

databricks

Special thanks to EY's Mario Schlener, Wissem Bouraoui and Tarek Elguebaly for their support throughout this journey and their contributions to this blog.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Data Freshness Explained: Making Data Consumers Wildly Happy

Monte Carlo

What is data freshness and why is it important? Data freshness, sometimes referred to as data timeliness, is the frequency with which data is updated for consumption. It is an important dimension of data quality and a pillar of data observability because recently refreshed data is more accurate, and thus more valuable. Since it is impractical and expensive to have all data refreshed on a near real-time basis, data engineers ingest and process most analytical data in batches with pipelines designed

A Deep Dive into GPT Models: Evolution & Performance Comparison

KDnuggets

The blog focuses on GPT models, providing an in-depth understanding and analysis. It explains the three main components of GPT models: generative, pre-trained, and transformers.

IT 132

Discover Your Data’s Depth: Applications of ArcGIS Bathymetry Webinar

ArcGIS

Discover the power of ArcGIS Bathymetry in our upcoming webinar on June 20th. Learn how this advanced tool can empower your organization.

104

Driving a Large Language Model Revolution in Customer Service and Support

databricks

Want to build your own LLM-enabled bot? Download our end-to-end solution accelerator here. Business leaders are universally excited for the potential of large.

Building 105

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Representation online matters: practical end-to-end diversification in search and recommender systems

Pinterest Engineering

Bhawna Juneja | Senior Machine Learning Engineer; Pedro Silva | Senior Machine Learning Engineer; Shloka Desai | Machine Learning Engineer II; Ashudeep Singh | Machine Learning Engineer II; Nadia Fawaz | (former) Inclusive AI Tech Lead

Pinterest is a platform designed to bring everyone the inspiration to create a life they love.

Free ChatGPT Course: Use The OpenAI API to Code 5 Projects

KDnuggets

With all the buzz surrounding ChatGPT, are you eager to make the most of it? Here is a FREE video course that offers a comprehensive education on the OpenAI API through detailed explanations and hands-on projects.

Project 132

A suite of sample geoprocessing tools for managing hyperlinks

ArcGIS

Learn more about a suite of sample data management tools to enable, add, remove or disable media hyperlinks to feature classes in geodatabases.

Asian Employee Network: Celebrating the Expansive Asian Culture

databricks

The Asian Employee Network (AEN) launched two years ago, during Lunar New Year 2021. AEN was created with the objective of building a.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

How to mask PII data with FPE using Azure Synapse

Towards Data Science

Learn to do Format Preserving Encryption (FPE) at scale and securely move data from production to test environments.

Introducing MPT-7B: A New Open-Source LLM

KDnuggets

An LLM Trained on 1T Tokens of Text and Code by MosaicML Foundation Series.

Coding 112

Porting ArcGIS Desktop Schematic Diagrams to ArcGIS Pro Network Diagrams

ArcGIS

Learn how to port schematic diagrams created with ArcGIS Schematics to network diagrams from utility or trace networks using ArcGIS Pro

The Executive’s Guide to Data, Analytics and AI Transformation, Part 5: Make informed build vs. buy decisions

databricks

A key piece of your data and AI transformation strategy will involve the decision around which components of the data ecosystem are built.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Writing design docs for data pipelines

Towards Data Science

Exploring the what, why, and how of design docs for data components  —  and why they matter.

12 VSCode Tips and Tricks for Python Development

KDnuggets

Simple tips on doing less and achieving more from VSCode.

Python 112

Top 5 Marketing Trends from a Chief Marketing Officer

Precisely

Author’s note: this article about marketing trends has been adapted from an article originally published in The CMO. What are your goals in 2023, and which marketing trends can help you achieve them? In my role as Chief Marketing Officer (CMO) here at Precisely, an important part of what I do is to keep a finger on the pulse of the latest marketing innovations and strategize with my team around how we may be able to capitalize on industry trends to produce even bigger and better results.

Announcing the Public Preview of Azure Databricks support for Azure confidential computing

databricks

We are excited to announce Azure Databricks support for Azure confidential computing (ACC) in preview! With this announcement, customers can run their Azure.

98

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m