Sat. Apr 01, 2023 - Fri. Apr 07, 2023

Data Engineering for Streaming Data on GCP

Analytics Vidhya

Companies can access a large pool of data in the modern business environment, and using this data in real time may produce insights that spur corporate success. Real-time dashboards, such as those built on GCP, provide strong data visualization and actionable information for decision-makers. Nevertheless, setting up a streaming data pipeline to power such dashboards may […]

Behind the Scenes with Two New Salary Transparency Websites

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive into Figma’s engineering culture. To get full newsletters twice a week, subscribe here.

The Future of Work: How AI is Changing the Job Landscape

KDnuggets

With more and more companies integrating artificial intelligence into the workplace, what does this mean for employees' futures and careers?

Build faster with Buck2: Our open source build system

Engineering at Meta

Buck2, our new open source, large-scale build system, is now publicly available via the Buck2 website and the Buck2 GitHub repository. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In our internal tests at Meta, we observed that Buck2 completed builds 2x as fast as Buck1.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide to debugging Airflow DAGs, with best practices and examples. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs; Identify common issues with DAGs, tasks, and connections; Distinguish between Airflow-relate

Table file formats - Z-Order compaction: Apache Iceberg

Waitingforcode

Last time you discovered the Z-Order compaction in Delta Lake. But guess what? Apache Iceberg also has this feature!

QuickSort in Rust!

Confessions of a Data Guy

The post QuickSort in Rust! appeared first on Confessions of a Data Guy.

More Trending

Data Modeling – The Unsung Hero of Data Engineering: An Introduction to Data Modeling (Part 1)

Simon Späti

Amidst the excitement and hype surrounding artificial intelligence, the significance of data engineering and its critical foundation—data modeling—can often be overlooked. This article is the first in a three-part series that will shine a spotlight on the fascinating world of data modeling, delving into its crucial importance within the broader context of data engineering.

Mapping The Data Infrastructure Landscape As A Venture Capitalist

Data Engineering Podcast

Summary: The data ecosystem has been building momentum for several years now. As a venture capital investor, Matt Turck has been trying to keep track of the main trends and has compiled his findings into the MAD (ML, AI, and Data) landscape reports each year. In this episode he shares his experiences building those reports and the perspective he has gained from the exercise.

Conda Init and ArcGIS Pro

ArcGIS

We're happy to announce the conda init command is now enabled for ArcGIS users of Python! Learn about how to use it, how it works, and benefits.

Exploring Data Cleaning Techniques With Python

KDnuggets

Tutorial on data cleaning techniques using Python.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Inside Look: Measuring Developer Productivity and Happiness at LinkedIn

LinkedIn Engineering

Authors: Viktoras Truchanovicius and Selina Zhang. At LinkedIn, developer productivity and happiness have always been a priority. It is critical for our engineering leaders to understand how efficiently and effectively their teams are operating to continuously deliver value-added features for our members and build an industry-leading engineering culture.

Loading IFC files into the ArcGIS Indoors Model

ArcGIS

Organizations with IFC files can still reap the benefits of an ArcGIS Indoors deployment by following these recommendations.

RAPIDS cuDF to Speed up Your Next Data Science Workflow

KDnuggets

This article will explain how RAPIDS can help you speed up your next data science workflow. RAPIDS cuDF is a GPU DataFrame library that allows you to produce your end-to-end data science pipeline development all on GPU.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Snowflake Startup Challenge 2023: Meet the 10 Semi-Finalists

Snowflake

Spring has sprung—and with it comes a new crop of Snowflake Startup Challenge semi-finalists! The 2023 submission pool was the largest to date—twice as many submissions as last year—with entries that spanned not just the globe but the breadth of the Snowflake platform. Our judges put a lot of careful consideration into selecting the top 10, and we offer our sincere thanks to every company that sent in an entry this year—we know how much hard work goes into these submissions, and we appreciate it

Introducing Entity-Centric Data Modeling for Analytics

Preset

Entity-centric modeling is a data modeling approach that focuses on enriching tabular datasets with useful "features" to make segmentation, cohort creation, and complex classification analyses easier.

Build, Analyze, and Filter Catalog Layers in ArcGIS Pro

ArcGIS

ArcGIS Pro 3.1 introduces a new layer type—catalog layers—and this blog covers how they could be used in your analytic workflows.

8 Open-Source Alternative to ChatGPT and Bard

KDnuggets

Discover the widely used open-source frameworks and models for creating your ChatGPT-like chatbots, integrating LLMs, or launching your AI product.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

The BEST Resources to Level Up Your Data Streaming Knowledge!

Confluent

All the best data streaming resources, tips, and guides to help you learn introductory concepts, streaming architecture basics, common tools and technologies, and more.

Databricks for GxP

databricks

What is GxP? GxP stands for "Good x Practices," where x refers to a specific discipline, such as clinical, manufacturing, or laboratory. The.

Data Observability for Analytics and ML teams

Towards Data Science

Principles, practices, and examples for ensuring high quality data flows. Nearly 100% of companies today rely on data to power business opportunities and 76% use data as an integral part of forming a business strategy. In today’s age of digital business, an increasing number of decisions companies make when it comes to delivering customer experience, building trust, and shaping their business strategy begin with accurate data.

My Data Science Six Months Success Story

KDnuggets

I will be sharing a couple of things I have learned in the past six months and tips that helped me stay dedicated and true to my journey in this article.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; Write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Uniting the Machine Learning and Data Streaming Ecosystems - Part 2

Confluent

Machine learning and data streaming are a perfect match, but have diverging tech stacks. How can we overcome the pitfalls of SQL and the gulf between languages?

Exciting new updates coming to Workflows in April

databricks

Databricks is excited to announce the release of several exciting new Workflows features that will simplify the way you create and launch automated.

A Gentle Introduction to Analytical Stream Processing

Towards Data Science

Building a Mental Model for Engineers and Anyone in Between. In many cases, processing data in-stream, or as it becomes available, can help reduce an enormous data problem (due to the volume and scale of the flow of data) into a more manageable one.

Text Summarization Development: A Python Tutorial with GPT-3.5

KDnuggets

Using the power of GPT-3.5 to develop a simple summary generator application.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Python Monorepo: an Example. Part 1: Structure and Tooling

Tweag

For a software team to be successful, you need excellent communication. That is why we want to build systems that foster cross-team communication. Using a monorepo is an excellent way to do that. A monorepo provides: Visibility: by seeing the pull requests (PRs) of colleagues, you are easily informed of what other teams are doing. Uniformity: by working in one central repository, it is easier to share the configuration of linters, formatters, etc.

Preview the New Workspace Browser

databricks

To simplify navigating in Databricks, we are releasing a new workspace browsing experience. The new Workspace Browser makes it easier for you to.

Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue)

Towards Data Science

Learning a little about these tools and how to integrate them. A few weeks ago, while doing my mental stretch to think about new post ideas, I thought: Well, I need to learn (and talk) more about cloud and these things, I’ve practiced a lot in on-premise environments, using open-source tools, and running away from proprietary solutions… But the world is cloud and I don’t think that this is gonna change any time soon… I then wrote a post about creati

5 Essential AI Tools for Data Science

KDnuggets

Learn how Bard, Bing, ChatGPT, GitHub Copilot, and Hugging Face are improving data scientists' work life.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m