Thu.Jun 20, 2024

What I’ve Learned After A Decade Of Data Engineering

Confessions of a Data Guy

After 10 years of Data Engineering work, I think it’s time to hang up the proverbial hat and ride off into the sunset, never to be seen again. I wish. Everything has changed in 10 years, yet nothing has changed in 10 years. How is that even possible? Sometimes I wonder if I’ve learned anything […]

Deploying Machine Learning Models: A Step-by-Step Tutorial

KDnuggets

Model deployment is the process of integrating trained models into practical applications. This includes defining the necessary environment, specifying how input data is fed into the model and how its output is produced, and enabling the model to analyze new data and provide relevant predictions or categorizations.
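
The excerpt names three deployment concerns: the environment, the input contract and the output contract. The sketch below shows the last two. It assumes a stand-in linear "trained model" with made-up weights and a hypothetical `ModelService` wrapper, so it is not the tutorial's code. The wrapper rejects requests that lack a required feature and always returns the same output shape.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class LinearModel:
    """Stand-in for a trained model; weights are assumed to have been learned offline."""
    weights: Dict[str, float]
    bias: float

    def predict(self, features: Mapping[str, float]) -> float:
        return self.bias + sum(w * features[name] for name, w in self.weights.items())


class ModelService:
    """Hypothetical deployment wrapper that fixes the input and output contracts."""

    def __init__(self, model: LinearModel, threshold: float = 0.5) -> None:
        self.model = model
        self.threshold = threshold

    def handle(self, payload: Mapping[str, object]) -> dict:
        # Input contract: every feature the model was trained on must be present.
        missing = sorted(set(self.model.weights).difference(payload))
        if missing:
            raise ValueError(f"missing features: {missing}")
        features = {name: float(payload[name]) for name in self.model.weights}
        score = self.model.predict(features)
        # Output contract: a raw score plus a categorical label.
        return {"score": score, "label": "positive" if score >= self.threshold else "negative"}


svc = ModelService(LinearModel(weights={"x": 2.0, "y": -1.0}, bias=0.1))
print(svc.handle({"x": 1, "y": 1}))
```

In a real deployment this wrapper would typically sit behind an HTTP endpoint, and the environment would be pinned in a container image.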

The Future of Telecoms: Embracing Gen AI as a Strategic Competitive Advantage

Snowflake

The telecom industry is undergoing an unprecedented transformation. Fueled by tech advancements such as 5G, cloud computing, Internet of Things (IoT) and machine learning (ML), telecoms have the opportunity to reshape and streamline operations and make significant improvements in service delivery, customer experience and network optimization. Key to these technologies is generative AI (gen AI), a dynamic form of artificial intelligence that leverages vast amounts of data to analyze and produce r

Creating AI-Driven Solutions: Understanding Large Language Models

KDnuggets

Understanding LLMs is pivotal in unlocking the full potential of AI-driven solutions across various domains. As we navigate the process of building AI-driven solutions, it is essential to approach the development and deployment of LLMs with a focus on responsible AI practices.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Databricks Named a Leader in 2024 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms

databricks

We are excited to announce that Gartner has recognized Databricks as a Leader in the 2024 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms.

The Only Course You Need to Smash Your Data Analyst Career

KDnuggets

Stop roaming the internet trying to find the perfect data analyst course and read this!

A Recap of the Data Engineering Open Forum at Netflix

Netflix Tech

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024. At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale. Netflix is not the only place where data engineers are solving challenging problems with creative solutions.

It’s Not Just About AI: Does Your Data Strategy Match Your Ambition?

Snowflake

Recent Snowflake workshops and roundtables have started with the question: “Does your data strategy match your AI ambition?” It certainly sparks customer engagement, but is that the right question to ask? Right now, it seems appropriate with all of the interest — dare I say “hype” — around AI. But it merely reflects the current darling of the tech world, focusing on the technology itself, rather than the ultimate goal.

Redefining Hosting: A Customer-Driven Journey to Better Deployments

Monte Carlo

No two companies are ever quite the same. Some teams have more security needs. Other teams are concerned about costs or administration requirements. So, when it comes to how organizations choose to deploy new software, there’s never a one-size-fits-all approach. That’s particularly true when you’re working with a customer resource as critical as data.

The Best AWS Glue Tutorial: 3 Major Aspects

Hevo

ETL (Extract, Transform, and Load) is a major topic across the IT industry. Organizations often look for an easy way to run ETL on their data without spending much effort on coding. If you’re also looking for such a solution, then you’ve landed in the right place.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Failing to Auto Scale Elasticsearch in Kubernetes

Zalando Engineering

In Lounge by Zalando, we run an Elasticsearch cluster in Kubernetes to store user-facing article descriptions. Our business model is such that we receive about three times the normal load during the busy morning hour, so we use schedules to automatically scale applications out and in to handle that peak. If scaling out in the morning fails, we face a potential catastrophe.

Setting up Redshift Data Lake Export: Made Easy 101

Hevo

AWS (Amazon Web Services) is one of the leading cloud service providers. It offers data storage services such as Amazon Redshift and Amazon S3, among many others. Extract, Transform, and Load (ETL) are three important steps performed in data warehousing and databases.

Data Quality Anomaly Detection: Everything You Need to Know

Monte Carlo

I bet you’re tired of hearing it at this point: garbage in, garbage out. It’s the mantra for data teams, and it underlines the importance of data quality anomaly detection for any organization. The quality of the input affects the quality of the output – and in order for data teams to produce high-quality data products, they need high-quality data from the very start.
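
The excerpt does not say which detection method it has in mind. The sketch below shows one common baseline. It flags a metric value that lies more than a threshold number of standard deviations from its trailing-window mean. The metric (daily row counts), the 7-day window and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=7, z_threshold=3.0):
    """Return indices whose value deviates from the trailing window by > z_threshold std devs."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily row counts for one table; day 8 drops sharply.
row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015, 1008, 120, 1002]
print(find_anomalies(row_counts))  # [8]
```

Day 9 is not flagged because the outlier inflates the trailing standard deviation. Production tools usually guard against this with robust statistics or seasonality-aware models.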

Oracle Streams CDC: Detailed Guide

Hevo

The purpose of this post is to introduce you to Oracle Streams concepts. You will learn about the various components that make up the Oracle Streams technology. Towards the end, you will find practical examples detailing how to implement Oracle Streams CDC in a production environment.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

8 Powerful Benefits of Change Data Capture

Hevo

As data grows at a massive scale, industries are adopting new ways to manage data effectively. One of the most popular techniques is change data capture (CDC), which enables organizations to capture changes made to data sources.
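
To make "capturing changes" concrete, the sketch below uses the simplest form of CDC: compare two keyed snapshots of a table and emit insert, update and delete records. This is a swapped-in illustration. Log-based tools such as Oracle Streams or Debezium read the database's transaction log instead of diffing snapshots. The `diff_snapshots` name and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, Any, Optional[Row]]  # (operation, primary key, new row or None)


def diff_snapshots(old: Dict[Any, Row], new: Dict[Any, Row]) -> List[Change]:
    """Compare two snapshots keyed by primary key and emit change records."""
    changes: List[Change] = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    return changes


before = {1: {"name": "Ada", "plan": "free"}, 2: {"name": "Bo", "plan": "pro"}}
after = {1: {"name": "Ada", "plan": "pro"}, 3: {"name": "Cy", "plan": "free"}}
for change in diff_snapshots(before, after):
    print(change)
```

Snapshot diffing must read the whole table each time and misses intermediate states between snapshots. That cost is the usual motivation for log-based CDC.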

Snowflake Security & Sharing Best Practices

Hevo

Businesses today are overflowing with data and depend heavily on big data platforms that support digital transformation, streamlining the flow of data for real-time insights and better decision-making. This article will take you through some of the important aspects of Snowflake security and sharing practices.

Best Snowflake Performance Tuning Tactics

Hevo

In recent years, businesses worldwide have scaled up their data collection operations, giving rise to the term ‘Big Data.’ Today, companies collect information from various sources, including business transactions, industrial equipment, social media, and more. Accordingly, these organizations need an efficient way of storing and analyzing this information.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Cloud Data Ingestion Simplified 101

Hevo

The surge in Big Data and cloud computing has created a huge demand for real-time data analytics. Companies rely on complex ETL (Extract, Transform, and Load) pipelines that collect data from sources in raw form and deliver it to a storage destination in a form suitable for analysis.

Data Ingestion Azure Data Factory Simplified 101

Hevo

As data collection within organizations proliferates rapidly, developers are automating data movement through Data Ingestion techniques. However, implementing complex Data Ingestion techniques can be tedious and time-consuming for developers.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

How to Set Up Amazon Redshift ODBC Driver Connection

Hevo

Are you trying to set up an Amazon Redshift ODBC Driver connection? Have you searched all over the internet for how to do it? If so, this blog will answer all your questions. ODBC (Open Database Connectivity) is a standard interface, originally developed by Microsoft, that you can use to connect your application to a database.

Debezium Testing for CDC using Test Containers: 3 Easy Steps

Hevo

Debezium is a distributed, open-source platform for tracking real-time changes in databases. It is called an event streaming platform because it converts data changes in databases into events, which different applications can then consume to process the information further.
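
A Debezium change event's value carries a `before` row image, an `after` row image and an `op` code: `c` is create, `u` is update, `d` is delete and `r` is a snapshot read. The sketch below is a minimal consumer that applies such events to an in-memory table. It is not Debezium's own API. The `apply_change_event` helper, the `id` key field and the sample events are hypothetical. The helper accepts messages with or without the `schema`/`payload` wrapper that the JSON converter adds when schemas are enabled, and it skips the `null` tombstone that can follow a delete.

```python
import json
from typing import Any, Dict


def apply_change_event(table: Dict[Any, dict], event_json: str, key_field: str = "id") -> None:
    """Apply one Debezium-style change event (before/after/op) to an in-memory table."""
    message = json.loads(event_json)
    if message is None:
        return  # tombstone record: nothing to apply
    payload = message.get("payload", message)  # unwrap if schemas are enabled
    op = payload["op"]
    if op in ("c", "r", "u"):
        row = payload["after"]
        table[row[key_field]] = row
    elif op == "d":
        table.pop(payload["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op}")


events = [
    '{"payload": {"op": "c", "before": null, "after": {"id": 1, "email": "a@example.com"}}}',
    '{"payload": {"op": "u", "before": {"id": 1, "email": "a@example.com"}, "after": {"id": 1, "email": "b@example.com"}}}',
    '{"op": "c", "before": null, "after": {"id": 2, "email": "c@example.com"}}',
    '{"payload": {"op": "d", "before": {"id": 2, "email": "c@example.com"}, "after": null}}',
    "null",
]
table: Dict[Any, dict] = {}
for event in events:
    apply_change_event(table, event)
print(table)
```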

Redshift Incremental Load: 2 Easy Methods

Hevo

Data loading is a recurring challenge for organizations all over the world. While several platforms make this task easier, data loading issues still surface regularly. Amazon Redshift is a popular choice for loading data in an organized manner. In this blog, you will learn how to perform a Redshift incremental load.

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

A Guide to Snowpark in Snowflake [+4 Tips to Get the Most Value from Snowpark]

Hevo

Traditionally, when working with Spark workloads, you would have to run separate processing clusters for different languages. Capacity management and resource sizing are also a hassle. Snowflake addressed these problems by providing native support for different languages. With consistent security, governance policies, and simplified capacity management, Snowflake pulls ahead as a great alternative to Spark.

Understanding Google BigQuery ML: Simplified 101

Hevo

In this article, you will learn about Google BigQuery ML and its features. You will also read about the different machine learning models it supports. BigQuery ML is a feature of Google BigQuery that lets you create and run machine learning models using SQL queries.

dbt Redshift: Set Up & 3 Best Use Cases Explained

Hevo

ChatGPT has transformed the way businesses look at AI to support their functions, automating customer support and improving customer experience. dbt (data build tool) brings a similar shift to data transformation: you can create your own transformations with dbt using SQL SELECT statements.
