Sat. May 14, 2022 - Fri. May 20, 2022

Setting up a local development environment for python data projects using Docker

Start Data Engineering

Contents: 1. Introduction; 2. Set up; 3. Reproducibility (3.1. Docker; 3.2. Docker Compose); 4. Developer ergonomics (4.1. Formatting and testing; 4.2. Makefile); 5. Conclusion; 6. Further reading; 7. References.

1. Introduction: Data systems usually involve multiple systems, which makes local development challenging.

Azure Data Factory: Stored Procedure Activity

Azure Data Engineering

When it comes to transforming structured data stored in a database (e.g., applying business logic or standardization), SQL is the most convenient and fit-for-purpose option. Stored procedures provide a way to store the transformation logic as a set of SQL statements that can be re-executed as pre-compiled code. The Stored Procedure Activity in Data Factory provides a simple and convenient way to execute stored procedures.

The Complete Collection of Data Science Books – Part 1

KDnuggets

Read the best books on Programming, Statistics, Data Engineering, Web Scraping, Data Analytics, Business Intelligence, Data Applications, Data Management, Big Data, and Cloud Architecture.

What’s New in Apache Kafka 3.2.0

Confluent

I’m proud to announce the release of Apache Kafka 3.2.0 on behalf of the Apache Kafka® community. The 3.2.0 release contains many new features and improvements. This blog will highlight […].

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Designing And Deploying IoT Analytics For Industrial Applications At Vopak

Data Engineering Podcast

Summary Industrial applications are one of the primary adopters of Internet of Things (IoT) technologies, with business critical operations being informed by data collected across a fleet of sensors. Vopak is a business that manages storage and distribution of a variety of liquids that are critical to the modern world, and they have recently launched a new platform to gain more utility from their industrial sensors.

Becoming AI-First: How to Get There

Cloudera

Deciding to adopt an AI-first strategy is the easy part. Figuring out how to implement it takes a little more effort. It requires a clear-eyed vision built around well-defined goals and a realistic execution plan. Being AI-first means setting up your organization for the future. By leveraging data, analytics, and automation, a company can gain a better understanding of where it is and where it needs to go.

More Trending

The Results Are in From The First Ever Data in Motion Report

Confluent

There’s an increasing need for businesses to act intelligently and in real time to win in today’s digital-first world. To achieve this, forward-thinking companies are modernizing their data infrastructure with […].

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

Data Engineering Podcast

Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds additional layers of difficulty. Srivatsan Sridharan has had the opportunity to design, build, and run data lake platforms for both Yelp and Robinhood, with many valuable lessons learned from each experience.

#ClouderaLife Spotlight: Margot Tien, Software Engineer

Cloudera

From fashion to data flow, in this #ClouderaLife Spotlight Margot talks about her career transition from fashion design to cloud computing and her co-founding of Cloudera’s Asian American and Pacific Islander community Employee Resource Group amid the racial tensions of 2021. It started with feeling stuck and ended with a brand-new career (BTW, lots of hard work in the middle).

Natural Language Processing Key Terms, Explained

KDnuggets

This post provides a concise overview of 18 natural language processing terms, intended as an entry point for the beginner looking for some orientation on the topic.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

New Strategies Needed to Manage Acute Part Shortages

Teradata

Faced with persistent supply chain disruption, automotive companies need a new approach to planning. Find out more.

Optimizing dbt Models with Redshift Configurations

dbt Developer Hub

If you're reading this article, it looks like you're wondering how you can better optimize your Redshift queries - and you're probably wondering how you can do that in conjunction with dbt. In order to properly optimize, we need to understand why we might be seeing issues with our performance and how we can fix these with dbt sort and dist configurations.

Stream Processing vs. Batch Processing: What to Know

Confluent

With more data being produced in real time by many systems and devices than ever before, it is critical to be able to process it in real time and get […].

How to Manage Your Complex IT Landscape with AIOps

KDnuggets

Complete guide and blog post series on IT Operations Management with AIOps. Using AI and Machine Learning to manage IT complexity to deliver world class IT service while keeping the lights on.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

What Does a Data Engineer do? A 2023 Guide with Tops Skills

Emeritus

With businesses increasingly relying on data for their day-to-day operations, the role of a data engineer has emerged as one of the most sought-after professions in the industry. But what does a data engineer do exactly? And why is it in demand? According to McKinsey, by 2025, smart workflows and seamless interactions between humans and…

Data Engineering Annotated Monthly – April 2022

Big Data Tools

Long time no see! Sorry about the silence, but luckily we’re back. Hi, I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering space and highlight new ideas from the wider community. If you think I missed something worthwhile, catch me on Twitter and suggest a topic, link, or anything else you want to see.

SQL and Complex Queries Are Needed for Real-Time Analytics

Rockset

This is the fourth post in a series by Rockset's CTO Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! Posts published so far in the series: Why Mutability Is Essential for Real-Time Data Analytics; Handling Out-of-Order Data in Real-Time Analytics Applications; Handling Bursty Traffic in Real-Time Analytics Applications; SQL and Complex Queries

Operationalizing Machine Learning from PoC to Production

KDnuggets

Most companies haven’t seen ROI from machine learning since the benefit is only realized when the models are in production. Here’s how to make sure your ML project works.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

What Does a Data Engineer Do and How Can You Become One?

Emeritus

Data is so ubiquitous and valuable that it is touted as the new currency. From data analytics to data engineering, everything is data-centric. As Carly Fiorina, the former Chief Executive Officer of Hewlett Packard, said, “The goal is to turn data into information, and information into insight.” Data allows leaders to make informed decisions that…

The Ultimate Guide To Data Lineage

Monte Carlo

Data lineage isn’t new, but automation has finally made it accessible and scalable—to a certain extent. In the old days (way back in the mid-2010s), lineage happened through a lot of manual work. This involved identifying data assets, tracking them to their ingestion sources, documenting those sources, mapping the path of data as it moved through various pipelines and stages of transformation, and pinpointing where the data was served up in dashboards and reports.

A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability

KDnuggets

We give a taxonomy of the trustworthy GNNs in privacy, robustness, fairness, and explainability. For each aspect, we categorize existing works into various categories, give general frameworks in each category, and more.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Stakeholder-friendly model names: Model naming conventions that give context

dbt Developer Hub

Analytics engineers (AEs) are constantly navigating through the names of the models in their project, so naming is important for maintainability in your project in the way you access it and work within it. By default, dbt will use your model file name as the view or table name in the database. But this means the name has a life outside of dbt and supports the many end users who will potentially never know about dbt and where this data came from, but still access the database objects in the datab

Scala 3: General Type Projections

Rock the JVM

Scala's general type projections are considered unsound and were removed in Scala 3: discover what this means and how it affects your code

Data Stewards Have The Worst Seat At The Table

Monte Carlo

In his seminal 2017 blog post, The Downfall of the Data Engineer, Maxime Beauchemin wrote that the data engineer had the worst seat at the table. Data technology and teams have changed tremendously since that time, and now the Preset CEO and creator of Apache Airflow and Apache Superset has a brighter outlook on the future of the profession. I have also seen what was once a thankless position turn into a strategic driver of company value as data expanded beyond dashboards to machine learning mo

6 Soft Skills for Data Scientists Working Remotely

KDnuggets

As a data scientist, you might have a great portfolio of technical skills, but if you can’t communicate effectively, you won’t be able to convey your ideas clearly during virtual meetings.

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

EXTRACT SQL function: Why we love it

dbt Developer Hub

There are so many different date functions in SQL: you have DATEDIFF, DATEADD, DATE_PART, and DATE_TRUNC, to name a few. They all have their different use cases, and understanding how and when they should be used is a SQL fundamental to get down. Are any of those as easy to use as the EXTRACT function? Well, that debate is for another time… In this post, we’re going to give a deep dive into the EXTRACT function, how it works, and why we use it.
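
For readers who have not used it, EXTRACT pulls a single date part (such as the year or month) out of a date or timestamp. The Python sketch below only mimics that idea with the standard library; the `extract` helper, the set of supported parts, and the sample timestamp are hypothetical illustrations, not taken from the dbt post, and real SQL dialects differ in which parts they accept.

```python
from datetime import datetime

# Hypothetical mapping from a date-part name to the way it is read off a
# datetime; this loosely mirrors the idea of SQL's EXTRACT(part FROM ts).
_PARTS = {
    "year": lambda ts: ts.year,
    "quarter": lambda ts: (ts.month - 1) // 3 + 1,
    "month": lambda ts: ts.month,
    "day": lambda ts: ts.day,
    "hour": lambda ts: ts.hour,
    "minute": lambda ts: ts.minute,
}


def extract(part: str, ts: datetime) -> int:
    """Return one date part of ts as an integer (illustrative helper)."""
    try:
        return _PARTS[part.lower()](ts)
    except KeyError:
        raise ValueError(f"unsupported date part: {part!r}") from None


# Assumed sample value, chosen from the week this digest covers.
order_ts = datetime(2022, 5, 17, 14, 30)
print(extract("year", order_ts))     # 2022
print(extract("quarter", order_ts))  # 2
print(extract("month", order_ts))    # 5
```

In SQL the equivalent call would look roughly like `EXTRACT(month FROM order_ts)`, where `order_ts` is again an assumed column name.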

The 6 Python Machine Learning Tools Every Data Scientist Should Know About

KDnuggets

Let's look at six must-have tools every data scientist should use.

HuggingFace Has Launched a Free Deep Reinforcement Learning Course

KDnuggets

Hugging Face has released a free course on Deep RL. It is self-paced and shares a lot of pointers on theory, tutorials, and hands-on guides.

Reinforcement Learning for Newbies

KDnuggets

A simple guide to reinforcement learning for a complete beginner. The blog includes definitions with examples, real-life applications, key concepts, and various types of learning resources.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.