October, 2022

Build Data Engineering Projects, with Free Template

Start Data Engineering

1. Introduction 2. Data project template 2.1. Prerequisites 2.2. Setup infra 2.3. Tear down infra 3. Set up data infrastructure 3.1. Run data infra on your laptop with containers 3.2. Manage cloud infrastructure with code 4. Set up development workflow 4.1. CI: Automated tests & checks before the merge with GitHub Actions 4.2. CD: Deploy to production servers with GitHub Actions 4.3.

Project 148

Expanding The Reach of Business Intelligence Through Ubiquitous Embedded Analytics With Sisense

Data Engineering Podcast

Summary Business intelligence has grown beyond its initial manifestation as dashboards and reports. In its current incarnation it has become a ubiquitous need for analytics and opportunities to answer questions with data. In this episode Amir Orad discusses the Sisense platform and how it facilitates the embedding of analytics and data insights in every aspect of organizational and end-user experiences.

The Big Tech Hiring Slowdown Is Here and it will Hurt

The Pragmatic Engineer

This issue was written in Oct 2022, sent out to all subscribers of The Pragmatic Engineer Newsletter in October 2022. The observations on how Big Tech hiring will slow down have since been validated, with Meta not only laying off in November, but also rescinding offers in January 2023, and Amazon doing the same. If you want to get the pulse of the industry in your inbox, subscribe.

IT 130

Rust for Data Engineering

Simon Späti

Will Rust kill Python for Data Engineers? If you only came here to know this, my answer is no. Betteridge’s Law strikes again! But then again, you have to ask: was Python made for Data Engineering in the first place? Rust may not replace Python outright, but it has consumed more and more of JavaScript tooling and there are increasingly many projects trying to do the same with Python/Data Engineering.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Independent Anniversary

Jesse Anderson

I have a calendar reminder that tells me when I founded Big Data Institute. It just told me I founded the company eight years ago. The reminder is called “Independent Anniversary.” It’s the day I split off and executed my vision for an independent, big data consulting company. Independence has all sorts of manifestations. For you, it’s an independent look at technology and vendors from someone who’s worked at a vendor (Cloudera) and worked in distributed systems for even longer.

Easy Guide To Data Preprocessing In Python

KDnuggets

Preprocessing data for machine learning models is a core general skill for any Data Scientist or Machine Learning Engineer. Follow this guide using Pandas and Scikit-learn to improve your techniques and make sure your data leads to the best possible outcome.

Python 160

More Trending

How To Bring Agile Practices To Your Data Projects

Data Engineering Podcast

Summary Agile methodologies have been adopted by a majority of teams for building software applications. Applying those same practices to data can prove challenging due to the number of systems that need to be included to implement a complete feature. In this episode Shane Gibson shares practical advice and insights from his years of experience as a consultant and engineer working in data about how to adopt agile principles in your data work so that you can move faster and provide more value to

Project 130

Pollen’s enormous debt left behind: exclusive details

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe. Pollen, the events festival tech startup, went bankrupt in August after raising more than $200M in venture funding. In an exclusive investigative article, I covered the events and details leading up to this bankruptcy.

Banking 130

The Art and Science of Data Storytelling with Brent Dykes

Jesse Anderson

My guest this week is Brent Dykes, Founder and Chief Data Storyteller at Analytics Hero. Before he founded his own company, he was at Omniture, Adobe, and Domo. Analytics Hero is a consulting business based around data storytelling. Data storytelling was a new concept to me. Brent defines it as “a structured approach for communicating insights to a targeted audience using narrative elements and explanatory visuals.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Frameworks for Approaching the Machine Learning Process

KDnuggets

This post is a summary of 2 distinct frameworks for approaching machine learning tasks, followed by a distilled third. Do they differ considerably (or at all) from each other, or from other such processes available?

ClearScape Analytics: Delivering Value Across the Modern Enterprise

Teradata

ClearScape Analytics provides robust functionality giving people across the organization the ability to efficiently execute their roles in the analytics process on a common platform.

Process 105

Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

Data Engineering Podcast

Summary One of the most impactful technologies for data analytics in recent years has been dbt. It’s hard to have a conversation about data engineering or analysis without mentioning it. Despite its widespread adoption there are still rough edges in its workflow that cause friction for data analysts. To help simplify the adoption and management of dbt projects Nandam Karthik helped create Optimus.

Is the strategy of joining late-stage startups for the financial upside, a dead end?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of six topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe. Between 2010 and 2021, one of the best strategies for maximizing your total compensation as a software engineer was to follow this recipe: Identify late-stage, fast-growing, private companies which seemed close to going public.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Introducing Stream Designer: The Visual Builder for Streaming Data Pipelines

Confluent

Confluent’s new Stream Designer is the industry’s first visual interface for rapidly building, testing, and deploying streaming data pipelines natively on Apache Kafka.

#ClouderaLife Spotlight: Elias Avila, Sr. Staff Proactive Support Engineer

Cloudera

As we wrap up Hispanic Heritage Month, this #ClouderaLife Spotlight features Elias Avila, senior staff proactive support engineer for Cloudera. In this spotlight, we talk about his career in technology and his philosophy for getting the most out of work in terms of satisfaction and advancement. We also talk about his upbringing in the primarily Mexican American community of Salinas, California, and the important role Hispanics play in California’s Central Valley.

Sparse Matrix Representation in Python

KDnuggets

Using sparse matrix representations for your data, when appropriate, can save memory. Have a look at the reasons why, see how to create sparse matrices in Python, and compare the memory requirements for standard and sparse representations of the same data.

Python 160

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

We say ‘xerox’ speaking of any photocopy, whether or not it was created by a machine from the Xerox corporation. We describe information search on the Internet with just one word — ‘google’. We ‘photoshop pictures’ instead of editing them on the computer. And COVID-19 made ‘zoom’ a synonym for a videoconference. Kafka can continue the list of brand names that became generic terms for the entire type of technology.

Kafka 93

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Going From Transactional To Analytical And Self-managed To Cloud On One Database With MariaDB

Data Engineering Podcast

Summary The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nearly constant basis. Despite that, there are a handful of databases that continue to be adopted due to their proven reliability and robust features. MariaDB is one of those default options that has continued to grow and innovate while offering a familiar and stable experience.

Database 100

Will Facebook / Meta do engineering layoffs?

The Pragmatic Engineer

Part of this article was originally published last week in The Scoop #27, for subscribers of The Pragmatic Engineer Newsletter. I decided to publish this section for everyone to read after the Business Insider article claiming that 15% of Facebook employees - 12,000 people - may lose their jobs started to spread within the media. The Business Insider article was not specific to software engineers but still spread heavily within tech circles.

Bringing Data Into Real Time: What You Missed at Current 2022

Confluent

Current 2022 is a wrap! Here are some of the top keynote speeches, exciting new data streaming technologies, popular sessions, and where to find videos online.

Data 105

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

A recent VentureBeat article, “4 AI trends: It’s all about scale in 2022 (so far),” highlighted the importance of scalability. I recommend you read the entire piece, but to me the key takeaway – AI at scale isn’t magic, it’s data – is reminiscent of the 1992 presidential election, when political consultant James Carville succinctly summarized the key to winning – “it’s the economy”.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

How to Build a Data Science Enablement Team: A Complete Guide

KDnuggets

A Data Science Enablement Team consists of people from various departments like marketing, sales, product development, etc. They are responsible for providing the necessary tools and resources to help the data scientists do their job more efficiently.

Generative AI Models Explained

AltexSoft

Take a look at the featured image above. Beautiful, isn’t it? The interesting thing is, it isn’t a painting drawn by some famous artist, nor is it a photo taken by a satellite. The image you see has been generated with the help of Midjourney — a proprietary artificial intelligence program that creates pictures from textual descriptions. Neural nets can create images, video, and audio content that not every person can.

An Exploration Of The Open Data Lakehouse And Dremio's Contribution To The Ecosystem

Data Engineering Podcast

Summary The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Dremio is one of the companies leading the development of products and services that support the open lakehouse. In this episode Jason Hughes explains what it means for a lakehouse to be "open" and describes the different components that the Dremio team builds and contributes to.

Data Lake 100

Why a Cookieless Identity Solution is Critical to Future Advertising

Teradata

Implementing a cookieless identity solution will help businesses maintain advertising efforts amid the phaseout of third-party cookies.

98

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Netflix Tech

By Jun He, Akash Dwivedi, Natallia Dzenisenka, Snehal Chennuru, Praneeth Yenugutala, Pawan Dixit. At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations. A large number of batch workflows run daily to serve various business needs.

Java 83

How to Distribute Machine Learning Workloads with Dask

Cloudera

Tell us if this sounds familiar. You’ve found an awesome data set that you think will allow you to train a machine learning (ML) model that will accomplish the project goals; the only problem is the data is too big to fit in the compute environment that you’re using. In the day and age of “big data,” most might think this issue is trivial, but like anything in the world of data science, things are hardly ever as straightforward as they seem.

The ABCs of NLP, From A to Z

KDnuggets

There is no shortage of tools today that can help you through the steps of natural language processing, but if you want to get a handle on the basics this is a good place to start. Read about the ABCs of NLP, all the way from A to Z.

Process 160

DataOps Observability: Taming the Chaos (part 1)

DataKitchen

Part 1: Defining the Problems. This is the first post in DataKitchen’s four-part series on DataOps Observability. Observability is a methodology for providing visibility of every journey that data takes from source to customer value across every tool, environment, data store, team, and customer so that problems are detected and addressed immediately.

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.