June, 2020

article thumbnail

Aws Account

Start Data Engineering

1. AWS account Sign up for an AWS account at AWS Sign Up. You will be eligible for some free services for the first time sign up, ref: AWS Free Tier get your access key by clicking on your name -> My Security Credentials on the top pane and then clicking Create New Access Key.

AWS 130
article thumbnail

Business Intelligence meets Data Engineering with Emerging Technologies

Simon Späti

Today we have more requirements with ever-growing tools and framework, complex cloud architectures, and with data stack that is changing rapidly. I hear claims: “Business Intelligence (BI) takes too long to integrate new data”, or “understanding how the numbers match up is very hard and needs lots of analysis”. The goal of this article is to make business intelligence easier, faster and more accessible with techniques from the sphere of data engineering.

article thumbnail

EC2 & Session Manager (Toronto Project)

Team Data Science

Welcome back to this Toronto Specific data engineering project. We left off last time concluding finance has the largest demand for data engineers who have skills with AWS, and sketched out what our data ingestion pipeline will look like. I began building out the data ingestion pipeline by launching an EC2 instance. I should note that if you have created an AWS account, but have not yet created an Identity Access Management (IAM) admin role, and are therefore still using root credentials, I am s

Project 130
article thumbnail

Stream Processing with IoT Data: Challenges, Best Practices, and Techniques

Confluent

The rise of IoT devices means that we have to collect, process, and analyze orders of magnitude more data than ever before. As sensors and devices become ever more ubiquitous, […].

Process 126
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

There Are No Perfect Words…

Teradata

Juneteenth has been declared a U.S. holiday at Teradata, as we stand with the black community and reflect on what we can do to fight racism and injustice, and embrace diversity.

119
119
article thumbnail

Data Collection And Management To Power Sound Recognition At Audio Analytic

Data Engineering Podcast

Summary We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environment is a much greater challenge. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology. In this episode Dr. Chris Mitchell and Dr.

More Trending

article thumbnail

Netflix Studio Engineering Overview

Netflix Tech

By Steve Urban , Sridhar Seetharaman , Shilpa Motukuri , Tom Mack , Erik Strauss , Hema Kannan , CJ Barker Netflix is revolutionizing the way a modern studio operates. Our mission in Studio Engineering is to build a unified, global, and digital studio that powers the effective production of amazing content. [link] Netflix produces some of the world’s most beloved and award-winning films and series, including The Irishman, The Crown, La Casa de Papel, Ozark, and Tiger King.

article thumbnail

Understanding Azure Synapse Analytics

Advancing Analytics: Data Engineering

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the spark engine, SQL serverless/On-Demand and various other bits… but I’m still getting the same question of “Cool!…. but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace.

SQL 59
article thumbnail

The Cost of Apache Kafka: An Engineer’s Guide to Pricing Out DIY Operations

Confluent

When I have a small software project that I want to share with the world, I don’t write my own version control system with a web UI. I don’t even […].

Kafka 123
article thumbnail

Modernization Means Simplicity and Sophistication

Teradata

When it comes to being a modern data warehouse, your age really is just a number. It’s the underlying capabilities that actually count. Read more.

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Bringing Business Analytics To End Users With GoodData

Data Engineering Podcast

Summary The majority of analytics platforms are focused on use internal to an organization by business stakeholders. As the availability of data increases and overall literacy in how to interpret it and take action improves there is a growing need to bring business intelligence use cases to a broader audience. GoodData is a platform focused on simplifying the work of bringing data to employees and end users.

article thumbnail

3 Key techniques, to optimize your Apache Spark code

Start Data Engineering

Intro A lot of tutorials show how to write spark code with just the API and code samples, but they do not explain how to write “efficient Apache Spark” code.

Coding 130
article thumbnail

12 Data Quality Metrics That ACTUALLY Matter

Monte Carlo

One of our customers recently posed this question related to data quality metrics: I would like to set up an OKR for ourselves [the data team] around data availability. I’d like to establish a single data quality KPI that would summarize availability, freshness, quality. What’s the best way to do this? I can’t tell you how much joy this request brought me.

Data 59
article thumbnail

How will Cloud HR Software change the human resources function?

U-Next

Today, the way Human Resources function has changed from how it used to operate in the 90s and early 2000. Introduction of Human Resource Management (HRM) software in the late 90s had already revolutionized the way HR departments across various industries functioned. The HR department is not contained in the back-office anymore, where dealing with paperwork and recruiting were the only processes they were involved in.

Cloud 59
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

My Python/Java/Spring/Go/Whatever Client Won’t Connect to My Apache Kafka Cluster in Docker/AWS/My Brother’s Laptop. Please Help!

Confluent

tl;dr When a client wants to send or receive a message from Apache Kafka®, there are two types of connection that must succeed: The initial connection to a broker (the […].

Kafka 123
article thumbnail

How to Leverage Advanced Analytics in the Healthcare Domain

Teradata

Learn how Teradata Vantage's advanced analytics capabilities can analyze and predict useful diagnoses and insights in biomedicine and healthcare.

article thumbnail

Accelerate Your Machine Learning With The StreamSQL Feature Store

Data Engineering Podcast

Summary Machine learning is a process driven by iteration and experimentation which requires fast and easy access to relevant features of the data being processed. In order to reduce friction in the process of developing and delivering models there has been a recent trend toward building a dedicated feature. In this episode Simba Khadder discusses his work at StreamSQL building a feature store to make creation, discovery, and monitoring of features fast and easy to manage.

article thumbnail

What, why, when to use Apache Kafka, with an example

Start Data Engineering

I have seen, heard and been asked questions and comments like What is Kafka and When should I use it? I don’t understand why we have to use Kafka The objective of this post is to get you up to speed with what Apache Kafka is, when to use them and the foundational concepts of Apache Kafka with a simple example. What is Apache Kafka First let’s understand what Apache Kafka is.

Kafka 130
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

How to Create Near Real-time Models With Just dbt + SQL

dbt Developer Hub

Before I dive into how to create this, I have to say this. You probably don’t need this. I, along with my other Fishtown colleagues, have spent countless hours working with clients that ask for near-real-time streaming data. However, when we start digging into the project, it is often realized that the use case is not there. There are a variety of reasons why near real-time streaming is not a good fit.

SQL 52
article thumbnail

PgBouncer on Kubernetes and how to achieve minimal latency

Zalando Engineering

Introduction In the new Postgres Operator release 1.5 we have implemented couple of new interesting features , including connection pooling support. Master Wq says there is "No greatest tool", to run something successfully in production one needs to understand pros and cons. Let's try to dig into the topic, and take a look at the performance aspect of connection pooler support, mostly from a scaling perspective.

article thumbnail

Spring for Apache Kafka – Beyond the Basics: Can Your Kafka Consumers Handle a Poison Pill?

Confluent

You know the fundamentals of Apache Kafka®. You are a Spring Boot developer working with Apache Kafka. You have chosen Spring Kafka for your integration. You have implemented your first […].

Kafka 122
article thumbnail

Rising from the Ashes

Teradata

Teradata's own Sir Freek Cox on dedicating one's life to charity and good works. Read more.

105
105
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Data Management Trends From An Investor Perspective

Data Engineering Podcast

Summary The landscape of data management and processing is rapidly changing and evolving. There are certain foundational elements that have remained steady, but as the industry matures new trends emerge and gain prominence. In this episode Astasia Myers of Redpoint Ventures shares her perspective as an investor on which categories she is paying particular attention to for the near to medium term.

article thumbnail

A proven approach to land a Data Engineering job

Start Data Engineering

I have seen and been asked the following questions by students, backend engineers and analysts who want to get into the data engineering industry. What approach should i take to land a Data Engineering job? I really want to get into DE. What can I do to learn more about it? In this article, I will try to provide a general approach that you as a beginner, student, backend engineer or analyst can use to land your first data engineering job.

article thumbnail

Comparing Akka Streams, Kafka Streams and Spark Streaming

Rock the JVM

Explore how Akka Streams, Kafka Streams, and Spark Streaming stack up and find out which one is best for your use case

Kafka 52
article thumbnail

Learnings from Distributed XGBoost on Amazon SageMaker

Zalando Engineering

Overview XGBoost is a popular Python library for gradient boosted decision trees. The implementation allows practitioners to distribute training across multiple compute instances (or workers), which is especially useful for large training sets. One tool used at Zalando for deploying production machine learning models is the managed service from Amazon called SageMaker.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

How Merging Companies Will Give Rise to Unified Data Streams

Confluent

Company mergers are becoming more common as businesses strive to improve performance and grow market share by saving costs and eliminating competition through acquisitions. But how do business mergers relate […].

Data 116
article thumbnail

Announcing Vantage Trial

Teradata

Vantage Trial provides free, 30-day access to Teradata Vantage in the cloud along with easy-to-use, web-based tools and applications for performing advanced analytics. Learn more.

Cloud 98
article thumbnail

Building A Data Lake For The Database Administrator At Upsolver

Data Engineering Podcast

Summary Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. In order to bring the DBA into the new era of data management the team at Upsolver added a SQL interface to their data lake platform.

Data Lake 100
article thumbnail

JOINs and Aggregations Using Real-Time Indexing on MongoDB Atlas

Rockset

MongoDB.live took place last week, and Rockset had the opportunity to participate alongside members of the MongoDB community and share about our work to make MongoDB data accessible via real-time external indexing. In our session, we discussed the need for modern data-driven applications to perform real-time aggregations and joins, and how Rockset uses MongoDB change streams and Converged Indexing to deliver fast queries on data from MongoDB.

MongoDB 52
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.