Sat. Sep 25, 2021 - Fri. Oct 01, 2021

Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Uber Engineering

Introduction. The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions …

dbt (Data Build Tool) Tutorial

Start Data Engineering

1. Introduction 2. Dbt, the T in ELT 3. Project 3.1. Prerequisites 3.2. Configurations and connections 3.2.1. profiles.yml 3.2.2. dbt_project.yml 3.3 Data flow 3.3.1. Source 3.3.2. Snapshots 3.3.3. Staging 3.3.4. Marts 3.3.4.1. Core 3.3.4.2. Marketing 3.4. dbt run 3.5. dbt test 3.6. dbt docs 3.7. Scheduling 4. Conclusion 5. Further reading 6. References 1.

Building 130

How to Take Notes in 2021?

Simon Späti

Taking notes helps you remember things, teaches you to express yourself, lets you brainstorm your thoughts and research a topic, and so much more. I have taken notes all my life. Maybe it’s because I’m Swiss; they say we are well organised. I used OneNote for 10+ years. I have notebooks for my bachelor studies and for every workplace I have worked at.

IT 130

How to Securely Connect Confluent Cloud with Services on AWS, Azure, and GCP

Confluent

The rise of fully managed cloud services fundamentally changed the technology landscape and introduced benefits like increased flexibility, accelerated deployment, and reduced downtime. Confluent offers a portfolio of fully managed […].

Cloud 122

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Delivering Your Personal Data Cloud With Prifina

Data Engineering Podcast

Summary The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience.

Cloud 100

#ClouderaLife Spotlight: Liz Lashgari, Senior Employee Relations Manager

Cloudera

September 15th marks the beginning of National Hispanic Heritage Month – a month in which the contributions and influence of Hispanic people on the history, culture and achievements of the US are recognized. To commemorate the month, we are spotlighting an employee who is as active within the community as they are in the company and LatinX Employee Resource Group (ERG).

More Trending

Kafka Connect Fundamentals: What is Kafka Connect?

Confluent

Apache Kafka® is an enormously successful piece of data infrastructure, functioning as the ubiquitous distributed log underlying the modern enterprise. It is scalable, available as a managed service, and has […].

Kafka 98

Digging Into Data Reliability Engineering

Data Engineering Podcast

Summary The accuracy and availability of data have become critically important to the day-to-day operation of businesses. Similar to the practice of site reliability engineering as a means of ensuring consistent uptime of web services, there has been a new trend of building data reliability engineering practices in companies that rely heavily on their data.

Serving the Public Through Data

Cloudera

Digital transformation has been talked about for many years, but the pandemic has accelerated the digital transformation journeys for many enterprises. Forced to adapt to changes in the business landscape and customer behavior, businesses have adopted more digital tools and technologies to drive innovation and increase resilience. While going digital may be commonly associated with the private sector, governments and the organizations in the public sector have much to gain by going digital as

Medical 88

MLOps vs DevOps! Here's How They Fit Together

ProjectPro

The term machine learning is buzzing so loudly that almost every IT professional has heard it by now. With time, machine learning has become more applied, and every industry is leveraging it. Most software applications today have sophisticated machine learning algorithms in action behind the scenes. Welcome to the world of MLOps, which makes these ML models successful in production.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Trigger AWS Lambda Functions Directly from a Confluent Cloud Apache Kafka Topic

Confluent

The distributed architecture of Apache Kafka® can cause the operational burden of managing it to quickly become a limiting factor for adoption and developer agility. For this reason, it is […].

Kafka 97

Everything You Need to Know About DataOps Solutions

DataKitchen

Closing the Gap Between the Digital Haves and Have-Nots

Cloudera

by Pedro Pereira. The digital race is on. To pull ahead of the pack, a company needs to know what to do with its data. Without a data-driven strategy, you’re bound to lose ground to competitors who apply their data to operational improvements, product development, go-to-market strategies, and the customer experience. It isn’t enough to collect, interpret, and act on the data.

Machine Learning (ML) vs NLP - What's the Difference?

ProjectPro

The term artificial intelligence is always used synonymously with complex terms like Machine Learning, Natural Language Processing, and Deep Learning that are intricately woven with each other. One of the trending debates is that of the differences between natural language processing and machine learning. This post attempts to explain two of the crucial sub-domains of artificial intelligence - Machine Learning vs.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Rockset Is Now SOC 2 Type II Compliant

Rockset

The Rockset team is proud to announce that we have been accredited as SOC 2 Type II compliant. Our customers entrust Rockset with their data, and now they have rigorous, independent assurance that we protect it by following security best practices. What is SOC 2 Type II? SOC is one of several System and Organization Controls audits developed by the American Institute of CPAs (AICPA), the world’s largest member association of accountants.

The Airflow Smart Sensor Service

Airbnb Tech

Consolidating long-running, lightweight tasks for improved resource utilization. By: Yingbo Wang, Kevin Yang. Introduction. Airflow is a platform to programmatically author, schedule, and monitor data pipelines. A typical Airflow cluster supports thousands of workflows, called DAGs (directed acyclic graphs), and there could be tens of thousands of concurrently running tasks at peak hours.

Migrate to CDP Private Cloud Base – A Step by Step Guide

Cloudera

Our recent blog discussed the four paths to get from legacy platforms to CDP Private Cloud Base. In this blog and accompanying video, we will take a deep dive into the mechanics of running an in-place upgrade from CDH5 or CDH6 to CDP Private Cloud Base. The overall upgrade follows a seven-step process illustrated below. In the video below we walk through a complete end-to-end upgrade of CDH to CDP Private Cloud Base.

Cloud 76

The Ultimate Guide to Statistics for Machine Learning Beginners

ProjectPro

Probability and Statistics are two intertwined topics that smooth one’s path to becoming a Machine Learning pro. In this blog, you will find a detailed description of all you need to learn about probability and statistics for machine learning. If you are a regular user of social media sites, you must have encountered on your timeline at least one of the memes suggesting that machine learning is nothing but glamorised statistics.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Rockset Elevates Security Posture with RBAC Custom Roles & Views

Rockset

Summary: Over-privileged accounts create security vulnerabilities by expanding an organization’s attack surface. Rockset has released new security features that allow admins to limit certain users’ access to a specific subset of data without exposing the complete data set. RBAC with Custom Roles enables admins to create scoped-down user roles with limited privileges.

SQL 52

Survey: Cloud, Data Management, and Emerging Technology Needs Are Driving Changes to 2022 IT Plans

Teradata

Our latest global industry survey, in partnership with Vanson Bourne, reveals that enterprises are contemplating long-term, data-focused IT investments to address changing market conditions.

Group vs Fine-Grained Access Control in Cloudera Data Platform Public Cloud

Cloudera

Cloudera Data Platform (CDP) provides a Shared Data Experience (SDX) for centralized data access control and audit in the Enterprise Data Cloud. The Ranger Authorization Service (RAZ) is a new service added to help provide fine-grained access control (FGAC) for cloud storage. We covered the value this new capability provides in a previous blog. RAZ for S3 and RAZ for ADLS introduce FGAC and Audit on CDP’s access to files and directories in cloud storage making it consistent with the re

Connecting a Linux VPS to an AWS VPC using a S2S VPN with Static Routing

Hepta Analytics

This blogpost will cover how to connect a standalone Virtual Private Server (VPS) running Linux (specifically Debian) to AWS’ Virtual Private Cloud (VPC) using a site-to-site VPN with Static Routing. This blogpost is relevant for those who find themselves having to integrate their AWS infrastructure with external sites where they do not own, or do not have permission to configure, the gateway device (e.g. a Cisco ASA appliance).

AWS 52

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Correlation vs. Covariance

ProjectPro

Are you tired of searching the web for ‘correlation vs. covariance’ to understand the two terms better? If yes, read this article that compares correlation vs. covariance and explains the two popular statistical tools in detail. After the birth of the new domain of Data Science, data has become a prized possession for most companies. They rely on data science algorithms to understand customer behavior, predict sales, etc.

Databricks Delta Cache and Spark Cache

Advancing Analytics: Data Engineering

As data sizes and demand increase over time, you often see slowness on Databricks. This can be due to a number of factors, from security, network transfers, and read/write requests to memory space. A common cause is Databricks having to constantly read Parquet files from the file system, increasing the I/O and network throughput. Databricks has to manage and monitor the cluster to ensure it does not exceed the I/O threads threshold and that the workers have enough memory to cope with t

SQL 52

Best Practices for Leveraging Orphan Data in Your Analytical Ecosystem

Teradata

Businesses struggle to manage orphan data -- data not maintained by traditional transaction systems. Learn what your company can do to turn orphan data challenges into competitive advantages.

Data 52

Overcoming the Limitations of Client-Side Form Tracking With Webhooks

RudderStack

How to use RudderStack’s Webhook Source to submit form data to RudderStack without it being susceptible to client-side script blocking tools.

IT 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

15 Object Detection Project Ideas with Source Code for Practice

ProjectPro

Artificial intelligence is booming. According to Andrew Ng, AI will transform almost every major industry in the world, and we will witness a massive shift in the way these industries operate. There is new research in the field of AI almost every day, and new applications of AI are being implemented in industries. The AI market is growing rapidly. By 2030, AI will lead to an estimated 26% increase in global GDP.

Coding 52

Web Scraping & Getting Data with Beautiful Soup | Domino

Domino Data Lab: Data Engineering

Data is all around us, from the spreadsheets we analyse on a daily basis, to the weather forecast we rely on every morning or the webpages we read. In many cases, the data we consume is simply given to us, and a simple glance is enough to make a decision. For example, knowing that the chance of rain today is 75% all day makes me take my umbrella with me.

Data 40

Armadillo makes audio players in Android easy

Scribd Technology

Armadillo is the fully featured audio player library Scribd uses to play and download all of its audiobooks and podcasts, and it is now open source. It specializes in playing HLS or MP3 content that is broken down into chapters or tracks. It leverages Google’s ExoPlayer library for its audio engine. ExoPlayer wraps a variety of low-level audio and video APIs but has few opinions of its own for actually using audio in an Android app.

Media 40

RudderStack and Mixpanel Announce Partnership Advancing Product Analytics for the Modern Data Stack

RudderStack

Mixpanel and RudderStack are proud to partner together to deliver better analytics to product teams everywhere, fueled by rich data from the data warehouse.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.