Thu. May 30, 2024

Building Data Platforms (from scratch)

Confessions of a Data Guy

Of all the duties that Data Engineers take on during the regular humdrum of business and work, most are the same old, same old: build a new pipeline, update a pipeline, create a new data model, fix a bug, and so on. It’s never-ending. It’s a constant stream of data, new and old, spilling into our Data Warehouses and […]

Building 184

Python Essentials for Data Engineers

Start Data Engineering

Introduction; Data is stored on disk and processed in memory; Running the code; Run on Codespaces; Run on your laptop; Using python REPL; Python basics; Python is used for extracting data from sources, transforming it, & loading it into a destination; [Extract & Load] Read and write data to any system; [Transform] Process data in Python or instruct the database to process it; [Data Quality] Define what you expect of your data and check if your data conforms to it; [Code Testing] Ensure your code does
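The extract, transform, data quality, and load steps named in this outline can be sketched with the Python standard library alone. This is a minimal illustration and not code from the linked article: the `RAW_CSV` sample, the `orders` table, the `amount >= 0` expectation, and all function names are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory CSV standing in for a real source system.
RAW_CSV = "order_id,amount\n1,10.50\n2,4.25\n3,-1.00\n"


def extract(text):
    """Extract: read rows from the CSV source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast the string fields to proper Python types."""
    return [
        {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
        for r in rows
    ]


def check_quality(rows):
    """Data quality: keep only rows meeting the (assumed) expectation amount >= 0."""
    return [r for r in rows if r["amount"] >= 0]


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (:order_id, :amount)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run all steps, then let the database compute the summary."""
    rows = check_quality(transform(extract(RAW_CSV)))
    load(rows, conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` runs the whole flow against a throwaway database. The final `SELECT` is a small case of instructing the database to do the processing instead of aggregating in Python.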

Python 147

Infoshare 2024: Stream processing fallacies, part 1

Waitingforcode

Last week I was speaking in Gdansk on the DataMass track at Infoshare. As often happens, the talk time slot limited what I wanted to share, but maybe it's for the best. Otherwise, you wouldn't be reading about stream processing fallacies!

Process 130

Introducing the Robinhood Crypto Trading API

Robinhood

Robinhood Crypto customers in the United States can now use our API to view crypto market data, manage portfolios and account information, and place crypto orders programmatically. Today, we are excited to announce the Robinhood Crypto trading API, ushering in a new era of convenience, efficiency, and strategy for our most seasoned crypto traders. Robinhood Crypto customers in the United States can use our new trading API to set up advanced and automated trading strategies that allow them to st

Insurance 142

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Introducing Salesforce BYOM for Databricks

databricks

Salesforce and Databricks are excited to announce an expanded strategic partnership that delivers a powerful new integration - Salesforce Bring Your Own Model.

137

Snowflake Ventures Expands Investment in Sigma, Deepening Commitment to Bringing World-Class BI Directly into the AI Data Cloud

Snowflake

We’re excited to announce today that we’re reinforcing our commitment and deepening our partnership with Sigma with an expanded investment from Snowflake Ventures. Sigma is a leading business intelligence and analytics solution that makes it easy for employees to explore live data, create compelling visualizations and collaborate with colleagues. Sigma allows employees to break free of dashboards and build workflows, powered by write-back to Snowflake through their unique Input Tables capability

BI 100

More Trending

5 Best End-to-End Open Source MLOps Tools

KDnuggets

Explore free and open-source MLOps tools for enhanced data privacy and control over your models and code.

Coding 118

Bringing Financial Services Business Use Cases to Life: Leveraging Data Analytics, ML/AI, and Gen AI

Cloudera

The financial services industry is undergoing a significant transformation, driven by the need for data-driven insights, digital transformation, and compliance with evolving regulations. In this context, Cloudera and TAI Solutions have partnered to help financial services customers accelerate their data-driven transformation, improve customer centricity, ensure compliance with regulations, enhance risk management, and drive innovation.

How AI is Transforming the Retail Industry

KDnuggets

Let’s go beyond the traditional retail industry and discuss how advanced AI-powered innovations are driving business growth.

Retail 106

Orchestrating a Dynamic Time-series Pipeline with Azure Data Factory and Databricks

Towards Data Science

Explore how to build, trigger and parameterize a time-series data pipeline in Azure, accompanied by a step-by-step tutorial.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Future-Proof Your IBM AIX and IBM i Systems with Cloud-Based Data Protection

Precisely

Key Takeaways: Cloud-based High Availability Disaster Recovery (HA-DR) solutions enhance operational efficiency, leveraging automation to streamline recovery processes and reduce downtime expenses. Adopting unique cloud HA-DR strategies improves data redundancy and security, aligns with strict regulatory standards, and proactively manages disaster risks.

Systems 69

Terraforming Dataform

Towards Data Science

MLOps: Data Pipeline Orchestration. Dataform 101, Part 2: Provisioning with Least Privilege Access Control. This is the concluding part of Dataform 101, showing the fundamentals of setting up Dataform with a focus on its authentication flow. This second part focuses on the Terraform implementation of the flow explained in part 1.

RAG vs Fine Tuning: How to Choose the Right Method

Monte Carlo

Generative AI has the potential to transform your business and your data engineering team, but only when it’s done right. So how can your data team actually drive value with your LLM or GenAI initiative? Leading organizations are often deciding between two emerging frameworks that differentiate their AI for business value: RAG and fine tuning. What’s the difference between retrieval augmented generation (RAG) and fine tuning?

Generative AI on Architecture Diagram Creation : Part-2

RandomTrees

Creating diagrams has become essential in today’s data-driven world, whether you’re visualizing cloud architectures, documenting processes, or mapping complex systems. Fortunately, a range of powerful tools makes this task easier than ever. From free, open-source options to advanced, collaborative platforms, these tools cater to various needs and preferences.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Managing and Understanding Player Feedback at Scale

databricks

Whether you are working on a live title, pre/post production, ongoing maintenance, future releases, another version of a game, or a brand new.

How to Excel in the Growing Field of Data Science

Elder Research

If you’re interested in data science careers, keep reading. This article features some helpful tips for those new to the field.

Introducing Editing Templates in the Geodatabase

ArcGIS

Editing Templates can be stored in the Geodatabase with an Add-In.

Unify your data: AI and Analytics in an Open Lakehouse

Cloudera

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics and AI use cases—including enterprise data warehouses. Nearly two years ago, Cloudera announced the general availability of Apache Iceberg in the Cloudera platform, which helps users avoid vendor lock-in and implement an open lakehouse.
