Sat. Oct 21, 2023 - Fri. Oct 27, 2023

Defining A Strategy For Your Data Products

Data Engineering Podcast

Summary The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products being the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products.

BI 162

Code Review on Printed Paper: an Excerpt from the Twitoons Comic Book

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover two out of seven topics from today’s full issue on The Man Behind the Big Tech Comics. To get full issues twice a week, subscribe here.

Coding 186

Drag, Drop, Analyze: The Rise of No-Code Data Science

KDnuggets

No-code or low-code functionalities in data science have gained significant traction in recent years. These solutions are well-proven and matured, and they make data science more accessible to a wider range of people.

Automating dead code cleanup

Engineering at Meta

Meta’s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing dead code. SCARF combines static and dynamic analysis of programs to detect dead code from both a business and programming language perspective. SCARF automatically creates change requests that delete the dead code identified from the program analysis, minimizing developer costs.

Coding 142

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Snowflake To Acquire Ponder, Boosting Python Capabilities In the Data Cloud

Snowflake

Python’s popularity has more than doubled in the past decade¹ and it is quickly becoming the preferred language for development across machine learning, application development, pipelines, and more. One of our goals at Snowflake is to ensure we continue to deliver a best-in-class platform for Python developers. Snowflake customers are already harnessing the power of Python through Snowpark, a set of runtimes and libraries that securely deploy and process non-SQL code directly in Snowflake.

Python 141

High resolution data updates to Living Atlas World Elevation Layers and Tools (October 2023)

ArcGIS

In October 2023, the elevation layers were updated with high-resolution datasets for France, New Zealand, the USA, and Italy, along with global bathymetry.

Datasets 135

More Trending

Introducing Predictive Optimization: Faster Queries, Cheaper Storage, No Sweat

databricks

Predictive Optimization intelligently optimizes your Lakehouse table data layouts for peak performance and cost-efficiency - without you needing to lift a finger.

Data 133

6 Steps to Avoid Messy Data in Your Warehouse

Start Data Engineering

1. Introduction
2. Six Steps for a Clean Data Warehouse
2.1. Understand the business
2.2. Make data easy to use with the appropriate data model
2.3. Good input data is necessary for a good data warehouse
2.4. Define Source of Truth (SOT) and trace its usage
2.5. Keep stakeholders in the loop for a more significant impact
2.6. Watch out for org-level red flags

What's new in Apache Spark 3.5.0 - Structured Streaming

Waitingforcode

It's time to start the series covering Apache Spark 3.5.0 features. As the first topic I'm going to cover Structured Streaming which has got a lot of RocksDB improvements and some major API changes.

IT 130

5 Free Books to Master Machine Learning

KDnuggets

Machine Learning is one of the most exciting fields in computer science today. In this article, we will take a look at the five best yet free books to learn machine learning in 2023.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Announcing Apache Flink 1.18

Confluent

Read updates and improvements in Apache Flink 1.18, including dynamic fine-grained rescaling via REST API, Java 17 support, and faster rescaling & batch performance improvements.

Java 125

5 Things you didn’t know about Buck2

Engineering at Meta

Meta has a very large monorepo, with many different programming languages. To optimize build and performance, we developed our own build system called Buck, which was first open-sourced in 2013. Buck2 is the recently open-sourced successor. In our internal tests at Meta, we observed that Buck2 completed builds approximately 2x as fast as Buck1. Below are five interesting facts you might not have known about Buck2.

Learn How to Build Airtight Data Pipelines for your AI Initiatives

databricks

"I can't think of anything that's been more powerful since the desktop computer." — Michael Carbin, Associate Professor, MIT, and Founding Advisor, MosaicML A.

KDnuggets News, October 27: 5 Free Books to Master Data Science • 7 Steps to Mastering LLMs

KDnuggets

This week on KDnuggets: Go from learning what large language models are to building and deploying LLM apps in 7 steps • Check this list of free books for learning Python, statistics, linear algebra, machine learning and deep learning • And much, much more!

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

nixtract 0.1.0

Tweag

Tweag is excited to announce the first release of nixtract 0.1.0! This is our first step towards a broader effort to make Nix the best tool to tackle tomorrow’s challenges of the Software Supply Chain. In order to understand why we need nixtract, let me tell you about the “secret” value of Nixpkgs. Is it a bird? A plane? It’s a graph! The Nix language allows you to define the “recipe” to build anything into a package, like the sources and the steps to make the package, but also the dependencie

Metadata 114

Date and DateTime Manipulation in Polars

Confessions of a Data Guy

One thing all Data Engineers are doomed to do in purgatory will be to solve different date and datetime problems in an endless loop. I’m sure of it. I can’t imagine anything worse, so that must be it. Either way, the constant need to manipulate dates and datetimes is just a way of life, something […]

Master Data Management: Common Misconceptions You Should Know

Precisely

When most people think of master data management, they first think of customers and products. This is logical, as the core mission of any company is to develop products and services, find the right customers, and consistently deliver excellence. But master data encompasses so much more than data about customers and products. It includes information about suppliers, employees, and target prospects.

10 Basic Statistical Concepts in Plain English

KDnuggets

Explore 10 foundational statistical concepts made simple, from probability distributions to the central limit theorem, for better data understanding.

Data 144

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Makeathon 2023

Pinterest Engineering

Each year, we host Makeathon, our annual internal version of a hackathon, where employees from across the business collaborate for three days to bring their dream passion projects to life. The ideas they pitch have a goal to improve our product, culture, internal processes or a combination of the three. This year, Makeathon was hosted from August 7–August 11.

Medical 108

Cloudera and AMD Spur Data Scientists to Take Climate Action

Cloudera

The world faces multiple environmental sustainability challenges — from the climate crisis and water scarcity to food production and urban resilience. Overcoming these hurdles offers opportunities for innovation through technology and artificial intelligence. That’s why Cloudera and AMD have partnered to host the Climate and Sustainability Hackathon.

Food 108

How Lakehouse AI improves model accuracy with real-time computations

databricks

The predictive quality of a machine learning model is a direct reflection of the quality of data used to train and serve the.

7 Steps to Mastering Data Wrangling with Pandas and Python

KDnuggets

Starting out on your data journey? Here’s a 7-step learning path to master data wrangling with pandas.

Python 143

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

10 Biggest Cybersecurity Trends in 2023

Knowledge Hut

Cybersecurity is a method of safeguarding networks and devices from external attacks. The cybersecurity trend shows a growing emphasis on protection, leading to an increased need for Cyber Security specialists. They are hired by businesses to secure secret information, preserve staff productivity, and boost customer trust in products and services. Cyber security is governed by the industry standard of confidentiality, integrity, and availability, or CIA.

ThoughtSpot announces our 2023 Partner Award winners

ThoughtSpot

To our entire partner ecosystem, I want to personally thank each of you for your incredible contributions over the past year. Our partners play a vital role in driving ThoughtSpot’s mission of creating a more fact-driven world. Together, we help organizations leverage AI and natural language search to discover insights and make data-driven decisions for their businesses.

How Providence Health Built a Model marketplace using Databricks?

databricks

Providence's MLOps Platform Providence is a healthcare organization with 120,000 caregivers serving over 50 hospitals and 1,000 clinics across seven states. Providence is.

The Top 5 Cloud Machine Learning Platforms & Tools

KDnuggets

What are the top 5 cloud machine learning platforms in the market today? Our list will help provide some vital insights into which platform might best cater to your specific machine learning needs. See what KDnuggets recommends.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Werner Gains Advanced Geospatial Capabilities with Snowflake and CARTO

Snowflake

Founded nearly 70 years ago, Werner Enterprises is a North American transportation and logistics leader that operates a fleet of almost 8,300 trucks and 30,000 trailers out of 16 terminals across the United States. The company generates a massive amount of data on the constantly changing, real-time location of each of its assets. Collecting and analyzing this geospatial data is vital for smart decision-making.

Top 20+ Cyber Security Projects for 2023 [With Source Code]

Knowledge Hut

Cybersecurity has become an integral component of every industry as the world advances technologically. In recent years, an increasing number of young professionals have shown interest in this field. If you are pursuing a course in this field, you should complete a project on cybersecurity as your area of competence. Beginners with theoretical knowledge should not undertake an impossible endeavor.

Coding 98

Announcing GA of Predictive I/O for Updates: Faster DML Queries, Right Out of the Box

databricks

Announcing GA of Predictive I/O for Updates, which harnesses Photon and AI atop Deletion Vectors in order to significantly speed up MERGE, UPDATE and DELETE operations.

105

How Predictive Analytics is Revolutionizing Decision-Making in Tech

KDnuggets

Learn how predictive analytics work in a business environment.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.