Python for Alteryx users


Overview of tools

We will recreate this Alteryx workflow using Python

Alteryx workflow

Text input

Text Input

import pandas as pd

text_input = pd.DataFrame(
   {"Person": ["You", "Me", "Everyone else"], "Number": [1, 1, 99]}
)
text_input
Person Number
0 You 1
1 Me 1
2 Everyone else 99

Input data

Input data

import pandas as pd

input_data = pd.read_csv("https://query.data.world/s/whbjtalht4sigazly6hdnnb5ynqald")
input_data.head()
Category City Country Customer Name Discount Number of Records Order Date Order ID Postal Code Manufacturer Product Name Profit Quantity Region Sales Segment Ship Date Ship Mode State Sub-Category
0 Furniture Henderson United States Claire Gute 0.00 1 11/8/2017 CA-2017-152156 42420.0 Bush Bush Somerset Collection Bookcase 41.9136 2 South 261.9600 Consumer 11/11/2017 Second Class Kentucky Bookcases
1 Furniture Henderson United States Claire Gute 0.00 1 11/8/2017 CA-2017-152156 42420.0 Hon Hon Deluxe Fabric Upholstered Stacking Chairs,… 219.5820 3 South 731.9400 Consumer 11/11/2017 Second Class Kentucky Chairs
2 Office Supplies Los Angeles United States Darrin Van Huff 0.00 1 6/12/2017 CA-2017-138688 90036.0 Universal Self-Adhesive Address Labels for Typewriters b… 6.8714 2 West 14.6200 Corporate 6/16/2017 Second Class California Labels
3 Furniture Fort Lauderdale United States Sean O’Donnell 0.45 1 10/11/2016 US-2016-108966 33311.0 Bretford Bretford CR4500 Series Slim Rectangular Table -383.0310 5 South 957.5775 Consumer 10/18/2016 Standard Class Florida Tables
4 Office Supplies Fort Lauderdale United States Sean O’Donnell 0.20 1 10/11/2016 US-2016-108966 33311.0 Eldon Eldon Fold ‘N Roll Cart System 2.5164 2 South 22.3680 Consumer 10/18/2016 Standard Class Florida Storage

Select tool

Select tool

selected_data = (
   input_data.drop(columns=["Ship Date", "Ship Mode"])  # Example of deselecting
   .loc[
       :, ["Order ID", "Category", "Sub-Category", "Product Name", "Sales", "Quantity"]
   ]  # Example of selecting
   .rename(columns={"Sub-Category": "Sub-Cat", "Product Name": "Product"})
   .astype({"Quantity": "int64", "Category": "str"})
)
selected_data.head()
Order ID Category Sub-Cat Product Sales Quantity
0 CA-2017-152156 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600 2
1 CA-2017-152156 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,… 731.9400 3
2 CA-2017-138688 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b… 14.6200 2
3 US-2016-108966 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775 5
4 US-2016-108966 Office Supplies Storage Eldon Fold ‘N Roll Cart System 22.3680 2

Formula tool

Formula tool

selected_data["Total Sales"] = selected_data["Sales"] * selected_data["Quantity"]
selected_data.head()
Order ID Category Sub-Cat Product Sales Quantity Total Sales
0 CA-2017-152156 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600 2 523.9200
1 CA-2017-152156 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,… 731.9400 3 2195.8200
2 CA-2017-138688 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b… 14.6200 2 29.2400
3 US-2016-108966 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775 5 4787.8875
4 US-2016-108966 Office Supplies Storage Eldon Fold ‘N Roll Cart System 22.3680 2 44.7360

Filter tool

Filter tool

filtered_data = selected_data[selected_data["Sales"] > 1000]


# To get the false
sales_under_1000 = selected_data[selected_data["Sales"] < 1000]

filtered_data.head()
Order ID Category Sub-Cat Product Sales Quantity Total Sales
10 CA-2015-115812 Furniture Tables Chromcraft Rectangular Conference Tables 1706.184 9 15355.656
24 CA-2016-106320 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 1044.630 3 3133.890
27 US-2016-150630 Furniture Bookcases Riverside Palais Royal Lawyers Bookcase, Royal… 3083.430 7 21584.010
35 CA-2017-117590 Technology Phones GE 30524EE4 1097.544 7 7682.808
54 CA-2017-105816 Technology Phones AT&T CL83451 4-Handset Telephone 1029.950 5 5149.750
sales_under_1000.head()
Order ID Category Sub-Cat Product Sales Quantity Total Sales
0 CA-2017-152156 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600 2 523.9200
1 CA-2017-152156 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,… 731.9400 3 2195.8200
2 CA-2017-138688 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b… 14.6200 2 29.2400
3 US-2016-108966 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775 5 4787.8875
4 US-2016-108966 Office Supplies Storage Eldon Fold ‘N Roll Cart System 22.3680 2 44.7360

Sort tool

Sort tool

sorted_data = filtered_data.sort_values(by="Sales", ascending=False)
sorted_data.head()
Order ID Category Sub-Cat Product Sales Quantity Total Sales
9941 CA-2015-145317 Technology Machines Cisco TelePresence System EX90 Videoconferenci… 22638.480 6 135830.880
8093 CA-2017-118689 Technology Copiers Canon imageCLASS 2200 Advanced Copier 17499.950 5 87499.750
4905 CA-2018-140151 Technology Copiers Canon imageCLASS 2200 Advanced Copier 13999.960 4 55999.840
9707 CA-2018-127180 Technology Copiers Canon imageCLASS 2200 Advanced Copier 11199.968 4 44799.872
5297 CA-2018-166709 Technology Copiers Canon imageCLASS 2200 Advanced Copier 10499.970 3 31499.910

Summarize tool

Summarize tool

summarized_data = (
   sorted_data.groupby(["Category", "Sub-Cat"])
   .agg({"Sales": "sum"})
   .rename(columns={"Sales": "Sales by Sub-Cat"})
).reset_index()
summarized_data.head()
Category Sub-Cat Sales by Sub-Cat
0 Furniture Bookcases 49521.8256
1 Furniture Chairs 129547.8520
2 Furniture Furnishings 5638.9320
3 Furniture Tables 102125.0490
4 Office Supplies Appliances 36742.7900
summarized_data_alternative = sorted_data.groupby(["Category", "Sub-Cat"]).agg(
   Sales_by_Sub_Cat=pd.NamedAgg(column="Sales", aggfunc=sum)
).reset_index()
summarized_data_alternative.head()
Category Sub-Cat Sales_by_Sub_Cat
0 Furniture Bookcases 49521.8256
1 Furniture Chairs 129547.8520
2 Furniture Furnishings 5638.9320
3 Furniture Tables 102125.0490
4 Office Supplies Appliances 36742.7900

Join tool

Join tool

joined_data = pd.merge(
   sorted_data, summarized_data, on=["Category", "Sub-Cat"], how="inner"
)
joined_data.head()
Order ID Category Sub-Cat Product Sales Quantity Total Sales Sales by Sub-Cat
0 CA-2015-145317 Technology Machines Cisco TelePresence System EX90 Videoconferenci… 22638.480 6 135830.880 162022.19
1 US-2017-107440 Technology Machines 3D Systems Cube Printer, 2nd Generation, Magenta 9099.930 7 63699.510 162022.19
2 CA-2017-158841 Technology Machines HP Designjet T520 Inkjet Large Format Printer … 8749.950 5 43749.750 162022.19
3 CA-2015-139892 Technology Machines Lexmark MX611dhe Monochrome Laser Printer 8159.952 8 65279.616 162022.19
4 US-2018-168116 Technology Machines Cubify CubeX 3D Printer Triple Head Print 7999.980 4 31999.920 162022.19
left_join = pd.merge(
   sorted_data, summarized_data, on=["Category", "Sub-Cat"], how="left", indicator=True
)
left_join = left_join[left_join["_merge"] == "left_only"]
left_join.drop(columns="_merge", inplace=True)
left_join.head()
Order ID Category Sub-Cat Product Sales Quantity Total Sales Sales by Sub-Cat

Union tool

Union tool

unioned_data = joined_data.append(sales_under_1000)
unioned_data.head()
Category Order ID Product Quantity Sales Sales by Sub-Cat Sub-Cat Total Sales
0 Technology CA-2015-145317 Cisco TelePresence System EX90 Videoconferenci… 6 22638.480 162022.19 Machines 135830.880
1 Technology US-2017-107440 3D Systems Cube Printer, 2nd Generation, Magenta 7 9099.930 162022.19 Machines 63699.510
2 Technology CA-2017-158841 HP Designjet T520 Inkjet Large Format Printer … 5 8749.950 162022.19 Machines 43749.750
3 Technology CA-2015-139892 Lexmark MX611dhe Monochrome Laser Printer 8 8159.952 162022.19 Machines 65279.616
4 Technology US-2018-168116 Cubify CubeX 3D Printer Triple Head Print 4 7999.980 162022.19 Machines 31999.920

Output tool

Output tool

unioned_data.to_csv("Sales summary.csv", index=False)