
A Beginner's Guide to Association Rule Mining

4 May 2025

Contents
  • Key Concepts
  • Items and Itemsets
  • Transactions and Rules
  • Metrics
  • Support
  • Confidence
  • Lift
  • A Summary of Algorithms
  • Apriori
  • ECLAT
  • FP-Growth
  • An Example With mlxtend
  • Setting Up
  • Loading the Database
  • Generating Frequent Itemsets
  • Generating Association Rules
  • Challenges of Association Rule Mining
  • Conclusion
  • Bibliography
  • Image Credits
  • Footnotes

When you are shopping in the supermarket, do you ever notice how some items are placed next to each other? Chips next to soda, for example? They are not placed there without reason: supermarkets have cold, hard data to back up the association of buying chips and soda together. And one powerful technique used to determine this association is Association Rule Mining (ARM).

Association Rule Mining (also known as Association Rule Learning) is a key concept in data mining that uncovers interesting correlations and relationships within data. It is used in product recommendation, customer behaviour analysis, and anomaly detection. At its core, association rule mining helps find the likelihood that an itemset $B$ follows from an itemset $A$.

In this post, we’ll look at the key concepts involved in ARM, list out some algorithms used in ARM, and give an example of ARM using the mlxtend library.

Key Concepts

Let us start with some basic definitions.

Items and Itemsets

The most fundamental definition is for items.

An item is a binary attribute.

For example, an item in a shopping cart is a product: the shopping cart either contains the item or it doesn’t [1]. But an item doesn’t have to represent an actual product. As another example, if we think of the object as a book, an item could be whether the book is red, or blue, or green. Note that each of these colours is a separate item; we are just interested in the question “is the book red” or “is the book blue”.

Another important concept is a group of items.

An itemset is a collection of items. It is a subset of all the possible items.

In the shopping cart example, the set $\{\text{Milk}, \text{Bread}, \text{Butter}\}$ is one itemset (representing that the shopping cart contains milk, bread, and butter). For the book example, one possible itemset is $\{\text{Red}, \text{Hardback}, \text{Trade}\}$ (which represents the idea that the book is a red, hardback book printed in US trade size).

Transactions and Rules

A transaction is an itemset together with a unique transaction ID. The full list of transactions that we are considering is called the database.

For example, in the context of a supermarket, a transaction could be the products that the customer bought along with the invoice number. For the book example, the transaction could be the attributes of the book along with the International Standard Book Number (ISBN) of the book. In any case, the transaction should be uniquely identifiable within the database.

We can now define rules.

Suppose $X$ and $Y$ are disjoint itemsets (i.e., $X$ and $Y$ do not have a common item). An association rule, denoted $X \implies Y$, means that if a transaction contains $X$, then the transaction also contains $Y$. The itemset $X$ is called the antecedent and $Y$ is called the consequent.

For the shopping cart example, we could define the rule $\{\text{Milk}, \text{Bread}\} \implies \{\text{Cereal}\}$, which means that if a shopping cart contains both milk and bread, then it must contain cereal. Here the antecedent is that the shopping cart contains both milk and bread and the consequent is that the shopping cart also contains cereal.

Metrics

To measure the usefulness of the created association rule, we have a few metrics. To aid in the description of these metrics, consider the following database of transactions taken from Wikipedia:

Transaction ID   Itemset
1                {Milk, Bread, Fruit}
2                {Butter, Eggs, Fruit}
3                {Beer, Diapers}
4                {Milk, Bread, Butter, Eggs, Fruit}
5                {Bread}

Let $X$ and $Y$ be two disjoint itemsets, $X \implies Y$ be an association rule, and $T$ be the database above.
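
To make the worked examples below concrete, here is one way to write this database down in Python (a sketch; the variable name database and the frozenset representation are just illustrative choices):

# The example database: transaction ID -> transaction itemset
database = {
    1: frozenset({"Milk", "Bread", "Fruit"}),
    2: frozenset({"Butter", "Eggs", "Fruit"}),
    3: frozenset({"Beer", "Diapers"}),
    4: frozenset({"Milk", "Bread", "Butter", "Eggs", "Fruit"}),
    5: frozenset({"Bread"}),
}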

Support

Support is an indication of the frequency of occurrence of either an itemset or an association rule within the database.

The support of an itemset is the number of transactions that contain all items in the itemset, divided by the size of the database [2]. Formally, the support of an itemset $X$ is

$$\mathrm{supp}(X) = \frac{\left|\{(i, t) \in T \mid X \subseteq t\}\right|}{|T|}$$

where $T$ is the database containing transactions $(i, t)$ (with $i$ being the transaction ID and $t$ being the transaction itemset).

Consider the itemset $X = \{\text{Fruit}\}$. The number of transactions that contain $\text{Fruit}$ is 3 (transactions 1, 2, and 4), so its support is $\frac{3}{5}$ (i.e., $\mathrm{supp}(\{\text{Fruit}\}) = \frac{3}{5}$). For the itemset $X = \{\text{Milk}, \text{Bread}\}$, only transactions 1 and 4 contain all items in $X$, and thus its support is $\frac{2}{5}$. In general, the larger the itemset $X$, the smaller its support.
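
As a quick sanity check, here is a small helper (a sketch reusing the illustrative database dictionary above) that computes the support of an itemset by counting the transactions containing it:

def supp(itemset, database):
    """Fraction of transactions whose itemset contains all items in `itemset`."""
    hits = sum(itemset <= t for t in database.values())  # <= is the subset test
    return hits / len(database)

print(supp(frozenset({"Fruit"}), database))          # 0.6, i.e. 3/5
print(supp(frozenset({"Milk", "Bread"}), database))  # 0.4, i.e. 2/5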

What about the support of an association rule?

The support of an association rule $X \implies Y$ is the support of the itemset containing the union of items from $X$ and $Y$. Written another way,

$$\mathrm{supp}(X \implies Y) = \mathrm{supp}(X \cup Y)$$

Think of the support of an association rule this way: for a rule to be ‘supported’, the items in the antecedent and consequent must be present in the itemset of the transaction. This is the same thing as saying that the transaction must contain the items in the union of both antecedent and consequent itemsets, hence the definition.

For example, the support for the association rule $\{\text{Milk}\} \implies \{\text{Bread}\}$ is the support of $\{\text{Milk}, \text{Bread}\}$, which is $\frac{2}{5}$ as we found earlier.

Confidence

The confidence of an association rule $X \implies Y$ can be thought of as the likelihood that the association rule is actually true: when we see $X$ in a transaction’s itemset, how likely will $Y$ also be there?

The confidence of an association rule $X \implies Y$ is the proportion of transactions containing $X$ that also contain $Y$. Formally, the confidence of the association rule $X \implies Y$ is

$$\mathrm{conf}(X \implies Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} = \frac{\mathrm{supp}(X \implies Y)}{\mathrm{supp}(X)}$$

As an example, let’s find $\mathrm{conf}(\{\text{Fruit}\} \implies \{\text{Bread}\})$. One can calculate that $\mathrm{supp}(\{\text{Fruit}\} \implies \{\text{Bread}\}) = \frac{2}{5}$. Earlier, we also found that $\mathrm{supp}(\{\text{Fruit}\}) = \frac{3}{5}$, which means that the confidence is $\frac{2/5}{3/5} = \frac{2}{3}$. This suggests that whenever fruit is bought, bread is also bought $\frac{2}{3}$ of the time. As another example, let’s calculate $\mathrm{conf}(\{\text{Milk}\} \implies \{\text{Bread}\})$. We found earlier that $\mathrm{supp}(\{\text{Milk}\} \implies \{\text{Bread}\}) = \frac{2}{5}$. One can also see that $\mathrm{supp}(\{\text{Milk}\}) = \frac{2}{5}$, which means that the confidence is $\frac{2/5}{2/5} = 1$. This denotes the idea that every time a customer buys milk, they also buy bread.
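
Continuing the sketch from the Support section, confidence is just a ratio of two support values:

def conf(antecedent, consequent, database):
    """Confidence of the rule antecedent => consequent."""
    return supp(antecedent | consequent, database) / supp(antecedent, database)

print(conf(frozenset({"Fruit"}), frozenset({"Bread"}), database))  # 0.666..., i.e. 2/3
print(conf(frozenset({"Milk"}), frozenset({"Bread"}), database))   # 1.0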

Lift

The lift of an association rule tells us how well the rule predicts the occurrence of the consequent given the antecedent, as compared to a random model. This is defined formally below.

The lift of an association rule $X \implies Y$ is the ratio of the observed support to the expected support if the occurrences of $X$ and $Y$ were independent of each other. Symbolically,

$$\mathrm{lift}(X \implies Y) = \frac{\mathrm{supp}(X \implies Y)}{\mathrm{supp}(X) \times \mathrm{supp}(Y)}$$

Let us consider the rule $\{\text{Fruit}\} \implies \{\text{Bread}\}$. Its support is $\frac{2}{5}$ as found earlier, with $\mathrm{supp}(\{\text{Fruit}\}) = \frac{3}{5}$ and $\mathrm{supp}(\{\text{Bread}\}) = \frac{3}{5}$, so the lift of this rule is $\frac{2/5}{(3/5) \times (3/5)} = \frac{10}{9}$. A lift of more than 1 indicates that $X$ and $Y$ appear together more often than independence would predict, so this rule is potentially useful. Consider instead the rule $\{\text{Butter}\} \implies \{\text{Bread}\}$. Here its support is $\frac{1}{5}$ (since transaction 4 is the only transaction containing both $\text{Butter}$ and $\text{Bread}$), with the supports of $\text{Butter}$ and $\text{Bread}$ being $\frac{2}{5}$ and $\frac{3}{5}$ respectively. Hence the lift of the rule $\{\text{Butter}\} \implies \{\text{Bread}\}$ is $\frac{1/5}{(2/5) \times (3/5)} = \frac{5}{6}$, which actually indicates that $\text{Butter}$ and $\text{Bread}$ are substitutes for each other, since the presence of $\text{Butter}$ appears to have a negative impact on the presence of $\text{Bread}$.
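
The same sketch extends to lift, normalising the rule’s support by what independence would predict:

def lift(antecedent, consequent, database):
    """Lift of the rule antecedent => consequent."""
    expected = supp(antecedent, database) * supp(consequent, database)
    return supp(antecedent | consequent, database) / expected

print(lift(frozenset({"Fruit"}), frozenset({"Bread"}), database))   # 1.111..., i.e. 10/9
print(lift(frozenset({"Butter"}), frozenset({"Bread"}), database))  # 0.833..., i.e. 5/6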

A Summary of Algorithms

Let’s now look at a quick summary of the common association rule mining algorithms. We’ll not go into too much detail about how they work, but we will highlight the main pros and cons of the algorithms.

These algorithms are used to generate frequent itemsets: itemsets whose support is at least a certain threshold. Another step is needed to convert these frequent itemsets into association rules. That’s where the other two metrics come in: if a generated association rule does not meet a certain confidence (or lift), it is discarded. For example, in mlxtend, the association_rules() function helps convert the frequent itemsets into association rules [3].

Apriori

One of the earliest association rule mining algorithms, Apriori was proposed by Agrawal and Srikant in 1994 for frequent itemset identification. It is so named because it uses prior knowledge (i.e., a priori knowledge) of frequent itemset properties.

However, a big limitation is its poor performance when a large proportion of the transactions in the database (over 70%) contain a frequent itemset: when this occurs, its speed drops rapidly and generating the frequent itemsets takes considerably longer. The Apriori algorithm also needs to scan the database $k + 1$ times, where $k$ is the size of the largest frequent itemset that still surpasses the support threshold.
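
To make the level-wise behaviour (and the one-scan-per-level cost) concrete, here is a compact, unoptimised sketch of the Apriori idea; real implementations prune candidates far more aggressively:

def apriori_sketch(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support.

    `transactions` is a list of sets of items. Illustrative only.
    """
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items (first scan of the database)
    items = {item for t in transactions for item in t}
    frequent = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {s: support(s) for s in frequent}

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Count step: another scan of the database for each level
        counts = {c: support(c) for c in candidates}
        frequent = {c for c, s in counts.items() if s >= min_support}
        result.update({c: counts[c] for c in frequent})
        k += 1
    return result

# Reusing the example database from earlier:
print(apriori_sketch(list(database.values()), min_support=0.4))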

ECLAT

Due to the poor performance of the Apriori algorithm, several alternatives were developed. One of these, the Equivalence Class Clustering and Bottom-Up Lattice Traversal (ECLAT) algorithm by Zaki in 2000, is a backtracking algorithm that works on the frequent itemset lattice.

Although ECLAT is faster than Apriori, it comes with reduced interpretability: it does not compute confidence or lift metrics. This is partly why it is faster, but it does mean that the ability to interpret its results is reduced.
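
For contrast, here is a minimal sketch of ECLAT’s ‘vertical’ approach: each item carries the set of transaction IDs containing it (its TID-set), and the support of larger itemsets comes from intersecting TID-sets rather than rescanning the database (names are illustrative):

def eclat_sketch(transactions, min_support):
    """Depth-first search for frequent itemsets over TID-set intersections."""
    n = len(transactions)
    # Vertical layout: item -> TID-set of the transactions containing it
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def search(prefix, prefix_tids, candidates):
        for i, (item, item_tids) in enumerate(candidates):
            tids = prefix_tids & item_tids  # transactions containing prefix + item
            if len(tids) / n >= min_support:
                itemset = prefix | {item}
                result[frozenset(itemset)] = len(tids) / n
                # Extend this itemset using only the items after it in the ordering
                search(itemset, tids, candidates[i + 1:])

    search(set(), set(range(n)), sorted(tidsets.items()))
    return result

print(eclat_sketch(list(database.values()), min_support=0.4))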

FP-Growth

Frequent Pattern Growth (FP-Growth) is an algorithm introduced by Han, Pei, and Yin in 2000 that quickly emerged as a popular alternative to the Apriori algorithm since it is faster. In particular, FP-Growth forgoes candidate frequent itemset generation, which is what makes it faster than Apriori. It also provides more metrics than ECLAT while being on par with ECLAT’s speed.

An Example With mlxtend

With the background on the key concepts of association rule mining and the main algorithms used, let’s try our hand at actually finding these association rules in the database given above using the mlxtend library.

Setting Up

First install the mlxtend library. This will also install some useful data processing libraries like NumPy and Pandas.

pip install mlxtend~=0.23.4

IMPORTANT

Make sure that pip refers to the Python 3 pip command. Otherwise, use pip3 instead.

Loading the Database

Let’s load the itemsets of our database so that we can use mlxtend’s association rule mining tools on them. Save the following as a CSV file:

id,itemset
1,"Milk,Bread,Fruit"
2,"Butter,Eggs,Fruit"
3,"Beer,Diapers"
4,"Milk,Bread,Butter,Eggs,Fruit"
5,"Bread"

Assuming that the database is named database.csv, we can load it into a Pandas dataframe by running the following code.

import pandas as pd
orig_df = pd.read_csv("database.csv", index_col="id")

But what we actually need is a list of itemsets, not a database! With some processing we can achieve this:

rows = orig_df.values.tolist()
dataset = [row[0].split(",") for row in rows]  # row[0] since each row has only one column

We can now use dataset to generate the frequent itemsets and then the association rules.

Generating Frequent Itemsets

With our dataset created, we are almost ready to generate the frequent itemsets. However, we first need to encode the itemsets in the ‘canonical’ transaction form [4]. We can do this using the TransactionEncoder() class provided in mlxtend:

from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
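
At this point df should be a one-hot boolean table, with one row per transaction and one column per item (mlxtend sorts the columns alphabetically). For our database it should look roughly like this:

print(df)
#     Beer  Bread  Butter  Diapers   Eggs  Fruit   Milk
# 0  False   True   False    False  False   True   True
# 1  False  False    True    False   True   True  False
# 2   True  False   False     True  False  False  False
# 3  False   True    True    False   True   True   True
# 4  False   True   False    False  False  False  False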

We can now use mlxtend’s frequent_patterns module on df to generate the frequent itemsets. Let’s use the fpgrowth() function, specifying a minimum support value of 0.4 and saying that we want the column names to appear in the itemsets (instead of the column indices).

from mlxtend.frequent_patterns import fpgrowth
frequent_itemsets = fpgrowth(df, min_support=0.4, use_colnames=True)

Let’s take a look at what the frequent_itemsets dataframe contains:

print(frequent_itemsets)
    support               itemsets
0       0.6                (Fruit)
1       0.6                (Bread)
2       0.4                 (Milk)
3       0.4                 (Eggs)
4       0.4               (Butter)
5       0.4         (Fruit, Bread)
6       0.4          (Milk, Bread)
7       0.4          (Fruit, Milk)
8       0.4   (Milk, Fruit, Bread)
9       0.4          (Eggs, Fruit)
10      0.4         (Butter, Eggs)
11      0.4        (Butter, Fruit)
12      0.4  (Butter, Fruit, Eggs)

Generating Association Rules

With the frequent_itemsets dataframe, we can now find the appropriate association rules. Luckily, we don’t have to write the brute-force code ourselves: mlxtend’s frequent_patterns module also provides the association_rules() function to generate them for us. Here we specify that the output association rules should have a confidence above 0.8. We could of course use lift in place of confidence; we’ll leave this as an exercise for the reader.

from mlxtend.frequent_patterns import association_rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)

Taking a look at the rules dataframe reveals some interesting rules (the output is truncated for brevity):

print(rules)
        antecedents      consequents  antecedent support  consequent support  support  confidence      lift
0            (Milk)          (Bread)                 0.4                 0.6      0.4         1.0  1.666667
1            (Milk)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
2     (Fruit, Milk)          (Bread)                 0.4                 0.6      0.4         1.0  1.666667
3     (Bread, Milk)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
4    (Fruit, Bread)           (Milk)                 0.4                 0.4      0.4         1.0  2.500000
5            (Milk)   (Fruit, Bread)                 0.4                 0.4      0.4         1.0  2.500000
6            (Eggs)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
7          (Butter)           (Eggs)                 0.4                 0.4      0.4         1.0  2.500000
8            (Eggs)         (Butter)                 0.4                 0.4      0.4         1.0  2.500000
9          (Butter)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
10  (Butter, Fruit)           (Eggs)                 0.4                 0.4      0.4         1.0  2.500000
11   (Butter, Eggs)          (Fruit)                 0.4                 0.6      0.4         1.0  1.666667
12    (Eggs, Fruit)         (Butter)                 0.4                 0.4      0.4         1.0  2.500000
13         (Butter)    (Eggs, Fruit)                 0.4                 0.4      0.4         1.0  2.500000
14           (Eggs)  (Butter, Fruit)                 0.4                 0.4      0.4         1.0  2.500000

From the generated rules, we can see that all of them have a confidence of 1.0. Some of the rules’ lift scores are also quite high: the rule $\{\text{Butter}\} \implies \{\text{Eggs}\}$, for example, has a lift value of 2.5. Some of the rules are quite unexpected as well: the rule $\{\text{Milk}\} \implies \{\text{Fruit}\}$, for example, has a confidence of 1 and a lift value of 1.666667.

NOTE

The rules dataframe also includes other metrics, which we have omitted in the output above and will not explore here. Interested readers could go to mlxtend’s official documentation to find out more about these metrics.

Filtering for certain rules comes down to manipulation of the rules dataframe. For example, to filter rules that have Butter as an antecedent, we can run:

rules[rules["antecedents"] == {"Butter"}]

Also, if only the support metric is needed (or if the other metrics cannot be calculated), we can pass support_only=True to the association_rules() function:

rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.4, support_only=True)

Challenges of Association Rule Mining

Although association rule mining only requires a few lines of code (as demonstrated earlier), there are a few challenges that need to be addressed when dealing with real-world data.

  • Sparse data: If the itemsets in each transaction of the database contain only a few items but there is a large variety of items, data processing and storage may become an issue, since most of the data in te_ary and df (see above) will be empty. One possible mitigation is sketched after this list.
  • Thresholding: The threshold values of min_support = 0.4 and min_threshold = 0.8 were not chosen at random — they were chosen to ensure that we don’t get too many or too few rules. For real data, these values would have to be adjusted so that the processing time doesn’t become too long and that the rules generated are relevant.
  • Meaningless, Trivial, or Repeated Rules: One must be careful to watch out for rules that are meaningless in the context of the other generated rules. For example, in the rules dataframe above, notice that rule 0 says that $\{\text{Milk}\} \implies \{\text{Bread}\}$ while rule 2 says that $\{\text{Fruit}, \text{Milk}\} \implies \{\text{Bread}\}$. But clearly $\{\text{Milk}\}$ is a subset of $\{\text{Fruit}, \text{Milk}\}$, and thus rule 2 is unnecessary. One must be careful to filter out rules that are unhelpful.
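
For the sparse-data point above, one possible mitigation (following the sparse-representation usage described in mlxtend’s documentation) is to have TransactionEncoder emit a SciPy sparse matrix and wrap it in a sparse Pandas dataframe, so that the many False entries are never materialised:

# Sketch: one-hot encode into a sparse representation instead of a dense one
te_ary_sparse = te.fit(dataset).transform(dataset, sparse=True)
df_sparse = pd.DataFrame.sparse.from_spmatrix(te_ary_sparse, columns=te.columns_)

# The frequent-pattern functions accept the sparse dataframe as well
frequent_itemsets = fpgrowth(df_sparse, min_support=0.4, use_colnames=True)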

Conclusion

Association rule mining is an oft-forgotten type of unsupervised machine learning, but its applications to product recommendation, bioinformatics, anomaly detection, web usage mining, customer behaviour analysis, and more cannot be overstated. I hope that this post managed to cover the basics of this wonderful approach to uncovering hidden relationships between variables in your datasets.

Happy association rule mining!

Bibliography

  • Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining Association Rules between Sets of Items in Large Databases. ACM SIGMOD Record, 22(2), 207-216. https://doi.org/10.1145/170036.170072
  • Wikipedia contributors. (2025, January 25). Association rule learning. In Wikipedia, The Free Encyclopedia. Retrieved 10:20, May 2, 2025, from https://en.wikipedia.org/w/index.php?title=Association_rule_learning&oldid=1271695648
  • Agrawal, R., & Srikant, R. (1994, September). Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, 487-499. https://www.vldb.org/conf/1994/P487.PDF
  • Zaki, M. J. (2000, May-June). Scalable Algorithms for Association Mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 372-390. https://doi.org/10.1109/69.846291
  • Han, J., Pei, J., & Yin, Y. (2000, May 16). Mining Frequent Patterns without Candidate Generation. Proceedings of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD ‘00), 1-12. https://doi.org/10.1145/342009.335372
  • Heaton, J. (2017, January 30). Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms. arXiv. https://arxiv.org/pdf/1701.09042
  • Raschka, S. (n.d.). association_rules: Association rules generation from frequent itemsets. Retrieved 14:30, May 3, 2025, from https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

Image Credits

Cover image by Buzz from Pixabay, cropped from the original.

Footnotes

  1. Of course, in real life a shopping cart could contain any quantity of a product (like two or three bags of chips). Here we are just interested in whether the shopping cart has the product, not how many of that product is in the shopping cart.

  2. This is similar to how the probability of an event $A$ (commonly denoted $P(A)$) occurring within a sample space of size $N$ is the count of $A$ within the sample space divided by $N$. In fact, some authors denote $\mathrm{supp}(X)$ by $P(X)$, but this is a bit confusing for my tastes.

  3. In particular, according to the source code for association_rules(), it appears that it generates all possible combinations of antecedents and consequents for a given frequent itemset, and keeps only those whose metric exceeds a threshold value. Formally, for a given frequent itemset $X$, it finds all sets $A$ (the antecedent) and $X \setminus A$ (the consequent) such that the metric exceeds the threshold value.

  4. In the original formulation of transactions in association rule mining by Agrawal, Imieliński, and Swami in 1993, a transaction was simply a pair $(i, t)$ where $i$ is the transaction ID and $t$ is an array of values such that $t[j] = 1$ if the $j$th item (in a list of all items) is present and $t[j] = 0$ otherwise. However, we opt to let $t$ be an itemset to avoid confusion.
