Title: | Numerical Association Rule Mining using Population-Based Nature-Inspired Algorithms |
---|---|
Description: | A framework for mining numerical association rules using nature-inspired optimization algorithms. Inspired by the 'NiaARM' 'Python' and 'NiaARM' 'Julia' packages, it brings numerical association rule mining to the R programming language. Fister Jr., Iglesias, Galvez, Del Ser, Osaba and Fister (2018) <doi:10.1007/978-3-030-03493-1_9>. |
Authors: | Iztok Jr. Fister [aut, cre, cph] |
Maintainer: | Iztok Jr. Fister <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-03-04 13:42:19 UTC |
Source: | https://github.com/firefly-cpp/niarules |
This function adds an attribute to the existing list.
add_attribute(rules, name, type, border1, border2, value)
rules |
The current rules list. |
name |
The name of the feature in the rule. |
type |
The type of the feature in the rule. |
border1 |
The first border value in the rule. |
border2 |
The second border value in the rule. |
value |
The value associated with the rule. |
The updated rules list.
rules <- list()
new_rules <- add_attribute(rules, "feature1", "numerical", 0.2, 0.8, "EMPTY")
This function takes a candidate solution vector and a features list and builds a rule.
build_rule(solution, features)
solution |
The solution vector. |
features |
The features list. |
A rule.
This function calculates the border value for a feature based on the feature information and a given value.
calculate_border(feature_info, value)
feature_info |
Information about the feature. |
value |
The value to calculate the border for. |
The calculated border value.
feature_info <- list(type = "numerical", lower_bound = 0, upper_bound = 1)
border_value <- calculate_border(feature_info, 0.5)
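The idea of turning a solution component into a border can be sketched as a linear interpolation between the feature's bounds. This is a minimal sketch under that assumption, not the package's implementation; border_sketch is a hypothetical helper, and the field names lower_bound and upper_bound follow the example above.

```r
# Hypothetical helper: assumes `value` lies in [0, 1] and is mapped linearly
# onto the interval [lower_bound, upper_bound] of the feature.
border_sketch <- function(feature_info, value) {
  lower <- feature_info$lower_bound
  upper <- feature_info$upper_bound
  lower + value * (upper - lower)
}

feature_info <- list(type = "numerical", lower_bound = 10, upper_bound = 20)
border_sketch(feature_info, 0.5)  # midpoint of [10, 20], i.e. 15
```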
This function calculates the fitness of an association rule using support and confidence.
calculate_fitness(supp, conf)
supp |
The support of the association rule. |
conf |
The confidence of the association rule. |
The fitness of the association rule.
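One common way to combine support and confidence into a single fitness value is a weighted mean. The sketch below assumes that form; the function name fitness_sketch and the weights alpha and beta are hypothetical and may differ from what calculate_fitness actually computes.

```r
# Hypothetical helper: weighted mean of support and confidence.
# With alpha = beta = 1 this is the plain average of the two measures.
fitness_sketch <- function(supp, conf, alpha = 1, beta = 1) {
  (alpha * supp + beta * conf) / (alpha + beta)
}

fitness_sketch(0.4, 0.8)             # plain average of 0.4 and 0.8
fitness_sketch(0.4, 0.8, alpha = 3)  # support weighted three times as much
```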
This function calculates the selected category based on a given value and the total number of categories.
calculate_selected_category(value, num_categories)
value |
The value to calculate the category for. |
num_categories |
The total number of categories. |
The calculated selected category.
selected_category <- calculate_selected_category(0.3, 5)
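A value in [0, 1] can be mapped to a 1-based category index by splitting the unit interval into equally wide bins. The sketch below assumes that binning scheme; category_sketch is a hypothetical helper, and calculate_selected_category may round differently.

```r
# Hypothetical helper: split [0, 1] into num_categories equal bins and
# return the 1-based index of the bin that contains `value`.
category_sketch <- function(value, num_categories) {
  idx <- floor(value * num_categories) + 1
  min(idx, num_categories)  # value == 1 would otherwise overflow the last bin
}

category_sketch(0.3, 5)  # second of five bins, i.e. 2
category_sketch(1.0, 5)  # clamped to the last bin, i.e. 5
```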
This function checks if the attribute conditions specified in the association rule are satisfied for a given instance row.
check_attribute(attribute, instance_row)
attribute |
An attribute with type and name information. |
instance_row |
A row representing an instance in the dataset. |
TRUE if conditions are satisfied, FALSE otherwise.
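The check can be pictured as an interval test for numerical attributes and an equality test for categorical ones. The sketch below assumes an attribute is a list with the fields name, type, border1, border2 and value (mirroring the arguments of add_attribute); check_sketch is a hypothetical helper, not the package function.

```r
# Hypothetical helper: TRUE when the row satisfies the attribute's condition.
check_sketch <- function(attribute, instance_row) {
  x <- instance_row[[attribute$name]]
  if (attribute$type == "numerical") {
    # numerical attributes are satisfied inside the closed interval
    x >= attribute$border1 && x <= attribute$border2
  } else {
    # categorical attributes are satisfied on an exact match
    as.character(x) == attribute$value
  }
}

row <- data.frame(temp = 21.5, weather = "sunny", stringsAsFactors = FALSE)
num_attr <- list(name = "temp", type = "numerical",
                 border1 = 20, border2 = 25, value = "EMPTY")
cat_attr <- list(name = "weather", type = "categorical",
                 border1 = NA, border2 = NA, value = "rainy")

check_sketch(num_attr, row)  # 21.5 lies inside [20, 25]
check_sketch(cat_attr, row)  # "sunny" is not "rainy"
```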
This function calculates the cut point, denoting which part of the vector belongs to the antecedent and which to the consequent of the mined association rule.
cut_point(sol, num_attr)
sol |
The cut value from the solution vector. |
num_attr |
The number of attributes in the association rule. |
The cut point value.
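A cut point can be derived by scaling the cut value to the number of attributes and clamping the result so that both the antecedent and the consequent keep at least one attribute. The sketch below assumes that rule and num_attr >= 2; cut_sketch is a hypothetical helper whose rounding may differ from cut_point.

```r
# Hypothetical helper: attributes 1..cut form the antecedent,
# attributes (cut + 1)..num_attr form the consequent.
cut_sketch <- function(sol, num_attr) {
  cut <- floor(sol * num_attr)
  max(1, min(cut, num_attr - 1))  # keep both sides non-empty
}

cut_sketch(0.5, 4)  # 2: two attributes on each side
cut_sketch(0.0, 4)  # 1: clamped up so the antecedent is not empty
cut_sketch(1.0, 4)  # 3: clamped down so the consequent is not empty
```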
This function uses Differential Evolution, a stochastic population-based optimization algorithm, to find the optimal numerical association rule.
differential_evolution( d = 10, np = 10, f = 0.5, cr = 0.9, nfes = 1000, features, data, is_time_series = FALSE )
d |
Dimension of the problem (default: 10). |
np |
Population size (default: 10). |
f |
The differential weight, controlling the amplification of the difference vector (default: 0.5). |
cr |
The crossover probability, determining the probability of a component being replaced (default: 0.9). |
nfes |
The maximum number of function evaluations (default: 1000). |
features |
A list containing information about features, including type and bounds. |
data |
A data frame representing instances in the dataset. |
is_time_series |
A boolean indicating whether the dataset is time series. |
A list containing the best solution, its fitness value, the number of function evaluations, and the list of identified association rules.
Storn, R., & Price, K. (1997). "Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces." Journal of Global Optimization, 11(4), 341–359. doi:10.1023/A:1008202821328
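The roles of np, f, cr and nfes can be illustrated with a generic DE/rand/1/bin loop that maximizes a toy objective over [0, 1]^d. This is a sketch of the textbook scheme, not the package's code: de_sketch, the objective argument and the names of the returned list elements are hypothetical, and no association rules are built here.

```r
# Hypothetical sketch of DE/rand/1/bin (maximization) on the unit hypercube.
de_sketch <- function(objective, d = 10, np = 10, f = 0.5, cr = 0.9, nfes = 1000) {
  stopifnot(np >= 4)  # mutation needs three individuals other than the target
  pop <- matrix(runif(np * d), nrow = np, ncol = d)
  fit <- apply(pop, 1, objective)
  evals <- np
  while (evals < nfes) {
    for (i in seq_len(np)) {
      if (evals >= nfes) break
      # mutation: base vector plus scaled difference of two other vectors
      idx <- sample(setdiff(seq_len(np), i), 3)
      mutant <- pop[idx[1], ] + f * (pop[idx[2], ] - pop[idx[3], ])
      mutant <- pmin(pmax(mutant, 0), 1)  # keep components inside [0, 1]
      # binomial crossover; j_rand guarantees at least one mutant component
      take <- runif(d) < cr
      take[sample.int(d, 1)] <- TRUE
      trial <- ifelse(take, mutant, pop[i, ])
      trial_fit <- objective(trial)
      evals <- evals + 1
      # greedy selection: the trial replaces the target if it is not worse
      if (trial_fit >= fit[i]) {
        pop[i, ] <- trial
        fit[i] <- trial_fit
      }
    }
  }
  best <- which.max(fit)
  list(best_solution = pop[best, ], best_fitness = fit[best],
       num_evaluations = evals)
}

set.seed(1)
toy_objective <- function(x) -sum((x - 0.5)^2)  # maximum 0 at x = 0.5
res <- de_sketch(toy_objective, d = 3, np = 10, nfes = 500)
res$best_fitness
```

In the package, the objective would presumably be the rule fitness computed from the dataset; the toy objective here only keeps the sketch self-contained.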
This function evaluates the fitness of an association rule using support and confidence. If time series data is used, it restricts evaluation to the specified time range.
evaluate(solution, features, instances, is_time_series = FALSE)
solution |
A vector representing a candidate solution. |
features |
A list containing information about features. |
instances |
A data frame representing dataset instances. |
is_time_series |
A boolean flag indicating if time series filtering is required. |
A list containing fitness and identified rules.
Fister Jr, I., Iglesias, A., Galvez, A., Del Ser, J., Osaba, E., & Fister, I. (2018). "Differential evolution for association rule mining using categorical and numerical attributes." In Intelligent Data Engineering and Automated Learning–IDEAL 2018: 19th International Conference, Madrid, Spain, November 21–23, 2018, Proceedings, Part I (pp. 79-88). Springer International Publishing. doi:10.1007/978-3-030-03493-1_9
Fister Jr, I., Podgorelec, V., & Fister, I. (2021). "Improved nature-inspired algorithms for numeric association rule mining." In Intelligent Computing and Optimization: Proceedings of the 3rd International Conference on Intelligent Computing and Optimization 2020 (ICO 2020) (pp. 187-195). Springer International Publishing. doi:10.1007/978-3-030-68154-8_19
This function analyzes the given dataset and extracts information about each feature.
extract_feature_info(data, timestamp_col = "timestamp")
data |
The dataset to analyze. |
timestamp_col |
Optional. The name of the timestamp column to exclude from features. |
A list containing information about each feature, including type and bounds/categories.
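The kind of structure described here can be sketched in base R: numeric columns yield bounds, all other columns yield their distinct categories. The field names type, lower_bound and upper_bound follow the calculate_border example; the categories field and the helper name feature_info_sketch are assumptions.

```r
# Hypothetical helper: summarize every column except the timestamp column.
feature_info_sketch <- function(data, timestamp_col = "timestamp") {
  cols <- setdiff(names(data), timestamp_col)
  info <- lapply(cols, function(col) {
    x <- data[[col]]
    if (is.numeric(x)) {
      list(type = "numerical", lower_bound = min(x), upper_bound = max(x))
    } else {
      list(type = "categorical", categories = unique(as.character(x)))
    }
  })
  names(info) <- cols
  info
}

data <- data.frame(
  timestamp = c("01/02/2024 10:00:00", "01/02/2024 11:00:00", "01/02/2024 12:00:00"),
  temp = c(18, 25, 21),
  weather = c("rain", "sun", "sun"),
  stringsAsFactors = FALSE
)
info <- feature_info_sketch(data)
str(info)
```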
This function returns the position of a feature in the vector, considering the type of the feature.
feature_position(features, feature)
features |
The features list. |
feature |
The name of the feature to find. |
The position of the feature.
features <- list(
  feature1 = list(type = "numerical"),
  feature2 = list(type = "categorical"),
  feature3 = list(type = "numerical")
)
position <- feature_position(features, "feature2")
This function ensures that all values greater than 1.0 are set to 1.0, and all values less than 0.0 are set to 0.0.
fix_borders(vector)
vector |
A numeric vector to be processed. |
A numeric vector with borders fixed.
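The clamping described above can be written in one line of base R. The helper name clamp_sketch is hypothetical; it illustrates the behavior rather than reproducing the package source.

```r
# Hypothetical helper: clamp every component into [0, 1].
clamp_sketch <- function(vector) {
  pmin(pmax(vector, 0), 1)
}

clamp_sketch(c(-0.2, 0.5, 1.3))  # 0.0 0.5 1.0
```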
This function formats the parts of an association rule into a string.
format_rule_parts(parts)
parts |
A list containing parts of an association rule. |
A formatted string representing the rule parts.
This function maps the lower and upper bounds of the solution vector to a subset of the dataset.
map_to_ts(lower, upper, instances)
lower |
The lower bound in [0, 1]. |
upper |
The upper bound in [0, 1]. |
instances |
The full dataset. |
A list with 'low', 'up', and 'filtered_instances'.
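One way to picture the mapping is to scale both bounds from [0, 1] to row indices and keep the rows in between. The sketch below assumes that scaling and sorts the bounds first; map_to_ts_sketch is a hypothetical helper, and the exact rounding used by map_to_ts may differ. The element names low, up and filtered_instances follow the return value described above.

```r
# Hypothetical helper: map two bounds in [0, 1] to a contiguous block of rows.
map_to_ts_sketch <- function(lower, upper, instances) {
  n <- nrow(instances)
  b <- sort(c(lower, upper))      # tolerate bounds given in either order
  idx <- round(b * (n - 1)) + 1   # 0 maps to row 1, 1 maps to row n
  list(low = idx[1], up = idx[2],
       filtered_instances = instances[idx[1]:idx[2], , drop = FALSE])
}

instances <- data.frame(t = 1:9, value = seq(10, 90, by = 10))
window <- map_to_ts_sketch(0.25, 0.75, instances)
window$low  # 3
window$up   # 7
```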
This function uses Particle Swarm Optimization (PSO), a stochastic population-based optimization algorithm, to find the optimal numerical association rule.
particle_swarm_optimization( d = 10, np = 10, w = 0.7, c1 = 1.5, c2 = 1.5, nfes = 1000, features, data, is_time_series = FALSE )
d |
Dimension of the problem (default: 10). |
np |
Population size (default: 10). |
w |
Inertia weight (default: 0.7). |
c1 |
Cognitive coefficient (default: 1.5). |
c2 |
Social coefficient (default: 1.5). |
nfes |
The maximum number of function evaluations (default: 1000). |
features |
A list containing information about features, including type and bounds. |
data |
A data frame representing instances in the dataset. |
is_time_series |
A boolean indicating whether the dataset is time series. |
A list containing the best solution, its fitness value, the number of function evaluations, and the list of identified association rules.
Kennedy, J., & Eberhart, R. (1995). "Particle swarm optimization." Proceedings of ICNN'95 - International Conference on Neural Networks, 4, 1942–1948. IEEE. doi:10.1109/ICNN.1995.488968
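The roles of w, c1 and c2 can be illustrated with the canonical PSO velocity and position update, maximizing a toy objective over [0, 1]^d. This is a sketch of the textbook scheme, not the package's code: pso_sketch, the objective argument and the names of the returned list elements are hypothetical.

```r
# Hypothetical sketch of global-best PSO (maximization) on the unit hypercube.
pso_sketch <- function(objective, d = 10, np = 10, w = 0.7, c1 = 1.5, c2 = 1.5,
                       nfes = 1000) {
  pos <- matrix(runif(np * d), nrow = np, ncol = d)
  vel <- matrix(0, nrow = np, ncol = d)
  pbest <- pos
  pbest_fit <- apply(pos, 1, objective)
  evals <- np
  g <- which.max(pbest_fit)
  gbest <- pbest[g, ]
  gbest_fit <- pbest_fit[g]
  while (evals < nfes) {
    for (i in seq_len(np)) {
      if (evals >= nfes) break
      r1 <- runif(d)
      r2 <- runif(d)
      # inertia + pull toward personal best + pull toward global best
      vel[i, ] <- w * vel[i, ] +
        c1 * r1 * (pbest[i, ] - pos[i, ]) +
        c2 * r2 * (gbest - pos[i, ])
      pos[i, ] <- pmin(pmax(pos[i, ] + vel[i, ], 0), 1)  # stay inside [0, 1]
      fit_i <- objective(pos[i, ])
      evals <- evals + 1
      if (fit_i > pbest_fit[i]) {
        pbest[i, ] <- pos[i, ]
        pbest_fit[i] <- fit_i
      }
      if (fit_i > gbest_fit) {
        gbest <- pos[i, ]
        gbest_fit <- fit_i
      }
    }
  }
  list(best_solution = gbest, best_fitness = gbest_fit, num_evaluations = evals)
}

set.seed(1)
toy_objective <- function(x) -sum((x - 0.5)^2)  # maximum 0 at x = 0.5
res_pso <- pso_sketch(toy_objective, d = 3, np = 10, nfes = 300)
res_pso$best_fitness
```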
This function prints association rules, including the antecedent, consequent, support, confidence, and fitness. For time series datasets, it includes the start and end timestamps instead of indices.
print_association_rules(rules, is_time_series = FALSE, timestamps = NULL)
rules |
A list containing association rules. |
is_time_series |
A boolean flag indicating if time series information should be included. |
timestamps |
A vector of timestamps corresponding to the time series data. |
Prints the association rules.
This function prints the information extracted about each feature.
print_feature_info(feature_info)
feature_info |
The list containing information about each feature. |
A message is printed to the console for each feature, providing information about the feature's type, and additional details such as lower and upper bounds for numerical features, or categories for categorical features. No explicit return value is generated.
Calculate the dimension of the problem, excluding timestamps.
problem_dimension(feature_info, is_time_series = FALSE)
feature_info |
A list containing information about each feature. |
is_time_series |
Boolean indicating if time series data is present. |
The calculated dimension based on the feature types.
Reads a dataset from a CSV file and optionally parses a timestamp column.
read_dataset( dataset_path, timestamp_col = "timestamp", timestamp_formats = c("%d/%m/%Y %H:%M:%S", "%H:%M:%S %d/%m/%Y") )
dataset_path |
A string specifying the path to the CSV file. |
timestamp_col |
A string specifying the timestamp column name (default: "timestamp"). |
timestamp_formats |
A vector of date-time formats to try for parsing timestamps. |
A data frame containing the dataset.
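Trying several date-time formats in turn can be sketched with base R's as.POSIXct, which returns NA for strings that do not match the given format. The helper parse_timestamps_sketch is hypothetical, and the UTC time zone is an assumption; read_dataset may handle time zones and partial matches differently.

```r
# Hypothetical helper: return the parse from the first format that
# matches every value, or fail if none does.
parse_timestamps_sketch <- function(x,
    formats = c("%d/%m/%Y %H:%M:%S", "%H:%M:%S %d/%m/%Y")) {
  for (fmt in formats) {
    parsed <- as.POSIXct(x, format = fmt, tz = "UTC")
    if (!anyNA(parsed)) {
      return(parsed)
    }
  }
  stop("no timestamp format matched every value")
}

day_first <- parse_timestamps_sketch(c("01/02/2024 10:30:00", "02/02/2024 11:00:00"))
time_first <- parse_timestamps_sketch("10:30:00 01/02/2024")
format(day_first, "%Y-%m-%d %H:%M:%S")
```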
This function generates a vector of random solutions for a specified length.
rs(candidate_len)
candidate_len |
The length of the vector of random solutions. |
A vector of random solutions between 0 and 1.
candidate_len <- 10
random_solutions <- rs(candidate_len)
print(random_solutions)
This function calculates the support and confidence for the given antecedent and consequent in the dataset instances.
supp_conf(antecedent, consequent, instances, features)
antecedent |
The antecedent part of the association rule. |
consequent |
The consequent part of the association rule. |
instances |
A data frame representing instances in the dataset. |
features |
A list containing information about features, including type and bounds. |
A list containing support and confidence values.
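Using the standard definitions, support is the share of instances that satisfy both the antecedent and the consequent, and confidence is the share of antecedent-satisfying instances that also satisfy the consequent. The sketch below works on precomputed logical masks to stay self-contained; supp_conf_sketch and the element names supp and conf are hypothetical, and the zero-antecedent fallback is an assumption.

```r
# Hypothetical helper: support and confidence from two logical vectors,
# one entry per instance.
supp_conf_sketch <- function(antecedent_mask, consequent_mask) {
  n <- length(antecedent_mask)
  both <- sum(antecedent_mask & consequent_mask)
  ante <- sum(antecedent_mask)
  list(
    supp = both / n,
    conf = if (ante > 0) both / ante else 0  # assumed fallback when nothing matches
  )
}

data <- data.frame(
  temp = c(18, 22, 25, 30, 21),
  weather = c("rain", "sun", "sun", "sun", "rain"),
  stringsAsFactors = FALSE
)
# Rule: temp in [20, 26] => weather == "sun"
ante <- data$temp >= 20 & data$temp <= 26  # rows 2, 3, 5
cons <- data$weather == "sun"              # rows 2, 3, 4
sc <- supp_conf_sketch(ante, cons)
sc$supp  # 2 of 5 rows satisfy both sides
sc$conf  # 2 of the 3 antecedent rows also satisfy the consequent
```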
This function writes association rules to a CSV file. For time series datasets, it includes the start and end timestamps instead of indices.
write_association_rules_to_csv( rules, file_path, is_time_series = FALSE, timestamps = NULL )
rules |
A list of association rules. |
file_path |
The file path for the CSV output. |
is_time_series |
A boolean flag indicating if time series information should be included. |
timestamps |
A vector of timestamps corresponding to the time series data. |
No explicit return value. The function writes association rules to a CSV file.