Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The number of buckets in bucket sort is wrong #849

Open
Wangsutan opened this issue Dec 26, 2024 · 0 comments
Open

The number of buckets in bucket sort is wrong #849

Wangsutan opened this issue Dec 26, 2024 · 0 comments

Comments

@Wangsutan
Copy link

The number of buckets in bucket sort should not be set based on the length of the array, otherwise, there will be very few or even no data in each bucket. Please modify this bug.

My code:

use plotters::prelude::*;
use rand::Rng;
use rayon::prelude::*;
use std::collections::HashMap;

pub fn bucket_sort<T>(arr: &[T], edge: usize, bucket_step: usize) -> Vec<T>
where
    T: PartialOrd + Copy + Ord + Into<f64> + Send + Sync,
{
    let arr_len: usize = arr.len();
    if arr_len < 2 {
        return arr.to_vec();
    }

    let arr_max: T = *arr.iter().max().unwrap();
    let arr_min: T = *arr.iter().min().unwrap();
    let arr_range: f64 = arr_max.into() - arr_min.into();
    let bucket_idx_right_edge: usize = arr_range as usize / bucket_step;
    let bucket_len: usize = bucket_idx_right_edge + 1;
    let mut buckets: Vec<Vec<T>> = vec![vec![]; bucket_len];

    // 计算每个值与最小值的距离,并将其映射到非负范围内
    for &x in arr {
        let value_distance_from_min = if arr_range != 0.0 {
            x.into() - arr_min.into()
        } else {
            return arr.to_vec();
        };
        let bucket_idx: usize =
            (value_distance_from_min / arr_range * (bucket_len as f64)).floor() as usize;
        buckets[bucket_idx.min(bucket_idx_right_edge)].push(x);
    }

    if arr_len < edge {
        // Sequential
        for bucket in buckets.iter_mut() {
            if !bucket.is_empty() {
                bucket.sort();
            }
        }
    } else {
        // Parallel
        buckets.par_iter_mut().for_each(|bucket| {
            if !bucket.is_empty() {
                bucket.sort();
            }
        });
    }

    let mut result: Vec<T> = Vec::with_capacity(arr_len);
    for bucket in buckets {
        if !bucket.is_empty() {
            result.extend(bucket);
        }
    }

    result
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rng: rand::prelude::ThreadRng = rand::thread_rng();
    let arr: Vec<i32> = (0..4096).map(|_| rng.gen_range(-128..128)).collect();

    // 统计每个值的出现次数
    let mut value_counts: HashMap<i32, usize> = HashMap::new();
    for &value in &arr {
        *value_counts.entry(value).or_insert(0) += 1;
    }
    let min_value: f64 = *value_counts.keys().min().unwrap() as f64;
    let max_value: f64 = *value_counts.keys().max().unwrap() as f64;
    let max_frequency: f64 = *value_counts.values().max().unwrap() as f64;

    let data: Vec<(f64, f64)> = value_counts
        .into_iter()
        .map(|(k, v)| (k as f64, v as f64))
        .collect::<Vec<_>>();

    let root = SVGBackend::new("plot.svg", (640, 480)).into_drawing_area();
    root.fill(&WHITE)?;
    let mut chart = ChartBuilder::on(&root)
        .caption("数据分布散点图", ("sans-serif", 30))
        .set_label_area_size(LabelAreaPosition::Left, 40)
        .set_label_area_size(LabelAreaPosition::Bottom, 40)
        .build_cartesian_2d(min_value..max_value, 0.0..max_frequency)?;
    chart.configure_mesh().draw()?;
    chart.draw_series(
        data.iter()
            .map(|(x, y)| Circle::new((*x, *y), 2, BLUE.filled()))
            .collect::<Vec<_>>(),
    )?;

    let sorted: Vec<i32> = bucket_sort(&arr, 1000, 10);
    println!("{:?}", sorted);

    Ok(())
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant