
Confused about dataset #31

Open

chenj02 opened this issue Sep 18, 2024 · 11 comments

chenj02 commented Sep 18, 2024

Great work! But I'm a little confused about the training data. It seems that StyleShot needs triplets like content image-style image-stylized image, but I didn't see this preparation step in the data tutorial. Could you explain it more clearly? Thanks!

Jeoyal (Contributor) commented Sep 18, 2024

Hi @chenj02, thank you for your interest in our work. StyleShot follows a self-supervised training paradigm, meaning the style reference, the content input, and the ground truth are all derived from the same input image.
Specifically, in stage 1 the input image is partitioned into patches and further processed into style embeddings by our style-aware encoder.
In stage 2, the input image is contoured into a content image.
Although the style and content inputs are derived from the same image, they are decoupled so that each distinctly represents style and content, respectively.
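
If it helps, here is a rough conceptual sketch of that decoupling. It is only my illustration, not the actual StyleShot preprocessing code; the patch size, the 512x512 resize, and the Canny edge detector are assumptions on my part:

import cv2
import numpy as np
from PIL import Image

def make_style_and_content(path, patch_size=64):
    # One training image yields the style input, the content input, and the ground truth.
    img = np.array(Image.open(path).convert("RGB").resize((512, 512)))
    # Style branch: partition the image into non-overlapping patches,
    # which a style-aware encoder would turn into style embeddings.
    patches = [
        img[y:y + patch_size, x:x + patch_size]
        for y in range(0, img.shape[0], patch_size)
        for x in range(0, img.shape[1], patch_size)
    ]
    # Content branch: contour the same image into an edge map used as the content condition.
    contour = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY), 100, 200)
    # The original image itself is the reconstruction target (ground truth).
    return patches, contour, img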

chenj02 (Author) commented Sep 18, 2024

Oh! Thank you for the quick reply, I understand now. I have one more question: will StyleGallery be fully organized and open-sourced?

Jeoyal (Contributor) commented Sep 18, 2024

Hi, we have released the JSON file and dataset tutorial for StyleGallery.
StyleGallery includes three open-source datasets: JourneyDB, WIKIART, and a subset of stylized images from MultiGen-20M (itself a subset of LAION-Aesthetics).
Before further processing StyleGallery, you need to download these three datasets.
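
For example, the WIKIART parquet shards can be pulled from the Hugging Face Hub with something like the following. This is just a minimal sketch using huggingface_hub; the local directory and the *.parquet pattern are placeholders, and JourneyDB and MultiGen-20M follow their own download instructions:

from huggingface_hub import snapshot_download

# Download only the parquet shards of the WikiArt dataset repo.
snapshot_download(
    repo_id="huggan/wikiart",
    repo_type="dataset",
    local_dir="data/wikiart_parquet",
    allow_patterns=["*.parquet"],
)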

chenj02 (Author) commented Sep 18, 2024

Thanks very much!

@Delicious-Bitter-Melon

The WIKIART URL (https://huggingface.co/datasets/huggan/wikiart) downloads the dataset in parquet format. Do you provide code to convert it to the PNG or JPG format that you use?

Jeoyal (Contributor) commented Oct 28, 2024

Just read the images sequentially from the .parquet file and save them.
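
For example, a minimal sketch for one shard (not an official converter; it assumes the Hugging Face parquet layout where the image column is a struct with a bytes field, and the paths are placeholders):

import os
from io import BytesIO

import pyarrow.parquet as pq
from PIL import Image

shard = "data/wikiart_parquet/train-00000-of-00072.parquet"  # placeholder path
out_dir = "data/wikiart_images"
os.makedirs(out_dir, exist_ok=True)

# Read one shard into a DataFrame and save each embedded image as a JPG.
df = pq.read_table(shard).to_pandas()
for i, row in enumerate(df.itertuples(index=False)):
    image = Image.open(BytesIO(row.image["bytes"])).convert("RGB")
    image.save(os.path.join(out_dir, f"{i:05d}.jpg"))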

@Delicious-Bitter-Melon

Thanks for your quick reply. Are the images indexed starting from 00000.jpg or 00001.jpg?

Jeoyal (Contributor) commented Oct 28, 2024

00000.jpg

Jeoyal (Contributor) commented Oct 28, 2024

Here is my load script:

import os
from io import BytesIO

import pyarrow.parquet as parquet
import torch
from PIL import Image


class MyDataset(torch.utils.data.Dataset):

    def __init__(self, path):
        super().__init__()
        self.path = path
        # List the .parquet shards and record a (shard, row) pair for every image.
        self.pars = os.listdir(self.path)
        self.data_paths = []
        for p in self.pars:
            table = parquet.read_table(os.path.join(self.path, p))
            df = table.to_pandas()
            for i in range(len(df)):
                self.data_paths.append([p, i])

    def __getitem__(self, idx):
        par, index = self.data_paths[idx][0], self.data_paths[idx][1]
        # Re-read the shard and select the requested row.
        data = parquet.read_table(os.path.join(self.path, par)).to_pandas()
        data = data.iloc[index]
        # The `image` column is a struct whose `bytes` field holds the encoded image.
        img_byte = data['image']['bytes']
        bytes_stream = BytesIO(img_byte)
        image = Image.open(bytes_stream)
        return image, data['artist'], data['genre'], data['style']

    def __len__(self):
        return len(self.data_paths)
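
For reference, this is roughly how I would use it to dump the images to disk (just a sketch; the folder names are placeholders):

import os

# Iterate the dataset and save every image as a sequentially numbered JPG.
dataset = MyDataset("data/wikiart_parquet")
os.makedirs("data/wikiart_images", exist_ok=True)
for i in range(len(dataset)):
    image, artist, genre, style = dataset[i]
    image.convert("RGB").save(f"data/wikiart_images/{i:05d}.jpg")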

@Delicious-Bitter-Melon

When I use os.listdir(self.path), the file list starts from train-00067-of-00072.parquet instead of train-00000-of-00072.parquet. So do you begin extracting images from train-00067-of-00072.parquet?
[screenshot of the os.listdir output]

Jeoyal (Contributor) commented Oct 28, 2024

I start from 00000 and go through 00072. You might need to sort the file list.
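
For example, something like this in __init__ (just a sketch):

# Sort the shard names so the global image index is reproducible;
# os.listdir gives no guaranteed order.
self.pars = sorted(p for p in os.listdir(self.path) if p.endswith(".parquet"))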
