
Confused about dataset #31

Open

chenj02 opened this issue Sep 18, 2024 · 11 comments

chenj02 commented Sep 18, 2024

Great work! But I'm a little confused about the training data. It seems that StyleShot needs triplets like content image-style image-stylized image, but I didn't see this preparation step in the data tutorial. Could you explain it more clearly? Thanks!

Jeoyal (Contributor) commented Sep 18, 2024

Hi @chenj02, thank you for your interest in our work. StyleShot follows a self-supervised training paradigm, meaning the style reference, the content input, and the ground truth are all derived from the same input image.
Specifically, in stage 1 the input image is partitioned into patches and further processed into style embeddings by our style-aware encoder.
In stage 2, the input image is contoured into a content image.
Although the style and content inputs are derived from the same image, they are decoupled so that each distinctly represents style and content, respectively.
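
If it helps, here is a rough conceptual sketch of that decoupling. It is only my illustration, not the actual StyleShot preprocessing code; the patch size, the 512x512 resize, and the Canny edge detector are assumptions on my part:

import cv2
import numpy as np
from PIL import Image

def make_style_and_content(path, patch_size=64):
    # One training image yields the style input, the content input, and the ground truth.
    img = np.array(Image.open(path).convert("RGB").resize((512, 512)))
    # Style branch: partition the image into non-overlapping patches,
    # which a style-aware encoder would turn into style embeddings.
    patches = [
        img[y:y + patch_size, x:x + patch_size]
        for y in range(0, img.shape[0], patch_size)
        for x in range(0, img.shape[1], patch_size)
    ]
    # Content branch: contour the same image into an edge map used as the content condition.
    contour = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY), 100, 200)
    # The original image itself is the reconstruction target (ground truth).
    return patches, contour, img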

chenj02 (Author) commented Sep 18, 2024

Oh! Thank you for the quick reply, I understand now. I have one more question: will StyleGallery be fully organized and open-sourced?

Jeoyal (Contributor) commented Sep 18, 2024

Hi, we have released the JSON file and dataset tutorial for StyleGallery.
StyleGallery includes three open-source datasets: JourneyDB, WIKIART, and a subset of stylized images from MultiGen-20M (itself a subset of LAION-Aesthetics).
Before further processing StyleGallery, you need to download these three datasets.
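
For example, the WIKIART parquet shards can be pulled from the Hugging Face Hub with something like the following. This is just a minimal sketch using huggingface_hub; the local directory and the *.parquet pattern are placeholders, and JourneyDB and MultiGen-20M follow their own download instructions:

from huggingface_hub import snapshot_download

# Download only the parquet shards of the WikiArt dataset repo.
snapshot_download(
    repo_id="huggan/wikiart",
    repo_type="dataset",
    local_dir="data/wikiart_parquet",
    allow_patterns=["*.parquet"],
)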

chenj02 (Author) commented Sep 18, 2024

Thanks very much!

@Delicious-Bitter-Melon

The WIKIART URL (https://huggingface.co/datasets/huggan/wikiart) downloads the dataset in parquet format. Do you provide code to convert it to the PNG or JPG format that you use?

Jeoyal (Contributor) commented Oct 28, 2024

Just read the images sequentially from the .parquet file and save them.
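
For example, a minimal sketch for one shard (not an official converter; it assumes the Hugging Face parquet layout where the image column is a struct with a bytes field, and the paths are placeholders):

import os
from io import BytesIO

import pyarrow.parquet as pq
from PIL import Image

shard = "data/wikiart_parquet/train-00000-of-00072.parquet"  # placeholder path
out_dir = "data/wikiart_images"
os.makedirs(out_dir, exist_ok=True)

# Read one shard into a DataFrame and save each embedded image as a JPG.
df = pq.read_table(shard).to_pandas()
for i, row in enumerate(df.itertuples(index=False)):
    image = Image.open(BytesIO(row.image["bytes"])).convert("RGB")
    image.save(os.path.join(out_dir, f"{i:05d}.jpg"))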

@Delicious-Bitter-Melon

Thanks for your quick reply. Are the images indexed starting from 00000.jpg or 00001.jpg?

Jeoyal (Contributor) commented Oct 28, 2024

00000.jpg

Jeoyal (Contributor) commented Oct 28, 2024

Here is my load script:

import os
from io import BytesIO

import pyarrow.parquet as parquet
import torch
from PIL import Image


class MyDataset(torch.utils.data.Dataset):

    def __init__(self, path):
        super().__init__()
        self.path = path
        # List the .parquet shards and record a (shard, row) pair for every image.
        self.pars = os.listdir(self.path)
        self.data_paths = []
        for p in self.pars:
            table = parquet.read_table(os.path.join(self.path, p))
            df = table.to_pandas()
            for i in range(len(df)):
                self.data_paths.append([p, i])

    def __getitem__(self, idx):
        par, index = self.data_paths[idx][0], self.data_paths[idx][1]
        # Re-read the shard and select the requested row.
        data = parquet.read_table(os.path.join(self.path, par)).to_pandas()
        data = data.iloc[index]
        # The `image` column is a struct whose `bytes` field holds the encoded image.
        img_byte = data['image']['bytes']
        bytes_stream = BytesIO(img_byte)
        image = Image.open(bytes_stream)
        return image, data['artist'], data['genre'], data['style']

    def __len__(self):
        return len(self.data_paths)
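
For reference, this is roughly how I would use it to dump the images to disk (just a sketch; the folder names are placeholders):

import os

# Iterate the dataset and save every image as a sequentially numbered JPG.
dataset = MyDataset("data/wikiart_parquet")
os.makedirs("data/wikiart_images", exist_ok=True)
for i in range(len(dataset)):
    image, artist, genre, style = dataset[i]
    image.convert("RGB").save(f"data/wikiart_images/{i:05d}.jpg")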

@Delicious-Bitter-Melon

When I use os.listdir(self.path), the file list starts from train-00067-of-00072.parquet instead of train-00000-of-00072.parquet. So do you begin extracting images from train-00067-of-00072.parquet?
[screenshot of the os.listdir output]

Jeoyal (Contributor) commented Oct 28, 2024

I start from 00000 and go through 00072. You might need to sort the file list.
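
For example, something like this in __init__ (just a sketch):

# Sort the shard names so the global image index is reproducible;
# os.listdir gives no guaranteed order.
self.pars = sorted(p for p in os.listdir(self.path) if p.endswith(".parquet"))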
