mohsin-riad/automation-bijoy-to-avro

Bijoy To Avro

aka ANSI to Unicode conversion

GitHub license macOS made-with-python Only 32 Kb

Intro

ANSI and Unicode are text encoding standards used around the world. ANSI is the older encoding, used on operating systems such as Windows 95/98 and earlier. Unicode is the newer encoding, used by current operating systems.
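To make the difference concrete, here is a minimal sketch comparing a single-byte ANSI code page with Unicode (UTF-8). The choice of `cp1252` as the example code page and the helper name `can_encode` are assumptions for illustration; they are not taken from this repository.

```python
# Illustrative comparison of a single-byte "ANSI" code page and Unicode (UTF-8).
# cp1252 is only an example code page (an assumption), not necessarily the
# one used by the documents this repository converts.

latin = "é"
print(latin.encode("cp1252"))  # one byte for this character
print(latin.encode("utf-8"))   # two bytes for the same character

bengali = "ক"  # BENGALI LETTER KA, U+0995


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(bengali, "cp1252"))  # the code page has no Bengali letters
print(can_encode(bengali, "utf-8"))   # Unicode covers them
```

Because a code page like this has no slots for Bengali letters, legacy Bengali documents rely on special fonts over ANSI bytes, which is why a conversion step to Unicode is needed.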

Installation

Open a terminal and run:

    git clone https://github.com/mohsin-riad/automation-bijoy-to-avro.git
    cd automation-bijoy-to-avro/Source/

Workflow

This repository converts legacy ANSI documents in .doc format to Unicode documents in .docx format, which are ultimately converted to .txt documents.
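The sequence of file types can be sketched as follows. This is only a naming sketch under stated assumptions: `plan_stages` is a hypothetical helper, it assumes each stage keeps the file name and changes only the extension, and it does not perform the conversion itself.

```python
from pathlib import PurePosixPath


def plan_stages(doc_path: str) -> list[str]:
    """Return the file names a document passes through: .doc, .docx, .txt.

    Hypothetical helper: it only derives names and converts nothing.
    """
    path = PurePosixPath(doc_path)
    if path.suffix.lower() != ".doc":
        raise ValueError(f"expected a .doc file, got: {doc_path}")
    return [
        str(path),                       # legacy ANSI document
        str(path.with_suffix(".docx")),  # Unicode document
        str(path.with_suffix(".txt")),   # final plain-text output
    ]


print(plan_stages("a/a1/report.doc"))
```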

Methodology

  • First, choose the target directory /Target_Path

  • Run all cells, following the instructions

  • Go to /Target_Path and, on the command line, run cd .. to move up to the parent directory

  • There you will find a directory whose name starts with /mod-*

  • Sample target directory structure

--- Root directory (name doesn't matter)
    |- Target_directory
        |- a
        |- |- a1 
        |- |- |- a11 
        |- |- |- a12 
        |- |- a2 
        |- b
        |- |- b1 
        |- |- b2 
        |- |- b3 
        |- c
        |- d
        |- e
        |- f
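A nested tree like the one above can be mirrored into a sibling output directory with a short directory walk. The sketch below is an assumption about how this could be done, not the notebook's actual code: the function name `mirrored_output_paths`, the output name `mod-<target name>`, and the demo file names are all hypothetical (the README only says the output directory name starts with `mod-`).

```python
import tempfile
from pathlib import Path


def mirrored_output_paths(target: Path) -> dict[Path, Path]:
    """Map every .doc file under target to a .txt path in a sibling mod-* directory."""
    # Hypothetical naming: the real suffix after "mod-" is not documented.
    out_root = target.parent / f"mod-{target.name}"
    mapping = {}
    for doc in sorted(target.rglob("*.doc")):
        relative = doc.relative_to(target)  # keep the sub-directory layout
        mapping[doc] = (out_root / relative).with_suffix(".txt")
    return mapping


# Demo on a throwaway tree shaped like the sample structure.
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "Target_directory"
    for sub in ("a/a1/a11", "a/a2", "b/b1", "c"):
        (target / sub).mkdir(parents=True)
    (target / "a/a1/a11/letter.doc").touch()
    (target / "b/b1/notes.doc").touch()
    for src, dst in mirrored_output_paths(target).items():
        print(src.relative_to(root), "->", dst.relative_to(root))
```

Keeping the relative path of each file means the output tree has the same shape as the input tree, so a converted file is easy to trace back to its source.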

Sample input directory visualization

Sample Output directory visualization

  • Here you can see that the files have been converted to .txt files containing Unicode data
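One way to sanity-check a converted file is to measure how much of its text falls in the Unicode Bengali block (U+0980 to U+09FF). This is a rough heuristic sketch, not part of the repository; `bengali_ratio` is a hypothetical helper name.

```python
def bengali_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Bengali block U+0980..U+09FF."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    bengali = sum(1 for ch in chars if "\u0980" <= ch <= "\u09ff")
    return bengali / len(chars)


print(bengali_ratio("কখগ"))         # text made only of Bengali letters
print(bengali_ratio("plain ASCII"))  # text with no Bengali letters
```

A ratio near zero for a file expected to hold Bengali text would suggest that the content is still in the legacy encoding.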

Sample Input File

Sample Output File


Conclusion

ভাষা হোক উন্মুক্ত (Let language be open)
