
How can we force Microsoft to change a default setting for Notepad?

You could have downloaded and installed Notepad++ in the time it took to post this.

Thank you.

-
Notepad++ doesn't appear in Microsoft's Store, although a guide for it does. Now how stupid is that? OK, now I'm off to find it on the web.
 
Where can I go to find Notepad++? I found one site, but when I tried to download it, all I got was some program called One Launch, and no Notepad++.

-
 
No, it's just a single quote mark (").

I don't know why the characters in my "ANSI" text files disappear when the files are opened with "UTF-8" encoding, but that's exactly what happens, and they reappear when I use Ethan's way of opening them.

The ANSI character set (on Western-language Windows, this means code page Windows-1252) is an 8-bit fixed-width character set: each character is one byte, a number in the range 0 - 255. All the characters below 128 are the same as ASCII. All the numbers from 160 to 255 represent the same characters as ISO-8859-1. ISO-8859-1 is the default character set used in web pages, if no other character set is specified in the HTML.
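The overlap described above can be checked with a short Python sketch. It assumes that "ANSI" means the Windows-1252 code page (Python's `cp1252` codec), which is what the term refers to on Western-language Windows installs; on other locales the code page would differ.

```python
# Assumption: "ANSI" = Windows-1252 (Python codec name "cp1252").
low = bytes(range(0, 128))
high = bytes(range(160, 256))

# Bytes below 128 decode the same way as ASCII.
ascii_matches = low.decode("cp1252") == low.decode("ascii")

# Bytes 160-255 decode to the same characters as ISO-8859-1.
latin1_matches = high.decode("cp1252") == high.decode("latin-1")

# Bytes 128-159 are where the two differ: ISO-8859-1 has control codes
# there, while Windows-1252 has printable characters such as the Euro sign.
euro_cp1252 = bytes([128]).decode("cp1252")

print(ascii_matches, latin1_matches, euro_cp1252)
```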

UTF-8 is a variable-width encoding built from 8-bit units. This means that, although the unit of encoding is a number between 0 and 255, UTF-8 can use several of those numbers to define one character. It does this by using the top few bits of each byte to say whether the byte starts a character (and how many continuation bytes follow it) or is itself a continuation byte.

If the top bit of the first byte is 0, for example "A" is character 65, which in binary is 01000001, then the character has only one byte. All of the first 128 characters in UTF-8 are one byte and they correspond to ASCII. If the top bit is 1, then the byte is either the start of a multibyte character or a continuation of one. The patterns look like this:

110xxxxx => there is one continuation byte
1110xxxx => there are two continuation bytes
11110xxx => there are three continuation bytes
10xxxxxx => a continuation byte

The 'x's represent parts of the Unicode code point.
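The bit patterns above can be sketched as a small Python function. This is only an illustration of the pattern matching: the name `classify_utf8_byte` is made up for this example, and a real UTF-8 decoder applies further checks (for instance, it rejects some lead bytes that fit these patterns).

```python
def classify_utf8_byte(b: int) -> str:
    """Describe the role of a single byte (0-255) by its top bits."""
    if (b & 0b10000000) == 0b00000000:
        return "single-byte character (ASCII)"
    if (b & 0b11000000) == 0b10000000:
        return "continuation byte"
    if (b & 0b11100000) == 0b11000000:
        return "lead byte, 1 continuation byte follows"
    if (b & 0b11110000) == 0b11100000:
        return "lead byte, 2 continuation bytes follow"
    if (b & 0b11111000) == 0b11110000:
        return "lead byte, 3 continuation bytes follow"
    return "not valid in UTF-8"


# Print each byte in decimal and binary with its role.
for b in (65, 226, 130, 172, 145):
    print(f"{b:3d} {b:08b} {classify_utf8_byte(b)}")
```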

An example: The Euro symbol € is not defined in ASCII or ISO-8859-1. It is defined in ANSI, though, and is character number 128. In Unicode, its code point is U+20AC (20ac in hex). In UTF-8, it takes three bytes to define it: 226, 130, 172. In binary, these numbers are:

11100010 => Lead byte; indicates two continuation bytes
10000010 => The first continuation byte
10101100 => The second continuation byte
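The Euro example can be reproduced in Python. As before, this sketch assumes "ANSI" means Windows-1252 (`cp1252`). It also rebuilds the code point by hand from the payload bits to show where the 'x' bits end up.

```python
euro = "\u20ac"  # the Euro sign, code point U+20AC

utf8_bytes = euro.encode("utf-8")
ansi_bytes = euro.encode("cp1252")  # assumption: "ANSI" = Windows-1252

print(list(utf8_bytes))  # three bytes in UTF-8
print(list(ansi_bytes))  # a single byte in Windows-1252

# Rebuild the code point from the payload ("x") bits:
# 4 bits from the lead byte, 6 bits from each continuation byte.
b0, b1, b2 = utf8_bytes
code_point = (
    ((b0 & 0b00001111) << 12)
    | ((b1 & 0b00111111) << 6)
    | (b2 & 0b00111111)
)
print(hex(code_point))
```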

Another example: the opening single "smart" quote. In ANSI, this is character number 145. In binary, it looks like this:

10010001

If you tried to read this as UTF-8, you would see it as a continuation character all on its own. It doesn't mean anything, so Notepad just puts some character in that is meant to indicate "invalid character". In your case it seems to be putting an @ sign in. That doesn't mean there really is an @ there, just that Notepad found a character it can't interpret as UTF-8 and it's a placeholder.
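Here is a minimal Python sketch of the same failure, again assuming "ANSI" is Windows-1252. Python's placeholder is the Unicode replacement character U+FFFD; the placeholder Notepad shows may be different, so treat this as an analogy rather than a description of Notepad's behaviour.

```python
raw = bytes([145])  # 0x91: the opening single smart quote in Windows-1252

# Read with the encoding the file was actually saved in: works.
as_ansi = raw.decode("cp1252")

# Read strictly as UTF-8: a lone continuation byte is rejected.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Read leniently as UTF-8: the bad byte becomes the placeholder U+FFFD.
as_utf8_lenient = raw.decode("utf-8", errors="replace")

print(repr(as_ansi), strict_ok, repr(as_utf8_lenient))
```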

Everybody should be using UTF-8 for everything now. Once we are all doing that, all of these character encoding problems will be a thing of the past.

Once you have got Notepad++, you should start saving your web pages as UTF-8. However, you can't just do that because, as I said, the default encoding for a web page is ISO-8859-1. You need to add

<meta charset="utf-8">

to the head section of each web page you convert to UTF-8 to tell browsers that it is the encoding for that page.
 
Thank you for all that great information, but unfortunately, I have more than a couple thousand individual webpages that I'd have to add the tag <meta charset="utf-8"> to, and that's too much work for me.

-
 
Yes. I'd just go page by page. As you find you need to update a page, do it just for that page.
 
# GNU sed (Linux, or bash on Windows); assumes pages end in .html and contain a literal <head> tag
find . -type f -name '*.html' -exec sed -i 's|<head>|&\n<meta charset="utf-8">|I' {} +

(Probably copy a few of your files to a test directory, and test it there to make sure you have all your escapes correct and the results are consistent.)
 
I have to get Notepad++ before I can do that.

I tried going to the site that was recommended here (https://notepad-plus-plus.org/downloads/), but all I can get to download is a One Launch app that I didn't like at all.

What am I doing wrong?

-
The download site often has ads that have big DOWNLOAD buttons which are not related at all. Look for this text:
  • Download 64-bit x64
and a big graphic of a cardboard box with Notepad++ on it. The green download button under that should give you npp.8.5.7.Installer.x64.exe.

There are also links further down to other formats, e.g. a 32-bit installer (only needed if you are running a very old version of Windows, which is unlikely).

Or use this direct link to the Windows 64-bit installer: https://github.com/notepad-plus-plu...s/download/v8.5.7/npp.8.5.7.Installer.x64.exe

FYI, the bit of script theprestige posted will only work in Linux (or a bash prompt in Windows). There are ways to achieve the same thing in PowerShell, but I would recommend you don't attempt either.
 
What you're doing wrong: Running a browser without a decent ad-blocker, and without having developed the skill of identifying the actual product download link among the ad links for downloading other things.

I'd say, run Chrome and uBlock Origin, but honestly it takes a whole family of skills, and no little aptitude, to get to the point where you're seeing past all the nonsense that clutters up a typical download page. Even when it's one of the "good guys", like the Notepad++ folks.

So I'm gonna say that for some users, Microsoft Notepad is probably the right answer. And I apologize for adding to the confusion. Good luck!
 
Thank you. I really do appreciate your insights and everyone else's too.

I'm not really an expert on computers, but I have been working with them since 1974, when I trained to be an ECM (Electronic Countermeasures) tech. Basically, what that involved was radar detection and jamming.

1974 was also the year before the Altair 8800 microcomputer came out, and writing code for that machine is what got Bill Gates's and Paul Allen's careers in software started.

Sorry if all of that was boring.

-
 
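The sed one-liner above only adds the tag, though. You also need to convert the entire file into UTF-8 at the same time, or the page will declare an encoding it isn't actually saved in.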
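For anyone who wants to do both steps in one pass, here is a rough Python sketch rather than a tested tool. It assumes the existing files are saved as Windows-1252 ("ANSI"), that pages end in `.htm` or `.html`, and that each has an opening `<head>` tag. The function names are made up for this example. Files that already declare a charset are skipped, and a file containing bytes that are undefined in Windows-1252 will raise an error instead of being converted.

```python
import re
from pathlib import Path

META = '<meta charset="utf-8">'
# Matches <head> or <head ...>, but not <header>.
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
HAS_CHARSET = re.compile(r"<meta[^>]+charset", re.IGNORECASE)


def add_charset_tag(text: str) -> str:
    """Insert the meta tag on a new line after the first opening <head> tag."""
    return HEAD_OPEN.sub(lambda m: m.group(0) + "\n" + META, text, count=1)


def convert_file(path: Path, source_encoding: str = "cp1252") -> bool:
    """Re-save one page as UTF-8 with the tag added. Returns False if skipped."""
    text = path.read_bytes().decode(source_encoding)
    if HAS_CHARSET.search(text):
        return False  # already declares a charset; left untouched
    path.write_bytes(add_charset_tag(text).encode("utf-8"))
    return True


def convert_tree(root: Path) -> int:
    """Convert every .htm/.html file under root; returns how many were changed."""
    changed = 0
    for path in root.rglob("*.htm*"):
        if path.is_file() and convert_file(path):
            changed += 1
    return changed
```

To try it, copy a few pages into a scratch folder (say, a hypothetical `test_copy` directory) and call `convert_tree(Path("test_copy"))`, then open the results in a browser before touching the real files.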
 
