
Beautiful, Community, and God: 21 Answers
 You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that
 can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many
 times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is
 insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular
 language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break
 down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular
 regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me
 crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even
 Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with
 regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.
 Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together
 like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and
 HTML together in the same conceptual space will destroy your mind like so much watery putty. If you
 parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to
 inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes.
 HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the
 onslaught of horror. Regex-based HTML parsers are the cancer that is killing StackOverflow it is too late
 it is too late we cannot be saved the transgression of a child ensures regex will consume all living tissue
 (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive
 this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and
 security holes using regex as a tool to process HTML establishes a breach between this world and the
 dread realm of corrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of
 regex parsers for HTML will instantly transport a programmer's consciousness into a world of ceaseless
 screaming, he comes, the pestilent slithy regex-infection will devour your HTML parser, application and
 existence for all time like Visual Basic only worse he comes he comes do not fight he comes, his unholy
 radiance destroying all enlightenment, HTML tags leaking from your eyes like liquid pain, the song of
 regular expression parsing will extinguish the voices of mortal man from the sphere I can see it can you
 see it it is beautiful the final snuffing of the lies of Man ALL IS LOST ALL IS LOST the pony he
 comes he comes he comes the ichor permeates all MY FACE MY FACE oh god no NO NOOOO NO
 stop the angles are not real ZALGO IS TONY THE PONY, HE COMES
 1811
 Have you tried using an XML parser instead?
 edited Nov 14 at 0:18
 community wiki
 bobince
regex and html
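
The point buried under the rant can be shown in a few lines. This is only a rough sketch, not anything from the original answer: the sample markup, the `DepthTracker` class and the `text_by_depth` helper are all made up for illustration, and it uses Python's standard-library `html.parser` rather than the XML parser the comment suggests. A naive pattern pairs the outer opening tag with the inner closing tag, while a parser that tracks open tags keeps the nesting straight.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: a <div> nested inside another <div>.
SAMPLE = "<div>outer <div>inner</div> tail</div>"

# A naive non-greedy pattern stops at the first closing tag it sees,
# so it pairs the outer <div> with the inner </div>.
naive = re.search(r"<div>(.*?)</div>", SAMPLE).group(1)


class DepthTracker(HTMLParser):
    """Record each text chunk together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Every opening tag takes us one level deeper.
        self.depth += 1

    def handle_endtag(self, tag):
        # Every closing tag takes us one level back up.
        self.depth -= 1

    def handle_data(self, data):
        self.chunks.append((self.depth, data))


def text_by_depth(markup):
    """Illustrative helper: parse markup and return (depth, text) pairs."""
    parser = DepthTracker()
    parser.feed(markup)
    parser.close()
    return parser.chunks


print(naive)                  # outer <div>inner
print(text_by_depth(SAMPLE))  # [(1, 'outer '), (2, 'inner'), (1, ' tail')]
```

The depth counter here is deliberately simplistic: void elements such as `<br>` never get a closing tag and would throw it off, so real code should lean on a full parsing library instead.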


Parents, Java, and Knowledge: C
 CA
 C++
 Python
 Java
Every programmer must have deep knowledge of C programming because C is the parent of all programming languages


Af, Books, and Crying: skerb Retweeted
 Shan AF RJ mesa 15 - AF SP mesa 71 @ShanaBRX Jun 14
 Fuck everyone who whines about ao3
 News
 All News
 May 2019 Newsletter, Volume 135
 Published: Thu 13 Jun 2019 01:03PM 03 Comments: 4
 Recently, the Archive of Our Own has received an influx of
 new Chinese users, a result of tightening content restrictions
 on other platforms. We would like to extend our warmest
 welcome to them, and remind everyone that our committees
 are working to make AO3 as accessible as possible in
 languages other than English
wetwareproblem:
wrangletangle:

zoe2213414:

eabevella:

naryrising:

You can read the post here for more info, but I wanted to just add a bit about what this entails from my POV, on the Support team. Somewhere between ¼ and ⅓ of all our tickets last month were in Chinese (somewhere upwards of 300 out of 1200 or so), almost all from users just setting up their accounts or trying to find out how to get an invitation. A lot of the tickets are what I’d characterize as “intro” tickets - they say hi, list favourite fandoms or pairings, or provide samples of fic they’ve written. Although this isn’t necessary on AO3, it is not uncommon on Chinese fandom sites to have to prove your credentials to get in (in fact it wasn’t uncommon on English-language fandom sites 15-20 years ago). We respond to all of these tickets, even the ones that just say hi. We check whether the user has managed to receive their invite or get their account set up, and if they haven’t, we help them do so. This means taking every single ticket through our Chinese translation team twice, once so we make sure we understand the initial ticket, and then again to translate our reply.
This is a challenging process, although we’ve found ways to streamline it and can normally get a reply out pretty quickly (like within a few days).  We do it because this is part of why AO3 exists in the first place - to provide a safe haven where users can post their works without worrying about censorship or sudden crackdowns on certain kinds of content.  We do it because this is important, and helping these users get their accounts and be able to share their works safely is why we’re here.  We hope that we’ll be able to help as many of them as possible.  
There have been a few (thankfully few, that I’ve seen) complaints about these new AO3 users not always knowing how things work - what language to tag with, or what fandom tags to use, for instance.  To this I would say:
1. Have patience and be considerate.  They are coming to a new site that they aren’t familiar with, and using it in a language they may not be expert in, and it might take a while to learn the ropes.  You can filter out works tagged in Chinese if you don’t want to see them.  Or just scroll past.  
2. You can report works tagged with the wrong language or the wrong fandom to our Policy and Abuse team using the link at the bottom of any page. This will not cause the authors to “get in trouble” (a concern I’ve heard before, as people are reluctant to report for these reasons). It means the Policy and Abuse team will contact them to ask them to change the language/fandom tag, and if the creator doesn’t, the team can edit it directly.
If you remember Strikethrough or the FF.net porn ban or similar purges, please keep them in mind and consider that these users are going through something similar or potentially worse.  This is why AO3 exists.  We are doing our best to try and help make the transition smooth.  

I am Taiwanese and I’d like to put some context behind the recent influx of China-based AO3 users.
China has been tightening restrictions on freedom of speech in recent years, since Xi became the chairman (he even abolished the 10-year term limit for the chairman, meaning he can stay on as the leader of China for as long as he lives; he has become a dictator).
They censor words that are deemed “sensitive”; you can’t type anything that criticizes the Chinese government. Big social media platforms won’t even publish posts containing sensitive words. You don’t have the freedom to publish books without the books being approved by the government either.
To disguise this whole Nineteen Eighty-Four nightmare, they started to pick on the easy targets: women and minorities (China is getting more and more misogynistic as a result of the government trying to control its male population by encouraging them to control the female population through “traditional Chinese family values”, but that’s another story).
Last year, the Chinese government arrested a woman who is a famous yaoi/BL novel writer named 天一 and sentenced her to 10 years in jail for “selling obscene publications” and “illegal publication” (she’s not the only BL writer who got arrested. Meanwhile, in multiple cases men who raped women got only about 2 years of jail time in China). It’s a warning to anyone who wants to publish anything that’s “not approved” by the government that they can literally ruin you.
Just recently the Chinese government “contacted” the owners of one of their largest romance/yaoi/slash fiction sites, 晉江, and announced that from now on, for the sake of a Clean Society, writers can’t write anything that’s even slightly “obscene”. No sex scenes, no sexual interaction; they can’t even write any bodily interaction below the neck (I’m not kidding here).
But that’s not their actual goal. They also listed other restrictions, such as: you can’t write anything that’s about the government, the military, the police, “sensitive history”, “race problems”, which is… you basically can’t write anything that might be used as a tool to criticize the government (as many novels did).
This recent development really hurt Chinese fanfic writers. They can’t write anything without the fear of being put on the guillotine by the government as a show of control. Most of them don’t even think that deeply about politics; they just want to write slash fiction. But no platform is safe in China, and that’s why there is a sudden influx of Chinese users to AO3.
I bet it won’t be long before AO3 gets banned in China, but until then, be a little bit patient with them. As much as I hate the Chinese government, I pity their people.


I’m crying so loud… As a Chinese person: you don’t know how much your kindness means to us. When I was young, I read 1984, and I thought the story was so unrealistic, but now it’s getting tougher and tougher for fanfic and its writers in China. Thank you, AO3. Thank you to the people who care about Chinese people. (hope I didn’t spell anything wrong)

The OTW’s account on Weibo, the biggest Chinese social media site, is constantly fielding questions from Chinese users about how to get invitations, how to post, all of it. Chinese fans deeply want to learn how to use AO3. The difference between Lofter’s posting system and AO3’s is perhaps even wider than the gulf between Tumblr and AO3. But imagine if you had to navigate across that gap in a language you didn’t speak, using translation programs that don’t understand fan terminology.

This is exactly what the AO3 was built to deal with. We just didn’t get a chance to get the internationalization done first, so things may be bumpy for a while. We are all part of fandom, so let’s take care not to leave anyone out.


Just in case it isn’t clear to anyone? This. This right here is precisely why the AO3 doesn’t police content or remove things that are icky or obscene.
Because it’s not you who defines what’s obscene. It’s the authorities.
