Commit Briefs

fcabe874f9 Rene Kita

Avoid potential UB when int is more than 32 bits (master)


d16111b25f Rene Kita

Handle <hr> tags


19298ee53a Rene Kita

Add missing break statement


2178041f18 Rene Kita

Set executable flag for test runner


763ef0b5be Rene Kita

Use unbuffered stdout when debugging


1a62661fd2 Rene Kita

Remove core file on OpenBSD when cleaning


3aa7cf79ac Rene Kita

Refactor handling of pre-formatted text


5938b5a758 Rene Kita

Do not put empty lines around pre toggle lines


81f14aeaac Rene Kita

Use switch instead of if-else


f599567197 Rene Kita

Reorder pre tag logic


Branches

Tags

This repository contains no tags

Tree

.builds/
.gitignore
COPYING
INSTALL
Makefile
README
h2g.1
h2g.c
t/

README

  h2g is an HTML-to-gemtext converter. It reads HTML from stdin and writes
  gemtext to stdout, handling a subset of HTML elements and entities.

  The following HTML elements are recognized; all others are ignored:
  * <a href=>
	A reference number is inserted in place of the link, and the link is
	added to a list at the bottom of the document. Links to element
	identifiers are ignored. In relative local links (starting with '.'), a
	'.html' suffix is replaced with '.gmi'.
  * <b>
	Element is surrounded with '*'.
  * <br>
	A line break is enforced.
  * <em>, <i>, <u>
	Element is surrounded with '_'.
  * <h1> to <h6>
	Content is put on a single line and prefixed with the corresponding
	number of '#'. Block is enclosed with empty lines.
  * <img>
	Alt text is printed in place of the image and the source is added to
	the footnote link list.
  * <p>
	Block is enclosed with empty lines.
  * <pre>, <blockquote>
	Content is written as is, dropping leading and trailing empty lines.
	Block is enclosed with empty lines.
  * <table>, <tr>, <th>, <td>
	Tables are surrounded with empty lines. Each row is printed to a
	single line. A literal tab character is inserted between two <td>
	elements. <th> is treated the same as <td>.
  * <li> inside <ol>
	Each <li> element is printed to a single line prefixed with a
	consecutively increasing number. Block is enclosed with empty lines.
  * <li> inside <ul>
	Each <li> element is printed to a single line prefixed with '*'. Block
	is enclosed with empty lines.
  * <s>
	For every word in the element, a ^W is printed after the element.
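
  The '.html' to '.gmi' rule for <a href=> can be sketched in C as follows.
  This is a minimal illustration and not the code from h2g.c: the function
  name rewrite_href and its buffer-based interface are assumptions made for
  this example. It covers only the suffix rule; links to element identifiers
  and the footnote list are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from h2g.c: copy href into dst and, for
 * relative local links (starting with '.'), replace a trailing ".html"
 * with ".gmi". Returns 0 on success, -1 if dst is too small.
 */
int
rewrite_href(const char *href, char *dst, size_t dstsz)
{
	static const char from[] = ".html";
	static const char to[] = ".gmi";
	size_t len, flen = sizeof(from) - 1;
	int n;

	assert(href != NULL && dst != NULL);
	len = strlen(href);

	if (href[0] == '.' && len >= flen &&
	    strcmp(href + len - flen, from) == 0)
		/* Copy everything before ".html", then append ".gmi". */
		n = snprintf(dst, dstsz, "%.*s%s", (int)(len - flen), href, to);
	else
		/* Absolute links and other suffixes are copied unchanged. */
		n = snprintf(dst, dstsz, "%s", href);

	return (n < 0 || (size_t)n >= dstsz) ? -1 : 0;
}
```

  With this sketch, "./local.html" becomes "./local.gmi", while "./img.png"
  and absolute URLs ending in '.html' are copied unchanged.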


CAVEATS

  * All input is ignored until a <body> element is found!
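
  In practice this means an HTML fragment without a <body> tag produces no
  output. A rough C sketch of the idea follows. The name skip_to_body is
  hypothetical. Two assumptions are made for illustration and are not taken
  from h2g.c: the sketch works on an in-memory string, and it matches the tag
  name case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from h2g.c: return a pointer to the first
 * character after the opening <body ...> tag, or NULL if there is none.
 * A '>' inside an attribute value is not handled by this sketch.
 */
const char *
skip_to_body(const char *html)
{
	static const char tag[] = "<body";
	size_t i, tlen = sizeof(tag) - 1;
	const char *p, *end;

	assert(html != NULL);
	for (p = html; *p != '\0'; p++) {
		/* Compare case-insensitively; a NUL in p ends the match. */
		for (i = 0; i < tlen; i++)
			if (tolower((unsigned char)p[i]) != tag[i])
				break;
		if (i < tlen)
			continue;
		/* Reject longer names that merely start with "body". */
		if (p[tlen] != '>' && !isspace((unsigned char)p[tlen]))
			continue;
		end = strchr(p, '>');
		return end != NULL ? end + 1 : NULL;
	}
	return NULL;
}
```

  A NULL result corresponds to the case where the whole input is ignored.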


BUGS, PATCHES, FEATURE REQUESTS, QUESTIONS, INSULTS

  mail@rkta.de


EXAMPLE

Input:
------

<!DOCTYPE html>
<html lang="en">
<head>
<title>TITLE</title>
</head>
<body>
<header>
<H1>H1</H1>
</header>
<h2>H2</h2>
<p><s>A sentence</s>Paragraph <em>with</em> an <u>important</u>
<a href="./local.html"><b>local</b> link</a>.</p>
<img alt='alt text' src='./img.png'>
<pre>
	Pre-formatted
		text
</pre>
break<br>row
<ul> <li>List entry</li> </ul>
<ol> <li>Ordered list entry</li> </ol>
<table>
	<tr><th>Entity</th><th>Symbol</th></tr>
	<tr><td>&amp;amp;</td><td>&amp;</td></tr>
	<tr><td>&amp;apos;</td><td>&apos;</td></tr>
	<tr><td>&amp;gt;</td><td>&gt;</td></tr>
	<tr><td>&amp;lt;</td><td>&lt;</td></tr>
</table> </body> </html>


Output:
-------

# H1

## H2

A sentence^W^WParagraph _with_ an _important_ *local* link[0].

alt text[1]

```
	Pre-formatted
		text
```

break
row

* List entry

* 1) Ordered list entry

Entity	Symbol
&amp;	&
&apos;	'
&gt;	>
&lt;	<

=> ./local.gmi [0] local link
=> ./img.png [1] alt text
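
The "A sentence^W^W" in the output above follows from the <s> rule: one ^W
per word, appended after the element. A minimal C sketch of that rule is
shown below. The names count_words and strike are hypothetical and not taken
from h2g.c. The sketch assumes that words are separated by whitespace and
that ^W is written as the two characters '^' and 'W'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count whitespace-separated words in text. */
size_t
count_words(const char *text)
{
	size_t n = 0;
	int in_word = 0;

	assert(text != NULL);
	for (; *text != '\0'; text++) {
		if (isspace((unsigned char)*text)) {
			in_word = 0;
		} else if (!in_word) {
			in_word = 1;
			n++;
		}
	}
	return n;
}

/*
 * Hypothetical helper: write text followed by one "^W" per word into dst.
 * Returns 0 on success, -1 if dst is too small.
 */
int
strike(const char *text, char *dst, size_t dstsz)
{
	size_t len, words, i;

	assert(text != NULL && dst != NULL);
	len = strlen(text);
	words = count_words(text);
	/* Need room for the text, two bytes per "^W", and the NUL. */
	if (dstsz < len + 2 * words + 1)
		return -1;
	memcpy(dst, text, len);
	for (i = 0; i < words; i++) {
		dst[len++] = '^';
		dst[len++] = 'W';
	}
	dst[len] = '\0';
	return 0;
}
```

Calling strike("A sentence", buf, sizeof(buf)) with a sufficiently large
buffer leaves "A sentence^W^W" in buf, matching the example.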