上图为用 MegaPortal 加载不同的 Stable Diffusion 模型,然后输入关于钢铁侠的提示语生成的图片。
The figure above shows images generated from Iron Man prompts by different Stable Diffusion models loaded in MegaPortal.
什么是 MegaPortal / What is MegaPortal?
MegaPortal 是一个为苹果设备设计的易用 AI 模型加载工具,下面是我让 ChatGPT 帮我想的三点卖点:
- 易于使用:MegaPortal代码段由基本的、易于使用的可视块组成,可以通过低成本进行配置。
- AI 适用于所有人:MegaPortal 不仅面向 AI 专家和开发人员,也面向非技术用户。它是一个用户友好型工具,可以被任何希望快速测试、利用或分享 AI 模型的人使用。
- 本地隐私优先:所有的 AI 模型都在您的设备上本地运行,所有的输入数据也在本地处理。MegaPortal 不会从您或您的设备中收集任何数据。
详情请访问 MegaPortal 的文档站点:https://docs.getmegaportal.com/ 。同时,MegaPortal 是一个小巧(4.5 MB, gzipped)且免费的应用。
MegaPortal is an easy-to-use AI model loader designed for Apple devices. Here are three key selling points that ChatGPT helped me come up with:
- Easy to Use: MegaPortal snippets are composed of basic, easy-to-use visual blocks that can be configured with minimal effort.
- AI Accessible to All: MegaPortal is intended not just for AI experts and developers but also for non-technical users. It is a user-friendly tool for anyone who wants to quickly test, use, or share AI models.
- Local and Privacy First: All AI models run locally on your device, and all input data is processed locally as well. MegaPortal does not collect any data from you or your device.
For more details, visit the MegaPortal documentation site: https://docs.getmegaportal.com/ . MegaPortal is also a small (4.5 MB, gzipped) and free app.
有什么用 / What is it used for?
MegaPortal 是一个文档类型软件,执行文件称为 Snippet,Snippet 由若干个 Block 组成,例如下面是一个 Snippet 的例子:
MegaPortal is a document-based app whose executable files are called Snippets. A Snippet is composed of several Blocks, as in the following example:
这个 Snippet 实现了一个人脸滤镜,以图片作为输入,先经过 AnimeGANv2 将图片变成 Anime 风格的图片,由于这个模型输出的是 512×512 大小的图片,再通过 SRGAN 超解析模型变成 2048×2048 大小的图片。相关模型介绍:
This Snippet implements a face filter. It takes an image as input and first passes it through the AnimeGANv2 model to turn it into an anime-style image. Because that model outputs 512×512 images, the SRGAN super-resolution model then upscales the result to 2048×2048. Brief introductions to the relevant models:
为了进一步介绍 MegaPortal 的用途,下面介绍三个本人配置的 Snippet:
To further illustrate the usage of MegaPortal, below are three Snippets that I have configured:
- 第一个视频演示的是上述人脸滤镜的效果,配置原理已经介绍过;
- 由于本人是原神玩家,第二个视频配置了一个用于推荐原神抽卡的 Snippet。它接受米游社的角色列表图片作为输入,通过 MegaPortal 的 Visual Text Recognition Block[1] 处理后得到图片中的文字信息。接着,再通过一个 Javascript Execution Block[2] 进行处理。这个 JS Block 主要承担相似文字拟合(由于原神生僻字太多,例如“云堇”可能会被识别成“云革”,需要进行拟合)、推荐和表单展示逻辑功能。顺带一提,Javascript Execution Block 是通过 VSCode 插件进行书写的,插件配备类型提示和编译功能。
- 第三个视频则是配置了一个利用 Stable Diffusion 模型来通过文字生成图片的 Snippet。第一个 Block 是 Javascript Execution Block[3],用来展示表单和将表单输入内容传递给第二个 Block。第二个 Block 则加载一个起步为 2G 的 SD 模型来生成图片。需要注意的是,由于 SD 模型较大,第一次运行需要一定的加载时间。同时,虽然上面视频演示的是 iPhone 的运行结果,但实际上,目前我只成功地跑起了一个标准的 SD 模型,其他模型运行时都会崩溃。这也是我在“寻求帮助”部分想要求助的内容。
- The first video demonstrates the face filter described above; its configuration was explained earlier.
- As a Genshin Impact player, I configured the Snippet in the second video to recommend gacha pulls. It accepts a screenshot of the character list from miHoYo’s app as input, extracts the text in the image with MegaPortal’s Visual Text Recognition Block[4], and then passes the text to a Javascript Execution Block[5]. This JS Block handles fuzzy matching of similar text (Genshin Impact has many rare characters, so “云堇” may be recognized as “云革” and must be corrected), the recommendation itself, and the form display logic. Incidentally, the Javascript Execution Block is written with a VSCode plug-in that provides type hints and compilation.
- The third video shows a Snippet that generates images from text with a Stable Diffusion model. The first Block is a Javascript Execution Block[6] that displays a form and passes the form input to the second Block. The second Block loads an SD model of at least 2 GB to generate images. Because the SD model is large, the first run takes some time to load. Also, although the video shows the Snippet running on an iPhone, so far I have only managed to run one standard SD model there; the other models crash at runtime. This is what I am asking for help with in the “Seeking Help” section.
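The similar-text fitting mentioned for the second Snippet can be illustrated with a small sketch. The post does not say how the real JS Block does the matching, so this assumes a plain edit-distance comparison against a list of known character names. It is written in Python for brevity rather than in the JavaScript the Block actually uses, and `edit_distance`, `fit_name`, and `KNOWN_NAMES` are hypothetical names chosen for illustration.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fit_name(recognized: str, known_names: Sequence[str],
             max_distance: int = 1) -> Optional[str]:
    """Return the known name closest to the OCR output, or None when
    nothing is within max_distance edits. Ties go to the earliest name."""
    if not known_names:
        return None
    best = min(known_names, key=lambda name: edit_distance(recognized, name))
    return best if edit_distance(recognized, best) <= max_distance else None


# Hypothetical name list; a real Snippet would carry the full roster.
KNOWN_NAMES = ["云堇", "夜兰", "胡桃", "钟离"]

corrected = fit_name("云革", KNOWN_NAMES)  # misread OCR output from the post
```

Keeping `max_distance` small matters for short names: with two-character names, allowing two edits would match any recognized string to some entry in the list.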
在应用的 Snippet Center 中可以挖掘更多的有趣的 Snippet,你也可以配置自己的 Snippet,通过分享 Snippet 文件或者发布到 Snippet Center 中供其他用户下载。
In the app’s Snippet Center you can discover more interesting Snippets. You can also configure your own Snippets and either share them as Snippet files or publish them to the Snippet Center for other users to download.
使用教程 / How to use it
本环节主要以配置一个 Stable Diffusion 的 Snippet 为例,向大家讲解 MegaPortal 的使用方法。
This section walks through configuring a Stable Diffusion Snippet to show how MegaPortal is used.
系统要求 / System Requirements
为了使用 MegaPortal 的全部功能(特指 Stable Diffusion),请将设备至少升级至 iOS/iPadOS 16.2, macOS 13.1.
To use all the features of MegaPortal, Stable Diffusion in particular, please upgrade your device to at least iOS/iPadOS 16.2 or macOS 13.1.
下载 / Download
请访问 https://www.getmegaportal.com/ 下载最新版本,目前 iOS 和 macOS 版本都可下载.
Please visit https://www.getmegaportal.com/ to download the latest version. Currently, both iOS and macOS versions are available for download.
配置一个 AI 生成图片 Snippet / Configure an AI Image Generation Snippet
下面将在 macOS 下为例,讲述配置一个 Snippet 的过程。
The following describes the process of configuring a Snippet, using macOS as the example.
第一步,打开 MegaPortal,点击新建文档,创建一个空文件。然后,下载一个 Stable Diffusion 的 Snippet 作为模板配置,然后双击打开这个文件:
Step 1: Open MegaPortal and click “New Document” to create a new file. Then, download a Stable Diffusion Snippet as a template for configuration, and double-click to open the file.
由于模型中自带 Stable Diffusion 模型,打开后程序会自动开始下载相关模型,模型下载需要一定的时间。如果网络不好,请先关闭程序,在下面「Stable Diffusion 模型下载」部分用下载工具先下载一个模型。
Because the template Snippet comes with a Stable Diffusion model, the app automatically starts downloading that model when the file is opened, which may take some time. If your network is slow, close the app first and use a download tool to fetch a model from the “Stable Diffusion Model Download” section below.
当你下载完模型后,可以点开 AI Model Application 的 Block 进行配置,配置完后点「Save」保存,由于之前的下载可能在进行中,目前只能完全关闭 MegaPortal 才能中断下载。
After downloading the model, click the AI Model Application Block to configure it, then click “Save” to save the changes. The earlier automatic download may still be running; currently the only way to stop it is to quit MegaPortal completely.
重新打开文件,点「播放」按钮即可运行这个 Snippet:
To run this Snippet, you can reopen the file and click on the “Play” button.
单击右键就可以将图片保存至 Photos 应用。
Right-click on the image and you can save it to the Photos app.
清理缓存 / Clear Cache
运行一段时间之后,应用会缓存很多的模型文件,可以通过下面步骤清理缓存:
- 点击「More」按钮;
- 点击 Configuration 按钮;
- 点击「Local Caches」按钮;
- 点击「Delete All」或者长按 / 右键某一条目进行删除;
After running for a while, the app caches many model files. You can clear the cache with the following steps:
- Click the “More” button;
- Click the “Configuration” button;
- Click the “Local Caches” button;
- Click “Delete All” or long press/right click on a specific item to delete it.
缘起 / Source of Inspiration
回忆起之所以设计这个软件,我觉得有三个契机:
- 在过去一年时间我有幸接触到了 Stanford 的 CS193P[7] 课程,学习了 SwiftUI 的开发,同时由于自己是前端通道的同学,所以同时顺便了解到了 iOS 上的 Web 容器 WKWebView 和 JS Runtime JavaScriptCore 的细节使用方法。
- 由于过去一段时间参与了 aPaaS 项目的相关工作,在 aPaaS 项目得到了一些认知,这些认知帮助我能够更好地去抽象一个针对具体领域的效率提高工具的设计。
- 然后最重要的可能是,由于我是原神玩家,在原神中会遇到计算角色强度相关的问题,而计算角色强度需要用户将自己角色面板的数字输入一些小程序 / 网页中才能实现,这个过程比较繁琐,所谓懒惰是第一生产力,所以我就想着可以在手机上写一个程序,能够以视频或者图片作为输入,快速计算出角色强度。而解决这个问题第一步需要解决 OCR 的问题,在查阅和在 iOS 实现 OCR 相关功能的过程中,发现苹果 VNRecognizeTextRequest 乃至整个 AI 应用的相关 API 是一个整体的体系,复用程度比较高,可以稍微整理产品化。
I recall three factors that inspired me to design this software:
- Over the past year I had the opportunity to study SwiftUI development through Stanford’s CS193P[7] course. Because I am a front-end engineer, I also learned the details of using the WKWebView web container and the JavaScriptCore JS runtime on iOS.
- Having worked on an aPaaS project for some time, I gained insights that helped me better abstract the design of an efficiency tool for a specific domain.
- The most important factor may be that I play Genshin Impact, a game in which calculating a character’s strength requires typing the numbers from the character panel into various mini programs or websites. This process is quite cumbersome, and since laziness is, as the saying goes, the primary driver of productivity, I thought of writing a phone app that could quickly calculate character strength from a video or an image. The first step toward that is OCR. While researching and implementing OCR on iOS, I found that Apple’s VNRecognizeTextRequest, and the AI-related APIs as a whole, form a coherent system with a high degree of reuse that could be tidied up into a product.
游戏中的原图
The original image in the game.
Text Recognition Block 得到的效果
The result obtained by the Text Recognition Block.
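得到上述的结果后还需要经由一个 Javascript Block 来计算角色强度。
After obtaining the result above, a Javascript Block is still needed to calculate the character strength.
BTW,不道德晒抽卡:
BTW, a shameless gacha brag:
20发大保底出双金,夜兰你是爱我滴
Two 5-stars within 20 pulls on guaranteed pity. Yelan, you do love me!
20发出武器
A weapon in 20 pulls.
目前进展和后续功能 / Current Progress and Upcoming Features
目前刚添加完 Stable Diffusion 功能,后续有空的话会为 Stable Diffusion 模型应用添加 image2image 功能,也就是下图中的功能:
The Stable Diffusion feature has just been added. When I have time, I will add image2image support to the Stable Diffusion model application, which is the feature shown in the figure below: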
寻求帮助 / Seeking Help
发这个帖的最重要目的是目前 Stable Diffusion 功能在 iPhone 上稳定运行仍未解决,大部分我转换的模型在 iPhone 上运行都会崩溃,而 MegaPortal 设计之初就是希望能够让用户在 macOS 上开发调试,然后将 Snippet 通过 iCloud 等方式同步到 iPhone 上,在 iPhone 上运行或者分享给别的用户,所以所有模型能够在 iPhone 上运行尤其重要,即使 Stable Diffusion 只能运行在性能较好的 iPhone 上。
同时,由于本人精力相对有限,完全吃透 Stable Diffusion 乃至 Pytorch 生态对于我来说难度非常大,所以希望有 iOS 开发经验和 Pytorch + Stable Diffusion 开发经验的同学能够帮忙一起来看看这个问题。
下面附带两组模型文件帮忙定位:
The main purpose of this post is to ask for help: the Stable Diffusion feature still does not run reliably on the iPhone, and most of the models I converted crash there. MegaPortal was designed so that users can develop and debug on macOS, sync the Snippet to an iPhone via iCloud or other means, and then run it on the iPhone or share it with other users. It is therefore particularly important that all models can run on the iPhone, even if Stable Diffusion only runs on higher-performance iPhones.
My time and energy are limited, and fully understanding Stable Diffusion and the Pytorch ecosystem is very difficult for me, so I hope that developers with experience in iOS development and in Pytorch + Stable Diffusion can help look into this issue.
Attached are two sets of model files to help with debugging.
会崩溃的模型 / crashing:
- https://model.getmegaportal.com/classicAnim-v1-einsum_compiled.zip
- https://huggingface.co/nitrosocke/classic-anim-diffusion/blob/main/classicAnim-v1.ckpt
不会崩溃的模型 / not crashing:
Stable Diffusion 模型下载 / Model Download
根据网上的开源 SD 项目,转换好了对应的 CoreML 格式的模型文件,大家可以根据需求下载。
- 由于为了节省空间和带宽,大部分模型中没有带有 Safety Checker,请不要产生 NSFW 内容
- 模型大小为 2G~4G,用公司 VPN 下载的话速度尚可。
I have converted several open-source SD projects found online into CoreML format models; download whichever you need.
- To save space and bandwidth, most of the models do not include a Safety Checker. Please do not generate NSFW content.
- The models range in size from 2 GB to 4 GB and download at a reasonable speed over the company VPN.
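For readers who would rather script the download than use a separate download tool, here is a minimal sketch using only the Python standard library. It assumes the links in this section serve plain zip files over HTTPS without authentication. `archive_name` and `download_model` are hypothetical helper names, and where MegaPortal expects the file to be placed is not covered here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# One of the download links listed in this section.
MODEL_URL = ("https://model.getmegaportal.com/"
             "coreml-stable-diffusion-2-1-base_split_einsum_compiled.zip")


def archive_name(url: str) -> str:
    """File name taken from the last path component of the URL."""
    return Path(urlparse(url).path).name


def download_model(url: str, dest_dir: Path) -> Path:
    """Stream the archive into dest_dir and return its path.

    An archive that already exists is kept as-is. Data is written to a
    '.part' file first so an interrupted download never looks complete.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / archive_name(url)
    if archive.exists():
        return archive
    partial = archive.with_name(archive.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all in memory
    partial.replace(archive)
    return archive


# Example call (not executed here, the file is several GB):
# download_model(MODEL_URL, Path.home() / "Downloads")
```

This sketch does not resume a partial transfer; on an unreliable connection a dedicated download tool with resume support is still the better choice.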
Ghibli Style / 吉卜力风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/ghibli-diffusion-v1-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/Ghibli-Diffusion
Anime Style / 动漫风格
本人最喜欢的一类风格
My personal favorite style.
Links / 链接
- Download / 下载:https://model.getmegaportal.com/8528-diffusion2_split-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/852wa/8528-diffusion
Elden Ring Style / 指环王老头环风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/eldenRing-v3-pruned-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/elden-ring-diffusion
Classic Disney Style / 经典迪士尼风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/classicAnim-v1-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/classic-anim-diffusion
Redshift Style / RedShift 风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/redshift-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/redshift-diffusion
Spider-Verse Style / 蜘蛛人:新宇宙风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/spiderverse-v1-pruned-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/spider-verse-diffusion
Archer Style / 间谍亚契风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/archer-v1-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/archer-diffusion
Arcane Style / 双城之战风格
Links / 链接
- Download / 下载:https://model.getmegaportal.com/arcane-diffusion-v3-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/Ghibli-Diffusion
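Stable Diffusion 2-1 原始 / Original Stable Diffusion 2-1(iPhone 测试可用 / tested working on iPhone)
这个模型也是唯一在我拍的 iPhone 14 Pro Max 上测试可用的模型
This is also the only model that I have tested successfully on my iPhone 14 Pro Max.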
Links / 链接
- Download / 下载:https://model.getmegaportal.com/coreml-stable-diffusion-2-1-base_split_einsum_compiled.zip
- Source / 源项目:
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/pcuenq/coreml-stable-diffusion-2-1-base
Midjourney v4
Links / 链接
- Download / 下载:https://model.getmegaportal.com/mdjrny-v4-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/prompthero/openjourney/tree/main
Nitro Diffusion
这是一个多风格支持的模型,支持 archer style, arcane style or modern disney style
This is a multi-style model that supports archer style, arcane style, and modern disney style.
Links / 链接
- Download / 下载:https://model.getmegaportal.com/nitroDiffusion-v1-einsum_compiled.zip
- Source / 源项目:https://huggingface.co/nitrosocke/Nitro-Diffusion
致谢 / Thanks!
- 谢谢 ChatGPT 帮助翻译和编程问题的回答;
- 谢谢 Copilot 结对编程(尽管是雇佣关系);
- 感谢社区各种开源的 AI 模型项目,此处不一一致谢了。
- Thank you, ChatGPT, for helping with translation and answering programming questions.
- Thank you, Copilot, for pair programming with me (even though it’s a paid relationship).
- Thanks to the many open-source AI model projects in the community; there are too many to thank individually here.
参考资料 / References
[1] Visual Text Recognition Block: https://docs.getmegaportal.com/docs/blocks/visual-text-recognition
[2] Javascript Execution Block: https://docs.getmegaportal.com/docs/blocks/javascript-execution
[3] Javascript Execution Block: https://docs.getmegaportal.com/docs/blocks/javascript-execution
[4] Visual Text Recognition Block: https://docs.getmegaportal.com/docs/blocks/visual-text-recognition
[5] Javascript Execution Block: https://docs.getmegaportal.com/docs/blocks/javascript-execution
[6] Javascript Execution Block: https://docs.getmegaportal.com/docs/blocks/javascript-execution
[7] CS193P: https://cs193p.sites.stanford.edu/